# What is Regression Analysis?

The regression analysis is a statistical method that can estimate the relationship between two or more variables. This method can provide a better understanding of how the value of the dependent variable changes, when one of the independent variables change. The analysis assumes that other independent variables remain constant when running the calculations for working a variable.

In essence, a *dependent* variable is the outcome you are trying to analyze and predict, whereas an *independent* variable, also known as regressor, is the inputs that affects the *dependent* variable(s).

Regression analysis can be done using various techniques. Excel can solve linear regression analysis problems using the *least squares* method. Linear regression method assumes a linear correlation between *independent* and *dependent* variables by the formula;

y = bx + a

- y: dependent value
- x: independent value
- b: the slope of the regression line
- a:
*Y-intercept*, a point where the regression line intersects the y-axis

In this article, we're going to be using a sample data set to go over different methods. You can download the workbook below.

# In Excel

You can create a regression analysis in Excel using any of these three methods:

**Analysis ToolPak**add-in- Formulas
- Regression Graph

## Analysis ToolPak

The Analysis ToolPak add-in is a very useful tool that shines in data analysis. It is a ‘hidden’ add-in, meaning that it’s not active in Excel by default. You can activate it from the **Add-Ins** dialog. To actiave it, follow these steps:

- Go to
**FILE > Options > Add-Ins**. - Select
**Excel Add-ins**in the**Manage**dropdown and click the**Go** - Select the
**Analysis ToolPak**and click OK.

Add-in will be placed under the **DATA** tab with the name of **Data Analysis** after activation. Begin by going into this dialog. Here, select the **Regression **option and click the **OK** button to open the **Regression **dialog.

In the **Regression** dialog, you need to specify the references of ranges containing the X and Y values. The X and Y values are *independent* and *dependent* variables respectively. To build a multiple regression model, use two or more *adjacent* columns for *independent* variables (X).

Before clicking **OK** button to create analysis output, let’s go over the other options in the dialog:

**Labels**: Check whether your data has titles in the first row.**Constant is Zero**: Check whether*dependent*value should be equal to*0*when the*independent*value equals*0*.**Confidence Level:**Check and modify the percentage value to apply a custom confidence level.**Output options:**Choose where analysis is to be placed.**Residuals:**Check the residual values (the difference between the predicted and actual values you want to see in the output). Plots will also give you a chart.**Normal Probability Plots:**Check to see the normal probability information to the regression analysis results on a chart.

Excel will generate multiple tables and charts based on the options you choose. These tables are placed under 3 sections:

**Summary Output**contains basic statistics about regression, ANOVA (analysis of variance) information, and information about the regression line.**Residual Output**shows you how far away the actual data points are from the predicted data points.**Probability Output**contains the normal distribution of regression analysis results.

**Note: **Even though the **Regression** tool in the **Data Analysis** can provide an analysis of the results, it is not dynamic, and you have to open the **Regression** dialog every time when using a new set of data.

## Commonly Used Formulas

You can use formulas to reach certain values when performing a simple regression analysis. Excel’s statistical functions like these will provide useful when creating reports:

- LINEST
- SLOPE
- INTERCEPT
- CORREL

Let’s go over each of these functions and see how they work on a few examples. The **LINEST** function calculates a straight line that best fits your data using the "least squares" method, and returns an array describing that line. It takes 4 arguments, two of which are the array of values that contain the *independent* and *dependent* variables. The remaining two arguments are optional *TRUE/*FALSE arguments for the *Constant is Zero* assumption, and if additional regression stats are to be calculated.

Using a function with X and Y values will return *slope* and *Y-intercept* values of the regression equation *y = bx + a*. To return both values, you need to select 2 adjacent cells, and press the *Ctrl + Alt + Enter* combination when entering the formula.

For example,

=LINEST(B2:B21,A2:A21)

**Note: **These results are the **Coefficient** numbers in the **Summary Output** of **Regression** tool.

The **SLOPE** and **INTERCEPT** functions return the same *slope* and *Y-intercept* values. If you do not want to use array functions, you can use these functions instead. Aside from the **LINEST**, these functions do not have any other optional arguments.

=SLOPE(B2:B21,A2:A21)

=INTERCEPT(B2:B21,A2:A21)

The **CORREL** function returns the correlation coefficient of two arrays. You can use the correlation coefficient to determine how strongly the two variables are related to each other. This value can also be shown in an analysis created with the **Regression** tool. This coefficient is named *Multiple R*.

You can generate *residual *values using the *slope* and *Y-intercept*. Based on the *y = bx + a* formula, you need to multiply the *slope* with the *X* value, and add the *Y-intercept*. The difference between the *actual* *Y* and *predicted Y* gives the residual value.

Assume that we call the slope and intercept values *“slope” *and *“y_intercept”*. Our formula for the first predicted value becomes:

=A2*slope+y_intercept

The formula to find the difference between *predicted* and *actual* value would be,

=B2-G2

We then copy down both formulas to generate the values for each value in our data table.

## Regression Graph

Visualizing data is one of the easiest way to identify outliers and compare multiple elements. Excel versions 2010 or newer support **Trendlines**. This is a powerful tool that can show the regression between two series without any calculations.

Make sure that the *independent* variables (X) are on the first column, and *dependent* variables (Y) are on the second. Then, select your data table to start creating a chart.

Create a **Scatter Chart** from **INSERT > Scatter (Charts)**

Add a *linear* **Trendline** by using the plus sign next to chart, or right-click a data point.

The trendline formula does the same thing as the **Regression **tool and **Formula** methods. You can also improve the looks with few modifications. Double-click on the trendline to open options pane.

You can add an *Equation Formula *and* R ^{2} *values to the chart by enabling the related options in the

**Format Trendline**menu.

You can also modify the chart properties like the color of the trendline, the chart and axis titles, and axis formatting from the same menu.