How to Create a Multiple Regression Analysis

Instructor: Bob Bruner

Bob is a software professional with 24 years in the industry. He has a bachelor's degree in Geology, and also has extensive experience in the Oil and Gas industry.

A multiple regression analysis is a statistical method that's used to compare the effects of multiple independent variables on a single dependent variable. Care must be taken when setting up the input data, and when analyzing the individual results.

So Many Variables

If you opened a new business and found that your sales fluctuated significantly on a daily, weekly, or monthly basis, how would you go about determining what was behind those differences? Were sales simply a function of the number of customers who walked in? Or were you selling items that only appealed to teenagers, who shop more frequently on weekends or in the summer? Perhaps your sales rose and fell as a result of advertising or sales campaigns, or maybe everything simply depended on the weather that day?

When we want to compare the effect of one input variable on another output variable, we can create a simple bi-variate regression using sample data. In more complex situations, multiple regression can be used to analyze and predict the effects of a number of independent variables on one dependent variable. Special care must be taken when using this process to ensure that the data assumptions and results are valid.

The Basic Equation

The multiple regression equation is similar to the bi-variate equation, extended to account for n independent variables:

Y = a + b1 X1 + b2 X2 + b3 X3 + … + bn Xn

We see that each independent variable, Xi (for i = 1 to n), factors into the overall linear regression estimate of Y, the dependent variable, weighted by its own coefficient bi; the constant a is the intercept. From this we can also calculate a single R-squared value that represents the goodness of fit, or the proportion of the variability in Y that is accounted for by all the independent variables taken together. More importantly, we can also analyze each independent variable as a separate entity.
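As a rough illustration of the R-squared value described above, the sketch below uses the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample numbers are hypothetical and chosen only for illustration.

```python
def r_squared(y_actual, y_predicted):
    """Proportion of the variability in y_actual explained by y_predicted."""
    mean_y = sum(y_actual) / len(y_actual)
    # Variability left unexplained by the regression estimate.
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    # Total variability of the dependent variable around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in y_actual)
    return 1 - ss_residual / ss_total


# Hypothetical observed sales and the estimates from some fitted equation.
observed = [18.0, 17.0, 28.0, 27.0, 38.0]
estimated = [18.0, 17.0, 28.0, 27.0, 38.0]
perfect_fit = r_squared(observed, estimated)
```

A value near 1 suggests the independent variables together account for most of the variability; a value near 0 suggests they account for very little.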

Initializing the Data

Adequate sample data is required to derive the multiple regression equation. Before deriving it, we can create a scatter plot and a bi-variate regression for each independent variable against the common dependent variable. Any outliers should be checked, and erroneous samples removed. For example, an online sale to someone who indicated they were three years old is almost certainly a data-entry error. Any independent variable that correlates poorly with the dependent variable, identified by a low R-squared value, can typically be excluded from the subsequent multi-variate analysis.
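The screening step described above might be sketched as follows, assuming NumPy is available. For a bi-variate regression, R-squared equals the square of the Pearson correlation coefficient, so the sketch uses np.corrcoef. The variable names and data values are hypothetical.

```python
import numpy as np


def bivariate_r_squared(x, y):
    """R-squared of a simple regression of y on x (squared correlation)."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


# Hypothetical daily sales and two candidate independent variables.
sales = np.array([12.0, 15.0, 19.0, 22.0, 26.0, 30.0])
predictors = {
    "customers": np.array([10.0, 14.0, 18.0, 21.0, 25.0, 29.0]),
    "temperature": np.array([70.0, 65.0, 72.0, 68.0, 71.0, 66.0]),
}

# One R-squared value per independent variable, for screening.
screening = {name: bivariate_r_squared(x, sales) for name, x in predictors.items()}
```

With this made-up data, the customer count tracks sales closely while temperature does not, so temperature would be the candidate for exclusion.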

Each variable we wish to include in the multiple regression analysis must have the same number of observations. Missing values for any of the observations require that we remove the entire observation from the analysis. While the output estimate will be a continuous variable, the input variables can be continuous or ordinal values.
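A minimal sketch of removing incomplete observations is shown below. It assumes each observation is a tuple and that a missing value is recorded as None; both conventions are assumptions made for this example.

```python
def drop_incomplete(observations):
    """Keep only the observations that have a value for every variable."""
    return [row for row in observations if all(value is not None for value in row)]


# Hypothetical observations: (customers, ad_spend, sales).
raw = [
    (10, 200.0, 18.0),
    (14, None, 17.0),   # missing ad_spend: the whole observation is dropped
    (18, 350.0, 28.0),
    (21, 300.0, None),  # missing sales: the whole observation is dropped
]
complete = drop_incomplete(raw)
```

After this step every variable has the same number of observations, which is what the analysis requires.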

Creating the Analysis

The regression analysis is done by solving a system of simultaneous equations using matrix algebra, or by using statistical software applications. In either case, the key to the calculation process is that it uses a form of statistical control to estimate the effect of each independent variable. In this process, each variable is analyzed independently by holding all the other variables constant.
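The matrix calculation can be sketched as below, assuming NumPy. Rather than writing out the simultaneous equations by hand, the sketch passes them to NumPy's least-squares solver, which produces the same coefficient estimates. The function name and data are hypothetical; the data was constructed so that Y = 10 + 2 X1 + 3 X2 exactly.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Return (a, b) for Y = a + b1 X1 + ... + bn Xn by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones lets the solver estimate the intercept a.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]


# Hypothetical data: each row is one observation of (X1, X2).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [18, 17, 28, 27, 38]
a, b = fit_multiple_regression(X, y)
```

Because the sample data has no noise, the fitted values of a and b should match 10, 2, and 3 up to floating-point rounding.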

As a result of this process, we obtain not only the full prediction formula for the dependent variable but also, for each individual variable, its own estimated regression coefficient along with the usual statistics: the standard error, t-statistic, p-value, and confidence interval.
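The sketch below shows one common way the per-coefficient standard errors and t-statistics are obtained, assuming NumPy and the usual ordinary least squares formulas. It stops short of p-values and confidence intervals, which also require the t distribution (available in statistical packages). The function name and the noisy sample data are hypothetical.

```python
import numpy as np


def coefficient_statistics(X, y):
    """Return (coefficients, standard errors, t-statistics), intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)

    residuals = y - design @ coeffs
    # Degrees of freedom: observations minus estimated parameters.
    dof = len(y) - design.shape[1]
    residual_variance = residuals @ residuals / dof
    # Covariance matrix of the coefficient estimates.
    covariance = residual_variance * np.linalg.inv(design.T @ design)
    std_errors = np.sqrt(np.diag(covariance))
    t_stats = coeffs / std_errors
    return coeffs, std_errors, t_stats


# Hypothetical noisy data, roughly following Y = 10 + 2 X1 + 3 X2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [18.5, 16.8, 28.3, 26.6, 38.1, 36.7]
coeffs, std_errors, t_stats = coefficient_statistics(X, y)
```

A large t-statistic in absolute value, relative to its degrees of freedom, suggests the coefficient is distinguishable from zero.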

Analyzing the Data

The most obvious way to use the regression equation is to make predictions of the expected results for various input scenarios. In order to do this, we simply substitute in the values that are likely to be encountered for the independent variables and do the math to make a prediction of the resulting output.

We also want to compare the effects of the individual variables. Each coefficient represents the magnitude of the relationship between its independent variable and the dependent variable. Numerically, it is the expected change in the dependent variable for each increase of 1 in that independent variable, with all the other independent variables held constant.
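Both uses of the equation can be sketched in a few lines. The coefficients below (a = 10, b1 = 2, b2 = 3) are hypothetical stand-ins for values a real analysis would estimate, and predict is an illustrative helper name.

```python
def predict(a, b, x):
    """Substitute one scenario's values into Y = a + b1 X1 + ... + bn Xn."""
    return a + sum(b_i * x_i for b_i, x_i in zip(b, x))


# Hypothetical fitted equation: Y = 10 + 2 X1 + 3 X2.
a = 10.0
b = [2.0, 3.0]

# Prediction for a scenario with X1 = 4 and X2 = 5.
baseline = predict(a, b, [4, 5])

# Raise X1 by 1 while holding X2 constant; the change in Y equals b1.
change_from_x1 = predict(a, b, [5, 5]) - baseline
```

Here the baseline prediction is 33, and increasing X1 by 1 raises the prediction by 2, which is exactly the coefficient b1.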
