Artem has a doctor of veterinary medicine degree.
Do you have a favorite pizza place? Let's just suppose you want to find out how additional pizza toppings affect the total cost of a pizza across all the different pizza places in your city. To do this you pick up the phone and start calling all the different pizza places, writing down the total cost of the pizza with one, two, three, etc., toppings on it at each place.
Once you are done, you will need to fit your data with an equation and, just as importantly, find out if your mathematical model for the data is a good fit.
Coefficient of Determination Derived
In this lesson, we will talk about a statistical construct that is used to estimate the predictive power of your model. The coefficient of determination, denoted as big R² or little r², is a quantity that indicates how well a statistical model fits a data set. In mathematical terms, it specifies how much of the variation in the dependent variable y is explained by variation in the independent variable x.
You may be wondering what r is, since we only defined r². You can think of the correlation coefficient, denoted as big R or little r, as a measure of the strength of the statistical relationship between x and y. As the focus of this lesson is the coefficient of determination, just remember that r stands for the correlation coefficient, simple as that.
Okay, let's do a simple derivation of the coefficient of determination. In the image, you see we start with a plot containing a set of points (x, y), in which we assume there is a linear relationship between the x and y variables. Note that this linearity assumption is made to simplify the derivation and that a similar process can be used for non-linear models.
Shown is a plot with three sample points. We now try to find the regression line, which is a line of best fit for the data points. The line in green shows one attempted line of best fit.
We can describe this line with the equation y = mx + b, which is the standard equation for a line. To calculate the sum of the squared errors between each data point and our line of best fit, we perform the following computation:
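Written out for n data points (x_i, y_i), where the index i and the count n are notation assumed here for illustration, the sum takes this form:

```latex
\[
SSE_{\text{reg line}} = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
\]
```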
In this equation, the term SSE_{reg line} stands for the sum of squared errors from the regression line.
Our next step is to find out how the y value of each data point differs from the mean y value of all the data points. In particular, we need to compute the sum of the squares of these differences, which appears to the right of the equals sign, as shown below.
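For n data points (x_i, y_i), and writing the mean y value as ȳ (notation assumed here for illustration), this sum can be written as:

```latex
\[
SSE_{\text{mean } y \text{ line}} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
\]
```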
The term SSE_{mean y line} stands for the sum of squared errors from the mean y value.
We now have everything we need to compute the coefficient of determination, as you can see below.
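In its standard form, the coefficient of determination compares the two error sums just described, the one from the regression line and the one from the mean y value:

```latex
\[
r^2 = 1 - \frac{SSE_{\text{reg line}}}{SSE_{\text{mean } y \text{ line}}}
\]
```

If the regression line passes through every data point, the error sum from the regression line is 0 and r² equals 1. If the regression line does no better than a horizontal line at the mean y value, the ratio is 1 and r² equals 0.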
Coefficient of Determination Computed
Let's do an example together to solidify everything I just covered, as it's probably a bit confusing. Suppose we are given the data set you see in this table.
How do we calculate the coefficient of determination in this case?
We can start by calculating the correlation coefficient using the following formula:
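One common computational form of the correlation coefficient for n data points (x_i, y_i) is given here. It is assumed, not confirmed, to be the form this lesson uses; each sum runs over i = 1 to n:

```latex
\[
r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\Bigl[ n \sum x_i^2 - \bigl( \sum x_i \bigr)^2 \Bigr]
                \Bigl[ n \sum y_i^2 - \bigl( \sum y_i \bigr)^2 \Bigr]}}
\]
```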
Here is a data table with the calculated values, with n being the sample size of 6.
Plugging these values into the equation for little r that I just gave you, we get r = 0.92782. To compute the coefficient of determination, all we need to do is square r. Doing so, we arrive at r² = 0.8609. You can now see a visual representation of all of this.
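The two routes to the coefficient of determination described in this lesson (squaring r, and comparing the two error sums) can be checked against each other in a few lines of Python. This is a minimal sketch, not the lesson's own calculation: the function names are made up for illustration, the data set is hypothetical and is NOT the lesson's table, and the line of best fit is assumed to be the ordinary least-squares line.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from the raw column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def least_squares_line(xs, ys):
    """Slope m and intercept b of the ordinary least-squares line y = mx + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


def r_squared_from_errors(xs, ys):
    """Coefficient of determination as 1 - SSE(reg line) / SSE(mean y line)."""
    m, b = least_squares_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse_reg = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    sse_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse_reg / sse_mean


# Hypothetical data, invented for this sketch: number of toppings vs. price.
toppings = [1, 2, 3, 4, 5, 6]
prices = [10.0, 11.5, 12.0, 14.5, 15.0, 17.5]

r = pearson_r(toppings, prices)
print("r =", r)
print("r squared =", r ** 2)
print("1 - SSE ratio =", r_squared_from_errors(toppings, prices))
```

For a least-squares line, the last two printed values should agree up to floating-point rounding, which is why squaring r is a valid shortcut in the linear case.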
Now try rewinding back to the data set and solving for r and r2 by yourself, just for fun and practice.
Since we did cover quite a bit, I think it's time we recap everything, no? In this lesson we have learned about the coefficient of determination in the context of linear regression analysis. This quantity, designated as big R² or little r², indicates how well a statistical model fits a data set.
In addition, recall that the correlation coefficient, denoted as R or r, is a measure of the statistical relationship between x and y. To derive the coefficient of determination, we start with a simple data set and attempt to draw the line of best fit. We then look at the errors between the regression line and each data point, as well as the errors between the y coordinate of each point and the mean y value. From these two error sums, we can come up with an expression for the coefficient of determination. Furthermore, we have seen an example of computing the coefficient of determination by first calculating the correlation coefficient and then squaring it.