Statistics 202: Calculus-Based Probability & Statistics | 12 chapters | 69 lessons

Instructor:
*Maria Airth*

Maria has a Doctorate of Education and over 15 years of experience teaching psychology and math related courses at the university level.

Covariance and correlation are not the same, but they are closely related to each other. This lesson reviews these two statistical measures with equations, explanations, and real-life examples.

You are the new owner of a small ice-cream shop in a little village near the beach. You noticed that there was more business in the warmer months than the cooler months. Before you alter your purchasing pattern to match this trend, you want to be sure that the relationship is real.

How can you be sure that the trend you noticed is real? Covariance and correlation are two measures that can tell you, statistically, whether or not a real relationship exists between the outside temperature and the number of customers you have. In this way, you can make an informed choice about your purchasing pattern.

**Covariance** is a statistical measure that shows whether two variables are related by measuring how the variables change in relation to each other. This is clear when you break down the word. Co- as a prefix often indicates some sort of joint action (like co-workers, co-owners, coordinate) and variance refers to variation or change. So, covariance measures how two things change together. It tells you if there is a relationship between two things and which direction that relationship is in.

**Correlation**, like covariance, is a measure of how two variables change in relation to each other, but it goes one step further than covariance in that correlation tells how strong the relationship is.

Let's work through these two statistical measures one at a time to get a good understanding of them.

To get started, we'll assume that you gathered data on six different days and created this chart:

Temperature | Number of Customers |
---|---|
98 | 15 |
87 | 12 |
90 | 10 |
85 | 10 |
95 | 16 |
75 | 7 |

So, we know that covariance is the measure of whether or not two variables vary (or change) in a predictable way together. This could be **positive covariance**, meaning as one increases the other increases, or **negative covariance**, meaning that as one increases the other decreases.

The formula for covariance is:

$$\operatorname{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Wow, it looks a bit scary! Don't worry. It isn't as scary as it looks.

Walking through this formula, we see that the covariance of the two variables (*x*,*y*) is found in three moves: for each item, take the difference between its *x* value and the mean of *x* and multiply it by the difference between its *y* value and the mean of *y*; add up all of those products; then divide by one less than the total number of items in the set. The *x* and *y* with an overbar (line on top) represent the means of each variable.

Okay, that was a bit of a mouthful as well. Again, not as hard as it sounds.

First you need to find the mean of each variable. Typically, we call the first mentioned variable *x*, so that would be temperature, and the second variable *y*, that would be the number of customers in our example.

So, the mean of *x* is (98 + 87 + 90 + 85 + 95 + 75)/6 = 530/6 ≈ 88.33.

The mean of *y* is (15 + 12 + 10 + 10 + 16 + 7)/6 = 70/6 ≈ 11.67.
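The two means can be checked with a few lines of Python. This is only a sketch: the variable names `temperature` and `customers` and the helper `mean` are assumptions chosen for illustration, not part of the lesson.

```python
# Data from the lesson's table: six days of observations.
temperature = [98, 87, 90, 85, 95, 75]   # x
customers = [15, 12, 10, 10, 16, 7]      # y


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


x_bar = mean(temperature)
y_bar = mean(customers)

print(f"mean of x = {x_bar:.2f}")  # mean of x = 88.33
print(f"mean of y = {y_bar:.2f}")  # mean of y = 11.67
```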

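Now you subtract the respective mean from each value and then multiply the two resulting differences together for each day. For the first day, that is (98 - 88.33) × (15 - 11.67) ≈ 9.67 × 3.33 ≈ 32.22.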

The next step is to add all the products together, which yields approximately 125.67.

The final step is to divide by (n - 1) = 6 - 1 = 5.

125.67/5 ≈ 25.13

The covariance of this set of data is about 25.13. The number is positive, so we can state that the two variables do have a positive relationship; as temperature rises, the number of customers in the store also rises.
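The covariance steps above can be sketched in Python as follows. The function name `sample_covariance` is a hypothetical helper written for this example; it follows the lesson's formula, dividing by n - 1, and its result agrees with the lesson's value up to rounding.

```python
# Data from the lesson's table.
temperature = [98, 87, 90, 85, 95, 75]   # x
customers = [15, 12, 10, 10, 16, 7]      # y


def sample_covariance(xs, ys):
    """Sample covariance: sum of the deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # One product per day: (x - mean of x) * (y - mean of y).
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


cov_xy = sample_covariance(temperature, customers)
print(f"covariance = {cov_xy:.2f}")  # covariance = 25.13
```

A positive result means the two lists tend to sit above or below their means on the same days; a negative result would mean they tend to move in opposite directions.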

What this doesn't tell us is how strong this relationship is. To find the strength, we need to continue on to correlation.

To determine the strength of a relationship, you must use the formula for the correlation coefficient. This formula will result in a number between -1 and 1, with -1 being a **perfect inverse correlation** (the variables move in opposite directions reliably and consistently), 0 indicating no linear relationship between the two variables, and 1 being a **perfect positive correlation** (the variables reliably and consistently move in the same direction as each other).

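The formula is:

$$r = \frac{\operatorname{Cov}(x, y)}{s_x \, s_y}$$

where $s_x$ and $s_y$ are the standard deviations of *x* and *y*.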

The correlation coefficient is represented with an *r*, so this formula states that the correlation coefficient equals the covariance between the variables divided by the product of the standard deviations of each variable.

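Can you see why correlation is a more informative measure than covariance? A correlation coefficient uses the covariance of a set and takes it one step further: dividing by the standard deviations removes the units of measurement, so the result always falls between -1 and 1 and can be read as a strength. It is a good thing we've already calculated the covariance of our set.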
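One way to see what that extra step buys is to rescale one variable and compare the two measures. The sketch below multiplies every temperature by 10, a purely hypothetical change of units made up for this illustration; the helper names `sample_covariance` and `correlation` are also assumptions.

```python
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor used in the lesson."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


temperature = [98, 87, 90, 85, 95, 75]
customers = [15, 12, 10, 10, 16, 7]

# Hypothetical rescaling: the same readings in a unit ten times smaller.
rescaled = [10 * t for t in temperature]

cov_original = sample_covariance(temperature, customers)
cov_rescaled = sample_covariance(rescaled, customers)
r_original = correlation(temperature, customers)
r_rescaled = correlation(rescaled, customers)

print(f"covariance:  {cov_original:.2f} -> {cov_rescaled:.2f}")
print(f"correlation: {r_original:.2f} -> {r_rescaled:.2f}")
```

The covariance grows tenfold even though the underlying relationship has not changed, while the correlation stays the same. That is why covariance is read only for its sign and correlation for its strength.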

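To find the divisor for this equation, we first have to find the standard deviations of each variable. Here is the formula, shown for *x* (the formula for *y* has the same form):

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$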

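Here is a chart showing the calculations required to find the standard deviations of these variables.

*x* | *x* - mean | (*x* - mean)² | *y* | *y* - mean | (*y* - mean)² |
---|---|---|---|---|---|
98 | 9.67 | 93.44 | 15 | 3.33 | 11.11 |
87 | -1.33 | 1.78 | 12 | 0.33 | 0.11 |
90 | 1.67 | 2.78 | 10 | -1.67 | 2.78 |
85 | -3.33 | 11.11 | 10 | -1.67 | 2.78 |
95 | 6.67 | 44.44 | 16 | 4.33 | 18.78 |
75 | -13.33 | 177.78 | 7 | -4.67 | 21.78 |
Sum | | 331.33 | | | 57.33 |

Dividing each sum of squares by (n - 1) = 5 and taking the square root gives a standard deviation of about 8.14 for temperature and about 3.39 for the number of customers.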
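The standard deviation calculation can also be sketched in Python. The helper `sample_std` is a hypothetical name; it assumes the sample form of the standard deviation (dividing by n - 1), which is the form consistent with the values 8.14 and 3.39 used in this lesson.

```python
import math

temperature = [98, 87, 90, 85, 95, 75]   # x
customers = [15, 12, 10, 10, 16, 7]      # y


def sample_std(values):
    """Sample standard deviation: sqrt of squared deviations summed over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    m = sum(values) / n
    squared_deviations = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


s_x = sample_std(temperature)
s_y = sample_std(customers)

print(f"s_x = {s_x:.2f}")  # s_x = 8.14
print(f"s_y = {s_y:.2f}")  # s_y = 3.39
```

Python's standard library offers `statistics.stdev`, which computes the same sample standard deviation and can serve as a cross-check.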

Okay, all that is left to find our final correlation coefficient is to divide the covariance by the product of the standard deviations found above.

That is: *r* = 25.13/(8.14 × 3.39) ≈ 0.91, or 0.912 when the unrounded values are carried through the calculation.

The *r* value is quite large at 0.912 (almost 1), thus you know that there is a very strong positive relationship between the temperature and your number of customers. This information should help you to alter your purchasing patterns to match your business flow.
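Putting the pieces together, the whole calculation can be sketched in one short Python function. The name `pearson_r` is an assumption made for this example; the function simply divides the sample covariance by the product of the sample standard deviations, as described above.

```python
import math

temperature = [98, 87, 90, 85, 95, 75]   # x
customers = [15, 12, 10, 10, 16, 7]      # y


def pearson_r(xs, ys):
    """Correlation coefficient: covariance over the product of the std devs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / (s_x * s_y)


r = pearson_r(temperature, customers)
print(f"r = {r:.3f}")  # r = 0.912
```

Because no intermediate value is rounded here, the result matches the lesson's 0.912 rather than the slightly lower figure obtained from rounded inputs.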

Remember that a correlation (or relationship) is just that, a relationship. It does not tell you why the relationship exists. Correlation does not equal causation! The temperature alone is not necessarily causing customers to come to your store; there is just a reliable trend between the movements of the two variables.

While the formulas look confusing, finding the **covariance** and **correlation** of a set of data really just takes the time and effort to repeat the same steps for every data point. In the end, you will be able to tell whether, in what direction (**positively** or **negatively/inversely**), and how strongly two variables are related. Understanding the relationship between data can allow you to make informed decisions, but remember that a relationship does not equate to a cause no matter how strongly related the variables are to each other.
