Understanding Statistical Variability

Lesson Transcript
Instructor: Sarah Friedl

Sarah has two Master's, one in Zoology and one in GIS, a Bachelor's in Biology, and has taught college level Physical Science and Biology.

Summary statistics are great for providing an overview of your data, but sometimes you need to know more, like how the data are distributed. Understanding this variability will give you a clearer picture of all your data and how they relate to each other.

Data Distributions

You have been charged with a task. Your boss has asked you to provide some statistics that describe the distribution of the ages of all the workers at two different plants. This sounds fairly simple, right? You go to the plants, record the age of each employee, and get to work.

The first thing you might do is calculate the mean, or average, age of each group of workers. You might also find the median, which is the middle value when the ages are sorted from youngest to oldest. And, finally, you might calculate the mode, which is the most common age found among the workers.
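As a quick illustration, the three summary values can be computed with Python's built-in statistics module. The ages below are made up for the example, since the lesson gives no raw data.

```python
from statistics import mean, median, mode

# Hypothetical ages for one plant (not from the lesson).
ages = [25, 28, 30, 30, 31, 33, 33, 33, 35, 42]

mean_age = mean(ages)      # sum of the ages divided by the number of workers
median_age = median(ages)  # middle of the sorted ages (average of the two middle values here)
mode_age = mode(ages)      # the age that occurs most often

print("mean:", mean_age, "median:", median_age, "mode:", mode_age)
```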

These values are helpful for summarizing the data, but they don't yet get at what your boss wants. For example, if the mean age of workers at both plants is 30 years, you might think that both plants have the same distribution of ages. But, in fact, the mean alone tells us almost nothing about how the ages are distributed at either plant!
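To see why a shared mean says so little, here is a small sketch with two invented groups of workers. Both lists are assumptions chosen so that the mean is 30 in each case.

```python
from statistics import mean

# Two hypothetical plants with the same mean age but very different spreads.
plant_a = [28, 29, 30, 31, 32]  # everyone is close to 30
plant_b = [18, 20, 30, 40, 42]  # ages are far from 30 on both sides

mean_a = mean(plant_a)
mean_b = mean(plant_b)

# Largest distance of any worker's age from the plant's mean.
max_gap_a = max(abs(age - mean_a) for age in plant_a)
max_gap_b = max(abs(age - mean_b) for age in plant_b)

print("means:", mean_a, mean_b)
print("largest gap from the mean:", max_gap_a, max_gap_b)
```

The means match, yet the ages at the second plant sit much farther from 30, which is exactly the information the mean hides.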

What we need is something that actually measures the dispersion of those ages, or in other words, how spread out the ages are among the workers at each plant. In this lesson, we'll discuss three important measures of data dispersion (otherwise known as variability) that will help you report back to your boss: range, variance, and standard deviation. Let's get started!

Range

The range of a data set is pretty straightforward. It's simply the difference between the largest and smallest values. So, if at Plant 1 the workers range in age from 25 to 65, then your statistical range is simply 65 - 25, or 40. At Plant 2, the workers have a different minimum and maximum. In this case, the youngest worker is 18, and the oldest worker is 52. So, the statistical range for Plant 2 would be 52 - 18, or 34.
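The arithmetic above can be sketched in a few lines. Only the youngest and oldest ages come from the lesson; the ages in between, and the helper name value_range, are assumptions for illustration.

```python
def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)

# Minimum and maximum match the lesson; the other ages are hypothetical.
plant_1 = [25, 27, 29, 30, 31, 33, 35, 65]
plant_2 = [18, 24, 30, 37, 41, 46, 52]

range_1 = value_range(plant_1)  # 65 - 25
range_2 = value_range(plant_2)  # 52 - 18

print("Plant 1 range:", range_1)
print("Plant 2 range:", range_2)
```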

The range is a good measure of the total spread of your data. You have to be careful, though, because it captures only the overall spread, not how evenly the data are dispersed within it. For example, the range of ages at Plant 1 was larger than the range of ages at Plant 2. But if all but one of the workers at Plant 1 are between 25 and 35, then that one person who is 65 stretches your range quite a bit! It only takes one outlier like this to change your range, so keep that in mind.
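A rough way to check how much one outlier matters is to compute the range with and without it. The list below is a hypothetical Plant 1 that fits the description: everyone is between 25 and 35 except a single 65-year-old.

```python
# Hypothetical Plant 1 ages: all between 25 and 35, plus one 65-year-old.
plant_1 = [25, 26, 28, 29, 30, 31, 32, 34, 35, 65]

range_with_outlier = max(plant_1) - min(plant_1)

# Drop the single oldest worker to see how much that one value contributed.
without_outlier = sorted(plant_1)[:-1]
range_without_outlier = max(without_outlier) - min(without_outlier)

print("range with the 65-year-old:", range_with_outlier)
print("range without the 65-year-old:", range_without_outlier)
```

In this made-up data set, one worker accounts for three quarters of the range.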

Variance

Next up is variance. Like the range, this also measures the spread of the data. But unlike the range, variance measures how the data are spread around the mean: it is the average of the squared differences between each data point and the mean. The range only finds the difference between the two extremes. The variance, however, takes every data point into account.
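The lesson does not write out a formula, so the following is the standard textbook definition rather than anything taken from the transcript. Here N is the number of data points, x_i is the i-th value, and μ is the mean; these symbols are introduced only for this illustration.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
```

This is the population form, which fits the example because every worker at the plant is measured. When the data are only a sample of a larger group, the sum is usually divided by N - 1 instead of N.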

The range only told us the total spread of the data, not where most of the data points fall, and this is where the variance can be quite helpful. A very small range means only that the two end points are near each other. But a small variance means that most of the data points are close to the mean and therefore to each other. Likewise, a large range means that our two end points are far apart, but a large variance means that our data are very spread out from the mean, and again, from each other.
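The contrast can be sketched with two invented data sets that share the same range and mean but differ in variance. The example assumes the population variance (statistics.pvariance), since every worker is measured; the lesson itself does not say which form it uses.

```python
from statistics import pvariance

# Hypothetical ages: both groups run from 20 to 40 and have a mean of 30.
clustered = [20, 29, 30, 30, 31, 40]  # most ages sit near the mean
spread = [20, 20, 20, 40, 40, 40]     # every age sits at an extreme

range_clustered = max(clustered) - min(clustered)
range_spread = max(spread) - min(spread)

# Population variance: mean of the squared distances from the mean.
var_clustered = pvariance(clustered)
var_spread = pvariance(spread)

print("ranges:", range_clustered, range_spread)
print("variances:", var_clustered, var_spread)
```

The ranges are identical, but the variance is much larger for the group whose ages all sit far from the mean.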

So, in the case of the workers at Plant 1, that one 65-year-old person would not define the result the way they defined the range, because the variance takes every worker into account and most of the other workers are about the same age. The outlier still counts, and because the differences are squared, a far-off value counts heavily, but it is only one point among many. Our distribution would therefore be fairly centered around the mean because that is where most of the data points fall. We can see where the outlier falls in the distribution, but we can also see how little it tells us about the rest of the data points.

Standard Deviation

Finally, we come to the most commonly used measure of data dispersion. This is called the standard deviation, and it's simply the square root of the variance. The main difference between this and variance is that variance is expressed in squared units (years squared, in our example), which makes it a little more difficult to interpret. If we take the square root of the variance, we get back the same units as our original data, so this makes standard deviation more useful in terms of interpreting our data.
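As a minimal sketch of that relationship, the snippet below takes the square root of the variance and compares it with the library's own standard deviation. The ages are hypothetical, and the population forms (pvariance, pstdev) are an assumption, since the lesson does not specify population or sample statistics.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical ages in years, with a mean of 30.
ages = [25, 27, 29, 30, 31, 33, 35]

variance = pvariance(ages)      # in years squared
std_dev = math.sqrt(variance)   # back in years, like the original data

print("variance (years squared):", variance)
print("standard deviation (years):", round(std_dev, 2))

# The library's population standard deviation should agree.
print("matches pstdev:", math.isclose(std_dev, pstdev(ages)))
```

A standard deviation of about 3 years is easier to read against ages in years than a variance of 10 years squared.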
