Outlier in Statistics: Definition & Explanation

Lesson Transcript
Instructor: Yolanda Williams

Yolanda has taught college Psychology and Ethics, and has a doctorate of philosophy in counselor education and supervision.

An outlier is any value that is numerically distant from most of the other data points in a set of data. Learn about the sources of outliers, histograms, scatterplots, the number line, and more.

What Is an Outlier?

Imagine that you are conducting a research study to see whether an improvement in mood can increase the speed of high school track runners. You take a total of 51 students from two high schools and measure the distance, in feet, that each can run in 60 seconds. You measure each runner both before and after their mood improved and compare the difference. The following table summarizes your findings.

All of the runners improved except for one. Though it may be hard to recognize just from viewing the table, you have an outlier.

In this example, -86 is an outlier. An outlier is any value that is numerically distant from most of the other data points in a set of data. We know that -86 is far below any of the other values in our data set. It is not uncommon to find an outlier in a data set.

Where Do Outliers Come From?

The most common source of outliers is measurement error. For example, it could be that there were battery problems with the timer that caused the alarm to go off before the runner's 60 seconds were up. Another cause of outliers is experimental error. For example, it could be that the running signal was not loud enough for all of the athletes to hear, resulting in one runner having a late start. This would put that runner's distance far below that of the other runners. An outlier can also be due to chance.

Other sources of outliers include:

  • Human error (e.g. errors in data entry or data collection)
  • Participants intentionally reporting incorrect data (this is most common in self-reported measures and measures that involve sensitive data, e.g. teens underreporting the amount of alcohol that they use on a survey)
  • Sampling error (e.g. including high school basketball players in the sample even though the research study was only supposed to be about high school track runners)

If it is determined that an outlier is due to some type of error (e.g. measurement or experimental error), then it is okay to exclude the data point from the analysis. However, if the outlier was due to chance or some natural process of the construct that is being measured, it should not be removed.
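To see why this decision matters, here is a minimal sketch of how much a single outlier can shift the average. The numbers are hypothetical stand-ins (the lesson's actual table is not reproduced here), and the set of confirmed errors is an assumption made purely for illustration.

```python
from statistics import mean

# Hypothetical changes in distance (feet); NOT the lesson's actual table.
changes = [12, 15, 9, 20, 14, 18, 11, 16, 13, -86, 17, 10]

# Assumption: the -86 reading was traced to an error, so it may be excluded.
# A value that arose by chance would stay in the data.
confirmed_errors = {-86}

cleaned = [x for x in changes if x not in confirmed_errors]

mean_all = mean(changes)
mean_cleaned = mean(cleaned)

print(f"Mean with outlier:    {mean_all:.2f} feet")
print(f"Mean without outlier: {mean_cleaned:.2f} feet")
```

With these made-up values, one extreme point pulls the average change from about 14 feet down to under 6 feet, which is why the reason behind an outlier should be established before it is kept or removed.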

Detecting Outliers

The easiest way to detect an outlier is by creating a graph, such as a histogram, a scatterplot, or a number line. Outliers can also be identified numerically by using the interquartile range.
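The interquartile range approach can be sketched in a few lines. This example assumes the common 1.5 × IQR convention (values more than 1.5 interquartile ranges below the first quartile or above the third quartile are flagged), which may differ from the rule the lesson goes on to use. The function name `iqr_outliers` and the data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical changes in distance (feet); NOT the lesson's actual table.
changes = [12, 15, 9, 20, 14, 18, 11, 16, 13, -86, 17, 10]

outliers = iqr_outliers(changes)
print(outliers)
```

Different quartile conventions give slightly different cut points, so a borderline value may be flagged under one method and not another. A value as extreme as -86 is flagged either way.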

Histogram

Suppose that we were asked to create a histogram using the data that we collected from the high school track runners. The following is the histogram of the change in distance for each of the track runners.

If you look at the chart, you can see that there is one value that lies far to the left of all the other data. This data point is an outlier. If you exclude the outlier and look at the remaining data, you notice that they roughly follow the shape of a normal distribution. When this happens, it is likely that the outlier is due to some type of error.

Let's say that you found out that the runner whose distance decreased by 86 feet came down with an illness and had to stop running during his 60 seconds to throw up. Since he was not running for the entire 60 seconds, it would make sense that his running distance decreased. You would not include this outlier in your analysis, since the measurement did not account for him stopping and being sick.
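A rough text version of such a histogram can be built by counting how many values fall into equal-width bins. This is only a sketch: the data are hypothetical, and the bin width of 10 feet and the helper name `bin_counts` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bin_counts(data, width):
    """Count values per bin; each bin is keyed by its lower edge."""
    return Counter(math.floor(x / width) * width for x in data)

# Hypothetical changes in distance (feet); NOT the lesson's actual table.
changes = [12, 15, 9, 20, 14, 18, 11, 16, 13, -86, 17, 10]

width = 10
counts = bin_counts(changes, width)

# Print every bin from the lowest to the highest, including empty ones,
# so the gap between the outlier and the rest of the data is visible.
for start in range(min(counts), max(counts) + width, width):
    bar = "#" * counts[start]
    print(f"{start:>4} to {start + width - 1:>4} | {bar}")
```

The long run of empty bins between the lone bar on the far left and the cluster on the right is the visual cue that marks the outlier.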

Scatterplot
