Statistics 101: Principles of Statistics | 11 chapters | 142 lessons | 9 flashcard sets

Lesson Transcript

Instructor:
*Rudranath Beharrysingh*

In this lesson, we will look at the Normal Distribution, more commonly known as the Bell Curve. We'll look at some of its fascinating properties and learn why it is one of the most important distributions in the study of data.

Jane is about to take an SAT. The school she is applying for gives priority to candidates whose SAT scores are in the 84th percentile or above. Jane wonders what she should score on the test to achieve this.

Sam is designing an electric car. To design it properly, he needs to know how long 95% of the lithium ion batteries will last.

What do these questions have in common? They can be solved with a greater understanding of the normal distribution. **The normal distribution** is a continuous distribution of data that has the shape of a symmetrical bell curve, which is why it's also known as the Bell Curve. It is also called the **Gaussian Distribution**, after Carl Gauss, who created a mathematical formula for the curve.

So, what's so special about this curve? A lot of data in nature have this shape when compiled and graphed. For example, heights and weights of men and women roughly follow this distribution. Standardized test scores are approximately normally distributed. Sometimes lifespans of manufactured parts or equipment form a normal distribution.

By compiling the data into a frequency table and graphing it in a histogram, we can often see this phenomenon. Notice that the normal distribution, or curve, has a bell shape and is symmetrical.

This is a property of the normal distribution. Another property is that 'mean = median = mode.' This is because the shape of the data is symmetrical with one peak.

And, since the curve is symmetrical, the mean or median or mode (which are all the same number for this distribution) divides the data in half. From now on, we will just refer to this value in the middle as the mean.

However, note that the symbol μ (mu) represents a population mean, and *x* bar represents a sample mean.

The spots on the bell curve that have the steepest slope up and down (called inflection points) are very significant. The corresponding points on the horizontal axis are one standard deviation below and above the mean, and about 68% of the data lie between them!

So what does that mean? (No pun intended). Well, suppose heights of men are normally distributed with an average or mean height of 68.5 inches and a standard deviation of three inches. We can generalize that about 68% of men are between 68.5 - 3 = 65.5 inches and 68.5 + 3 = 71.5 inches tall! That's quite a generalization, but it holds whenever the data is normally distributed!
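As a quick check on the heights example, here is a minimal sketch using `NormalDist` from Python's standard `statistics` module. The variable names are illustrative, and the code assumes heights follow an exact normal model.

```python
from statistics import NormalDist

# Heights of men from the example: mean 68.5 inches, standard deviation 3 inches.
heights = NormalDist(mu=68.5, sigma=3)

# Share of the distribution between one standard deviation below and above the mean.
within_one_sd = heights.cdf(71.5) - heights.cdf(65.5)
print(f"{within_one_sd:.4f}")  # about 0.6827, which the rule rounds to 68%
```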

We mentioned standard deviation. The **standard deviation** is a measure of spread or variability of the data. The larger it is, the more spread out the data is. The standard deviation is calculated slightly differently for a population as opposed to a sample. The formulas and symbols for both types are given below:

Let's look at the sample standard deviation (called *S*). To find it, subtract the sample mean (called *x* bar) from each value, square each difference, and add up the squares. Then divide that sum by *n* minus 1, which is the number of values minus 1, and take the square root.

For the population, the standard deviation symbol is σ (sigma). The only differences in the calculation are that you subtract the population mean μ (mu) from each value and you divide by the population size, called big *N*.
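Written out in symbols, the two calculations described above look roughly like this. This is a reconstruction from the verbal description; the subscripted $x_i$ for an individual value is notation assumed here.

```latex
S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\qquad\qquad
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
```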

This calculation can be tedious, but many statistical programs can easily calculate the standard deviation. For this video, we will refer to the standard deviation as std. dev., regardless of whether we are talking about a sample or a population.
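To illustrate how a program handles this, here is a small sketch using Python's standard `statistics` module, where `stdev` divides by *n* - 1 (sample) and `pstdev` divides by *N* (population). The data set is hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean, pstdev, stdev

# Hypothetical data set (not from the lesson); its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(values)
population_sd = pstdev(values)  # divides the sum of squares (32) by N = 8
sample_sd = stdev(values)       # divides the sum of squares (32) by n - 1 = 7

print(average)                  # 5
print(population_sd)            # 2.0
print(round(sample_sd, 3))      # 2.138
```

The sample version comes out slightly larger because it divides by a smaller number.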

More importantly, the standard deviation is a measure of spread. We can think of data in terms of distance from the mean, measured in standard deviations or tick marks! And, the normal curve has the property that 68% of the data lie within one standard deviation of the mean.

Is that it? No. There's more! 95% of the data lie within two standard deviations of the mean.

For example, suppose the lifespans of lithium ion batteries are normally distributed with a mean lifespan of 20,000 hours and a standard deviation of 1,000 hours. We can conclude that 95% of these batteries will last between 20,000 - (2 * 1,000) = 18,000 hours and 20,000 + (2 * 1,000) = 22,000 hours.

Is there another part to this rule? You betcha, and it says that 99.7% of the data is within three standard deviations of the mean, which pretty much captures all of the data except for 0.3%! And, you can see, this means there is not much data left over in the tails of the curve:

For example, if SAT scores are normally distributed with a mean score of 550 and a standard deviation of 80 points, we could generalize that 99.7% of SAT scores are between 550 - (3 * 80) = 310 and 550 + (3 * 80) = 790.

These generalizations about the percentage of data within a certain number of standard deviations from the mean are called the **empirical rule**, or the **68-95-99.7 rule**. It says that for normally distributed data, 68% of the data is within one standard deviation of the mean, 95% of the data is within two standard deviations of the mean, and 99.7% of the data is within three standard deviations of the mean.
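The rule can be sketched as a small helper that returns the three ranges for any mean and standard deviation. The function name `empirical_rule_intervals` is hypothetical, introduced only for illustration, and the results are meaningful only if the data is normally distributed.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return the ranges holding roughly 68%, 95% and 99.7% of normal data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in coverage.items()
    }


# The battery and SAT examples from the lesson.
batteries = empirical_rule_intervals(20_000, 1_000)
sat = empirical_rule_intervals(550, 80)

print(batteries["95%"])  # (18000, 22000)
print(sat["99.7%"])      # (310, 790)
```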

These percentages can be broken down further. Since the curve is symmetrical and 68% of the data is within one standard deviation of the mean, half of that, or 34% of the data, must lie within one standard deviation on each side of the mean. Similarly, the area between one standard deviation and two standard deviations will be 95% - 68% = 27%. Because the curve is symmetrical, this can be halved to give 13.5% of the data between one standard deviation and two standard deviations on each side. A similar calculation can be done for the area between two and three standard deviations from the mean. This is 99.7% - 95% = 4.7%, then 4.7% / 2 = 2.35% on each side of the curve. Finally, 100% - 99.7% = 0.3% of the data lies more than three standard deviations from the mean, which is 0.15% in each tail. A summary of these percentages is shown in the graph below:
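As a cross-check on these rounded figures, the following sketch computes the area of each band on one side of the mean from the standard normal curve, using `NormalDist` from Python's standard library. The band labels are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area on ONE side of the mean between consecutive standard deviations.
bands = {
    "0 to 1": z.cdf(1) - z.cdf(0),
    "1 to 2": z.cdf(2) - z.cdf(1),
    "2 to 3": z.cdf(3) - z.cdf(2),
    "beyond 3": 1 - z.cdf(3),
}

for name, area in bands.items():
    print(f"{name}: {area:.2%}")
```

The computed areas come out near 34.13%, 13.59%, 2.14% and 0.13%, so the empirical rule's 34%, 13.5%, 2.35% and 0.15% are convenient roundings rather than exact values.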

So how is this useful? Let's go back to Jane. She wants her SAT score to be in the 84th percentile or above. The 84th percentile is the test score that 84% of scores lie below. With a mean test score of 550 and a standard deviation of 80, we can redraw the above graph with test scores on the horizontal axis and then add the percentages from left to right until we reach 84%. We can see that 0.15% + 2.35% + 13.5% + 34% + 34% = 84%:

This corresponds to a test score of 630, which is one standard deviation above the mean. Thus Jane needs to score at least 630 on the test to be in the 84th percentile or above.
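Jane's cutoff can also be checked directly. This is a minimal sketch that assumes SAT scores follow an exact normal model with the lesson's mean and standard deviation; the variable names are illustrative.

```python
from statistics import NormalDist

sat_scores = NormalDist(mu=550, sigma=80)

# Fraction of scores below 630 (one standard deviation above the mean).
share_below_630 = sat_scores.cdf(630)

# Score at the 84th percentile, found with the inverse of the cumulative curve.
score_84th = sat_scores.inv_cdf(0.84)

print(f"{share_below_630:.4f}")  # about 0.8413
print(f"{score_84th:.1f}")       # about 629.6
```

Both numbers agree with the empirical-rule answer of 630 once rounded.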

Remember, Sam wanted to know how long 95% of the lithium ion batteries lasted. Using the 68-95-99.7 rule, we can see that 95% of these batteries last between 18,000 hours and 22,000 hours:

Let's take one more example. Suppose a certain company makes tires that last 30,000 miles on average with a standard deviation of 3,000 miles, and suppose the distribution of these lifespans is normal. You buy a tire from this company. What is the probability it will last more than 36,000 miles?

Up until now, we spoke of percentages, but the percentages in the normal curve can also be interpreted as probabilities. Based on the 68-95-99.7 rule, we see that 36,000 miles is two standard deviations or tick marks above the mean, and so the percentages above this are 2.35% + 0.15% = 2.5%:

So, 2.5% of these tires will last longer than 36,000 miles, and the probability that a random tire from the company will last more than 36,000 miles is 0.025 or 2.5%.
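For comparison, this sketch sets the empirical-rule estimate beside the tail area computed from the normal curve itself. It assumes tire lifespans follow an exact normal model with the stated mean and standard deviation; the variable names are illustrative.

```python
from statistics import NormalDist

tire_life = NormalDist(mu=30_000, sigma=3_000)

# Empirical-rule estimate: half of the 5% that falls outside two standard deviations.
rule_estimate = (1 - 0.95) / 2

# Tail area above 36,000 miles computed from the normal curve.
exact = 1 - tire_life.cdf(36_000)

print(f"{rule_estimate:.3f}")  # 0.025
print(f"{exact:.4f}")          # about 0.0228
```

The small gap comes from the empirical rule rounding 95.45% down to 95%.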

In this video, we introduced the **normal distribution**. Normally distributed data has a symmetrical bell shape when graphed. The middle of the curve represents the mean, which is equal to the median, which is equal to the mode. We learned the **empirical rule** or **68-95-99.7 rule**, which states that for normally distributed data, 68% of the data is within one standard deviation of the mean, 95% of the data is within two standard deviations of the mean, and 99.7% of the data is within three standard deviations of the mean.

We used the empirical rule to find the percentages between values of interest. The rule can also be used to determine the percentile or rank of a certain value. For example, we saw that 95% of a certain type of lithium ion battery last between 18,000 and 22,000 hours, given an average lifespan of 20,000 hours and a standard deviation of 1,000 hours. Another example of its use was the fact that scoring 630 on a standardized test, with a mean score of 550 and a standard deviation of 80, puts you in the 84th percentile. And finally, we saw that the percentages in the normal curve can also be interpreted as probabilities.

Having studied this lesson, see if you can:

- Give definitions for the terms 'normal distribution' and 'empirical rule'
- Cite the different ways in which the normal distribution and empirical rule can be used


You are viewing lesson 5 in chapter 6 of the course Statistics 101: Principles of Statistics.
