Lesson Transcript

Instructor:
*Yuanxin (Amy) Yang Alcocer*

Amy has a master's degree in secondary education and has taught math at a public charter high school.

You can make estimates from categorical data, but you cannot make predictions with it. Learn why in this video lesson, along with how to read and gather information from a bar graph of categorical data.

**Categorical data** is data that can be sorted into groups, or categories. Examples include age group, favorite color and dog breed. These are all categorical in nature because the answers sort people into groups. If you ask different people about their favorite color, you can separate them into a group that likes blue, a group that likes yellow, and so on. The same goes for age group and dog breed.
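To make the idea of grouping concrete, here is a minimal Python sketch that sorts survey answers into groups by counting how many people gave each answer. The list of responses is made up for illustration only; it is not data from this lesson.

```python
from collections import Counter

# Hypothetical answers to "What is your favorite color?" (illustrative only)
responses = ["blue", "yellow", "blue", "red", "yellow", "blue", "green"]

# Counter puts identical answers into the same group and counts each group
groups = Counter(responses)

# List the groups from largest to smallest
for color, count in groups.most_common():
    print(f"{color}: {count}")
```

Each distinct answer becomes one group, and the count is how many people fall into it. Those counts are exactly what a bar graph displays.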

A **bar graph**, a graph with bars of varying heights, is a good visual way to represent categorical data from a survey for analysis. You will usually see the survey options (the categories) on the x-axis and the results (how many responses each option received) on the y-axis.

Let's look at one bar graph to see what estimates and predictions we can make from the data. The bar graph in this lesson shows the results of a survey asking people who keep *Bettas* as pets what color their *Betta* is. Here are the results: 20 red *Bettas*, 30 blue *Bettas*, 15 purple *Bettas*, 4 white *Bettas* and 2 yellow *Bettas*.

| Betta Color | Number of Bettas |
|---|---|
| Red | 20 |
| Blue | 30 |
| Purple | 15 |
| White | 4 |
| Yellow | 2 |
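A bar graph draws one bar per category, with each bar's length set by that category's count. As a rough sketch (a text-only stand-in, not how the lesson's graph was made), the counts from the table above can be drawn with one `#` per *Betta*. The function name `text_bar_graph` is hypothetical, chosen for this example.

```python
# Survey results from the table: color -> number of Bettas
betta_counts = {"Red": 20, "Blue": 30, "Purple": 15, "White": 4, "Yellow": 2}

def text_bar_graph(counts):
    """Return one line per category, with a bar of '#' characters
    whose length equals that category's count."""
    width = max(len(name) for name in counts)  # pad labels so bars line up
    lines = []
    for name, count in counts.items():
        lines.append(f"{name:<{width}} | {'#' * count} {count}")
    return "\n".join(lines)

print(text_bar_graph(betta_counts))
```

Here the bars run sideways, so the categories sit on the vertical axis instead of the x-axis. The reading is the same: the longest bar (Blue) is the most common color and the shortest (Yellow) is the least common.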

So, what kind of estimates and predictions can we make based on this information?

We can certainly make some estimates based on this information. From our graph, we see that blue is the tallest bar, so blue is the most popular color. We can estimate that blue is the most common color among *Betta* owners, although it is not a majority: 30 of the 71 *Bettas* in the survey, or about 42%, are blue. Yellow is the shortest bar, so we can estimate that yellow *Bettas* are the least popular among *Betta* owners.
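Finding the tallest and shortest bars is just finding the largest and smallest counts. A majority means more than half of all responses, so this short Python sketch also checks whether the tallest bar actually clears half of the total. The counts are the survey results from the table.

```python
# Survey results: color -> number of Bettas
betta_counts = {"Red": 20, "Blue": 30, "Purple": 15, "White": 4, "Yellow": 2}

total = sum(betta_counts.values())                  # all Bettas surveyed
tallest = max(betta_counts, key=betta_counts.get)   # color with the tallest bar
shortest = min(betta_counts, key=betta_counts.get)  # color with the shortest bar

# Share of all responses held by the most common color
top_share = betta_counts[tallest] / total

print(f"Most common: {tallest} ({top_share:.0%} of {total})")
print(f"Least common: {shortest}")
print("Is it a majority (more than half)?", top_share > 0.5)
```

Running it prints `Most common: Blue (42% of 71)`, `Least common: Yellow` and `Is it a majority (more than half)? False`. Blue is the largest single group, but fewer than half of the owners surveyed keep blue *Bettas*.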

The majority of *Betta* owners keep blue, red or purple *Bettas*: together, these three colors account for 65 of the 71 *Bettas* surveyed. So, if I had a *Betta* store, I would do best by selling these three colors. White and yellow *Bettas* make up only 6 of the 71, so I wouldn't do as well selling those. These are the kinds of estimates I can make from our bar graph.
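The store estimate comes from ranking the bars and adding up the top few. A minimal Python sketch of that step, using the survey counts from the table (the choice of "top three" follows the paragraph above):

```python
# Survey results: color -> number of Bettas
betta_counts = {"Red": 20, "Blue": 30, "Purple": 15, "White": 4, "Yellow": 2}
total = sum(betta_counts.values())

# Rank the colors from most to least common (tallest bar first)
ranked = sorted(betta_counts.items(), key=lambda item: item[1], reverse=True)

# Combine the three most common colors
top_three = ranked[:3]
top_three_count = sum(count for _, count in top_three)

print([color for color, _ in top_three])
print(f"{top_three_count} of {total} = {top_three_count / total:.0%}")
```

This prints `['Blue', 'Red', 'Purple']` and `65 of 71 = 92%`, so the three tallest bars together cover well over half of the owners surveyed, which is why they count as a majority even though blue alone does not.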

What about predictions? Because categorical data falls into separate groups, it says very little about groups that aren't listed on the graph. If you don't have data for other groups, there is not much you can predict about them. With numerical data, you can continue a pattern and predict what may happen outside the range of the data you have. But with categorical data, you can't do that, because the groups you have data for are not connected to other groups in any order.
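A small Python sketch can make the contrast concrete. The numerical part uses made-up numbers (a hypothetical tank whose fish count rises by 3 each week) to show continuing a pattern one step past the data. The categorical part uses the *Betta* counts: there is no order or step between colors, so the data simply has nothing to say about a color that was never surveyed, such as green.

```python
# Hypothetical numerical data: fish counted in weeks 1-4 (illustrative only)
weeks = [1, 2, 3, 4]
fish = [10, 13, 16, 19]

# The counts rise by the same step each week, so the pattern can be continued
step = fish[1] - fish[0]
assert all(b - a == step for a, b in zip(fish, fish[1:])), "no steady pattern"
next_week = weeks[-1] + 1
prediction = fish[-1] + step
print(f"Predicted fish in week {next_week}: {prediction}")

# Categorical data: the colors have no order and no step between them
betta_counts = {"Red": 20, "Blue": 30, "Purple": 15, "White": 4, "Yellow": 2}

# A color outside the survey has no data, and there is no pattern to extend
print("Green:", betta_counts.get("Green", "no data"))
```

The numerical data predicts 22 fish in week 5 because week 5 sits right after week 4. "Green" does not sit after any of the surveyed colors, so the only honest answer is "no data".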

Looking at our graph, can we make a prediction about colors that aren't listed? No. The colors have no natural order, so we can't say what color group would come next, and there is no pattern to extend.

What have we learned? We've learned that **categorical data** is data that can be grouped. A good visual way to represent categorical data is with a **bar graph**, a graph with bars of varying heights. We can make estimates about what is most common and what is least common, but we can't use categorical data to make predictions.

Following this lesson, you'll have the ability to:

- Define categorical data and bar graph
- Explain how to read a bar graph
- Describe what kind of estimates you can make by reading a bar graph
- Summarize why you cannot make predictions based on categorical data

