Devin has taught psychology and has a master's degree in clinical forensic psychology. He is working on his PhD.
This lesson explores what a chi-square test is and when it is appropriate to use it. Using a simple example, we will work on understanding the formula and how to calculate the p-value.
Definitions Involved in Chi-Square Test
I've been reading a lot about undercover officers lately, and it made me start wondering how many police officers work undercover versus how many apply to be in the program. I mean, not everyone who applies can work undercover because they may not fit a need or their scores on psychological tests just don't measure up.
If the numbers were really close between those who applied and those who got in, we would need to know if there is a statistically significant difference. Statistically significant means the difference in the results is unlikely to have occurred by random chance. That likelihood is almost always represented by a lowercase p, which stands for probability.
If you have read any psychological research articles, you may have seen p < .05, which means that the probability of getting results like these by chance alone is less than 1 in 20. This has been the conventional cutoff for quite a while. We'll get into how you figure it out for a chi-square in just a moment.
What we need is a specific statistical test that lets us work with categorical data, like those who did make it into the undercover program and those who did not. That test is the chi-square, which is a statistical test used to compare expected data with what we collected.
What a chi-square will tell us is if there is a large difference between collected numbers and expected numbers. If the difference is large, it tells us that there may be something causing a significant change. A significantly large difference will allow us to reject the null hypothesis, which is defined as the prediction that there is no interaction between variables. Basically, if there is a big enough difference between the scores, then we can say something significant happened. If the scores are too close, then we cannot say that they differ.
The formula for running a chi-square is actually very simple:
chi-square = sum of (o - e)^2 / e
You take your observed data (o), and subtract what you expected (e). You square the result, and then divide by the expected data. You do this for every category and add the results together.
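To make the formula concrete, here is a minimal Python sketch of it. The function name chi_square_statistic and the three-category counts are made up for illustration; they are not part of the lesson.

```python
def chi_square_statistic(observed, expected):
    """Add up (o - e)^2 / e over every category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustrative only)
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(statistic)
```

Each category contributes its own (o - e)^2 / e term, so a category that matches its expectation exactly adds nothing to the total.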
To use the number we find, we refer to the degrees of freedom, usually labeled df for short, which for this chi-square is defined as the number of categories minus 1. The reason is that the category counts have to add up to the total, so once you know all but one of them, the last one is already determined.
You will then locate a chi-square distribution table, which is found in almost every statistics textbook. Using your degrees of freedom, you will locate the critical value for the p-value you're interested in using the process below; typically the p-value is .05. If you can, see if your number is also greater than the critical value at the .01 level, which means that results like yours would happen by chance less than 1 in 100 times. Because of copyright restriction issues, we won't be able to provide a full image of the chi-square distribution table, but below is basically what they look like and how you find the number you're looking for.
This is what a chi-square distribution table typically looks like.
To find your critical value, you follow the left-hand column of the degrees of freedom. If we have 10 categories, we have 9 degrees of freedom. We would move 9 places down on the left-hand side. Next, we will follow the row of 9 degrees of freedom to the right until we reach the .05 level. If the number from your formula is greater than the one found in the chart, then you have a statistically significant finding. It's sort of like playing Battleship, except it's with degrees of freedom and the p-value.
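The table lookup can be sketched in Python as a small dictionary. The values below are the .05-level critical values for 1 to 10 degrees of freedom as printed in standard chi-square tables, rounded to three decimals; the names CRITICAL_05 and is_significant_at_05 are hypothetical and chosen only for this sketch.

```python
# .05-level critical values for df = 1..10 (rounded, as in standard tables)
CRITICAL_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def is_significant_at_05(statistic, num_categories):
    """Compare a chi-square statistic with the .05 critical value."""
    df = num_categories - 1  # degrees of freedom = categories minus 1
    if df not in CRITICAL_05:
        raise ValueError("this sketch only covers 1 to 10 degrees of freedom")
    return statistic > CRITICAL_05[df]


# 10 categories -> 9 degrees of freedom -> critical value 16.919
print(is_significant_at_05(20.0, 10))
print(is_significant_at_05(12.0, 10))
```

With 10 categories, a statistic of 20.0 clears the 16.919 cutoff and a statistic of 12.0 does not.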
If your number falls short of the .05 value, you can compare it with the values in the other columns of your degrees of freedom row and report the range your p-value falls in. The other columns hold different p-values, usually .95, .50, .25 and so on. This allows researchers who don't have significant results to report their p-values. If your value is not higher than the .05 value but is higher than the .25 value, then your p is somewhere between .05 and .25, and thus not within the boundary of acceptable chance.
One last thing about running a chi-square: the expected count in each category should not be less than five. Counts that low are too small for the test to handle and basically result in invalid findings.
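A quick way to check the minimum-count condition before running the test is sketched below. It assumes the rule is applied to the expected counts, which is the usual rule of thumb; the function name categories_below_minimum is made up for this sketch.

```python
def categories_below_minimum(expected, minimum=5):
    """Return the positions of categories whose expected count is too small."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical expected counts (illustrative only)
print(categories_below_minimum([12, 3.5, 8, 4]))
print(categories_below_minimum([250, 250]))
```

An empty list means every category meets the minimum, so the chi-square can be run as usual.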
This will get a little clearer when we look at an example. Let's say we have two groups: officers who were accepted into the undercover program and officers who applied but didn't make it in. Pulling numbers out of thin air, let's say 500 officers applied in total, and 200 were taken on as undercover officers. Our research question will be: 'If we expect half of the officers who apply to the program to get in, is there a difference between our expectation and the observed?'
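If we expected half the applicants to get in, that would mean we expect 250 to get in and 250 to be rejected. We observed 200 accepted, which leaves 300 rejected. Applying the formula to both categories looks like this:
(200 - 250)^2 / 250 + (300 - 250)^2 / 250 = (-50)^2 / 250 + 50^2 / 250 = 10 + 10 = 20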
We then check our table at 1 degree of freedom, because we have only two categories (officers in the program and officers rejected from the program), and find that the critical value at the .05 level is 3.84. Our number is larger than that, so p < .05. This means that there is something going on, and that there is a statistically significant difference between the number of undercover officers we expected and the number there actually is.
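The officer example can be checked with a short Python sketch. It sums the (o - e)^2 / e term over both categories, accepted (200 observed, 250 expected) and rejected (300 observed, 250 expected), so each category contributes 10 and the total is 20. The p-value shortcut using math.erfc is an assumption of this sketch and only holds for 1 degree of freedom; the function names are illustrative.

```python
import math


def chi_square_statistic(observed, expected):
    """Add up (o - e)^2 / e over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail probability for a chi-square with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


observed = [200, 300]  # accepted, rejected
expected = [250, 250]  # half of the 500 applicants in each category
statistic = chi_square_statistic(observed, expected)
p = p_value_df1(statistic)

print(statistic)      # 20.0
print(statistic > 3.84)
print(p < 0.05)
```

Comparing the statistic with 3.84 and comparing p with .05 are two views of the same decision, so both checks agree.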
When looking at categorical data, you can check for a statistically significant difference (meaning the difference in the results is unlikely to have occurred by random chance) by using a chi-square, which is a statistical test used to compare expected data with what we collected. With this, you can test the null hypothesis, which is defined as the prediction that there is no interaction between the variables.
After completing this video lesson, you should be able to:
Define statistical significance
Explain the purpose of using a chi-square test
Identify the formula for running a chi-square
Describe how to use a chi-square distribution table