Emily Cummins received a Bachelor of Arts in Psychology and French Literature and an M.A. and Ph.D. in Sociology. She has instructor experience at Northeastern University and New Mexico State University, teaching courses on Sociology, Anthropology, Social Research Methods, Social Inequality, and Statistics for Social Research.
When you think of the word "test," you probably imagine yourself sitting in a classroom about to take an exam. This might bring back stressful memories of school! But in psychology, the word "test" has a particular meaning, and it's a little more in-depth than simply some questions you need to answer. In particular, in this lesson we're talking about psychometric tests, which are scientific and systematic ways to assess someone's ability to do a job or to measure their personality or a mental ability (things like math skills or even critical thinking).
Psychometrics is the study of how psychological measurements are developed. So there's an entire field of study dedicated to just how we write things like exams. Psychometric tests are standardized, and they are designed to assess a particular variable. The people who write them try to make them objective and unbiased. In this lesson, we'll talk about two theories of testing: classical test theory and item response theory. Think of these as theories about how psychologists create tests and measures.
Classical Test Theory
Classical test theory (CTT) in psychometrics is all about reliability. We use the word "reliable" or "reliability" often in everyday language. Your friend who is always on time is reliable, for instance. But in psychology, reliability refers to how consistent a test or measure is. In other words, if you took the same test several times under the same conditions, you should get about the same score each time, because the test itself is well designed.
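One common way to put a number on this kind of consistency is to give the same test twice and see how closely the two sets of scores line up. The short Python sketch below does that with a correlation. The student scores are made up for illustration, and the `pearson` helper is our own, not part of any psychometrics package.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five students on two sittings of the same test.
first_sitting = [85, 72, 90, 64, 78]
second_sitting = [83, 75, 91, 66, 76]

reliability = pearson(first_sitting, second_sitting)
print(f"test-retest correlation: {reliability:.2f}")
```

A correlation close to 1 means students landed in roughly the same place both times, which is what we'd expect from a consistent test. A value near 0 would suggest the scores are mostly noise.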
There are three ideas we need to keep in mind when we're talking about CTT: test score, error, and true score. The test score is what we call the observed score. So, if you take a math exam and get an 85, that's your test score.
Error refers to, well, exactly what it sounds like! It's the part of a score that comes from something other than what the test is supposed to measure. This might be a mistake in the test itself, or it might be something in the external environment that we can't totally control but that affects testing. Let's say you're taking your history exam, and there's construction going on in the building next door. Hammering isn't great for concentration, is it? This is a form of error because the terrible noise might affect your score.
Then, we have the true score. This is the score you would have achieved if there were absolutely no error in the measurement. Alas, this isn't really possible. But psychometrics assumes everyone has, in theory, a true score. CTT ties the three ideas together with a simple equation: observed score = true score + error. Since we can never see the true score directly, the best we can do is estimate it.
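To see how test score, error, and true score fit together, here is a small Python sketch of the basic CTT idea that an observed score is a true score plus random error. The true score of 85 and the error spread of 5 points are assumptions chosen for illustration, and modeling the error as a bell curve is a simplification, not something CTT requires.

```python
import random

def observed_score(true_score, error_sd, rng):
    """One sitting: observed score = true score + random error."""
    return true_score + rng.gauss(0, error_sd)

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_SCORE = 85.0        # hypothetical true score
ERROR_SD = 5.0           # hypothetical spread of the random error

# Imagine the same person could sit the same test 10,000 times.
scores = [observed_score(TRUE_SCORE, ERROR_SD, rng) for _ in range(10_000)]
average = sum(scores) / len(scores)

print(f"single sitting: {scores[0]:.1f}")
print(f"average of {len(scores)} sittings: {average:.1f}")
```

Any single sitting can miss the true score by several points, but the random errors tend to cancel out over many sittings, so the average settles close to 85. In real life nobody takes a test thousands of times, which is why the true score stays a theoretical idea.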
Charles Spearman, a psychologist and statistician, thought that we could reduce random error as much as possible, thereby making tests better. Spearman is widely considered one of the founders of CTT. So the important takeaway of CTT is that it's a theory that tries to explain and deal with error so our tests are more reliable.
Item Response Theory
Item response theory (IRT) is all about your performance on an exam and how it relates to individual items or questions on a test. IRT is an example of what psychologists call a latent trait model. These models try to figure out whether there's an underlying trait that accounts for your performance on a test. The mathematical models we use to calculate IRT measurements are much more sophisticated than those of CTT, and, in principle, the item statistics IRT produces don't depend on the particular sample of test takers used to calculate them.
Generally, a computer is required to do the calculations of IRT measurements, as they would be very long and complex to do by hand. Here's the major assumption of IRT: a test taker has a latent ability, and this can be measured regardless of which particular questions end up on the test. For example, IRT suggests that we can have a huge test bank of questions, and it doesn't matter which 15 questions we select to make up an exam. The test taker's ability is independent of this. And, since a computer has done calculations about each individual item, we can pick any item we want out of the test bank, and it will give us a good idea of a test taker's ability.
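The test bank idea can be sketched in Python. The lesson doesn't name a specific IRT model, so this example assumes one common choice, the two-parameter logistic (2PL) model, in which each item has a difficulty and a discrimination value. The item bank, the test taker's ability of 0.5, and the simple grid search used to estimate ability are all hypothetical choices made for illustration.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL item response function: chance of a correct answer.

    theta: test taker's latent ability
    a: item discrimination, b: item difficulty (assumed already known)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, responses):
    """Grid-search maximum-likelihood estimate of theta on [-4, 4]."""
    grid = [i / 100 for i in range(-400, 401)]

    def log_likelihood(theta):
        total = 0.0
        for (a, b), correct in zip(items, responses):
            p = p_correct(theta, a, b)
            total += math.log(p) if correct else math.log(1.0 - p)
        return total

    return max(grid, key=log_likelihood)

rng = random.Random(1)

# Hypothetical pre-calibrated bank of 200 items: (discrimination, difficulty).
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(200)]

true_theta = 0.5  # hypothetical latent ability of one test taker

for form in ("A", "B"):
    items = rng.sample(bank, 15)  # any 15 items from the bank
    responses = [rng.random() < p_correct(true_theta, a, b) for a, b in items]
    estimate = estimate_ability(items, responses)
    print(f"form {form}: estimated ability {estimate:+.2f}")
```

Each form uses a different set of 15 questions, yet both estimates aim at the same underlying ability. With only 15 items the estimates are noisy, so they would usually land in the neighborhood of the true value rather than match it exactly.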
Generally speaking, CTT measurements are easier to calculate than IRT measurements because CTT calculates something about the test as a whole, while IRT works with each individual item. So, for example, if you're taking a biology exam, CTT calculates the reliability of that exam, but IRT would try to analyze each individual question on that test. Today, major exams like the SAT or the GRE use techniques from IRT. This is because item-by-item analysis tells test makers more about how precisely a test measures ability than the whole-test statistics of CTT do.
There was quite a bit in there, so let's take a moment or two to review what we've learned. As we saw in this lesson, psychometrics is basically the study of developing psychological tests and measurements. Whether we're talking about a standardized exam like the SAT or a personality test, this branch of psychology deals with how we can design tests to be objective, unbiased, and reliable, with reliability in this context being how consistent a test or measure is.
Two major theories about the development of tests are classical test theory and item response theory. Classical test theory (CTT) is all about reliability. CTT is built around the true score, which is the score a test taker would achieve if there were no error at all in the test-taking process, with error being anything other than the test taker's ability that affects the result. Since an error-free test is basically impossible, we look at someone's observed score, which is the score he or she actually achieved. CTT tells us how consistent a test is, as in how reliable it is.
Then we learned about item response theory (IRT), which is a bit more complicated than CTT. Rather than looking at the reliability of the test as a whole, IRT looks at each item. Basically, IRT suggests a test taker's ability is independent of the particular items or questions on a test. Test takers have a latent ability, per what psychologists call the latent trait model, which tries to figure out whether there's an underlying trait that accounts for your performance on a test. IRT explains why we can create a big bank of questions, select from it at random, and still measure a test taker's ability no matter which questions come up.
So next time you find yourself stuck in a classroom taking a test, remember how much goes into its development!