Lesson Transcript

Instructor:
*Ninger Zhou*

Ninger has taught in teacher education programs and has received her Ph.D. in Educational Psychology.

The reliability coefficient is a user-friendly way to show the consistency of a measure. In this lesson, we will become familiar with four methods for calculating the reliability coefficient.

What is your experience with reliable scales or tests? For example, have you ever used a scale to keep track of your weight? If your weight is generally consistent, huge fluctuations might mean that something is going very wrong in your body, or that your scale is no longer reliable. If you subjected your scale to a reliability test, you might find that it has a very low reliability coefficient. A new scale, one that gives you consistent readings, would most likely have a high reliability coefficient.

In testing situations, scales should provide us with reliable measurements that do not fluctuate dramatically when the things being measured remain the same. If someone's ability has not changed significantly, his/her test scores should not vary by much, no matter how many times he/she takes a test.

In social science, **reliability** describes the *consistency of a measure*. The **reliability coefficient** *quantifies the degree of consistency*. There are many reasons why a test may not be consistent. Examples include assessment errors that occur when the testing environment influences how the participants perform, and other issues related to how the test is designed. *Calculating the reliability coefficient can help us understand such errors in testing.*

There are different ways to calculate the coefficient, including the four types of reliability coefficients we'll discuss here. Don't worry too much about how to do these calculations by hand. Statistical software, such as SAS and SPSS, can help you compute all four types of coefficients conveniently.

Consider the following hypothetical scenario: You give your students a vocabulary test on February 26 and a retest on March 5. If there are no significant changes in your students' abilities, a reliable test given at these two different times should yield similar results. To find the test-retest reliability coefficient, we need to find the correlation between the test and the retest. In this case, we can use a correlation coefficient formula, such as Pearson's correlation coefficient:

$$
r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}
$$

*N is the total number of pairs of test and retest scores*.

For example, if 50 students took the test and retest, then N would be 50. Following the N is the Greek symbol **sigma** (Σ), which means *the sum of*. *xy* means we multiply *x* by *y*, where *x* is a student's test score and *y* is the same student's retest score. If 50 students took the test and retest, then Σ*xy* tells us to multiply each student's test score by his/her retest score and then add up all 50 products.
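To make the summations concrete, here is a minimal Python sketch of the raw-score form of Pearson's r. It assumes the test and retest scores are stored in two equal-length lists in the same student order. The function name `pearson_r` and the five students' scores are hypothetical and were made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's r for paired scores, using the raw-score formula."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of scores")
    n = len(x)                                  # N: number of test/retest pairs
    sum_x = sum(x)                              # Σx
    sum_y = sum(y)                              # Σy
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply each pair, then add
    sum_x2 = sum(a * a for a in x)              # Σx²
    sum_y2 = sum(b * b for b in y)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    # Raises ZeroDivisionError if every student got the same score on either test
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical vocabulary scores for five students (made-up data)
test = [12, 15, 9, 18, 14]     # February 26 test
retest = [13, 14, 10, 17, 15]  # March 5 retest
print(round(pearson_r(test, retest), 3))  # prints 0.965
```

A value this close to 1 would suggest that the made-up test ranks students in nearly the same order both times. Statistical software computes the same quantity.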

Let's take a look at another hypothetical situation: You and a colleague are grading some student essay assignments together and want to see how consistent you both are when it comes to scoring. Here, you can use the inter-rater reliability formula to calculate how consistent the two of you have been when rating the assignments. The inter-rater reliability coefficient is often calculated as a Kappa statistic. The formula for inter-rater reliability Kappa is this:

$$
\kappa = \frac{P_{\text{observed}} - P_{\text{chance}}}{1 - P_{\text{chance}}}
$$

In this formula, *P observed is the observed percentage of agreement*.

For example, if you and your colleague give the same student exactly the same rating 18 out of 20 times, then you agreed on 90% of the ratings, so *P observed* is 18/20 = 0.90.

*P chance is the proportion of agreement expected by chance*. In other words, *P chance* is the probability that two raters agree with each other if we assume they have been rating students randomly.
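The following minimal Python sketch computes Cohen's kappa, the most common Kappa statistic for two raters. It assumes each rater's scores are stored in a list in the same essay order. Following Cohen's definition, it estimates *P chance* from how often each rater actually used each score. The function name `cohens_kappa` and the pass/fail ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same set of essays."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same essays")
    n = len(rater_a)
    # P observed: proportion of essays on which both raters gave the same score
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # P chance: probability that the raters agree if each one picked scores at
    # random, using how often he/she actually used each score
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    # Undefined (division by zero) if P chance is 1, e.g. both raters used one score
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical pass/fail ratings of 20 essays; the raters disagree on 2 essays
rater_a = ["pass"] * 10 + ["fail"] * 10
rater_b = ["pass"] * 9 + ["fail"] + ["fail"] * 9 + ["pass"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.8
```

In this made-up example, *P observed* is 18/20 = 0.90. Each rater gave 10 passes and 10 fails, so *P chance* is 0.5 × 0.5 + 0.5 × 0.5 = 0.50. Kappa is therefore (0.90 - 0.50) / (1 - 0.50) = 0.80. It comes out lower than the raw 90% agreement because it discounts the agreements that could have happened by chance.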

Now imagine that you want to know if a test you created has good reliability, but you don't have time to repeat the test to get the test-retest reliability coefficient. As an alternative, you can split the questions on the test into two halves and treat one half as the test and the other half as the retest. Test questions can be divided at random or into even- and odd-numbered items. The split-half coefficient, $r_{SB}$, is calculated with the Spearman-Brown formula:

$$
r_{SB} = \frac{2\,r_{hh}}{1 + r_{hh}}
$$

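In this formula, $r_{hh}$ *is the correlation between the two halves of the test*. We can calculate it with the same Pearson correlation formula we used for test-retest reliability. The Spearman-Brown formula then adjusts this correlation upward, because each half is only half as long as the full test, and shorter tests tend to be less reliable.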
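Here is a minimal Python sketch of an odd/even split followed by the Spearman-Brown step. It assumes each student's answers are stored as one row of 0/1 scores in question order. The function names and the five students' answers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations around each mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_brown(r_hh):
    """Step up the half-test correlation r_hh to full-test reliability."""
    return 2 * r_hh / (1 + r_hh)


def split_half_reliability(item_scores):
    """item_scores: one row per student, one 0/1 score per test question."""
    # Python indexes from 0, so [0::2] picks questions 1, 3, 5, ... (odd-numbered)
    odd_totals = [sum(row[0::2]) for row in item_scores]
    # and [1::2] picks questions 2, 4, 6, ... (even-numbered)
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_hh = pearson_r(odd_totals, even_totals)
    return spearman_brown(r_hh)


# Hypothetical answers from five students on a six-question test (1 = correct)
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 3))
```

With this made-up data, the odd-half and even-half totals correlate at about 0.88, and the Spearman-Brown step raises that to about 0.93. A different split, such as a random one, could give a somewhat different value.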

**Internal consistency** is *a type of reliability that is closely related to the split-half reliability* we mentioned previously. At this point, you may have noticed that the split-half reliability coefficient is dependent on how you split the questions on the test and may be wondering: Why don't we identify all the possible split-half forms, compute all the coefficients, and average them?

Well, some statisticians have proposed using **Cronbach's alpha**, *a method for determining internal consistency*. Conceptually, Cronbach's alpha can be thought of as the average of the reliability coefficients from all the possible ways of splitting the test into two halves. This method also allows us to see whether all the items on a test measure the same construct, that is, whether the test has good internal consistency.

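The formula for Cronbach's alpha is:

$$
\alpha_{xx} = \frac{N\,\bar{r}_{xx}}{1 + (N - 1)\,\bar{r}_{xx}}
$$

In this formula, $\alpha_{xx}$ *is Cronbach's alpha*, *N is the number of items on a test*, and $\bar{r}_{xx}$ *is the mean inter-item correlation*. That is the average of the correlations between every pair of items, each of which can be calculated with the correlation coefficient.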
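The following minimal Python sketch computes this correlation-based (standardized) form of Cronbach's alpha. It assumes the scores are organized as one list per item, with each list holding every student's score on that item, and that the test has at least two items. The function names and the questionnaire ratings are hypothetical. Statistical software often also reports a version of alpha based on item variances, which can differ slightly when items have different variances.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's r computed from deviations around each mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def alpha_from_mean_r(n_items, mean_r):
    """Cronbach's alpha from N (number of items) and the mean inter-item r."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def standardized_alpha(item_columns):
    """item_columns: one list per item (needs at least two items)."""
    pairs = list(combinations(item_columns, 2))  # every distinct pair of items
    mean_r = sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
    return alpha_from_mean_r(len(item_columns), mean_r)


# Hypothetical ratings from five students on three questionnaire items
items = [
    [3, 4, 2, 5, 4],  # item 1
    [2, 4, 3, 5, 3],  # item 2
    [3, 5, 2, 4, 4],  # item 3
]
print(round(standardized_alpha(items), 3))
```

As a quick hand check of the formula, a 10-item test with a mean inter-item correlation of 0.30 gives 10 × 0.30 / (1 + 9 × 0.30) = 3 / 3.7 ≈ 0.81. Adding more items that correlate equally well raises alpha.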

The **reliability coefficient** is *a way to quantify the consistency of a measure*. There is more than one way to calculate reliability coefficients, depending on what type of reliability you need to identify. *One of the most commonly reported types of reliability is internal consistency*, which is often calculated using **Cronbach's alpha**. Other types include **inter-rater reliability, split-half reliability**, and **test-retest reliability**. While you can perform the calculations for these four formulas by hand, statistical software, like SAS and SPSS, can provide you with faster results.
