# Random Variables: Discrete and Continuous

## Random Variable Definition

A random variable, also known as a stochastic variable, is a collection of possible outcomes together with their corresponding probabilities. In practice, a **random variable** can be understood intuitively as a variable that may randomly take on different values, but whose value is not yet known.

More specifically, a random variable is defined as a set of possible outcomes, called a **sample space**, together with a **probability distribution function** that assigns to specific outcomes, or groups of outcomes, numbers between 0 and 1 that represent probabilities.

The outcome can represent an event that will happen in the future, like the result of rolling a six-sided die. In this example, the sample space is the set of integers from 1 to 6, with each integer corresponding to one side of the die. For a fair die, the probability of each of these outcomes is 1/6.

A random variable does not necessarily need to represent something that will happen in the future. A random variable can also represent a quantity that already exists but for which the precise value is unknown. For example, in a doctor's office, the *systolic blood pressure of the next patient to be treated* could be seen as a random variable. Now, the patient has some particular systolic blood pressure, but it is not precisely known until measured.

### Sample Space Examples

Consider the example of rolling a six-sided die. The sample space **S** is a finite set of six integers:

{eq}S_\text{dice roll} = \{1,2,3,4,5,6\} {/eq}

In the blood pressure example above, the sample space is the set of nonnegative real numbers because blood pressure is measured as a single real number and cannot be negative:

{eq}S_\text{blood pressure} = \{x \in \mathbb{R} \mid x\geq 0\} {/eq}

Finally, consider flipping a coin repeatedly until it first comes up heads. The random variable representing *the number of coin flips required to get heads* has a sample space that is all of the positive integers (the natural numbers):

{eq}S_\text{flip coin until heads} = \{x \mid x \in \mathbb{N} \} = \{1,2,3, \ldots \} {/eq}

## What Is a Random Variable?

If you have ever taken an algebra class, you probably learned about different variables like *x*, *y* and maybe even *z*. Some examples of variables include *x* = number of heads or *y* = number of cell phones or *z* = running time of movies. Thus, in basic math, a **variable** is an alphabetical character that represents an unknown number.

Well, in probability, we also have variables, but we refer to them as random variables. A **random variable** is a variable that is subject to randomness, which means it can take on different values.

As in basic math, variables represent something, and we can denote them with an *x* or a *y* or any other letter for that matter. But in statistics, it is normal to use an *X* to denote a random variable. The random variable takes on different values depending on the situation. Each value of the random variable has a probability or percentage associated with it.

## Types of Random Variable

There are two types of random variables: **discrete** random variables and **continuous** random variables. Random variables are classified as discrete or continuous based on whether the sample space is **countable** or **uncountable**.

Discrete and continuous random variables differ in that, for a discrete random variable, each outcome in the sample space has an associated probability, while for a continuous random variable, each outcome has a **probability density**, and probabilities are instead assigned to *ranges* of outcomes.

## What is a Discrete Random Variable?

A discrete random variable is defined as a random variable for which the sample space is countable. A countable sample space is one that has either a finite number of outcomes, like rolling a six-sided die, or a **countably infinite** number of outcomes. An infinite sample space is countably infinite when it's possible to assign a natural number (a positive integer) to each outcome.

### Discrete Random Variable Example

In the example above, where a coin is repeatedly flipped until it comes up heads, the sample space of the number of flips required is countably infinite, so this random variable is classified as discrete according to the definition of a discrete random variable.

For a discrete random variable, every outcome in the sample space has an associated probability, and the random variable as a whole can be described using a probability distribution function in the form of a histogram.

The probability distribution function P gives the specific probabilities of the different outcomes. The probability that a person gets heads on the first coin flip is 1/2, so P(1) = 1/2.

The probability that it takes two coin flips to get the first heads is equal to the probability of getting tails on the first flip *and* heads on the second; that is, the probability is {eq}\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}{/eq}. Likewise, the probability that the first heads comes on the {eq}n^{\text{th}} {/eq} coin flip is {eq}\frac{1}{2^n} {/eq}. Note that the sum of all of the probabilities in the probability distribution function is always 1.
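The pattern above can be checked with a short script. A minimal sketch in Python, using exact fractions (the function name is illustrative):

```python
from fractions import Fraction

def p_first_heads(n: int) -> Fraction:
    """Probability that the first heads appears on the n-th flip of a fair coin:
    tails on the first n-1 flips, then heads, each with probability 1/2."""
    return Fraction(1, 2) ** n

print(p_first_heads(1))  # 1/2
print(p_first_heads(2))  # 1/4

# The probabilities over the whole (countably infinite) sample space sum to 1;
# the partial sum 1/2 + 1/4 + ... + 1/2^N approaches 1 as N grows.
partial = sum(p_first_heads(n) for n in range(1, 31))
print(float(partial))  # very close to 1
```

This also illustrates why the sample space is countably infinite: every positive integer n has a nonzero probability, but the probabilities shrink fast enough to sum to 1.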

## What is a Continuous Random Variable?

A continuous random variable is defined as a random variable for which the sample space is **uncountable**. Usually, this means that the random variable can take on values from a range of real numbers. One example could be a person's *systolic blood pressure*. This is measured as a positive real number, and a typical value is approximately 120 mmHg.

For another example, consider throwing a dart at a circular dartboard. The *distance of the dart from the center of the dartboard* is a continuous random variable because it could be any real number between 0, if it were to hit the center exactly, and the radius of the dartboard, if it were to hit the very edge.

### Continuous Random Variable Example

Suppose a dart is thrown at a dartboard with a radius of 1 meter, and it lands at some point on the board. The random variable representing the distance from the center must be some number between 0 and 1 meters. Therefore, the sample space for this random variable example is the interval of the real number line {eq}S = [0,1] {/eq}.

For a continuous random variable like this, individual outcomes in the sample space don't have probabilities, but rather probability *densities*. The continuous random variable is represented by a **probability density function** with a continuous domain instead of a probability distribution function with a discrete domain.

With a continuous random variable, probabilities are associated with *ranges* of the sample space, called **events**, instead of individual points in the sample space. Probabilities are determined by integrating the probability density function over a range. The integral of the probability density function over the entire sample space is always 1, similar to how all of the probabilities in a discrete probability distribution function always sum to 1.

In this random variable example, to find the probability that the dart lands within 0.2 meters of the center of the target, denoted P(x < 0.2), integrate the probability density function {eq}f(x) = -2x+2 {/eq} over the range {eq}[0,0.2] {/eq}:

{eq}P(x < 0.2) = \int_0^{0.2} \left ( -2x + 2 \right ) dx = 0.36 {/eq}

This means there is a probability of 0.36, or a 36% chance, that the dart will land within this range.

Events can also be collections of ranges. In general, in a continuous sample space, an event is an open subset of the sample space, or any countable union or intersection of open subsets, along with their complements. This means that ranges can be combined using logical operators like OR, AND, and NOT to yield other events. For example, the chance of the dart landing within 0.1 meters OR more than 0.9 meters from the center is an event with an associated probability. It can be represented as {eq}P(x < 0.1 \text{ OR } x > 0.9) {/eq}, and has a probability of

{eq}P(x < 0.1) + P(x > 0.9) = \int_0^{0.1} \left (-2x + 2 \right ) dx + \int_{0.9}^1 \left ( -2x + 2 \right ) dx = 0.2 {/eq}
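Because the density is a simple polynomial, the two integrals above can also be evaluated exactly with the antiderivative {eq}F(x) = -x^2 + 2x {/eq}. A sketch (helper names are illustrative):

```python
def F(x: float) -> float:
    """Antiderivative of the density f(x) = -2x + 2."""
    return -x * x + 2 * x

def prob(a: float, b: float) -> float:
    """P(a < x < b), by the fundamental theorem of calculus."""
    return F(b) - F(a)

p_near = prob(0.0, 0.1)  # P(x < 0.1) = 0.19
p_edge = prob(0.9, 1.0)  # P(x > 0.9) = 0.01
print(round(p_near + p_edge, 6))  # 0.2, matching the OR-event probability
```

Note that the two pieces are very unequal: the density is highest near the center, so landing within 0.1 meters of it is 19 times more likely than landing in the outermost 0.1-meter ring.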

## Lesson Summary

Random variables link the possible outcomes of some random event with probabilities. A random variable can be thought of as a function whose domain includes all possible outcomes of the random event, called a **sample space**. The **probability distribution function** maps specific outcomes to probabilities between 0 and 1 in the case of a discrete random variable. For a continuous random variable, the probability distribution function maps certain *subsets* of outcomes to probabilities. These subsets are referred to as **events** and can be specific ranges of the sample space or combinations of ranges. For a continuous random variable, the individual outcomes have **probability densities** instead of probabilities. The probability of an event is found by integrating the probability density function over that event.

## Discrete Random Variables

Let's see an example. We'll start with tossing coins. I want to know how many heads I might get if I toss two coins. Since I only toss two coins, the number of heads I could get is zero, one, or two heads. So, I define *X* (my random variable) to be the number of heads that I could get.

In this case, each specific value of the random variable - *X* = 0, *X* = 1 and *X* = 2 - has a probability associated with it. When the variable represents isolated points on the number line, such as the one below with 0, 1 or 2, we call it a discrete random variable. A **discrete random variable** is a variable that represents numbers found by counting. For example: number of marbles in a jar, number of students present or number of heads when tossing two coins.

**X is discrete** because the numbers that *X* represents are isolated points on the number line.

The number of heads that can come up when tossing two coins is a discrete random variable because heads can only come up a certain number of times: 0, 1 or 2. Also, we want to know the probability associated with each value of the random variable.

# of Heads | Probability |
---|---|
0 | 0.25 |
1 | 0.5 |
2 | 0.25 |

In the table, you will notice the probabilities. We will see how to calculate the probabilities associated with each value of the variable. However, what we see above is called a **probability distribution** for the number of heads (our random variable) when you toss two coins. A probability distribution has all the possible values of the random variable and the associated probabilities.
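The probabilities in the table can be derived by enumerating the four equally likely outcomes of two fair coin tosses. A minimal sketch:

```python
from itertools import product
from collections import Counter

# The four equally likely outcomes of tossing two fair coins: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome
counts = Counter(toss.count("H") for toss in outcomes)

# Divide each count by the number of outcomes to get the distribution of X
distribution = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(distribution)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

This matches the table: one outcome (TT) gives 0 heads, two outcomes (HT, TH) give 1 head, and one outcome (HH) gives 2 heads.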

## Continuous Random Variables

Let's see another example.

Suppose I am interested in looking at statistics test scores from a certain college from a sample of 100 students. Well, the random variable would be the test scores, which could range from 0% (didn't study at all) to 100% (excellent student). However, since test scores vary quite a bit and they may even have decimal places in their scores, I can't possibly denote all the test scores using discrete numbers. So in this case, I use intervals of scores to denote the various values of my random variable.

When we have to use intervals for our random variable, or all values in an interval are possible, we call it a continuous random variable. Thus, **continuous random variables** are random variables that are found from measuring - like the height of a group of people, distance traveled while grocery shopping, or student test scores. In this case, **X is continuous** because *X* represents an infinite number of values on the number line.

Let's look at a hypothetical table of the random variable *X* and the number of people who scored in those different intervals:

Test Scores | Frequency (number of students) |
---|---|
0 to <20% | 5 |
20% to <40% | 20 |
40% to <60% | 30 |
60% to <80% | 35 |
80% to 100% | 10 |

Since I know there are one hundred students in all, I could also have a column with relative frequency or percentage of students that scored in the different intervals. We calculate this by dividing each frequency by the total (in this case, 100). We then either leave the answer as a decimal or convert it to a percentage. Thus, like the coin example, the random variable (in this case, the intervals) would have certain probabilities or percentages associated with it. And this would be a probability distribution for the test scores.

Test Scores | Relative Frequency |
---|---|
0 to <20% | 5% |
20% to <40% | 20% |
40% to <60% | 30% |
60% to <80% | 35% |
80% to 100% | 10% |
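The relative-frequency step described above, dividing each interval's frequency by the total, can be sketched as follows (the interval labels mirror the table):

```python
# Frequencies from the hypothetical table of 100 students' test scores
frequencies = {
    "0 to <20%": 5,
    "20% to <40%": 20,
    "40% to <60%": 30,
    "60% to <80%": 35,
    "80% to 100%": 10,
}

total = sum(frequencies.values())  # 100 students in all
relative = {interval: count / total for interval, count in frequencies.items()}

for interval, freq in relative.items():
    print(f"{interval}: {freq:.0%}")  # e.g. "0 to <20%: 5%"
```

Because the total happens to be 100 here, the percentages equal the raw counts, but the same division works for any sample size.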

## Probabilities Range Between 0 and 1

In the study of probability, we are interested in finding the probabilities associated with each value of these random variables. You may notice that, as a decimal, no probability is ever greater than one, nor is any negative. This is always true. For any designation of the random variable, the probability is always between zero and one, never negative and never greater than one. In math books, you will see this written as:

{eq}0 \leq P(X) \leq 1 {/eq}

which says that *P*(*X*) is always between 0 and 1.

The notation of *P* and then parentheses around *X* - *P*(*X*) - means the probability of *X*. Remember, *X* is the random variable. One note here: it does not matter if you use capital or lowercase letters for the random variable or for *P*, as long as you are consistent!

## Sum of Probabilities for a Distribution

Perhaps you noticed above that in each table the sum of all the probabilities added up to 1, or 100%. For continuous random variables, we can construct a histogram of the table of relative frequencies, and the area under the histogram is also equal to 1.

This graph is often called a density curve for the continuous random variable. Thus, a **density curve** is a plot of the relative frequencies of a continuous random variable. In math books, the property that the probabilities sum to 1 is given in shorthand notation as:

{eq}\sum P(X) = 1 {/eq}

The Greek symbol {eq}\Sigma{/eq} is called capital sigma and means sum. So the statement says that the sum of the probabilities in a probability distribution equals 1.
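Both properties, every probability in [0, 1] and a total of 1, can be checked programmatically. A small validator sketch (the function name is illustrative):

```python
def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) <= tol

print(is_valid_distribution([0.25, 0.5, 0.25]))  # True  (two-coin example)
print(is_valid_distribution([0.6, 0.6]))         # False (sums to 1.2)
print(is_valid_distribution([-0.1, 1.1]))        # False (values outside [0, 1])
```

The small tolerance accounts for floating-point rounding when probabilities are stored as decimals.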

## Some More Examples

Let's just look at a few examples of classifying random variables.

Suppose I'm looking at the number of defective tires on a car. Let *X* = the number of defective tires on the car. Is *X* discrete or continuous? Well, since there are usually four tires on a car, *X* can range from 0 to 4. However, it can only be 0, 1, 2, 3 or 4. So *X* is a discrete random variable.

Okay let's look at another example. Suppose I am measuring the running time of movies that are currently playing in theaters in my city. Let *X* = the running time of movies. Is *X* discrete or continuous?

Since movie times vary quite a lot and the length of the movie can be measured to the nearest minute or fraction of a minute or even seconds, depending on how accurate you want to be, *X* is a continuous random variable. When collecting my data, it would make sense to compile the data into intervals of running times as opposed to creating a category for each individual running time.

One more example: You play a game where you toss a coin and record the number of tosses it takes to get two heads in a row. So let the random variable *X* = the number of times the coin is tossed to get two heads in a row. Using *H* for heads and *T* for tails, we could have sequences like these:

*HH* two tosses

*THH* three tosses

*TTHH* four tosses

*HTHH* four tosses

*THTHH* five tosses

*TTTHH* five tosses

and so on ...

Is *X* discrete or continuous?

Well, for the above sequences, *X* = 2, *X* = 3, *X* = 4, *X* = 5 and so on. But we can't have 1.5 tosses or 1.25 tosses. Thus, *X* is a discrete random variable.

However, note that *X* can go on infinitely, since theoretically, we could toss forever and never get two heads in a row - although the probability of this happening is extremely small. But nonetheless, *X* is discrete since it represents isolated points on the number line (albeit these points go on forever).
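The "toss until two heads in a row" variable can be simulated to see that its values are whole numbers with no fixed upper bound. A sketch using Python's random module (seeded for reproducibility):

```python
import random

def tosses_until_two_heads(rng: random.Random) -> int:
    """Toss a fair coin until two heads appear in a row; return the toss count."""
    tosses = 0
    streak = 0
    while streak < 2:
        tosses += 1
        # rng.random() < 0.5 models a fair coin coming up heads
        streak = streak + 1 if rng.random() < 0.5 else 0
    return tosses

rng = random.Random(42)
samples = [tosses_until_two_heads(rng) for _ in range(10_000)]
print(min(samples), max(samples))  # the minimum is 2; the maximum varies by seed
```

Every sample is an integer of at least 2 (the fastest possible sequence is HH), and occasional long runs of tails push some samples far out on the number line, illustrating the countably infinite sample space.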

## Lesson Summary

So let's recap:

A **random variable** is really just a variable that has certain values associated with it. In addition, each value of the random variable or each range of values of the random variable has probabilities associated with it.

If the random variable represents isolated numbers on the number line, we call it **discrete**.

If the random variable represents an infinite range of numbers or measurements, we call it **continuous**.

Generally, discrete random variables take on integer values, while continuous random variables can take on any value in an interval, including values with arbitrarily many decimal places.

We also saw that probabilities are always between zero and one, and the sum of the probabilities in a probability distribution equals one for a discrete random variable or the area under the density curve is one for a continuous random variable.

So why the fuss over random variables? Well, by defining *X*, the random variable, to be something, it eliminates us having to write long sentences about what we are talking about, and we can now go on to calculating probabilities and generating probability distributions for our random variable *X*.

## Learning Outcomes

Following this video lesson, you should be able to:

- Define random variable
- Differentiate between discrete and continuous random variables
- Identify what the sum of the probabilities in a probability distribution equals



#### What is random variable and its types?

A random variable is a function that associates certain outcomes or sets of outcomes with probabilities. Random variables are classified as **discrete** or **continuous** depending on the set of possible outcomes or **sample space**.

#### How to identify a random variable?

A variable is a random variable when it is meant to represent the outcome of some random event. Usually, it is denoted by a capital letter, like X or Y.
