Pairing Data Structures to Data Types in R Programming

Instructor: Eric H. Johnson

Eric has worked as a software developer and computer science instructor and has a master's degree in computer science.

Depending on the type of data you have, you will need to store it in the appropriate data structure. This lesson introduces you to data structures in R and how to use them.

Data Types in R:

R was designed as a data language, and when we think of data we typically think of numbers. Lots of them. R has built-in data structures that work well with numbers, but also with character strings and with mixes of the two. The type of data structure you use depends on the type of data you need to process.

Atomic Vectors

Atomic vectors in R are typically just called vectors. A vector can hold a sequence of values, all of the same type.

We create a vector using the c() function, short for combine:

`a_vals <- c(35, 28, 27.2, 50, 61.7, 2.5)`

By default, R stores numeric values as double-precision (64-bit) floating-point numbers. Note that in R you never declare the type of a variable; R manages variable types in the background.

To create a vector of integer values, append "L" to each number:

`> int_vals <- c(2L, 4L, 6L, 8L)`

To create a vector of character strings, enclose each string in double quotes:

`> names <- c("Alice", "Bob", "George", "Ringo")`

As before, we simply created a variable by assignment, without having to declare its type.

R also supports logical values, TRUE and FALSE, which you can abbreviate as T and F when you enter them.

```
> log_vals <- c(T, F, T, T, F)
> log_vals
[1]  TRUE FALSE  TRUE  TRUE FALSE
```
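One caveat about the abbreviations is worth a quick sketch. TRUE and FALSE are reserved words, while T and F are ordinary variables that simply start out holding TRUE and FALSE, so they can be overwritten. The name shadow_demo below is made up for this illustration, and local() is used only so that the global T is left alone:

```r
# T and F are ordinary variables; TRUE and FALSE are reserved words.
shadow_demo <- local({
  T <- 0          # legal: T is just a variable name inside this block
  c(T, TRUE)      # T is now the number 0, so TRUE is coerced to 1
})
print(shadow_demo)   # [1] 0 1

# Outside the local() block, T still means TRUE.
print(isTRUE(T))     # [1] TRUE

# By contrast, a statement such as TRUE <- 0 is an error.
```

For this reason, spelling out TRUE and FALSE is usually the safer habit in scripts.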

Logical values can be the result of conditional tests, such as:

```
> test_30 <- a_vals > 30
> test_30
[1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
```
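As a small sketch of what such a logical vector is good for, it can be used to pick out the matching elements of the original vector, or to count them. The names big_vals and n_big are illustrative only:

```r
a_vals <- c(35, 28, 27.2, 50, 61.7, 2.5)
test_30 <- a_vals > 30

# Use the logical vector as an index: keep only the TRUE positions.
big_vals <- a_vals[test_30]
print(big_vals)   # [1] 35.0 50.0 61.7

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
n_big <- sum(test_30)
print(n_big)      # [1] 3
```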

The function typeof() tells you what type of values are stored in a variable:

```
> typeof(a_vals)
[1] "double"
> typeof(int_vals)
[1] "integer"
> typeof(names)
[1] "character"
> typeof(test_30)
[1] "logical"
```

In R, "double" means double-precision floating point. The function typeof() reports only one type for each vector because all values in a vector must be the same type. We therefore say that vectors require that data be homogeneous.

To get the number of items in a vector, use the length() function:

```
> length(names)
[1] 4
```

Single values, such as:

```
> i <- 12
> i
[1] 12
```

are stored in one-element vectors. R has no scalar data structure. The length of i is therefore 1:

```
> length(i)
[1] 1
```

Arrays and Matrices

R also supports the use of multidimensional vectors, known as arrays. You can create an array with any number of dimensions. An array of two dimensions is called a matrix. Here we will see examples of operations on matrices, but keep in mind that most can also be used for arrays.

You can create a matrix in several ways. The most convenient is the matrix() function, where you specify the number of columns and rows with the ncol and nrow parameters:

```
> m <- matrix(1:12, ncol=4, nrow=3)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
```
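The text mentions that there are several ways to create a matrix. As a rough sketch of a few alternatives, the base R functions cbind() and rbind() bind vectors together as columns or as rows. The variable names here are invented for the example:

```r
# The matrix() approach, filled column by column.
by_matrix <- matrix(1:6, nrow = 2, ncol = 3)

# cbind() treats each argument as one column.
by_cbind <- cbind(1:2, 3:4, 5:6)

# rbind() treats each argument as one row.
by_rbind <- rbind(c(1L, 3L, 5L), c(2L, 4L, 6L))

print(by_matrix)
print(dim(by_cbind))             # [1] 2 3
print(all(by_cbind == by_matrix))  # [1] TRUE
print(all(by_rbind == by_matrix))  # [1] TRUE
```

All three hold the same values in the same positions; which form is most convenient depends on how your data arrives.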

The dim() function tells you the dimensions of a matrix:

```
> dim(m)
[1] 3 4
```

R supports assignment to the dim() function to change the dimensions of an array:

```
> dim(m) <- c(4, 3)
> m
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
```

If you have used other programming languages, assigning a value to what looks like a function call will appear very strange to you. In R it works; in just about any other language it does not.
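One way to make sense of this is that R treats a statement of the form dim(m) <- value as a call to a replacement function named `dim<-`, which returns a modified copy that is then stored back in m. The sketch below calls that function directly to show the two forms agree; m2 and m3 are illustrative names:

```r
m <- matrix(1:12, ncol = 4, nrow = 3)

# The usual assignment form.
m2 <- m
dim(m2) <- c(4, 3)

# The same operation written as an explicit call to the replacement function.
m3 <- `dim<-`(m, c(4, 3))

print(identical(m2, m3))                        # [1] TRUE

# Only the shape changes; the underlying values keep their order.
print(identical(as.vector(m), as.vector(m2)))   # [1] TRUE
```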

The numbers in square brackets at the row and column headers indicate the index statements to use to extract that particular row or column. So to get the 3rd row of matrix m, use the statement:

```
> m[3,]
[1]  3  7 11
```

R supports a powerful indexing syntax that allows you to extract just about any combination of rows and columns you like from a matrix. This extends to arrays of any dimension.
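A few hedged examples of that indexing syntax follow, using the 4-by-3 version of the matrix and a small three-dimensional array. The array arr and its dimensions are invented for the illustration:

```r
m <- matrix(1:12, nrow = 4, ncol = 3)

print(m[2, 3])       # one element: row 2, column 3 -> 10
print(m[, 2])        # the whole second column -> 5 6 7 8
print(m[c(1, 4), ])  # rows 1 and 4, returned as a 2 x 3 matrix

# The same idea with one index per dimension works for arrays.
arr <- array(1:24, dim = c(2, 3, 4))
print(arr[2, 3, 4])      # last element -> 24
print(dim(arr[, , 1]))   # the first 2 x 3 "slice" -> 2 3
```

Leaving an index position empty, as in m[, 2], means "everything along this dimension".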

The typeof() function works the same with arrays as it does with vectors:

```
> typeof(m)
[1] "integer"
```

The typeof() function tells us what kind of data is contained in the structure, but not the kind of structure itself. Use the class() function to find out what type of structure a variable is:

```
> class(m)
[1] "matrix" "array"
```

In R 4.0.0 and later, class() reports both "matrix" and "array" for a matrix; earlier versions report only "matrix".

Note that typeof(m) reported integers even though we did not put an "L" after each number. This is a result of using 1:12 to specify the values, which creates integer values.
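A quick check of this behavior, as a minimal sketch: the colon operator applied to whole numbers yields integers, while the same values typed into c() without "L" are doubles.

```r
print(typeof(1:3))              # [1] "integer"
print(typeof(c(1, 2, 3)))       # [1] "double"
print(typeof(c(1L, 2L, 3L)))    # [1] "integer"

# The colon form and the "L" form produce the same integer vector.
print(identical(1:3, c(1L, 2L, 3L)))   # [1] TRUE
```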

Just as for vectors, use the length() function on an array to get the total number of elements in it:

```
> length(m)
[1] 12
```

You can also store character strings in arrays:

```
> name_mat <- matrix(names, ncol=2, nrow=2)
> name_mat
     [,1]    [,2]
[1,] "Alice" "George"
[2,] "Bob"   "Ringo"
```

To arrange the data by rows, set the byrow parameter to TRUE:

```
> name_mat <- matrix(names, ncol=2, nrow=2, byrow=T)
> name_mat
     [,1]     [,2]
[1,] "Alice"  "Bob"
[2,] "George" "Ringo"
```

Because arrays are really vectors, they require data to be homogeneous. But what if we need to store data of different types, and have each retain its properties, so that numbers can act like numbers even when they are mixed in with character strings? Read on.

Heterogeneous Data

Vectors in R make all elements the same type. This also happens if you try to store numeric and character types in the same vector:

```
> m <- c(2, 4, "Bob", 8.25)
> m
[1] "2"    "4"    "Bob"  "8.25"
> typeof(m)
[1] "character"
```

Note that even though 2, 4, and 8.25 are all valid numbers, the vector m contains all character data. This is because "Bob" is a character string. In a vector, if even one element is a character string, R makes all the elements character strings, because while character strings can store any sequence of characters, numeric types cannot. A character string can store "8.25" as easily as it can store "Bob", even though "8.25" as a character string loses its meaning as a number. However, while a double can store the value 8.25, it cannot store the character string "Bob".

When R forces all items in a vector to be the same type, it selects the most flexible type for the data that you have provided, and makes all the items be that type. This is known as coercion.

Recall that if you append "L" to the end of a numeric literal, R will store it as an integer:

```
> i <- 12L
> i
[1] 12
> typeof(i)
[1] "integer"
```

But because of coercion, if you want to store integers in a vector, all numbers in that vector must be integers. Here we will try to store some integers and doubles in the same vector:

```
> nums <- c(2L, 4L, 6.25, 8L)
> nums
[1] 2.00 4.00 6.25 8.00
> typeof(nums)
[1] "double"
```

Because a double can store an integer value, but an integer cannot store a double, R makes all the values doubles.
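Putting the examples above together, the coercion order for these four types runs from logical (least flexible) through integer and double to character (most flexible). The sketch below checks each step with typeof(); the variable names are illustrative:

```r
print(typeof(c(TRUE, 2L)))       # [1] "integer"
print(typeof(c(2L, 6.25)))       # [1] "double"
print(typeof(c(6.25, "Bob")))    # [1] "character"

# A logical mixed with a string is converted to its text form.
mixed <- c(TRUE, "Bob")
print(mixed)                     # [1] "TRUE" "Bob"

# Going the other way is never automatic: it takes an explicit conversion.
back_to_num <- as.numeric(c("2", "4", "8.25"))
print(back_to_num)               # [1] 2.00 4.00 8.25
```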

List Data Structures for Heterogeneous Types

In many cases, we want to store data of different types in the same data structure. In other words, we want a data structure that can store heterogeneous data.

Let's repeat a previous example with an R data structure called a list. Instead of using the c() function, as we do for vectors, we use the list() function:

```
> ml <- list(2L, 4, "Bob", 8.25)
> ml
[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1] "Bob"

[[4]]
[1] 8.25

```

Lists allow you to store values of different types in the same data structure. A list is similar to a vector in that it stores a sequence of values, but each element keeps its own type. This is why, when we display the list ml, R shows each item separately under its own double-bracket index.

Lists are recursive data structures. This means that lists can contain other lists. Consider the following sequence of R statements, where we explore the list we just created:

```
> typeof(ml)
[1] "list"
>
> ml[1]
[[1]]
[1] 2

> typeof(ml[1])
[1] "list"
> typeof(ml[[1]])
[1] "integer"
>
> ml[2]
[[1]]
[1] 4

> typeof(ml[2])
[1] "list"
> typeof(ml[[2]])
[1] "double"
>
> ml[3]
[[1]]
[1] "Bob"

> typeof(ml[3])
[1] "list"
> typeof(ml[[3]])
[1] "character"
```

This gets a little cryptic, due to the recursive nature of lists. When we use single brackets, as in ml[1] or ml[2], we get a sublist: a list of length one that contains the element. To get the element itself, we have to use double brackets, as in ml[[1]] or ml[[2]].
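The practical difference shows up as soon as you try to compute with the result. In this minimal sketch, sub and val are names chosen for the illustration:

```r
ml <- list(2L, 4, "Bob", 8.25)

sub <- ml[1]     # single brackets: a list of length 1
val <- ml[[1]]   # double brackets: the integer stored inside

print(is.list(sub))   # [1] TRUE
print(is.list(val))   # [1] FALSE

# Arithmetic works on the extracted value...
print(val + 1L)       # [1] 3
# ...but sub + 1L would be an error, because sub is still a list.

# Single brackets are the right tool when you want several items at once.
print(length(ml[2:3]))   # [1] 2
```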

Another way to examine a list is with the function str(), which gives you a more compact display of the structure of your list:

```
> str(ml)
List of 4
 $ : int 2
 $ : num 4
 $ : chr "Bob"
 $ : num 8.25
```

Note here that the type designations are abbreviated, or different: it reports type "double" as "num".

Besides single character and numeric values, lists can contain vectors of any length:

```
> vl <- list("Bob", c(2, 4, 6.25), 47L)
> vl
[[1]]
[1] "Bob"

[[2]]
[1] 2.00 4.00 6.25

[[3]]
[1] 47

> str(vl)
List of 3
 $ : chr "Bob"
 $ : num [1:3] 2 4 6.25
 $ : int 47
```

To access the vector, index it as you would any other item:

```
> vl[[2]]
[1] 2.00 4.00 6.25
```
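Once the vector has been extracted, ordinary vector indexing applies to it, so the two kinds of brackets can be chained. The sketch below also shows a list stored inside another list, as mentioned earlier; the list nested and the other variable names are made up for the example:

```r
vl <- list("Bob", c(2, 4, 6.25), 47L)

# Double brackets pull out the vector; single brackets then index into it.
third <- vl[[2]][3]
print(third)    # [1] 6.25

# sapply() applies a function to every element of the list.
types <- sapply(vl, typeof)
sizes <- sapply(vl, length)
print(types)    # [1] "character" "double"    "integer"
print(sizes)    # [1] 1 3 1

# Lists are recursive: an element can itself be a list.
nested <- list("outer", list(1L, "inner"))
print(nested[[2]][[2]])   # [1] "inner"
```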
