
Practical Application for Programming in R: Data Cleaning in R

Instructor: Alexis Kypridemos

Alexis is a technical writer for an IT company and has worked in publishing as a writer, editor and web designer. He has a BA in Communication.

In this practical lesson, you will be given two data sources and asked to perform different cleaning tasks on that data, such as avoiding errors on import, imputing values, and reshaping data.

Lesson Overview & Knowledge Required

This lesson provides two data sources and asks you to perform several cleaning tasks on them. In some cases, you will be asked to perform multiple tasks on a single data source.

Knowledge required to complete these tasks includes:

  1. Understanding of basic data cleaning techniques like imputation and coercion, and how to accomplish these in R.
  2. Knowledge of R data structures and data types, as well as basic programming concepts like conditionals and loops.

Program Code

Two CSV files are used for these tasks, named "activities1.csv" and "activities2.csv". To create them, copy the lines of text below into a text editor and save each one as a .csv file in your R working directory. If you are not sure which directory that is, call getwd(); to change it, use setwd().

"activities1.csv":

Activities, Time
tennis, hiking, 1, 2

"activities2.csv":

1
3
0.5
2
-1
NA
Once saved as CSV files in your R working directory, use the code below to read the data from the files into data frames in R.

ac1<-read.csv("activities1.csv")
ac2<-read.csv("activities2.csv")
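The calls above use bare file names, which R resolves relative to the working directory. The following is a minimal sketch of checking and changing that directory with getwd() and setwd(). The temporary directory and the file name scratch.txt are placeholders chosen for illustration, not part of the lesson.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()
print(old_dir)

# Switch to another directory (a temporary one here, purely for illustration).
new_dir <- tempdir()
setwd(new_dir)
in_new_dir <- normalizePath(getwd()) == normalizePath(new_dir)

# A file written with a relative name now lands in the new working directory.
writeLines("example", "scratch.txt")
file_found <- file.exists(file.path(new_dir, "scratch.txt"))

# Clean up and switch back so the rest of the session is unaffected.
unlink("scratch.txt")
setwd(old_dir)
```

In practice you would pass setwd() the folder where you saved the two CSV files.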

Code Application

You will find that running the calls above as written throws errors and/or warnings. Modify the function calls, not the data, so that the data can be read into the data frames ac1 and ac2 without issue.

Tip: the error messages you receive will indicate what is wrong with the data. You can use this information to help you determine what you need to address in your code to successfully read in the data. Here's an example of such an error message:


Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  more columns than column names
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'activities1.csv'

To avoid the incomplete-final-line warning, read each CSV file one line at a time with readLines(), using warn=FALSE, and pass the result to read.csv() through its text argument. For activities1.csv, also add skip=1 so that the first row is not imported. That row contains two values whereas the next row contains four, and this mismatch is what raises the "more columns than column names" error if it is not addressed.

ac1<-read.csv(text=readLines("activities1.csv", warn=FALSE), skip=1)
ac2<-read.csv(text=readLines("activities2.csv", warn=FALSE))
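To see the effect of these arguments in isolation, here is a minimal sketch that works on a throwaway temporary file rather than on the lesson's files. The file contents (a two-field first row followed by a four-field row, with no trailing newline) are an assumption modeled on the description above, and the names path, plain_read and fixed are illustrative only.

```r
# Hypothetical stand-in file: two fields in the first row, four in the second,
# written without a trailing newline.
path <- tempfile(fileext = ".csv")
cat("Activities, Time\ntennis, hiking, 1, 2", file = path)

# A plain read.csv() call stops with an error; tryCatch() captures its message.
# suppressWarnings() hides the separate incomplete-final-line warning.
plain_read <- tryCatch(
  suppressWarnings(read.csv(path)),
  error = function(e) conditionMessage(e)
)
print(plain_read)

# Reading the lines first and skipping the mismatched first row succeeds.
raw_lines <- readLines(path, warn = FALSE)
fixed <- read.csv(text = raw_lines, skip = 1)

# With the default header = TRUE, the remaining row is used for the column
# names, so the result has four columns and no data rows.
print(dim(fixed))
```

Because read.csv() treats the first line it sees as a header by default, whether the remaining row ends up as column names or as data depends on the header argument.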

Follow-Up Questions

  1. Make sure all the entries in the activities2.csv file are read in as data, and not headers.
  2. Impute (substitute) any NA or negative values in the activities2.csv file with the mean value of the column where such values are found.
  3. Reshape the ac1 data frame so that it consists of two columns, Activity and Time, with the names of the activities (hiking, tennis) listed in the first column, and the numeric values listed in the second column.

Answer Key

1. Importing the data in activities2.csv without headers:

ac2<-read.csv(text=readLines("activities2.csv", warn=FALSE), header=FALSE)
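The difference that header=FALSE makes can be checked on a small inline example. This is a sketch using three made-up lines passed directly through the text argument, not the lesson's file.

```r
# Hypothetical stand-in for a headerless, single-column file.
raw_lines <- c("1", "3", "0.5")

# Default behaviour: the first line is consumed as the column name.
with_header <- read.csv(text = raw_lines)

# header = FALSE: every line is kept as data and the column is auto-named.
without_header <- read.csv(text = raw_lines, header = FALSE)

print(names(with_header))     # "X1"
print(nrow(with_header))      # 2
print(names(without_header))  # "V1"
print(nrow(without_header))   # 3
```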

2. Substituting NA values in ac2 with the mean:
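One possible approach, offered as a sketch rather than as the lesson's own answer, loops over the columns and replaces each NA or negative entry with the column mean. It assumes the mean is computed from the valid (non-NA, non-negative) values only, which the task statement does not specify. The data frame ac2_demo is a hypothetical stand-in built inline.

```r
# Hypothetical stand-in: one numeric column with a negative value and an NA.
ac2_demo <- data.frame(V1 = c(1, 3, 0.5, 2, -1, NA))

for (col in names(ac2_demo)) {
  values <- ac2_demo[[col]]

  # Only numeric columns can be imputed with a mean.
  if (!is.numeric(values)) next

  # Flag entries that are missing or negative.
  bad <- is.na(values) | values < 0

  # Assumption: the mean is taken over the valid entries only.
  col_mean <- mean(values[!bad])

  ac2_demo[[col]][bad] <- col_mean
}

print(ac2_demo)
```

If negative values should count toward the mean, mean(values, na.rm = TRUE) could be used instead, which gives a different replacement value.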
