Removing Spaces From Strings in R Programming

Lesson Transcript
Instructor: Mark Smithers

Mark has taught college Math and Computer Science, worked as a Web Programmer and Software Engineer and has a master's degree in computer science.

This lesson covers how to remove spaces from strings in R. Removing spaces is a common data-cleaning step with many practical uses, and this lesson walks through several ways to do it. Updated: 07/20/2020

Removing Spaces From Strings

Sometimes our data contain extra spaces that can create problems during data analysis, so it's good to know how to get rid of them when the need arises. Here, "spaces" also covers tabs, line feeds, and carriage returns. There are many situations in which we need to remove spaces from strings. Some of them are:

  • Extra spaces at the beginning, at the end, or even in the middle of a string
  • Trying to count the number of actual characters in a string
  • Trying to sort a list of strings
  • Comparing two strings
  • Encoding a URL to a website

If our R program encounters unexpected spaces during any of these tasks, we can get unexpected results: incorrect character counts, improperly sorted lists, incorrect string comparisons, and URLs that won't work.
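A few of these problems can be seen directly at the R console. The following is a minimal sketch; the variable names and sample values are hypothetical, and the URLencode() call is only one common way to handle a space in a URL, not necessarily the approach this lesson takes.

```r
# Hypothetical sample values, for illustration only.
clean  <- "Dan"
padded <- " Dan "   # one extra space on each side

cleanCount  <- nchar(clean)     # 3
paddedCount <- nchar(padded)    # 5: the two extra spaces are counted
sameString  <- clean == padded  # FALSE: the strings differ by the spaces

# A raw space is not valid inside a URL; URLencode() from the utils
# package escapes it as %20.
encoded <- utils::URLencode("my page.html")   # "my%20page.html"
```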

Dealing with Extra Spaces

Excess spaces can happen. It's life. Perhaps someone was typing late at night while only half awake, or fell asleep on the keyboard. Extra spaces make their way into documents and need to be removed programmatically. No worries: R has some handy built-in functions to take care of that. The trimws() function removes leading and trailing whitespace from a string.

For example, here is a string with an extra space at the beginning and the end:


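# A string with an extra space at the beginning and the end
sentenceString <- ' Dan is here. '
# Trim both ends and store the result back in the same variable
sentenceString <- trimws(sentenceString)
sentenceString   # "Dan is here."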


The code here removes the leading and trailing spaces and assigns the result back to the variable. Without that assignment, the trimmed string is not stored for reuse.
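trimws() also accepts a which argument for trimming only one side. The sketch below uses a hypothetical sample string with two spaces on each side to show the difference.

```r
padded <- "  Dan is here.  "   # hypothetical sample value

leftTrimmed  <- trimws(padded, which = "left")    # "Dan is here.  "
rightTrimmed <- trimws(padded, which = "right")   # "  Dan is here."
bothTrimmed  <- trimws(padded)                    # "Dan is here." (default is which = "both")

# By default, tabs, newlines, and carriage returns are trimmed as well.
tabTrimmed <- trimws("\tDan is here.\r\n")        # "Dan is here."
```

Note that the spaces between the words are untouched in every case; trimws() only works on the ends of the string.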

This function is great for leading and trailing spaces. But what if we have a space in the middle of a string? Another way to remove spaces is with sub(), which can handle that case. Here is an example:


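sentenceString <- 'Debili tated'
searchString <- ' '        # the text to look for: a single space
replacementString <- ''    # replace it with an empty string
# sub() replaces only the first match of searchString
sentenceString <- sub(searchString, replacementString, sentenceString)
sentenceString   # "Debilitated"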


The sub() function replaces the first occurrence of searchString in sentenceString with replacementString. In this example, the search string is a single space and the replacement is an empty string, so the space is simply deleted.
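To see that sub() really stops after the first match, it helps to try it on a string with more than one space. This is a small sketch with a hypothetical sample value:

```r
spaced <- "D a n"   # hypothetical sample value with two interior spaces

firstOnly <- sub(" ", "", spaced)   # "Da n": only the first space is removed

# sub() treats its pattern as a regular expression by default.
# fixed = TRUE matches the pattern as literal text instead; for a
# plain space the result is the same.
firstOnlyFixed <- sub(" ", "", spaced, fixed = TRUE)   # "Da n"
```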

Counting and Sorting

The sub() function only replaces the first occurrence. What if we have lots of them? The gsub() function is just like sub(), except it replaces all occurrences. When might we need to do that?

Imagine you were asked to write a program that counts the number of characters in a string. How you count characters and how the computer counts them may differ, because the computer sees character codes, not symbols on a page. Where you see a gap between letters, the computer sees ASCII code 32 (a space), which is a character like any other.
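The character codes can be inspected with base R's utf8ToInt(), which returns the code point of each character. A minimal sketch, with hypothetical variable names:

```r
spaceCode <- utf8ToInt(" ")       # 32, the code for a space
nameCodes <- utf8ToInt("D a n")   # 68 32 97 32 110
nameCount <- nchar("D a n")       # 5: three letters plus two spaces
```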

For example, if you were to ask a third grader how many letters are in the string " D a n " (with a space before the D and after the n), he or she would probably say three. They might even get a gold star for the day. However, if you ask a computer, it will tell you seven, because it also counts the spaces at the front, at the end, and between the letters. In this case, all spaces in the string must be removed to get a more accurate "human" count. Here is code that removes them:


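sentenceString <- ' D a n '
searchString <- ' '        # the text to look for: a single space
replacementString <- ''    # replace it with an empty string
# gsub() replaces every match of searchString, not just the first
sentenceString <- gsub(searchString, replacementString, sentenceString)
sentenceString   # "Dan"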


The gsub() function is similar to sub(), except that it replaces all occurrences of searchString in sentenceString with replacementString, not just the first. Figure 1 illustrates the output of a character count when there is an extra space at the beginning of the string.

Figure 1: Character Count
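The effect of gsub() on a character count can be checked with nchar(). The sketch below is not a reproduction of the figure; it reuses the sample string from the gsub() example, and the last line assumes the data might also contain tabs or newlines, which the lesson's example does not.

```r
sentenceString <- " D a n "
countBefore <- nchar(sentenceString)         # 7

noSpaces <- gsub(" ", "", sentenceString)    # "Dan"
countAfter <- nchar(noSpaces)                # 3

# Assumption: the input may contain other whitespace too.
# "[[:space:]]" matches any whitespace character, not just a plain space.
noWhitespace <- gsub("[[:space:]]", "", "D\ta\nn ")   # "Dan"
```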