Mark has taught college Math and Computer Science, worked as a Web Programmer and Software Engineer and has a master's degree in computer science.
Removing Spaces From Strings
Sometimes our data contain extra spaces that can cause problems during data analysis, so it's useful to know how to remove them (along with other whitespace characters such as tabs, line feeds, and carriage returns) when the need arises. What situations could we run into? There are many reasons to remove spaces from strings, including:
- Extra spaces at the beginning, at the end, or even in the middle of a string
- Trying to count the number of actual characters in a string
- Trying to sort a list of strings
- Comparing two strings
- Encoding a URL for a website
If our R program encounters unwanted spaces during any of these tasks, we can get unexpected results: incorrect character counts, improperly sorted lists, failed string comparisons, and URLs that won't work.
Dealing with Extra Spaces
Excess spaces happen. It's life. Perhaps someone was typing late at night while only half awake, or fell asleep on the keyboard. Extra spaces can make their way into documents and will need to be removed programmatically. No worries: R has some handy built-in functions to take care of that. The trimws() function removes leading and/or trailing whitespace from a string.
For example, here is a string with an extra space at the beginning and the end:
# Remove the leading and trailing spaces, then store the result
sentenceString <- ' Dan is here. '
sentenceString <- trimws(sentenceString)
This code removes the leading and trailing spaces and assigns the result back to the variable. Without that assignment, the trimmed string is not stored for reuse, because trimws() returns a new string rather than changing the original.
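trimws() also accepts a which argument ("both", "left", or "right"), and by default it treats tabs, carriage returns, and line feeds as whitespace too. The sketch below shows both points on a made-up string; the variable names are only for illustration.

```r
# A made-up string: leading tab and spaces, trailing spaces and a newline
messyString <- "\t  Dan is here.  \n"

both  <- trimws(messyString)                   # strip both ends (the default)
left  <- trimws(messyString, which = "left")   # strip only the leading whitespace
right <- trimws(messyString, which = "right")  # strip only the trailing whitespace

print(both)   # [1] "Dan is here."
print(left)
print(right)

# trimws() is vectorized: it trims every element of a character vector
print(trimws(c("  Alex", "Dan  ", " Kim ")))
```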
trimws() is great for leading and trailing spaces, but what if a space sits in the middle of a string? The sub() function can remove it, as in this example:
# Replace the first space in sentenceString with an empty string
sentenceString <- 'Debili tated'
searchString <- ' '
replacementString <- ''
sentenceString <- sub(searchString, replacementString, sentenceString)
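The sub() function finds the first occurrence of searchString in sentenceString and replaces it with replacementString. In this example, the search string is a single space and the replacement string is empty (''), so the result is 'Debilitated'. Note that sub() treats the search string as a regular expression unless you pass fixed = TRUE.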
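To see that sub() stops after the first match, the sketch below runs it on a made-up string containing two spaces. Passing fixed = TRUE tells sub() to treat the search string as literal text rather than a regular expression, a safe habit when the search string is just a space.

```r
# Made-up example string containing two spaces
twoSpaces <- "Debili tated word"

# sub() removes only the first space it finds; the second one survives
firstOnly <- sub(" ", "", twoSpaces, fixed = TRUE)
print(firstOnly)  # [1] "Debilitated word"
```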
Counting and Sorting
The sub() function only replaces the first occurrence. What if we have lots of them? The gsub() function is just like sub(), except it replaces all occurrences. When might we need to do that?
Imagine you were asked to write a program that counts the number of characters in a string. The way you count characters and the way the computer counts them may differ, because the computer works with character codes, not visual symbols. Where you see a blank gap, the computer sees ASCII code 32 (a space) and counts it like any other character.
For example, if you asked a third grader how many letters are in the string " D a n ", he or she would probably say three. They might even get a gold star for the day. A computer, however, will tell you seven, because it counts the spaces at the front, at the end, and between the letters. To get the "human" count, every space in the string must be removed. Here is code that does this:
# Replace every space in sentenceString with an empty string
sentenceString <- ' D a n '
searchString <- ' '
replacementString <- ''
sentenceString <- gsub(searchString, replacementString, sentenceString)
The gsub() function is similar to sub(), except that it replaces all occurrences of searchString in sentenceString with replacementString, not just the first. Here the result is 'Dan', which has three characters. This figure illustrates the program's character-count output when there is an extra space at the beginning of the string.
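The sketch below puts the counting example together with nchar(), which returns the number of characters in a string, and utf8ToInt(), which shows the code the computer stores for a character. The last gsub() call uses the regular-expression class [[:space:]], an optional variation that also removes tabs and line breaks; the string mixed is made up for illustration.

```r
name <- " D a n "

# The computer sees a space as character code 32
print(utf8ToInt(" "))  # [1] 32

# Count before and after removing every space
print(nchar(name))                        # [1] 7
noSpaces <- gsub(" ", "", name, fixed = TRUE)
print(nchar(noSpaces))                    # [1] 3

# [[:space:]] also matches tabs, carriage returns, and line feeds
mixed <- "\tD a\nn "
print(gsub("[[:space:]]", "", mixed))     # [1] "Dan"
```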
Now suppose you have another assignment: sort the names in a text file in alphabetical order. If you ask that same third grader which comes first, "Alex" or "Dan," the third grader will say "Alex." But if "Dan" was stored with an extra space in front (" Dan"), a program that compares character codes will put " Dan" first, because the ASCII value for a space (32 in decimal) is lower than the ASCII value for the letter "A" (65 in decimal) or the letter "D" (68 in decimal). Using trimws(), sub(), or gsub() to remove the leading space would fix this. This next figure illustrates the incorrect sorting result when there is an extra space at the beginning of a string.
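Here is a sketch of the sorting problem in R. One assumption to be aware of: R's default sort() orders strings using your computer's locale, whose rules may not follow raw character codes, so the example passes method = "radix", which sorts character vectors in the C locale (by character code). The vector nameList is made up for illustration.

```r
nameList <- c("Alex", " Dan")   # " Dan" has an accidental leading space

# Character codes: space = 32, "A" = 65, "D" = 68
print(utf8ToInt(" "))  # [1] 32
print(utf8ToInt("A"))  # [1] 65
print(utf8ToInt("D"))  # [1] 68

# Sorting by character code puts " Dan" before "Alex" because 32 < 65
wrongOrder <- sort(nameList, method = "radix")
print(wrongOrder)

# Trimming first gives the order a person expects: "Alex", then "Dan"
rightOrder <- sort(trimws(nameList), method = "radix")
print(rightOrder)
```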
Suppose you needed to write some login code for a website. This would require a database query in which the username entered by the user is compared with the usernames stored in the database. If someone typed "StudyOHolic99" with an extra space in front (" StudyOHolic99") in the username field, but the database stored "StudyOHolic99" with no space in front, they most likely would not get into their account. They might even leave a bad Google review of the website. The login fails because the code looks for an exact match of what was typed in the username field, and " StudyOHolic99" is not an exact match for "StudyOHolic99". Applying a trimming function such as trimws() to the input before the comparison fixes this right up. Figure 3 illustrates string comparison when there is an extra space in the input string.
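A minimal sketch of the login comparison follows. The function usernameMatches() is hypothetical, written only for this illustration; a real login system would query a database rather than compare two strings in memory.

```r
# Hypothetical helper: does the typed username match the stored one?
usernameMatches <- function(typedName, storedName) {
  # Trim leading/trailing whitespace from the user's input before comparing
  identical(trimws(typedName), storedName)
}

storedName <- "StudyOHolic99"
typedName  <- " StudyOHolic99"   # extra space in front

print(typedName == storedName)                 # [1] FALSE
print(usernameMatches(typedName, storedName))  # [1] TRUE
```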
URL encoding (also called percent-encoding) is the formatting of a string so that it can be passed in a URL. This usually involves replacing many characters, including spaces, with sequences of characters that are safe to use in a URL. For example, a space usually becomes '%20'.
However, you probably won't have to implement this yourself. URL encoding is built into many programming languages, including R (through the URLencode() function), so it does not need to be implemented unless you are extremely ambitious. This is a good thing to know in advance!
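For example, R's URLencode() function (from the utils package, which R attaches by default) converts spaces to %20, and URLdecode() reverses the process. The search URL below is made up for illustration.

```r
# URLencode() and URLdecode() live in the utils package (attached by default)
library(utils)

searchUrl <- "https://example.com/search?q=Dan is here"   # made-up URL

encoded <- URLencode(searchUrl)
print(encoded)   # [1] "https://example.com/search?q=Dan%20is%20here"

# URLdecode() turns each %20 back into a space
decoded <- URLdecode(encoded)
print(decoded)
```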
All right, let's briefly review what we've learned. Spaces sometimes need to be removed from a string, temporarily or permanently. Spaces can throw off tasks such as sorting, string comparison, and character counting. Spaces in URLs also need to be encoded before they can be used, and most languages will do that for you. A developer must decide whether a space needs to be altered. R provides functions that make the process simple, including:
- trimws(), which removes leading and/or trailing spaces in a string
- sub(), which finds the first occurrence of searchString in sentenceString and replaces it with replacementString
- gsub(), which is just like sub(), except it replaces all occurrences
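As a recap, the sketch below applies all three functions to the same made-up string so their results can be compared side by side. The last step combines gsub() and trimws() to collapse runs of internal spaces into one, a common cleanup that goes slightly beyond the lesson's examples.

```r
messy <- "  Dan  is  here  "

trimmed   <- trimws(messy)         # ends only:        "Dan  is  here"
firstGone <- sub(" ", "", messy)   # first space only: " Dan  is  here  "
allGone   <- gsub(" ", "", messy)  # every space:      "Danishere"

# Trim the ends, then replace each run of whitespace with a single space
tidy <- gsub("[[:space:]]+", " ", trimws(messy))   # "Dan is here"

print(c(trimmed, firstGone, allGone, tidy))
```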