
Computer Science 114: Programming in R | 8 chapters | 62 lessons

Instructor:
*Giorgos-Nektarios Panayotidis*

George-Nektarios has worked as a tutor and student consultant for five years and has a 4-year university degree in Applied Informatics.

In this lesson, we'll explore the purpose of data transformation in R programming. You'll see how data transformation is used and how it is carried out with the related R functions and operators.

Imagine that you're a supplier for a company that produces furniture. You need to load the furniture into a truck for delivery to other cities. You've carefully calculated the space in the truck and need to standardize the way each piece of furniture is placed so that everything fits. In other words, you need to eliminate skewness, or asymmetry, in the way the products are arranged.

Preparing data for a predefined statistical model in R isn't all that different. The variables need to behave in a consistent, well-behaved way; for example, strong skewness often has to be reduced. In R programming, this can be achieved with data transformation.

Let's assume that we have a statistical model that we want to fit. To get meaningful results, we need to test whether the coefficients of the chosen explanatory (independent) variables are statistically significant. To run these tests, we usually assume that the residuals follow a normal distribution and that they all have the same variance, a property formally called homoscedasticity.
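The residual assumptions above can be checked directly in R. The following is a minimal sketch on simulated data (the variables `x1` and `y` and the simulation settings are assumptions made for illustration, not part of the lesson): it fits a linear model with `lm()` and applies base R's `shapiro.test()` to the residuals as a formal normality check. Constant variance is usually judged visually with a residuals-versus-fitted plot.

```r
# Simulated example data (hypothetical; not from the lesson)
set.seed(42)
n  <- 200
x1 <- runif(n, 0, 10)
y  <- 2 + 0.5 * x1 + rnorm(n, sd = 1)   # normal errors with constant variance

# Fit an ordinary linear model
fit <- lm(y ~ x1)

# Shapiro-Wilk test on the residuals: a small p-value suggests non-normality
res_test <- shapiro.test(residuals(fit))
print(res_test$p.value)

# Residuals vs fitted values: a funnel shape would hint at heteroscedasticity
# plot(fit, which = 1)
```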

The point is that several assumptions must be fulfilled for this model. Perhaps the simplest of them is "additivity": the effects of the explanatory variables simply add up, as they do in a linear model. (Additive models such as the Generalized Additive Model (GAM) extend this idea by allowing each term to be a smooth function of its variable.) Simply put, additivity means a relationship such as this: $Y = B_0 + B_1 X_1 + B_2 X_2 + e$.
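As a sketch of additivity in practice, assuming simulated data with known coefficients ($B_0 = 1$, $B_1 = 2$, $B_2 = -3$, chosen only for illustration), the R formula `Y ~ X1 + X2` passed to `lm()` encodes exactly this additive structure, so the estimated coefficients should land close to the true values.

```r
# Hypothetical simulated data with known coefficients B0 = 1, B1 = 2, B2 = -3
set.seed(1)
n  <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)
e  <- rnorm(n)                 # the error term e
Y  <- 1 + 2 * X1 - 3 * X2 + e

# The formula Y ~ X1 + X2 encodes the additive model B0 + B1*X1 + B2*X2 + e
additive_fit <- lm(Y ~ X1 + X2)
est <- coef(additive_fit)
print(est)   # estimates should be close to 1, 2 and -3
```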

The typical purpose of data transformation in R is to make the data satisfy a statistical model's assumptions. To fulfill those assumptions, we need the variables to behave in a consistent way, much like the furniture in the earlier shipping example. This is managed through the so-called **data transformation** process, which is really nothing more than applying a specific **mathematical operation**, such as a logarithm or a power, to every value of a variable. Keep in mind that different forms of data transformation exist, depending on which data irregularity is present and, consequently, what particular change needs to be made to the data.

The R programming language comes with quite a few functions that can be used for data transformation. These are ordinary mathematical functions, but they also perform some of the most frequently required transformations of data. Let's explore some of the most fundamental R functions and operators for data transformation.

The first type we'll explore are the transformations that raise the data to a specific power or fractional power (a root). These transformations usually tackle skewness in the data, which can in turn undermine the basic additivity assumption mentioned above. They include the following:

**sqrt()**: the square root. If the data variable we want to transform is named var_1, we could apply this function as sqrt(var_1) and then assign the result to a new variable name.
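A minimal example, assuming `var_1` holds a few illustrative perfect squares (the values are made up for this sketch):

```r
# Hypothetical example values for the lesson's placeholder variable var_1
var_1 <- c(1, 4, 9, 16, 25)

# Square-root transformation, assigned to a new variable name
var_1_sqrt <- sqrt(var_1)
print(var_1_sqrt)   # 1 2 3 4 5
```

Note that `sqrt()` of a negative value returns `NaN` with a warning, so this transformation suits non-negative data.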

**^ operator**: to derive another power or root, we simply use the ^ operator. For example, to get the cube root, we would write var_1^(1/3) and then assign the result to the transformed variable's name using R's assignment operator, <-.
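The sketch below applies the cube root with `^` to a few illustrative values and shows one caveat: in R, a negative base raised to a fractional exponent gives `NaN`. The helper `signed_cbrt()` is hypothetical, one common workaround rather than a built-in function.

```r
# Hypothetical example values
var_1 <- c(1, 8, 27, 64)

# Cube root via the ^ operator, assigned with <-
var_1_cbrt <- var_1^(1/3)
print(var_1_cbrt)   # approximately 1 2 3 4

# Caveat: a negative base with a fractional exponent gives NaN in R
neg_result <- (-8)^(1/3)
print(neg_result)   # NaN

# Hypothetical helper: real-valued cube root that keeps the sign
signed_cbrt <- function(x) sign(x) * abs(x)^(1/3)
print(signed_cbrt(c(-8, 8)))   # approximately -2 2
```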

Please note that sqrt() and the ^ operator are part of base R, so no additional package has to be loaded to perform these transformations. The rcompanion package offers extra transformation helpers; if you want to use it, install it once with install.packages("rcompanion") and then load it by typing and entering library(rcompanion).
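To see why power and root transformations help with skewness, here is a sketch on simulated right-skewed data (exponential draws from `rexp()`, an assumption chosen for illustration). Base R has no built-in skewness function, so `sample_skewness()` below is a hypothetical helper implementing the usual moment-based formula. Stronger roots pull in the long right tail more.

```r
# Hypothetical helper: moment-based sample skewness (not a base R function)
sample_skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))   # population-style standard deviation
  mean((x - m)^3) / s^3
}

# Right-skewed example data (simulated, for illustration only)
set.seed(123)
var_1 <- rexp(1000, rate = 1)

skew_raw  <- sample_skewness(var_1)
skew_sqrt <- sample_skewness(sqrt(var_1))    # square root
skew_cube <- sample_skewness(var_1^(1/3))    # cube root

print(c(raw = skew_raw, sqrt = skew_sqrt, cube_root = skew_cube))
```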

The logarithmic transformation is also very common. It is often applied when a model's residuals violate the normality assumption or the assumption that all residuals have the same variance (homoscedasticity). The two most basic functions of this kind are log() and log10().

To use log(), you would simply type something like log(var_1) in R and assign the result to a new variable. The log() function returns the natural logarithm (base e) of its input.
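A minimal sketch with illustrative values. Because `log()` needs positive input, it also shows what happens at zero; `log1p()`, which computes `log(1 + x)`, is one common option for data that contain zeros.

```r
# Hypothetical example values for var_1
var_1 <- c(1, exp(1), exp(2), 10)

# Natural-log transformation, assigned to a new variable name
var_1_log <- log(var_1)
print(var_1_log)   # approximately 0 1 2 2.302585

# Caveat: log() needs positive input
print(log(0))      # -Inf
# log(-1) returns NaN with a warning

# log1p(x) computes log(1 + x), which is defined at zero
print(log1p(0))    # 0
```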

Similarly, in order to get the base-10 logarithmic transformation, you'd type: log10(var_1).
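For illustration, assuming `var_1` holds powers of ten, the base-10 log maps them to their exponents. The same result can be obtained from `log()` with its `base` argument, which shows that the two functions differ only by a constant factor.

```r
# Hypothetical example values
var_1 <- c(1, 10, 100, 1000)

# Base-10 log transformation
var_1_log10 <- log10(var_1)
print(var_1_log10)   # 0 1 2 3

# Equivalent formulations: log() with base = 10, or the natural log rescaled
print(all.equal(var_1_log10, log(var_1, base = 10)))   # TRUE
print(all.equal(var_1_log10, log(var_1) / log(10)))    # TRUE
```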

In either case, the resulting variable is called log-transformed.
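The variance-stabilizing effect of a log transformation can be illustrated with a sketch, assuming simulated data with multiplicative errors (the textbook case where the spread of the response grows with its level). The helper `spread_ratio()` is hypothetical: it compares the residual standard deviation in the upper and lower halves of the fitted values, so a value near 1 suggests roughly constant variance.

```r
# Simulated data with multiplicative errors (hypothetical, for illustration)
set.seed(7)
n <- 500
x <- runif(n, 0, 10)
y <- exp(1 + 0.3 * x + rnorm(n, sd = 0.4))

# Hypothetical helper: residual SD in the upper half of fitted values
# divided by the residual SD in the lower half
spread_ratio <- function(fit) {
  f <- fitted(fit)
  r <- residuals(fit)
  upper <- f > median(f)
  sd(r[upper]) / sd(r[!upper])
}

raw_fit <- lm(y ~ x)        # model on the original scale
log_fit <- lm(log(y) ~ x)   # model on the log-transformed response

print(c(raw = spread_ratio(raw_fit), log = spread_ratio(log_fit)))
```

On the original scale the ratio is well above 1, while on the log scale it should be close to 1.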

**Data transformation** is used when the input to a model needs to undergo a specific **mathematical operation** in order to fit the model's assumptions. This may be needed when the explanatory variables or the residuals violate the usual assumptions of a general linear model. There are two basic forms of data transformation functions/operations in R. The first is **Power/Root Transformation Functions/Operations**, which raise the data to a power or fractional power. The second is **Log Transformation Functions**, which perform logarithmic transformations on the data.
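Putting both forms together, a common pattern is to store each transformed version as a new column and leave the original variable untouched. This is a sketch assuming a small, made-up data frame `dat` with a right-skewed column `var_1`; the column names are illustrative.

```r
# Hypothetical data frame with a right-skewed column named var_1
dat <- data.frame(id = 1:5, var_1 = c(2, 5, 20, 150, 1200))

# Power/root transformation stored as a new column
dat$var_1_sqrt <- sqrt(dat$var_1)

# Log transformation stored as a new column; var_1 itself is left unchanged
dat$var_1_log <- log(dat$var_1)

print(dat)
```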
