# Data Integration in Data Mining

Instructor: David Gloag
We want our information to give us more than simply the sum of its parts. We want information to be seamless and timely. We want to learn things from the information we collect, and we want the information to be accurate and relevant. In this lesson, we'll look at data mining and data integration and how the two are related.

## Getting More from Information

We gather so much information. It's a wonder we can keep track of it all. We personally gather digital information in the form of music or movies, businesses gather sales and accounting information, and governments gather tax and social issue information. But is gathering information enough? Shouldn't we get more from our efforts than a big pile of information? Indeed, we should. In this day and age, we must also derive something useful from the information we gather, something that will extend the boundaries of what we know. Many techniques and technologies can help. They fall into the general category of data mining.

## What is Data Mining?

Data mining is a discovery process: it organizes large amounts of information and recognizes patterns within it. Data mining is multidisciplinary, borrowing techniques and know-how from:

• Artificial Intelligence
• Computer Science
• Databases
• Machine Learning
• Statistics

Ultimately, the purpose of data mining is to derive new information and conclusions from information sets that were seemingly random.

Consider the following. Say we have a set of values: 12, 4, 0, 20, 16, and 8. A jumble of values, but maybe we can learn something from them. Let's apply some order and arrange these values from smallest to largest. The set is now 0, 4, 8, 12, 16, and 20. Examining the list, we see that it is a sequence--specifically, a set of values (i) that adhere to this formula:

• i = 4k, where k = 0, 1, 2, 3, 4, 5

Using this formula, we can go a step further and predict that the next value in the sequence will be 24 (4 × 6). In this small example, we used data mining to organize an existing information set, recognize a pattern in it, derive new information, and make a prediction.
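The same steps can be sketched in a few lines of Python. This is only an illustration of the example above, not a general data mining algorithm: the function name `describe_sequence` is made up for this sketch, and it assumes the pattern we are looking for is a constant step between sorted values.

```python
def describe_sequence(values):
    """Sort the values, look for a constant step, and predict the next value.

    Returns (ordered, step, next_value). step and next_value are None
    when the sorted values do not share a single constant difference.
    """
    # Organize: arrange the values from smallest to largest.
    ordered = sorted(values)

    # Recognize: collect the differences between neighbouring values.
    steps = {b - a for a, b in zip(ordered, ordered[1:])}
    if len(steps) != 1:
        return ordered, None, None

    # Derive and predict: one shared step means the pattern continues.
    step = steps.pop()
    return ordered, step, ordered[-1] + step


ordered, step, next_value = describe_sequence([12, 4, 0, 20, 16, 8])
print(ordered)     # [0, 4, 8, 12, 16, 20]
print(step)        # 4
print(next_value)  # 24
```

If the values had no constant step, the sketch would simply report that it found no pattern rather than guess.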

Okay, now that we've got the idea, let's look at something more real-world. A video streaming service such as Netflix tracks the movies that you watch over the course of a month. They do this to ensure that you are getting charged correctly, but that's not the only purpose they have for the information. Taking a closer look at the dates you watched the movies and the actors involved in those movies, they could deduce that your favorite actor is Robert De Niro, and that Saturday night is movie night. A carefully written email sent to you on Friday listing a De Niro film you haven't streamed could result in more sales for Netflix.
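To make the streaming example concrete, here is a minimal Python sketch that counts actors and weekdays in a viewing history. The viewing history, the `viewing_profile` function, and the record layout are all invented for illustration; nothing here reflects how any real streaming service stores or analyzes its data.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical viewing history: (date watched, title, cast).
history = [
    (date(2024, 1, 6), "Heat", ["Robert De Niro", "Al Pacino"]),
    (date(2024, 1, 13), "Taxi Driver", ["Robert De Niro", "Jodie Foster"]),
    (date(2024, 1, 17), "The Matrix", ["Keanu Reeves"]),
    (date(2024, 1, 20), "Goodfellas", ["Robert De Niro", "Ray Liotta"]),
]


def viewing_profile(history):
    """Return the most frequent actor and the most frequent viewing day."""
    actor_counts = Counter(
        actor for _, _, cast in history for actor in cast
    )
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    day_counts = Counter(
        WEEKDAYS[watched.weekday()] for watched, _, _ in history
    )
    favorite_actor = actor_counts.most_common(1)[0][0]
    movie_night = day_counts.most_common(1)[0][0]
    return favorite_actor, movie_night


favorite_actor, movie_night = viewing_profile(history)
print(favorite_actor)  # Robert De Niro
print(movie_night)     # Saturday
```

Simple counting is enough for this toy history. A real service would likely weigh many more signals, but the idea of deriving a new conclusion from records collected for another purpose is the same.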

## What is Data Integration?

Conceptually, data integration is straightforward: new information is merged with information that already exists. Any business that regularly collects information is concerned with data integration, because businesses want their information to be accurate and up-to-date. If you think about it, data integration even affects individuals. For example, we collect a new phone number from a friend, we add new music to our cell phones, or we receive personal email. In each case, we are receiving new information and merging it with existing information. Most of this process is transparent to us because it happens behind the scenes, but it is there nonetheless.
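The phone number example can be sketched as a small merge in Python. This is a simplified illustration under stated assumptions: the `integrate` function, the contact names, and the phone numbers are all made up, and the sketch assumes each contact is identified by name and that newer information always wins.

```python
def integrate(existing, incoming):
    """Merge incoming records into a copy of the existing records.

    Returns (merged, added, updated), where added and updated list
    the keys that were new or whose values changed.
    """
    merged = dict(existing)  # leave the original records untouched
    added, updated = [], []
    for name, phone in incoming.items():
        if name not in merged:
            added.append(name)
        elif merged[name] != phone:
            updated.append(name)
        merged[name] = phone  # assumption: the newest value is the correct one
    return merged, added, updated


# Hypothetical address book and newly collected numbers.
contacts = {"Ana": "555-0101", "Ben": "555-0102"}
new_numbers = {"Ben": "555-0199", "Cara": "555-0103"}

merged, added, updated = integrate(contacts, new_numbers)
print(merged)   # {'Ana': '555-0101', 'Ben': '555-0199', 'Cara': '555-0103'}
print(added)    # ['Cara']
print(updated)  # ['Ben']
```

Tracking which records were added or updated is one simple way to keep the merged information accurate and up-to-date, since it shows exactly what the new information changed.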
