A collection of databases that work together is called a data warehouse, and it makes it possible to integrate data from multiple databases. Data mining analyzes that data to help individuals and organizations make better decisions.
A database consists of one or more files that need to be stored on a computer. In large organizations, databases are typically not stored on the individual computers of employees but in a central system. This central system typically consists of one or more computer servers. A server is a computer system that provides a service over a network. The server is often located in a room with controlled access, so only authorized personnel can get physical access to the server.
In a typical setting, the database files reside on the server, but they can be accessed from many different computers in the organization. As the number and complexity of databases grow, we start referring to them together as a data warehouse.
A data warehouse is a collection of databases that work together. A data warehouse makes it possible to integrate data from multiple databases, which can give new insights into the data. The ultimate goal of a database is not just to store data, but to help businesses make decisions based on that data. A data warehouse supports this goal by providing an architecture and tools to systematically organize and understand data from multiple databases.
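To make the idea of integration concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes two small, made-up databases held in memory: one with customer records and one (attached under the hypothetical name `sales`) with orders. SQLite's `ATTACH DATABASE` statement lets a single query join across them. A real data warehouse involves much more, such as regularly loading and cleaning data from its source systems, so treat this only as an illustration of combining data from more than one database.

```python
import sqlite3

# Hypothetical setup: two separate in-memory databases stand in for
# databases kept by different departments.
conn = sqlite3.connect(":memory:")                    # main database: customers
conn.execute("ATTACH DATABASE ':memory:' AS sales")   # second, separate database

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ana", "Minneapolis"), (2, "Ben", "Miami"), (3, "Cy", "Minneapolis")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                 [(1, 40.0), (1, 60.0), (2, 25.0), (3, 75.0)])

# Join across the two databases: revenue per city, a question
# neither database can answer on its own.
rows = conn.execute("""
    SELECT c.city, SUM(o.amount) AS revenue
    FROM main.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

for city, revenue in rows:
    print(city, revenue)
```

The customer database knows where people live, and the sales database knows what they spent; only the combination shows revenue by city.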
As databases get larger, it becomes increasingly difficult to keep the entire database in a single physical location. Not only does storage capacity become an issue, but there are also security and performance considerations. Consider a company with several offices around the world.
It is possible to create one large, single database at the main office and have all other offices connect to this database. However, every time employees need to work with the database, their computers have to connect over thousands of miles, through numerous network nodes. As long as you are moving relatively small amounts of data around, this does not present a major challenge.
But, what if the database is huge? It is not very efficient to move large amounts of data back and forth over the network. It may be more efficient to have a distributed database. This means that the database consists of multiple, interrelated databases stored at different computer network sites.
To a typical user, the distributed database appears as a centralized database. Behind the scenes, however, parts of that database are located in different places. The typical characteristics of a distributed database management system, or DDBMS, are:
- Multiple computer network sites are connected by a communication system
- Data at any site are available to users at other sites
- Data at each site are under the control of a DBMS
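The idea that a distributed database looks centralized to its users can be sketched with a toy Python class. Everything here is hypothetical: each `Site` object stands in for the database at one network site, and the `DistributedDB` class plays the role of the distributed DBMS, keeping a catalog of which site stores each record. A real system would send requests over a network and would also have to handle failures, replication and concurrent updates, none of which this sketch models.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One network site holding a fragment of the data (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)  # customer_id -> record


class DistributedDB:
    """Toy stand-in for a distributed DBMS: callers use get/put/all_rows as if
    there were one database, while each record is stored at a single site."""

    def __init__(self, sites):
        self.sites = {site.name: site for site in sites}
        self.catalog = {}  # customer_id -> name of the site that stores it

    def put(self, customer_id, region, record):
        # Store the record at the site for its region and remember where it went.
        self.sites[region].rows[customer_id] = record
        self.catalog[customer_id] = region

    def get(self, customer_id):
        # Look up the owning site in the catalog; the caller never names a site.
        site_name = self.catalog.get(customer_id)
        if site_name is None:
            return None
        return self.sites[site_name].rows[customer_id]

    def all_rows(self):
        # A query over "the whole database" gathers the fragments from every site.
        return [row for site in self.sites.values() for row in site.rows.values()]


db = DistributedDB([Site("americas"), Site("europe"), Site("asia")])
db.put(1, "americas", {"name": "Ana", "city": "Minneapolis"})
db.put(2, "europe", {"name": "Ben", "city": "Berlin"})
print(db.get(2))            # the user just asks for customer 2
print(len(db.all_rows()))   # counts records across all sites
```

The user of `db` never says which site to contact; the catalog hides the physical layout, which is the sense in which the distributed database "appears as a centralized database."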
You have probably used a distributed database without realizing it. For example, you may be using an e-mail account from one of the major service providers. Where exactly do your e-mails reside? Most likely, the company hosting the e-mail service uses several different locations without you knowing it.
The major advantage of distributed databases is that data access and processing can be much faster, because data can be stored close to the users who work with it most. The major disadvantage is that the database is much more complex to manage. Setting up a distributed database is typically the task of a database administrator with very specialized database skills.
Once all the data is stored and organized in databases, what's next? Many day-to-day operations are supported by databases. Queries written in SQL (Structured Query Language), the standard language for working with relational databases, are used to answer basic questions about data. But, as the collection of data grows in a database, the amount of data can easily become overwhelming. How does an organization get the most out of its data without getting lost in the details? That's where data mining comes in.
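For comparison, here is the kind of basic question a SQL query answers directly: how many units of each product were sold? The table and its data are made up, and the query is run through Python's built-in sqlite3 module. The query returns exactly what was asked, but it will not point out patterns nobody thought to ask about; that is the gap data mining tries to fill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order data: one row per product line in an order.
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "tent", 1), (2, "flashlight", 3), (3, "tent", 2), (4, "batteries", 5),
])

# A basic question answered by a query: total units sold per product.
units = conn.execute(
    "SELECT product, SUM(quantity) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(units)
```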
Data mining is the process of analyzing data and summarizing it to produce useful information. Data mining uses sophisticated data analysis tools to discover patterns and relationships in large datasets. These tools go well beyond basic summaries or queries and rely on more complex algorithms. When data mining is used in business applications, it is also referred to as business analytics or business intelligence.
Consider an online retailer that sells a wide variety of products. In a typical day, it may sell thousands of different products to tens of thousands of different customers. How does the company leverage all this data to improve its business? One strategy is to discover which products are often bought together.
This would make it possible to create product bundles that are attractive to customers. Another method is to develop profiles for customers. Based on a customer's past purchases, a company could ask which other products that same customer might also be interested in. This makes it possible to make suggestions to the customer and increase sales.
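Finding products that are often bought together can be sketched in a few lines of Python. This is a simplified, pairs-only version of frequent-itemset mining (algorithms such as Apriori extend the idea to larger groups of products); the baskets, the product names and the threshold of three baskets are all made-up assumptions. The hypothetical `suggest` function then recommends items that frequently appear together with something the customer already bought.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each basket is the set of products in one order.
baskets = [
    {"tent", "sleeping bag", "flashlight"},
    {"tent", "sleeping bag"},
    {"flashlight", "batteries"},
    {"tent", "sleeping bag", "batteries"},
    {"flashlight", "batteries", "tent"},
    {"flashlight", "batteries"},
]

# Count how many baskets contain each pair of products (sorted so that
# ("a", "b") and ("b", "a") are counted as the same pair).
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(basket), 2)
)

# Keep only pairs that appear together in at least this many baskets.
min_count = 3
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}


def suggest(purchased, frequent_pairs):
    """Suggest products that frequently co-occur with something already purchased."""
    suggestions = set()
    for a, b in frequent_pairs:
        if a in purchased and b not in purchased:
            suggestions.add(b)
        if b in purchased and a not in purchased:
            suggestions.add(a)
    return suggestions


print(frequent_pairs)
print(suggest({"tent"}, frequent_pairs))
```

With these baskets, a customer who buys a tent would be offered a sleeping bag, and a tent plus sleeping bag could be offered as a bundle.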
Another scenario is fraud detection. Have you ever had your credit card company contact you regarding a suspicious transaction? How does this work? Let's say you're a construction worker in Minneapolis. Normally, you use your credit card at the grocery store, the mall and some local restaurants, all within the Minneapolis area.
Suddenly, your credit card is used to pay for a high-end hotel in Miami Beach, several nightclubs and a jewelry store. It could very well be that you went down to Miami for a romantic weekend with your girlfriend because you are going to propose to her. But, it is also quite possible that your credit card was stolen and you have not noticed it yet.
So, the credit card company has sophisticated algorithms running in real time to identify patterns that are out of the ordinary based on your demographics and past spending habits. A suspicious transaction triggers an alert, and you are contacted by their fraud detection department. Pretty clever and all thanks to data mining.
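A very simplified version of this idea can be sketched in Python. The cardholder's history, the cities and the three-standard-deviation threshold are all hypothetical, and real fraud detection systems rely on models trained on many more signals. This sketch checks only two: a purchase in a city where the card has not been used before, and an amount far above the cardholder's usual spending.

```python
from statistics import mean, stdev

# Hypothetical past transactions for one cardholder: (city, amount in USD).
history = [
    ("Minneapolis", 82.50), ("Minneapolis", 45.10), ("Minneapolis", 23.75),
    ("Minneapolis", 61.00), ("Minneapolis", 38.40), ("Minneapolis", 55.20),
]

usual_cities = {city for city, _ in history}
amounts = [amount for _, amount in history]
avg, spread = mean(amounts), stdev(amounts)


def is_suspicious(city, amount, z_threshold=3.0):
    """Flag a transaction if it happens outside the cardholder's usual cities,
    or if its amount is more than z_threshold standard deviations above
    the cardholder's average purchase."""
    unusual_place = city not in usual_cities
    unusual_amount = (amount - avg) / spread > z_threshold
    return unusual_place or unusual_amount


print(is_suspicious("Minneapolis", 40.00))   # an ordinary local purchase
print(is_suspicious("Miami Beach", 40.00))   # a new city triggers an alert
```

Flagging on either signal errs on the side of caution; a real system would weigh many signals together to reduce false alarms, such as the romantic weekend in Miami.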
Data mining algorithms are often designed to get better over time as more data is collected and the outcomes of the analysis are checked for accuracy. You probably recognize these scenarios. Data mining has become integrated into many businesses, especially those with a strong online presence.
In summary, databases are often stored in a central computer system made up of one or more computer servers. A data warehouse is a collection of databases that work together. This makes it possible to examine patterns and trends by combining multiple databases.
Distributed databases are used to store a database at multiple computer sites to improve data access and processing. Data mining is the process of analyzing data and summarizing it to produce useful information. Data mining uses sophisticated data analysis tools to discover patterns and relationships in large datasets.
Once you've completed this lesson, you'll be able to:
- Describe what data warehouses do and their importance
- List some characteristics of a distributed database management system
- Summarize how data mining works and provide an example of its usefulness