Artem has a doctor of veterinary medicine degree.
Exploring Your Data
Have you ever seen a raw data set? Maybe it was in a comma-delimited file. On its own, there's not much you can do with that, and not much sense you can make of it.
But with something known as exploratory data analysis, you can open up your eyes to a world of many possibilities, connections, and interesting tidbits you'd never otherwise spot.
In this lesson, we define some of the many aspects of exploratory data analysis and go over a couple of examples of when it might come in handy.
What is Exploratory Data Analysis?
Exploratory data analysis, or EDA, is a (mainly) visual approach and philosophy that focuses on the initial ways by which one should explore a data set or experiment. Two main aspects of EDA are:
- Openness. A person exploring the data should be open to all possibilities prior to its exploration.
- Skepticism. One must ensure that the obvious story the data tells is not misleading.
What is the General Purpose of EDA?
There is no formal set of techniques that are used in EDA. Remember, EDA is an approach to how we analyze data, not a specific set of methods set in stone. It's a philosophy and art more so than a science.
Its purpose is to take a general view of some given data without making any assumptions about it. We are trying to get a feel for the data and what it might mean, rather than rejecting or accepting some premise about it before we begin exploring.
In other words, with EDA we let the data speak for itself instead of trying to force the data into some sort of pre-determined model.
Nevertheless, some techniques are commonly used to help us get a feel for the data. We can categorize data, quantify some of its basic aspects, or visualize it.
For instance, raw data can be plotted using histograms or other visualization techniques. Sometimes, data is placed side by side in a way that helps us spot important patterns within or between data sets.
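As a rough illustration of quantifying and visualizing raw data, here is a minimal sketch using only Python's standard library. The values, the function names `summarize` and `bin_counts`, and the bin width are all hypothetical choices made for this example, not part of any prescribed EDA method.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw measurements, invented purely for illustration.
values = [1, 2, 2, 3, 1, 2, 3, 1, 2, 55, 3, 2, 1, 60, 2]

def summarize(data):
    """Quantify a few basic aspects of the data."""
    return {
        "count": len(data),
        "min": min(data),
        "max": max(data),
        "mean": mean(data),
        "median": median(data),
    }

def bin_counts(data, bin_width=10):
    """Group values into fixed-width bins and return {bin_start: count}."""
    bins = Counter((v // bin_width) * bin_width for v in data)
    return dict(sorted(bins.items()))

bin_width = 10
summary = summarize(values)
histogram = bin_counts(values, bin_width)

print(summary)
# Print a simple text histogram: one '#' per value that falls in the bin.
for start, count in histogram.items():
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```

Even in this crude text form, the gap between the crowded first bin and the two isolated values at the high end is visible at a glance, which is the kind of pattern a table of raw numbers tends to hide.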
What is EDA Used For?
EDA is used for:
- Catching mistakes and anomalies
- Gaining new insights into data
- Detecting outliers in data
- Testing assumptions
- Identifying important factors in the data
- Understanding relationships
Perhaps most importantly, EDA is used to help figure out our next steps with respect to the data. For instance, we might have new questions we need answered or new research we need to conduct.
So when would we use exploratory data analysis, specifically in the marketing field?
Well, let's say you work for a retailer that sells 100 different kinds of shoes. There are dress shoes, hiking boots, sandals, etc. Using EDA, you are open to the fact that any number of people might buy any number of different types of shoes.
You visualize the data using exploratory data analysis and find that most customers buy 1-3 different types of shoes. Sneakers, dress shoes, and sandals seem to be the most popular ones. No surprise there, but at least you were open to different possibilities.
But after a closer look, the data helps you visualize something else. There is a small but significant group of people who buy 50 or more different types of shoes in any given year. That's something that would've been hard to spot without EDA, and had you not been open to this possibility, you might've dismissed this outright before.
Of course, you should immediately be skeptical about this. Make sure it's not just some sort of glitch in the data set.
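The shoe example can be sketched in a few lines of Python. Everything here is assumed for illustration: the record layout of `(customer_id, shoe_type)`, the customer IDs, the function names, and the invented purchase data. The only figures taken from the example are the catalog of 100 kinds of shoes and the threshold of 50 or more types. The sanity check shown is just one possible glitch test, not a complete validation.

```python
from collections import defaultdict

CATALOG_SIZE = 100          # the retailer sells 100 different kinds of shoes
HEAVY_BUYER_THRESHOLD = 50  # "50 or more different types" in a year

def distinct_types_per_customer(purchases):
    """Count how many different shoe types each customer bought."""
    types = defaultdict(set)
    for customer_id, shoe_type in purchases:
        types[customer_id].add(shoe_type)
    return {customer: len(kinds) for customer, kinds in types.items()}

def heavy_buyers(counts, threshold=HEAVY_BUYER_THRESHOLD):
    """Customers who bought at least `threshold` different shoe types."""
    return sorted(c for c, n in counts.items() if n >= threshold)

def impossible_counts(counts, catalog_size=CATALOG_SIZE):
    """Customers whose count exceeds the catalog size: a likely data glitch."""
    return sorted(c for c, n in counts.items() if n > catalog_size)

# Hypothetical purchase records, invented for this sketch.
purchases = [("c1", "sneakers"), ("c1", "sandals"), ("c2", "dress shoes")]
purchases += [("c3", f"type_{i}") for i in range(55)]
purchases += [("c4", f"type_{i}") for i in range(120)]  # more types than exist

counts = distinct_types_per_customer(purchases)
print("Heavy buyers:", heavy_buyers(counts))
print("Suspicious records:", impossible_counts(counts))
```

Here customer `c3` is an unusual but plausible buyer worth investigating, while `c4` supposedly bought more shoe types than the store sells, so that record should be treated as a probable error before drawing any conclusions from it.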
Let's assume that it's not. With EDA's purpose in mind, this outlying data should raise a few questions. Who are these people? Why do they buy so many shoes? Are these customers people or businesses?
You can further explore the data to get your answer or, if necessary, collect more data that can be explored later to get an answer. It might even open up a new customer pool you didn't think you even had!
Here's another example. Let's say that you're about to start a company offering to do people's taxes. Taxes are really confusing. Because of this, your website is designed in a way that clearly and easily explains important tax information in a readily digestible manner. It's so easy, even 6th-grade kids can understand it!
As a result, you expect that most of your customer base will be neither well educated nor well off. Therefore, you set your prices to match this segment of the market.
Upon the exploration of your website's data, however, you notice that most of your readership is well-educated and well-off. What happened here? Perhaps even the well-educated get confused by taxes or don't want to take the time to figure out the complex terminology.
It seems you might have misunderstood your market base. Of course, you must be skeptical. Maybe the well-educated and well-off are visiting your website. But will they necessarily buy your service at higher prices?
Further exploratory data analysis can help answer these and many other questions.
For now, let's summarize this lesson.
Exploratory data analysis, or EDA, is a philosophy and an art, more so than a science, that helps us approach a data set or experiment in an open and skeptical manner.
EDA allows us to find out what kind of model the data might reveal, rather than starting with a model we must fit our data to. EDA has no fixed set of techniques, but many approaches rely on visuals, like graphs, to help us understand what the data is telling us and what we should explore next.
Overall, EDA can help us:
- Catch mistakes
- Gain new insights
- Detect outliers
- Test assumptions
- Understand relationships