Big Data Tutorial & Training
5 chapters | 53 lessons
Temitayo has over 11 years of industry experience in Information Technology and a master's degree in Computer Science.
Databases are classified based on the number of concurrent users, database size, data location, their use, and the time sensitivity of the information gathered.
Transactional or operational databases record transactions immediately and accurately, reflecting critical daily operations. An example is a bank's customer database.
A data warehouse, however, stores historical information captured from diverse sources for the purpose of aiding tactical or strategic decision making. Its use falls under the time-sensitivity classification.
Let's talk fashion. The fashion industry thrives on customer loyalty (we love our labels), customers' changing tastes (we like to stay trendy), timing, and designer inspirations. Consider Gladys Fashion House (GFH), which keeps abreast by analysing trends (collated historical data) in all areas of the industry, captured from diverse sources.
GFH is able to predict its clients' tastes and expected sales surges from the analysis of these data sets and, as a result, can stock up in anticipation of the next fashion season (critical decisions).
The first thing that comes to mind with poor-quality data is incorrect data, which is actually only part of the problem. On the flip side, good-quality data is not necessarily free from errors either! Where does that leave us? Quality data can be defined as data that consistently meets the needs of the knowledge worker and satisfies user requirements. It is data that is usable and applicable to the business requirement.
At GFH, it's ensuring the perfect outfit is picked out for the perfect occasion. A well-tailored (good quality) workman's overall may not be suitable for dinner (wrong occasion) but is perfect for a construction site (good quality and appropriate needs met).
Quality data satisfies a set of attributes or characteristics.
The first five attributes cover most of the common issues found in poor-quality data, and as long as our data satisfies these attributes, it is considered error-free. Error-free data, however, does not necessarily constitute quality data, as the workman's overall illustration showed earlier.
If GFH sets a trend that is considered hot (creative and liked by clients, i.e. ''error-free'') but releases it in the wrong season (''bad timing''), poor sales figures result. Data must be timely and useful (as depicted in attributes 6 and 7). The bottom line is that the data must be well suited to where it's needed, when it's needed.
A stylist at GFH can pick a perfect outfit for a complete stranger. How? Historical data analysis and training! She asks about the occasion, estimates your age, assesses your body type, notes what you currently have on, knows what's trending and voila! She picks the right outfit for you.
The sales representatives need demographic analysis and gender distribution to make a good sales pitch. The financial analyst, meanwhile, leaves no room for error: he needs precise records of customer purchases down to the last cent to make accurate financial forecasts. We see that each knowledge worker requires a different level of accuracy, completeness, and consistency in the data.
Poor-quality data is injected through the following processes:
Data entry errors, including misspellings, numerical transpositions, incorrect codes, misplaced data, and abbreviations, can occur as companies migrate their businesses to the web and, in the process, allow customers and suppliers to enter data directly into their systems.
Systems can check data as it is entered, but validation routines are limited: valid data may be permissible, yet not necessarily correct.
An example here is a GFH item, a button-front denim skirt with item code G2345545. During data entry, however, item code G2343545 was keyed in. The validation criteria require that the item code's first character be a letter followed by seven digits. The mistyped entry satisfies these criteria, so it passes the validation routine; but the correct code should have been G2345545, so the entry is incorrect yet goes undetected.
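A short sketch can make the gap between "permissible" and "correct" concrete. The pattern below is a hypothetical reconstruction of the GFH rule described above (one letter followed by seven digits); the function name is illustrative, not from any real GFH system.

```python
import re

# Hypothetical GFH validation rule: one uppercase letter followed by seven digits.
ITEM_CODE_PATTERN = re.compile(r"^[A-Z]\d{7}$")

def is_permissible(code: str) -> bool:
    """Check only the *format* of a code; it cannot tell right codes from wrong ones."""
    return bool(ITEM_CODE_PATTERN.match(code))

correct_code = "G2345545"   # the real code for the button-front denim skirt
mistyped_code = "G2343545"  # a keying error in the fourth digit

# Both codes pass the format check, so the error slips through undetected.
print(is_permissible(correct_code))   # True
print(is_permissible(mistyped_code))  # True
```

Format validation guarantees permissibility, not correctness; catching the transposed digit would require a lookup against the master item list or a check digit in the code itself.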
Organizations frequently change their systems, and during these migrations, mismatched syntax formats can occur.
For example, the GFH naming convention in the old system is ''Button-front denim skirt''. In the new system, though, the naming convention is ''Denim button-front skirt''.
New data fields or codes are added by administrators who fail to inform the managers of connecting systems, and front-line staff re-use existing fields to capture new information unforeseen by the application developers.
Here's another example: GFH releases a nice set of unisex t-shirts. Clothes are usually categorised as male or female, so the data-entry personnel simply categorize them as female, since unisex is not listed among the data-entry options.
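This kind of defect is born at the schema level: when the entry form only offers a fixed set of options, staff are forced to shoehorn new items into the nearest existing category. A minimal sketch (the form and field names are hypothetical):

```python
# Hypothetical data-entry form: the category field offers only two options.
ALLOWED_CATEGORIES = {"male", "female"}

def enter_item(name: str, category: str) -> dict:
    """Record an item, rejecting any category the form does not list."""
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category {category!r} not in entry options")
    return {"name": name, "category": category}

# "unisex" is rejected outright by the form...
try:
    enter_item("unisex t-shirt", "unisex")
except ValueError as err:
    print(err)

# ...so staff file the item under an existing option, and the warehouse
# silently records misleading data that every downstream analysis inherits.
print(enter_item("unisex t-shirt", "female"))
```

The validation here works exactly as designed; the defect is that the allowed values were never updated when the business changed, which is why schema changes need the same coordination as any other system change.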
The IT department always seems to be the default choice when it comes to placing responsibility for ensuring quality data. Though its involvement is vital, it does not actually have the authority to change management processes or behavioural patterns.
To achieve quality data, management must be prepared to initiate data quality programs, overseen by an authoritative designate in every section of the business.
System performance is always critical to IT administrators, and during data loading they may deactivate certain functions, such as referential integrity checks, to maintain optimal system performance. Consequently, data or tables updated at the source may have integrity issues in the warehouse.
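The effect of loading with integrity enforcement switched off can be reproduced in a few lines. The sketch below uses SQLite, which happens to leave foreign-key enforcement off by default, to stand in for a bulk loader that has disabled referential integrity; the table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign-key enforcement is OFF (SQLite's default), mimicking a loader
# that disables referential integrity for speed.
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, "
    "category_id INTEGER REFERENCES category(id), name TEXT)"
)

# With enforcement off, an item referencing a nonexistent category loads fine.
conn.execute("INSERT INTO item VALUES (1, 99, 'denim skirt')")

# An integrity sweep after the load reveals the orphaned row.
orphans = conn.execute(
    "SELECT item.id FROM item LEFT JOIN category "
    "ON item.category_id = category.id WHERE category.id IS NULL"
).fetchall()
print(orphans)  # [(1,)]
```

If constraints are disabled for performance, a post-load integrity check like the final query is the minimum safeguard before the data is released to knowledge workers.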
Data warehouses collate information from diverse data sources, and as a result, numerous application interfaces are needed to access them. These interfaces constitute complex infrastructures, and updates from the data sources are sometimes erroneously omitted, thereby affecting other systems. Subsequent corrective updates can be tedious and very expensive to make.
Data defects have varied sources, and keeping data quality at acceptable levels should become more of an organizational lifestyle than a one-off project, with deliberate effort and planned coordination throughout the organization. Discipline is the key, and the most stringent users determine the quality bar.
Data warehouses store data collated from diverse sources to be analysed and used for tactical and strategic decision making.
Quality data issues arise from data entry processes, change of source systems, administrative manipulations, data loading, and complexity of the infrastructure.
Good-quality data is a delicate balance between data accuracy and data usability. Error-free data does not automatically constitute quality data. Dealing with diverse sources compounds the problem, but the bottom line is generating data that fits and suitably meets the business requirement.