Sooner or later, every business analyst runs into low-quality data. Inaccurate, incomplete or confusing, it all causes headaches: complicated work with sources, data processing in Excel, scripts, connectors and complex schemas.

In practice, data preparation makes up the bulk of the work. It includes error detection, structure transformation and cleaning. The end user, the business, bears the impact: reports take longer to develop, and sometimes their quality suffers as well.

Companies not only miss opportunities but also lose money. Gartner research shows that “the average financial effect of poor data quality for organizations costs $9.7 million per year”.

Where does dirty data come from?

The larger the organization and, correspondingly, its data flow, the more steps are taken to standardize the data. Yet the percentage of inaccuracies is higher too. Why?

  1. Human error

According to Experian studies, people are the most common cause of dirty data. Inconsistent input methods, manual entry into spreadsheets and even simple spelling errors create a lot of problems for the analyst.
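As an illustration of how such input errors might be caught, here is a minimal Python sketch that maps manually entered values to a list of accepted spellings. The region names, the `normalize_region` function and the 0.75 similarity cutoff are assumptions made for this example, not something taken from the studies mentioned above.

```python
import difflib

# Hypothetical list of accepted spellings; in practice it would come
# from a reference table maintained by the data owners.
CANONICAL_REGIONS = ["North", "South", "East", "West"]


def normalize_region(raw, cutoff=0.75):
    """Map a manually entered value to an accepted spelling, or None."""
    # Trim the value, collapse repeated spaces and fix the letter case.
    cleaned = " ".join(raw.split()).title()
    if cleaned in CANONICAL_REGIONS:
        return cleaned
    # A fuzzy match catches simple typos such as a missing letter.
    matches = difflib.get_close_matches(
        cleaned, CANONICAL_REGIONS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

Values that come back as `None` can be routed to a person for review instead of being guessed at, so the script does not create new errors of its own.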

  2. Disparate systems and sources

Organizations often store data in several systems, each with its own structure and its own requirements for integration and aggregation. The result is duplicate or missing fields and a mountain of unmatched tables. On top of that, the same data can carry different names or values in different systems.
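One possible way to tame differently named fields is to rename them to a shared schema before comparing records. The sketch below assumes two hypothetical systems, "crm" and "billing", with made-up field names; a real mapping would have to be agreed with the owners of each system.

```python
# Hypothetical mapping from each system's field names to shared names.
FIELD_ALIASES = {
    "crm": {"cust_id": "customer_id", "full_name": "name"},
    "billing": {"CustomerID": "customer_id", "client": "name", "amt": "amount"},
}


def to_common_schema(record, system):
    """Rename a record's fields to the shared names; keep unknown fields."""
    aliases = FIELD_ALIASES[system]
    return {aliases.get(field, field): value for field, value in record.items()}


def unmatched_ids(left, right, key="customer_id"):
    """Return key values that appear in only one of the two record lists."""
    left_ids = {row[key] for row in left}
    right_ids = {row[key] for row in right}
    return sorted(left_ids ^ right_ids)
```

Once both sources use the same field names, `unmatched_ids` gives a quick list of records that exist in one system but not in the other.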

  3. Changing requirements

As the company grows, administrators and data engineers change the details of data entry more and more often: at best they create new fields, at worst they rewrite entire structures. Analysts are often unaware of the changes until they export data to the BI system. Hello, mess.

Data preparation issues and ways to solve them

Time-consuming processes in different departments

Most of the work is not data analysis but cleaning and reformatting. It is good if this happens in an ETL system, but homemade Excel spreadsheets often come into play as well. And each time new data arrives, analysts have to repeat these steps manually.
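Repeated manual steps are a natural candidate for a small script. The following Python sketch, using only the standard library, applies the same list of cleaning steps to every new batch of CSV data. The step functions and the assumption that rows are well formed (no extra columns) are illustrative and are not taken from any particular ETL tool.

```python
import csv
import io


def strip_whitespace(row):
    """Trim every value; a missing value becomes an empty string."""
    return {key: (value or "").strip() for key, value in row.items()}


def drop_empty(row):
    """Return None for rows in which every field is blank."""
    return row if any(row.values()) else None


# The order matters: values are trimmed before the emptiness check.
CLEANING_STEPS = [strip_whitespace, drop_empty]


def clean_csv(text, steps=CLEANING_STEPS):
    """Apply each cleaning step to each row of CSV text with a header line."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        for step in steps:
            row = step(row)
            if row is None:  # a step rejected the row
                break
        else:
            cleaned.append(row)
    return cleaned
```

When a new file arrives, the analyst calls `clean_csv` on its contents instead of repeating the clicks, and a new rule becomes one more function in `CLEANING_STEPS`.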

On top of the frustration, analysts and business users get the added bonus of having to fight for every “right” piece of information.

Traditionally, data preparation is handled by the IT department, which has the access needed to bring new sources into centralized repositories. Sometimes several teams are responsible for different segments of this work, so the analyst at the end of the chain can hardly imagine how many stages of “processing” the data has gone through.

Solution: Developing flexible processes and selecting the right tools.

“Decide to trust the data specialist and provide them with the necessary tools and access. This removes the wait for a turn in the chain, improves the quality of reporting and reduces the burden on IT.”

Venkatesh Shivanna, Senior Analyst and Data Architect, computer game development

Preparation requires deep knowledge of company data

Before preparing data, it is important to understand its location, structure and composition, as well as details such as field definitions. Experts call this process “data discovery”. It is a fundamental element of data preparation.

The ability to prepare data independently in BI tools has made the work much easier. But many analysts are detached from the structure of the company and from the other departments that generate the source data.

What data exists, where does it live, and how do other units define it? Confusion in definitions can hinder analysis or, even worse, lead to inaccurate results.

Solution: Creating company standards for data definition.

The goal of standardization is to cut down on variation and to pin down the fields whose definitions differ from department to department. The outcome can be a single data dictionary, which lets analysts understand how terms are used in each business segment.

Brian Davis, project engineer, called such a dictionary “priceless”.

Continuously monitor and enforce standards for data storage and entry. This work can deliver impressive results, provided the dictionary is kept up to date. Management oversight is needed at every stage: creating the glossary, deciding where it lives, how often it is updated, and so on. An outdated dictionary, or one that employees do not follow, can do damage and lead to incorrect data.
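To make the idea of a data dictionary more concrete, here is a minimal Python sketch of one with two checks: which columns of a dataset have no definition, and which entries have not been reviewed recently. The field names, owners, review dates and the one-year threshold are all hypothetical and serve only as an illustration.

```python
from datetime import date

# Hypothetical dictionary entries; names, owners and dates are illustrative.
DATA_DICTIONARY = {
    "customer_id": {
        "definition": "Unique customer key",
        "owner": "Sales",
        "reviewed": date(2024, 1, 15),
    },
    "revenue": {
        "definition": "Net revenue in USD, excluding tax",
        "owner": "Finance",
        "reviewed": date(2022, 3, 1),
    },
}


def undefined_fields(columns, dictionary=DATA_DICTIONARY):
    """Columns used in a dataset that the dictionary does not define."""
    return sorted(set(columns) - set(dictionary))


def stale_entries(today, max_age_days=365, dictionary=DATA_DICTIONARY):
    """Entries whose last review is older than max_age_days."""
    return sorted(
        name
        for name, entry in dictionary.items()
        if (today - entry["reviewed"]).days > max_age_days
    )
```

Running both checks on a schedule is one way to notice early when the dictionary starts to drift away from the data people actually use.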

The reality of data preparation across departments

ETL systems can be quite complex, which immediately limits the number of people able to use them. But even if analysts and business users lack access to data preparation tools, that does not mean they cannot perform these tasks in other applications. Today there are tools for users at every level of technical skill. How do you strike a balance between business users and IT users so that everyone works with well-structured data and no one duplicates effort? Uncoordinated data preparation teams reduce efficiency, scalability and manageability.

“The more repositories we have, the more interpretations of the data there are. This causes distrust of the result.”

Jason Harmer, operations director of a national insurance company

Solution: Joint work in the process of data preparation.

Combine the capabilities of different departments. Research by the Business Application Research Center (BARC) showed that the analysts most satisfied with the result work at companies in which “the preparation of data was a joint work between IT and business departments.”

Power to the analysts! Because IT has historically played the leading role in working with data, it is important that analysts know all the nuances, including details, transformations and additions. Schedule regular sessions between business users and administrators, share standardized workflows, and let analysts prepare data faster and more efficiently.

Author: Julia Grits

Tableau whitepaper
Gartner, Smarter with Gartner, How to Create a Business Case for Data Quality Improvement. January 9, 2017
TDWI, TDWI Upside, Five Key Elements Your Data Governance Business Glossary May Be Missing. February 16, 2016