What is the difference between Data Preparation and Data Exploration?

The rise of Big Data and Modern BI, as described by Gartner, has multiplied the terms related to “data”. Today there are about twenty. Even when a BI professional knows exactly what each one means, it is often not easy to explain the subtleties of these terms to business teams or top management.

To clarify this vocabulary, which is sometimes highly technical and sometimes marketing-driven, this article starts with the difference between “Data Preparation” and “Data Exploration” (other keywords will be discussed later).

To begin with, Data Preparation and Data Exploration are two preliminary phases of analytics. Both relate to how raw data is ingested into BI software, but they do not serve the same function.

Data Preparation

This is the very first phase of a BI project or of any self-service BI use.

“Data Preparation is the process of transforming raw data into useful information for users who must make decisions,” explains Eric Delattre of BIRST France (Infor).

Specifically, “data preparation encompasses the merging of multiple data sources, the filtering of unnecessary data, consolidation, data aggregation, and the calculation of additional values based on raw data.”
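To make these steps concrete, here is a minimal sketch in plain Python. The two sources (`orders` and `customers`), their field names, and the `prepare` function are hypothetical and chosen only for illustration; a real BI tool would perform the same operations through its own loading and modeling features.

```python
from collections import defaultdict

# Hypothetical raw sources: an order export and a customer reference list.
orders = [
    {"order_id": 1, "customer_id": "C1", "quantity": 2, "unit_price": 10.0, "status": "paid"},
    {"order_id": 2, "customer_id": "C2", "quantity": 1, "unit_price": 25.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": "C1", "quantity": 5, "unit_price": 4.0, "status": "paid"},
    {"order_id": 4, "customer_id": "C3", "quantity": 3, "unit_price": 20.0, "status": "paid"},
]
customers = {
    "C1": {"name": "Acme", "region": "North"},
    "C2": {"name": "Globex", "region": "South"},
    "C3": {"name": "Initech", "region": "South"},
}


def prepare(orders, customers):
    """Merge, filter, compute and aggregate raw rows into revenue per region."""
    revenue_by_region = defaultdict(float)
    for order in orders:
        # Filter out unnecessary data: cancelled orders carry no revenue.
        if order["status"] != "paid":
            continue
        # Merge the two sources on customer_id.
        customer = customers[order["customer_id"]]
        # Calculate an additional value from the raw fields.
        amount = order["quantity"] * order["unit_price"]
        # Aggregate at the level decision makers care about.
        revenue_by_region[customer["region"]] += amount
    return dict(revenue_by_region)


prepared = prepare(orders, customers)
print(prepared)
```

The result is a small, clean table of revenue per region. Everything that happens afterwards works on this prepared output, not on the raw rows.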

For Qlik France, data preparation is the part that comes before exploration and analysis. “This includes everything related to ETL (Extract, Transform, Load) and to data quality,” says the subsidiary.

In Qlik Sense, Data Preparation is done through a graphical interface designed for self-service use. “This is one of the great strengths of Qlik Sense,” says the vendor.

Data Exploration

Data exploration follows the preparation phase.

For Qlik France, “it covers everything devoted to the analysis of the ‘prepared’ data. The goal is to answer questions and find answers.”

Eric Delattre, from BIRST, clarifies this step. “Data Exploration is the process by which business users can interactively explore the data presented to them. For example, data exploration includes drilling down to lower levels of detail, filtering to display a subset of the data, or rearranging the data to better understand it.”
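The three operations mentioned in the quote (drilling down, filtering, rearranging) can be sketched in a few lines of plain Python. The `prepared` rows and the `total_by` helper are hypothetical and used only for illustration; in a BI tool these actions are typically a click or a drag and drop.

```python
from collections import defaultdict

# Hypothetical prepared dataset: one row per region, product and month.
prepared = [
    {"region": "North", "product": "A", "month": "Jan", "revenue": 100},
    {"region": "North", "product": "B", "month": "Jan", "revenue": 50},
    {"region": "South", "product": "A", "month": "Jan", "revenue": 70},
    {"region": "South", "product": "A", "month": "Feb", "revenue": 30},
    {"region": "North", "product": "A", "month": "Feb", "revenue": 120},
]


def total_by(rows, *dimensions):
    """Sum revenue grouped by the given dimensions (keys are tuples)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["revenue"]
    return dict(totals)


# Top-level view: revenue per region.
by_region = total_by(prepared, "region")

# Drill down: the same data, one level lower (region, then product).
by_region_product = total_by(prepared, "region", "product")

# Filter: display only a subset of the data.
north_only = [row for row in prepared if row["region"] == "North"]

# Rearrange: sort regions from highest to lowest revenue.
ranked = sorted(by_region.items(), key=lambda item: item[1], reverse=True)

print(by_region)
print(by_region_product)
print(ranked)
```

None of these operations changes the prepared data itself. Each one only produces a different view of it.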

In summary, this step is, as its name suggests, an exploration of the prepared data.

Boxes and holes

To illustrate the difference between the two, Qlik uses a metaphor: “imagine data analysis as a way of playing with boxes”.

“Preparation consists of creating the box and filling it with your own data. Once the box is filled, we put on lids with holes of different sizes and shapes so that we can look at what is inside. This is the exploration part.”

In the end, self-service is the ability to create “my own lid, with holes adapted to my needs. But it is still the same data I am looking at.”
