Why you should do data transformation away from BI tools

Richard Makara, CEO & Founder at reconfigured

Why does a new dashboard take so long to deliver, arrive error-prone, and then often sit unused? Does every ad-hoc question from stakeholders result in you having to redo all of your previous work?

The reason:

You are likely misusing your BI tool and trying to bend it to do something it wasn't meant to do. This is a typical symptom of missing well-defined core business data models. Most of the time spent building a report goes into basic data cleaning instead of analysis. I remember my days with Periscope - in theory, it was meant for last-mile analysis and charting. But we had to use it for data modeling, creating these middle layers of clean data we would materialize.

Naturally, the tool didn't bend to it: there were many delays, and sometimes the data would not even refresh. We should have done the heavy lifting elsewhere instead of eating up shared resources.

I am biased, but I 100% believe in separating analysis and data “prepping” into their own domains.

Let me explain why performing data transformations directly in your BI tool can be slow and inefficient.

  1. Simple transformations like renaming columns, pre-joining data, and flattening JSON are repetitive and resource-heavy when repeated at query time, and can be done once, more efficiently, outside the tool.
  2. Most analysis is based on some form of pre-aggregation. Aggregating upstream of the last mile improves query performance and avoids a sluggish BI tool (see the sketch after this list).
  3. Window functions and data pivoting can be resource-intensive, and performing these operations outside the tool can improve efficiency.
  4. Using pre-joined data reduces the burden on the BI tool, improves performance, and avoids incorrect joins.
  5. Doing the same tedious transformation over and over again wastes team resources and, frankly, kills any joy anyone has left for analytics.
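
As a concrete illustration, here is a minimal SQL sketch of the kind of pre-join and pre-aggregation I mean, materialized in the warehouse so the BI tool only ever queries the small, clean result. The table and column names (raw.orders, raw.customers, analytics.daily_revenue) are hypothetical placeholders, and the exact materialization syntax depends on your warehouse.

```sql
-- Materialize a pre-joined, pre-aggregated table in the warehouse
-- so the BI tool only reads a small, clean result set.
-- Table and column names are hypothetical placeholders.
create table analytics.daily_revenue as
select
    o.order_date,
    c.customer_segment,               -- cleaned and renamed upstream, not in the BI tool
    count(distinct o.order_id) as orders,
    sum(o.amount_usd)          as revenue_usd
from raw.orders as o
join raw.customers as c
    on o.customer_id = c.customer_id  -- the join is defined once, correctly
group by 1, 2;
```

The BI tool then charts straight off analytics.daily_revenue instead of re-running the join and aggregation on every dashboard refresh.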

By performing data transformations outside of your BI tool, you can speed up data processing, reduce the burden on the tool, and improve overall data quality. You also maintain control over business-critical data definitions and are one step closer to achieving data enlightenment - the semantic layer that powers all your data questions.

Semantic transformation layer

You can customize the process to meet your needs and ensure your data is clean and accurate. Here, however, it's essential to understand your own data journey: do you REALLY need heavy customization, or can predefined templates answer 80% of your questions?

The logical question, then, is where and how. Products like dbt have pioneered the idea of a dedicated transformation layer. Recently, Google/Looker announced they are doubling down on Looker Modeler, a way to build the semantic layer outside Looker itself.
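
To make the "dedicated transformation layer" idea concrete, here is a rough sketch of the same aggregation written as a dbt model; the model names (stg_orders, stg_customers, daily_revenue) and the project layout are hypothetical. dbt compiles and materializes the query in the warehouse on a schedule, and the BI tool simply selects from the resulting table.

```sql
-- models/marts/daily_revenue.sql (hypothetical dbt project layout)
-- dbt materializes this select as a table in the warehouse on each run.
{{ config(materialized='table') }}

select
    o.order_date,
    c.customer_segment,
    count(distinct o.order_id) as orders,
    sum(o.amount_usd)          as revenue_usd
from {{ ref('stg_orders') }} as o
join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
group by 1, 2
```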

In conclusion, data transformation is a critical step in data analysis. While many BI tools offer transformation capabilities, this work is best done outside the tool.

Simple transformations, pre-aggregation, window functions, data pivoting and melting, and pre-joins are all best performed outside a BI tool. Doing the transformation work elsewhere lets you maintain control over your data and makes it easier to reuse.

It also has an unverified impact on your team's motivation and stress levels ;-)

Feel like chatting instead?

We're in the early stages of development and always looking to hear from analytics professionals. Catch us via our calendar below.
