How to build a Semantic Model?

Tatu Virtanen, CSM at reconfigured

The key to a successful collaboration is to order data based on business logic, not based on its source.

Investments in a centralized data warehouse can lead to low outcomes if people can't interpret the data. It is hard for business teams to work with data if the starting point is to read SQL queries. Unclarity leads different job functions to combine financial data and product usability metrics differently, creating a lack of trust in data when comparing numbers.

The puzzle's missing piece is creating metrics and business concept definitions directly in the centralized data warehouse. When ordered by business logic, the data tables are more readable; everyone finds the content as a logical business concept is presented per table, with cleaned and transformed information describing it in each column. The idea we are talking about is a Semantic Layer of your data. Let us introduce this concept next!

A semantic layer is a business representation of corporate data that helps end users access data autonomously using common business terms. A semantic layer maps complex data into familiar business terms such as product, customer, or revenue to offer a unified, consolidated view of data across the organization.
-Wikipedia

Building Blocks of a Semantic Layer

The rest of the blog post introduces the concepts behind the semantic model. As Wikipedia put it, the key is organizing data into familiar business terms so users can access data autonomously.

Entity

Let's look at your customers first. You have a contract with a company with multiple users for your products and services. Your data already has three distinctive business objects that can be individual data tables in your warehouse; Customer Company, Contract, and User. These data tables are Entities that include multiple columns of cleaned and transformed data called Attributes.

Customer Company Entity

The line between an Entity and an Attribute is not strict because sometimes business needs will define what an Entity is: when there are three types of customers; Enterprise-plan, Individual-plan, and Free-plan. You can have one Entity Customer with an Attribute Customer Type or three different Entities because you often follow customer types separately.

Attribute

Attributes are individual values, often a column in one of your source tables or a combination of three different columns from three sources; sign-up, sign_up, and SignUp all mean the same, so only one Attribute is needed to present them in the Semantic Model. Creating an Attribute includes cleaning and transforming the raw data.

Semantic transformation layer

There are different kinds of Attributes:

  • Identifiers; email, user_id, and account_id. One Attribute must uniquely and globally identify that Entity. Technically we talk about primary and foreign keys.
  • Dimensions; categories such as Country and Company size
  • Values; such as name or product price
  • Combination of other Attributes; Full Name Attribute can consist of two Attributes, 'First Name' and 'Last Name'

Source tables

Looking at the source tables in the warehouse is usually the point of getting lost. There are too many data sets, and you don't know if the numbers are trustworthy. The data is there but not accessible without representing it in business terms. A source can be your CRM, and a source table, for example, Hubspot Contacts synced with Fivetran (ETL tool that moves data).These two examples are from two different data sources because you measure users' activities in the product and communicate with them via a messaging app. When we have Entities and Attributes in place, we can connect the events to the correct objects and analyze them later.

This data for the User comes from two different data sources because you measure users' activities in the product and communicate with them via a messaging app. When we have Entities and Attributes in place, we can connect the events to the correct objects and analyze them later.

Event data

It is easy to measure more events than necessary and clutter the data warehouse. Product event examples:

  • User A "logged in"
  • User B "sent a support message"
  • User A "opened a marketing message"

When building a Semantic Model, we are interested in events worth reporting: what actions does a user need to do before you can qualify them for the Enterprise subscription? Grouping multiple raw events into a new, more meaningful event is also data modeling.Representation of these user events also requires data from their company. Here the power of well-defined Entities comes into play: you can quickly create a table of product-qualified leads by connecting company data with product data.

Feel like chatting instead?

We're in early-stages of development and always looking to hear from analytics professionals. Catch us via our calendar below.

Latest and greatest posts

Search Pivot