Niko Korvenlaita, CTO & Co-Founder at reconfigured
We had the honor to interview Robert Harmon, a seasoned data architect who currently works at Firebolt. With over a decade of experience in ad tech and tech support, Robert has carved a niche for himself in the data world.
Dimensional Modeling and the Kimball Method
Dimensional modeling is a technique used in data warehousing to organize data into a structure that is easy to understand and query. The Kimball Method is a popular methodology for dimensional modeling, focused on building structures that are simple to reason about and fast to analyze.
The fact table is the core of a dimensional model and records activities such as sales.
Dimension tables branch off the fact table, forming a star pattern that maintains the relationships between entities.
A key challenge in dimensional modeling is maintaining the star structure while avoiding circular references and overly complex structures.
The customer dimension often absorbs attributes such as location and the assigned sales representative, precisely to keep the star structure flat.
The limits of a Kimball star schema depend on the specific use case; the art is in finding a balance between complexity and constraint.
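As a rough sketch of the pattern described above (table and column names are illustrative, not from the interview), a minimal star schema for sales might look like this in SQLite, with location and sales rep folded into the customer dimension to keep the star flat:

```python
import sqlite3

# In-memory database for the sketch; any relational engine works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Dimension tables: one row per customer / product, holding descriptive attributes.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        city         TEXT,   -- location folded into the customer dimension
        sales_rep    TEXT    -- kept here instead of adding another branch to the star
    )""")
conn.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category    TEXT
    )""")

# Fact table: one row per recorded activity (a sale), pointing at each dimension.
conn.execute("""
    CREATE TABLE fact_sales (
        sale_id      INTEGER PRIMARY KEY,
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity     INTEGER NOT NULL CHECK (quantity > 0),
        amount       REAL    NOT NULL
    )""")
```

Because every dimension hangs directly off the fact table, queries join at most one hop away from the facts, which is what keeps the star easy to understand and use.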
Data Integrity and Constraints
When building data systems, data will inevitably go wrong, so constraints are necessary to manage errors. Constraints are like the guardrails in a bowling alley: they let you catch errors and deal with them. In a data warehouse, a transaction should fail if a customer record does not fit the agreed format of a customer. Failing fast helps to identify and correct errors quickly.
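One way to picture those guardrails (a minimal sketch; the table and the email rule are invented for illustration): a CHECK constraint makes the insert fail loudly instead of letting a malformed customer through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL CHECK (email LIKE '%@%'),  -- minimal 'fits the format' rule
        country     TEXT NOT NULL
    )""")

# A well-formed customer goes through.
conn.execute("INSERT INTO customer VALUES (1, 'ada@example.com', 'UK')")

# A malformed one hits the guardrail: the statement fails, so the error
# surfaces immediately instead of quietly corrupting the warehouse.
try:
    conn.execute("INSERT INTO customer VALUES (2, 'not-an-email', 'UK')")
    failed = False
except sqlite3.IntegrityError:
    failed = True
```

After this runs, only the valid row is in the table and `failed` is `True`; the bad record never made it in.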
Negotiating the Definition of Data in Operational Data Store and Data Warehouse
Deciding on constraints and definitions of data is challenging and requires negotiation between stakeholders such as data engineers, transactional data architects, database administrators, and business users.
This negotiation takes place in the operational data store, where the representation of data from operational systems meets the definitions in the data warehouse.
Data modeling is a crucial step in the data management process, defining relationships between data elements
The schema in the operational data store acts as a gatekeeper, defending agreed-upon data definitions
Any discrepancies or errors in data ingested into the ODS trigger an alert to the operational teams, indicating a problem in the data.
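That gatekeeper role can be sketched in a few lines (the field names, the required-field rule, and the alert routing are all assumptions for illustration): records that violate the agreed definitions are routed back to the operational team rather than silently loaded.

```python
# Agreed-upon definition of a customer record in the ODS (illustrative).
REQUIRED_FIELDS = {"customer_id", "name", "country"}

def validate_for_ods(record: dict) -> list[str]:
    """Return a list of discrepancies; an empty list means the record conforms."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "customer_id" in record and not isinstance(record["customer_id"], int):
        problems.append("customer_id must be an integer")
    return problems

def ingest(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split a batch into loadable rows and alerts for the operational team."""
    loaded, alerts = [], []
    for rec in records:
        problems = validate_for_ods(rec)
        if problems:
            alerts.append((rec, problems))  # discrepancy -> alert, not a silent load
        else:
            loaded.append(rec)
    return loaded, alerts
```

Feeding it a conforming record and a broken one would put the first in `loaded` and the second, with its list of discrepancies, in `alerts` for the operational team to fix at the source.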
The Role of the Operational Data Store
Operational data store (ODS) serves as a central repository for operational data and data warehouse definitions, avoiding fragmentation and facilitating access for data warehouse teams.
Business users play an important role in negotiating data definitions, ensuring the data management process is driven by business value.
Effective communication between data warehouse architects and business users is necessary to analyze business processes and identify areas of improvement.
Business process analysis is a crucial tool for companies to understand and improve their operations, with the Malcolm Baldrige standard serving as a benchmark for high-performing businesses.
Dimensional Modeling: A Double-Edged Sword
Dimensional modeling is often touted as an excellent solution for database design, but it is not without its challenges. Used improperly, it can lead to update anomalies and data quality issues. It is essential to understand the limitations of dimensional modeling and to use it in conjunction with other techniques, like normalization, to ensure data quality.
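To make the update-anomaly risk concrete (a toy sketch, not from the interview): when a descriptive attribute such as the sales rep is copied onto every row instead of living in one dimension row, a partial update leaves the data contradicting itself.

```python
# Denormalized: the rep's name is repeated on every sale.
sales = [
    {"sale_id": 1, "customer": "Acme", "rep": "Jones"},
    {"sale_id": 2, "customer": "Acme", "rep": "Jones"},
]

# The rep changes, but only one row gets updated -- a classic update anomaly.
sales[0]["rep"] = "Smith"
reps_for_acme = {s["rep"] for s in sales if s["customer"] == "Acme"}
# reps_for_acme now holds two conflicting answers to "who is Acme's rep?"

# Normalized: the attribute lives in exactly one place,
# so a single update keeps every sale consistent.
customer_rep = {"Acme": "Jones"}
customer_rep["Acme"] = "Smith"
```

This is why normalization complements dimensional modeling: store each fact about an entity once, and the anomaly cannot occur.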
Declarative Referential Integrity: A Must-Have
Declarative referential integrity is an essential aspect of database design: a way of enforcing constraints on the data stored in a database. With it in place, logical mistakes and data quality issues are caught before they can corrupt the data. It is essential to make declarative referential integrity part of the database design process.
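"Declarative" means stating the rule once in the schema and letting the engine enforce it on every write, rather than re-checking it in application code. A SQLite sketch with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # must be enabled per connection in SQLite

conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE fact_sales (
        sale_id      INTEGER PRIMARY KEY,
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key)
    )""")

conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO fact_sales VALUES (10, 1)")   # valid reference: accepted

# A sale pointing at a customer that does not exist is rejected by the engine
# itself, not by application code -- that is the declarative part.
try:
    conn.execute("INSERT INTO fact_sales VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Every path into the database passes through the same declared rule, so no orphaned fact rows can appear regardless of which application wrote them.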
Proper Database Design: The Key to Data Quality
Proper database design is the key to ensuring data quality. It involves understanding the data being stored, the relationships between its elements, and the constraints that need to be in place, and it uses techniques like normalization and declarative referential integrity to keep the stored data accurate and relevant.