Over the years, we’ve described common dimensional modeling mistakes, such as in our October ’03 “Fistful of Flaws” article in Intelligent Enterprise magazine. And we’ve recommended dimensional modeling best practices countless times; our May ’09 “Kimball’s Ten Rules of Dimensional Modeling” article has been widely read. While we’ve identified frequently observed errors and suggested patterns, we haven’t […]
Kimball Design Tips
A student in a recent Data Warehouse Lifecycle in Depth class asked me for an overview of the Kimball Lifecycle approach to share with their manager. Confident that we’d published an executive summary, I was happy to oblige. Much to my surprise, our only published Lifecycle overview was a chapter in a Toolkit book, so this Design Tip […]
A junk dimension combines several low-cardinality flags and attributes into a single dimension table rather than modeling them as separate dimensions. There are good reasons to create this combined dimension, including reducing the size of the fact table and making the dimensional model easier to work with. Margy described junk dimensions in detail in Kimball Design Tip #48: […]
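As a rough illustration of the junk dimension idea in the excerpt above, the following Python sketch combines a few low-cardinality flags into one dimension, so that a fact row carries a single key instead of one column per flag. The flag names, their values, and the `junk_key` column are hypothetical, chosen only for this example. The sketch also assumes the dimension is built as the full cross product of flag values; building rows only for combinations actually observed in the source data is another reasonable option.

```python
from itertools import product

# Hypothetical low-cardinality flags; names and values are illustrative only.
FLAG_DOMAINS = {
    "payment_type": ["cash", "credit"],
    "is_gift": [False, True],
    "order_channel": ["web", "store", "phone"],
}


def build_junk_dimension(domains):
    """Return one row per combination of flag values, each with a surrogate key."""
    names = list(domains)
    rows = []
    combos = product(*(domains[name] for name in names))
    for key, combo in enumerate(combos, start=1):
        row = {"junk_key": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


def build_lookup(rows, names):
    """Map each tuple of flag values to its junk dimension key."""
    return {tuple(row[name] for name in names): row["junk_key"] for row in rows}


junk_dim = build_junk_dimension(FLAG_DOMAINS)
lookup = build_lookup(junk_dim, list(FLAG_DOMAINS))

# The fact row stores one small key in place of three separate flag columns.
fact_row = {
    "order_id": 1001,
    "amount": 25.0,
    "junk_key": lookup[("credit", True, "web")],
}
```

With these three flags the dimension has only 2 × 2 × 3 = 12 rows, which is why combining them is cheap compared with carrying three extra foreign keys on every fact row.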
Successful data warehouse and business intelligence solutions provide value by helping the business identify opportunities or address challenges. Obviously, it’s risky business for the DW/BI team to attempt delivering on this promise without understanding the business and its requirements. This Design Tip covers basic guidelines for effectively determining the business’s wants and needs. First, start by properly preparing […]
Most ETL tools provide some functionality for handling slowly changing dimensions. Every so often, when the tool isn’t performing as needed, the ETL developer will use the database to identify new and changed rows, and apply the appropriate inserts and updates. I’ve shown examples of this code in the Data Warehouse Lifecycle in Depth class using standard INSERT […]
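The general pattern the excerpt describes, identifying new and changed rows and then applying inserts and updates, can be sketched in plain Python. This is not the SQL shown in class; it is a minimal illustration that assumes Type 2 handling (expire the current row, insert a new version), and the column names (`customer_key`, `customer_id`, `city`, `row_effective`, `row_expired`, `is_current`) are hypothetical.

```python
from datetime import date

# Hypothetical tracked attributes; a real dimension would have many more.
TRACKED = ("city",)


def apply_type2_changes(dimension, staged, load_date):
    """Compare staged source rows with current dimension rows by natural key.

    New natural keys get an inserted row. Changed rows have their current
    version expired and a new version inserted. Unchanged rows are skipped.
    Returns (new_count, changed_count).
    """
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}
    next_key = max((row["customer_key"] for row in dimension), default=0) + 1
    new_count = changed_count = 0

    for src in staged:
        existing = current.get(src["customer_id"])
        if existing is not None and all(existing[c] == src[c] for c in TRACKED):
            continue  # unchanged: nothing to insert or update
        if existing is None:
            new_count += 1
        else:
            # Update: close out the version that was current until now.
            existing["row_expired"] = load_date
            existing["is_current"] = False
            changed_count += 1
        # Insert: add the new current version with a fresh surrogate key.
        new_row = {
            "customer_key": next_key,
            "customer_id": src["customer_id"],
            **{c: src[c] for c in TRACKED},
            "row_effective": load_date,
            "row_expired": None,
            "is_current": True,
        }
        dimension.append(new_row)
        current[src["customer_id"]] = new_row
        next_key += 1
    return new_count, changed_count


dimension = [
    {"customer_key": 1, "customer_id": "A", "city": "Austin",
     "row_effective": date(2020, 1, 1), "row_expired": None, "is_current": True},
    {"customer_key": 2, "customer_id": "C", "city": "Chicago",
     "row_effective": date(2020, 1, 1), "row_expired": None, "is_current": True},
]
staged = [
    {"customer_id": "A", "city": "Boston"},   # changed
    {"customer_id": "B", "city": "Denver"},   # new
    {"customer_id": "C", "city": "Chicago"},  # unchanged
]
counts = apply_type2_changes(dimension, staged, date(2024, 1, 1))
```

In a database the same comparison would typically be a join between the staging table and the current dimension rows on the natural key; the dictionary lookup here stands in for that join.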
Students often blur the concepts of snowflakes, outriggers, and bridges. In this Design Tip, I’ll try to reduce the confusion surrounding these embellishments to the standard dimensional model. When a dimension table is snowflaked, the redundant many-to-one attributes are removed into separate dimension tables. For example, instead of collapsing hierarchical rollups such as brand and category into columns […]
In the dimensional modeling world, we try very hard to separate data into two contrasting camps: numerical measurements that we put into fact tables, and textual descriptors that we put into dimension tables as “attributes”. If only life were that easy… Remember that numerical facts usually have an implicit time series of observations, and usually participate in numerical […]
Many transaction processing systems consist of a transaction header “parent” with multiple line item “children.” Regardless of your industry, you can probably identify source systems in your organization with this basic structure. When it’s time to model this data for DW/BI, many designers merely reproduce these familiar operational header and line constructs in the dimensional world. In this Design […]
Meaningless integer keys, otherwise known as surrogate keys, are commonly used as primary keys for dimension tables in data warehouse designs. Our students frequently ask us – what about fact tables? Should a unique surrogate key be assigned for every row in a fact table? Although for the logical design of a fact table, the answer is no, […]
We are firm believers in the principle that business requirements drive the data model. Occasionally, we’ll work with an organization that needs to analyze Type 2 changes in a dimension. They need to answer questions like “How many customers moved last year?” or “How many new customers did we get by month?” which can be difficult with the […]