Students often blur the concepts of snowflakes, outriggers, and bridges. In this Design Tip, I’ll try to reduce the confusion surrounding these embellishments to the standard dimensional model. When a dimension table is snowflaked, the redundant many-to-one attributes are removed into separate dimension tables. For example, instead of collapsing hierarchical rollups such as brand and category into columns […]

The notion of time pervades every corner of the data warehouse. Most of the fundamental measurements we store in our fact tables are time series, which we carefully annotate with time stamps and foreign keys connecting to calendar date dimensions. But the effects of time are not isolated just to these activity-based time stamps. All […]

In the dimensional modeling world, we try very hard to separate data into two contrasting camps: numerical measurements that we put into fact tables, and textual descriptors that we put into dimension tables as “attributes”. If only life were that easy… Remember that numerical facts usually have an implicit time series of observations, and usually participate in numerical […]

The Kimball Group has been exposed to hundreds of successful data warehouses. Careful study of these successes has revealed a set of extract, transformation, and load (ETL) best practices. We first described these best practices in an Intelligent Enterprise column three years ago. Since then we have continued to refine the practices based on client […]

In this white paper, Ralph proposes a comprehensive architecture for capturing data quality events, as well as measuring and ultimately controlling data quality in the data warehouse. This scalable architecture can be added to existing data warehouse and data integration environments with minimal impact and relatively little upfront investment. Using this architecture, it is even […]

Many transaction processing systems consist of a transaction header “parent” with multiple line item “children.” Regardless of your industry, you can probably identify source systems in your organization with this basic structure. When it’s time to model this data for DW/BI, many designers merely reproduce these familiar operational header and line constructs in the dimensional world. In this Design […]

When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process. The power of a dimensional model comes from a careful adherence to “the grain.” A clear definition […]

Meaningless integer keys, otherwise known as surrogate keys, are commonly used as primary keys for dimension tables in data warehouse designs. Our students frequently ask us – what about fact tables? Should a unique surrogate key be assigned for every row in a fact table? Although for the logical design of a fact table, the answer is no, […]

We are firm believers in the principle that business requirements drive the data model. Occasionally, we’ll work with an organization that needs to analyze Type 2 changes in a dimension. They need to answer questions like “How many customers moved last year?”, or “How many new customers did we get by month?” which can be difficult with the […]

With the current industry buzz focused on master data management (MDM), it’s time to revisit one of the most critical elements of the Kimball method. Back in 1999, Ralph Kimball wrote an Intelligent Enterprise column entitled The Matrix. The 1999 movie of the same name spawned two sequels, but we haven’t devoted a column to […]