When developing fact tables, aggregated data is NOT the place to start. To avoid “mixed granularity” woes including bad and overlapping data, stick to rich, expressive, atomic-level data that’s closely connected to the original source and collection process. The power of a dimensional model comes from a careful adherence to “the grain.” A clear definition […]

Virtually every fact table has one or more time related dimension foreign keys. Measurements are defined at specific points of time and most measurements are repeated over time. The most common and useful time dimension is the calendar date dimension with the granularity of a single day. This dimension has surprisingly many attributes. Only a few of […]

The three fundamental themes that make up the soul of every data warehouse are drilling down, drilling across, and handling time. In Part One of “The Soul of the Data Warehouse,” I showed that drilling down was nothing more than adding a row header, any row header, to an existing query. Although we often grouse about […]

Although data warehouses come in many shapes and sizes and deal with many different subject areas, every data warehouse must embody a few fundamental themes. The three most important are drilling down, drilling across, and handling time. Modern data warehouses so deeply embed these three themes that I think an “if-and-only-if” relationship has developed between […]

In debugging literally thousands of dimensional designs from my students over the years, I have found that the most frequent design error by far is not declaring the grain of the fact table at the beginning of the design process. If the grain isn’t clearly defined, the whole design rests on quicksand. Discussions about candidate […]

Dimensional modeling is a design discipline that straddles the formal relational model and the engineering realities of text and number data. Compared to entity/relation modeling, it’s less rigorous (allowing the designer more discretion in organizing the tables) but more practical because it accommodates database complexity and improves performance. Contrasted with other modeling disciplines, dimensional modeling […]

There are two powerful ideas at the foundation of most successful data warehouses. First, separate your systems. Second, build stars and cubes. In my previous column, I described a complete spectrum of design constraints and unavoidable realities facing the data warehouse designer. This was such a daunting list that I worried that you would head […]

Over the years, I have found that a matrix depiction of the data warehouse plan is a pretty good planning tool once you have gathered the business requirements and performed a full data audit. This matrix approach has been exceptionally effective for distributed data warehouses without a center. Most of the new Web-oriented, multiple organization […]

The global data warehouse introduces a whole new world of design issues  As soon as the geographic spread of our data warehouse crosses a time zone or a national boundary, a whole host of design issues arise. For the sake of a label, let’s call such a warehouse a global data warehouse, and let’s collect all […]

According to the Webster’s Unabridged Dictionary, a surrogate is an “artificial or synthetic product that is used as a substitute for a natural product.” Thatýs a great definition for the surrogate keys we use in data warehouses. A surrogate key is an artificial or synthetic key that is used as a substitute for a natural […]