Most relational databases support the use of a null value to represent an absence of data. Nulls can confuse both data warehouse developers and users because the database treats nulls differently from blanks or zeros, even though they look like blanks or zeros. This design tip explores the three major areas where we find nulls in our source […]
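To make the difference between nulls, blanks, and zeros concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and its rows are hypothetical. Most engines follow the same SQL null semantics in these cases, although some differ on blanks (Oracle, for example, stores an empty string as NULL).

```python
import sqlite3


def null_vs_zero_demo():
    """Show how SQL treats NULL differently from 0 and from an empty string."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: a real zero, a blank region, and a fully NULL row.
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 10), ("West", 0), ("", 20), (None, None)],
    )
    cur = conn.cursor()

    def scalar(sql):
        return cur.execute(sql).fetchone()[0]

    results = {
        # COUNT(*) counts every row; COUNT(amount) skips NULLs but counts zeros.
        "count_rows": scalar("SELECT COUNT(*) FROM sales"),
        "count_amount": scalar("SELECT COUNT(amount) FROM sales"),
        # AVG ignores the NULL row: the zero lowers the average, the NULL does not.
        "avg_amount": scalar("SELECT AVG(amount) FROM sales"),
        # "= NULL" never evaluates to true, so IS NULL is required.
        "eq_null": scalar("SELECT COUNT(*) FROM sales WHERE amount = NULL"),
        "is_null": scalar("SELECT COUNT(*) FROM sales WHERE amount IS NULL"),
        # In SQLite a blank string is a real value, distinct from NULL.
        "blank_region": scalar("SELECT COUNT(*) FROM sales WHERE region = ''"),
        "null_region": scalar("SELECT COUNT(*) FROM sales WHERE region IS NULL"),
    }
    conn.close()
    return results


print(null_vs_zero_demo())
```

The two rows that look "empty" to a user (the zero and the NULL) behave differently in every aggregate and filter above, which is exactly the source of confusion the tip describes.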

Dimensional modeling is a design discipline that straddles the formal relational model and the engineering realities of text and number data. Compared to entity/relation modeling, it’s less rigorous (allowing the designer more discretion in organizing the tables) but more practical because it accommodates database complexity and improves performance. Contrasted with other modeling disciplines, dimensional modeling […]

Many of you are already familiar with the data warehouse bus architecture and matrix, given their central role in building architected data marts. The corresponding bus matrix identifies the key business processes of an organization, along with their associated dimensions. Business processes (typically corresponding to major source systems) are listed as matrix rows, while dimensions appear as matrix […]
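The row-and-column layout of the bus matrix maps naturally onto a simple data structure. The following is a minimal sketch, assuming each business process is stored with the set of dimensions it uses; the process and dimension names are hypothetical, and the helper functions are illustrative rather than part of any published method.

```python
from typing import Dict, List, Set

# Hypothetical bus matrix: rows are business processes, values are their dimensions.
BUS_MATRIX: Dict[str, Set[str]] = {
    "Retail sales": {"Date", "Product", "Store", "Promotion"},
    "Inventory": {"Date", "Product", "Store"},
    "Purchase orders": {"Date", "Product", "Vendor"},
}


def dimensions(matrix: Dict[str, Set[str]]) -> List[str]:
    """Matrix columns: every dimension used by any process, sorted by name."""
    return sorted(set().union(*matrix.values()))


def shared_dimensions(matrix: Dict[str, Set[str]], min_processes: int = 2) -> List[str]:
    """Dimensions used by at least `min_processes` rows, i.e. those that must be shared."""
    return [
        d for d in dimensions(matrix)
        if sum(d in dims for dims in matrix.values()) >= min_processes
    ]


def render(matrix: Dict[str, Set[str]]) -> str:
    """Render the matrix as text, marking each process/dimension intersection with X."""
    cols = dimensions(matrix)
    width = max(len(p) for p in matrix)
    lines = [" " * width + " | " + " | ".join(cols)]
    for process, dims in matrix.items():
        marks = " | ".join(("X" if c in dims else " ").center(len(c)) for c in cols)
        lines.append(process.ljust(width) + " | " + marks)
    return "\n".join(lines)


print(render(BUS_MATRIX))
print(shared_dimensions(BUS_MATRIX))
```

Reading down a column shows which processes depend on the same dimension; those columns with several marks are the dimensions the separate data marts must agree on (conform) for the bus to work.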

There are two powerful ideas at the foundation of most successful data warehouses. First, separate your systems. Second, build stars and cubes. In my previous column, I described a complete spectrum of design constraints and unavoidable realities facing the data warehouse designer. This was such a daunting list that I worried that you would head […]

Over the years, I have found that a matrix depiction of the data warehouse plan is a pretty good planning tool once you have gathered the business requirements and performed a full data audit. This matrix approach has been exceptionally effective for distributed data warehouses without a center. Most of the new Web-oriented, multiple organization […]

The global data warehouse introduces a whole new world of design issues. As soon as the geographic spread of our data warehouse crosses a time zone or a national boundary, a whole host of design issues arise. For the sake of a label, let’s call such a warehouse a global data warehouse, and let’s collect all […]

According to Webster’s Unabridged Dictionary, a surrogate is an “artificial or synthetic product that is used as a substitute for a natural product.” That’s a great definition for the surrogate keys we use in data warehouses. A surrogate key is an artificial or synthetic key that is used as a substitute for a natural […]
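A surrogate key is usually just an integer assigned in sequence as new natural keys arrive. Here is a minimal sketch of that assignment step, assuming the mapping is held in memory (a real warehouse would persist it in a key table). The class name `SurrogateKeyMap` is hypothetical, and qualifying the natural key by source system is an assumption added so that identical codes from different sources do not collide.

```python
from itertools import count
from typing import Dict, Hashable, Tuple


class SurrogateKeyMap:
    """Hypothetical helper: assigns meaningless integer keys to natural keys."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)  # sequential, carries no business meaning
        self._keys: Dict[Tuple[str, Hashable], int] = {}

    def key_for(self, source_system: str, natural_key: Hashable) -> int:
        """Return the existing surrogate key, or assign the next one on first sight."""
        nk = (source_system, natural_key)
        if nk not in self._keys:
            self._keys[nk] = next(self._next)
        return self._keys[nk]


keys = SurrogateKeyMap()
print(keys.key_for("crm", "C-100"))      # first natural key seen
print(keys.key_for("billing", "C-100"))  # same code, different source: new key
print(keys.key_for("crm", "C-100"))      # repeat lookup returns the original key
```

Because the integer carries no meaning, the warehouse is insulated from changes to the natural key in the source system.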

Drawing the Line Between Dimensional Modeling and ER Modeling Techniques. Dimensional modeling (DM) is the name of a logical design technique often used for data warehouses. It is different from, and contrasts with, entity-relation modeling (ER). This article points out the many differences between the two techniques and draws a line in the sand. DM […]

The importance of the time dimension in data marts and data warehouses. The time dimension is a unique and powerful dimension in every data mart and enterprise data warehouse. Although one of the tenets of dimensional modeling is that all dimensions are created equal, the truth is that the time dimension is very special and […]
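One practical consequence of the time dimension's special status is that its calendar attributes are known in advance, so it is commonly generated rather than extracted from a source system. The sketch below is an illustration of that idea, not the article's design: the function name and the column set (`date_key`, `day_of_week`, `quarter`, `is_weekend`, and so on) are hypothetical, and `date_key` is assumed to be a simple sequential integer.

```python
from datetime import date, timedelta
from typing import Dict, List

# Fixed English day names so the output does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start: date, end: date) -> List[Dict[str, object]]:
    """Generate one row per calendar day from start to end, inclusive."""
    rows = []
    d = start
    key = 1  # hypothetical sequential key
    while d <= end:
        rows.append({
            "date_key": key,
            "full_date": d.isoformat(),
            "day_of_week": DAY_NAMES[d.weekday()],
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "year": d.year,
            "is_weekend": d.weekday() >= 5,  # Saturday=5, Sunday=6
        })
        d += timedelta(days=1)
        key += 1
    return rows


for row in build_date_dimension(date(2024, 1, 1), date(2024, 1, 3)):
    print(row)
```

Attributes such as fiscal periods or holidays would come from the business rather than the calendar and would be added to the same rows.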

Insurance is an important and growing sector for the data warehousing market. Several factors have come together in the last year or two to make data warehouses for large insurance companies both possible and extremely necessary. Insurance companies generate several complicated transactions that must be analyzed in many different ways. Until recently, it wasn’t practical […]