The global data warehouse introduces a whole new world of design issues
As soon as the geographic spread of our data warehouse crosses a time zone or a national boundary, a whole host of design issues arise. For the sake of a label, let’s call such a warehouse a global data warehouse, and let’s collect all these design issues in one place. From a designer’s perspective, once the code is open for change, we might as well consider all the design changes for the global data warehouse at once.
Synchronizing Multiple Time Zones
Many businesses measure the exact time of their basic transactions. The most common measured transactions include retail transactions at conventional stores, telephone inquiries at service desks, and financial transactions at bank teller machines. When a business spans multiple time zones, it is left with an interesting conflict. Does it record the times of these transactions relative to an absolute point in time, or does it record the times relative to local midnight in each time zone? Both of these perspectives are valid. The absolute time perspective lets us see the true simultaneous nature of the transactions across our entire business, whereas the local time perspective lets us accurately understand the transaction flow relative to the time of day. In the United States, “everyone” gets off work at 5 p.m., watches the news at 6, and eats dinner at 6:30.
It’s tempting to store each underlying transaction with an absolute timestamp and leave it up to the application to sort out issues of local times. Somehow, this seems to be a conservative and safe thing to do, but I don’t support this design. The database architect has left the downstream application designer with a complicated mess. Doing a coordinated local-time-of-day analysis across multiple time zones is nightmarish if all you have is a single absolute timestamp. Transaction times near midnight will fall on different days. Some states, such as Indiana and Arizona, do not observe daylight savings time. Reversing the design decision and storing the transaction times as relative to local midnight just recasts the same application problem in a different form. What we need instead is a more powerful design.
Figure 1: Timestamp design for businesses with multiple time zones.
The timestamp is recorded simultaneously in both absolute and relative formats. Additionally, I recommend separating the calendar day portions of the timestamps from the time-of-day portions of the timestamps. We end up with four fields in a typical transaction fact table. The two calendar-day fields should be surrogate keys pointing to two instances of a calendar-day dimension table. These key entries in the fact table should not be actual SQL date stamps. Rather, these keys should be simple integers that point to the calendar date dimension table. Using surrogate (integer) keys for the actual join lets us deal gracefully with corrupted, unknown, or hasn’t-happened-yet dates. We split the time of day from the calendar date because we don’t want to build a dimension table with an entry for every minute over the lifetime of our business. Instead, our calendar day dimension table merely has an entry for every day. In any case, we don’t have unique textual descriptors for each individual minute, whereas we do have a rich array of unique textual descriptors for each individual day.
The two time-of-day fields are probably not keys that join to dimension tables. Rather, they are simply numerical facts in the fact table. To constrain such time-of-day facts, we apply BETWEEN constraints to these fields. If we do a lot of these kinds of constraints, it will be helpful to build an index on each of these of these time-of-day fields.
Although this double-barreled design uses a bit more storage space (three extra fields) in the fact table, the application designers will be delighted. Both absolute and relative time analyses will “fall out” of the database, regardless of how many time zones your business spans.
Multiple National Calendars
A multinational business spanning many countries can’t easily keep track of an open-ended number of holidays and seasons across many different countries. As happens so often in database design, there are two different perspectives that we need to address. We need the calendar from the perspective from a single country (is today a holiday in Singapore?) as well as across collections of countries all at once. (Is today a holiday anywhere in Europe?)
Figure 2: Design for an open-ended number of calendars.
The primary calendar dimension contains generic entries independent of any particular country. These entries include weekday names, month names, and other useful navigational fields such as day, week, and month numbers. If your business spans major basic calendar types such as Gregorian, Islamic, and Chinese calendars, then it would make sense to include all three sets of major labels for days, months, and years in this single table.
The calendar dimension I just described provides the basic framework for all calendars, but each country has a small number of unique calendar variations. I like to handle this with a supplementary calendar dimension whose key is the combination of the calendar key from the main calendar dimension together with the country name. Figure 2 also shows this supplementary table. You can join this table to the main calendar dimension or directly to the fact table. If you provide an interface that requires the user to specify the country, then the attributes of the supplementary table can be viewed logically as being appended to the main calendar table, which lets you view you calendar through the eyes of any single country at a time.
You can use the supplementary calendar table to constrain groups of countries. The grouping can be geographic or by any other affiliation you choose for a country (such as Supplier Business Partners). If you choose a group of countries, you can use the EXISTS clause of SQL to determine if any of the countries has a holiday on a particular date.
Collecting Revenue in Multiple Currencies
Multinational businesses often book transactions, collect revenues, and pay expenses in many different currencies.
Figure 3: Multiple currency design.
The primary amount of the transaction is represented in the local currency. In some sense, this is always the “correct” value of the transaction. For easy reporting purposes, a second field in the transaction fact record expresses the same amount in a single global currency, such as United States dollars. The equivalency between the two amounts is a basic design decision for the fact table, and is probably an agreed upon daily spot rate for the conversion of the local currency into the global currency. Now a business can easily add up all transactions in a single currency from the fact table by constraining in the country dimension to a single currency type. It can easily add up transactions from around the world by summing the global currency field.
But what happens if we want to express the value of a set of transactions in a third currency? For this, we need a currency exchange table, also shown in Figure 3. The currency exchange table typically contains the daily exchange rates both to and from each the local currencies and one or more global currencies. Thus, if there are 100 local currencies and three global currencies, we would need 600 exchange rate records each day. It is probably not practical to build a currency exchange table between each possible pair of currencies because for 100 currencies, there would be 10,000 daily exchange rates. It is not likely, in my opinion, that a meaningful market for every possible pair of exchange rates actually exists.
As most of you know, many of the European nations (known as the European Union, or EU) are standardizing on a single European currency known as the euro. The euro is significant from a data warehouse point of view; don’t look at it as just another currency. The euro brings with it some specific financial reporting and data warehousing requirements. The most significant requirements are the three currency conversion requirement and the six decimals of precision requirement.
For all currency conversion calculations performed between EU countries, a currency must first be converted into the euro, and then the euro value converted into the second currency. Every currency conversion among EU countries must take this two-step process; you can’t convert directly between currencies. These conversions in the data warehouse, of course, can be implemented from the design of the previous section, where the global currency is assumed to be the euro.
The second mandate is that you must perform all currency conversion calculations with six decimals of precision to the right of the decimal point. The purpose of this requirement is to place a maximum bound on the rounding error of currency conversion calculations. The big issue here is not the exchange factor, but rather the precision of any numeric field that stores a currency valued amount. If any such field truncates or rounds to less than six decimals of precision to the right of the decimal point for any EU currency, then this field cannot be used as a source field for a currency conversion to euros. (Ouch!) You have to make sure that your databases and spreadsheets don’t perform this rounding or truncation implicitly if they have native support for European currencies.