Warren Thornthwaite and Joy Mundy co-authored this guide to building a successful business intelligence system and its underlying data warehouse databases using Microsoft SQL Server 2005.
They provide invaluable advice about designing, developing, deploying, and operating your Kimball Method data warehouse system on the Microsoft BI platform.
Note: You can also find an updated edition of this book based on the SQL Server 2008 R2 version here.
Tools and Utilities
Chapter 1—Defining Business Requirements
Additional Interview Summaries—These are abbreviated summaries that will give you an idea of the kinds of analytic themes you might hear from different parts of the organization. (Updated 17 Oct 2006)
Adventure Works analytic themes list—This list includes all the themes identified in the interview summaries listed above and in the book. Try to create your own list first, then look at this one to see how you did. (Updated 17 Oct 2006)
Chapter 2—Designing the Business Process Dimensional Model
Dimensional modeling spreadsheet—This Excel spreadsheet (zipped) contains the annotated data model for the MDWT_AdventureWorksDW database used throughout the book. The zip file now includes an updated MDWT AW2008 model based on SQL Server 2008 and a blank version of the spreadsheet so you can use it to create your own logical and physical data models. The spreadsheet contains a macro for generating a basic database CREATE script that you can use to instantiate your model in SQL Server. (Updated 6 March 2009)
Data profiling reports—This Reporting Services project is a simple example of a data profiling tool. Version 2 is a stand-alone report set and includes a few bug fixes. We are keeping the original version available, but it requires the installation of two stored procedures in the target database. (Updated 25 April 2008)
Naming conventions—Many universities make their internal information available on the internet. You can search for current examples like this one from Indiana University. Stanford University was kind enough to give us permission to make a draft set of their data model and DBA naming standards available as well. (Added 18 Apr 2006)
Chapter 4—Setup and Physical Design
Table partitioning script—This SQL script illustrates how to partition a table. It also shows how to create a pseudo-partition table, populate it with data (presumably using fast load techniques), and swap it into the partitioned table.
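The pattern the script implements can be sketched in a few statements. This is our abbreviated sketch, not the script itself, and the table and object names (pfMonthly, FactOrders, and so on) are ours; the downloadable script is the complete version.

```sql
-- Hypothetical monthly partitioning sketch. RANGE RIGHT boundaries at the
-- first of each month give partition 3 the range [2004-05-01, 2004-06-01).
CREATE PARTITION FUNCTION pfMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('20040401', '20040501', '20040601');

CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE FactOrders (
    OrderDate datetime NOT NULL,
    SalesAmt  money    NOT NULL
) ON psMonthly (OrderDate);

-- The pseudo-partition table must match the partitioned table's structure
-- and filegroup; the CHECK constraint proves its rows fall entirely inside
-- the target partition's range.
CREATE TABLE PseudoPartition_200405 (
    OrderDate datetime NOT NULL
        CHECK (OrderDate >= '20040501' AND OrderDate < '20040601'),
    SalesAmt  money    NOT NULL
) ON [PRIMARY];

-- After fast-loading PseudoPartition_200405, swap it in. This is a
-- metadata-only operation, so it is nearly instantaneous.
ALTER TABLE PseudoPartition_200405
    SWITCH TO FactOrders PARTITION 3;
```

The switch succeeds only if the two tables' structures and indexes line up and the check constraint covers the target partition's boundaries exactly.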
Dimensional modeling spreadsheet—This is the same Excel spreadsheet we described earlier, in Chapter 2. (Updated 6 March 2009)
Database create script—This SQL script is exactly the script generated by the Excel spreadsheet. If for some reason you can’t run the Excel macro, here’s what it generates. (Added 08 Jun 2006)
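To give a flavor of what the generated script contains, here is our own abbreviated sketch of a typical dimension table definition (this is not the actual generated output; the column and constraint names are ours):

```sql
-- A dimension table as the spreadsheet macro might emit it: an identity
-- surrogate key plus the business key and change-tracking housekeeping
-- columns used for slowly changing dimensions.
CREATE TABLE dbo.DimPromotion (
    PromotionKey  int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_DimPromotion PRIMARY KEY,
    PromotionBK   int          NOT NULL,  -- business key from the source
    PromotionName nvarchar(50) NOT NULL,
    RowStartDate  datetime     NOT NULL,
    RowEndDate    datetime     NOT NULL,
    IsRowCurrent  char(1)      NOT NULL
);
```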
Chapter 5—Designing the ETL System
High level ETL system design diagram—This Visio diagram illustrates the ETL process for the MDWT_AdventureWorksDW database. We’ve also copied the content as .gif files, in case you don’t have Visio installed.
The Dimension Master package that we described in Chapter 5 is available under the Chapter 6 content described below. It’s part of the Integration Services solution for historical loading.
Chapter 6—Developing the ETL System
Integration Services solution for historical loading of the MDWT_AdventureWorksDW database. For best results, unzip to the c:\MDWT_Projects folder. You can find a ReadMe document here. (Updated 09 Jun 2006)
Sample package for custom SCD handling. This package computes checksums on the SCD1 and SCD2 attributes, and uses those checksums in a Script transform to perform the SCD handling. Please install the Checksum SSIS transform, available from www.sqlis.com, before you look at this package.
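The comparison logic at the heart of that package can be sketched in SQL. This is our illustrative sketch, not the package's code, and the table and column names are ours: one checksum covers the Type 1 attributes, another the Type 2 attributes, so a row's change type is determined without comparing every column individually.

```sql
-- Compare incoming staging rows against the current dimension rows.
-- A Type 2 change wins if both checksums differ, since it forces a new row.
SELECT s.CustomerBK,
       CHECKSUM(s.Phone, s.Email)        AS NewSCD1Checksum,
       CHECKSUM(s.City, s.MaritalStatus) AS NewSCD2Checksum,
       CASE
         WHEN d.CustomerBK IS NULL                                 THEN 'New row'
         WHEN CHECKSUM(s.City, s.MaritalStatus) <> d.SCD2Checksum  THEN 'Type 2 change'
         WHEN CHECKSUM(s.Phone, s.Email)        <> d.SCD1Checksum  THEN 'Type 1 change'
         ELSE 'No change'
       END AS ChangeType
FROM Staging_Customer s
LEFT JOIN DimCustomer d
       ON d.CustomerBK = s.CustomerBK
      AND d.IsRowCurrent = 1;
```

Note that CHECKSUM can produce the same value for different inputs; if undetected changes would be costly, a stronger hash such as HASHBYTES is a safer choice.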
Breakpoint in ScriptTask package demonstrates the Script Task. We show a breakpoint inside the script, as well as how to use and reference package variables inside the script. The script itself is very simple.
Chapter 9—Building the BI App in Reporting Services
Instructions for creating an SSIS to RS Link. On pages 415-416 we describe a way to invoke Reporting Services from an Integration Services package. The idea is to have the ETL package automatically start a set of reports as soon as the ETL job is finished. This zip file has more detailed instructions and an example SSIS/RS solution. It is not for the faint of heart. You need to be competent with SSIS, RS, and the relational database to make this work.
Chapter 13—Metadata Plan
Metadata database create and load scripts—These scripts create a metadata database and populate it based on the extended property values generated from the modeling spreadsheet described in Chapter 2. We’ve also included the modeling spreadsheet that generated the database create script in case you want to modify the metadata data model. (Updated 16 Aug 2006)
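The mechanics those scripts rely on are SQL Server's extended properties. As a hedged sketch of the idea (the object and property names below are ours, not the scripts'), you can attach a description to a column and then read it back for loading into metadata tables:

```sql
-- Attach a description to a dimension column, as the modeling spreadsheet's
-- generated script does for every table and column.
EXEC sp_addextendedproperty
     @name = 'Description',
     @value = 'Customer surrogate key',
     @level0type = 'SCHEMA', @level0name = 'dbo',
     @level1type = 'TABLE',  @level1name = 'DimCustomer',
     @level2type = 'COLUMN', @level2name = 'CustomerKey';

-- Read the column-level descriptions back; a metadata load script can
-- INSERT...SELECT from this function into the metadata tables.
SELECT objname AS ColumnName, value AS Description
FROM sys.fn_listextendedproperty('Description',
     'SCHEMA', 'dbo', 'TABLE', 'DimCustomer', 'COLUMN', DEFAULT);
```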
Reporting package—This is a set of Reporting Services reports that run against the metadata tables described in Chapter 13. You can create these metadata tables using the scripts in the previous bullet. (Updated 16 Aug 2006)
Examples not described in the book
Integration Services solution for historical loading of the AW_Orders database. This solution was described in talks that Joy gave at TechEd2006 and on a webcast on 7/11/2006. For best results, unzip to the c:\MDWT_Projects folder. You can find a ReadMe document here. You can view the webcast at Microsoft’s TechNet. (Updated 15 Jul 2006)
Data Mining solution to create a simple cross-sell product recommendation model. This solution was described in a webcast that Warren gave on 7/17/2006. You can find a ReadMe document here. You can view the webcast at Microsoft’s TechNet. (Updated 17 Jul 2006)
Date Dimension Resources
Date table auto-gen—This Word 2003 document is a set of instructions that guide you through using Analysis Services to auto-generate a fully populated relational Date dimension. OK, “auto” might be a bit generous, but it is easier than creating the table from scratch. (Added 05 Sep 2007)
This zipped spreadsheet from the original Data Warehouse Lifecycle Toolkit book generates a set of SQL insert statements: Date_Dimension Spreadsheet. (Added 26 Nov 2007)
Intelligent reader Don McMunn maintains a date dimension kit.
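If you would rather generate the rows directly in T-SQL than from a spreadsheet, the idea can be sketched with a simple loop. This is our minimal sketch, with a deliberately stripped-down table (the real Date dimension carries many more attributes):

```sql
-- Minimal date dimension: one row per day for 2005.
CREATE TABLE DimDate (
    DateKey      int         NOT NULL PRIMARY KEY,  -- yyyymmdd surrogate
    FullDate     datetime    NOT NULL,
    MonthNumber  int         NOT NULL,
    MonthName    varchar(10) NOT NULL,
    CalendarYear int         NOT NULL
);

DECLARE @d datetime;
SET @d = '20050101';
WHILE @d < '20060101'
BEGIN
    INSERT DimDate (DateKey, FullDate, MonthNumber, MonthName, CalendarYear)
    VALUES (CONVERT(int, CONVERT(char(8), @d, 112)),  -- e.g. 20050101
            @d,
            MONTH(@d),
            DATENAME(month, @d),
            YEAR(@d));
    SET @d = DATEADD(day, 1, @d);
END;
```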
Slowly Changing Dimension Management with SQL 2008
There are two scripts in this zip file. The first creates a set of tables and populates them with sample data. The second demonstrates how to use the new MERGE command in SQL Server 2008 to manage Type 1 and Type 2 changes in a dimension table. These are explained in more detail in Kimball University Design Tip #107. (Added 13 Nov 2008)
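The shape of the technique can be sketched as follows. This is our condensed sketch, not the downloadable script, and the table and column names are ours; see Design Tip #107 for the full treatment. Type 1 changes are a plain in-place UPDATE; for Type 2 changes, the MERGE expires the current row and its OUTPUT clause feeds the replacement rows to an outer INSERT.

```sql
-- Type 1 change (Phone): overwrite in place, no history kept.
UPDATE d
   SET d.Phone = s.Phone
  FROM DimCustomer d
  JOIN Staging_Customer s ON s.CustomerBK = d.CustomerBK
 WHERE d.IsRowCurrent = 1
   AND d.Phone <> s.Phone;

-- Type 2 change (City): expire the current row inside the MERGE, insert
-- brand-new customers, and route the expired rows' replacement values
-- through OUTPUT to the outer INSERT.
INSERT INTO DimCustomer
    (CustomerBK, City, Phone, RowStartDate, RowEndDate, IsRowCurrent)
SELECT CustomerBK, City, Phone, GETDATE(), '99991231', 1
FROM (
    MERGE DimCustomer AS tgt
    USING Staging_Customer AS src
       ON tgt.CustomerBK = src.CustomerBK AND tgt.IsRowCurrent = 1
    WHEN MATCHED AND tgt.City <> src.City THEN
        UPDATE SET tgt.IsRowCurrent = 0,
                   tgt.RowEndDate   = GETDATE()
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerBK, City, Phone, RowStartDate, RowEndDate, IsRowCurrent)
        VALUES (src.CustomerBK, src.City, src.Phone, GETDATE(), '99991231', 1)
    OUTPUT $action AS MergeAction, src.CustomerBK, src.City, src.Phone
) AS changes (MergeAction, CustomerBK, City, Phone)
WHERE MergeAction = 'UPDATE';
```

The WHERE on $action is what keeps the outer INSERT from duplicating the rows the MERGE itself inserted for brand-new customers.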
Errata
In spite of our best efforts, there are a few errors in the book. Here are the major ones we’ve spotted so far, along with the corrected text and/or figures. (Many thanks to the folks who have brought some of these to our attention—we appreciate your careful reading!)
Page: 60 Date Added: 6/10/06
Figure 2.5 has two problems. First, the DateKey in the SalesRep table should be switched with the SalesRepKey in the Orders table. Second, the arrows should be reversed. I spent too much time trying to get the arrows to line up with the columns (which Visio doesn’t let you do if you actually have the foreign key relationship declared), and didn’t pay attention to the actual relationships.
Click to display a corrected version of Figure 2.5. Note that you may need to create an additional table to hold unique values of SalesRepGroupKeys to join between the fact table and the SalesRepGroup table. This will correctly support the one-to-many foreign key relationship from the fact table.
Page: 61 Date Added: 6/10/06
Figure 2.6 has similar problems. Click here for a corrected version of Figure 2.6. Note that in addition to the arrows and foreign key indicators, we removed CustomerKey from the Orders fact table. Obviously, the whole reason for the bridge table is that you can’t have a one-to-many relationship between Orders and Customer. You could, however, include a PrimaryCustomerKey in the fact table, if you knew which one that was, and if a fair amount of Orders analysis was based on primary customer attributes.
Page: 139 Date Added: 8/18/06
The section on RAID should emphasize a configuration known as RAID 1+0 (or RAID 10) rather than RAID 0+1. RAID 1+0 is striping on top of mirrors and RAID 0+1 is a mirror on top of stripes. While there is little difference in terms of performance or cost, RAID 1+0 is more fault tolerant. For a good description of the difference, see:
Page: 160 Date Added: 6/14/06
The code sample in the Insert an Unknown Member Row section should read:
SET IDENTITY_INSERT Dim_MyDim ON
INSERT Dim_MyDim (MyDim_Key, MyDim_TK, Attribute1, Attribute2)
VALUES (-1, NULL, 'Unknown', 'Unknown')
SET IDENTITY_INSERT Dim_MyDim OFF
We may have to get the syntax right, but we don’t have to agree with it.
Page: 171 Date Added: 6/10/06
The table name in the ALTER TABLE command is slightly off. It should read:
ALTER TABLE PseudoPartition_200405 SWITCH TO PartitionTable Partition 6
The partition script example you can download from the Tools and Utilities page is correct.
Page: 233 Date Added: 10/17/06
In the first paragraph, we suggest you set the AuditKey to 0 for now, and that we’ll come back to fix it later, but we never do. As a result, your Promotions package will fail when you try to run it on page 236 because there is no row with a key of 0 in the DimAudit table.
Try setting the AuditKey to -1, which will map to the default row in the DimAudit table that is added as part of the script from the Adventure Works dimensional modeling spreadsheet.
Page: 305 Date Added: 11/15/07
In the third bullet point, we talk about reference dimensions. That paragraph should read:
Reference dimension: A kind of Kimball Method outrigger table. In the relational dimensional model, you may have a piece of a dimension that’s referenced by two or more other dimensions. For example Customer, Vendor, and Shipping Destination could all refer to a set of geographic information. Sometimes you will design an outrigger Geography table that the other dimensions reference. In Analysis Services this design is called a Reference Dimension. In practice we’ve found it generally works best to avoid this Analysis Services feature and instead to fold the shared information into each primary dimension. You can do this by way of relational views or within the Data Source View layer.
Page: 322 Date Added: 4/18/2007
In the sidebar on cube terminology, we say “You’re allowed to create multiple cubes within a single database, but the best practice is to create a database with a single cube.” While this was the design intent when we wrote the book, it has turned out not to be a best practice. Here is a link to a Microsoft paper on this topic, and an excerpt that matches our current recommendation:
“Having many measure groups in a cube can adversely affect the query performance of the cube, even for queries that only use a single measure group. If your DSV contains many potential fact tables and they will not usually be used together, consider creating multiple smaller cubes containing topic areas that are used together. Then, you can enable occasional cross-topic querying by creating linked measure groups that join together all measure groups in a special cross-topic cube.”
This paper is titled OLAP Design Best Practices for Analysis Services 2005—definitely worth reading:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/olapdbpssas2005.mspx