ETL Architecture Fundamentals (Download PDF version)

Why Attend

This class provides an overview of all the factors necessary for effectively designing the back room ETL system of your DW/BI environment. It tries to guarantee that critical processes within the ETL system are not overlooked. Even if you don’t have an immediate qualified need for every ETL subsystem on our list, it is likely that you will over time. By the end of this course, you will understand how your data warehouse ETL system can be built to anticipate these potential requirements.

This is not a microscopic code-oriented implementation class; it is a vendor-agnostic architecture class for the designer who must keep a broad perspective. The course is organized around the 34 necessary ETL subsystems which are discussed in detail throughout the course.

Bringing Kimball University onsite enables everyone on the team to get on the same page with a common vocabulary and understanding of core techniques; the result is more efficient education with lower travels cost and lost productivity, plus less downstream “tire spinning” within the team.

Who Should Attend

This course is designed for those responsible for building the back room ETL system of a data warehouse environment, including ETL architects, ETL designers and developers, and data warehouse operational staff.

Since dimensional models are the ultimate ETL deliverables, some familiarity with the basic principles of dimensional modeling is necessary. Students can gain this knowledge by reading the following articles:

Instructor

Bob Becker, instructor of Kimball University’s public ETL Architecture in Depth course

Course Details – Day 1

Surrounding the Requirements

  • Business needs
  • Data profiling
  • ETL tool versus coding

Introduction to 34 ETL Subsystems

  • Subsystem 1 – Data profiling
  • Subsystem 2 – Change data capture
  • Subsystem 3 – Extract
    • Extract window
    • Immediate transformations
    • Extract staging table designs
    • Retention
  • Subsystems 4, 5 and 6 – Cleaning
    • Data quality architecture
    • Data quality screens
    • Error event fact table
    • Audit dimension
    • Compliance tracking
  • Subsystem 28 – Sorting
  • Subsystem 7 – De-duplication and survivorship
  • Subsystem 8 – Conforming
    • Conformed dimensions
    • Mapping incompatible structures
  • Subsystems 9, 10, 11, 12 and 15 – Delivering dimension tables
    • Time variance designs
    • Surrogate key generator
    • Bridge tables
    • Hierarchies
    • Special dimensions

Course Details – Day 2

Continuation of the 34 ETL Subsystems

  • Subsystems 13, 14 and 16 – Delivering fact tables
    • Fact table builders for transaction, periodic, and accumulating grains
    • Surrogate key pipeline
    • Graceful extensibility
    • Late arriving data
  • Subsystems 17 and 18 – Dimension manager and fact provider
    • Responsibilities and procedures
    • Real time complexities
    • Distributed, federated warehouses
    • Delivering remote dimensions, attributes, and facts
  • Subsystems 19 and 20 – Delivering aggregates
    • Aggregations
    • Aggregate navigation
    • OLAP cubes
  • Subsystem 21 – Data propagation manager
    • Feeding data mining
    • Presentation layer extracts
    • 3rd party flat files
  • Subsystem 22 – Job scheduler
  • Subsystems 23 and 24 – Backup and recovery
  • Subsystems 25 and 26 – Version control
    • Version control
    • Migration and testing
  • Subsystems 27, 29 and 30 – Managing ETL workflow
    • Workflow monitor
    • Job scheduler
    • Lineage and dependency tracking
    • Problem escalation system
  • Subsystems 31, 32, 33 and 34 – Development and operations
    • Parallel processing and pipelining
    • Security
    • Compliance
    • Metadata

Modifying the ETL Architecture for Real Time Data Warehousing

  • Hot partition
  • Streaming versus batch ETL