One of the key components of the business intelligence (BI) architecture is a semantic layer. The semantic layer provides a translation of the underlying database structures into business user oriented terms and constructs. It is usually part and parcel of the query and reporting tool. OLAP or cube databases also include a BI semantic layer.
Some BI layers are microscopically thin, others are rich and robust. The very minimum functionality that would qualify as a semantic layer includes:
- An organizing structure that presents the data elements in a way that’s intuitive to business people. In most tools, you will organize tables and columns in folder structures. The Kimball Method tells you to dimensionally structure your data warehouse, which sets you miles ahead of those who try to deliver BI directly on top of transactional database structures. But even if you’re working atop a clean dimensional model, a semantic layer will provide opportunities to improve navigability and find-ability.
- An opportunity to rename data elements so they make sense to the business users. Of course, the Kimball Method strongly recommends that the data warehouse tables and columns be named as the users would like to see them, but this functionality is sometimes implemented in the semantic layer.
- An interface to hold business-oriented descriptions of data elements. Ideally, you’d store and expose multiple flavors of metadata: business description, example values, and dimension attribute change policy. In a perfect world, the BI semantic layer would expose the full lineage of each data element: which transaction system data element was the source, which ETL jobs touched this attribute, and when was it loaded into the data warehouse. Realistically, few semantic layers support more than a single description for each data element. In my experience, most DW/BI teams don’t even bother to populate that single description.
- A mechanism to define calculations and aggregation rules. For example, you might define in the semantic model that inventory balances are additive across most dimensions, but when aggregating across time, you want to take the period ending balance or divide the sum of the balances by the number of time periods.
Is the semantic layer a mandatory component of a DW/BI architecture? The answer is yes, if you plan to open the doors to your DW/BI system for ad hoc use. Every hour you spend on building a rich semantic layer will pay off in improved adoption and success in the user community. Don’t simply run the wizard for your BI tool and generate a semantic layer that’s equivalent to the underlying tables (even if they’re dimensional). Spend the time to make it as rational and polished as you can.
The downside of investing in a semantic layer is that you can expect to make that investment multiple times. Most organizations find they need several BI tools to meet the diverse needs of their user communities. Each tool has its own semantic layer, which can seldom be copied from one tool to another (even tools sold by the same vendor). One of the many challenges for the BI team is to keep a similar look and organizational structure across the BI tools. Once again, the Kimball Method’s focus on getting the relational dimensional data model right will considerably ease that task by slimming down the BI tools’ semantic layers.
There’s only one scenario where I might buy the argument that you can get away without a semantic layer. If the doors to your DW/BI system are closed to all ad hoc users, and all access is mediated by professional report developers, you can make it work without a BI semantic layer. This is a lukewarm recommendation on several fronts. Most importantly, how can you close the doors to ad hoc users? That’s crazy. Additionally, developers are people too and even if they can write SQL by hand and look up definitions in an external data dictionary, why torture them?
If you buy the argument that you need a semantic layer, your next question may be whether you need a dimensional data warehouse. Some observers, especially some BI tool vendors, argue that you can skip the relational data warehouse and deliver the dimensional experience virtually. This sounds appealing – no one really wants to build an ETL system – but it’s a chimera. No semantic layer tool provides the transformation and integration functionality of an ETL tool. Most BI tools are great at what they do; don’t break them by attempting to do ETL in the semantic layer. And don’t forget that the ETL back room adds value to the data by cleaning, standardizing, conforming, and de-duplicating, all steps that a BI tool cannot do.
That said, I have worked with clients who have succeeded in hooking up a BI tool directly to the normalized (transactional or ODS) data model. The most common scenario is a prototype: let’s show people what a Business Objects universe or Tableau interface might look like to generate excitement and financial support for an enterprise data warehouse. Another scenario is to meet operational reporting requirements by building a semantic layer atop the transaction system. But in most cases, this operational semantic layer is a relatively minor component of an enterprise analytic environment that includes a real data warehouse. I’m immediately skeptical of any organization that claims its transaction systems are sufficiently clean, and analytic needs simple enough, that they don’t need to instantiate their dimensional models. I’m not opposed to the idea in theory, but in practice I still don’t see it working.
I don’t think I’ve ever seen an organization over-invest in a semantic layer, but I’ve seen lots of data warehouses fail because of under-investment. Buy a decent BI tool (there are dozens), and spend time developing the semantic layer. Otherwise you’re selling short your very substantial investment in design and development of a technically solid data warehouse.