The increasingly popular data visualization tools deliver an environment that business analysts love. They provide the ability to define calculations, and more importantly, to explore and experiment with the data. The products have finally innovated away from the old standbys of tables, bar charts, and pie charts, making it easier for users to draw visual insights from their data. Business users can even create nice dashboard elements with drill-downs and interactivity to share with their colleagues.
I love these newly mainstream tools. If I were a business analyst, I’d relentlessly request a license purchase from my management. I’d promise a huge return on investment, and the odds are pretty good that I’d be able to deliver. From a user’s perspective, one of the most valuable features of these tools is the ability to hook together disparate data, including data from outside the data warehouse or even outside the company. From the Cloud. From Google Analytics. From Excel.
However, if I were an IT person, I would be ambivalent. On the one hand: give the users what they want. They can pull data from multiple sources, combine it as they wish, create their own analyses. Self-service BI (see Design Tip #153) moves from dream to reality. Hurray! On the other hand: anarchy. Each analyst creates her own datasets in the data visualization tool, naming elements as she wishes, burying transformations and calculations within tool-specific “code.” It’s stovepipe data marts or spreadmarts all over again. Inconsistent definitions, inconsistent names, and inconsistent data moved and transformed multiple times the-same-but-different.
Stepping aside from the exciting analyst-geek perspective to the less fun auditing and compliance perspective, the sprinkling of constraints and integration logic deep within individual data visualization tools can turn into a nightmare. What if the analyst claims to demonstrate a $10 million profit opportunity from looking at the marketplace through a half dozen data feeds stitched together with hand crafted integration rules in the data visualization tool? Are these integration rules documented? Where do you look in the tool to see what the analyst did? Are other analysts using exactly the same rules?
The salespeople from these data visualization tool vendors sell a dream to the business user. They tell users they don’t need to wait for IT to build a star schema—you can hook the tool right up to the source system or operational data store and build your in-memory cube structure yourself (or maybe get some cheap consulting assistance for the geeky bits). And in the short to medium term, they’re correct. But they don’t talk about what this environment is going to look like in a year or two, when the user community’s uncoordinated analytic demands are straining the IT infrastructure. Or when the users get tired of maintaining the baroque code underlying their mission critical dashboard and try to pass it off to IT. Not to mention the multiple versions of the truth problem that is the inevitable consequence of such anarchy. Those of us who have been building data warehouses for a long time have seen this algae bloom of “desktop databases” several times, first with 4GL languages in the 1980s, then with the proliferation of data marts in the 1990s, and now again with the amazing power of the new data visualization tools. Let’s be clear: we love the data visualization tools. We just want the business insights from these tools to be understandable, consistent, and trustworthy.
In some cases it may be possible to implement your entire BI solution in a data visualization tool, skipping the costly step of planning, designing, and building a data warehouse first. It is almost certainly not a good idea. A Kimball style data warehouse / business intelligence system is more than just star schemas. The most important component of the Kimball method is the focus on conformance: let’s all get together and agree on the names, definitions, and attribute change management techniques for every data element. Define it once, use a real ETL tool to plan and implement that data movement, and store that transformed data in a durable, manageable format (the data warehouse).
If anarchy isn’t the answer, what does the Kimball Group recommend? First, find a place in your architecture for a data visualization tool. Your business users are going to want them, and if the IT organization doesn’t plan an effective architecture, you will find an ineffective architecture evolving under your feet.
An effective architecture relies on a Kimball data warehouse, and the data management implied and required by conformed dimensions. The business users must agree to the concept of one name for one thing, and make that an integral part of the way they communicate and do business. This is a business problem, not an IT problem, though IT should advocate for and enable effective data management. With any kind of self-service BI, it’s not possible to forbid users to change the name or calculation of an object once it’s in their analysis, so we need to help them understand why they shouldn’t.
With a well designed and implemented Kimball data warehouse in place, business users can hook the data visualization tools directly to the relational data warehouse, or to centrally managed cubes. The tools will be used primarily as the label implies: for analysis and visualization. Some users will need to bring in external data—there’s always something that hasn’t yet been acquired by the central data warehouse—but it should be at the periphery. IT needs to maintain good communication with the users to see the external data they’re using, and evaluate whether it should be acquired centrally. Similarly, it will be appropriate to promote some calculations defined by the users into a centrally managed cube or semantic layer.
If your data warehouse is planned but not yet live, you may need to support interim use against an operational data store or even the source system. It’ll work better in the long run if you’re proactively helping the user community use the visualization tools effectively. By creating a partnership, you’ll gain insight into how they’re using the tools. You may even gain some influence over what they do and how they do it. You should be very clear that anything done directly against a source system or ODS is a candidate for re-factoring against a data warehouse once it’s available.
Finally, there are organizations that don’t have a data warehouse, aren’t planning to build one, aren’t working on data management, or have an antagonistic rather than collaborative relationship between IT and the users. If this describes your organization, self-service BI anarchy may be an inevitable outcome