Intelligent Enterprise: Catastrophic Failure

November 12, 2001

Catastrophic Failure

By Ralph Kimball

The tragic events of September 11 have made all of us reexamine our assumptions and priorities. We are forced to question our safety and security in ways that would have seemed unthinkable just weeks before.

We have been used to thinking that our big, important, visible buildings and computers are intrinsically secure, just because they are big, important, and visible. That myth has been shattered. If anything, these kinds of buildings and computers are the most vulnerable.

The devastating assault on our infrastructure has also come at a time when the data warehouse has evolved to a near production-like status in many of our companies. The data warehouse now drives CRM and provides near-realtime status tracking of orders, deliveries, and payments. The data warehouse is often the only place where a view of customer and product profitability can be assembled. The data warehouse has become an indispensable tool for running many of our businesses.

Is it possible to do a better job of protecting our data warehouses? Is there a kind of data warehouse that is intrinsically secure and less vulnerable to catastrophic loss?

I have been thinking about writing on this topic for some time, but suddenly the urgency is crystal clear. The following are some important threats that can result in a sustained catastrophic failure of a data warehouse, and possible practical responses.

CATASTROPHIC FAILURES

Destruction of the facility. A terrorist attack can level a building or damage it seriously through fire or flooding. In these extreme cases, everything on site may be lost, including tape vaults and administrative environments. Painful as it is to discuss, such a loss may include the IT personnel who know passwords and who understand the structure of the data warehouse.

Deliberate sabotage by a determined insider. The events of September 11 showed that the tactics of terrorism include the infiltration of our systems by skilled individuals who gain access to the most sensitive points of control. Once in the position of control, the terrorist can destroy the system, logically and physically.

Cyber warfare. It's not news that hackers can break into systems and wreak havoc. The events of September 11 should remove any remaining naive assumptions that these incursions are harmless, or "constructive" because they expose security flaws in our systems. There are skilled computer users among our enemies who, today, are actively attempting to gain unauthorized access to information, alter information, and disable our systems. How many times in recent months have we witnessed denial-of-service attacks from software worms that have taken over servers or personal computers? I don't believe for a minute that these are solely the work of script kiddies. I suspect that some of these efforts are practice runs by cyber terrorists.

Single-point failures (deliberate or not). A final general category of catastrophic loss comes from undue exposure to single-point failures, whether the failures are deliberately caused or not. If the loss of a single piece of hardware, a single communication line, or a single person brings the data warehouse down for an extended period of time, then there is a problem with the architecture.
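One rough way to make this test concrete is to inventory what the warehouse depends on and flag any role that only one thing (or one person) fills. The sketch below is purely illustrative: the inventory entries and the function name are hypothetical, and a real audit would also have to consider shared dependencies such as power or a common building.

```python
from collections import Counter


def single_points_of_failure(components):
    """Return the roles that are filled by exactly one component.

    `components` is a list of (role, instance_name) pairs. Both the
    roles and the instance names used below are made up for this sketch.
    """
    counts = Counter(role for role, _ in components)
    return sorted(role for role, n in counts.items() if n == 1)


# Hypothetical inventory: hardware, communication lines, and people.
inventory = [
    ("database server", "db-east"),
    ("database server", "db-west"),
    ("communication line", "leased-line-1"),
    ("warehouse administrator", "alice"),
    ("warehouse administrator", "bob"),
]

print(single_points_of_failure(inventory))  # ['communication line']
```

Here the lone communication line is the item whose loss would take the warehouse down, so it is the one that needs a second, independent path.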

COUNTERING CATASTROPHIC FAILURES

Distributed architecture. The single most effective and powerful approach for avoiding catastrophic failure of the data warehouse is a profoundly distributed architecture. The "enterprise data warehouse" must be made up of multiple computers, operating systems, database technologies, analytic applications, communication paths, locations, personnel, and online copies of the data. The physical computers must be located in widely separated locations, ideally in different parts of the country or across the world. Spreading out the physical hardware with many independent nodes greatly reduces the vulnerability of the warehouse to sabotage and single-point failures. Implementing the data warehouse simultaneously with diverse operating systems (such as Linux, Unix, and NT) greatly reduces the vulnerability of the warehouse to worms, social engineering attacks, and skilled hackers exploiting specific vulnerabilities.
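To illustrate what multiple online copies of the data buy you, here is a minimal sketch of a client that tries each warehouse node in turn and uses the first one that answers. The node names and the `fake_run_query` stand-in are assumptions made for the example; a real client would call its own database driver and would need to decide which errors justify moving on to the next node.

```python
def query_with_failover(nodes, run_query):
    """Try each node in order and return (node, result) from the first success.

    `run_query` is any callable that takes a node name and either returns
    a result or raises ConnectionError when the node cannot be reached.
    """
    errors = []
    for node in nodes:
        try:
            return node, run_query(node)
        except ConnectionError as exc:
            errors.append((node, str(exc)))
    raise ConnectionError(f"all nodes failed: {errors}")


def fake_run_query(node):
    # Stand-in for a real query; pretends the eastern site has been lost.
    if node == "warehouse-east":
        raise ConnectionError("site unreachable")
    return f"result from {node}"


node, result = query_with_failover(
    ["warehouse-east", "warehouse-west"], fake_run_query
)
print(node, result)
```

The point of the sketch is only that the loss of one site becomes a retry rather than an outage, provided the surviving node holds a current copy of the data.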

Although building and administering a profoundly distributed data warehouse sounds difficult, I have been arguing for many years that we all do that anyway! Very few of our enterprise data warehouses are centralized on a single, monolithic machine. Although there are a number of approaches to building distributed decision-support systems, in my books and columns I have described a complete view of a "data warehouse bus architecture" that relies on a framework of conformed dimensions and facts to implement a profoundly distributed system in the sense of this column.

Parallel communication paths. Even a distributed data warehouse implementation can be compromised if it depends on too few communication paths. Fortunately, the Internet is a robust communication network that is highly parallelized and continuously adapts itself to its own changing topology. My impression is that the architects of the Internet are very concerned about systemwide failures due to denial-of-service attacks and other intentional disruptions. Collapse of the overall Internet is probably not the biggest worry. The Internet is locally vulnerable if key switching centers (where high-performance Web servers attach directly to the Internet backbone) are attacked. Each local data warehouse team should have a plan for connecting to the Internet if the local switching center is compromised. Providing redundant multimode access paths such as dedicated lines and satellite links from your building to the Internet further reduces vulnerability.

Extended storage area networks (SANs). A SAN is typically a cluster of high-performance disk drives and backup devices connected together via very high-speed Fibre Channel technology. Rather than being a file server, this cluster of disk drives exposes a block-level interface to computers accessing the SAN that makes the drives appear to be connected to the backplane of each computer.

SANs offer at least three huge benefits to a centralized data warehouse. First, a single physical SAN can be 10 kilometers in extent. This means that disk drives, archive systems, and backup devices can be located in separate buildings on a fairly big campus. Second, backup and copying can be performed disk-to-disk at extraordinary speeds across the SAN. And third, because all the disks on a SAN are a shared resource for attached processors, you can configure multiple application systems to access the data in parallel. This design is especially compelling in a true read-only environment.

Daily backups to removable media taken to secure storage. We've known about this one for years, but now it's time to take all of this more seriously. No matter what other protections we put in place, nothing provides the bedrock security that offline and securely stored physical media provide. But before rushing into buying the latest high-density device, give considerable thought to how hard it will be to read the data from the storage media one, five, and even 10 years into the future.

Strategically placed packet filtering gateways. We need to isolate the key servers of our data warehouse so that they're not directly accessible from the local area networks used within our buildings. In a typical configuration, an application server composes queries that are passed to a separate database server. If the database server is isolated behind a packet filtering gateway, the database server can receive packets from the outside world only if they come from the trusted application server. Therefore, all other forms of access either are prohibited or must be locally connected to the database server behind the gateway. Consequently, DBAs with system privileges must have their terminals connected to this inner network, so that their administrative actions and passwords typed in the clear can't be detected by packet sniffers on the regular network in the building.
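The access rule described here can be written down in a few lines. The following is a minimal sketch only: the IP addresses, the inner network range, and the function names are all hypothetical, and a real gateway would filter on ports and protocols as well as source addresses.

```python
import ipaddress

# Hypothetical addresses chosen for illustration only.
APP_SERVER = ipaddress.ip_address("10.0.1.10")       # trusted application server
DB_SERVER = ipaddress.ip_address("10.0.2.5")         # isolated database server
INNER_NETWORK = ipaddress.ip_network("10.0.2.0/24")  # network behind the gateway


def gateway_forwards(src, dst):
    """The gateway passes a packet to the database server only if it
    comes from the trusted application server."""
    return (
        ipaddress.ip_address(dst) == DB_SERVER
        and ipaddress.ip_address(src) == APP_SERVER
    )


def can_reach_database(src):
    """A host reaches the database either from inside the inner network
    (for example a DBA terminal) or through the gateway."""
    if ipaddress.ip_address(src) in INNER_NETWORK:
        return True
    return gateway_forwards(src, str(DB_SERVER))


print(can_reach_database("10.0.1.10"))  # application server
print(can_reach_database("10.0.1.77"))  # ordinary desktop on the building LAN
print(can_reach_database("10.0.2.20"))  # DBA terminal on the inner network
```

Under these assumptions only the application server and machines physically attached to the inner network get through; every other host in the building is turned away at the gateway.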

Role-enabled bottleneck authentication and access. Data warehouses can be compromised if there are too many different ways to access them and if security is not centrally controlled. Note that I didn't say centrally located; rather, I said centrally controlled. An appropriate solution would be a Lightweight Directory Access Protocol (LDAP) server controlling all outside-the-gateway access to the data warehouse. The LDAP server allows all requesting users to be authenticated in a uniform way, regardless of whether they are inside the building or coming in over the Internet from a remote location. Once the user is authenticated, the directory server associates the user with a named role. The application server then makes the decision on a screen-by-screen basis as to whether the authenticated user's role entitles that user to see the information. As our data warehouses grow to thousands of users and hundreds of distinct roles, the advantages of this bottleneck architecture become significant.
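The two-step flow (authenticate once, then check the role against each screen) might be sketched as follows. This is not an LDAP client: an in-memory dictionary stands in for the directory server, and the user names, roles, screen names, and helper functions are all invented for the example. The unsalted hash is likewise a simplification and not a recommendation for storing real passwords.

```python
import hashlib
import hmac


def _digest(password):
    # Simplified stand-in for whatever credential check the directory performs.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical directory: one entry per user, each mapped to a named role.
DIRECTORY = {
    "jsmith": {"password_digest": _digest("tulip-42"), "role": "sales_analyst"},
    "mlee": {"password_digest": _digest("orchid-7"), "role": "finance_manager"},
}

# Hypothetical screen-level policy kept by the application server.
SCREEN_ROLES = {
    "order_status": {"sales_analyst", "finance_manager"},
    "customer_profitability": {"finance_manager"},
}


def authenticate(user, password):
    """Return the user's role if the credentials check out, else None."""
    entry = DIRECTORY.get(user)
    if entry is None:
        return None
    if not hmac.compare_digest(entry["password_digest"], _digest(password)):
        return None
    return entry["role"]


def may_view(role, screen):
    """Screen-by-screen decision made after authentication."""
    return role is not None and role in SCREEN_ROLES.get(screen, set())


role = authenticate("jsmith", "tulip-42")
print(role, may_view(role, "order_status"), may_view(role, "customer_profitability"))
```

Because every request passes through the same `authenticate` step, adding a user or changing a role happens in one place, which is the advantage of the bottleneck as the number of users and roles grows.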

There is much we can do to secure our data warehouses. In the past few years our data warehouses have become too critical to the operations of our organizations to remain as exposed as they have been. We have had the wake-up call.

I have written extensively on the aforementioned topics. I cover the design of distributed architectures and discuss packet filtering gateways and role-enabled security comprehensively in The Data Warehouse Lifecycle Toolkit (Wiley, 1998). I describe the application of SANs to data warehouses in my Intelligent Enterprise column "Adjust Your Thinking for SANs" (March 8, 2001).


Ralph Kimball coinvented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Webhouse Toolkit (Wiley, 2000). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.rkimball.com.