Simplifying Business Intelligence

by Steven Cox
  • Home
  • About
  • Contact

Data Warehouse Design Overview

Steven Cox

So after reading the last post, Data Warehouses vs Data Marts, let’s get one thing out of the way. For the purpose of this blog, we’re going to call our data storage layer a Data Warehouse, while following our design philosophy of solving for the customer — not the definition of what we’re calling it (See the Data WareMart section of the Data Warehouses vs Data Marts for details).

Now, onto our high level Data Warehouse Design overview. There are many different approaches to this topic, almost to the point of passionate debate of “Great Taste…Less Filling!” As with anything, your needs will depend on the specifics of the project. The design we’re talking about below has a number of advantages for the majority of projects.

Simplifying Business Intelligence Data Warehouse Architecture

Before we discuss the benefits of this design pattern, let’s understand the above diagram a bit.

  • DB/XML/EDI/Flat File: Mix and match the various data sources that are required to meet your customer’s requirements.
  • Level 1: Data source organized drop zone for data coming from the various data sources.
  • Level 2: Subject area organized data store used as the launch point for downstream systems (which includes our Data Warehouse and more).
  • DM/Cubes: DM = Dimensional Model. Dimensional models are a special database design which is tuned for reporting and analytical systems. Cubes are precalculated views of the dimensional model.
  • User Interface: The access point to the Data Warehouse for our customers.

Layers of Isolation

A close friend shared the following quote with me a few years ago:

The only thing constant in life is change. -François de la Rochefoucauld

I cannot stress this point enough. The business climate today is under ever increasing pressure by a pace of life that is moving faster than ever before. The result is continuous churn in products, services, and processes for a competitive edge. For the lucky teams building systems with tight deadlines and budgets, our designs must be flexible and adaptable to whatever is coming next. For traditional data warehouse designs, change can be a real problem. Using Layers of Isolation can ease the pain.Changes Road Sign

Layers of Isolation is accomplished through the organization and purpose of the data at the various stages of the Data Warehouse. Going back to the Level 1 definition, we know the data is source organized. This allows us to deal with new data sources, logic changes, or new functionality from our source systems without disturbing the rest of the downstream Data Warehouse. Once Level 1 changes are implemented, Level 2 and Dimensional Model changes can be subsequently implemented. The development effort can be executed in parallel for speed or sequentially to minimize system downtime.

Organized, Simplified, Standardized (OSS) Data Store

Level 2 is our organized, simplified, standardized data store. Traditionally Data Warehouses serve as a tool for reporting, but our approach is to also provide a data store for the company. In larger companies it is not uncommon to have multiple systems that do the same or similar functions, with no cross communication. The business does not care about disconnected IT systems, just a unified view of data. There is also more than just multiple systems. Maybe your business sells through a Direct and Retail channel and the business logic in the order and shipment systems is radically different between the two. Our goal is to standardize so that regardless of business logic, data source, terminology, language, currency, time zone, etc. we have a unified view of the company’s data. Think about the benefits of having a data store for all of our business critical information in one place. There will be a number of projects beyond our data warehouse that will benefit tremendously. Disorganized FoldersClarity is important, so let’s define Organized, Simplified, Standardized:

  • Organized: Bring together similar data into one place that is easily identifiable and joinable. Joinable is important if we want the ability to match Orders with Shipments for example.
  • Simplified: The source systems may use complicated or arbitrary column names. Simplified uses an easily identifiable meaning from the name. If Attribute14 is actually Delivery Date, well then, call it Delivery Date. Another simplification that happens is denormalization. Most transactional systems normalize data. It could be that order information is spread across 10 tables in the source systems. Our data store simplifies this by appropriately consolidating data together into fewer tables (optimally 1 or 2 in our Orders example).
  • Standardized: Different business processes and source systems may not use the same terminology. For example, one system might say Promised Delivery Date another may use Estimated Delivery Date. From a business process perspective, what both actually mean is the date we anticipate our customer receiving their order. We might just call this field Expected Delivery Date in Level 2 to standardize the meaning. There may also be cases when business logic will be applied to data. Level 2 is the place to do it once so data is standardized.

As a final benefit of OSS (Organized, Simplified, Standardized), we can do thorough quality assurance on a large chunk of the business logic. This helps us deliver consistent and tested data to downstream systems.

Development Speed

The layers of isolation naturally lead to a faster development cycle. Once the initial design of each layer is completed, developers can work in parallel to complete each section. In a traditional Data Warehouse design, there is typically only one level before the dimensional model, known as staging. In our design, you could say we have two staging environments. This is true and yes, there is additional development work required. We touched briefly on the Level 2 OSS benefits. If another project needs Orders, Level 2 is the place to go. No need to understand the 10 different orders systems in the company. This advantage significantly boosts development speed for downstream applications and more importantly, new Dimensional Models in our Data Warehouse.

Example Time

You have a company that sells widgets through retailers, along with phone, and online orders. Unfortunately, the IT systems used for each of these sales channels do not communicate with each other. A number of problems are created from this. There is no single view of customers, orders, shipments, revenue, marketing effectiveness and more. What is worse, the company is massively expanding internationally and setting up complete new IT systems in each country. The CEO wants a dashboard to understand the health of the business. In our Level 1 layer, we might have tables such as:

Can Phone

  • canada_phone_order_header
  • canada_phone_order_line
  • canada_online_orders
  • canada_online_customers
  • canada_retail_order_header
  • canada_retail_order_line
  • canada_customer
  • mexico_online_orders
  • mexico_retail_order_header
  • mexico_retail_order_line
  • mexico_customers
  • australia_phone_txn_orders
  • australia_customer

The above represents the various IT systems and their deployments to different geographies. You’ll notice the inconsistencies across systems and even within countries. Once in Level 2, we’ll have:

  • Orders
  • Customers

Look good? Within Level 2 we have merged all of the order and customer information into OSS Orders and Customers tables. Next time a new country is brought online, we just add the appropriate Level 1 tables and logic to translate to our OSS Level 2 Tables. All of that without any changes required from Level 2 to our Dimensional Model.

While this is a highly oversimplified example, hopefully it wets your appetite for the deeper discussions on the design layers. At this point, you probably have more questions than answers. Over the next few posts we’ll cover the various layers in detail.


Do you see the value of the design approach? What would you do differently? Please join in and share your perspectives!

Tune in next time for: Level 1 Design Details

Comments
No Comments »
Categories
Architecture, Data Warehouse
Tags
Architecture, Data Warehouse, technical
Comments rss Comments rss
Trackback Trackback

Data Warehouses vs Data Marts

Steven Cox

Within Business Intelligence, you may be familiar with the terms Data Warehouse and Data Mart. This is an often confused and debated topic. Let’s talk about the differences and when to use each. At the root, both are basically tables stored in a database. Where the differences begin are what those tables contain and how they are organized.Warehouse

Define: Data Warehouse

Definition of purpose:

A data warehouse contains generalized data encompassing multiple subject areas with deep granularity designed to meet the needs of a general audience.

That is a lot to take in, so let’s break it down.

Generalized Data

Generalized data means a broad selection of data. In many cases, you may only have 80% of the data required, leaving behind highly specialized data. Specialized data is where the data marts come to play, which we’ll talk about later. Generalized data is generic — a minimal set of business logic has been applied. We want to keep this data as flexible as possible for the general audience.

Multiple Subject Areas

Warehouses contain data from multiple systems or subject areas. For example, if your company sells widgets, your warehouse may contain data around sales, purchase orders, product details, shipments, and inventory. All of these subject areas can provide specific data for the business as a whole.

Deep Granularity

Granularity defines the level of details available. For example: every sales transaction is one granularity, while sales by day by product is another. In the latter example, all individual sales transactions are aggregated to a daily level, therefore the detailed transactions are not available. When we say “Deep Granularity”, deep refers to the lowest level of granularity across the board. In other words, every single sales transaction, all shipments, all waybill numbers, etc.

Designed to meet the needs of a general audience

So what is the general audience? In most cases this means the entire company, division, business unit, etc. While designing a Data Warehouse, there are a lot of design decisions to be made. To support a general audience, you will minimize the amount of business logic, aggregation, mapping, etc. Let’s use a sales example to illustrate. For a general audience, we may just provide basic sales data, quantity sold, sale price, customer, and product. This type of data is robust enough to help the general business understand sales. But what about finance? They may want to know far more details around freight costs and COGS. For this scenario, finance might need their own data mart to assist with their analysis.

Define: Data Mart

So if a Data Warehouse is general purpose, then where do specifics reside? The Data Mart of course! Definition of purpose:

A data mart is comprised of specialized data for one subject area with appropriate granularity designed to meet the needs of the intended audience.

Specialized Data for one subject area

Specialized data is the specific data required for the audience. If we are building a sales data mart, there would not be salary data from HR systems. A data mart’s focus is to solve a data analysis problem for one subject area.

Appropriate Granularity

So a data warehouse has “deep” granularity, but a data mart has “appropriate?” Confusing, yes. As with everything, there is no defined rule, but a case by case basis to follow. When talking to end users, granularity is a key area to understand the requirements around. It is possible end users will want all details down to the number of nuts and bolts on widgets, or just how many widgets fit into a box, or how many boxes fit on a pallet.

Designed to meet the needs of the intended audience

The intended audience is who will use this data mart, whether it be finance, the CEO, or report monkeys in the basement. A data mart’s purpose is to meet the requirements of a specific audience that the generalized data warehouse cannot solve. Let’s pick on our friends in finance again. It could be that finance has really wacky requirements, complex business logic, and custom views of the data. We have to understand what they’re solving for in order to effectively design the data mart.

Data WareMart? YES!Happy Customer

I’ve heard variations on data warehouse/mart, which is just a cute way of calling a pig a pig… data barns, data sheds, data box, data universes, the list goes on. This is what happens when architects get fancy. From an internal marketing perspective, if you have to call the technical side of your Business Intelligence solution something, use Data Warehouse. It is widely known and won’t require explanation.

Okay, so I’ve led you on through this entire post to this point. Time for the “Simplified” point of view: forget what you call it or what it is “supposed” to be. The best solutions built are designed with one and only focus: the customer. If you need specialized data combined with deep granularity, then do it. In order for Business Intelligence to impact the business, it must meet the needs of decision makers while being easy to use. The technical design of the database directly impacts both significantly.


Are you lost or fundamentally disagree? Share your opinions in the comments.

Tune in next week for: Data Warehouse Design Overview

Comments
No Comments »
Categories
Overview
Tags
Data Mart, Data Warehouse
Comments rss Comments rss
Trackback Trackback

Categories

  • Architecture
  • Data Warehouse
  • Overview

Recent Posts

  • Data Warehouse Design: Level 1
  • Data Warehouse Design Overview
  • Data Warehouses vs Data Marts
  • What is Business Intelligence?

Archives

  • September 2008
  • July 2008
  • June 2008

Book Recommendations

  • The Data Warehouse Toolkit
  • Web Analytics: An Hour a Day

Links

  • Ask Tom
  • BI Questions
  • Occam’s Razor by Avinash Kaushik
  • The Business Intelligence Blog
  • The Data Mining Blog
rss Comments rss valid xhtml 1.1 design by jide powered by Wordpress get firefox