Simplifying Business Intelligence

by Steven Cox
  • Home
  • About
  • Contact

Data Warehouses vs Data Marts

Steven Cox

Within Business Intelligence, you may be familiar with the terms Data Warehouse and Data Mart. This is an often confused and debated topic. Let’s talk about the differences and when to use each. At the root, both are basically tables stored in a database. Where the differences begin are what those tables contain and how they are organized.Warehouse

Define: Data Warehouse

Definition of purpose:

A data warehouse contains generalized data encompassing multiple subject areas with deep granularity designed to meet the needs of a general audience.

That is a lot to take in, so let’s break it down.

Generalized Data

Generalized data means a broad selection of data. In many cases, you may only have 80% of the data required, leaving behind highly specialized data. Specialized data is where the data marts come to play, which we’ll talk about later. Generalized data is generic — a minimal set of business logic has been applied. We want to keep this data as flexible as possible for the general audience.

Multiple Subject Areas

Warehouses contain data from multiple systems or subject areas. For example, if your company sells widgets, your warehouse may contain data around sales, purchase orders, product details, shipments, and inventory. All of these subject areas can provide specific data for the business as a whole.

Deep Granularity

Granularity defines the level of details available. For example: every sales transaction is one granularity, while sales by day by product is another. In the latter example, all individual sales transactions are aggregated to a daily level, therefore the detailed transactions are not available. When we say “Deep Granularity”, deep refers to the lowest level of granularity across the board. In other words, every single sales transaction, all shipments, all waybill numbers, etc.

Designed to meet the needs of a general audience

So what is the general audience? In most cases this means the entire company, division, business unit, etc. While designing a Data Warehouse, there are a lot of design decisions to be made. To support a general audience, you will minimize the amount of business logic, aggregation, mapping, etc. Let’s use a sales example to illustrate. For a general audience, we may just provide basic sales data, quantity sold, sale price, customer, and product. This type of data is robust enough to help the general business understand sales. But what about finance? They may want to know far more details around freight costs and COGS. For this scenario, finance might need their own data mart to assist with their analysis.

Define: Data Mart

So if a Data Warehouse is general purpose, then where do specifics reside? The Data Mart of course! Definition of purpose:

A data mart is comprised of specialized data for one subject area with appropriate granularity designed to meet the needs of the intended audience.

Specialized Data for one subject area

Specialized data is the specific data required for the audience. If we are building a sales data mart, there would not be salary data from HR systems. A data mart’s focus is to solve a data analysis problem for one subject area.

Appropriate Granularity

So a data warehouse has “deep” granularity, but a data mart has “appropriate?” Confusing, yes. As with everything, there is no defined rule, but a case by case basis to follow. When talking to end users, granularity is a key area to understand the requirements around. It is possible end users will want all details down to the number of nuts and bolts on widgets, or just how many widgets fit into a box, or how many boxes fit on a pallet.

Designed to meet the needs of the intended audience

The intended audience is who will use this data mart, whether it be finance, the CEO, or report monkeys in the basement. A data mart’s purpose is to meet the requirements of a specific audience that the generalized data warehouse cannot solve. Let’s pick on our friends in finance again. It could be that finance has really wacky requirements, complex business logic, and custom views of the data. We have to understand what they’re solving for in order to effectively design the data mart.

Data WareMart? YES!Happy Customer

I’ve heard variations on data warehouse/mart, which is just a cute way of calling a pig a pig… data barns, data sheds, data box, data universes, the list goes on. This is what happens when architects get fancy. From an internal marketing perspective, if you have to call the technical side of your Business Intelligence solution something, use Data Warehouse. It is widely known and won’t require explanation.

Okay, so I’ve led you on through this entire post to this point. Time for the “Simplified” point of view: forget what you call it or what it is “supposed” to be. The best solutions built are designed with one and only focus: the customer. If you need specialized data combined with deep granularity, then do it. In order for Business Intelligence to impact the business, it must meet the needs of decision makers while being easy to use. The technical design of the database directly impacts both significantly.


Are you lost or fundamentally disagree? Share your opinions in the comments.

Tune in next week for: Data Warehouse Design Overview

Comments
No Comments »
Categories
Overview
Tags
Data Mart, Data Warehouse
Comments rss Comments rss
Trackback Trackback

What is Business Intelligence?

Steven Cox

Business Intelligence (BI) is a hot topic. But what does it really mean? This maiden post is the origin on our journey together to find answers. To start off right, let’s begin with my definition of Business Intelligence, which is the foundation of this blog:

Business Intelligence is the convergence of technology and analysis to enable effective decision making.

In the traditional sense, Business Intelligence describes technology. We must take the definition further… technology alone cannot solve the one and only purpose of Business Intelligence: enabling effective decision making. Let’s discuss the definition piece by piece.

Convergence

A computer can be programmed to make decisions based on a set of rules. What happens when a scenario comes along where no rules are written? This is where intelligence comes to play, particularly human intelligence. The human brain is an amazing device which has the ability to process numerous different inputs, resulting in an outcome.

Unfortunately, the brain has limitations. For example, if I had 1,000,000 visitors to this blog every day and wanted to understand what content is most popular, my brain could not digest that much data at once. Technology to the rescue! A database can process massive amounts of data with ease.

Ultimately, convergence is required to solve the limitations of technology and our brains. Together, the goal of Business Intelligence is possible.

Technology

The technology side of Business Intelligence is often comprised of multiple layers. Depending on what your BI goals are, you may add or remove layers. In future posts, I’ll deep dive into different layers. But let us start with the basics.

Data Sources are typically the transactional systems used to run your business. This can include ERP systems, web server logs, manufacturing systems, etc. In some cases, it could be data collected manually on a piece of paper. Keep in mind however, at some point all data must reside in a system where it can be accessed by the Data Movement layer.

Data Movement is the beginning of a robust technical solution. By moving data out of your data sources, you are isolating the performance impact on the critical systems used to run your business. There is another key: you’re not only moving data, but also reorganizing it in a fashion that makes retrieving data for reporting much easier and faster. Data movement is also referred to as ETL (Extract Transform Load).

Data Storage . You have to store the data you moved out of your source system somewhere right? Of course you do! Besides minimizing the impact on your transactional systems, moving data to a new storage location provides the opportunity for further performance tuning for your data queries. A database configured for a transactional system is radically different than a database configured for BI.UPS Corvette In a transactional system, the goal is to retrieve 1 record as fast as possible. Meanwhile, a BI database will acquire millions of records, summarize, then return a concise result. I’m a fan of analogies, so here you go: Imagine if UPS delivered packages with a Corvette. They could deliver 1 or 2 packages really fast, but they would have to drive the Corvette back and forth to the distribution center so many times, it would not be efficient for the hundreds of packages delivered in a given region. So, a slower, yet big truck can hold many packages is more efficient (although not nearly as fun for the delivery driver! :) ).

Data Presentation is the user interface to your data, which provides the capability to retrieve data from data storage and process it into the output the user is looking for. The output comes in many forms, which includes reports, dashboards, RSS feeds, desktop widgets, mobile apps, etc.

Analysis

All too often reports are simply “data dumps” and not effective information to make decisions with. That is the role of analysis. Data is not useful on its own, so it is important for human intelligence to play a role in transforming data into decision enabling information. Cake Ingredients

Business Analysis is the process of taking data and applying human intelligence to create information. You might wonder the difference between data and information. Data is like raw ingredients, egg, flour, sugar. Information is all of the raw ingredients mixed and baked into a yummy birthday cake. While you could eat the raw ingredients, the birthday cake tastes much better! The same is true for data and information. Data can be used on its own, but information is far more useful. Information is raw data combined with knowledge of the business climate and processes to produce actionable information and recommendations.

Enable Effective Decision Making

One of the most important aspects of business are the decisions which drive execution. Business Intelligence is nothing more than an enabling function. It does not make decisions, people do. But, by utilizing the outputs of BI, decision makers can absorb key metrics, apply tribal knowledge and gut feel to result in effective decisions to drive their business forward.


So there you have it… Business Intelligence in a nutshell. How would you define Business Intelligence? Please join in and share your perspectives!

Tune in next week for: Data Warehouses vs. Data Marts

Comments
No Comments »
Categories
Overview
Tags
Overview
Comments rss Comments rss
Trackback Trackback

Categories

  • Architecture
  • Data Warehouse
  • Overview

Recent Posts

  • Data Warehouse Design: Level 1
  • Data Warehouse Design Overview
  • Data Warehouses vs Data Marts
  • What is Business Intelligence?

Archives

  • September 2008
  • July 2008
  • June 2008

Book Recommendations

  • The Data Warehouse Toolkit
  • Web Analytics: An Hour a Day

Links

  • Ask Tom
  • BI Questions
  • Occam’s Razor by Avinash Kaushik
  • The Business Intelligence Blog
  • The Data Mining Blog
rss Comments rss valid xhtml 1.1 design by jide powered by Wordpress get firefox