<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Simplifying Business Intelligence &#187; Data Warehouse</title>
	<atom:link href="http://blog.simplifyingbi.net/category/data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.simplifyingbi.net</link>
	<description>by Steven Cox</description>
	<lastBuildDate>Thu, 30 Oct 2008 20:43:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Data Warehouse Design: Level 1</title>
		<link>http://blog.simplifyingbi.net/2008/09/data-warehouse-design-level-1/</link>
		<comments>http://blog.simplifyingbi.net/2008/09/data-warehouse-design-level-1/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 16:25:00 +0000</pubDate>
		<dc:creator>Steven Cox</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Data Warehouse]]></category>

		<guid isPermaLink="false">http://blog.simplifyingbi.net/?p=31</guid>
		<description><![CDATA[Level 1 is the entry point for data and the simplest area in the Data Warehouse. The concept of Level 1 will largely be explained through examples; however, there are a few concepts to keep in mind before we start:

Level 1 is for new, incoming data only. No history is maintained in Level 1. Once [...]]]></description>
			<content:encoded><![CDATA[<p>Level 1 is the entry point for data and the simplest area in the Data Warehouse. The concept of Level 1 will largely be explained through examples; however, there are a few concepts to keep in mind before we start:</p>
<ul>
<li>Level 1 is for new, incoming data only. No history is maintained in Level 1. Once the data is pulled into the Level 2 area, the data in Level 1 is truncated. This aligns with the purpose of Level 1: get data from the source systems as quickly and easily as possible.</li>
<li>The tables are organized by <em>data source</em>. Keeping the structure of Level 1 similar to the source systems simplifies adding new sources as well as troubleshooting and management.</li>
<li>To improve maintainability, the naming convention within Level 1 should stay close to the source. The exception: it is good practice to prefix the tables within Level 1 to identify the data source. You will see this concept in our examples below.</li>
</ul>
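<p>The three concepts above can be sketched in a few lines of Python using the standard library <code>sqlite3</code> module. This is only a minimal illustration, not a prescribed implementation: the <code>web</code> prefix, the column names, and the <code>level2_orders</code> table are assumptions made up for the sketch, and SQLite stands in for whatever database the warehouse really runs on.</p>

```python
import sqlite3

# Hypothetical source name, chosen only for illustration.
SOURCE_PREFIX = "web"


def level1_table(source_prefix: str, source_table: str) -> str:
    """Build a Level 1 table name: source prefix plus the source's own table name."""
    return f"{source_prefix}_{source_table}"


def load_level1(conn, table, rows):
    """Land new, incoming rows in Level 1 exactly as the source delivers them."""
    conn.executemany(f"INSERT INTO {table} (order_id, customer_id) VALUES (?, ?)", rows)


def pull_to_level2_and_truncate(conn, table):
    """Copy Level 1 rows onward, then empty Level 1 so no history remains there."""
    conn.execute(f"INSERT INTO level2_orders SELECT order_id, customer_id FROM {table}")
    # SQLite has no TRUNCATE statement; an unqualified DELETE plays that role here.
    conn.execute(f"DELETE FROM {table}")


conn = sqlite3.connect(":memory:")
web_orders = level1_table(SOURCE_PREFIX, "order_header")
# Assumed columns; the real source tables would carry many more.
conn.execute(f"CREATE TABLE {web_orders} (order_id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE level2_orders (order_id INTEGER, customer_id INTEGER)")

load_level1(conn, web_orders, [(1, 10), (2, 11)])
pull_to_level2_and_truncate(conn, web_orders)

level1_count = conn.execute(f"SELECT COUNT(*) FROM {web_orders}").fetchone()[0]
level2_count = conn.execute("SELECT COUNT(*) FROM level2_orders").fetchone()[0]
```

<p>After the pull, the Level 1 table is empty again and ready for the next batch, while the rows live on in the hypothetical Level 2 table.</p>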
<h2>Example Introduction</h2>
<p>The example used here will carry through the future posts on Level 2 and the Dimensional Model, so it is the foundation for those posts. The premise is fairly simple:</p>
<blockquote style="MARGIN-RIGHT: 0px" dir="ltr"><p>Our task is to build a Data Warehouse for Thingamajig Inc. which will provide analysis capabilities on orders and shipments. Thingamajig Inc. builds widgets and sells them through the web, sales team, phone, and resellers. All orders except resellers are shipped directly from our outsourced logistics partner. Resellers ship their own inventory. All of the order entry systems are separate systems. The Data Warehouse is the aggregation point for all order data in the company. Shipment data will come from our logistics partner and resellers.</p></blockquote>
<p>There are a number of challenges we will have to address in our design. However, the majority of challenges will be addressed in Level 2.</p>
<h2>Orders Design</h2>
<p>We are going to keep the example fairly simple for each order entry system. We&#8217;ll only include customer and product information. Obviously most order entry systems are far more complex, but we don&#8217;t want to get caught up in those details.</p>
<h3>Web Orders Design</h3>
<p>The Web Orders system is the newest system at Thingamajig Inc. There are tables for order header, order line, product, and customer. The orders contained in the Web system are our end customer orders.</p>
<p style="TEXT-ALIGN: left"><img style="DISPLAY: block; MARGIN-LEFT: auto; WIDTH: 452px; MARGIN-RIGHT: auto; HEIGHT: 466px; TEXT-ALIGN: center" title="Data Warehouse Design Level 1 Web Orders Schema" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/08/level1-www-schema2.png" alt="Data Warehouse Design Level 1 Web Orders Schema" width="452" height="466" /></p>
<h3>Phone Orders Design</h3>
<p>The Phone Orders system is the oldest in the company. This one includes tables for order header, order line, item, customer, and address. The orders contained in the Phone system are our end customer orders either taken over the phone or entered from the field sales team.</p>
<p><img style="DISPLAY: block; MARGIN-LEFT: auto; WIDTH: 398px; MARGIN-RIGHT: auto; HEIGHT: 463px; TEXT-ALIGN: center" title="Data Warehouse Design Level 1 Phone Orders Schema" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/08/level1-phone-schema.png" alt="Data Warehouse Design Level 1 Phone Orders Schema" width="398" height="463" /></p>
<h3>Reseller Orders Design</h3>
<p>Reseller Order data is a standardized text file that we receive from all of our partners. Luckily, Thingamajig Inc. was smart to require all resellers to use the same file format. The orders we receive from resellers are their orders for product, not end customer orders.</p>
<p><img style="DISPLAY: block; MARGIN-LEFT: auto; WIDTH: 175px; MARGIN-RIGHT: auto; HEIGHT: 203px; TEXT-ALIGN: center" title="Data Warehouse Design Level 1 Reseller Orders Schema" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/08/level1-reseller-schema.png" alt="Data Warehouse Design Level 1 Reseller Orders Schema" width="175" height="203" /></p>
<h3>Orders Design Notes</h3>
<ul>
<li>All of our sources for orders are structured differently.</li>
<li>The terminology used in each is different. For example, the web system calls our product a product, while the phone system refers to it as an item.</li>
<li>The Reseller order information is minimal. It does not contain the ship to address. That information is stored in a spreadsheet on the Reseller Manager&#8217;s computer.</li>
</ul>
<h2>Shipments Design</h2>
<p>Shipments are less complex in that we only have two sources for data: our resellers and our logistics partner.</p>
<h3>Reseller Shipments Design</h3>
<p>Just like our Reseller Orders, the data we receive is in a standardized flat file format. Thingamajig Inc. uses shipment information to recognize revenue, so it is critical that this data is accurate. The data we receive from resellers is fairly basic, as the resellers manage delivery tracking and similar details themselves.</p>
<p><img style="DISPLAY: block; MARGIN-LEFT: auto; WIDTH: 202px; MARGIN-RIGHT: auto; HEIGHT: 186px; TEXT-ALIGN: center" title="Data Warehouse Design Level 1 Reseller Shipments Schema" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/08/level1-reseller-ship-schema.png" alt="Data Warehouse Design Level 1 Reseller Shipments Schema" width="202" height="186" /></p>
<h3>Partner Shipments Design</h3>
<p>Our Partner shipments are provided by an electronic transfer directly to a database. There is more granularity in the data because we follow the tracking numbers to make sure our shipments are delivered to customers.</p>
<p><img style="DISPLAY: block; MARGIN-LEFT: auto; WIDTH: 212px; MARGIN-RIGHT: auto; HEIGHT: 254px; TEXT-ALIGN: center" title="Data Warehouse Design Level 1 Partner Shipments Schema" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/08/level1-partner-ship-schema.png" alt="Data Warehouse Design Level 1 Partner Shipments Schema" width="212" height="254" /></p>
<p>So there you have it. Designed for speed and maintainability, Level 1 is the simplest part of the Data Warehouse architecture. In our next post, which will discuss the Level 2 design, the ideas will come together&#8230; so stay tuned!</p>
<hr id="hr" /><p>What do you think? Do you have specific scenarios to discuss? Please join in and share your perspectives!</p>
<p><em><strong>Tune in next time for:</strong> Level 2 Design</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.simplifyingbi.net/2008/09/data-warehouse-design-level-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Warehouse Design Overview</title>
		<link>http://blog.simplifyingbi.net/2008/07/data-warehouse-design-overview/</link>
		<comments>http://blog.simplifyingbi.net/2008/07/data-warehouse-design-overview/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 05:45:00 +0000</pubDate>
		<dc:creator>Steven Cox</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[technical]]></category>

		<guid isPermaLink="false">http://blog.simplifyingbi.net/?p=14</guid>
		<description><![CDATA[So after reading the last post, Data Warehouses vs Data Marts, let&#8217;s get one thing out of the way. For the purpose of this blog, we&#8217;re going to call our data storage layer a Data Warehouse, while following our design philosophy of solving for the customer &#8212; not the definition of what we&#8217;re calling it [...]]]></description>
			<content:encoded><![CDATA[<p>So after reading the last post, <a href="http://blog.simplifyingbi.net/2008/06/data-warehouses-vs-data-marts/">Data Warehouses vs Data Marts</a>, let&#8217;s get one thing out of the way. For the purpose of this blog, we&#8217;re going to call our data storage layer a Data Warehouse, while following our design philosophy of solving for the customer &#8212; not the definition of what we&#8217;re calling it (see the Data WareMart section of the <a href="http://blog.simplifyingbi.net/2008/06/data-warehouses-vs-data-marts/">Data Warehouses vs Data Marts</a> post for details).</p>
<p>Now, onto our high level Data Warehouse Design overview. There are many different approaches to this topic, almost to the point of passionate debate of <em>&#8220;Great Taste&#8230;Less Filling!&#8221;</em> As with anything, your needs will depend on the specifics of the project. The design we&#8217;re talking about below has a number of advantages for the majority of projects.</p>
<p><img style="DISPLAY: block; MARGIN: 1em auto; WIDTH: 550px; HEIGHT: 224px; TEXT-ALIGN: center" title="Simplifying Business Intelligence Data Warehouse Architecture" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/07/bi-architecture2.png" alt="Simplifying Business Intelligence Data Warehouse Architecture" width="550" height="224" /></p>
<p>Before we discuss the benefits of this design pattern, let&#8217;s understand the above diagram a bit.</p>
<ul>
<li>DB/XML/EDI/Flat File: Mix and match the various data sources that are required to meet your customer&#8217;s requirements.</li>
<li>Level 1: Data source organized drop zone for data coming from the various data sources.</li>
<li>Level 2: Subject area organized data store used as the launch point for downstream systems (which include our Data Warehouse and more).</li>
<li>DM/Cubes: DM = Dimensional Model. A dimensional model is a database design tuned for reporting and analytical systems. Cubes are precalculated views of the dimensional model.</li>
<li>User Interface: The access point to the Data Warehouse for our customers.</li>
</ul>
<h2>Layers of Isolation</h2>
<p>A close friend shared the following quote with me a few years ago:</p>
<blockquote style="MARGIN-RIGHT: 0px" dir="ltr"><p>The only thing constant in life is change. <span style="FONT-SIZE: 10px">-François de la Rochefoucauld</span></p></blockquote>
<p>I cannot stress this point enough. Businesses today are under ever-increasing pressure from a pace of life that is moving faster than ever before. The result is continuous churn in products, services, and processes in pursuit of a competitive edge. For those of us lucky enough to build systems with tight deadlines and budgets, our designs must be flexible and adaptable to whatever is coming next. For traditional data warehouse designs, change can be a real problem. Using <em>Layers of Isolation</em> can ease the pain.<img style="DISPLAY: inline; FLOAT: left; MARGIN: 1em 1em 1em 0em; WIDTH: 248px; HEIGHT: 198px" title="Changes Road Sign" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/07/changesroadsign1.png" alt="Changes Road Sign" width="248" height="198" /></p>
<p>Layers of Isolation is accomplished through the organization and purpose of the data at the various stages of the Data Warehouse. Going back to the Level 1 definition, we know the data is source organized. This allows us to deal with new data sources, logic changes, or new functionality from our source systems without disturbing the rest of the downstream Data Warehouse. Once Level 1 changes are implemented, Level 2 and Dimensional Model changes can be subsequently implemented. The development effort can be executed in parallel for speed or sequentially to minimize system downtime.</p>
<h2>Organized, Simplified, Standardized (OSS) Data Store</h2>
<p>Level 2 is our organized, simplified, standardized data store. Traditionally, Data Warehouses serve as a tool for reporting, but our approach is to also provide a data store for the company. In larger companies it is not uncommon to have multiple systems that do the same or similar functions, with no cross communication. The business does not care about disconnected IT systems; it just wants a unified view of data. The problem also goes beyond multiple systems. Maybe your business sells through a Direct and a Retail channel, and the business logic in the order and shipment systems is radically different between the two. Our goal is to standardize so that regardless of business logic, data source, terminology, language, currency, time zone, etc. we have a unified view of the company&#8217;s data. Think about the benefits of having a data store for all of our business critical information in one place. There will be a number of projects beyond our data warehouse that will benefit tremendously. <img style="DISPLAY: inline; FLOAT: right; MARGIN: 1em 0em 1em 1em; WIDTH: 200px; HEIGHT: 233px" title="Disorganized Folders" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/07/folders.png" alt="Disorganized Folders" width="200" height="233" />Clarity is important, so let&#8217;s define Organized, Simplified, Standardized:</p>
<ul>
<li>Organized: Bring together similar data into one place that is easily identifiable and joinable. Joinable is important if we want the ability to match Orders with Shipments for example.</li>
<li>Simplified: The source systems may use complicated or arbitrary column names. A simplified name makes the meaning of the column easy to identify. If Attribute14 is actually Delivery Date, well then, call it Delivery Date. Another simplification that happens is <a href="http://en.wikipedia.org/wiki/Denormalization">denormalization</a>. Most transactional systems <a href="http://en.wikipedia.org/wiki/Database_normalization">normalize</a> data. It could be that order information is spread across 10 tables in the source systems. Our data store simplifies this by appropriately consolidating data into fewer tables (optimally 1 or 2 in our Orders example).</li>
<li>Standardized: Different business processes and source systems may not use the same terminology. For example, one system might say Promised Delivery Date while another may use Estimated Delivery Date. From a business process perspective, what both actually mean is the date we anticipate our customer receiving their order. We might just call this field Expected Delivery Date in Level 2 to standardize the meaning. There may also be cases where business logic needs to be applied to the data. Level 2 is the place to do it once so the data is standardized.</li>
</ul>
<p>As a final benefit of OSS (Organized, Simplified, Standardized), we can do thorough quality assurance on a large chunk of the business logic. This helps us deliver consistent and tested data to downstream systems.</p>
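<p>To make Simplified and Standardized a bit more concrete, here is a minimal Python sketch of the renaming step. The column names echo the examples above, but the <code>phone</code> and <code>web</code> source names, the assignment of columns to sources, and the sample rows are assumptions for illustration only.</p>

```python
# Hypothetical source names; the column names echo the examples in the text above.
COLUMN_MAPS = {
    "phone": {
        "Attribute14": "Delivery Date",
        "Promised Delivery Date": "Expected Delivery Date",
    },
    "web": {
        "Estimated Delivery Date": "Expected Delivery Date",
    },
}


def standardize(source: str, row: dict) -> dict:
    """Rename a source row's columns to their Level 2 names.

    Columns without a mapping keep their original name.
    """
    mapping = COLUMN_MAPS[source]
    return {mapping.get(column, column): value for column, value in row.items()}


# Two made-up rows that use different terms for the same business concept.
phone_row = standardize("phone", {"Promised Delivery Date": "2008-07-10", "Order Id": 7})
web_row = standardize("web", {"Estimated Delivery Date": "2008-07-12", "Order Id": 8})
```

<p>Both rows come out with the same Expected Delivery Date column, so downstream systems only ever need to learn one name.</p>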
<h2>Development Speed</h2>
<p>The <em>layers of isolation</em> naturally lead to a faster development cycle. Once the initial design of each layer is completed, developers can work in parallel to complete each section. In a traditional Data Warehouse design, there is typically only one level before the dimensional model, known as staging. In our design, you could say we have two staging environments. This is true and yes, there is additional development work required. We touched briefly on the Level 2 OSS benefits. If another project needs Orders, Level 2 is the place to go. No need to understand the 10 different order systems in the company. This advantage significantly boosts development speed for downstream applications <em>and</em>, more importantly, new Dimensional Models in our Data Warehouse.</p>
<h2>Example Time</h2>
<p>You have a company that sells widgets through retailers, along with phone and online orders. Unfortunately, the IT systems used for each of these sales channels do not communicate with each other. This creates a number of problems: there is no single view of customers, orders, shipments, revenue, marketing effectiveness, and more. What is worse, the company is expanding massively internationally and setting up completely new IT systems in each country. The CEO wants a dashboard to understand the health of the business. In our Level 1 layer, we might have tables such as:</p>
<p><img style="DISPLAY: inline; FLOAT: right; MARGIN: 1em 0em 1em 1em; WIDTH: 250px; HEIGHT: 125px" title="Can Phone" src="http://blog.simplifyingbi.net/wp-content/uploads/2008/07/canphone.png" alt="Can Phone" width="250" height="125" /></p>
<ul>
<li>canada_phone_order_header</li>
<li>canada_phone_order_line</li>
<li>canada_online_orders</li>
<li>canada_online_customers</li>
<li>canada_retail_order_header</li>
<li>canada_retail_order_line</li>
<li>canada_customer</li>
<li>mexico_online_orders</li>
<li>mexico_retail_order_header</li>
<li>mexico_retail_order_line</li>
<li>mexico_customers</li>
<li>australia_phone_txn_orders</li>
<li>australia_customer</li>
</ul>
<p>The above represents the various IT systems and their deployments to different geographies. You&#8217;ll notice the inconsistencies across systems and even within countries. Once in Level 2, we&#8217;ll have:</p>
<ul>
<li>Orders</li>
<li>Customers</li>
</ul>
<p>Look good? Within Level 2 we have merged all of the order and customer information into OSS Orders and Customers tables. Next time a new country is brought online, we just add the appropriate Level 1 tables and logic to translate to our OSS Level 2 Tables. All of that without any changes required from Level 2 to our Dimensional Model.</p>
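<p>As a rough sketch of that translation step, here is how two of the Level 1 tables above might be merged into the Orders table, again in Python with the standard library <code>sqlite3</code> module. The table names come from the list above; every column name is an assumption, since this overview does not define the columns.</p>

```python
import sqlite3

# Level 1 table names come from the list above; the columns are assumptions.
# Each entry is the translation query for one Level 1 table.
LEVEL1_ORDER_SOURCES = {
    "canada_online_orders":
        "SELECT order_no, cust_no, 'Canada', 'Online' FROM canada_online_orders",
    "mexico_online_orders":
        "SELECT id_pedido, id_cliente, 'Mexico', 'Online' FROM mexico_online_orders",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE canada_online_orders (order_no INTEGER, cust_no INTEGER)")
conn.execute("CREATE TABLE mexico_online_orders (id_pedido INTEGER, id_cliente INTEGER)")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, country TEXT, channel TEXT)"
)

# Made-up sample rows.
conn.executemany("INSERT INTO canada_online_orders VALUES (?, ?)", [(100, 1), (101, 2)])
conn.executemany("INSERT INTO mexico_online_orders VALUES (?, ?)", [(500, 9)])

# Bringing a new country online means adding an entry to the dictionary above;
# the Orders table itself stays unchanged.
for translate_sql in LEVEL1_ORDER_SOURCES.values():
    conn.execute(f"INSERT INTO Orders {translate_sql}")

orders = conn.execute(
    "SELECT order_id, customer_id, country, channel FROM Orders ORDER BY order_id"
).fetchall()
```

<p>Anything reading from Orders sees one consistent shape, no matter how many Level 1 tables feed it.</p>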
<p>While this is a highly oversimplified example, hopefully it whets your appetite for the deeper discussions on the design layers. At this point, you probably have more questions than answers. Over the next few posts we&#8217;ll cover the various layers in detail.</p>
<hr id="hr" /><p>Do you see the value of the design approach? What would you do differently? Please join in and share your perspectives!</p>
<p><em><strong>Tune in next time for:</strong> Level 1 Design Details</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.simplifyingbi.net/2008/07/data-warehouse-design-overview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


