Data Warehouse

Why Use a Data Warehouse?

Scenario 1 - Reporting:
  • ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore.
  • The Sales Manager wants quarterly sales report.
  • Each branch has a separate operational system.

Solution 1 - Reporting:
  • Extract sales information from each database.
  • Store the information in a common repository at a single site.
  • Generate automated reports for each branch 


Scenario 2 - Dashboards:

  • One Stop Shopping Super Market has huge operational database. 
  • Whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.



Solution 2 - Dashboards:

  • Extract data needed for analysis from operational database.
  • Store it in warehouse.
  • Refresh warehouse at regular interval so that it contains up to date information for analysis.
  • Warehouse will contain data with historical perspective.
  • Create dashboards for Executives.





Scenario 3 - Analytics:


  • Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct decisions.

Solution 3 - Analytics:


  • Improve the quality of data before loading it into the warehouse.
  • Perform data cleaning and transformation before loading the data.
  • Use query analysis tools to support adhoc queries and decision-making.



What is a Data Warehouse?

Bill Inmon's Definition:

A data warehouse is
  • Subject-oriented
  • Integrated
  • Time-variant
  • Non-volatile
collection of data in support of management’s decision making process.


Subject-oriented means?
  • Data warehouse is organized around subjects such as sales,product,customer.
  • It focuses on modeling and analysis of data for decision makers.
  • Excludes data not useful in decision support process.

            

Integration means?

  • Data Warehouse is constructed by integrating multiple heterogeneous sources.
  • Data Pre-processing are applied to ensure consistency.

  • In terms of data.
  1. Encoding structures.
  2. Measurement of attributes
  3. Physical attribute of Data
  4. Naming conventions. 
  5. Data type format 



Time Variant means?

  • Provides information from historical perspective e.g. past 5-10 years
  • Every key structure contains either implicitly or explicitly an element of time.

Non Volatile means?
  • Data once recorded cannot be updated.
  • Data warehouse requires two operations in data accessing.

  1. Initial loading of data
  2. Access of data
Data Warehouse Architecture



  • Data Warehouse server
  • almost always a relational/dimensional DBMS, rarely flat files
  • OLAP servers
  • to support and operate on multi-dimensional data structures

  • Clients
  • Query and reporting tools
  • Analysis tools
  • Data mining tools.
Data Warehouse Schemas
  1. Star Schema.
  2. Fact Constellation Schema.
  3. Snowflake Schema.

Star Schema

  •          A single, large and central fact table and one table for each dimension.
  •          Every fact points to one tuple in each of the dimensions and has additional attributes.
  •          Does not capture hierarchies directly.



Snow Flake Schema
  •        Variant of star schema model.
  •        A single, large and central fact table and one or more tables for each dimension.
  •        Dimension tables are normalized i.e. split dimension table data into additional tables.



Fact Constellation
  •        Multiple fact tables share dimension tables.
  •        This schema is viewed as collection of stars hence called galaxy schema or fact constellation.
  •        Sophisticated application requires such schema.


Building Data Warehouse


  •        Data Selection
  •        Data Pre-processing
          -Fill missing values
          -Remove inconsistency
  •        Data Transformation & Integration
  •        Data Loading
        -Data in warehouse is stored in form of fact tables and dimension tables.


Need for Data Warehouse

  1.        Industry has huge amount of operational data.
  2.        Knowledge worker wants to turn this data into useful information.
  3.        This information is used by them to support strategic decision making .
  4.        It is a platform for consolidated historical data for analysis.
  5.        It stores data of good quality so that knowledge worker can make correct decisions.
  6.        From business perspective
  •        it is latest marketing weapon 
  •         helps to keep customers by learning more  about their needs .
  •         valuable tool in today’s competitive fast evolving world.

No comments: