Share This
Scroll Down
Categories
//The Concept of Data Warehouse

The Concept of Data Warehouse

 

Data Warehouse term usually pop-up with the trends of Database, Data mining, knowledge or Artificial intelligence; in fact they are related in does indicate to the same domain; so what is Data Warehouse ?

 

 

Data Warehouse

Data Warehouse is a collection of data used to support the decision making process. It is another database with a high volume of data capacity which it does have a concise, integrated and summarized data.

Features

That is not all, a few features must be exist within the database to be a Data Warehouse and they are :
  1. Subject-Oriented: meaningful data which is specific, simple and subjected, it does focus on the modelling and analysis for the decision making.
  2. Integrated: Data Warehouse is a collection of integrated multiple and heterogeneous data sources with a consistent data format; it goes through some techniques called Data cleaning and integration to assure the consistency of the format, encodings, units and names.
  3. Time Variant: Significant meaning time where it does have a historical records back to several years beside each record must have a time stamp, useful for constructing trends and future insights.
  4. Non-volatile: The entered data must be constant and prevent any changes later to remain as an archive, it must allow only initial loading and accessing to the data.

OLTP vs OLAP

The IT systems in general can either be On-line Transaction Processing (OLTP) or On-line Analysis Processing (OLAP), they have significant changes which it makes the different.

OLTP

On-line Transaction Processing it is the type of systems that interact with the database in daily basis to perform basic functions (INSERT, UPDATE, DELETE) where it does require data integrity and detailed schema to work with, below are some of the key factors to distinguish between them:
  • Function: day-to-day operations
  • Source of Data: original data (operational)
  • DB Design: ERD, application-oriented
  • Data: isolated and current updated data
  • Queries: Simple and Short
  • Access to Records: few
  • Users: large
  • DB Size: less than Gigabytes

OLAP

On-line Analysis Processing the main characteristics of the Data Warehouse where it does used for Data mining and facts finding from aggregated and historical records. mostly the data stored in a multi-dimensional form (Cubic) and it can be break down with respective of time, mostly can indicate for OLAP as extractor for the data to another data, below are some of the key factors to compare with OLTP:
  • Function: Planning, Problem solving and decision support
  • Source of Data: multi-sources (many OLTP databases)
  • DB Design: Star schema, Snowflake schema and subject-oriented
  • Data: multi-dimensional, integrated, concise and historical data
  • Queries: Huge and complex
  • Access to Records: Very large
  • Users: few
  • DB Size: Up to Terrabytes
The structure of OLTP and OLAP

Real life example

Amazon website consist of both OLTP and OLAP, the OLTP databases contain general information about the products, the customers, the suppliers and the transactions; the OLTP databases make some plans for the trends and reports of the interaction with website like :
  • Best period to buy electronics products is during Summer holiday then Black Friday
  • Most of the European males have interest in Brand shirts and western novels
  • The average time for shopping product finding for the girls is 11 product before switching to another search key
Trending products for similar interest
These information can be precise and help to arrange for offers, promotions, to pop-up some advertisements and interest of degree for the customer.

 

After having some searches the OLAP shows similar products for my degree

Why Separated from Database ?

  • Higher performance: having separate types is increasing the performance since the Data Warehouse is OLAP and the degree of data it contains is very huge and big so retrieving from it for daily purpose will generate overheads with low performance.
  • Variety of Functions: Data Warehouse does serve the generating reports, extracting knowledge, finding facts and making trends; beside that the format of data in DW been through techniques and modulation before saving them.
  • Security: Not all the users allowed to access to archives or extracted information; some enterprises may want to encapsulate some data and keep it hidden.

Conceptual Models

Data Warehouse has different conceptual model that Databases with respective of dimension and measures; the dimension is indicator for the degree of data like (Location, Time, Supplier, etc..) while the measures do indicate for the level of depth (n).
  1. Star schema: diagram resembles a star form consisting of one or more fact table in the centre and surrounded by dimension tables; uses JOIN to interleave the data and come up with a resolution of data. it does have a direct mapping with a high optimization.
    Star Schema
  2. Snowflake schema: Similar to Star schema but in a normalized form where it does eliminate the duplicates; less used because of higher complexity and lower performance.
    Snowflake Schema

Fact Table

Centroid table consist of metrics, dimensions and facts of the business process containing quantitative information without descriptive information. it does contain huge amount of records as a history of the enterprise.

OLAP Typical functions

  • Roll-up: Breaking down the structure by going lower in the hierarchy to provide more detailed information.
  • Roll-down: Generalize the information and go upper in the hierarchy.
The drilling process (Drilling up)

Types of Data Warehouses

  1. Enterprise Warehouse: The collection of data about the entire enterprise
  2. Data Mart: Simple warehouse where it is subset from the Enterprise warehouse subjected for single matter, single scope and specific group of users; the assemble of Data marts give us the Enterprise warehouse and it is dependent.
Data warehouse collaborated with Data marts

ETL Process

ETL process is the mechanism used to Extract, Transform and Load the data from the warehouse to the Data marts, it does filter and pick up only the relevant data and it may associate with external data marts before the delivery to the end users or clients.
Will try to elaborate more in next blogs about Data Mining and Data Warehousing, please stay tuned.

Resources

  • 809 views
  • 0 Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

© Xernel 2018-2021 / All rights reserved.

Get in Touch
Close