Relational DBMS (OLD) - Advanced Database Technologies

 CBSE Class 12 Informatics Practices

Revision Notes
Chapter - 3
Advanced Database Technologies


DATA WAREHOUSE AND DATA-MINING

Definition
The term data warehouse was coined with the definition of Inmon: "A warehouse is a subject- oriented, integrated, time variant and non-volatile collection of data in support of management's decision making process". 

Data Warehouse is a central place where data is stored from different data sources and applications.The term Data Warehouse was first invented by Bill Inmom in 1990.

Characteristics of DW
• 
Subject-oriented: Means that all relevant data about a subject is gathered and stored as a single set in a useful format. Information is presented according to specific subjects or areas of interest. For example, “sales” can be a particular subject.
• Time-variant: Means that the data warehouse contains a history of the subject, as well as current information. It may be long-term data from five to ten years in contrast to the 30 to 60 day time. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.

• Non-volatile: Means stable information, Information is consistent; Data in the database is never over-written or deleted once committed. Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
• Integrated: Stored in a globally acceptable fashion with consistent naming conventions, measurements, encoding structures, and physical attributes, though underlying operational systems store the data differently. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Data Warehouse Architecture:

Add Image content-page-backlog/1239/


• Warehouse database server:
Which is almost always a relational DBMS; rarely flat files.

• OLAP servers:- Which may either be a ROLAP or MOLAP
- Relational OLAP (ROLAP):  ROLAP servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data, ROLAP uses relational or extended-relational DBMS.
- Multidimensional OLAP (MOLAP):  MOLAP uses array-based multidimensional storage engines for multidimensional views of data.

• Clients:-The Users or the client of Data warehouses are various Query and reporting tools, Analysis tools and Data mining tools (e.g., trend analysis, prediction)
Processors in Datawarehouse server
- MPP
- Massively parallel processing, a computer configuration that is able to use hundreds or thousands of CPUs simultaneously.
- SMP
- Symmetric multi-processing is a computer configuration where many CPUs share a common operating system, main memory and disks. They can work on different parts of a problem at the same time.

OLAP Vs OLTP

Firstly, OLTP stands for Online Transaction Processing, while OLAP stands for Online Analytical Processing. In an OLTP system, there are a large number of short online transactions such as INSERT, UPDATE, and DELETE.

Whereas, in an OLTP system, an effective measure is the processing time of short transactions and is very less. It controls data integrity in multi-access environments.

 

OLTP

Data Warehouse (DW)

Type of Users

Clerk, IT Professional

Knowledge worker

Purpose

Day to day operations

Decision support

DB Model

Application-oriented (E-R based)

Subject-oriented (Star, snowflake)

Data

Current, Isolated

Historical, Consolidated

View

Detailed, Flat relational

Summarized, Multidimensional

Usage

Structured, Repetitive

Ad hoc

Unit of work

Short, Simple transaction

Complex query

Access Operations

Read/write

Read Mostly

No. of Users

Tens

Millions

Data base size

Thousands, 100 MB-GB

Hundreds, 100GB-TB

Steps involved in creating a Datawarehouse
• Data extraction
• Data cleaning, also called data cleansing or scrubbing,
• Data transformation
• Convert from legacy/host format to warehouse format
• Load
• Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
• Refresh
• Propagate updates from sources to the warehouse

Advantages
• Data warehouses enhance end-user access to a wide variety of data.
• Decision support system users can obtain specified trend reports, e.g. the item with the most sales in a particular area within the last two years.
• Data warehouses can be a significant enabler of commercial business applications.

Data Mining
• Definition: Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data.The information or knowledge extracted so can be used for any of the following applications −

  • Market Analysis
  • Fraud Detection
  • Customer Retention
  • Production Control
  • Science Exploration

> Techniques used in Data Mining: -
Artificial neural networks Genetic algorithms Decision trees Nearest neighbor method Rule induction
Evolution of Datamining

Evolutionary Step

Enabling Technologies

Characteristics

Data Collection (1960s)

Computers, tapes, disks

Retrospective, static data delivery

Data Access (1980s)

Relational databases (RDBMS), (SQL), ODBC

Retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)

On-line analytic processing (OLAP), multidimensional databases, data warehouses

Retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)

Advanced algorithms, multiprocessor comp, massive databases

Prospective, proactive information delivery