ETL Concept:
It is a process of extracting the data and transforming into required business format via loading into DWH.
It is a process of extracting the data and transforming into required business format via loading into DWH.
ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development efforts and requires the skills of business analysts, database designers, and application developers. It is not a one time event as new data is added to the Data Warehouse periodically – monthly, daily, hourly. Because ETL is an integral, on-going, and recurring part of a data warehouse
o Automated
o Well documented
o Easily changeable
o Well documented
o Easily changeable
There are 2 types of ETL’s used in implementing data acquisition.
Code based ETL:
An ETL application can be developed using some programming languages such as SQL PL/SQL.
Example:
o Oracle SQL Loader
o SAS Based
o SAS Access
o Tera data Load
o Oracle SQL Loader
o SAS Based
o SAS Access
o Tera data Load
GUI based ETL :
An ETL application can be designed using simple graphically user interface point and click techniques.
o Informatica
o Data stage
o AB Initio
o Data Manager
o Data services
o SSIS
o Data stage
o AB Initio
o Data Manager
o Data services
o SSIS
Data Extraction:
It is a process of reading the data from multiple operational source system.
The following are the types of operation source systems.
o Oracle
o Sql server
o DB-2
o Sybase
o Informix
It is a process of reading the data from multiple operational source system.
The following are the types of operation source systems.
o Oracle
o Sql server
o DB-2
o Sybase
o Informix
ERP Sources:
o SAP
o People soft
o C-Bell
o SAP
o People soft
o C-Bell
Legacy sources:
o Mainframe
o Cobol files
o Mainframe
o Cobol files
File sources:
o Flat Files
o XML files
o Flat Files
o XML files
Other sources:
o Excel sheets
o MS-Access
o Excel sheets
o MS-Access
Data Transformation:
It is a process of converting the data and clinzing the data in the staging area. Staging area is a temporary memory or buffer where the data transformation activities take place. The following one the various types of data transformation activities take place
o Data Merging
o Data clinzing
o Data scrubbing
o Data aggregation
It is a process of converting the data and clinzing the data in the staging area. Staging area is a temporary memory or buffer where the data transformation activities take place. The following one the various types of data transformation activities take place
o Data Merging
o Data clinzing
o Data scrubbing
o Data aggregation
Data Merging:
It is process of integrating the data from multiple operational source system.
There are 2 types of data merge operations.
o Horizontal Merging(Join)
o Vertical Merging(Union)
It is process of integrating the data from multiple operational source system.
There are 2 types of data merge operations.
o Horizontal Merging(Join)
o Vertical Merging(Union)
Data Clinzing:
It’s a process of changing inconsistencies and inaccuracies; or a process of removing unwanted data (filtering)
It’s a process of changing inconsistencies and inaccuracies; or a process of removing unwanted data (filtering)
Data Scrubbing:
It is a process of deriving new data definitions.
It is a process of deriving new data definitions.
Data Aggregation:
It’s a process of calculating the summaries using an aggregate function called Sum ()
It’s a process of calculating the summaries using an aggregate function called Sum ()
Data Loading:
It is a process of inserting data into a target system. there is 2 types of data loads.
It is a process of inserting data into a target system. there is 2 types of data loads.
Initial Load or full Load:
It’s a process of inserting the data into an empty the data into an empty target tables very first time. All the required data loads into empty target tables.
It’s a process of inserting the data into an empty the data into an empty target tables very first time. All the required data loads into empty target tables.
Incremental data load or delta load:
It’s a process of loading only new record or any changes records which takes place after initial load
It’s a process of loading only new record or any changes records which takes place after initial load
An ETL Use-Case
In order to demonstrate this, we’ll use a use-case that is similar in concept. In our ETL development example, the human resources department uses a recruiting tool written using a dBase- or FoxPro-like data structure. But the employee data is stored in Microsoft SQL Server.
In order to demonstrate this, we’ll use a use-case that is similar in concept. In our ETL development example, the human resources department uses a recruiting tool written using a dBase- or FoxPro-like data structure. But the employee data is stored in Microsoft SQL Server.
For reference, here is the new_empl.dbf and using an XSLT transform that reads a table and displays it as HTML.
Our steps will be then, to Extract, Transform, and Load
• The extract step will take the data from the dBASE III file and convert it into a more usable format – XML
Our steps will be then, to Extract, Transform, and Load
• The extract step will take the data from the dBASE III file and convert it into a more usable format – XML
• The transform step will change the date format into standard ISO dates, split the name into first and last names, and assign the appropriate manager based on whether the employee is being assigned to inside sales or the external sales force.
• The load step will take the resulting file and send it to SQL Server.
• As a side-effect, it would be nice to get a report of data loaded.
Credit goes to the below website(s) :
This comment has been removed by the author.
ReplyDeleteRegards
ReplyDeleteSridevi Koduru (Senior Oracle Apps Trainer Oracleappstechnical.com)
Please Contact for One to One Online Training on Oracle Apps Technical, Financials, SCM, SQL, PL/SQL, D2K at sridevikoduru@oracleappstechnical.com | +91 - 9581017828.
good info provided here thanks a lot oracle training in chennai
ReplyDeleteThanks a lot. It is useful to be well written, clear and concise
ReplyDeleteETL Testing Online
ETL Testing Training
selenium trainings
ReplyDeletejava training
mulesoft training
etl testing training