2 Setting Up Your Data Warehouse System

Oracle Corporation and its affiliates disclaim any responsibility for any damages caused by the use of this software in unsafe applications. This software and documentation may provide access to or information about third party content, products and services.

Preface

Audience

Documentation Accessibility

To reach Oracle Support Services, use a telecommunications relay service (TRS) to call Oracle Support at. An Oracle Support Services technician handles technical issues and provides customer support according to the Oracle service request process.

About the dd Utility

In your test, you should include all the storage devices you intend to include for your database storage.

About the Orion Utility

Setting Up a Database for a Data Warehouse

How Should I Set the Memory Management Parameters?

Because the target memory initialization parameter is dynamic, you can change the target memory size at any time without changing the. The maximum memory size serves as an upper limit so that you do not accidentally set the target memory size too high.

What Other Initialization Parameter Settings Are Important?

The database then tunes to the target memory size, redistributing memory as needed between the system global area (SGA) and the overall program global area (PGA). Since certain SGA components either cannot shrink easily or must remain at a minimum size, the database also prevents you from setting the target memory size too low.

Accessing Oracle Warehouse Builder

For Windows, select Start>Programs>Oracle>Warehouse Builder, and then select Design Center. Follow the prompts and accept the default settings in the Repository Assistant to create a workspace and assign a user as the workspace owner.

Installing the Oracle Warehouse Builder Demonstration

For Windows, select Start>Programs>Oracle>Warehouse Builder, and then select OMB*Plus. Start the Design Center and log in as the workspace owner with the credentials you specified in the owbdemoinit.tcl script.

Identifying Data Sources and Importing Metadata

Overview of Data Sources

General Steps for Importing Metadata from Sources

About Workspaces, Projects, and other Devices in OWB

Indicate which files should be imported as described in "Launch the Import Metadata Wizard" on page 3-4. Import the metadata for both flat files as described in “Importing Flat File Data” on page 3-5.

Specifying Locations for the Flat Files

Specify where the flat files are located, as described in "Specifying locations for the flat files" on page 3-3. After defining the resources by importing their metadata, you then design the target schema as described in Chapter 4, "Defining Warehouses in Oracle Warehouse Builder".

Creating Modules in the Project

Launching the Import Metadata Wizard

Using the Flat File Sample Wizard

Importing the Flat File Data

Defining Warehouses in Oracle Warehouse Builder

General Steps for Defining a Relational Target Warehouse

Identifying the Warehouse Target Schema

About Flat File Sources in OWB

Simply put, SQL*Loader handles large amounts of data better, while SQL allows for a wider range of complex joins and transformations. However, to use SQL, you must define an external table in the repository module, as described in "Exercise: Adding External Tables to the Target Module" on page 4-3.

Exercise: Adding External Tables to the Target Module

About Dimensions

Hierarchies: A logical structure that uses ordered levels or a set of data values (for a value-based hierarchy) as a way to organize data.

Exercise: Understanding Dimensions

About Levels

Defining Level Attributes

Defining Hierarchies

Dimension Roles

Level Relationships

Dimension Example

Control Rows

To get the actual number of rows in a dimension, count the number of rows by including a WHERE clause that excludes the NULL rows. For example, to get a count for Products, count the number of rows including a WHERE clause to exclude NULL rows in Product.

Implementing a Dimension

Star Schema

Binding

You can also select the dimension node on the canvas and choose Bind from the Object menu. Create the implementation objects (tables or views) that you will use to store the dimension data. Map the attributes in each level of the dimension to the columns that store their data.

About Cubes

Defining a Cube

Cube Measures

Cube Dimensionality

For dimensions that use a relational implementation, a dimension role for each dimension to indicate what role the dimension reference performs in the cube.

Cube Example

Implementing a Cube

Relational Implementation of a Cube

For example, you create the SALES cube that references the TIMES and PRODUCTS dimensions and performs automatic binding for the cube. Alternatively, you can use an existing database table or view to store the cube data. Map the cube's measures and dimension references to the columns that store the cube's data.

Loading Data into Your Data Warehouse

Defining ETL Logic

About Mappings and Operators

Summary of Steps for Defining Mappings

Creating a Mapping

Warehouse Builder saves the mapping definition and inserts its name into the Project Navigator. Warehouse Builder opens the Mapping Editor for the mapping and displays the name of the mapping in the title bar. In the Project Navigator, locate the repository target module in the Databases folder and then in the Oracle Database folder.

Types of Operators

Adding Operators

Adding Operators that Bind to Workspace Objects

Create Unbound Operator with No Attributes

Select from Existing Workspace Object and Bind

Editing Operators

Connecting Operators

Connecting Attributes

Connecting Groups

You can use the mapping editor with an unbound table operator to quickly create staging area tables. Use the Add Table Operator dialog box to select and bind the source table operator in the mapping. The mapping should now look like Figure 5-4 with one source table and one staging area table with no attributes.

Setting Mapping Properties

Target Load Order

To change the loading order, select a target and use the shuttle buttons on the right to move the target up or down in the list.

Reset to Default

Setting Operator, Group, and Attribute Properties

Synchronizing Operators and Workspace Objects

Synchronizing An Operator

By default, Warehouse Builder selects the option to synchronize your selected operator with the corresponding object in the workspace. In this step, you also specify Synchronize from a workspace object to an operator or select the option Synchronize from an operator to a workspace object.

Synchronizing From a Workspace Object to an Operator

Prototype mappings using tables: When working in the design environment, you can choose to design the ETL logic using tables. For production, you may want to use the mapping to access other workspace object types such as views, materialized views, or cubes.

Synchronizing Operators based on Workspace Objects

Synchronizing from an Operator to a Workspace Object

Synchronization from an operator does not affect the dependent relationship between other operators and workspace objects.

Deploying to Target Schemas and Executing ETL Logic

About Deployment

What is a Control Center?

Configuring the Physical Details of Deployment

Deployment Actions

The Deployment Process

Deploying Objects

Starting ETL Jobs

Viewing the Data

Reporting on a Data Warehouse

SQL for Reporting and Analysis

Use of SQL Analytic Capabilities to Answer Business Queries

You can take advantage of the advanced SQL and PL/SQL capabilities that Oracle Database offers to translate business queries into SQL.

How to Add Totals to Reports Using the ROLLUP Function

When to Use the ROLLUP Function

This query is useful, but you may want to see totals for different categories in the same report.

How to Separate Totals at Different Levels Using the CUBE Function

When to Use the CUBE Function

How to Add Subtotals Using the GROUPING Function

When to Use the GROUPING Function

How to Combine Aggregates Using the GROUPING SETS Function

When to Use the GROUPING SETS Function

Instead of adding a separate query to get the totals per channel class, you can use the GROUPING SETS function as illustrated in the example below.

How to Calculate Rankings Using the RANK Function

When to Use the RANK Function

How to Calculate Relative Contributions to a Total

How to Perform Interrow Calculations with Window Functions

How to Calculate a Moving Average Using a Window Function

Use of Partition Outer Join to Handle Sparse Data

The result of a partition outer join is a UNION operation of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join.

When to Use Partition Outer Join

Use of the WITH Clause to Simplify Business Queries

When to Use the WITH Clause

Managing a Data Warehouse

Refreshing a Data Warehouse

About Refreshing Your Data Warehouse

Using Rolling Windows to Offload Data

Optimizing Data Warehouse Operations

Avoiding System Overload

Monitoring System Performance

Monitoring Parallel Execution Performance

Monitoring I/O

Using Database Resource Manager

Optimizing the Use of Indexes and Materialized Views

Optimizing Storage Requirements

Using Data Compression to Improve Storage

Eliminating Performance Bottlenecks

Verifying That SQL Runs Efficiently

Analyzing Optimizer Statistics

Analyzing an Execution Plan

The first execution plan shows dynamic pruning, using the KEY values for Pstart and Pstop, respectively. Dynamic pruning means that the database has to figure out which partition or partitions to access at run time. In the case of static pruning, the database knows which partition or partitions to access at the time of parsing, leading to more efficient execution.

Using Hints to Improve Data Warehouse Performance

In that case, use hints to instruct the optimizer to use the optimal execution plan.

Using Advisors to Verify SQL Performance

Improving Performance by Minimizing Resource Consumption

Improving Performance: Partitioning

Improving Performance: Partition Pruning

Partition pruning significantly reduces the amount of data retrieved from disk and reduces processing time, improving query performance and resource utilization. An example of dynamic pruning is using operators or functions in the WHERE condition. Partition pruning affects the statistics of the objects where pruning will occur and will therefore also affect the execution plan of a statement.

Improving Performance: Partitionwise Joins

If you partition the index and table on different columns (using a globally partitioned index), partition pruning also eliminates index partitions even when the partitions of the underlying table cannot be eliminated. An example scenario for static pruning would be an SQL statement containing a WHERE condition with a constant literal on the partition key column.

Improving Performance: Query Rewrite and Materialized Views

Improving Performance: Indexes

Improving Performance: Compression

Improving Performance: DBMS_COMPRESSION Package

Improving Performance: table_compress Clause of CREATE TABLE and ALTER TABLE

Using Resources Optimally

Optimizing Performance with Parallel Execution

How Parallel Execution Works

Setting the Degree of Parallelism

About Wait Events

Backing up and Recovering a Data Warehouse

How Should I Handle Backup and Recovery for a Data Warehouse?

Strategies and Best Practices for Backup and Recovery

Best Practice A: Use ARCHIVELOG Mode

Archived redo logs can be sent and applied to the standby physical database, which is an exact replica of the primary database. The user can only back up the database while it is completely shut down after a clean shutdown. Usually, the only option for media recovery is to restore the entire database, which will lose all transactions since the last backup.

Is Downtime Acceptable?

Best Practice B: Use RMAN

Best Practice C: Use Read-Only Tablespaces

Best Practice D: Plan for NOLOGGING Operations

Additionally, subsequent data operations on which a nologging operation occurred also cannot be recovered even if those operations did not use nologging mode. The presence of NOLOGGING operations should be considered when designing the backup and recovery strategy. There are two approaches to backup and recovery in the presence of NOLOGGING operations: ETL or incremental backup.

Extraction, Transformation, and Loading

An example of implementing this approach is to back up the data warehouse every weekend and then save the files needed to support the ETL process for each night. Essentially, the data warehouse administrator is gaining better performance in the ETL process through logging operations, at a slightly more complex price and. A weakness of this approach is that the burden is on the data warehouse administrator to track all relevant changes that have occurred in the data warehouse.

Incremental Backup

Best Practice E: Not All Tablespaces are Equally Important

For example, sales data in a data warehouse can be critical and in the event of a recovery, this data must be online as soon as possible. But in the same data warehouse, a table that stores clickstream data from a company's website may be much less critical. In this scenario, the tablespaces that contain sales data should be backed up frequently, while the tablespaces that contain clickstream data should only be backed up once every week or two.

Securing a Data Warehouse

Overview of Data Warehouse Security

Why is Security Necessary for a Data Warehouse?

Using Roles and Privileges for Data Warehouse Security

Using Virtual Private Database in Data Warehouses

How Virtual Private Database Works

Overview of Oracle Label Security

How Oracle Label Security Works

Oracle Label Security mediates access to rows in database tables based on a label contained in the row, a label associated with each database session, and Oracle Label Security privileges assigned to the session. It provides access mediation on an application table after a user has been granted the standard database system and object privileges. If the user executes a SELECT statement on the table, Oracle Label Security will evaluate the selected rows and determine whether the user can access them based on the privileges and access labels assigned to the user.

How Data Warehouses Benefit from Labels

Overview of Fine-Grained Auditing in Data Warehouses

Overview of Transparent Data Encryption in Data Warehouses

In general, most organizations do not have the time or expertise to modify existing applications to call encryption routines. In other words, applications can use the same syntax to insert data into the application table, and Oracle Database will automatically encrypt the data before writing it to disk. On subsequent SELECT operations, the data will be transparently decrypted so that the application will continue to function normally.

Index