15
Maintaining the Data Warehouse
This chapter discusses how to load and refresh a data warehouse, including:
Using Partitioning to Improve Data Warehouse Refresh
15-2 Oracle Database Data Warehousing Guide
2. Create indexes and add constraints on sales_01_2001. Again, the indexes and constraints on sales_01_2001 should be identical to the indexes and constraints on sales. Indexes can be built in parallel and should use the NOLOGGING and the COMPUTE STATISTICS options. For example:
CREATE BITMAP INDEX sales_01_2001_customer_id_bix ON sales_01_2001(customer_id)
TABLESPACE sales_idx NOLOGGING PARALLEL 8 COMPUTE STATISTICS;
Apply all constraints to the sales_01_2001 table that are present on the sales table. This includes referential integrity constraints. A typical constraint would be:
ALTER TABLE sales_01_2001 ADD CONSTRAINT sales_customer_id
FOREIGN KEY (customer_id) REFERENCES customer(customer_id) ENABLE NOVALIDATE;
If the partitioned table sales has a primary or unique key that is enforced with a global index structure, ensure that the corresponding constraint on sales_01_2001 (sales_pk_jan01 in this example) is validated without creating an index structure, as in the following:
ALTER TABLE sales_01_2001 ADD CONSTRAINT sales_pk_jan01 PRIMARY KEY (sales_transaction_id) DISABLE VALIDATE;
Creating the constraint with the ENABLE clause would create a unique index that does not match any local index structure of the partitioned table. The nonpartitioned table to be exchanged must not have any index structure that corresponds to an existing global index of the partitioned table; otherwise, the exchange command fails.
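One way to confirm that sales_01_2001 mirrors sales before the exchange is to compare their data dictionary entries. The following queries are a minimal sketch, assuming both tables are owned by the current user so that the USER_INDEXES and USER_CONSTRAINTS views apply:

```sql
-- Indexes on the partitioned table and on the table to be exchanged.
SELECT table_name, index_name, index_type, uniqueness, partitioned
FROM   user_indexes
WHERE  table_name IN ('SALES', 'SALES_01_2001')
ORDER  BY table_name, index_name;

-- Constraints on both tables, with their enabled and validated state.
SELECT table_name, constraint_name, constraint_type, status, validated
FROM   user_constraints
WHERE  table_name IN ('SALES', 'SALES_01_2001')
ORDER  BY table_name, constraint_type, constraint_name;
```

For a primary key created with DISABLE VALIDATE, the second query is expected to report STATUS as DISABLED and VALIDATED as VALIDATED, and the first query should list no unique index for that key on sales_01_2001.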
3. Add the sales_01_2001 table to the sales table.
In order to add this new data to the sales table, we need to do two things. First, we need to add a new partition to the sales table. We will use the ALTER TABLE ... ADD PARTITION statement. This will add an empty partition to the sales table:
ALTER TABLE sales ADD PARTITION sales_01_2001
VALUES LESS THAN (TO_DATE('01-FEB-2001', 'DD-MON-YYYY'));
Then, we can add our newly created table to this partition using the EXCHANGE PARTITION operation. This will exchange the new, empty partition with the newly loaded table.
ALTER TABLE sales EXCHANGE PARTITION sales_01_2001
WITH TABLE sales_01_2001 INCLUDING INDEXES WITHOUT VALIDATION UPDATE GLOBAL INDEXES;
The EXCHANGE operation will preserve the indexes and constraints that were already present on the sales_01_2001 table. For unique constraints (such as the unique constraint on sales_transaction_id), you can use the UPDATE GLOBAL INDEXES clause, as shown previously. This will automatically maintain your global index structures as part of the partition maintenance operation and keep them accessible throughout the whole process. If there were only foreign-key constraints, the exchange operation would be instantaneous.
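To check that the index structures of sales are still usable after the exchange, the data dictionary can be queried. This is a sketch under the assumptions that sales belongs to the current user and is not composite-partitioned:

```sql
-- Nonpartitioned (global) indexes report VALID or UNUSABLE here;
-- partitioned indexes report N/A and are covered by the second query.
SELECT index_name, partitioned, status
FROM   user_indexes
WHERE  table_name = 'SALES';

-- Index partitions of sales that are not usable.
-- No rows returned means every index partition is usable.
SELECT ip.index_name, ip.partition_name, ip.status
FROM   user_ind_partitions ip
JOIN   user_indexes i ON i.index_name = ip.index_name
WHERE  i.table_name = 'SALES'
AND    ip.status <> 'USABLE';
```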
The benefits of this partitioning technique are significant. First, the new data is loaded with minimal resource utilization. The new data is loaded into an entirely separate table, and the index processing and constraint processing are applied only to the new partition. If the sales table was 50 GB and had 12 partitions, then a new month's worth of data contains approximately 4 GB. Only the new month's worth of data needs to be indexed. None of the indexes on the remaining 46 GB of data needs to be modified at all. This partitioning scheme additionally ensures that the load processing time is directly proportional to the amount of new data being loaded, not to the total size of the sales table.
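The size figures in this example (50 GB across 12 partitions, or roughly 50 / 12 ≈ 4 GB per month) can be compared against an actual table by listing the space allocated to each partition. A minimal sketch, assuming sales is owned by the current user:

```sql
-- Allocated space for each partition of the sales table, in GB.
SELECT partition_name,
       ROUND(bytes / POWER(1024, 3), 2) AS size_gb
FROM   user_segments
WHERE  segment_name = 'SALES'
AND    segment_type = 'TABLE PARTITION'
ORDER  BY partition_name;
```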
Second, the new data is loaded with minimal impact on concurrent queries. All of the operations associated with data loading are occurring on a separate sales_01_2001 table. Therefore, none of the existing data or indexes of the sales table is affected during this data refresh process. The sales table and its indexes remain entirely untouched throughout this refresh process.
Third, if any global indexes exist, they are incrementally maintained as part of the exchange command. This maintenance does not affect the availability of the existing global index structures.
The exchange operation can be viewed as a publishing mechanism. Until the data warehouse administrator exchanges the sales_01_2001 table into the sales table, end users cannot see the new data. Once the exchange has occurred, then any end user query accessing the sales table will immediately be able to see the sales_01_2001 data.
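The publishing effect can be observed by counting rows on both sides of the exchange. A small sketch, using the partition and table names from this example:

```sql
-- Rows visible to end users through the partition of sales.
-- After ADD PARTITION and before the exchange, this returns 0.
SELECT COUNT(*) FROM sales PARTITION (sales_01_2001);

-- Rows in the separately loaded table.
-- After the exchange, this table holds the former (empty) partition segment.
SELECT COUNT(*) FROM sales_01_2001;
```

Running both queries before and after the exchange should show the two counts swapped.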
Partitioning is useful not only for adding new data but also for removing and archiving data. Many data warehouses maintain a rolling window of data. For example, the data warehouse stores the most recent 36 months of sales data. Just as a new partition can be added to the sales table (as described earlier), an old partition can be quickly (and independently) removed from the sales table. The first two benefits described earlier (reduced resource utilization and minimal end-user impact) are just as pertinent to removing a partition as they are to adding a partition.
Removing data from a partitioned table does not necessarily mean that the old data is physically deleted from the database. There are two alternatives for removing old data from a partitioned table. First, you can physically delete all data from the database by dropping the partition containing the old data, thus freeing the allocated space:
ALTER TABLE sales DROP PARTITION sales_01_1998;
Second, you can exchange the old partition with an empty table of the same structure; this empty table is created in the same way as in steps 1 and 2 of the load process.
Assuming the new empty table stub is named sales_archive_01_1998, the following SQL statement will empty partition sales_01_1998:
ALTER TABLE sales EXCHANGE PARTITION sales_01_1998
WITH TABLE sales_archive_01_1998 INCLUDING INDEXES WITHOUT VALIDATION UPDATE GLOBAL INDEXES;
Note that the old data still exists as the exchanged, nonpartitioned table sales_archive_01_1998.
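The empty stub table has to exist before the exchange. One possible way to create it, shown here as an assumption rather than as the method used in step 1, is a CREATE TABLE ... AS SELECT that copies the column definitions of sales but no rows:

```sql
-- Empty table with the same column definitions as sales; the predicate
-- 1 = 0 is never true, so no rows are copied.
CREATE TABLE sales_archive_01_1998
  AS SELECT * FROM sales WHERE 1 = 0;

-- After the exchange, the partition is empty and the archive table
-- holds the old rows.
SELECT COUNT(*) FROM sales PARTITION (sales_01_1998);
SELECT COUNT(*) FROM sales_archive_01_1998;
```

A table created this way does not inherit the indexes, primary key, or foreign key constraints of sales, so they still need to be added as in step 2 before an exchange that uses INCLUDING INDEXES.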
If the partitioned table was set up so that every partition is stored in a separate tablespace, you can archive (or transport) this table using Oracle Database's transportable tablespace framework before dropping the actual data (the tablespace).
See "Transportation Using Transportable Tablespaces" on page 15-4 for further details regarding transportable tablespaces.
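Before a tablespace is transported, it is typically checked for self-containment and made read-only. The following is a sketch only: the tablespace name sales_ts_01_1998 is hypothetical, the session is assumed to have the privileges needed to run DBMS_TTS, and the exchanged table sales_archive_01_1998 is assumed to be the only object that stores data in that tablespace.

```sql
-- Check that the (hypothetical) tablespace is self-contained,
-- including referential integrity constraints.
BEGIN
  DBMS_TTS.TRANSPORT_SET_CHECK('SALES_TS_01_1998', TRUE);
END;
/

-- Any rows returned here describe violations to resolve first.
SELECT * FROM transport_set_violations;

-- Make the tablespace read-only before its metadata is exported
-- and its data files are copied.
ALTER TABLESPACE sales_ts_01_1998 READ ONLY;
```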
In some situations, you might not want to drop the old data immediately, but keep it as part of the partitioned table; although the data is no longer of main interest, there are still potential queries accessing this old, read-only data. You can use Oracle's data compression to minimize the space usage of the old data. This discussion assumes that at least one compressed partition is already part of the partitioned table. See Chapter 3, "Physical Design in Data Warehouses" for a generic discussion of table compression and Oracle Database VLDB and Partitioning Guide for partitioning and table compression.