The need for data archiving
Tables sizes in the system are growing continuously and over time they becomes very large. A lot of data from these tables is no longer needed as it may be inactive or logically deleted data. Archiving these records can reduce the size of the tables resulting in enhanced performance and saving of space.
Statistics show that 60% of all companies are storing inactive or unused data in the primary storage servers itself. Also 80% of stored data becomes unused after 90 days.
Magnetic tapes have no moving parts and power consumption for these devices is also much less. The power and cooling costs of disks(HDD and RAID) itself, is comparable to the cost of an archiving system archiving data to tapes. Tapes apart from being more environmentally friendly are also much cheaper than disks.
The archived data can also be readily available for data mining and data analysis purposes, something which is in very high demand in the current times considering the volume explosion of data in the last few years.
Some of the tables having a very high growth rate, which if not kept in check, will result in frequent addition of disks to meet with the continuous demand for space. Many a time purging this data is also not possible as there may be legal implications. Customers may want to inquire on data which is a few years old.
When making a data archiving plan, the following points need to be considered –
1. Identify the data that can be archived from business perspective. This can then be discussed with the development team to get a list of tables that are impacted by this.
2. Establish a clear relation between these tables.
3. Find out the legal implications of archiving the data.
4. Archiving the data to a different schema in a separate archiving tablespace will be a good strategy. Data from this tablespace can then be backed up to tapes and purged from the archiving tablespace.
5. Check for possibilities of the data being needed again when making the backup and purge plan.
6. Restoration. Create a plan for restoration of the archived data in case it is required again.
7. Restartability. This is a very important point to be considered. The archiving process may be a long running process. You may not want to run it in peak peak hours so as existing load on the servers will adversely impact the perfomance. For this the archiving system should have a way of stopping gracefully as and when required. It should also be able to resume from where it has stopped.
8. Ensure the sanctity of the data that is archived. All the required rows and columns from all related tables is archived and there is no loss of data.
9. If new columns are added to the tables, the design should account for these new columns in the archive tables as well. For existing records there should be routines which will be able to update the data of the new columns or default populate data to these new columns.
10. After the data is archived and deleted, maintenance activities will need to be performed on the tables from which data is archived. Some of these activities may be
i. Shrinking the tables to reclaim space.
ii. Gathering of statistics on the affected tables again so that the execution plans being used are optimal.
iii. Rebuilding of indexes.
Some of the common conditions on which data can be archived are
1. Data that is logically deleted and is no longer required.
2. Data that is older than a given date and is no longer be used. A fine example for this would be data from log tables
3. Records where the validity period has expired or the end data is older than current date.
4. Data from history tables and audit tables.
Advantages that can be gained by regular data archiving are
1. Savings in terms of cost of new hard disks. The archived data can be stored on tapes which are less expensive or on some old unused disks.
2. Improved performance of long running queries on large tables.
3. Data backup sizes will be smaller and backups will be faster.
4. Similarly restoration of backed up data will also be faster.
5. Maintenance activities like statistics gathering, index rebuilding will also happen faster.
Archiving can be a periodic scheduled activity like on weekends on a window of time at night when the load on the database is low. Or if the resources being consumed by the archive program are not too high it can be run as a continuous process as it is not a critical and time bound activity.
Another important point that can be considered is partitioning for performance and compression for space savings. With compression you can achieve a space reduction of 40 – 50% for each table compressed. Detailed discussion and benefits of these features of Oracle will be discussed in later articles.
Kindly refer to Oracle Data Archiving : A generic solution for a generic archiving solution for an Oracle database system.