Postgres Data Purge

Would like some details on what data gets store in DB so that data retention can be made on a selective basis.

Solved Solved
0 1 2,249
1 ACCEPTED SOLUTION

The apigee prostrges database contains various tables for every scope which was onboarded where scope means an environment for a particular org.
So if you have created two orgs with two environments each in a particular Apigee installation, you have four scopes which means four sets of tables which contain data for each of the scopes.

The database tables can be roughly categorized into two sets of tables, fact tables, and aggregate data tables.
The fact tables contain the raw data, i.e. one row of data for each request that was successfully logged containing ALL fields that we log. This means that the fact tables are by far the largest tabes.

The way the fact tables are organized is that for each scope there is one parent fact table, e.g. analytics."org1.env1.fact"
This fact table has child fact tables, one of which will be created, and used for the data of one day so three days after onboarding there will be three child tables, e.g. analytics."org1.env1.fact_234", "org1.env1.fact_654", "org1.env1.fact_734".

The aggregate tables, contain certain aggregated data where aggregation happens once every minute. The agg_api table for example contains aggregate data related to api traffic. the aggregate tables are populated based on data in the fact tables, and contain one record for all traffic that occurred during a one minute period. They also only contain a small number of fields which are relevant to the aggregation.

When you run Analytics reports, most of the canned reports use aggregate data for performance reasons, and the custom reports use fact tables.

When you run the apigee purge script, the child fact tables which fall within the time period that you use will be dropped fro the database. In your cases, it sounds like all child fact tables older than seven days will be dropped.

The aggregate tables are not dropped from the database which means you will have aggregate data for more than 7 days on the dashboard, and canned reports.

View solution in original post

1 REPLY 1

The apigee prostrges database contains various tables for every scope which was onboarded where scope means an environment for a particular org.
So if you have created two orgs with two environments each in a particular Apigee installation, you have four scopes which means four sets of tables which contain data for each of the scopes.

The database tables can be roughly categorized into two sets of tables, fact tables, and aggregate data tables.
The fact tables contain the raw data, i.e. one row of data for each request that was successfully logged containing ALL fields that we log. This means that the fact tables are by far the largest tabes.

The way the fact tables are organized is that for each scope there is one parent fact table, e.g. analytics."org1.env1.fact"
This fact table has child fact tables, one of which will be created, and used for the data of one day so three days after onboarding there will be three child tables, e.g. analytics."org1.env1.fact_234", "org1.env1.fact_654", "org1.env1.fact_734".

The aggregate tables, contain certain aggregated data where aggregation happens once every minute. The agg_api table for example contains aggregate data related to api traffic. the aggregate tables are populated based on data in the fact tables, and contain one record for all traffic that occurred during a one minute period. They also only contain a small number of fields which are relevant to the aggregation.

When you run Analytics reports, most of the canned reports use aggregate data for performance reasons, and the custom reports use fact tables.

When you run the apigee purge script, the child fact tables which fall within the time period that you use will be dropped fro the database. In your cases, it sounds like all child fact tables older than seven days will be dropped.

The aggregate tables are not dropped from the database which means you will have aggregate data for more than 7 days on the dashboard, and canned reports.