How to purge old analytics data in OPDK?

Not applicable
 
0 1 673
1 REPLY 1

Not applicable

The purge script only purges fact tables but most dashboard reports use aggregate tables and data in those does not get purged. Having said this, you should not be able to run a custom report for a time frame for which you have purged data.

There are some manual steps that can be taken to purge the aggregate tables.

Run the below query to show the object sizes.

SELECT table_name, pg_size_pretty(table_size) AS table_size, pg_size_pretty(indexes_size) AS indexes_size, pg_size_pretty(total_size) AS total_size FROM ( SELECT table_name, pg_table_size(table_name) AS table_size, pg_indexes_size(table_name) AS indexes_size, pg_total_relation_size(table_name) AS total_size FROM ( SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name FROM information_schema.tables ) AS all_tables ORDER BY total_size DESC ) AS pretty_sizes;

Below are the suggested steps (please make sure you have a backup of your DB before carrying them out, and do this during a time with low traffic as this require traffic to be queued while the steps are being carried out):

1. Stop the postgres-server and qppid-server processes on both AX servers

2. On Master run the following query to copy data for the last 60 days (I am assuming the data need for the last 60 days, if you need more then adjust the query)

CREATE TABLE analytics.agg_user_agent_backup AS SELECT * FROM analytics.agg_user_agent WHERE timestamp >= now() - interval '2 months';

3. Validate New Table [ Count of below queries should match ]

select count(1) from analytics.agg_user_agent WHERE timestamp >= now() - interval '2 months';

select count(1) from analytics.agg_user_agent_backup; truncate analytics.agg_user_agent

4. TRUNCATE TABLE analytics.agg_user_agent;

5. Restart postgres-server on local pg

6. Drop Indices:

DROP INDEX analytics.agg_ua_agentfamily;

DROP INDEX analytics.agg_ua_agenttype; DROP INDEX analytics.agg_ua_agentversion; DROP INDEX analytics.agg_ua_devicecategory; DROP INDEX analytics.agg_ua_osversion; DROP INDEX analytics.agg_ua_osfamily;

7. Insert Data Back

INSERT INTO analytics.agg_user_agent SELECT * FROM analytics.agg_user_agent_backup;

8. Validate Agg Table [ Counts should match or analytics.agg_user_agent may be greater as PGA is running ]

SELECT MIN(timestamp), MAX(timestamp), COUNT(1) FROM analytics.agg_user_agent;

SELECT MIN(timestamp), MAX(timestamp), COUNT(1) FROM analytics.agg_user_agent_backup;

9. Drop Temp Table

DROP TABLE analytics.agg_user_agent_backup;

10. Start qpid-server processes.

11. Check if all queries work in UI, and disk space usage is reduced.

For the other agg tables, you should be able to follow similar instructions, replacing the table name as appropriate.

Once again, I must stress to take a database backup and also test in a non-production environment first.