Apigee Private Cloud 15.07 - How To Clean Up Postgres Disk Space

If you need to clean up Apigee analytics data to free disk space, here are the recommended steps to reclaim that space quickly without having to run a VACUUM FULL. The advantage of these steps is that you do not have to stop runtime API traffic, and the analytics services only need a brief pause. These steps have you truncate your Apigee analytics tables based on the number of days of data you would like to retain.

Preliminary items to consider:

1. If you are running out of disk space, we recommend adding disk space immediately.

2. Determine how many days of analytics data you would like to retain.

3. You may want to mount a temporary disk, as we will need to copy data out to tab-separated files for the dates you plan to retain. Otherwise, you can copy it to your local disk or any other disk that has space.

4. Here is a psql query you can use to check for the largest analytics tables:

SELECT relname AS "Table",
       pg_size_pretty(pg_total_relation_size(relid)) AS "Size",
       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size"
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;


Steps to Clean Up Postgres Disk Space

Assume you would like to retain the last five days of data and that today's date is 2/2/2017.

1. Use the psql "copy" command with a SELECT over each daily date range to store each day's analytics data into its own date-stamped file, for the last 5 days.

copy (select * from ONLY analytics."org.env.fact" where client_received_start_timestamp between '2017-02-01' and '2017-02-02') to '/tmp/analytics-2017-02-01.tsv';

The above copies that day's data (2/1 to 2/2) to a tab-separated file. Run the command again for each of the next 4 days' worth of data, for example:

copy (select * from ONLY analytics."org.env.fact" where
client_received_start_timestamp between '2017-01-31' and '2017-02-01')
to '/tmp/analytics-2017-01-31.tsv';

NOTE: replace 'org' and 'env' with the names of your organization and environment


Run similar copy commands for the remaining dates, creating files under the same date-based naming convention, for each organization and environment combination whose last 5 days of data you would like to retain.
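The per-day exports above can be scripted so each day gets its own file. This is a sketch only: the date range, the "org.env" names, and the psql connection flags mentioned in the comment are placeholders for your environment.

```shell
# Emit one copy statement per retained day ("org.env" is a placeholder).
# Review the output, then pipe it into psql, e.g.:  ... | psql -U apigee apigee
for day in 2017-01-29 2017-01-30 2017-01-31 2017-02-01; do
  next=$(date -d "$day + 1 day" +%F)
  echo "copy (select * from ONLY analytics.\"org.env.fact\" where client_received_start_timestamp between '$day' and '$next') to '/tmp/analytics-$day.tsv';"
done
```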

2. Find the child tables in analytics.childfactables for the dates you want to retain. Use the query below to map each date to the respective child fact table that its data will need to be copied into.

select * , to_timestamp(starttime/1000) as d from analytics.childfactables where customer='org~env' order by d desc; 

NOTE: replace 'org' and 'env' with the names of your organization and environment

Repeat the same query for each org~env combination to find the child tables which you will be inserting data into.
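The starttime column in analytics.childfactables holds epoch milliseconds, which is why the query divides by 1000 before calling to_timestamp. To sanity-check a value by hand (the starttime below is illustrative, not from your data):

```shell
# Convert a childfactables starttime (epoch milliseconds) to a UTC date.
starttime=1485907200000          # example value only
date -u -d "@$((starttime / 1000))" +%F
# prints 2017-02-01
```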

3. Using the 'copy' command, load each date-based file into the child fact table for that date, which you found in step 2:

COPY analytics."org.env.fact_###" FROM '/tmp/analytics-2017-02-01.tsv'; 

This copy command reinserts the data into the respective child fact table by its date. Repeat this for each daily tsv file you created, inserting each into its respective child table.
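If you have several daily files, the reloads can be scripted too. This is a dry-run sketch: the child-table suffixes (fact_699, fact_698) are placeholders for whatever names your step 2 query actually returned, and the psql user/database flags are assumptions about a default install.

```shell
# Print one COPY per (date, child table) pair; remove the leading "echo"
# to actually run them. Table names below are placeholders.
while read day table; do
  echo psql -U apigee apigee -c "COPY analytics.\"$table\" FROM '/tmp/analytics-$day.tsv';"
done <<'EOF'
2017-02-01 org.env.fact_699
2017-01-31 org.env.fact_698
EOF
```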

4. At this point, temporarily stop the qpid-agents (apigee-qpid-server) on all of your qpid server nodes; this is the qpid java process. Stopping the qpid-agents ensures that data does not flow into postgres while you complete the next step of truncating the parent fact tables.

/opt/apigee4/bin/apigee-service qpid-server stop


5. In psql, issue this command to reclaim the disk space from the parent fact tables.

truncate table ONLY analytics."org.env.fact";

NOTE: replace 'org' and 'env' with the names of your organization and environment



6. Run the commit command in psql:

commit;



7. Repeat steps 5 and 6 for the parent fact tables of any other organization and environment combinations from which you are trying to reclaim space. If you have multiple organizations and environments, you will have other tables named analytics."<org>.<env>.fact" which may also be large. Item 4 in the preliminary section above shows a query you can use to check this.
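With several org/env combinations, the truncate statements can be generated the same way. Again a dry-run sketch: the table names and psql flags below are placeholders; remove the leading "echo" only after reviewing the output.

```shell
# Print one truncate per parent fact table; the org/env names below are
# placeholders for your own combinations.
for t in 'org1.prod.fact' 'org2.test.fact'; do
  echo psql -U apigee apigee -c "truncate table ONLY analytics.\"$t\"; commit;"
done
```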



8. Restart the postgres database.

/opt/apigee4/bin/apigee-service postgresql restart



9. Restart the qpid-agents to allow the analytics messages that were queuing to flow back in.

/opt/apigee4/bin/apigee-service qpid-server start


10. Verify that disk space has been reclaimed. Log in to the UI and check that the last 5 days of data show up in the charts and that new analytics data with current timestamps is being pushed to postgres.
Comments
adevegowda
Staff

@jhunt@apigee.com,

Thanks for writing up this article. It is very helpful.

chandraverma
New Member

Does this process clean up the aggregated records for analytics as well? Is there any documentation available showing the schema of the analytics database?

anvimalia
New Member

What's the difference between this and the process recommended here?

https://docs.apigee.com/private-cloud/latest/recurring-analytics-services-maintenance-tasks

Version history
Last update:
‎02-21-2017 04:50 PM
Updated by: