Question on scaling our Analytics database


Hello, we have databases holding analytics data. Growth will be high on a daily basis, so how should we scale the database, and what would be the preferred backup and restore method?

7 REPLIES


For example, if we take a backup of Jan 2016 to June 2016 and then purge that data, and a user later requests data from that backup, how do we restore it? And how will that data be synced back up?

I'm not sure I would restore it. I think instead I would push that data to a different system for archival purposes, perhaps even only retaining the aggregates. It depends on the reasons. If it seems like something unlikely to happen, I wouldn't build out procedures and plan for the disruption of my day-to-day to support it.

Is that an option? Can you push to another system?
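
If pushing the old range to another system is an option, the export-then-purge step can be quite small. Here's a minimal sketch in Python; the connection string, the events table, its created_at column, and the archive path are all hypothetical, so adapt them to your schema:

```python
# Sketch: export a closed date range to a compressed CSV archive, then purge
# it from the live database. DSN, table "events", column "created_at", and
# the output path are hypothetical placeholders.
import gzip
import psycopg2

DSN = "dbname=analytics user=postgres"   # hypothetical connection string
START, END = "2016-01-01", "2016-07-01"  # Jan 2016 through June 2016

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Stream rows out with COPY so the export is not buffered in memory.
    with gzip.open("/archive/events_2016H1.csv.gz", "wt") as out:
        cur.copy_expert(
            f"COPY (SELECT * FROM events WHERE created_at >= '{START}' "
            f"AND created_at < '{END}') TO STDOUT WITH CSV HEADER",
            out,
        )
    # Purge only after the export succeeded; both statements run in one
    # transaction, committed when the "with" block exits cleanly.
    cur.execute(
        "DELETE FROM events WHERE created_at >= %s AND created_at < %s",
        (START, END),
    )
```

A later request for that range can then be served from the archive file (or a separate archive database) without touching your production restore procedures.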

Retaining raw data for X days is up to your requirements and preferences.

More raw data = more disk space

How to scale disk/storage for Postgres is an interesting question.

The short answer is: if you are using network storage, you can scale the storage for Postgres directly. If you are using local disk, consider adding more disks or upgrading to a node that has higher disk capacity.
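
To make the scale-or-not decision concrete, it helps to know which tables are actually eating the disk. A quick sketch (Python with psycopg2; the connection string is a placeholder) that lists the ten largest tables, indexes and TOAST included:

```python
# List the largest relations by total on-disk size (heap + indexes + TOAST).
import psycopg2

DSN = "dbname=analytics user=postgres"  # hypothetical connection string

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname,
               pg_size_pretty(pg_total_relation_size(oid)) AS total_size
        FROM pg_class
        WHERE relkind = 'r'
        ORDER BY pg_total_relation_size(oid) DESC
        LIMIT 10
    """)
    for name, size in cur.fetchall():
        print(f"{name}: {size}")
```

Running this weekly gives you a growth trend you can extrapolate for capacity planning.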

How to back up and restore:

The answer depends on what kind of storage you are using. If it is network storage like SAN/NAS, the best and fastest way to back up and restore is to take storage-level snapshots of the storage volumes.
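
For the snapshot to be restorable, it needs to be consistent from Postgres' point of view. A common pattern is to bracket the vendor's snapshot command with Postgres' backup-control functions. A sketch, assuming psycopg2 and a made-up storage-cli command (substitute your SAN/NAS vendor's real tooling); note these functions were renamed pg_backup_start/pg_backup_stop in Postgres 15:

```python
# Sketch: coordinate a storage-level snapshot with Postgres backup mode.
# SNAPSHOT_CMD is a made-up placeholder for your vendor's snapshot tooling.
import subprocess
import psycopg2

DSN = "dbname=analytics user=postgres"                    # hypothetical
SNAPSHOT_CMD = ["storage-cli", "snapshot", "pgdata-vol"]  # hypothetical

conn = psycopg2.connect(DSN)
conn.autocommit = True  # backup-control calls should not run in a transaction
cur = conn.cursor()
try:
    # Puts the cluster in backup mode and forces a checkpoint.
    # (pg_backup_start on Postgres 15+.)
    cur.execute("SELECT pg_start_backup('volume-snapshot')")
    subprocess.run(SNAPSHOT_CMD, check=True)  # take the volume snapshot
finally:
    # Always end backup mode, even if the snapshot command failed.
    cur.execute("SELECT pg_stop_backup()")  # pg_backup_stop on 15+
    conn.close()
```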

If you are using local disk, your backup can be a standby database (which is good for a physical instance failure), or a daily pg_dump to cover logical data corruption.
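
For the pg_dump route, the daily job can be as small as this sketch (cron-friendly; the database name and backup path are placeholders):

```python
# Minimal daily pg_dump sketch: one compressed, date-stamped dump per day.
import datetime
import subprocess

BACKUP_DIR = "/backups"  # hypothetical destination
DB_NAME = "analytics"    # hypothetical database name

stamp = datetime.date.today().isoformat()
outfile = f"{BACKUP_DIR}/{DB_NAME}-{stamp}.dump"

# -Fc writes Postgres' compressed custom format, restorable with pg_restore
# (including selective, table-by-table restores).
subprocess.run(["pg_dump", "-Fc", "-f", outfile, DB_NAME], check=True)
print(f"wrote {outfile}")
```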

Hope this helps

Hi @bkrishnankutty

Thanks a ton for your reply.

My DB is set up in master-slave replication mode. Which type of backup strategy is preferable, and can you point me to any documentation on this?

@Paul Mibus
Hi Paul,

Can you help with this, please?

What kind of storage are you using?

If you have network storage, I would recommend using storage-subsystem-level snapshots of the data volumes.

As the original answer mentioned, this is highly dependent on your storage type. Volume-level snapshots are a great way to save space on backups if you're comfortable with having all your data "online" without the option for offline backups. It's reasonably safe with an enterprise-level storage system, but be aware that you still have all your eggs in one basket. Our out-of-the-box backup scripts do a raw DB dump, which is the least space-efficient method, but it's simple, straightforward, and guaranteed to restore nicely with a minimum of fuss. The problem is that as your data set grows, the time and effort required to shuffle backups around becomes problematic.

It's important to define your data retention period up front to get accurate space estimates. How long do you need the data? The typical answers are 30, 90 or 180 days, with 90 days being pretty common. For very large data sets, 30 days or less is typical. Note that when you prune detailed, per-transaction data from the analytics database you retain the aggregate reporting data, which may be enough to troubleshoot problems or analyze traffic patterns older than 30 days. Once you know the time horizon, you can plan for both storage needs and backup needs (if any).
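
Once the retention window is fixed, the pruning job itself is small. A sketch of the roll-up-then-delete pattern, assuming a hypothetical detail table events, an aggregate table events_daily with a unique day column, and a 90-day window:

```python
# Sketch: keep daily aggregates of old traffic, then drop per-transaction
# detail outside the retention window. All table/column names are
# hypothetical; events_daily is assumed to have a unique "day" column.
import psycopg2

DSN = "dbname=analytics user=postgres"  # hypothetical connection string
RETENTION_DAYS = 90

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Roll detail rows up into one aggregate row per day before deleting.
    cur.execute(
        """
        INSERT INTO events_daily (day, event_count)
        SELECT date_trunc('day', created_at), count(*)
        FROM events
        WHERE created_at < now() - %s * interval '1 day'
        GROUP BY 1
        ON CONFLICT (day) DO NOTHING
        """,
        (RETENTION_DAYS,),
    )
    # The detail rows can now go; the aggregates still answer questions
    # about traffic older than the window.
    cur.execute(
        "DELETE FROM events WHERE created_at < now() - %s * interval '1 day'",
        (RETENTION_DAYS,),
    )
```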

Finally, since analytics data is only useful for a limited period of time, it may be an acceptable risk to build an HA analytics backend and simply rebuild if multiple analytics databases are completely lost. The risk of such an event happening is vanishingly small and accepting that risk can save greatly on storage costs for the very largest analytics databases.