How are Postgres servers in different data centers kept in sync?

Hi All,

In our on-premises deployment of Edge, we have two data centers, DC-1 and DC-2.

In each data center, we have a master/slave Postgres server setup. Hence, in total we have four Postgres servers.

According to my understanding, all of the Postgres servers are in sync because

1) The MPs in DC-1 send analytics data to Qpid.

2) The Qpid server in DC-1 sends this analytics data to the master Postgres server in DC-1 and also sends it to the Qpid server in DC-2.

The same thing happens in DC-2.

Kindly let me know whether my understanding is correct.

I was going through the link

http://docs.apigee.com/api-reference/latest/adding-data-center

and I found that our on-premises Postgres architecture is different from what is suggested by this link.

Thanks and Regards,

Gaurav Bhandari


Gaurav,

I presume you have a custom deployment that is configured with a Master->Slave pair in each data center. These are probably two independent databases that are written to from the same source, the analytics data pipeline, according to the flow you have described.

But there is no guarantee that the databases are synchronized or hold exactly the same data. The separate Postgres databases can go out of sync for several reasons. For example, if the network is isolated for an extended period, the queue that holds analytics data (Qpid) may fill up and discard data. Once the networks are connected again, some data will be missing from the stream. Another possibility: one of the Qpid servers fails after one consumer has consumed a message but before the second consumer has.
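To illustrate the network-isolation case, here is a toy simulation in plain Python. It is only a sketch under assumptions: the `BoundedQueue` class, the drop-oldest policy, and all the numbers are hypothetical and do not reflect how Qpid is actually configured in any given installation.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical fixed-capacity FIFO that discards the oldest message when full."""

    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.dropped = 0

    def put(self, msg):
        if len(self.items) >= self.capacity:
            self.items.popleft()  # queue is full: the oldest message is lost
            self.dropped += 1
        self.items.append(msg)

    def drain_into(self, sink):
        while self.items:
            sink.add(self.items.popleft())


def simulate(total_msgs, outage_start, outage_end, capacity):
    """Write each message to two databases; the link to db2 is down during the outage."""
    db1, db2 = set(), set()
    backlog = BoundedQueue(capacity)  # buffer for the remote data center
    for msg_id in range(total_msgs):
        db1.add(msg_id)  # the local write always succeeds
        link_up = not (outage_start <= msg_id < outage_end)
        if link_up:
            backlog.drain_into(db2)  # deliver whatever survived the outage
            db2.add(msg_id)
        else:
            backlog.put(msg_id)  # buffer until the link returns
    return db1, db2, backlog.dropped


db1, db2, dropped = simulate(total_msgs=100, outage_start=20, outage_end=60, capacity=10)
missing = sorted(db1 - db2)
print(f"db1 rows: {len(db1)}, db2 rows: {len(db2)}, dropped: {dropped}")
print(f"ids missing from db2: {missing[0]}..{missing[-1]}")
```

Both databases receive the same stream, yet once the buffer overflows during the outage, the second database ends up permanently short of the messages that were discarded.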

Today, with the version of Postgres that ships in the current Private Cloud (a.k.a. OPDK), there is no easy mechanism to sync the data between two Postgres databases externally, short of adding some custom auditing and syncing the deltas periodically.
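As a rough idea of what such custom auditing might look like, here is a minimal sketch in plain Python. It assumes you have already collected per-bucket row counts from each database (for example with a GROUP BY on a timestamp column); the function name `find_deltas`, the bucket labels, and the counts are all hypothetical.

```python
def find_deltas(counts_a, counts_b):
    """Return {bucket: (count_a, count_b)} for every bucket whose counts differ."""
    deltas = {}
    for bucket in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(bucket, 0)  # a bucket absent from one side counts as 0 rows
        b = counts_b.get(bucket, 0)
        if a != b:
            deltas[bucket] = (a, b)
    return deltas


# Hypothetical per-hour row counts gathered separately from each database.
dc1_counts = {"hour-10": 1200, "hour-11": 1350, "hour-12": 1100}
dc2_counts = {"hour-10": 1200, "hour-11": 900}

for bucket, (a, b) in find_deltas(dc1_counts, dc2_counts).items():
    print(f"{bucket}: DC-1 has {a} rows, DC-2 has {b} rows")
```

Matching counts do not prove the rows are identical, so this only narrows down which time windows would need a closer comparison and a re-sync.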

Multi-master (Master <-> Master) replication is possible with Postgres 9.4.x through add-on projects such as BDR, though not in core Postgres, but it is not on the roadmap for OPDK yet.

@bkrishnankutty

Thank you for your response.

Our Master-Master PG servers are always in sync (except for a lag of around 1000 transactions, depending on traffic). I do not know how they are kept in sync, and that is what I want to understand.

Also, the documentation that Apigee provided mentions a master in one DC and a slave in the other, but nowhere does it mention a Master-Slave PG pair in each data center.

My guess is that the Qpid queue in one data center sends the transaction data to the Qpid server in the other data center, and that is how the PG servers stay in sync.

I do not know whether this reasoning is correct.

APIGEE team, can you please respond.

Thanks and Regards,

Gaurav Bhandari

@gbhandari

Qpid servers in two DCs won't automatically send data to each other. You would need to configure them that way, and enable replication, either one-way or two-way (master-master).

Replication between two PG servers has to be set up manually; it is not automatic, and it has nothing to do with Apigee. Postgres is a separate open source product, so its replication is beyond Apigee's scope. Someone from Apigee can confirm this as well.

Alternatively, you can use Pub/Sub to achieve the same result, and no replication would be needed.

Hi @Jason Bourne,

In our PROD environment, the Master-Master PG servers are in sync, and it happens automatically. Hence I wanted to know how this setup was done. (There is a lag of about 1000 transactions.)

@gbhandari, I cannot say how your PG servers were set up this way. Simply put, if you stand up 2, 3, or "n" PG servers, they won't automagically start talking to each other. Various topologies and relationships need to be architected first and then implemented. So someone must have done all this for your environment.

Moreover, since you say this is PROD, it is hard to imagine an organization with open firewalls. That in turn means someone would have had to create firewall rules to enable connectivity between all the PG nodes and the other components.

All of this just cannot happen on its own.

So, to answer your overall question: you would need to go back to whoever stood up this environment for you.

@gbhandari

Not sure if you are still looking for a response, but we have implemented a master->standby setup for PG in which the master takes all the analytics data. In other words, only the master accepts writes, while all nodes can serve reads.

Please refer to the install guide section "Set up Master-Standby Replication for Postgres" for how to set this up.