Is it possible to run BaaS 2.1 in OPDK 16.01 in an Active/Active, multiple datacenter topology?

Yes.

The components of BaaS (Usergrid) 2.1 are as follows:

Tomcat – Application Logic

Cassandra – Persistent Data Storage and Graph data

ElasticSearch – Indexed entities in flattened JSON

When entities (documents) are written to BaaS with PUT/POST operations, they are persisted immediately and synchronously to Cassandra. A representation of the entity for indexing purposes is also written to Cassandra, and a reference to the document to be indexed is placed in a queue. The entity data itself does not go through this queue, only its UUID. We expect indexing to complete in 50 ms or less. Until the document is indexed it will not yet be returned by QL queries, but it can still be retrieved directly with GET /{org}/{app}/{collection}/{uuid | name}. The UUID of the entity is returned in the PUT/POST API response.
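The write path described above can be modelled in a few lines. This is a toy sketch, not Usergrid code: a dict stands in for Cassandra, another dict for the ElasticSearch index, and `post_entity`, `get_by_uuid`, `query` and `drain_index_queue` are hypothetical names. It shows the two properties the answer relies on: only the UUID travels through the queue, and a direct GET works before indexing has caught up.

```python
import queue
import uuid

# Hypothetical stand-ins: a dict for Cassandra (the source of truth), a dict for
# the ElasticSearch index, and a FIFO queue that carries only entity UUIDs.
cassandra = {}
search_index = {}
index_queue = queue.Queue()


def post_entity(collection, data):
    """Persist synchronously, enqueue only the UUID, return the UUID."""
    entity_id = str(uuid.uuid4())
    cassandra[entity_id] = {"collection": collection, **data}  # synchronous write
    index_queue.put(entity_id)  # only the reference goes on the queue
    return entity_id


def get_by_uuid(entity_id):
    """Direct lookup: served from the persistent store, no index needed."""
    return cassandra.get(entity_id)


def query(collection, field, value):
    """QL-style query: answered from the search index only."""
    return [eid for eid, doc in search_index.items()
            if doc["collection"] == collection and doc.get(field) == value]


def drain_index_queue():
    """Asynchronous indexer: resolve each UUID against the store and index it."""
    while not index_queue.empty():
        entity_id = index_queue.get()
        search_index[entity_id] = dict(cassandra[entity_id])


eid = post_entity("dogs", {"name": "fido"})
print(get_by_uuid(eid) is not None)   # True: readable before indexing
print(query("dogs", "name", "fido"))  # []: not yet indexed
drain_index_queue()
print(query("dogs", "name", "fido") == [eid])  # True
```

In the real system the indexer runs in the background, so the window between the two `query` calls corresponds to the expected indexing delay of 50 ms or less.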

There are two options for this internal indexing queue:

In-memory – messages produced by a given Tomcat are visible only to, and therefore processed only by, that Tomcat

Amazon SQS and SNS – any Tomcat is eligible to receive a message
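The difference between the two queue options comes down to who can consume a message. The sketch below is a simplified model with a hypothetical `TomcatNode` class: in-memory mode gives each node a private queue, while distributed mode has all nodes competing on one shared queue. It deliberately ignores real SQS behaviour such as visibility timeouts and at-least-once redelivery.

```python
import queue


class TomcatNode:
    """Hypothetical model of one Tomcat; `shared_queue` is None for in-memory mode."""

    def __init__(self, name, shared_queue=None):
        self.name = name
        # In-memory mode: each node owns a private queue.
        # Distributed mode: all nodes consume from the same (SQS-like) queue.
        self.queue = shared_queue if shared_queue is not None else queue.Queue()
        self.processed = []

    def produce(self, entity_id):
        self.queue.put(entity_id)

    def consume_one(self):
        """Take one message if available; return True if something was processed."""
        try:
            entity_id = self.queue.get_nowait()
        except queue.Empty:
            return False
        self.processed.append(entity_id)
        return True


# In-memory: node b can never see what node a produced.
a, b = TomcatNode("a"), TomcatNode("b")
a.produce("uuid-1")
print(b.consume_one())  # False
print(a.consume_one())  # True

# Distributed: whichever node polls first receives the message.
shared = queue.Queue()
c, d = TomcatNode("c", shared), TomcatNode("d", shared)
c.produce("uuid-2")
print(d.consume_one())  # True
print(d.processed)      # ['uuid-2']
```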

For a multi-datacenter deployment there are two options. In both, the Tomcat servers in every datacenter are able to serve traffic, and the Cassandra nodes in each datacenter should maintain a replicated copy of the dataset.

Deployment Option 1:

In this option, the components would be deployed in the following manner:

Tomcat: Active/Active

Cassandra: Active/Active

ElasticSearch: Active/Active

Queue (Distributed): Amazon SNS+SQS

This is how Apigee runs BaaS in the cloud. For an on-premises installation, the customer would be responsible for maintaining the Amazon account and credentials required by BaaS.
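One way to see why Option 1 pairs Active/Active ElasticSearch with SNS+SQS: a UUID published once has to reach an indexer in every datacenter, so each datacenter's ElasticSearch receives every write. The sketch below assumes an SNS-style topic that fans out to one SQS-style queue per datacenter; `Topic` and `Datacenter` are hypothetical classes, and the exact topic/queue layout BaaS uses is not stated in this answer.

```python
import queue


class Topic:
    """SNS-like topic: every subscribed queue gets its own copy of each message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, q):
        self.subscribers.append(q)

    def publish(self, message):
        for q in self.subscribers:
            q.put(message)


class Datacenter:
    """Hypothetical datacenter: one SQS-like queue feeding a local search index."""

    def __init__(self, name, topic):
        self.name = name
        self.queue = queue.Queue()
        self.search_index = set()
        topic.subscribe(self.queue)

    def run_indexer(self):
        # Any Tomcat in this datacenter could drain the queue; modelled as one loop.
        while not self.queue.empty():
            self.search_index.add(self.queue.get())


topic = Topic()
west = Datacenter("west", topic)
east = Datacenter("east", topic)

# A write accepted by a Tomcat in the West publishes only the entity UUID.
topic.publish("uuid-42")
for dc in (west, east):
    dc.run_indexer()

print(sorted(west.search_index), sorted(east.search_index))
# ['uuid-42'] ['uuid-42']
```

With the in-memory queue there is no fan-out step, which is why Option 2 instead points every Tomcat at a single ElasticSearch cluster.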

Deployment Option 2:

In this option, the components would be deployed in the following manner:

Tomcat: Active/Active

Cassandra: Active/Active

ElasticSearch: Active/Passive

Queue (Local): In-Memory

In this case, one datacenter would be designated the ‘primary’ datacenter for ElasticSearch, and all Tomcat instances in all datacenters would be pointed at that ElasticSearch cluster. Even though the Tomcats would all point at a single ElasticSearch cluster, they could still serve API traffic; additional latency would be incurred only for queries using QL. Between West and East, latency is around 40 ms on average.
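The latency claim for Option 2 can be written as a small rule. This is a simplified sketch under stated assumptions: `added_latency_ms` is a hypothetical helper, writes and direct GETs are served from the local Cassandra replica, asynchronous indexing is off the request path, and a QL query is charged a single cross-datacenter hop of the ~40 ms West/East average from the answer.

```python
CROSS_DC_LATENCY_MS = 40  # the ~40 ms West <-> East average from the answer


def added_latency_ms(operation, tomcat_dc, primary_es_dc):
    """Extra latency for one request under Deployment Option 2 (simplified).

    Assumptions: writes and GET-by-UUID/name are served from the local
    Cassandra replica; only QL queries go to ElasticSearch, which lives in
    the primary datacenter. Asynchronous indexing is ignored because it is
    off the request path, and a query is charged one cross-datacenter hop.
    """
    if operation == "ql_query" and tomcat_dc != primary_es_dc:
        return CROSS_DC_LATENCY_MS
    return 0


for op in ("post", "get_by_uuid", "ql_query"):
    print(op, added_latency_ms(op, tomcat_dc="east", primary_es_dc="west"))
# post 0
# get_by_uuid 0
# ql_query 40
```

Tomcats in the primary datacenter pay no extra cost even for QL queries, so the penalty applies only to QL traffic served from the other datacenters.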

If connectivity to this datacenter is lost, an ElasticSearch cluster in a different datacenter would need to be promoted to ‘primary’. All Tomcat instances would then need to be updated to point to the new primary, and a reindex of the data would need to be performed from a Tomcat in the same datacenter as the new primary. The duration of the reindex depends on network latency and the amount of data.

Because all data is permanently persisted in Cassandra, the reindex is benign: no data is lost, since the index is rebuilt from what Cassandra already holds.
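The failover steps for Option 2 can be sketched as one function over a toy model of the deployment. Everything here is hypothetical (`Cluster`, `fail_over`, the datacenter and Tomcat names); the point is the order of operations (promote, repoint every Tomcat, reindex) and that the reindex only reads from Cassandra, which is why it is safe to run.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical deployment state for Deployment Option 2."""
    cassandra: dict    # entity UUID -> document, replicated in every datacenter
    es_clusters: dict  # datacenter name -> {uuid: indexed document}
    primary_es: str    # datacenter whose ElasticSearch all Tomcats use
    tomcat_es_target: dict = field(default_factory=dict)  # tomcat -> ES datacenter


def fail_over(cluster, new_primary):
    """Promote a new primary ES datacenter, repoint every Tomcat, then reindex."""
    # 1. Promote the ElasticSearch cluster in a surviving datacenter.
    cluster.primary_es = new_primary
    # 2. Update every Tomcat to point at the new primary.
    for tomcat in cluster.tomcat_es_target:
        cluster.tomcat_es_target[tomcat] = new_primary
    # 3. Reindex from Cassandra, the source of truth; Cassandra is only read.
    index = cluster.es_clusters[new_primary]
    for entity_id, doc in cluster.cassandra.items():
        index[entity_id] = dict(doc)


c = Cluster(
    cassandra={"u1": {"name": "fido"}, "u2": {"name": "rex"}},
    es_clusters={"west": {"u1": {"name": "fido"}, "u2": {"name": "rex"}}, "east": {}},
    primary_es="west",
    tomcat_es_target={"tomcat-west-1": "west", "tomcat-east-1": "west"},
)
fail_over(c, "east")  # the West datacenter is lost
print(c.primary_es, set(c.tomcat_es_target.values()), sorted(c.es_clusters["east"]))
# east {'east'} ['u1', 'u2']
```

In practice the reindex is the slow step, since it has to walk every entity, which matches the answer's note that its duration depends on network latency and data volume.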
