Scaling Apigee Edge Private Cloud (OPDK) - Part 1: Scaling beyond two data centers

TL;DR

Google's OPDK documentation is generally aimed at novice to intermediate operations teams, providing guidance that covers most situations, with the expectation that advanced needs will be handled by teams that have built up experience operating Apigee. As such, our publicly documented procedures and best practices often require customizations tailored to a customer's particular usage profile.

This is Part 1 of a series of articles that expand on the best practices and operating procedures by discussing:

  • Scaling OPDK beyond two (2) data centers
  • Scaling above 5000 QPS per data center
  • Optimizing maintenance and upgrade operations in high-volume planets

In some cases, the guidance below may appear to conflict with Google’s public documentation. Advanced use cases, however, call for advanced techniques, and those are what I describe in this series.

 

Part 1: Adding a Third (or higher) data center

Google’s documentation covers a standard topology for a two data center architecture. That documentation notes, “Note: If you require three or more data centers, please contact your Sales Rep or Account Executive to engage with Apigee.” The following details are similar to what will be shared with you by your Customer Engineer. For additional clarification and questions, please do reach out to your Google Customer Engineer.

Overview

The following guide assumes you already have a two data center topology in place. If not, please ensure you understand the basics described in Adding a Data Center (ref-1).

Adding a third or higher DC does not follow the same expansion pattern used when moving from one to two DCs (ref-2). Instead, the topology for the third and subsequent DCs focuses on expanding the “Gateway” pod (runtime components) plus a minimum set of components from the “Central” pod (management components) (ref-3).

Expansion Topologies

The exact components will depend on the model being implemented. Two common expansion models are:

  1. Adding a geographically separate Region
  2. Adding “Availability Zones (AZs)” within a similar physical location
    • This approach is often used to replicate multi-zonal cloud architectures that have multiple physically independent segments within the same general geographic location.
    • Important Note: Cassandra treats each DC as a separate DC whether or not the DCs are geographically close. Replication across DCs is slower than in-DC writes. Apigee configures Cassandra with a LOCAL_QUORUM consistency level to balance write speed and data safety, on the assumption that traffic from a given client typically stays within one DC. If you load balance across AZs, there is a risk that data written in AZ1 is not yet visible in AZ2 when a fast-following second call from the same client arrives. In practice, connection reuse and session affinity limit this risk (see the illustrative cqlsh check after this list).
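As a hedged illustration (the node IP is a placeholder and the kms keyspace is used only as an example of an Apigee keyspace), you can inspect how a keyspace replicates across the Cassandra DCs and experiment with consistency levels in an ad-hoc cqlsh session:

>> ILLUSTRATIVE: show the per-DC replication settings for an example keyspace

/opt/apigee/apigee-cassandra/bin/cqlsh <cassandra-node-ip> -e 'DESCRIBE KEYSPACE kms;'

>> ILLUSTRATIVE: inside an interactive cqlsh session (not the shell), show or set the consistency level used for ad-hoc queries

cqlsh> CONSISTENCY;

cqlsh> CONSISTENCY LOCAL_QUORUM;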

 

Expansion Components by model:

Regional Expansion:

  • Router
  • Message Processor
  • Cassandra
  • Zookeeper *
  • Qpid *

Availability Zone Expansion:

  • Router
  • Message Processor
  • Cassandra

* Note: Qpid and Zookeeper are only needed per Region. In an AZ model, Qpid and Zookeeper can be treated as management components outside the runtime AZs for the region.

 

Components shared across models:

Regional Expansion:

The following management components are only needed in two (2) Regions:

  • LDAP
  • Postgres
  • UI
  • Management Server

Note: I have seen cases where there is a need for the Management Server to exist in additional Regions to provide access to the Management APIs. Having more than two instances is an option if needed.

Availability Zone Expansion:

In an AZ model, place the following components into a “management” zone for each Region:

  • LDAP
  • Postgres
  • UI
  • Management Server
  • Zookeeper *
  • Qpid *

* Note: Qpid and Zookeeper are only needed per Region. In an AZ model, Qpid and Zookeeper can be treated as management components outside the AZs for the region.

* Summary Note: Qpid and Zookeeper components are needed for each Region. However, other management components should only exist in two (2) Regions.

 

Expanded Topologies Examples:

Regional Expansion (US based regions used for reference only):

US-West

Runtime components:

  • Router
  • Message Processor
  • Cassandra

Management Components:

  • Zookeeper
  • Qpid
  • LDAP
  • Postgres
  • UI
  • Management Server

US-Central

Runtime components:

  • Router
  • Message Processor
  • Cassandra

Management Components:

  • Zookeeper
  • Qpid

US-East

Runtime components:

  • Router
  • Message Processor
  • Cassandra

Management Components:

  • Zookeeper
  • Qpid
  • LDAP
  • Postgres
  • UI
  • Management Server

 

Availability Zone Expansion:

Region 1

R1-AZ1 Runtime:

  • Router
  • MP
  • Cassandra

R1-AZ2 Runtime:

  • Router
  • MP
  • Cassandra

R1-Mgmt-Zone:

  • ZooKeeper
  • Qpid
  • LDAP
  • Postgres
  • UI
  • Management Server

Region 2

R2-AZ1 Runtime:

  • Router
  • MP
  • Cassandra

R2-AZ2 Runtime:

  • Router
  • MP
  • Cassandra

R2-Mgmt-Zone:

  • ZooKeeper
  • Qpid
  • LDAP
  • Postgres
  • UI
  • Management Server

 

Component Specific Expansion considerations


Runtime Components

Router (R) / Message Processor (MP)

R/MP nodes are needed in each zone or region.

General scaling of R/MP nodes should follow the standard scaling guidelines (ref-4) and the advanced approaches covered in Part 2 of this series.
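As a minimal sketch (the profile choice and config file name are illustrative; follow ref-4 for the full procedure, including any required pod association), adding a combined Router/Message Processor node in the new region uses the standard setup profiles:

>> ILLUSTRATIVE: install a combined Router + Message Processor node (use -p r or -p mp for single-role nodes)

/opt/apigee/apigee-setup/bin/setup.sh -p rmp -f /etc/apigee/new-dc-rmp.config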

 

Cassandra

As described in Adding a Data Center (ref-1), all data centers must have the same number of Cassandra nodes.

Cassandra DC vs Rack

Use a distinct DC for each Region or Availability Zone for Cassandra with Apigee: dc-1, dc-2, dc-3, etc.

Note: Apigee’s install script prepends “dc-” to the DC number given in CASS_HOSTS when configuring Cassandra. In CASS_HOSTS, use only the DC number; do not attempt to create custom DC names. A hedged example follows.
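As a minimal silent-config sketch (IP addresses are placeholders), the “:<dc-number>,<rack-number>” suffix in CASS_HOSTS is what drives the dc-1, dc-2, dc-3 naming; a three-region planet with three Cassandra nodes per DC might look like this, with the first two nodes listed for each DC acting as that DC’s seed nodes:

>> ILLUSTRATIVE: CASS_HOSTS fragment for a three-DC planet

CASS_HOSTS="10.0.1.11:1,1 10.0.1.12:1,1 10.0.1.13:1,1 10.0.2.11:2,1 10.0.2.12:2,1 10.0.2.13:2,1 10.0.3.11:3,1 10.0.3.12:3,1 10.0.3.13:3,1"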

While Cassandra supports the concept of a “rack” for availability zones, only DC expansion is supported when expanding an existing planet.

Advanced Cassandra administrators can use “Rack” notation for new custom installations only (ref-9).

In these advanced topologies, Cassandra should be installed on nodes independent from Zookeeper.

 

Management Components

Qpid

Qpid nodes are needed per Region only (not per Availability Zone).

General scaling of Qpid nodes should follow the standard scaling guidelines (ref-5).
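As a minimal sketch (the config file name is a placeholder; ref-5 covers the full procedure, including the management API registration steps referenced later in this article), the node install itself uses the qs profile:

>> ILLUSTRATIVE: install an additional Qpid node in the new region

/opt/apigee/apigee-setup/bin/setup.sh -p qs -f /etc/apigee/new-dc-qpid.config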

 

Zookeeper (ZK)

In advanced configurations, Zookeeper and Cassandra should be installed on separate nodes.

Zookeeper provides management configuration information to nodes in each region. ZK nodes can easily support high numbers of runtime nodes, so only a minimum number of ZK nodes are needed in each geographic location.

Zookeeper has unique scaling requirements in Apigee:

  • Across all DCs, there must be an odd number of Voting nodes
  • Observer nodes can be added to provide redundancy if needed

Example ZK distributions per model:

Regional model:

US-West:

  • 2 ZK Voters

US-Central:

  • 1 ZK Voter
  • 1 ZK Observer

US-East:

  • 2 ZK Voters

AZ model (follow the 2 DC model; ZK nodes sit in each Region’s management zone):

Region 1 (R1-AZ1 / R1-AZ2):

  • 3 ZK Voters

Region 2 (R2-AZ1 / R2-AZ2):

  • 2 ZK Voters
  • 1 ZK Observer
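As a minimal silent-config sketch for the regional model above (IP addresses are placeholders), observer nodes are flagged with the “:observer” modifier in ZK_HOSTS, while ZK_CLIENT_HOSTS lists the ZK nodes that a given region's components should connect to:

>> ILLUSTRATIVE: five voters plus one observer spread across three regions

ZK_HOSTS="10.0.1.21 10.0.1.22 10.0.2.21 10.0.2.22:observer 10.0.3.21 10.0.3.22"

>> ILLUSTRATIVE: in the US-Central config file, point local components at the local ZK nodes

ZK_CLIENT_HOSTS="10.0.2.21 10.0.2.22"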



LDAP

It is recommended to limit the number of LDAP nodes to two (2), keeping them as part of a “management” zone.

 

Management Server

It is generally a best practice to limit the number of Management Server nodes to two (2), keeping them as part of a “management” zone.

If needed to support management API access in more than two regions, it is possible to separate Management Server and LDAP nodes per the 13-node topology guide (ref-6).

 

UI

It is generally a best practice to limit the number of UI nodes to two (2), keeping them as part of a “management” zone.

 

Postgres

Postgres only supports a two (2) node installation:

  • Node 1: Primary node
  • Node 2: Standby node

See (ref-7).

Note: During major upgrades, a third “standby” node is added temporarily to support rollback activities. Beyond that rollback support, a third node provides no additional value.
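As a hedged sketch (the config file path is a placeholder; ref-7 is the authoritative procedure), master-standby replication between the two Postgres nodes is driven by the apigee-postgresql replication actions:

>> ILLUSTRATIVE: on the Primary (master) node

/opt/apigee/apigee-service/bin/apigee-service apigee-postgresql setup-replication-on-master -f /etc/apigee/pg-replication.config

>> ILLUSTRATIVE: on the Standby node

/opt/apigee/apigee-service/bin/apigee-service apigee-postgresql setup-replication-on-standby -f /etc/apigee/pg-replication.config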

 

Monetization Services

(if needed)

Apply monetization services to the appropriate runtime and management components as described above and per the standard approach for installing monetization (ref-8).

 

 

Advanced Expansion Installation Steps

The general order of component installation in the new data center should follow the standard flow described in Adding a Data Center (ref-1). However, the following additional component-specific directions can help smooth the deployment process.

Cassandra

When you reach the step that begins the Cassandra installation in the new DC, consider the following additional steps.

1) Confirm 'NetworkTopologyStrategy' is set for the identityzone and system_traces keyspaces

Some older guides did not check and update these keyspaces. Confirm both have been updated (a hedged cqlsh sketch follows these steps):

  • Step 13.b.iii for confirmation
  • Steps 13.b.i & 13.b.ii for updates if needed
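As a hedged cqlsh sketch (node IP, DC names, and replication factors are examples only; confirm the correct values for your planet against ref-1), the check and update look like this, repeated for the identityzone keyspace:

>> ILLUSTRATIVE: confirm the current replication strategy

/opt/apigee/apigee-cassandra/bin/cqlsh <cassandra-node-ip> -e 'DESCRIBE KEYSPACE system_traces;'

>> ILLUSTRATIVE: switch to NetworkTopologyStrategy if the keyspace still uses SimpleStrategy (example factors shown)

/opt/apigee/apigee-cassandra/bin/cqlsh <cassandra-node-ip> -e "ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'};"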

 

2) Set auto_bootstrap: false

The Apigee templates do not expose a token for setting auto_bootstrap: false. The following commands back up the relevant files and then configure the Apigee and customer template/property files so that auto_bootstrap: false is applied.

Run these commands before Step 4 of Adding a Data Center (ref-1), i.e., before running setup.sh for the “ds” component.

>> PRE-INSTALL Cassandra

/opt/apigee/apigee-service/bin/apigee-service apigee-cassandra install

>> CONFIG: prep - backup files

cp -p /opt/apigee/apigee-cassandra/token/default.properties /opt/apigee/apigee-cassandra/token/default.properties.backup

cp -p /opt/apigee/apigee-cassandra/source/conf/cassandra.yaml /opt/apigee/apigee-cassandra/source/conf/cassandra.yaml.backup

[ -f /opt/apigee/customer/application/cassandra.properties ] && cp -p /opt/apigee/customer/application/cassandra.properties /opt/apigee/customer/application/cassandra.properties.backup

>> CONFIG: set auto_bootstrap via a new token (default true; overridden to false for this install)

echo "conf_cassandra_auto_bootstrap=true" >> /opt/apigee/apigee-cassandra/token/default.properties

echo "auto_bootstrap: {T}conf_cassandra_auto_bootstrap{/T}" >> /opt/apigee/apigee-cassandra/source/conf/cassandra.yaml

echo "conf_cassandra_auto_bootstrap=false" >> /opt/apigee/customer/application/cassandra.properties

chown -h apigee:apigee /opt/apigee/customer/application/cassandra.properties

 

3) Install Cassandra

Note: This will install and configure Cassandra Only. In this model, Zookeeper will be installed separately on another server. This replaces Step 4 of Adding a Data Center (ref-1).

Example:

/opt/apigee/apigee-setup/bin/setup.sh -p c -f /etc/apigee/cass-new-dc.config
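As a hedged verification (assuming the standard location where the Apigee configuration utility renders the live cassandra.yaml), confirm the override took effect before proceeding to the rebuild:

>> ILLUSTRATIVE: verify the rendered configuration on each new node (expect "auto_bootstrap: false")

grep auto_bootstrap /opt/apigee/apigee-cassandra/conf/cassandra.yaml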

 

4) Rebuild the Cassandra nodes in two parts:

Note: This replaces Step 5 of Adding a Data Center (ref-1), nodetool rebuild.

Part 1: rebuild the local seed nodes

Part 2: rebuild the remaining nodes from the local seeds

>> LOOK UP seed nodes

cat /opt/apigee/token/application/cassandra.properties

>> PART 1: rebuild each <new-dc> seed node pulling from a pre-existing DC

  • Rebuild each of the two <new-dc> seed nodes listed in the output of the above command.
  • To limit load against the pre-existing DCs, ensure completion of one node before starting the second node. The easiest way to monitor this is to watch the network utilization of the seed nodes across both DCs. A jump and decline can be observed as the rebuild begins and completes.

/opt/apigee/apigee-cassandra/bin/nodetool -h {new-dc-seed-node-ip} rebuild {pre-existing-dc-name}
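In addition to watching network utilization, a hedged way to watch rebuild progress directly is standard nodetool tooling:

>> ILLUSTRATIVE: show active streaming sessions and pending compactions during the rebuild

/opt/apigee/apigee-cassandra/bin/nodetool -h {new-dc-seed-node-ip} netstats

/opt/apigee/apigee-cassandra/bin/nodetool -h {new-dc-seed-node-ip} compactionstats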

>> Part 2: rebuild remaining <new-dc> nodes from the <new-dc> seed nodes

/opt/apigee/apigee-cassandra/bin/nodetool -h {new-dc-node-ip} rebuild {new-dc-name}

 

5) Clean up auto_bootstrap

The next steps reset auto_bootstrap back to its default (true).

>> RESET configuration files

mv -f /opt/apigee/apigee-cassandra/token/default.properties.backup /opt/apigee/apigee-cassandra/token/default.properties

mv -f /opt/apigee/apigee-cassandra/source/conf/cassandra.yaml.backup /opt/apigee/apigee-cassandra/source/conf/cassandra.yaml

if [ -f /opt/apigee/customer/application/cassandra.properties.backup ]; then mv -f /opt/apigee/customer/application/cassandra.properties.backup /opt/apigee/customer/application/cassandra.properties; else rm -f /opt/apigee/customer/application/cassandra.properties; fi

>> UPDATE Cassandra configuration

On all <new-dc> nodes:

/opt/apigee/apigee-setup/bin/setup.sh -p c -f /etc/apigee/cass-new-dc.config

 

Zookeeper

Note: The following, together with the Cassandra-only installation above, replaces Step 4 of Adding a Data Center (ref-1).

In these models, Zookeeper will be installed independently of Cassandra on its own node.

On new Zookeeper nodes, remember to use the Zookeeper only setup command:

/opt/apigee/apigee-setup/bin/setup.sh -p zk -f /etc/apigee/zookeeper-new-dc.config
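As a hedged sanity check after the ensemble restarts (2181 is the default ZooKeeper client port; the four-letter-word interface may be restricted in some builds), confirm each new node is serving and in the expected role:

>> ILLUSTRATIVE: confirm the ZooKeeper component is running

/opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper status

>> ILLUSTRATIVE: report the node's role (leader, follower, or observer)

echo srvr | nc localhost 2181 | grep Mode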

 

New Data Center Management Configuration

Depending on the component configuration, pod registration must also be performed:

  • Qpid servers: Step 12.f of Adding a Data Center (ref-1)
  • MPs: Step 15 of Adding a Data Center (ref-1)
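As a hedged verification sketch (management server address, credentials, pod, and region names are placeholders), the management API can confirm that the new region's servers are registered:

>> ILLUSTRATIVE: list servers registered in the new region's gateway pod

curl -u <sysadmin-email>:<password> "http://<management-server-ip>:8080/v1/servers?pod=gateway&region=dc-3"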

 

Refs:

  1. https://docs.apigee.com/private-cloud/v4.52.00/adding-data-center 
  2. https://docs.apigee.com/private-cloud/v4.52.00/installation-topologies#12hostclusteredinstallation
  3. https://docs.apigee.com/private-cloud/v4.52.00/about-planets-regions-pods-organizations-environments...
  4. https://docs.apigee.com/private-cloud/v4.52.00/adding-router-or-message-processor-node
  5. https://docs.apigee.com/private-cloud/v4.52.00/add-or-remove-qpid-nodes
  6. https://docs.apigee.com/private-cloud/v4.52.00/installation-topologies#13hostclusteredinstallation
  7. https://docs.apigee.com/private-cloud/v4.52.00/set-master-standby-replication-postgres
  8. https://docs.apigee.com/private-cloud/v4.52.00/installing-monetization-services
  9. https://docs.apigee.com/private-cloud/v4.52.00/rack-support 

 

Next Edition (coming soon)

Scaling Apigee Private Cloud - Part 2, More than 5000 QPS 

 
