Apigee Hybrid - Estimating Infrastructure cost

Apigee hybrid is a platform for developing and managing API proxies that features a hybrid deployment model. The hybrid model includes a management plane hosted by Apigee in the cloud and a runtime plane that you install and manage on one of the supported Kubernetes platforms.

Customers often want to know the infrastructure cost of running Apigee hybrid in their own cloud, including how many clusters they need, how many nodes in each cluster, the CIDR range for each cluster, the type of networking, multi-region setup details, and so on.

This blog provides T-shirt size capacity estimates for running Apigee hybrid. There are many ways to slice and dice the Apigee hybrid deployment model; depending on your actual requirements, there are ways to minimize resources and consolidate the underlying infrastructure.

As a prerequisite to understanding Apigee hybrid sizing, it is important to understand the Apigee hybrid architecture and different components involved. If you are not familiar with the architecture, we recommend that you review the details.

Discovery questionnaire

The following items are important to consider for sizing and cost calculations:  

  • Number of APIs
  • Peak transactions per second (TPS)
  • Future TPS growth
  • Number of environments (based on the SDLC cycle at your site)
  • Disaster recovery (DR) site requirements and configuration (Active/Active or Active/Passive)

T-shirt size estimates

The T-shirt estimates are an easy way to get sizing information. These estimates are based on the number of TPS and the number of Apigee proxies.

Apigee hybrid sizing guidelines:

  • A minimum of 3 stateful and 3 stateless nodes is required in the cluster
  • Max number of organizations = 25
  • Max number of environments = 85 per organization 
  • API Proxies = 50 per environment. 
  • DR site as Active/Active with half the TPS requirements for each site.

Getting into details

We will use the product guidelines described in Minimum cluster requirements and the standard Google Cloud virtual machine (VM) options for node sizes:

  • e2-standard-4 (4 vCPU, 16 GB RAM)
  • e2-standard-8 (8 vCPU, 32 GB RAM)
  • e2-standard-16 (16 vCPU, 64 GB RAM)

Runtime components can be deployed on VMs with size options of 4 vCPU, 8 vCPU, 16 vCPU, or 32 vCPU and above, as long as the minimum number of nodes is satisfied.

In this example, we will use the following deployment model as a typical buildout for our discussion:

  • Non-prod Cluster 1
    • Dev Org
      • dev1, dev2, ...
    • Test Org
      • test1, test2, ...
  • Non-prod Cluster 2
    • Stage Org
      • stage1, stage2, ...
  • Prod Cluster
    • Prod Org
      • prod1, prod2, ...

We assume that each non-prod cluster serves half the prod TPS.

Also note the Apigee hybrid limitation for adding multiple orgs in a cluster.

Conservatively, we assume the Apigee runtime pod will support 300 TPS, although the runtime can support much higher TPS. The actual TPS depends on a number of factors including, but not limited to, the number of policies, flows, service callouts, payload size, backend responsiveness, and many other factors. The sizing guidelines provided here should be used as a reference, and we highly recommend that you do your own performance testing to understand your environment's performance.

The following estimates provide examples of small, medium, and large T-shirt size deployments of Apigee hybrid and each one’s corresponding requirements on the Kubernetes cluster. Using the sizing estimates below, you can calculate the cost using your specific enterprise price sheet:

Playground 

  • POC - Minimum configuration:
    • Up to 150 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 1-2 Apigee Environments
    • Supports up to 50 shared flows and proxies
    • 6 node Kubernetes cluster
    • Total: 24 vCPUs, 38 GB RAM, 250 GB storage per Cassandra pod (3) (750 GB total), 50 GB per node for runtime pods (3) (150 GB storage)
    • 6 nodes of e2-standard-4
    • Estimated Cost: $_________

Small 

  • Non-Prod:
    • 300 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 2 Apigee Environments
    • Supports up to 100 shared flows and proxies
    • 6 node Kubernetes cluster
    • Total: 23 vCPUs, 37 GB RAM, 250 GB storage per Cassandra pod (3) (750 GB total), 100 GB storage per node for runtime pods (3) (300 GB storage)
    • 6 nodes of e2-standard-4
    • Estimated Cost: $_________
  • Prod:
    • 600 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 2 Apigee Environments
    • Supports up to 100 shared flows and proxies
    • 6 node Kubernetes cluster: 43 vCPUs, 74 GB RAM, 500 GB storage per Cassandra pod (3) (1,500 GB), 100 GB storage per node for runtime pods (3) (300 GB storage)
    • 6 nodes of e2-standard-8
    • Estimated Cost: $_________
  • Typical Buildout Total:
    • 2 Non-prod Orgs, 1 Prod Org
    • 89 vCPUs, 148 GB RAM, 3.6 TB storage (stateful and stateless storage)
    • 12 nodes of e2-standard-4, 6 nodes of e2-standard-8
    • Estimated Cost: $_________

Medium

  • Non-Prod:
    • 500 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 5 Apigee Environments
    • Supports up to 250 shared flows and proxies
    • 8 node Kubernetes cluster: 29 vCPUs, 45 GB RAM, 250 GB storage per Cassandra pod (3) (750 GB), 100 GB storage per node for runtime pods (5) (300 GB storage)
    • 8 nodes of e2-standard-4
    • Estimated Cost: $_________
  • Prod:
    • 1,000 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 5 Apigee Environments
    • Supports up to 250 shared flows and proxies
    • 8 node Kubernetes cluster: 59 vCPUs, 91 GB RAM, 500 GB storage per Cassandra pod (3) (1,500 GB), 100 GB storage per node for runtime pods (5) (300 GB storage)
    • 8 nodes of e2-standard-8
    • Estimated Cost: $_________
  • Typical Buildout Total:
    • 2 Non-Prod Orgs, 1 Prod Org
    • 117 vCPUs, 181 GB RAM, 3.6 TB storage (stateful and stateless storage)
    • 16 nodes of e2-standard-4, 8 nodes of e2-standard-8
    • Estimated Cost: $_________

Large

  • Non-Prod:
    • 1,000 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 10 Apigee Environments
    • Supports up to 500 shared flows and proxies
    • 6 node Kubernetes cluster: 40 vCPUs, 56 GB RAM, 500 GB storage per Cassandra pod (3, stateful) (1,500 GB), 100 GB storage per node for runtime pods (3, stateless) (1,600 GB storage)
    • 6 nodes of e2-standard-8
    • Estimated Cost: $_________
  • Prod:
    • 2,000 TPS
    • 1 Google Cloud Project, 1 Apigee Org, 10 Apigee Environments, 2 Regions
    • Supports up to 500 shared flows and proxies
    • 11 node Kubernetes cluster (per region): 85 vCPUs, 118 GB RAM, 750 GB storage per Cassandra pod (3) (2,250 GB), 100 GB storage per node for runtime pods (8) (1,000 GB storage)
    • 22 nodes of e2-standard-8 (for 2 regions)
    • Estimated Cost: $_________
  • Typical Buildout Total:
    • 2 Non-Prod Orgs, 1 Prod Org (2 prod regions)
    • 556 vCPUs, 348 GB RAM, 21 TB storage (stateful and stateless storage)
    • 34 nodes of e2-standard-8
    • Estimated Cost: $_________

Networking

In this section, we discuss how to calculate the CIDR ranges for the cluster.

To determine the total IP addresses required, find the IP addresses needed per VM, multiply by the number of VMs, and add the other addresses described below:

  1. During Kubernetes upgrades, which happen one VM at a time, an extra VM is deployed, adding 1 to the total VM count. For example, from the Large Prod calculation above: 3 (data) + 16 (runtime) + 1 (upgrade) = 20 VMs.
  2. Usually each Apigee cluster takes two CIDR blocks:
    1. Primary range for Pods, VMs, and load balancers.
    2. Secondary range for Kubernetes services. 
  3. Primary range: For the primary range, determine the Container Network Interface (CNI) used for your Kubernetes clusters. Read the respective CNI documentation to determine if Pods get IP addresses from a logical IP range or virtual network CIDR block. Some example calculations for the primary range are:
    1. For clusters where Pods get IP addresses from a logical IP range, for example, Azure kubenet:
      • Total IP addresses ~ Total VMs + Total load balancers (1 for each Apigee Ingress Deployment Service).
      • In our example, 20 + 2 (assuming mTLS and TLS ingress deployments) = 22, hence a /27 CIDR range is required.
      • Because Pods get IP addresses from a logical range, there is no cost associated with this CIDR range, so you can choose a larger range, such as /18.
    2. For clusters where Pods get IP addresses from the virtual network CIDR range, for example, GKE, Azure CNI, or Amazon VPC CNI:
      • Total IPs ~ Total VMs + (IPs assigned per VM * Total VMs) + Total load balancers (1 for each Apigee Ingress Deployment Service)
      • In our example, 20 + 30 (IP addresses per VM) * 19 (total data and runtime VM count) + 2 (assuming mTLS and TLS ingress deployments) = 592; per the table below, a /22 CIDR range covers this, and you can choose /21 to leave headroom for growth.
      • (30 is the default number of Pods per VM with Azure CNI; replace this number with the value relevant to your cloud provider.)
    3. Depending on the cloud provider and instance type chosen, the number of IPs assigned per VM can vary; refer to your cloud provider's documentation for the applicable limits.
  4. Secondary range: the approximate total number of Apigee Kubernetes services plus non-Apigee services. Determine the count of non-Apigee services by looking at other existing clusters, for example with `kubectl get services -A`. Once you know the total IP address count, use the table below as a reference to determine the CIDR range (a sketch of both range calculations follows the table).
CIDR    Usable IPs
/27     <= 30
/26     <= 62
/25     <= 126
/24     <= 254
/23     <= 510
/22     <= 1022
/21     <= 2046
/20     <= 4094

Tips to keep infrastructure costs low

Balancing performance, availability, and cost is an important part of operating efficiently. We highly recommend iterating on performance testing, documenting the bottlenecks, and addressing them. Once you have fine-tuned the system, you can cut costs in a few ways; for example, if the cluster does not need to run at all times, consider using preemptible VMs for the stateless nodes.

Conclusion

This blog post gives a methodology for estimating infrastructure cost under some assumptions, and it is a quick way to understand Apigee hybrid sizing. For more detailed calculations related to your unique requirements, pricing, and industry, talk to a Google Cloud sales specialist who can help narrow down the details.

Acknowledgments

Thanks to Apigee Engineering, Apigee Sales Specialists, and Will Witman from the Docs team for helping build a calculator, providing cloud-specific calculations, and sharing knowledge from deploying Apigee hybrid with different enterprises.

Comments
kyisoethinkokyi

Hello @RakeshTalanki ,

Thanks for the guidelines.

I have one question on the following estimation.

  • Prod:
  • 2,000 TPS
  • 1 Google Cloud Project, 1 Apigee Org, 10 Apigee Environments, 2 Regions
  • Supports up to 500 shared flows and proxies
  • 11 Node Kubernetes Cluster, 85 vCPUs, 118 GB RAM, 750 GB storage per Cassandra pod (3) (2,250 GB), 100 GB storage per node for runtime pods (8) (1000 GB storage)
  • 22 nodes of e2-standard-8 (for 2 regions)

Is this calculated for DC/DR (Active-Active)?

Or DC/DR (Active-Standby)?

Hi @kyisoethinkokyi ... the calculation is for an Active-Active setup. Each cluster is set up to autoscale pods, and each DC should be able to take the full load when the other DC is not running. The calculations incorporate this setup.

kyisoethinkokyi

Thanks a lot @RakeshTalanki 
