Capacity for on-prem

Hi,

We are using the 13-node topology for our organization (attached below). We want to increase our capacity because our API call volume will increase. Kindly let me know what the best design approach would be.

MicrosoftTeams-image.png

@anilsagar

@dchiesa1 

 

Regards 


Would it be possible to move to Apigee X (https://www.googlecloudcommunity.com/gc/Cloud-Product-Articles/Walkthrough-Trying-out-Apigee-X-Evalu...)?

The 13-node topology is good, but I am curious about a few things.

Let me play the Apigee tech sales role (which I am not) and ask the questions below to better understand your environment.

a) How many Apigee deployments (physically) do you have today? Is it a single-DC or multi-DC setup?
b) How many Apigee orgs and environments do you use today?
c) How many proxies are active on Apigee production? Can you provide a total of how many API calls Apigee is processing quarterly/annually (across all proxies in production)?
d) What are the specs (cores, RAM, storage) of the VMs running the individual Apigee components?
e) What is your estimated peak TPS and average TPS? This should be across all APIs in your production environment.
f) What do your current API performance metrics look like?

It depends on which component you want to scale (Message Processor, Router, Cassandra, ZooKeeper, etc.): https://docs.apigee.com/private-cloud/v4.18.01/scaling-edge-private-cloud?hl=en

If you are growing fast, it is simpler to move to Apigee X than to keep working on OPDK. Apigee infrastructure is somewhat intricate and needs deep understanding, but the docs do help with scaling those components, and support is always helpful as well.

 

 

 

Hi,

Thanks for replying. No, moving to Apigee X is not possible as of now. Coming to your queries:

a) How many Apigee deployments (physically) do you have today? Is it a single-DC or multi-DC setup?
   Single DC.
b) How many Apigee orgs and environments do you use today?
   1 org and 4 environments.
c) How many proxies are active on Apigee production? Can you provide a total of how many API calls Apigee is processing quarterly/annually (across all proxies in production)?
   348 proxies in production in total. Total traffic is 39.46 * 12 = 473.52 million calls (yearly); we are estimating 500 million from the upcoming quarter.
d) Specs (cores, RAM, storage) of the VMs running the individual Apigee components?
   3 VMs for ZooKeeper/Cassandra: 8 CPU, 16 GB RAM, disks 50+50+150 GB
   2 VMs for OpenLDAP: 2 CPU, 4 GB RAM, disks 50+50+50 GB
   2 VMs for Edge Management Server/Edge UI: 4 CPU, 8 GB RAM, disks 50+50+50 GB and 50+50+150 GB
   2 VMs for Postgres Server: 8 CPU, 16 GB RAM, disks 50+50+50 GB and 50+1024+1900+100 GB
   2 VMs for Message Processor: 8 CPU, 16 GB RAM, disks 50+50+150 GB
   2 VMs for Qpid and ingest: 4 CPU, 8 GB RAM, disks 50+50+50 GB
   2 VMs for Internet Router: 4 CPU, 8 GB RAM, disks 50+50+50 GB
   Total: 15 VMs
e) Estimate of your peak TPS and average TPS? This should be across all APIs in your production environment.
   How do we generate this? Kindly guide.
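
As a rough cross-check on the traffic figures in (c), a yearly call volume can be converted into an average TPS. The sketch below assumes that 39.46 is millions of calls per month (as the multiplication by 12 suggests) and that the 500 million estimate is also a yearly figure; the function name `average_tps` is only illustrative.

```python
SECONDS_PER_DAY = 86_400


def average_tps(calls_per_year: float, days_per_year: int = 365) -> float:
    """Average transactions per second implied by a yearly call volume."""
    return calls_per_year / (days_per_year * SECONDS_PER_DAY)


monthly_calls = 39.46e6            # assumption: the quoted figure is calls per month
yearly_calls = monthly_calls * 12  # 473.52 million, as stated in the post
projected_calls = 500e6            # assumption: the estimate is a yearly total

print(f"current:   {average_tps(yearly_calls):.1f} TPS on average")
print(f"projected: {average_tps(projected_calls):.1f} TPS on average")
```

Under these assumptions both volumes work out to roughly 15 to 16 TPS on average, so the average alone says little about capacity; the peak TPS asked for in (e) is the more useful number.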

FYI @API-Evangelist 

Regards

For TPS, you can use the queries below, which you will need to run in PostgreSQL for each org and environment.

psql -h <hostname> -d apigee -U <user>

E.g.:
Query to return the 15 intervals with the highest TPS from March 30, 2019, up to (not including) May 31, 2020. Dividing by 60 assumes that each timestamp in agg_api covers one minute.
SELECT timestamp,
SUM(message_count)/60.0 AS MAXTPS
FROM analytics."<ORG>.<ENV>.agg_api"
WHERE timestamp >= '2019-3-30 00:00:00'
AND timestamp < '2020-5-31 00:00:00'
GROUP BY 1
ORDER BY MAXTPS DESC
LIMIT 15;

E.g.:
Query to return the 15 days with the highest average TPS from March 30, 2019, up to (not including) May 31, 2020.

SELECT
DATE_TRUNC('day', timestamp AT time zone '-0:0:0')::timestamp without time zone AS time_unit,
SUM(message_count)/86400.0 AS AVGTPS
FROM analytics."<ORG>.<ENV>.agg_api"
WHERE timestamp >= '2019-3-30 00:00:00'
AND timestamp < '2020-5-31 00:00:00'
GROUP BY 1
ORDER BY AVGTPS DESC
LIMIT 15;

There are many reasons to move to Apigee X: a lot of new features and integrations are in place that you may be missing out on. You can pilot it and explore the difference, and above all you don't have to worry about managing the infrastructure. If you still want to proceed with on-prem, it may be good to talk, if possible, once you have collected the data.

Hi, 

Sorry for the delay, but it is not possible to move to Apigee X now; we will continue with OPDK for now. Kindly find the average TPS and max TPS as requested:

Year  Month  Avg TPS

2021  Jan    1.41845
2021  Feb    1.81379
2021  Mar    2.28603
2021  Apr    2.66478
2021  May    2.94332
2021  Jun    3.61086
2021  Jul    3.87983
2021  Aug    4.73602
2021  Sep    2.90042
2021  Oct    2.34643
2021  Nov    1.72044
2021  Dec    1.83685
2022  Jan    1.84336
2022  Feb    2.15953
2022  Mar    2.30936
2022  Apr    2.75841
2022  May    4.15207
2022  Jun    5.79906
2022  Jul    8.54758
2022  Aug    5.9294
2022  Sep    5.52341
2022  Oct    5.90109
2022  Nov    10.54376
2022  Dec    13.63803
2023  Jan    13.52815
2023  Feb    15.10069
2023  Mar    15.58903
2023  Apr    18.14631
2023  May    14.73297
2023  Jun    13.70272

Max TPS:

Year  Month  Max TPS for month

2021  Jan    32.48
2021  Feb    45.48
2021  Mar    55.32
2021  Apr    60.47
2021  May    56.2
2021  Jun    45.82
2021  Jul    71.75
2021  Aug    73.78
2021  Sep    41.73
2021  Oct    40.2
2021  Nov    50.92
2021  Dec    27.68
2022  Jan    29.27
2022  Feb    17.03
2022  Mar    21.47
2022  Apr    24.05
2022  May    33.7
2022  Jun    47.1
2022  Jul    40.95
2022  Aug    50.92
2022  Sep    53.85
2022  Oct    187.67
2022  Nov    85.52
2022  Dec    110.55
2023  Jan    79.52
2023  Feb    89.05
2023  Mar    255.78
2023  Apr    456.4
2023  May    211.98
2023  Jun    180.7
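
One way to read the two tables together is the ratio of peak to average TPS per month, since it is the peaks that the runtime components have to absorb. The sketch below is only illustrative: it copies the 2023 rows from the tables above, and the helper name `peak_to_average` is made up for this example.

```python
# 2023 rows copied from the average and max TPS tables above.
avg_tps = {"Jan": 13.52815, "Feb": 15.10069, "Mar": 15.58903,
           "Apr": 18.14631, "May": 14.73297, "Jun": 13.70272}
max_tps = {"Jan": 79.52, "Feb": 89.05, "Mar": 255.78,
           "Apr": 456.4, "May": 211.98, "Jun": 180.7}


def peak_to_average(avg: dict[str, float], peak: dict[str, float]) -> dict[str, float]:
    """Ratio of peak TPS to average TPS for each month present in both inputs."""
    return {month: peak[month] / avg[month] for month in avg if month in peak}


ratios = peak_to_average(avg_tps, max_tps)
for month, ratio in ratios.items():
    print(f"2023 {month}: peak is {ratio:.1f}x the average")

busiest = max(ratios, key=ratios.get)
print(f"largest peak-to-average ratio: {busiest}")
```

For these numbers the April 2023 peak is about 25 times the monthly average, which suggests sizing against the peaks rather than the averages.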

FYI @API-Evangelist  

Thanks

Hi @API-Evangelist, kindly respond.

Before going into detail on scaling, I would do the following first:


  1. Identify which proxies are really active and which are inactive. I assume you have API governance in place and are periodically retiring/removing unused APIs.
  2. Identify the APIs that are slow or perform badly; you may want to redesign them or leverage caching.

  3. Often, given project timelines, we end up doing quick work on the API platform without understanding the best usage. If you see such inconsistent patterns, redesign them properly.

  4. Check for unnecessary backups, cron jobs, etc. running on the Apigee VMs, where you want fast routing/processing capabilities.
Have a good session with your Linux admin to see whether any such improvements can be made outside the Apigee processes.
  5. Make sure you keep up with the regular Apigee maintenance tasks wherever applicable (do not ignore them).

    ref:
    https://docs.apigee.com/private-cloud/v4.18.05/recurring-edge-services-maintenance-tasks
    https://docs.apigee.com/private-cloud/v4.18.05/apache-cassandra-maintenance-tasks
    https://docs.apigee.com/private-cloud/v4.18.05/zookeeper-maintenance
    https://docs.apigee.com/private-cloud/v4.18.05/recurring-analytics-services-maintenance-tasks

In most cases, scaling to accommodate a higher number of transactions per second (TPS) will only affect the components that serve live API traffic.


The number of analytics and management components may increase, primarily due to the high availability requirements for the capabilities provided by these components.
In your case, you could scale these components:

  1. Add 2 ZooKeeper/Cassandra nodes with similar specs, for a total of 5 nodes. With 5 nodes, at least 3 must remain available to maintain quorum.
ref:
https://docs.apigee.com/private-cloud/v4.18.05/about-cassandra-replication-factor-and-consistency-le...
https://docs.apigee.com/private-cloud/v4.18.05/adding-zookeeper-nodes
https://docs.apigee.com/private-cloud/v4.18.05/adding-cassandra-nodes
  2. Add 2 Qpid servers. Routers send requests to Message Processors, which call Cassandra nodes and offload analytics data to Qpid queues; that data is then ingested by services in the Qpid Ingest Server. The Postgres Server aggregates analytics data asynchronously.
https://docs.apigee.com/private-cloud/v4.18.05/add-or-remove-qpid-nodes
  3. Add 2 more nodes for Router and Message Processor for better routing and processing capabilities.
https://docs.apigee.com/private-cloud/v4.18.05/adding-router-or-message-processor-node

  4. Finally, get an opinion from Apigee support as well.
    Good luck.
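
The quorum remark in step 1 follows from ZooKeeper's majority rule: an ensemble stays available only while more than half of its voting nodes are up. A minimal sketch of that arithmetic is below; the function names are illustrative, and it covers ZooKeeper only (Cassandra availability depends on the replication factor and consistency level instead).

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many voting nodes."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many voting nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for nodes in (3, 5):
    print(f"{nodes} nodes: quorum of {quorum_size(nodes)}, "
          f"tolerates {tolerated_failures(nodes)} node failure(s)")
```

Going from 3 to 5 voting nodes raises the quorum from 2 to 3 and the number of tolerated failures from 1 to 2.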
 

 

You can also check which component is hitting the 80% CPU threshold or above and decide accordingly. To handle more requests, you can simply scale the Router and Message Processor, which are the runtime components. Documentation is available for this:

https://docs.apigee.com/private-cloud/v4.52.00/adding-router-or-message-processor-node
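
To get a feel for what adding runtime nodes buys, the observed peak can be divided across the Message Processors. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly across nodes, uses the highest monthly peak reported in this thread (456.4 TPS), and compares the current 2 Message Processors with a hypothetical 4. It says nothing about what a single node can actually sustain, which has to come from CPU monitoring or load testing.

```python
def peak_tps_per_node(peak_tps: float, node_count: int) -> float:
    """Peak load each node sees if traffic is spread evenly (an assumption)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return peak_tps / node_count


observed_peak = 456.4  # highest monthly max TPS reported in the thread

for nodes in (2, 4):   # 2 = current Message Processors, 4 = hypothetical after scaling
    per_node = peak_tps_per_node(observed_peak, nodes)
    print(f"{nodes} Message Processors: about {per_node:.1f} TPS each at peak")
```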