Re: capacity for on prem

supriyokumar274 · 07-07-2023 02:09 AM

Hi,

We are using a 13-node topology for our organization (attached below). We want to increase our capacity because API call volume will increase. Kindly let me know what the best design approach would be.

@anilsagar

@dchiesa1

Regards

API-Evangelist

Is a move to Apigee X (https://www.googlecloudcommunity.com/gc/Cloud-Product-Articles/Walkthrough-Trying-out-Apigee-X-Evalu...) possible?

A 13-node topology is good, but I am curious about a few things.

Let me play the Apigee tech sales role (which I am not) and ask the questions below to better understand your environment.

a) How many Apigee deployments (physically) do you have today? Is it a single-DC or multi-DC setup?
b) How many Apigee orgs and environments do you use today?
c) How many proxies are active on Apigee production? Can you provide a total of how many API calls Apigee is processing quarterly/annually (across all proxies in production)?
d) Specs (cores, RAM, storage) of the VMs running individual Apigee components?
e) Estimate of your peak TPS and average TPS? This should be across all APIs in your production environment.
f) What do your current API performance metrics look like?

It depends on which component you want to scale (Message Processor, Router, Cassandra, ZooKeeper, etc.): https://docs.apigee.com/private-cloud/v4.18.01/scaling-edge-private-cloud?hl=en

If you are growing fast, it is simpler to move to Apigee X than to keep working on OPDK. Apigee infrastructure is a little involved and needs deep understanding, but the docs do help with scaling those components, and support is always helpful as well.

supriyokumar274

Hi,

Thanks for replying. No, moving to Apigee X is not possible as of now. Coming to your queries:

a) How many Apigee deployments (physically) do you have today? Is it a single-DC or multi-DC setup? --> Single
b) How many Apigee orgs and environments do you use today? --> 1 org and 4 environments
c) How many proxies are active on Apigee production? Can you provide a total of how many API calls Apigee is processing quarterly/annually (across all proxies in production)? --> 348 proxies in total in production; total traffic is 39.46 million x 12 = 473.52 million calls yearly, and we are estimating 500 million from the upcoming quarter
d) Specs (cores, RAM, storage) of the VMs running individual Apigee components? -->
3 VMs for ZooKeeper/Cassandra --> 8 CPU, 16 GB RAM, disks 50+50+150 GB
2 VMs for OpenLDAP --> 2 CPU, 4 GB RAM, disks 50+50+50 GB
2 VMs for Edge Management Server/Edge UI --> 4 CPU, 8 GB RAM, disks 50+50+50 and 50+50+150 GB
2 VMs for Postgres server --> 8 CPU, 16 GB RAM, disks 50+50+50 and 50+1024+1900+100 GB
2 VMs for Message Processor --> 8 CPU, 16 GB RAM, disks 50+50+150 GB
2 VMs for Qpid and ingest --> 4 CPU, 8 GB RAM, disks 50+50+50 GB
2 VMs for Internet Router --> 4 CPU, 8 GB RAM, disks 50+50+50 GB
Total: 15 VMs
e) Estimate of your peak TPS and average TPS? This should be across all APIs in your production environment. --> Kindly guide us on how to generate this.

FYI @API-Evangelist

Regards

API-Evangelist

For TPS, you can use the queries below, which you will need to run in PostgreSQL across all orgs.

psql -h <hostname> -d apigee -U <user>

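E.g.:
Query to return the 15 timestamps with the highest TPS from March 30, 2019 up to (but not including) May 31, 2020. The division by 60 converts a per-minute message count into messages per second, which assumes each agg_api row covers one minute.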
SELECT timestamp,
SUM(message_count)/60.0 AS MAXTPS
FROM analytics."<ORG>.<ENV>.agg_api"
WHERE timestamp >= '2019-3-30 00:00:00'
AND timestamp < '2020-5-31 00:00:00'
GROUP BY 1
ORDER BY MAXTPS DESC
LIMIT 15;

E.g.:
Query to return the 15 days with the highest average TPS (the day's message count divided by 86,400 seconds) from March 30, 2019 up to (but not including) May 31, 2020.

SELECT
DATE_TRUNC('day', timestamp AT time zone '-0:0:0')::timestamp without time zone AS time_unit,
SUM(message_count)/86400.0 AS AVGTPS
FROM analytics."<ORG>.<ENV>.agg_api"
WHERE timestamp >= '2019-3-30 00:00:00'
AND timestamp < '2020-5-31 00:00:00'
GROUP BY 1
ORDER BY AVGTPS DESC
LIMIT 15;
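
As a quick cross-check on whatever the queries return, the yearly volume quoted earlier in the thread (473.52 million calls) can be converted into an average TPS by dividing by the number of seconds in the period. A minimal Python sketch of that arithmetic; the function name is made up for illustration and a 365-day year is assumed:

```python
SECONDS_PER_DAY = 86_400


def average_tps(total_calls: float, days: float = 365) -> float:
    """Average transactions per second over a period of `days` days."""
    return total_calls / (days * SECONDS_PER_DAY)


# Yearly volume quoted earlier in the thread: 39.46 million x 12.
yearly_calls = 473.52e6
print(f"average: {average_tps(yearly_calls):.2f} TPS")
```

That works out to roughly 15 TPS. An average like this hides bursts, which is why the MAXTPS query matters as well.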

There are many reasons to move to Apigee X: a lot of new features and integrations are in place that you may be missing out on. You can pilot it and explore the difference, and best of all you don't have to worry about managing the infrastructure. If you still want to proceed with on-prem, it would be good to talk, if possible, once you have collected the data.

supriyokumar274

Hi,

Sorry for the delay, but it is not possible to move to Apigee X now; we will continue with OPDK for now. Kindly find the average TPS and max TPS as requested:

Year	Month	Avg TPS

2021	Jan	1.41845
2021	Feb	1.81379
2021	Mar	2.28603
2021	Apr	2.66478
2021	May	2.94332
2021	Jun	3.61086
2021	Jul	3.87983
2021	Aug	4.73602
2021	Sep	2.90042
2021	Oct	2.34643
2021	Nov	1.72044
2021	Dec	1.83685
2022	Jan	1.84336
2022	Feb	2.15953
2022	Mar	2.30936
2022	Apr	2.75841
2022	May	4.15207
2022	Jun	5.79906
2022	Jul	8.54758
2022	Aug	5.9294
2022	Sep	5.52341
2022	Oct	5.90109
2022	Nov	10.54376
2022	Dec	13.63803
2023	Jan	13.52815
2023	Feb	15.10069
2023	Mar	15.58903
2023	Apr	18.14631
2023	May	14.73297
2023	Jun	13.70272

Max TPS:

Year	Month	Max TPS for month

2021	Jan	32.48
2021	Feb	45.48
2021	Mar	55.32
2021	Apr	60.47
2021	May	56.2
2021	Jun	45.82
2021	Jul	71.75
2021	Aug	73.78
2021	Sep	41.73
2021	Oct	40.2
2021	Nov	50.92
2021	Dec	27.68
2022	Jan	29.27
2022	Feb	17.03
2022	Mar	21.47
2022	Apr	24.05
2022	May	33.7
2022	Jun	47.1
2022	Jul	40.95
2022	Aug	50.92
2022	Sep	53.85
2022	Oct	187.67
2022	Nov	85.52
2022	Dec	110.55
2023	Jan	79.52
2023	Feb	89.05
2023	Mar	255.78
2023	Apr	456.4
2023	May	211.98
2023	Jun	180.7

FYI @API-Evangelist

Thanks

supriyokumar274

Hi @API-Evangelist kindly respond

API-Evangelist

Before going into detail on scaling, do the following first:

Identify which proxies are really active and which are inactive. I assume you have API governance in place and are periodically retiring or removing unused APIs.
Identify the APIs that are slow or perform badly; you may want to redesign them or leverage caching.
Often, given project timelines, we end up doing quick work in API platforms without understanding the best usage. If you see such inconsistent patterns, redesign them properly.
Check for unnecessary backups, cron jobs, etc. running on the Apigee VMs, where you want fast routing and processing. Have a good session with your Linux admin to see whether any such improvements can be made outside the Apigee processes.
Make sure you keep up with the regular Apigee maintenance tasks wherever applicable (do not ignore them).
Ref:
https://docs.apigee.com/private-cloud/v4.18.05/recurring-edge-services-maintenance-tasks
https://docs.apigee.com/private-cloud/v4.18.05/apache-cassandra-maintenance-tasks
https://docs.apigee.com/private-cloud/v4.18.05/zookeeper-maintenance
https://docs.apigee.com/private-cloud/v4.18.05/recurring-analytics-services-maintenance-tasks

In most cases, scaling to accommodate a higher number of transactions per second (TPS) will only affect the components that serve live API traffic. 
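
Since the live-traffic components have to absorb bursts, it helps to look at how far the monthly peak sits above the monthly average. A rough Python sketch using the 2023 figures posted above; the helper name and dictionary layout are only for illustration:

```python
# (avg TPS, max TPS) per month, copied from the figures posted above.
reported = {
    "2023-Mar": (15.58903, 255.78),
    "2023-Apr": (18.14631, 456.4),
    "2023-May": (14.73297, 211.98),
    "2023-Jun": (13.70272, 180.7),
}


def peak_to_average(avg_tps: float, max_tps: float) -> float:
    """How many times larger the peak is than the average."""
    return max_tps / avg_tps


for month, (avg, peak) in reported.items():
    print(f"{month}: peak is {peak_to_average(avg, peak):.1f}x the average")

# Month with the highest recorded peak.
busiest = max(reported, key=lambda m: reported[m][1])
print("highest peak:", busiest)
```

In April 2023 the peak is about 25 times the average, so sizing Routers and Message Processors against the average alone would understate the load they actually see.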

The number of analytics and management components may increase, primarily due to the high availability requirements for the capabilities provided by these components. In your case, consider scaling these components:

Add 2 ZooKeeper/Cassandra nodes with similar specs, for a total of 5. Make sure at least 3 nodes stay available to maintain quorum. Ref:
https://docs.apigee.com/private-cloud/v4.18.05/about-cassandra-replication-factor-and-consistency-le...
https://docs.apigee.com/private-cloud/v4.18.05/adding-zookeeper-nodes
https://docs.apigee.com/private-cloud/v4.18.05/adding-cassandra-nodes
Add 2 Qpid server nodes. Routers send requests to Message Processors, which call Cassandra nodes and offload analytics data to Qpid queues; that data is then ingested by services in the Qpid Ingest Server. The Postgres Server aggregates analytics data asynchronously. https://docs.apigee.com/private-cloud/v4.18.05/add-or-remove-qpid-nodes
Add 2 more Router and Message Processor nodes for better routing and processing capacity. https://docs.apigee.com/private-cloud/v4.18.05/adding-router-or-message-processor-node
Finally, get an opinion from Apigee support as well.
Good Luck.
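
On the quorum point above: a majority quorum needs more than half of the voting members to be up. A small Python sketch of that arithmetic, assuming every node is a voter (no ZooKeeper observers); the function names are made up for illustration:

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can be down while a majority is still up."""
    return n - majority(n)


for nodes in (3, 5):
    print(
        f"{nodes} nodes: quorum needs {majority(nodes)}, "
        f"can lose {tolerated_failures(nodes)}"
    )
```

Going from 3 to 5 nodes raises the number of failures the ensemble can survive from 1 to 2. For Cassandra, the same majority rule applies to the replication factor rather than to the total node count.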

johnwilliams

You can also check which component is hitting the 80% CPU threshold or above and decide accordingly. To handle more requests, you can simply scale the Router and Message Processor, which are the runtime components. Documentation is available for this:

https://docs.apigee.com/private-cloud/v4.52.00/adding-router-or-message-processor-node
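
The CPU check described above could be sketched like this in Python. The readings are made-up sample values, and the function name is hypothetical; real numbers would come from whatever monitoring you already run on the VMs:

```python
CPU_THRESHOLD = 80.0  # percent, the threshold mentioned above


def components_to_scale(cpu_by_component, threshold=CPU_THRESHOLD):
    """Return components at or above the threshold, busiest first."""
    hot = {name: pct for name, pct in cpu_by_component.items() if pct >= threshold}
    return sorted(hot, key=hot.get, reverse=True)


# Made-up sample readings (peak CPU %), for illustration only.
sample = {
    "router": 62.0,
    "message-processor": 91.5,
    "cassandra": 80.0,
    "qpid": 35.0,
}
print(components_to_scale(sample))
```

With these sample values, the Message Processor and Cassandra would be flagged, and the Router and Qpid would not.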