What are the implications for an installation's runtime when a particular server is down?

asurajpai
Participant V

Say we have a 13-node cluster installation, where different Apigee profiles are installed on different servers.

For example, say one of the servers is down:

1) Router is down: You will not be able to hit the runtime proxy endpoints.

2) Management Server is down: You will not be able to create a new proxy.

3) OpenLDAP is down: You will not be able to log in to the UI.

Is there any documentation or link where we can get details on each of the servers below, and on the implications for the runtime and the other servers when a particular server is down?

1) Zookeeper

2) Cassandra

3) openLDAP

4) Postgres Server

5) Management Server

6) Management UI

7) Router

8) Message Processor

9) Qpid Server

Accepted solution


Hi Suraj,

To start, you should look at your deployment as three planes.

The 1st plane is the Messaging Plane. This is what supports your runtime traffic. Routers, Message Processors and Cassandra fall under this plane.

The 2nd plane is the Management Plane. This is where you configure, change and deploy your API proxies, API products and all other artifacts. Zookeeper, OpenLDAP, the Management UI and the Management Server (and API) fall under this plane.

The 3rd plane is the Analytics Plane. Analytics data is collected and aggregated asynchronously to your API calls. The Postgres Server and Qpid Server fall under this plane.

In your deployment, you must make sure that the messaging plane has enough redundancy to meet your SLA requirements. If the Management Plane or the Analytics Plane goes down, your APIs will still be operational. How much redundancy you want on those planes varies from use case to use case. You should also use monitoring to make sure that the instances of any given component (or at least of the critical components) are never all down at the same time.
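As a rough illustration of that monitoring rule, the sketch below flags every component whose instances are all down and rates it critical if it belongs to the messaging plane. It assumes your existing health checks already produce (component, is_up) pairs; the component labels are hypothetical, not names Apigee emits.

```python
from collections import Counter

# Messaging-plane components, per the three-plane split above. The string
# labels are hypothetical; use whatever names your monitoring emits.
MESSAGING_PLANE = {"router", "message-processor", "cassandra"}


def components_fully_down(instances):
    """Return, sorted, the components whose every known instance is down.

    `instances` is an iterable of (component, is_up) pairs, assumed to come
    from health checks your monitoring already runs.
    """
    instances = list(instances)
    total = Counter(component for component, _ in instances)
    up = Counter(component for component, is_up in instances if is_up)
    return sorted(c for c in total if up[c] == 0)


def severity(component):
    # Losing every instance of a messaging-plane component stops API traffic;
    # management or analytics outages leave the runtime serving calls.
    return "critical" if component in MESSAGING_PLANE else "warning"


status = [
    ("router", True), ("router", False),
    ("message-processor", False), ("message-processor", False),
    ("qpid-server", False),
]
for component in components_fully_down(status):
    print(component, severity(component))
# message-processor critical
# qpid-server warning
```

One Router instance is still up in the sample, so only the Message Processors (critical) and Qpid (warning) are reported.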

Let me now go through each component in question and explain what happens if all the instances of a particular component go down (which should ideally never happen, at least for the components in the messaging plane):

Messaging Plane

If your messaging plane is down, your API calls will not work. Some aspects of the Management API will also not work.

1. Router

If all your Routers are down, API call classification will fail, and your Edge deployment will not be able to take any API calls.

2. Message Processor

If all your Message Processors are down, the API runtime is down. You will start seeing classification errors, because the router will not know where to send the API calls. It is equivalent to not having your API proxy deployed.
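A toy model of why the Router reports classification errors here: it maps the request's base path to the Message Processors where that proxy is deployed and forwards to a live one; with none live, it has nowhere to send the call. The deployment map, base paths and MP names are hypothetical, and real Routers spread traffic across MPs rather than always picking the first.

```python
class ClassificationError(Exception):
    """No live Message Processor serves the requested base path."""


# Hypothetical deployment map: base path -> MPs where the proxy is deployed.
DEPLOYMENTS = {
    "/v1/orders": ["mp-1", "mp-2"],
    "/v1/payments": ["mp-2"],
}


def route(base_path, live_mps):
    """Return a live MP for base_path, or fail classification."""
    candidates = [mp for mp in DEPLOYMENTS.get(base_path, []) if mp in live_mps]
    if not candidates:
        # To the caller this looks the same as the proxy not being deployed.
        raise ClassificationError(f"no live Message Processor for {base_path}")
    return candidates[0]  # simplification: real Routers spread the load


print(route("/v1/orders", {"mp-1"}))  # mp-1
try:
    route("/v1/payments", {"mp-1"})
except ClassificationError as err:
    print(err)  # no live Message Processor for /v1/payments
```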

3. Cassandra

If you have authentication policies that use Apigee as the authentication provider (API key validation, OAuth), or if you use distributed caching, these policies will fail or will not work as expected. If one of the Cassandra nodes is down, you will not have an issue: you will run into problems only when the number of available nodes falls below what the configured consistency level requires.
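To make "below the consistency level" concrete: in Cassandra, QUORUM needs a majority of the replicas for the data, floor(RF / 2) + 1 (LOCAL_QUORUM applies the same formula to the local datacenter's replication factor). The sketch assumes a replication factor of 3; the thread does not say which replication factor or consistency level your Edge installation uses.

```python
def required_replicas(consistency_level, replication_factor):
    """Replicas that must acknowledge a read or write at this consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def request_succeeds(consistency_level, replication_factor, replicas_up):
    return replicas_up >= required_replicas(consistency_level, replication_factor)


# Assumed example: replication factor 3 with QUORUM.
for up in (3, 2, 1):
    print(up, request_succeeds("QUORUM", 3, up))
# 3 True
# 2 True
# 1 False
```

With RF 3, one node down still leaves two of three replicas, so QUORUM requests succeed; losing a second replica of the same data makes them fail.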

Management Plane

1. Management server

You will not be able to make any configuration changes, such as creating API proxies.

2. OpenLDAP

You will not be able to log in, either through the UI or through the Management API.

3. Zookeeper

You will not be able to make any configuration changes, such as creating API proxies.

4. Management UI

You will not be able to log in to the UI. However, the Management API still lets you do everything the UI supports.

Analytics plane

1. Qpid Server

Qpid picks up the raw analytics data from the Message Processors. If all your Qpid servers are down, the Message Processors hold the data in an in-memory buffer until one of the Qpid servers comes back up. If the buffer overflows before any Qpid server comes back, or if a Message Processor goes down, the raw analytics data for that period will be lost.
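A toy model of this buffering, assuming a capacity counted in records (a hypothetical number, not an Apigee setting) and assuming that records arriving while the buffer is full are the ones dropped; the thread does not say which records are lost. Because the buffer lives in memory, stopping the Message Processor would discard everything still pending.

```python
from collections import deque


class AnalyticsBuffer:
    """Toy model of a Message Processor's bounded in-memory analytics buffer."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical size, in records
        self.pending = deque()
        self.dropped = 0

    def record(self, event, qpid_up):
        """Handle one analytics record; return the records delivered to Qpid."""
        if qpid_up:
            # A Qpid server is reachable: flush the backlog, then this record.
            delivered = list(self.pending) + [event]
            self.pending.clear()
            return delivered
        if len(self.pending) < self.capacity:
            # All Qpid servers down: hold the record in memory for now.
            self.pending.append(event)
        else:
            # Buffer full while Qpid is still down: this record is lost.
            self.dropped += 1
        return []


buf = AnalyticsBuffer(capacity=2)
for name in ("a", "b", "c"):
    buf.record(name, qpid_up=False)
print(buf.dropped)                    # 1
print(buf.record("d", qpid_up=True))  # ['a', 'b', 'd']
```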

2. Postgres Server

Postgres is where all the analytics data resides. If Postgres is down, your analytics is effectively down.
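For the monitoring use case raised in the question, the component list above can be turned into a lookup that reports impacts with runtime-affecting components first. The keys and impact wording are paraphrases of this answer, not Apigee alert names or documented messages.

```python
# Impact of losing every instance of a component, paraphrased from the answer
# above. Keys are hypothetical labels, not Apigee alert names.
IMPACT = {
    "router": ("messaging", "Edge cannot take API calls; classification fails"),
    "message-processor": ("messaging", "API runtime down; classification errors"),
    "cassandra": ("messaging", "API key/OAuth validation and distributed caching fail"),
    "management-server": ("management", "No configuration changes, e.g. new API proxies"),
    "zookeeper": ("management", "No configuration changes, e.g. new API proxies"),
    "openldap": ("management", "No login through either the UI or the Management API"),
    "management-ui": ("management", "No UI login; the Management API still works"),
    "qpid-server": ("analytics", "Raw analytics buffered on MPs; lost on overflow or MP failure"),
    "postgres-server": ("analytics", "Analytics effectively down"),
}
PLANE_ORDER = {"messaging": 0, "management": 1, "analytics": 2}


def impact_report(down_components):
    """Describe each fully-down component, runtime-affecting planes first."""
    known = [c for c in down_components if c in IMPACT]
    # Unknown labels are skipped rather than guessed at.
    known.sort(key=lambda c: (PLANE_ORDER[IMPACT[c][0]], c))
    return [f"[{IMPACT[c][0]}] {c}: {IMPACT[c][1]}" for c in known]


for line in impact_report(["postgres-server", "router", "unknown-host"]):
    print(line)
# [messaging] router: Edge cannot take API calls; classification fails
# [analytics] postgres-server: Analytics effectively down
```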


Replies

Hi @asurajpai,

In a 13-node install, all these software components have redundancy, so a single component failure will not affect the runtime. You will find more information on the components in the installation and configuration guide. If you are looking for just a standard guideline, this ebook should address many of your questions:

http://apigee.com/about/resources/ebooks/digital-ready-it-api-platform-story

Thanks,

The details in the api-platform-story ebook are very high level. I need the implications from a monitoring perspective: the requirement is to understand the impact of an individual server going down on the other nodes.

Hi @Srividya Annapragada, can you please help with this question?

Hi @jagjyot @nitinsingh, can you please help with this topic and point me to the correct documentation?

@Sandeep Murusupalli, can you please help by providing the above information, or point me to the correct contact?
