Solved: DN failover handling

ozanseymen · 07-07-2015 06:38 AM

How does Apigee configure Route 53 to do failover in a DN scenario?

How do we set up a health check monitor?
What is the latency tolerance before we declare a region down?
How many retries are executed before we declare a region down?
How and when do we add the region back to the DN cluster once it has recovered?

Cheers

frankliu1

Ozan,

There are a few variations. If you can tell us the DNS name associated with your DN, we can give information more specific to your setup, but 90 seconds is a good ballpark for failing a region in most cases. We retry 3 times at 30-second intervals, but we also have the option of a fast fail with intervals of about 10 seconds.

In general, if the region failover is automatic, recovery is automatic too, after two successful health monitor checks. But if the failover is manual (e.g., requested by the customer), recovery is manual as well.
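To make the numbers concrete, here is a rough sketch of that fail/recover logic in Python. It is only an illustration, not the actual monitor: the `RegionMonitor` class and its field names are made up, and it assumes the fast-fail option keeps the same 3 retries and only shortens the interval.

```python
from dataclasses import dataclass


@dataclass
class RegionMonitor:
    """Hypothetical model of the thresholds described above."""
    fail_threshold: int = 3      # consecutive failed checks before failing the region
    recover_threshold: int = 2   # consecutive successful checks before restoring it
    interval_seconds: int = 30   # spacing between checks (about 10 for fast fail)
    healthy: bool = True
    _failures: int = 0
    _successes: int = 0

    def record(self, check_passed: bool) -> bool:
        """Feed in one health check result and return the region's state."""
        if check_passed:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.recover_threshold:
                self.healthy = True   # automatic recovery
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.fail_threshold:
                self.healthy = False  # automatic failover
        return self.healthy

    def time_to_fail_seconds(self) -> int:
        """Approximate time before a dead region is declared down."""
        return self.fail_threshold * self.interval_seconds
```

With the defaults, three failed checks in a row take the region out after roughly 3 x 30 = 90 seconds, and two passing checks in a row bring it back.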

Hope that helps.


gnanasekaran

My 3 cents,

> The criteria and health check will depend on the customer requirements

> I don't think we use Route 53 health checks or monitors; an API health check would be a good option for customers to configure

> Failover or recovery is just a DNS record update in Route 53, that's it. But it's not fully automatic: the switch is done manually after an alert is received or the problem is resolved
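For anyone curious what "just a DNS record update" looks like, here is a minimal sketch that builds a Route 53 change batch in Python. It is not our actual tooling: the function name, the host names, the CNAME record type, and the 60-second TTL are all assumptions for illustration.

```python
def build_failover_change(record_name: str, target: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that repoints record_name at target.

    UPSERT creates the record if it is missing and overwrites it otherwise,
    so the same call works for both failover and recovery.
    """
    return {
        "Comment": f"Point {record_name} at {target}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Hypothetical host names: fail over to the secondary region.
batch = build_failover_change("api.example.com.", "region-b.example.com.")
```

The resulting dict has the shape expected by the `ChangeBatch` argument of boto3's Route 53 `change_resource_record_sets` call, alongside the `HostedZoneId` of the zone that owns the record. A short TTL matters here, since clients keep resolving to the old region until their cached record expires.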
