Token issuance across multiple datacenters

Hello Community!

I have an 'interesting' puzzle. We have a two-datacenter setup. In some instances, a token issued in dc1 isn't available in dc2 for some time (only milliseconds, but it's impacting the client, so they win). Is there a way to pin subsequent requests to the issuing datacenter until the token makes its way to the other datacenter? Or is there a way to have it issued into both datacenters at the same time?

1 ACCEPTED SOLUTION

One hacky way to ensure tokens are available might be to simply delay in the API proxy for some time, long enough to accommodate the C* (Cassandra) propagation. This implies a change in your API proxy. This "delay" callout might help.

Another way to address this might be to use client-to-datacenter affinity, so that a client always uses the same DC if it is available, regardless of what token it presents. This requires a networking change. 
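
To make the affinity idea concrete, here is a minimal sketch in Python of "same client, same DC, if it is available". It is not how a GTM or any particular load balancer implements it; the function names, the hash choice, and the health set are assumptions for illustration only, and `dc1`/`dc2` are simply the names used in the question.

```python
import hashlib

# Datacenter names from the question; everything else here is hypothetical.
DATACENTERS = ["dc1", "dc2"]


def preferred_order(client_ip, datacenters=DATACENTERS):
    """Rank datacenters for a client. The ranking depends only on the
    client IP, so the same client always gets the same ranking."""
    def score(dc):
        digest = hashlib.sha256(f"{client_ip}|{dc}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(datacenters, key=score, reverse=True)


def pick_datacenter(client_ip, healthy, datacenters=DATACENTERS):
    """Return the client's first-choice DC if it is healthy,
    otherwise fall back to the next one in its ranking."""
    for dc in preferred_order(client_ip, datacenters):
        if dc in healthy:
            return dc
    raise RuntimeError("no healthy datacenter available")
```

With both DCs healthy, `pick_datacenter("203.0.113.7", {"dc1", "dc2"})` returns the same DC on every call, so the token request and the calls that use the token land in the same place. Failover to the other DC happens only when the preferred one drops out of the healthy set.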

7 REPLIES

This is OPDK?

The real question is why clients are allowed to hit each region in round-robin fashion, which is what makes this a real issue (i.e. the auth request lands in one region and the token is used in the other). The best solution is to configure affinity at the network level in the GTM/BigIP/Netscaler, based on source IP or on something else such as a response header (assuming the GTM does TLS termination). It's usually done by adding an iRule in BigIP, complemented by extra rules to balance the load across regions. It also depends on the geographical location of clients and regions. I also assume this is not OPDK on GCP regions, where we do have anycast.

It's generally not a good idea to add a fixed extra response latency when issuing tokens, as the C* cross-DC replication latency varies with DB load, bandwidth, and API traffic volume. It's also a good idea to monitor the C* replication lag in real time and alert if a threshold is breached: https://github.com/gitaroktato/cassandra-replication-latency-tools
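
As a rough illustration of the "alert if a threshold is breached" part, here is a small Python sketch. It assumes you already collect lag samples in milliseconds from some probe of your own (for example, write a marker in one DC and time how long it takes to become readable in the other). The function names, the percentile choice, and the threshold are all hypothetical, and this is not taken from the linked tool.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile of a non-empty list of lag samples (ms)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def lag_breached(samples_ms, threshold_ms, pct=99):
    """True if the chosen percentile of observed lag exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms
```

For instance, `lag_breached([5, 8, 12, 250], threshold_ms=100)` is true because the worst sample dominates the 99th percentile of such a small window. Watching a high percentile rather than the average matters here, since it is the occasional slow replication that clients actually notice.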

Hi Nicola, there are cases where the customer uses multi-DC (dc-2) as an active-active, fully accessible cluster, which might even be in the same physical DC. I always thought multi-DC was good for geo scaling, but in some cases it's the same country, with a DR site that is used as active. I've seen some replication delays when you let customers use both DCs at the same time with a global LB on top.

Another solution I can see is sticky sessions: the client that started the session keeps working against the same RMPs and the same DC.

You are correct, Denis; stickiness falls under the umbrella of "based on a response header", but in general it would depend on the API design and on whether a "session" is actually involved. Otherwise, REST is stateless.

It is, but the LB can stick to the IP or a bunch of other TCP stream parameters; you don't actually need to explicitly 'stick' some parameter back to the client.

Yes, that is the easiest way to configure it. You can additionally configure the GTM to still be fair in the way it distributes requests across sites, with additional iRules in BigIP, for example.