Unexpected timeout error observed in JavaScript callout (Private Cloud v4.51)

Hi,

We have a couple of JavaScript policies attached to our proxies, with timeLimit set to 200 (the default value). We are getting JavaScript timeout errors in production, and they are affecting real traffic. We increased the limit to 1000, but we still get the error occasionally.

Is there a permanent solution for this? Please let me know. We are using Private Cloud version 4.51.

FYI @dandino @anilsagar 

Please note that the scripts do not contain any complex code.

Regards 

1 REPLY

This can happen when your Message Processors (MPs) are very busy or over-burdened. Either memory or CPU is not available to perform the JS step (even if it is simple logic), because that memory and CPU are dedicated to many other things. Or, even if the MP is not busy, the CPU could be "waiting" for the network or for IO.

What I suggest:

  • Check the CPU utilization and IO on your MP VMs. If they are showing high utilization, you may need to add a few VMs to relieve pressure, or increase the memory or CPU available to the VMs (scale out or up, or both).
  • In some cases, even if you have low transaction throughput, you can have resource constraint problems if you have a single environment with a very large number of API proxies deployed into it. The symptom here will be high memory utilization on the MPs, and slow GC cycles in the Java VMs. Deploying a proxy requires that each MP mapped to that environment load the appropriate configuration into memory. If you have thousands of APIs deployed (not uncommon), that is thousands of configurations to load. Consider splitting the environment into multiple environments, and redoing the MP-to-environment assignments.
  • If you have single MPs mapped to multiple environments, this can exacerbate the problem I just described. Consider mapping each MP to a single environment to avoid it.
  • If you have nodes that are multi-role, for example a single node that acts as Router and Message Processor (RMP) as well as Cassandra (C*), then you need to consider scaling that out. Move each role to a distinct node.
  • The problem may be elsewhere! It may not be on the MP node itself. The MP depends on Cassandra, and of course the network, to do its work. If the network is constrained or saturated, then the MP can spend lots of time waiting, which can result in you seeing timeouts in your JavaScript. If Cassandra is not returning results quickly, then the MP must wait, which can result in the same symptom.

So in short, you need to carefully examine the performance metrics across the entire cluster. Managing the performance of a distributed system is a complex endeavor.
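One cheap data point you can collect from inside the proxy itself: have the callout record how long it actually ran. If trivial logic regularly takes tens or hundreds of milliseconds of wall-clock time, that points at a starved MP rather than at your script. Here is a minimal sketch, not a definitive implementation. It assumes the `context` object that Apigee passes to JavaScript policies and its `setVariable` method; the helper name `timedCallout` and the flow variable names `js.elapsed_ms` and `js.slow` are made up for illustration.

```javascript
// Sketch only: wraps a callout's logic and records its wall-clock duration.
// `context` is the object Apigee provides to JavaScript policies.
// `timedCallout`, "js.elapsed_ms" and "js.slow" are hypothetical names.
function timedCallout(context, work, warnAtMs) {
  var start = Date.now();
  try {
    // Run the real callout logic and pass its result through unchanged.
    return work(context);
  } finally {
    // Runs whether `work` returned or threw.
    var elapsed = Date.now() - start;
    context.setVariable('js.elapsed_ms', elapsed);
    // Flag runs that came close to the policy's timeLimit.
    context.setVariable('js.slow', elapsed >= warnAtMs);
  }
}

// Inside the policy's script you might call it like this
// (150 is an arbitrary threshold below a timeLimit of 200):
//
//   timedCallout(context, function (ctx) {
//     ctx.setVariable('my.flag', true);
//   }, 150);
```

Note that a run which actually hits the timeLimit is terminated, so it will probably not get the chance to set these variables. What this shows you is the distribution of the runs that did complete, which you can then line up against CPU, GC and IO metrics on the MPs.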

PS: You didn't ask, but... Google does all of this for you if you use the managed Apigee service, Apigee X. Lots of people say, "Well, we like on-prem systems, we have on-prem services, so we need an on-prem API Platform." The conclusion does not automatically follow from the premise. Apigee X can be used to proxy APIs that run in your own datacenters. You don't NEED to manage an API Platform. It's extra operational cost, and likely not contributing to your company's key differentiation. Check your assumptions; you may be able to avoid that cost.