Our production OPDK cluster occasionally starts returning INTERNAL_SERVER_ERROR faults. It seems to start with ~3% of requests and then increases. Our temporary fix has been to restart the router and message-processor instances. This seems to help for a few days.
Here is the 500 response we're seeing:
{ "fault": { "faultstring": "Internal Server Error", "detail": { "code": "INTERNAL_SERVER_ERROR" } } }
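To track how the ~3% figure changes over time, one option is to count how many sampled response bodies carry this exact fault code. The following is only a minimal sketch: the function names `is_internal_server_error` and `fault_rate` are made up for illustration, and it assumes you already collect response bodies as strings somewhere (for example from a client-side log). It is not part of any Apigee tooling.

```python
import json


def is_internal_server_error(body):
    """Return True if body is a fault payload with code INTERNAL_SERVER_ERROR."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (or not a string), so not the fault we are looking for.
        return False
    if not isinstance(payload, dict):
        return False
    fault = payload.get("fault")
    if not isinstance(fault, dict):
        return False
    detail = fault.get("detail")
    if not isinstance(detail, dict):
        return False
    return detail.get("code") == "INTERNAL_SERVER_ERROR"


def fault_rate(bodies):
    """Fraction of sampled response bodies that are INTERNAL_SERVER_ERROR faults."""
    bodies = list(bodies)
    if not bodies:
        return 0.0
    return sum(is_internal_server_error(b) for b in bodies) / len(bodies)
```

Running `fault_rate` over samples taken at regular intervals would show whether the rate is climbing before it becomes bad enough to need a restart.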
I haven't yet found anything in the router or message-processor system.log files (but those logs seem very difficult to use).
Any advice as to what this could be? Or where I should check?
We're using OPDK 4.15.07.00 with a cluster containing 6 routers and 6 MPs.
Try enabling a debug session on the MP to see more details about the root cause.
Follow the link below for details on enabling a debug session:
https://community.apigee.com/articles/1533/how-to-enable-debug-in-the-apigee-edge-router-and.html
Did you solve this, @Eric Dahl? We're having the same issue here. Thanks.
We did not. We've spent many days with Apigee support on this.
This seemed to help:
- increasing message processor max heap size (Xmx)
- switching to use G1GC garbage collector for message processor
- monitoring for target servers which have slow response times - this exacerbates the problem
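For the last point, here is a rough sketch of how slow targets could be flagged from latency samples. Everything in it is an assumption for illustration: the `(target_name, latency_ms)` sample format, the function names, the 2000 ms threshold and the use of the 95th percentile. It does not use any Apigee API, so the samples would have to come from whatever analytics or logging you already have.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slow_targets(samples, threshold_ms=2000, pct=95):
    """Return {target: percentile latency} for targets above threshold_ms.

    samples is an iterable of (target_name, latency_ms) pairs (hypothetical format).
    """
    by_target = defaultdict(list)
    for target, latency_ms in samples:
        by_target[target].append(latency_ms)

    flagged = {}
    for target, latencies in by_target.items():
        value = percentile(latencies, pct)
        if value > threshold_ms:
            flagged[target] = value
    return flagged
```

A percentile is used rather than the mean so that a target which is usually fast but occasionally hangs is judged by its slow tail instead of being hidden by the average.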
We're using 4.15.07. It seems like the problem does not occur in 4.16.09.