Need ideas to debug 502 errors

Hi,

I am getting HTTP 502 errors every day in the production environment.

Error message is {'fault':{'faultstring':'Unexpected EOF at target','detail':{'errorcode':'messaging.adaptors.http.flow.UnexpectedEOFAtTarget'}}}.

We maintain a unique transaction ID throughout the entire flow (client --> Apigee --> backend). The backend team couldn't find any log entries for the transaction IDs that got 502 errors. We can't run tcpdump in the production environment because of the high TPS, and we are not able to reproduce the issue. I am looking for suggestions on how to debug this. The backend load balancer is an F5, which is very stable and has no settings that would reject requests under high volume. So what are my options here?

Your suggestions will be highly appreciated!

Note: We recently saw an issue on a loopback API where the MP set Content-Length to a non-numeric value that nginx couldn't parse. As a result, nginx reset the connection, which caused a 502 response to the consumer. So I am wondering if this 502 could be caused by Apigee system settings and have nothing to do with the backend.

Thanks,

Krish


Two observations from the message logging in the PostClientFlow:

1. target.received.start.timestamp is populated as -1.

2. The response headers are blank.

Based on this, can we conclude that the connection was closed before the MP sent the response to the Router? Is that right?

@Krish,

The error message "Unexpected EOF at target" indicates that the Message Processor (MP) received an end of file (EOF) from the target/backend server.

  1. Usually in these cases, you will not see any errors in your backend server logs.
  2. However, if you are on Private Cloud, you can look at the Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log), where you may see an exception similar to this:
    "message": "org:myorg env:test api:TestAPI rev:10 messageid:rrt-1-abcd-1 NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context$3.onException() : SSLClientChannel[C:<IPaddress>:443 Remote host:0.0.0.0:50100]@459069 useCount=6 bytesRead=0 bytesWritten=755 age=40107ms lastIO=12832ms .onExceptionRead exception: {}
    java.io.EOFException: eof unexpected
    at com.apigee.nio.channels.PatternInputChannel.doRead(PatternInputChannel.java:45) ~[nio-1.0.0.jar:na]
    at com.apigee.nio.channels.InputChannel.read(InputChannel.java:103) ~[nio-1.0.0.jar:na]
    at com.apigee.protocol.http.io.MessageReader.onRead(MessageReader.java:79) ~[http-1.0.0.jar:na]
    at com.apigee.nio.channels.DefaultNIOSupport$DefaultIOChannelHandler.onIO(NIOSupport.java:51) [nio-1.0.0.jar:na]
    at com.apigee.nio.handlers.NIOThread.run(NIOThread.java:123) [nio-1.0.0.jar:na]"
  3. This indicates that the MP has sent the request to the backend server and is waiting to read the response, but it unexpectedly gets an end of file from the backend server.
  4. If you take a tcpdump, you will very likely see the backend server sending [FIN,ACK] as soon as the MP sends the request.

I would suggest collecting a tcpdump on either the MP or the backend server and analysing it; that is the best way to investigate this problem further.
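For anyone who wants to see this failure mode without touching production, here is a minimal Python sketch that runs on loopback. It is not Apigee code: the toy backend (the hypothetical run_closing_backend) reads the request and closes the socket without replying, and the client plays the role of the MP. The client writes its request and then reads 0 bytes, which is the same bytesRead=0 pattern as in the log above.

```python
import socket
import threading


def run_closing_backend(server_sock):
    """Toy backend: accept one connection, read the request, close without replying."""
    conn, _ = server_sock.accept()
    conn.recv(4096)  # consume the request so that close() sends a FIN, not a reset
    conn.close()


def demo():
    """Send one request to the toy backend; return (bytes written, bytes read)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    backend = threading.Thread(target=run_closing_backend, args=(server_sock,))
    backend.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
    client.sendall(request)
    data = client.recv(4096)  # an empty result means the peer closed: EOF
    client.close()

    backend.join()
    server_sock.close()
    return len(request), len(data)


if __name__ == "__main__":
    written, read = demo()
    print("bytesWritten=%d bytesRead=%d" % (written, read))
```

A gateway in the client's position has nothing to forward at this point, which is why the consumer ends up with a 502 while the backend application may never log the request.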

Regards,

Amar

Thanks @AMAR DEVEGOWDA for the reply.

Yes, I can see the error in the MP logs. Is there any alternative to taking a tcpdump? We can't run tcpdump in the prod environment at 800 TPS, and the issue can't be replicated.

@Amar

I have also noticed the error below in the MP system.log:

NIOThread@14 ERROR HTTP.CLIENT - HTTPClient$Context$3.onException() : SSLClientChannel[Connected: Remote:XX:443 Local:XX:37752]@1434278 useCount=9 bytesRead=0 bytesWritten=415 age=105363ms lastIO=266ms isOpen=true.onExceptionRead exception: {} java.io.EOFException: eof unexpected

If we want to analyse this further, we need to know what these parameters mean:

useCount=9 bytesRead=0 bytesWritten=415 age=105363ms lastIO=266ms

lastIO is basically the time taken by the current request, i.e. Apigee received the EOF after 266 ms.

Any idea what the rest of the variables mean?
Does useCount mean the number of threads currently connected to the target?
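In case it helps with the analysis, here is a small Python sketch that pulls those counters out of system.log lines so they can be compared across many failures. The field names come from the log line above. The function names, and the reading of useCount greater than 1 as "this connection had been used before", are my assumptions and not documented Apigee behaviour.

```python
import re

# Counters that appear in the HTTP.CLIENT error line; the "ms" suffix is ignored.
FIELD_RE = re.compile(r"(useCount|bytesRead|bytesWritten|age|lastIO)=(\d+)")


def parse_channel_fields(line):
    """Return the numeric channel counters found in one log line as a dict."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def summarize(lines):
    """Count read-exception lines, and how many had useCount > 1 with bytesRead == 0."""
    total = 0
    reused_no_read = 0
    for line in lines:
        if "onExceptionRead" not in line:
            continue  # not one of the error lines discussed in this thread
        fields = parse_channel_fields(line)
        total += 1
        # Assumption: useCount > 1 means the connection had served earlier requests.
        if fields.get("useCount", 0) > 1 and fields.get("bytesRead") == 0:
            reused_no_read += 1
    return {"errors": total, "reused_with_no_bytes_read": reused_no_read}
```

Passing an open file handle for system.log to summarize is enough to get the counts. If most failures land in the second bucket, that would at least be consistent with the backend closing connections that had already served earlier requests, though it would not prove it.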

Thanks and Regards,


OK, I figured out the issue on our end. We were in the process of migrating to Apigee to manage our public APIs. We added minimal configuration to our proxy endpoint and target endpoint. We were able to make GET calls with query strings without issue. However, our POST methods with a JSON body were failing with the following error:

{"fault": {
  "faultstring": "Unexpected EOF at target",
  "detail": {"errorcode": "messaging.adaptors.http.flow.UnexpectedEOFAtTarget"}
}}

So, I went through each setting until I came across this entry in my target endpoint's HTTPTargetConnection.Properties collection.

<HTTPTargetConnection>
    <Properties>
        <Property name="compression.algorithm">gzip</Property>
    </Properties>
</HTTPTargetConnection>

Once I removed the compression.algorithm entry from the Properties collection, everything started to work on our end.

Hi Team,

I was going through the link below on persistent connections.

https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html

It states:

Clients which assume persistent connections and pipeline immediately after connection establishment SHOULD be prepared to retry their connection if the first pipelined attempt fails. If a client does such a retry, it MUST NOT pipeline before it knows the connection is persistent. Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses.

Is it possible that these 502 EOF errors occur because:

1. Apigee sends multiple requests to the server on the same connection.

2. The server sent successful responses, but with one of them it sent a Connection: close header, and hence the rest of the requests failed with 502 Bad Gateway?
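To make the second point concrete, here is a minimal Python sketch of the check a client would make before reusing a connection, based on my reading of the persistent-connection rules in the RFC linked above. The function name connection_is_reusable is hypothetical, and this is not how the MP is implemented; it only illustrates the rule.

```python
def connection_is_reusable(http_version, headers):
    """Decide from a response whether the connection may carry another request.

    http_version is the version from the status line, e.g. "HTTP/1.1".
    headers is a dict mapping response header names to values.
    """
    tokens = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            # The Connection header is a comma-separated list of tokens.
            tokens.update(t.strip().lower() for t in value.split(","))
    if "close" in tokens:
        return False  # the server announced it will close after this response
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens  # HTTP/1.0 is only persistent on request
    return True  # HTTP/1.1 connections are persistent by default
```

If a client ignored a False result here and sent another request on the same socket, it would read an EOF instead of a response, which would look like the error in this thread. Whether that is what is actually happening would still need to be confirmed from the logs or a capture.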