Why are we getting 503 errors from only one region/data center?

We have an Edge setup with Message Processors (MPs) and Routers in two regions/data centers. We are seeing strange behaviour: calls to our APIs made from one of the regions continuously get 503 errors.

{
  "error": {
    "statusCode": 503,
    "code": "E01",
    "message": "The Service is temporarily unavailable",
    "developerMessage": "ServiceUnavailable",
    "_debug": {"fault": "{\"fault\":{\"faultstring\":\"The Service is temporarily unavailable\",\"detail\":{\"errorcode\":\"messaging.adaptors.http.flow.ServiceUnavailable\"}}}"}
  }
}
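
The `_debug.fault` field in this response is itself a JSON document encoded as a string, so reaching the underlying error code takes two decoding passes. A minimal Python sketch, assuming the response body has exactly the shape shown above (the function name `extract_errorcode` is made up for illustration):

```python
import json

# Sample body copied from the failing response above.
RESPONSE_BODY = r'''
{"error": {
  "statusCode": 503,
  "code": "E01",
  "message": "The Service is temporarily unavailable",
  "developerMessage": "ServiceUnavailable",
  "_debug": {"fault": "{\"fault\":{\"faultstring\":\"The Service is temporarily unavailable\",\"detail\":{\"errorcode\":\"messaging.adaptors.http.flow.ServiceUnavailable\"}}}"}
}}
'''


def extract_errorcode(body: str) -> str:
    """Return the error code nested inside the string-encoded fault."""
    outer = json.loads(body)
    # The fault is a JSON string inside JSON, so it needs a second pass.
    inner = json.loads(outer["error"]["_debug"]["fault"])
    return inner["fault"]["detail"]["errorcode"]


print(extract_errorcode(RESPONSE_BODY))
```

The error code only says that the target was unavailable; it does not say why, which is why the trace and MP logs are needed next.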

Calls made from the other region succeed every time.

Can you please check what the issue could be?

ACCEPTED SOLUTION

  1. Enabled the trace and was able to get the failing requests. Noticed that we got the following error:
    error The Service is temporarily unavailable 
    error.cause Received fatal alert: close_notify
  2. The Message Processor (MP) logs showed that the 503 was returned because the SSL handshake with the target server failed:
    2017-07-07 14:28:16,142 org:test-org env:test api:myapi rev:6 messageid:<message_id> NIOThread@1 ERROR HTTP.CLIENT - HTTPClient$Context.handshakeFailed() : SSLClientChannel[C:<BackendServer-IPaddress>:443 Remote host:<MP-IP>:42038]@107327 useCount=1 bytesRead=0 bytesWritten=0 age=279ms lastIO=279ms lastIO=279ms isOpen=true handshake failed, message: Received fatal alert: close_notify
  3. Made the API calls and collected a tcpdump on one of the MPs in the failing region:
    tcpdump -i any -s 0 host <BackendServer IP address> -w <File name>
  4. Analysed the tcpdump and found the following information:
    • The MP sent the ClientHello message with the TLSv1.2 protocol.
    • The backend server supported only the TLSv1.0 protocol.
    • So the backend server immediately sent a "close_notify" alert and terminated the connection, which resulted in the SSL handshake failure.
  5. Checked the Target Endpoint configuration in the API proxy to see whether we were specifying the TLSv1.2 protocol explicitly. That wasn’t the case: we were allowing the MP to use its default protocol. Besides, if this had been the issue, we would have seen the problem in both regions.
  6. The default protocol is TLSv1.2 because the MPs run on Java 8.
  7. That raised the question of why the MPs in the passing region were sending TLSv1.0.
  8. Searched the properties on the Message Processors in the passing region and found that "jdk.tls.client.protocols=TLSv1" was set in the jvmsecurity.properties file.
  9. This was the cause of the difference in behavior between the two regions.
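
The finding from the tcpdump (the backend accepts only one TLS version) can also be cross-checked from a client machine without a packet capture. The following is a rough Python sketch, not the method used in this investigation: it offers exactly one TLS version per connection attempt and records whether the handshake completes. The helper names `make_context` and `probe` are hypothetical, certificate verification is deliberately turned off because only the protocol version is of interest, and a local OpenSSL build may itself refuse TLSv1.0, in which case that attempt is reported as failed regardless of what the server supports.

```python
import socket
import ssl


def make_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostic use only: skip certificate checks, we only care
    # about which protocol version the server will negotiate.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Try a handshake per TLS version and report the outcome of each."""
    results = {}
    versions = (
        ssl.TLSVersion.TLSv1,
        ssl.TLSVersion.TLSv1_1,
        ssl.TLSVersion.TLSv1_2,
    )
    for version in versions:
        try:
            ctx = make_context(version)
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    results[version.name] = tls.version()
        except (ssl.SSLError, OSError, ValueError) as exc:
            # Handshake rejected, connection refused, or version
            # not available in the local OpenSSL build.
            results[version.name] = f"failed: {exc}"
    return results
```

Calling `probe("<BackendServer IP address>")` from a host that can reach the backend returns one entry per version. For a backend like the one in this thread, only the TLSv1 entry would be expected to show a negotiated version.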


Solution:

Resolved the problem by setting the property "jdk.tls.client.protocols=TLSv1" on all the MPs in the failing region.
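
Since the root cause was a property that existed on the MPs in one region but not the other, it can help to compare the properties files from two MPs directly. A minimal Python sketch, assuming the file contents have already been copied off each MP (how they are fetched is left out, and the helper names are hypothetical). The parser is simplified: it handles only `key=value` lines and `#`/`!` comments, not line continuations or `:` separators.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_properties(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every differing key."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative contents only; real files will contain more entries.
passing_region = "jdk.tls.client.protocols=TLSv1\n"
failing_region = "# no protocol override here\n"

print(diff_properties(parse_properties(passing_region),
                      parse_properties(failing_region)))
```

A key that appears with `None` on one side is set on one MP and missing on the other, which is the kind of drift that caused the behavior described in this thread.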


Key Things to Note:

  1. Setting the property jdk.tls.client.protocols to TLSv1 causes all the APIs running on that MP to use the TLSv1.0 protocol when communicating with any backend server. You can read more about this property here.
  2. Set this property only if you are 100% sure that all the backend servers associated with all your APIs support that protocol.
  3. If you want to take this approach, then set the property to the same value on all MPs across all the regions.
  4. Otherwise, the right thing to do is to set the appropriate protocol in the SSLInfo section of the Target Endpoint as shown below:
<HTTPTargetConnection>
    <URL>https://foo.com</URL>
    <SSLInfo>
        <Enabled>true</Enabled>
        <Protocols>
            <Protocol>TLSv1.0</Protocol>
        </Protocols>
    </SSLInfo>
</HTTPTargetConnection>
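
To see whether a given Target Endpoint pins a protocol explicitly or falls back to the MP default (the check done by hand in step 5), the configuration can be inspected programmatically. A small Python sketch, assuming the `HTTPTargetConnection` element has been saved as text; `pinned_protocols` is a made-up helper name and the sample simply repeats the configuration above.

```python
import xml.etree.ElementTree as ET
from typing import List

# Same configuration as shown above.
TARGET_CONNECTION = """
<HTTPTargetConnection>
    <URL>https://foo.com</URL>
    <SSLInfo>
        <Enabled>true</Enabled>
        <Protocols>
            <Protocol>TLSv1.0</Protocol>
        </Protocols>
    </SSLInfo>
</HTTPTargetConnection>
"""


def pinned_protocols(xml_text: str) -> List[str]:
    """List the protocols named explicitly in the target connection."""
    root = ET.fromstring(xml_text)
    return [p.text.strip() for p in root.iter("Protocol") if p.text]


print(pinned_protocols(TARGET_CONNECTION))
```

An empty list means no protocol is set in the proxy, so the MP default (and any JVM-level property such as jdk.tls.client.protocols) decides what is offered to the backend.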

This helped me. Thanks for posting. What did you use to collect the tcpdump?

@AshwiniRai,

Glad to know that this post helped you. I used the tcpdump command (network sniffer tool) to capture the network packets.