GCP cloud function giving 504 timeout at cold start

w268wang · 03-21-2024 08:04 AM

I have a V2 python311 Cloud Function, and the traffic pattern is 15-25 requests arriving simultaneously every 2 hours. Roughly 1 time in 20, the function returns a "504 The request has been terminated because it has reached the maximum request timeout" error for all of the requests in a given hour. The normal processing time of this function is <20s, but the situation persisted when I increased the timeout from 60s to 120s. It is also worth mentioning that the 504 errors appeared in the log at the same time the requests arrived, and the print statement at the beginning of the function did not print anything to the log when the 504 errors happened. So it seems the function timed out without even starting to process the requests. My minimum instances setting is 0. Could these 504 timeout errors be caused by a cold start?

Thanks for any suggestions.

juliadeanne

Hello @w268wang,

Welcome to Google Cloud Community!

Based on your description, it appears that your logs do not show any print statements at the beginning of the function during the times when the 504 errors occurred. This indicates that your function instances aren't even starting to execute before the request times out. This could be related to the scaling behavior of your Cloud Function infrastructure.

With the minimum instances set to 0, Cloud Functions won't have any pre-warmed instances available. When a burst of requests arrives, functions need to be spun up from scratch, causing a delay (cold start) before they can process requests. This delay can easily exceed the default timeout (60 seconds) if your function involves significant initialization or dependency loading.
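If initialization or dependency loading is the slow part, a common mitigation is to defer the expensive setup until it is first needed and then reuse it for later requests on the same instance. The sketch below illustrates that idea under stated assumptions: `get_heavy_client` and `handle_request` are hypothetical names, and the `time.sleep` call is only a stand-in for real setup work.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def get_heavy_client():
    """Hypothetical expensive setup (model load, DB connection, large import)."""
    time.sleep(0.05)  # stand-in for slow initialization work
    return {"ready": True}


def handle_request(request):
    # The expensive object is built on first use instead of at import time,
    # and the cached result is reused by later requests on the same instance.
    client = get_heavy_client()
    return ("ok" if client["ready"] else "not ready"), 200
```

This shortens instance startup but moves the cost into the first request that needs the object, so it does not remove the delay entirely.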

Please consider setting the minimum instances to a value (perhaps 1 or 2) to ensure at least a few pre-warmed functions are ready to handle initial requests.

For more information, please visit our official documentation on configuring minimum instances in Cloud Functions.

w268wang

Hey @juliadeanne

Thanks for getting back to me!

After reviewing the logs, I found a case where only a single request came in and the instance startup time was only 14 seconds, but that request still got a 504 error. The timeout setting when this happened was 400s. I can share more details if you would like to look into it.

Another important observation is that the timestamps of some of these 504 errors in the log are within a mere 3 seconds of the timestamps when the sender initiates the request.

But I will try increasing the minimum instances to see if that helps.

Thanks,

Sarotiv

@juliadeanne , good afternoon!

I'm facing the exact same issue that @w268wang reported. In my case, even after setting min instances to 2 for my function, I'm getting 504 errors while the function is not even being executed, just like in his case.

It happens at completely random times of day, but always once every day. Every three hours a Cloud Scheduler job enqueues X tasks, where each task makes a request to my function. X varies, but on average it is 20. For instance, earlier today 12 tasks were enqueued, and at least 6 of them got the 504 error after 300 seconds, although they ran fine on automatic retry (the queue is set to make 5 retries).
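For callers that do not go through a queue, the same retry-on-504 behavior can be approximated on the client side. This is only a sketch with assumed names and values: `call_with_retries` and `send` are hypothetical, the backoff schedule is arbitrary, and it is not a description of how the task queue itself retries.

```python
import time


def call_with_retries(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a status other than 504.

    `send` is a hypothetical zero-argument callable returning an HTTP status code.
    Returns (final_status, retries_used).
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status != 504 or attempt == max_retries:
            return status, attempt
        sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...


# Usage with a fake sender that fails twice and then succeeds.
statuses = iter([504, 504, 200])
result = call_with_retries(lambda: next(statuses), sleep=lambda seconds: None)
```

Retrying hides the failures from the caller but does not explain them, so it is a workaround rather than a fix.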

Considering that my function is configured with 2 GB of memory and 10 max instances, allocates 260 MB on average, and takes approximately 15 seconds per successful request, it should handle the concurrency just fine with no more than 2 instances in a 12-request burst like the one I described.
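The arithmetic behind that estimate can be checked with a short calculation. The memory, max-instance, and burst figures come from the post above; the per-instance concurrency setting is not stated anywhere in the thread, so both values tried below are assumptions, and the runtime's own memory overhead is ignored.

```python
import math


def instances_needed(burst, concurrency):
    """Instances required to serve `burst` simultaneous requests
    at a given per-instance concurrency."""
    return math.ceil(burst / concurrency)


MEMORY_MB = 2048       # 2 GB per instance (from the post)
PER_REQUEST_MB = 260   # average allocation per request (from the post)
MAX_INSTANCES = 10     # from the post
BURST = 12             # tasks in the example burst (from the post)

# Upper bound on concurrent requests per instance if memory were the only limit.
memory_bound = MEMORY_MB // PER_REQUEST_MB

# Both concurrency values are assumptions about the function's configuration.
for concurrency in (memory_bound, 1):
    needed = instances_needed(BURST, concurrency)
    print(f"concurrency={concurrency}: {needed} instance(s) needed, "
          f"fits within max={MAX_INSTANCES}: {needed <= MAX_INSTANCES}")
```

Two instances are enough only if each instance is actually allowed to serve about 7 requests at once. If the function's concurrency were set to 1, the same burst would need 12 instances, more than the configured maximum of 10, so it is worth checking that setting.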

Sarotiv

@w268wang , just for the record, are you by any chance also enqueueing tasks in order to make batch calls to your function?

I'm asking because I just noticed that when I enqueue them, even in a successful scenario, the Logs Explorer entry for the task (not the function) appears before the first entry from the actual function. The most interesting part is that it shows the status code and the response time for the HTTP request to the function just a few milliseconds BEFORE the function has even started executing.

So I'm starting to believe the issue lies more with the task itself than with the function.

w268wang

Hey @Sarotiv, I'm not making batch calls to my function. But I have seen several times that the status code and response time entry show up before the start of the function.

MinasConsulting

Hi Everyone,

I am experiencing the exact same issue, but with Python 3.12. On cold start, sometimes all requests end in 504 errors after hitting the max timeout. I can't get it to resolve until I redeploy the code. A working cold start takes 4-5 seconds, and a warm response is around 250ms. This seems to be an issue with GCP.

msarsale

Hi, I'm having the same situation.

I see the 504 in both the Cloud Functions V2 and Cloud Run logs. The way I found to 'restart' this was redeploying the Cloud Function.
