Operation on instance failed... GCP is just failing everywhere left and right

NS12 · 11-24-2023 01:19 PM

Hi,

I'm trying to run a project using an A100 GPU and get the following message when trying to run it:

Operation on instance failed: The zone 'projects/pro-park-404707/zones/us-west4-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

Small rant to end with: to say the least, GCP has been extremely disappointing. So many cryptic error messages, and nothing really works. I guess we'll just have to migrate to something more stable. I never had any issues with Azure, so I'm probably going back to that. I thought Google would nail their AI platform, but that doesn't seem to be the case at all 😞

UPDATE:
When trying to use two T4s instead the following happens:

Operation on instance failed: Quota 'NVIDIA_T4_GPUS' exceeded. Limit: 1.0 in region us-west4.

What a joke... (and yes they approved a quota increase weeks ago, but yeah, just a shit show at this point)

lawrencenelson

Hi @NS12,

Welcome to the Google Cloud Community!

@NS12 wrote:

Operation on instance failed: The zone 'projects/pro-park-404707/zones/us-west4-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

Unfortunately, there is no other way around this but to choose a different region/zone or wait until resources are freed up. 😔
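If you end up scripting the "try a different zone" approach, the fallback loop could look roughly like the sketch below. It is only an illustration: `first_zone_with_capacity`, `fake_create`, and the use of `RuntimeError` are hypothetical names and choices, not part of any Google Cloud library, and the zone list is a placeholder (check which zones actually offer your GPU model). The only thing taken from your post is the wording of the stockout message.

```python
from typing import Callable, Iterable, Optional

# Substring copied from the error message quoted in this thread.
STOCKOUT_MARKER = "does not have enough resources available"


def first_zone_with_capacity(
    zones: Iterable[str],
    create_instance: Callable[[str], None],
) -> Optional[str]:
    """Try each zone in order and return the first one where creation succeeds.

    `create_instance` is a caller-supplied function (hypothetical here) that
    raises RuntimeError carrying the service's error message on failure.
    Stockout errors move on to the next zone; any other error is re-raised,
    because switching zones would not fix it (for example a quota error).
    """
    for zone in zones:
        try:
            create_instance(zone)
            return zone
        except RuntimeError as err:
            if STOCKOUT_MARKER not in str(err):
                raise
    return None


def fake_create(zone: str) -> None:
    """Stand-in for a real create call: pretends only one zone has capacity."""
    if zone != "us-central1-a":
        raise RuntimeError(
            f"The zone '{zone}' {STOCKOUT_MARKER} to fulfill the request."
        )


if __name__ == "__main__":
    # Placeholder candidate list; the first entry is the zone from the post.
    print(first_zone_with_capacity(["us-west4-b", "us-central1-a"], fake_create))
```

In a real script you would replace `fake_create` with whatever call you use to create the VM or notebook instance. Re-raising non-stockout errors is deliberate, so that a quota problem does not get silently retried across zones.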

@NS12 wrote:

When trying to use two T4s instead the following happens:

Operation on instance failed: Quota 'NVIDIA_T4_GPUS' exceeded. Limit: 1.0 in region us-west4.

Please refer to this documentation:

Similar to virtual CPU quota, GPU quota refers to the total number of virtual GPUs in all VM instances in a region. GPU quotas apply to running VMs and VM reservations. Both predefined and preemptible VMs consume this quota.

Check the Quotas page to ensure that you have enough GPUs available in your project, and to request a quota increase. In addition, new accounts and projects have a global GPU quota that applies to all regions.

When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones. Request preemptible GPU quota to use those resources.
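To make the two-level check described above concrete, here is a small sketch of the arithmetic. It assumes the quota entries are dicts with `metric`, `limit`, and `usage` keys, which is my understanding of how the Compute Engine API reports quotas on region and project resources. The regional limit of 1.0 comes from the error message in this thread; the usage figures and the `GPUS_ALL_REGIONS` limit are made-up sample values, and `headroom` and `fits` are hypothetical helper names.

```python
from typing import Iterable, Mapping


def headroom(quotas: Iterable[Mapping], metric: str) -> float:
    """Return limit minus usage for `metric`; raise KeyError if it is absent."""
    for quota in quotas:
        if quota["metric"] == metric:
            return quota["limit"] - quota["usage"]
    raise KeyError(metric)


def fits(
    requested: int,
    region_quotas: Iterable[Mapping],
    project_quotas: Iterable[Mapping],
    model_metric: str,
) -> bool:
    """A request fits only if BOTH the per-model regional quota and the
    global all-regions GPU quota have enough headroom."""
    return (
        headroom(region_quotas, model_metric) >= requested
        and headroom(project_quotas, "GPUS_ALL_REGIONS") >= requested
    )


# Regional limit taken from the error message; usage is an assumed value.
region_quotas = [{"metric": "NVIDIA_T4_GPUS", "limit": 1.0, "usage": 0.0}]
# Global figures are invented sample data for illustration only.
project_quotas = [{"metric": "GPUS_ALL_REGIONS", "limit": 4.0, "usage": 0.0}]

if __name__ == "__main__":
    print(fits(2, region_quotas, project_quotas, "NVIDIA_T4_GPUS"))  # False
    print(fits(1, region_quotas, project_quotas, "NVIDIA_T4_GPUS"))  # True
```

With these sample numbers, a request for two T4s fails on the regional limit even though the global quota would allow it. That matches the error you saw, so it is worth confirming on the Quotas page that the approved increase covers `NVIDIA_T4_GPUS` in us-west4 specifically.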

This Stack Overflow thread might also help with your case, though note that its accepted solution is outdated.

I hope this helps. You can always contact Google Cloud Support to further look into your case. Thanks. 😃
