storage.buckets.get and Error Code 7 When Using the Node.js Client

All,

I have a service account (tim-sandbox-loader-sa) in one project (tims-sandbox) that pulls data from Cloud Storage buckets in another project (production), using the Node.js idiomatic client for Cloud Storage, running in a Cloud Function triggered periodically by Cloud Scheduler.

In keeping with the Principle of Least Privilege, tim-sandbox-loader-sa has (only) the Storage Object Viewer role on the one specific bucket it needs access to in production. Everything actually WORKS great -- however...

Whenever the function runs, I get severity=ERROR entries in logs/cloudaudit.googleapis.com%2Fdata_access, similar to the following (somewhat redacted because I'm paranoid):


{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 7
    },
    "authenticationInfo": {
      "principalEmail": "<redacted service account principal>",
      "serviceAccountDelegationInfo": [
        {
          "firstPartyPrincipal": {
            "principalEmail": "<redacted>@serverless-robot-prod.iam.gserviceaccount.com"
          }
        }
      ]
    },
    "requestMetadata": {
      "callerSuppliedUserAgent": "Blob/1 (cr/<redacted>)",
      "requestAttributes": {
        "time": "2024-04-16T15:00:06.249340103Z",
        "auth": {}
      },
      "destinationAttributes": {}
    },
    "serviceName": "storage.googleapis.com",
    "methodName": "storage.buckets.get",
    "authorizationInfo": [
      {
        "resource": "projects/_/buckets/<redacted bucket>",
        "permission": "storage.buckets.get",
        "resourceAttributes": {}
      }
    ],
    "resourceName": "projects/_/buckets/<redacted bucket>",
    "resourceLocation": {
      "currentLocations": [
        "<redacted location>"
      ]
    }
  },
  "insertId": "<redacted insert id>",
  "resource": {
    "type": "gcs_bucket",
    "labels": {
      "project_id": "<redacted project id",
      "bucket_name": "<redacted bucket name>",
      "location": "<redacted location>"
    }
  },
  "timestamp": "2024-04-16T15:00:06.243715813Z",
  "severity": "ERROR",
  "logName": "projects/<redacted>/logs/cloudaudit.googleapis.com%2Fdata_access",
  "receiveTimestamp": "2024-04-16T15:00:07.221902175Z"
}

My code that accesses the bucket is pretty simple. I inspected the corresponding code in the Node.js client, and I don't see where it should/would be listing or enumerating buckets in my case:


  // requires: import {Storage} from '@google-cloud/storage';
  const storage = new Storage();

  const filename = await getLastExtractFileName(tableId);

  console.log(`Loading ${filename} to ${projectId}.${datasetId}.${tableId}...`);

  async function getLastExtractFileName(tableId: string) {
    
    const options = {
      prefix: `extracts/${tableId}_`,
    };

    // Lists files in the bucket, filtered by a prefix
    const [files] = await storage.bucket(bucketName).getFiles(options);

    return (files?.slice(-1) || [])[0]?.name;
  }
...

I suspect I could remove the ERROR by creating a custom role with the storage.buckets.get permission, adding tim-sandbox-loader-sa to the production project, and granting the custom role there at the project level, but that's a guess, and it seems overly complex and not in keeping with the POLP (since this service account shouldn't actually need bucket metadata at all).
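
If it does come to granting storage.buckets.get, I'd at least want to scope the grant to the single bucket rather than the whole project. Assuming a custom role containing only storage.buckets.get already existed (the role name and service account email below are placeholders), adding the bucket-level binding might look something like this untested sketch:

  import {Storage} from '@google-cloud/storage';

  // Untested sketch: add a bucket-level IAM binding for the service account
  // instead of granting anything at the project level.
  async function grantOnBucket(bucketName: string) {
    const storage = new Storage();
    const bucket = storage.bucket(bucketName);

    const [policy] = await bucket.iam.getPolicy({requestedPolicyVersion: 3});
    policy.bindings.push({
      role: 'projects/production/roles/bucketMetadataGetter', // hypothetical custom role
      members: ['serviceAccount:tim-sandbox-loader-sa@tims-sandbox.iam.gserviceaccount.com'], // placeholder
    });
    await bucket.iam.setPolicy(policy);
  }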

Any idea who/what/why is trying to read the bucket metadata there, and any suggestions for removing its need to do so, or at least for eliminating the error message or appropriately reducing its severity in the log?


Hello @TimJohns , 

Welcome to the Google Cloud Community!

You're seeing an error because your code is trying to check if a bucket exists before it lists files, which requires the `storage.buckets.get` permission. However, your service account doesn't have this permission, leading to an ERROR log.

As you mentioned, creating a custom role that includes only `storage.buckets.get` and granting it to the service account might fix this. This ensures the service account has just the permissions it needs, without granting any broader Storage roles.
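
For reference, an explicit existence check with the Node.js client would look something like the sketch below (bucketName is a placeholder); under the hood it issues a buckets.get request, which is where the `storage.buckets.get` permission comes in:

  import {Storage} from '@google-cloud/storage';

  const storage = new Storage();

  // Sketch: an explicit bucket-existence check. This calls buckets.get behind
  // the scenes, so it needs the storage.buckets.get permission on the bucket.
  async function bucketExists(bucketName: string): Promise<boolean> {
    const [exists] = await storage.bucket(bucketName).exists();
    return exists;
  }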


Thanks @juliadeanne

I don't see anywhere that my code is -explicitly- trying to check if a bucket exists before it lists files, and my initial assumption was that the Node.js client -implicitly- does so, but that doesn't seem to be the case, either.

In theory,

await storage.bucket(bucketName).getFiles(options);

should result in a GET to:

https://storage.googleapis.com/storage/v1/b/<redactedbucket>/o

which requires storage.objects.list. The storage.objects.list permission should already be provided to the service account by the Storage Object Viewer role, along with storage.objects.get. The error message references storage.buckets.get, which shouldn't be required.
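
As a sanity check on that, testIamPermissions can report what the running service account can actually do on the bucket without changing any bindings. Something like this (untested sketch, reusing the storage client and bucketName from the snippet above):

  // Untested sketch: ask GCS which of these permissions the caller holds on the
  // bucket. With only Storage Object Viewer, I'd expect the two object
  // permissions to come back true and storage.buckets.get to come back false.
  const [perms] = await storage.bucket(bucketName).iam.testPermissions([
    'storage.objects.list',
    'storage.objects.get',
    'storage.buckets.get',
  ]);
  console.log(JSON.stringify(perms, null, 2));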

I do agree that SOMETHING (that doesn't have the storage.buckets.get permission) is making a GET to:

https://storage.googleapis.com/storage/v1/b/<redactedbucket>

...but I don't think it's the code snippet above. 

I'm going to try to narrow this down a bit more, by trying one or both of these approaches:

  • Create a small test case in a stand-alone project (the added benefit is I don't have to redact everything), where that snippet is the ONLY code being called in the Cloud Function by the service account.
  • Remove the Node.js client, and call the REST API directly (rough sketch below).
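
For the second approach, something along these lines, using google-auth-library to authenticate as the function's runtime service account and calling the JSON API's objects.list endpoint directly (untested sketch; bucketName and tableId are the same values as in the snippet above):

  import {GoogleAuth} from 'google-auth-library';

  // Untested sketch: list objects via the Cloud Storage JSON API directly,
  // bypassing @google-cloud/storage, to see exactly which calls (and therefore
  // which permissions) this workload really needs.
  async function listExtractsDirect(bucketName: string, tableId: string) {
    const auth = new GoogleAuth({
      scopes: ['https://www.googleapis.com/auth/devstorage.read_only'],
    });
    const client = await auth.getClient();
    const url =
      `https://storage.googleapis.com/storage/v1/b/${bucketName}/o` +
      `?prefix=${encodeURIComponent(`extracts/${tableId}_`)}`;
    const res = await client.request<{items?: Array<{name: string}>}>({url});
    return (res.data.items ?? []).map((o) => o.name);
  }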

I'll follow up with results, and thanks again for taking a look.

...actually, never mind those two approaches; I found it by inspection. It WASN'T the snippet above. After the Cloud Function gets the extract's path using that snippet, the data itself is loaded into BigQuery with this:

    // bigquery is a BigQuery client instance (import {BigQuery} from '@google-cloud/bigquery').
    // Configure the load job. For full list of options, see:
    // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
    const metadata = {
      sourceFormat: 'CSV',
      schema: schemas[tableId],
      skipLeadingRows: 1,
      autodetect: true,
      createDisposition: 'CREATE_NEVER',
      allowQuotedNewlines: true,
      writeDisposition: 'WRITE_TRUNCATE',
    };

    console.log(JSON.stringify({metadata}, null, 2));

    // Load data from a Google Cloud Storage file into the table
    const [job] = await bigquery
      .dataset(datasetId)
      .table(tableId)
      .load(storage.bucket(bucketName).file(filename), metadata);
    // load() waits for the job to finish
    console.log(`Job ${job.id} completed.`);

On a hunch, I took a look at the docs for batch loading CSVs from a bucket into BigQuery, and sure enough, storage.buckets.get is required.

That requirement seems a little bit excessive and probably unnecessary for loading data from a known object path, but the recommendation in the doc to grant the Storage Admin role is making me especially twitchy.
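
In the meantime, one variation I might try is submitting the load with an explicit gs:// source URI through createJob, rather than handing the client a File object. Roughly like this (untested sketch based on the JobConfigurationLoad reference linked above; I haven't verified whether it changes which Storage permissions get exercised):

    // Untested sketch: the same load expressed as a raw job configuration with
    // an explicit gs:// URI instead of a Storage File object.
    const [job] = await bigquery.createJob({
      configuration: {
        load: {
          destinationTable: {projectId, datasetId, tableId},
          sourceUris: [`gs://${bucketName}/${filename}`],
          sourceFormat: 'CSV',
          skipLeadingRows: 1,
          allowQuotedNewlines: true,
          createDisposition: 'CREATE_NEVER',
          writeDisposition: 'WRITE_TRUNCATE',
        },
      },
    });
    await job.promise(); // resolves when the load job completes
    console.log(`Job ${job.id} completed.`);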

I'm going to leave feedback on the doc and create an issue in the issue tracker suggesting that the requirement for the permission itself be looked at.