How to export data from a BigQuery table to Cloud Storage?

I'm trying to export data from a BigQuery table to Cloud Storage, but the job fails with an error saying the operation cannot be performed on the nested schema field event_params. How can this be resolved?
 
code:
destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.json")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)
print(destination_uri, table_ref)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="us",
)  # API request
extract_job.result()  # Waits for job to complete.
 
error message:
Traceback (most recent call last):
File "/home/tangbo508/python-files/test.py", line 25, in <module>
extract_job.result() # Waits for job to complete.
File "/usr/local/lib/python3.9/dist-packages/google/cloud/bigquery/job/base.py", line 922, in result
return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/google/api_core/future/polling.py", line 261, in result
raise self._exception
google.api_core.exceptions.BadRequest: 400 Operation cannot be performed on a nested schema. Field: event_params

The error you're encountering means the table contains a nested (RECORD) field, event_params, and the extract job is trying to write it out as CSV. BigQuery can export nested data to Cloud Storage, but not in every format: extract_table defaults to CSV, and CSV cannot represent nested or repeated fields. Naming the destination file shakespeare.json does not change the export format.
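
The most direct fix, then, may be to request an export format that does support nested data (newline-delimited JSON, Avro, or Parquet) on the extract job itself. A minimal sketch, reusing the client, table_ref, and destination_uri from your snippet:

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location="us",  # Location must match that of the source table.
)
extract_job.result()  # Waits for the export to complete.

With that config the nested event_params field is written out as JSON objects, and no flattening is needed.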

If you do need a flat structure (for example, for a CSV export), you can instead flatten the nested schema with a SQL query before exporting. Two corrections to keep in mind: in standard SQL the flattening is done with UNNEST (the legacy FLATTEN function is not available), and a query job cannot write its results straight to a gs:// URI, so the query must write to a destination table, which you then export. Here's an example of how to modify your code:

from google.cloud import bigquery

client = bigquery.Client()
project = "your_project_id"
bucket_name = "your_bucket_name"
dataset_id = "your_dataset_id"
table_id = "your_table_id"
destination_uri = f"gs://{bucket_name}/shakespeare.json"

# A query job cannot write straight to a gs:// URI, so materialize the
# flattened rows into a destination table first, then export that table.
flat_table_ref = bigquery.DatasetReference(project, dataset_id).table(f"{table_id}_flat")

# Flatten event_params with UNNEST; the param.* columns assume the standard
# GA4 export layout, so adjust them to your actual schema.
query = f"""
SELECT * EXCEPT (event_params),
       param.key AS param_key,
       param.value.string_value AS param_string_value
FROM `{project}.{dataset_id}.{table_id}`, UNNEST(event_params) AS param
"""
job_config = bigquery.QueryJobConfig(destination=flat_table_ref, write_disposition="WRITE_TRUNCATE")
query_job = client.query(query, location="US", job_config=job_config)
query_job.result()  # Wait for the query job to complete.

# Export the now-flat table to Cloud Storage.
extract_job = client.extract_table(flat_table_ref, destination_uri, location="US")
extract_job.result()
print(f"Table data exported to {destination_uri}")



By flattening the nested schema before exporting, or by switching to an export format that supports nested data as shown above, you should be able to write the table to Cloud Storage without hitting the "nested schema" error. If the table contains other RECORD columns besides event_params (as GA4 export tables typically do), flatten or exclude those as well.
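
One more caveat: BigQuery cannot export more than 1 GB of table data to a single file. For larger tables, put a wildcard in the destination URI so the export is sharded across multiple files, for example (reusing the names from the snippets above):

# A wildcard shards the export across files; required for tables over 1 GB.
destination_uri = f"gs://{bucket_name}/shakespeare-*.json"
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
    location="us",
)
extract_job.result()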