Re: biquery performance issue

Vijaya · 04-22-2022 09:39 AM

Hello Team,

I have this table which has one driving table and then almost 17 tables left joined on this, now because of this the data has kind of replicated and the table size has grown big.

currently its process 37G data. I would like to tune this table badly. But we don't have any dates or int column to which i can create partition on. any suggestion on this would be really helpful.

josegutierrez

I think that you can use a Temporary Table[1] to write the query result[2] and this would be faster to you when you query them.
Another thing that you could do is to do the query with a row_number and use this row_number[3] as the column that you can use to partition the table.

[1]https://cloud.google.com/bigquery/docs/multi-statement-queries#temporary_tables
[2]https://cloud.google.com/bigquery/docs/writing-results#temporary_and_permanent_tables
[3]https://cloud.google.com/bigquery/docs/reference/standard-sql/numbering_functions#row_number

databaseman

Your table doesn't have dates or int ? that is surprising...

However, you can get benefits by assigning cluster fields. Unfortunately, the query estimate feature does not accurately reflect the cluster field definition, however, when they are used properly, your queries will have smaller byte scans.

To setup Cluster Fields, you can think of this just like a compound index. You can have up to 3 cluster fields, but the second field is only valuable when the first is qualified, and the third field is only valuable when the first and second are qualified.

So choose fields which are most often qualified for cluster fields, and update your table definitions. You will then see lower byte scans.