I'm monitoring table updates per project, so I'm using INFORMATION_SCHEMA.TABLES. But TABLES only have "creation time", don't have anymore LAST_MODIFIED_TIME like in INFORMATION_SCHEMA.SCHEMATA. I tried to monitor with TABLES_STORAGE_BY_PROJECT > STORAGE_LAST_MODIFIED_TIME but the result is different from the Last Modified Time from the Details when I see manually the table in Big Query.
What can I do to monitor the update of all the tables from a project?
Solved! Go to Solution.
In Google Cloud BigQuery, the INFORMATION_SCHEMA.TABLES
view does not include the LAST_MODIFIED_TIME
column. However, you can use the __TABLES_SUMMARY__
meta-table to get the last modified time of a table.
Here's an example of how you can use it:
SELECT
table_id,
TIMESTAMP_MILLIS(last_modified_time) as last_modified_time
FROM
`your_project.your_dataset.__TABLES_SUMMARY__`
This will give you a list of all tables in the dataset along with their last modified time.
Please note that __TABLES_SUMMARY__
is a dataset-level meta-table, so you would need to run this query for each dataset in your project.
If you want to monitor the updates of all tables in a project, you might need to create a script or a scheduled query that periodically runs the above query for all datasets and then sends an alert or logs the result somewhere if a table has been updated.
Also, keep in mind that __TABLES_SUMMARY__
only provides information about native tables, not views or external tables. If you need to monitor updates to views or external tables, you would need a different approach, such as using Cloud Audit Logs to monitor changes to the underlying data sources.
In Google Cloud BigQuery, the INFORMATION_SCHEMA.TABLES
view does not include the LAST_MODIFIED_TIME
column. However, you can use the __TABLES_SUMMARY__
meta-table to get the last modified time of a table.
Here's an example of how you can use it:
SELECT
table_id,
TIMESTAMP_MILLIS(last_modified_time) as last_modified_time
FROM
`your_project.your_dataset.__TABLES_SUMMARY__`
This will give you a list of all tables in the dataset along with their last modified time.
Please note that __TABLES_SUMMARY__
is a dataset-level meta-table, so you would need to run this query for each dataset in your project.
If you want to monitor the updates of all tables in a project, you might need to create a script or a scheduled query that periodically runs the above query for all datasets and then sends an alert or logs the result somewhere if a table has been updated.
Also, keep in mind that __TABLES_SUMMARY__
only provides information about native tables, not views or external tables. If you need to monitor updates to views or external tables, you would need a different approach, such as using Cloud Audit Logs to monitor changes to the underlying data sources.
I don't see last_modified_time field in __TABLE_SUMMARY__ .
In BigQuery, tracking the exact last modified time for all tables within a project can be nuanced due to how BigQuery's architecture works. Here' are a few approaches, considering their advantages and potential limitations:
Using the __TABLES__
Meta-table (Legacy Approach)
__TABLES__
meta-table within a specific dataset.SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM `your_project.your_dataset.__TABLES__`
Using Audit Logs for Monitoring
Implementing a Comprehensive Monitoring Solution
While a single standard SQL query for project-wide last modified times isn't directly supported in BigQuery, a combination of techniques provides a robust way to monitor changes:
__TABLES__
for dataset-specific queries (use cautiously)@ms4446 - how long will __TABLES__ be supported? Frequently see issues ("...too many tables") when using INFORMATION_SCHEMA.TABLES as alternative path for last_modified_time
edit - referencing __TABLES__ as last_modified_time is not included in __TABLES_SUMMARY__
There are no plans to deprecate for __TABLES__
If you're experiencing issues with __TABLES__
and INFORMATION_SCHEMA.TABLES
, especially with the "too many tables" error, it might be due to the limitations of the metadata operations. When querying large datasets with many tables, these metadata queries can hit limits.
To address this:
INFORMATION_SCHEMA.TABLES
in combination with WHERE
clauses to limit the scope of your query.__TABLES_SUMMARY__
, you might need to rely on other methods or tools to track changes to your tables.To me the last modified also does not show up in the __TABLES_SUMMARY__.
However it is in __TABLES__ - so if using this, the proposed solution works.
Best regards
Yes! If the last_modified_time
field is missing from __TABLES_SUMMARY__
but available in __TABLES__
, then using __TABLES__
is a valid approach for obtaining table last modified times in BigQuery.
Why use __TABLES__
?
last_modified_time
for update monitoring.SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM `your_project.your_dataset.__TABLES__`
Points to remember:
__TABLES__
is considered a legacy feature. Stay updated with Google Cloud's documentation for any changes.Using __TABLES__
this way should effectively address your table update monitoring needs based on last modified times.