INFORMATION_SCHEMA.TABLES Monitoring last modified time

I'm monitoring table updates per project, so I'm using INFORMATION_SCHEMA.TABLES. But TABLES only exposes a creation time; it no longer has a LAST_MODIFIED_TIME column like INFORMATION_SCHEMA.SCHEMATA does. I tried monitoring with TABLES_STORAGE_BY_PROJECT > STORAGE_LAST_MODIFIED_TIME, but the result differs from the Last Modified time shown in the Details pane when I view the table manually in BigQuery.

What can I do to monitor updates to all the tables in a project?

Solved
1 ACCEPTED SOLUTION

In Google Cloud BigQuery, the INFORMATION_SCHEMA.TABLES view does not include the LAST_MODIFIED_TIME column. However, you can use the __TABLES_SUMMARY__ meta-table to get the last modified time of a table.

Here's an example of how you can use it:

SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM
  `your_project.your_dataset.__TABLES_SUMMARY__`

This will give you a list of all tables in the dataset along with their last modified time.

Please note that __TABLES_SUMMARY__ is a dataset-level meta-table, so you would need to run this query for each dataset in your project.
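
If you need the list of datasets to iterate over, one option is to enumerate them with INFORMATION_SCHEMA.SCHEMATA. This is a minimal sketch; `your_project` is a placeholder for your own project ID:

SELECT
  schema_name
FROM
  `your_project`.INFORMATION_SCHEMA.SCHEMATA

Each returned schema_name is a dataset you can plug into the per-dataset query above from a script or scheduled job.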

If you want to monitor the updates of all tables in a project, you might need to create a script or a scheduled query that periodically runs the above query for all datasets and then sends an alert or logs the result somewhere if a table has been updated.

Also, keep in mind that __TABLES_SUMMARY__ only provides information about native tables, not views or external tables. If you need to monitor updates to views or external tables, you would need a different approach, such as using Cloud Audit Logs to monitor changes to the underlying data sources.


7 REPLIES


I don't see a last_modified_time field in __TABLES_SUMMARY__.

In BigQuery, tracking the exact last modified time for all tables within a project can be nuanced due to how BigQuery's architecture works. Here are a few approaches, with their advantages and potential limitations:

Using the __TABLES__ Meta-table (Legacy Approach)

  • Dataset Scope: This method queries the __TABLES__ meta-table within a specific dataset.
  • Caveat: This is a legacy SQL feature, and its use is generally discouraged in favor of standard SQL and the INFORMATION_SCHEMA views.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM
  `your_project.your_dataset.__TABLES__`

Using Audit Logs for Monitoring

  • Granular Tracking: BigQuery audit logs track creation, queries, and data changes at the project level.
  • Robustness: This offers a more reliable way to monitor table updates.

Implementing a Comprehensive Monitoring Solution

  1. Export Audit Logs: Set up a sink to export BigQuery audit logs to a separate BigQuery dataset.
  2. Scheduled Queries / Functions: Analyze the logs with scheduled queries or a Cloud Function to pinpoint recent table updates.
  3. Alerting: Upon detection of table updates, trigger alerts (email, Slack, etc.)
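
As a sketch of step 2: once the audit logs are exported to BigQuery, recent table-data changes can be queried from the sink's table. Note that the dataset and table names below (`your_audit_dataset`, `cloudaudit_googleapis_com_data_access`) are assumptions based on a typical Cloud Logging sink configuration, and the metadataJson path reflects the BigQueryAuditMetadata format; adjust both to match your actual sink:

SELECT
  timestamp,
  protopayload_auditlog.resourceName AS table_resource
FROM
  `your_project.your_audit_dataset.cloudaudit_googleapis_com_data_access`
WHERE
  JSON_QUERY(protopayload_auditlog.metadataJson, '$.tableDataChange') IS NOT NULL
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)

A scheduled query or Cloud Function can then aggregate these rows and fire the alerts described in step 3.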

While a single standard SQL query for project-wide last modified times isn't directly supported in BigQuery, a combination of techniques provides a robust way to monitor changes:

  • Legacy __TABLES__ for dataset-specific queries (use cautiously)
  • Audit logs for in-depth tracking
  • Custom monitoring solutions for tailored notification and alerting

@ms4446 - how long will __TABLES__ be supported? We frequently see issues ("...too many tables") when using INFORMATION_SCHEMA.TABLES as an alternative path for last_modified_time.

edit: referencing __TABLES__ here, since last_modified_time is not included in __TABLES_SUMMARY__

There are no plans to deprecate __TABLES__. If you're experiencing issues with __TABLES__ and INFORMATION_SCHEMA.TABLES, especially the "too many tables" error, it might be due to the limitations of metadata operations: when querying large datasets with many tables, these metadata queries can hit limits.

To address this:

  1. Consider using INFORMATION_SCHEMA.TABLES in combination with WHERE clauses to limit the scope of your query.
  2. If you're trying to get the last modified time and it's not available in __TABLES_SUMMARY__, you might need to rely on other methods or tools to track changes to your tables.
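
As a sketch of point 1, a WHERE clause on INFORMATION_SCHEMA.TABLES keeps the metadata scan small. The project and dataset qualifiers and the `events_` prefix below are illustrative placeholders:

SELECT
  table_name,
  creation_time
FROM
  `your_project.your_dataset`.INFORMATION_SCHEMA.TABLES
WHERE
  table_name LIKE 'events_%'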

For me, last_modified_time also does not show up in __TABLES_SUMMARY__.

However, it is in __TABLES__, so if you use that, the proposed solution works.

Best regards

Yes! If the last_modified_time field is missing from __TABLES_SUMMARY__ but available in __TABLES__, then using __TABLES__ is a valid approach for obtaining table last modified times in BigQuery.

Why use __TABLES__?

  • It provides various table metadata, including the crucial last_modified_time for update monitoring.
  • Here's an example query to retrieve last modified times for a specific dataset:
SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM `your_project.your_dataset.__TABLES__`

Points to remember:

  • Dataset-Level Meta-Table: Run this query for each dataset to monitor all tables.
  • Large Datasets: Performance issues or limits might arise for large datasets. Consider filtering or chunking the query.
  • Legacy Feature: While not deprecated, __TABLES__ is considered a legacy feature. Stay updated with Google Cloud's documentation for any changes.
  • Monitoring Strategy: Use scheduled scripts or BigQuery's scheduled queries for continuous monitoring.
  • Detailed Monitoring: For detailed tracking, consider BigQuery's audit logs.

Using __TABLES__ this way should effectively address your table update monitoring needs based on last modified times.
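
As a minimal sketch of such a scheduled check (the project/dataset names are placeholders and the 24-hour threshold is illustrative; pick an interval matching your schedule), a scheduled query could flag recently modified tables in one dataset:

SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
FROM
  `your_project.your_dataset.__TABLES__`
WHERE
  TIMESTAMP_MILLIS(last_modified_time) > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)

Writing the results to a log table or wiring them to an alert gives a simple per-dataset update monitor.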