Hi team,
I've setup a data exchange and created a listing in it. I subscribed to it and this created a linked dataset in my target project (different from source project). When I try to run time travel queries for tables in my linked dataset (select * from target_project.linked_dataset.standard_table for system_time as of timestamp() ) they fail with Not found: Table standard_table:linked_dataset.standard_table@1713797390587 was not found in location US at [1:1] error even though the timestamp used in the query is earlier than 2 days (minimum time travel retention period). I've verified that both my source dataset (aka shared dataset according to Analytics Hub ) and linked datasets are in the same regions (US).
My question is how can we use time travel for standard tables in linked dataset to retrieve the data at some time in near past?
Thanks
Solved! Go to Solution.
The error you're getting suggests there's a disconnect between how time travel is normally handled in BigQuery and how it interacts with linked datasets from Analytics Hub. While time travel works seamlessly on native BigQuery tables, linked datasets present a unique scenario.
Why Standard Time Travel Doesn't Work as Expected
Refined Approach for Time Travel with Linked Datasets
Identify Source Tables: Determine the exact tables in the source project's dataset that correspond to the linked dataset tables you're trying to query.
Direct Time Travel on Source: Instead of querying the linked dataset directly with time travel, construct your queries to target the source tables in the original dataset.
Example: If your linked dataset is target_project.linked_dataset.standard_table
, find its corresponding source table (e.g., source_project.source_dataset.standard_table
) and use this in your time travel query:
SELECT *
FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
Additional Considerations
Example (Complete Flow)
Listing in Data Exchange (Source Project): source_project.source_dataset
(shared as source_dataset
)
Linked Dataset (Target Project): target_project.linked_dataset
Query:
SELECT *
FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
One more point. I've not set any data egress restriction on the listing in the data exchange
The error you're getting suggests there's a disconnect between how time travel is normally handled in BigQuery and how it interacts with linked datasets from Analytics Hub. While time travel works seamlessly on native BigQuery tables, linked datasets present a unique scenario.
Why Standard Time Travel Doesn't Work as Expected
Refined Approach for Time Travel with Linked Datasets
Identify Source Tables: Determine the exact tables in the source project's dataset that correspond to the linked dataset tables you're trying to query.
Direct Time Travel on Source: Instead of querying the linked dataset directly with time travel, construct your queries to target the source tables in the original dataset.
Example: If your linked dataset is target_project.linked_dataset.standard_table
, find its corresponding source table (e.g., source_project.source_dataset.standard_table
) and use this in your time travel query:
SELECT *
FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
Additional Considerations
Example (Complete Flow)
Listing in Data Exchange (Source Project): source_project.source_dataset
(shared as source_dataset
)
Linked Dataset (Target Project): target_project.linked_dataset
Query:
SELECT *
FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
@ms4446 Thanks for your reply. I've a follow up question. How can we detect if a dataset is a original source dataset (that contains data) or is a linked dataset in Java code?
Based on my debugging, I could make use of Dataset.getDescription(). If it contains something like Source listing: <listing_name> we can say its a linked dataset . But this approach can fail in some edge cases. So I would want to know a better and concrete approach.
The most reliable way to distinguish between linked datasets and source datasets is to examine specific metadata properties associated with BigQuery datasets.
Here's an improved Java code snippet leveraging the BigQuery API:
import com.google.cloud.bigquery.*;
// ... (other imports)
public boolean isLinkedDataset(BigQuery bigquery, DatasetId datasetId) {
Dataset dataset = bigquery.getDataset(datasetId);
if (dataset == null) {
// Dataset not found, handle this case appropriately
return false;
}
DatasetInfo datasetInfo = dataset.toBuilder().build();
// Check for the presence of the linked dataset metadata
if (datasetInfo.getLinkedDataset() != null && !datasetInfo.getLinkedDataset().isEmpty()) {
return true; // It's a linked dataset
}
return false; // It's not a linked dataset
}
com.google.cloud.bigquery
library to interact with BigQuery.isLinkedDataset
Function:
bigquery
) and a DatasetId
object (specifying the project ID, dataset ID).bigquery.getDataset()
.DatasetInfo.getLinkedDataset()
method. If this returns a non-empty value, it indicates that the dataset is linked to another dataset in a different project.Explanation:
Dataset
object representing the target dataset.DatasetInfo
object, which contains detailed metadata about the dataset.DatasetInfo.getLinkedDataset()
property. If it has a value (a list of strings representing linked dataset information), we know it's a linked dataset.Why this is Superior to Description Parsing:
Example:
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
DatasetId datasetId = DatasetId.of("your-project-id", "your-dataset-id");
if (isLinkedDataset(bigquery, datasetId)) {
System.out.println("Dataset is a linked dataset.");
} else {
System.out.println("Dataset is an original source dataset.");
}
Important Note: Make sure you have the necessary BigQuery permissions to access dataset metadata.
@ms4446 I'm not able to find DatasetInfo.getLinkedDataset() method. I'm using com.google.cloud:google-cloud-bigquery:2.5.1. I also checked the DatasetInfo class of the latest version 2.39.1 but couldn't find getLinkedDataset() method.
My apologies for the confusion. The getLinkedDataset()
method was recently introduced in the BigQuery Java client library (version 2.13.0 and later), and it's not present in older versions like 2.5.1.
Alternative Approach for Older Versions
Since you are using version 2.5.1, you'll need a slightly different approach to detect linked datasets. You can still use the description parsing technique you mentioned earlier, but with a more robust check:
import com.google.cloud.bigquery.*;
// ... (other imports)
public boolean isLinkedDataset(BigQuery bigquery, DatasetId datasetId) {
Dataset dataset = bigquery.getDataset(datasetId);
if (dataset == null) {
// Dataset not found
return false;
}
String description = dataset.getDescription();
// Check for specific linked dataset description pattern
return description != null && description.matches("Source listing: .*");
}
Explanation:
dataset.getDescription()
.Source listing: .*
to check if the description matches the pattern typically found in linked dataset descriptions. This pattern indicates that the dataset is linked to a listing in a data exchange.Important Considerations:
getLinkedDataset()
method, it is still fairly reliable as the description pattern for linked datasets is consistent.Upgrading Library Version (Recommended):
If possible, I strongly recommend upgrading your BigQuery Java client library to a newer version (2.13.0 or later). This will give you access to the more reliable getLinkedDataset()
method, eliminating the need for description parsing.
Example (Upgraded Version):
Assuming you've upgraded your library to a compatible version, here's how you'd use the getLinkedDataset()
method:
public boolean isLinkedDataset(BigQuery bigquery, DatasetId datasetId) {
Dataset dataset = bigquery.getDataset(datasetId);
if (dataset == null) {
// Dataset not found
return false;
}
DatasetInfo datasetInfo = dataset.toBuilder().build();
// Use the dedicated method (available in version 2.13.0+)
return datasetInfo.getLinkedDataset() != null && !datasetInfo.getLinkedDataset().isEmpty();
}
User | Count |
---|---|
4 | |
1 | |
1 | |
1 | |
1 |