Dataform operations, validate if csv exist in bucket before trying to upload its content

Good day, I have the following sqlx file of type "operations", it is loading data from a csv file (in a bucket) into a table named incidents, the thing is that this csv file can or can not be present, since not everyday you have incidents (the csv has incidents from the previous date so it usually has a name like for example test28042024), so the execution fails if the csv is not present at the bucket, is there any way to validate before trying to load the data? I tried by adding the IF EXISTS clause but it is not working this way, maybe there is a way to make this validations?


 

 

 

 

config {
    type: "operations",
    hasOutput: true
}

js {

    function getYesterday() {
        const today = new Date();
        const localOffset = -(today.getTimezoneOffset() / 60);
        const peruOffset = -5; // Peru timezone offset (GMT-5)
        const offset = peruOffset - localOffset;

        const yesterday = new Date(today.getTime() - 24 * 3600 * 1000 + offset * 3600 * 1000);

        const dd = String(yesterday.getDate()).padStart(2, '0');
        const mm = String(yesterday.getMonth() + 1).padStart(2, '0');
        const yyyy = yesterday.getFullYear();

        return dd + mm + yyyy;
    }

}

-- Load data only if the CSV file exists
IF EXISTS ('gs://bucket/test${getYesterday()}.csv') THEN
  LOAD DATA INTO
    project.dataset.incidents (incidentId Integer,
      name String,
      timestamp Integer)
  FROM FILES ( format = 'CSV',
      uris = ['gs://bucket/test${getYesterday()}.csv']);
ELSE
  -- Handle the case when the CSV file does not exist (e.g., log a message)
  LOG 'CSV file not found. Skipping data load.';
END IF;

 

 

 

 

Finally this is how the csv can look like (I wanted to use headers but could not figure out how to include them without the load process failing):

1,SOMENAME,1714298231000
2,ANOTHERNAME,1714298231000
3,ANYNAME,1714304510000
4,UNIQUENAME,1714304510000

Really appreciate all your help

1 2 93
2 REPLIES 2

Just wanted to give a quick update, I found this documentation https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#load_data_statement and managed to make it skip headers with the 

skip_leading_rows=1 parameter, so it looks like this now (still have not find how to validate if the csv exist, also searching on how to validate if the csv exist but it is empty):
 

 

config {
    type: "operations",
    hasOutput: true
}

js {

    function getYesterday() {
        const today = new Date();
        const localOffset = -(today.getTimezoneOffset() / 60);
        const peruOffset = -5; // Peru timezone offset (GMT-5)
        const offset = peruOffset - localOffset;

        const yesterday = new Date(today.getTime() - 24 * 3600 * 1000 + offset * 3600 * 1000);

        const dd = String(yesterday.getDate()).padStart(2, '0');
        const mm = String(yesterday.getMonth() + 1).padStart(2, '0');
        const yyyy = yesterday.getFullYear();

        return dd + mm + yyyy;
    }

}


  LOAD DATA INTO
    project.dataset.incidents (incidentId Integer,
      name String,
      timestamp Integer)
  FROM FILES (skip_leading_rows=1, format = 'CSV',
      uris = ['gs://bucket/test${getYesterday()}.csv']);

 

 
 

I need to validate existence of the CSV files too. Does anybody knows how to do this in Dataform?

I want to avoid creating  a DAG in Composer (or Airflow) to validate the existence of CSV files using a bash or python script. 

--
Best regards
David Regaladoxq
Web | Linkedin | Cloudskillsboost