Cloud Storage extensions not downloading xlsx files correctly

Hi,

I am facing issues trying to use a Google Cloud Extension for Cloud Storage in APIgee Edge.

Issue is very similar to another one unresolved using serviceCallout : File response from service callout.

I want to use cloud storage extension in order to have an endpoint which directly download some Excel File from google cloud storage.

Coded an extension callout this way:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ConnectorCallout async="false" continueOnError="true" enabled="true" name="EC-storageTest">
    <DisplayName>EC-storageTest</DisplayName>
    <Connector>GCStorage</Connector>
    <Action>downloadFile</Action>
    <Input><![CDATA[{"fileName": "test.xlsx"}]]></Input>
    <Output>storage.file.retrieved</Output>
</ConnectorCallout>

And a subsequent assignMessage this way:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="AM-AssignStorageFile">
    <DisplayName>AM-AssignStorageFile</DisplayName>
    <AssignTo type="response" createNew="false"/>
    <Set>
        <Payload contentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet">{storage.file.retrieved.content}</Payload>
    </Set>
</AssignMessage>

Calling my endpoint from insomnia:

10755-response.png

seems a proper xlsx file, also header "Content-type" is correctly set as "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" but if i try to save the response as a file and open with Excel i get a corrupted file alert:

10756-error.png

Am I missing something ?

Documentation only shows response as application/json and using txt files in plain text.

Thank you.

Solved Solved
0 3 1,649
1 ACCEPTED SOLUTION

Have you considered just going direct? In other words using the REST interface to the Google Cloud Storage Bucket as the target of the API Proxy?

I know it's nice to have the connector to help manage authentication and so on. But it seems there's a problem with the way the connector handles content-type. We can avoid that by "going direct."

Google Cloud storage has a REST API. And it's very simple to use, when you're retrieving a known object. Send a GET request to the API endpoint specifying something like /b/BUCKET/o/OBJECT, and you get the object.

Well, not quite - you'd get the metadata about the object, not the content of the object. The metadata for one of my objects stored in GCS looks like this:

{
  "kind": "storage#object",
  "id": "dchiesa-gcs-testing/SIG-Lite-Apigee-answers-20200401.xlsx/1611938346242483",
  "selfLink": "https://www.googleapis.com/storage/v1/b/dchiesa-gcs-testing/o/SIG-Lite-Apigee-answers-20200401.xlsx",
  "mediaLink": "https://storage.googleapis.com/download/storage/v1/b/dchiesa-gcs-testing/o/SIG-Lite-Apigee-answers-20200401.xlsx?generation=1611938346242483&alt=media",
  "name": "SIG-Lite-Apigee-answers-20200401.xlsx",
  "bucket": "dchiesa-gcs-testing",
  "generation": "1611938346242483",
  "metageneration": "1",
  "contentType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "storageClass": "STANDARD",
  "size": "117597",
  "md5Hash": "yDWKOttlIxfwaM+gThvBnw==",
  "crc32c": "Qso4+Q==",
  "etag": "CLOjnKTKwe4CEAE=",
  "timeCreated": "2021-01-29T16:39:06.243Z",
  "updated": "2021-01-29T16:39:06.243Z",
  "timeStorageClassUpdated": "2021-01-29T16:39:06.243Z"
}

To get the actual content, I'd need to send a GET to the mediaLink endpoint.

The only tricky part of this system is the authorization. Like all the APIs for *.googleapis.com, the apis for Google Cloud Storage accept an OAuth token for authentication. And one easy way to get a token is to use a service account credential.

Google describes how:

  1. create a Service account credential
  2. grant it the proper "roles" for use within Google Cloud. To read GCS objects, grant that Service Account the "Storage Object Viewer" role.
  3. Download the JSON key, which will contain the private key for that service account
  4. use the private key to sign a JWT, of a particular form.
  5. Send that signed JWT to https://oauth2.googleapis.com/token
  6. receive the OAuth token in response

at that point, using the token is pretty easy:

  1. Use the OAuth token in a GET request to https://storage.googleapis.com/storage/v1/b/BUCKETNAME/o/OBJECTNAME to get the metadata about the object
  2. Invoke the mediaLink URL in the response (again passing the OAuth token) to get the actual object content.

We can do all of this within Apigee, using the KVM to store the JSON key file securely, and using the Apigee cache to store the obtained token.

I put together an example of this here.

And you can see me walk through a Youtube screencast of this solution, here.

View solution in original post

3 REPLIES 3

Have you considered just going direct? In other words using the REST interface to the Google Cloud Storage Bucket as the target of the API Proxy?

I know it's nice to have the connector to help manage authentication and so on. But it seems there's a problem with the way the connector handles content-type. We can avoid that by "going direct."

Google Cloud storage has a REST API. And it's very simple to use, when you're retrieving a known object. Send a GET request to the API endpoint specifying something like /b/BUCKET/o/OBJECT, and you get the object.

Well, not quite - you'd get the metadata about the object, not the content of the object. The metadata for one of my objects stored in GCS looks like this:

{
  "kind": "storage#object",
  "id": "dchiesa-gcs-testing/SIG-Lite-Apigee-answers-20200401.xlsx/1611938346242483",
  "selfLink": "https://www.googleapis.com/storage/v1/b/dchiesa-gcs-testing/o/SIG-Lite-Apigee-answers-20200401.xlsx",
  "mediaLink": "https://storage.googleapis.com/download/storage/v1/b/dchiesa-gcs-testing/o/SIG-Lite-Apigee-answers-20200401.xlsx?generation=1611938346242483&alt=media",
  "name": "SIG-Lite-Apigee-answers-20200401.xlsx",
  "bucket": "dchiesa-gcs-testing",
  "generation": "1611938346242483",
  "metageneration": "1",
  "contentType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "storageClass": "STANDARD",
  "size": "117597",
  "md5Hash": "yDWKOttlIxfwaM+gThvBnw==",
  "crc32c": "Qso4+Q==",
  "etag": "CLOjnKTKwe4CEAE=",
  "timeCreated": "2021-01-29T16:39:06.243Z",
  "updated": "2021-01-29T16:39:06.243Z",
  "timeStorageClassUpdated": "2021-01-29T16:39:06.243Z"
}

To get the actual content, I'd need to send a GET to the mediaLink endpoint.

The only tricky part of this system is the authorization. Like all the APIs for *.googleapis.com, the apis for Google Cloud Storage accept an OAuth token for authentication. And one easy way to get a token is to use a service account credential.

Google describes how:

  1. create a Service account credential
  2. grant it the proper "roles" for use within Google Cloud. To read GCS objects, grant that Service Account the "Storage Object Viewer" role.
  3. Download the JSON key, which will contain the private key for that service account
  4. use the private key to sign a JWT, of a particular form.
  5. Send that signed JWT to https://oauth2.googleapis.com/token
  6. receive the OAuth token in response

at that point, using the token is pretty easy:

  1. Use the OAuth token in a GET request to https://storage.googleapis.com/storage/v1/b/BUCKETNAME/o/OBJECTNAME to get the metadata about the object
  2. Invoke the mediaLink URL in the response (again passing the OAuth token) to get the actual object content.

We can do all of this within Apigee, using the KVM to store the JSON key file securely, and using the Apigee cache to store the obtained token.

I put together an example of this here.

And you can see me walk through a Youtube screencast of this solution, here.

Thank you Dino,

I appreciate the fullfilling answer, I will definitively put in place your solution.

For what regards the authentication part, I could easily use the Google Auth Extension in Apigee to retrieve the access token, do you agree ?

Yes, I agree with that: You can use the Google Auth Extension to retrieve the access token.