Data Collectors, DataCapture & many proxies: best practices needed for capturing data from all proxies

Hi, Community!
I support quite a large number of proxies.
Management would like to collect analytics data
(app names, user names, dates and times, and so on)
from all of the proxies and export it to BigQuery.
The exported data will then be consumed and
processed by machine learning.

Given:

  • many proxies.
  • the following Data Collectors: "dc_app_name", "dc_req_email",
    "dc_req_username", "dc_response_header_date" & so on.
  • either a separate DataCapture policy in each proxy,
  • or a SharedFlow with a DataCapture policy.


My question is: what are the best practices for Data Collectors and the DataCapture policy?
Specifically, from the documentation and my small proof of concept, it seems at first glance that

Apigee hybrid captures data (e.g. emails) from all of the proxies
into the single "dc_req_email" collector, so the emails from all proxies are stored mixed together:
"many proxies to one data collector (emails)".

However, the documentation page "DataCapture policy | Apigee X | Google Cloud" contains this note:

  • If you use a Data Collector in multiple policies, the captured data will be overwritten by the last policy that executes.

I am hoping that maybe @dknezic @dchiesa1 @shirishv @markjkelly @kurtkanaskie have some advice.

Many dimensions and metrics are already captured in analytics by default, including the ones you're referring to. You can refer to the list in the Apigee analytics documentation.

The data capture policy can be used to supplement this list with your own additional analytics custom dimensions. As mentioned, you'll want to have only a single data capture policy in your API proxy if you end up needing to use it. 
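
For the SharedFlow option raised in the question, one way to keep a single DataCapture policy across many proxies might be to place it in a shared flow and call that from each proxy with a FlowCallout. This is only a minimal sketch: the policy name "FC-capture-analytics" and the shared flow name "sf-data-capture" are hypothetical, and it assumes the shared flow contains the one DataCapture policy and is deployed to the same environment as the proxies.

```xml
<!-- Sketch only: attached in each proxy in place of a per-proxy DataCapture policy. -->
<FlowCallout name="FC-capture-analytics" continueOnError="false" enabled="true">
    <!-- Hypothetical shared flow that holds the single DataCapture policy. -->
    <SharedFlowBundle>sf-data-capture</SharedFlowBundle>
</FlowCallout>
```

With this layout, each data collector should be written by only one policy per request, which sidesteps the overwrite behaviour described in the documentation note.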

The built-in analytics collects aggregate data based on the metrics and dimensions, but not message content (e.g. headers, request body content), so you would need to use the DataCapture policy for those fields. You can place multiple Capture elements in that policy.

For example:

<DataCapture name="DC-custom" continueOnError="false" enabled="true">
    <!-- Do not fail the request when a referenced variable cannot be resolved. -->
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <!-- One Capture element per data collector. -->
    <Capture>
        <Collect ref="flow.username" default="0"/>
        <DataCollector>dc_req_username</DataCollector>
    </Capture>
    <Capture>
        <Collect ref="flow.useremail" default="0"/>
        <DataCollector>dc_req_email</DataCollector>
    </Capture>
</DataCapture>
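
The other collectors named in the question could probably be filled in the same policy with further Capture elements. The sketch below rests on several assumptions: "dc_app_name" and "dc_response_header_date" are STRING collectors, a VerifyAPIKey or OAuthV2 policy has already populated developer.app.name, and the policy is attached in the response flow so that response.header.date can be resolved. The default value "unknown" is an arbitrary placeholder.

```xml
<DataCapture name="DC-custom" continueOnError="false" enabled="true">
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <!-- The username and email Capture elements from the example above would stay here. -->
    <Capture>
        <!-- Assumes an earlier key or token verification policy set this variable. -->
        <Collect ref="developer.app.name" default="unknown"/>
        <DataCollector>dc_app_name</DataCollector>
    </Capture>
    <Capture>
        <!-- Response variables only resolve once the policy runs in the response flow. -->
        <Collect ref="response.header.date" default="unknown"/>
        <DataCollector>dc_response_header_date</DataCollector>
    </Capture>
</DataCapture>
```

Keeping everything in one policy means each collector is written once per request.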


Since you've looked at DataCapture, you may have seen "Exporting data from Analytics", which describes how you can export data to BigQuery. The export will include your custom data.