Counting the number of distinct proxy path suffixes in a window of time


Hi All,

I'm looking to make a report on the following.


Each device in our ecosystem is identified by an ID in the proxy path. A developer has to include this ID in the path to make a call to that device.

For example, calling GET ourapiurl.com/abc.11111/Status returns JSON with the status of device A, and GET /def.22222/ returns JSON with the status of device B.

A developer might call each device multiple times in a week.

What I want from this report is the number of distinct devices called per developer (either developer app or developer email) within a window of time, for example a week.

Any nudge in the right direction would be a great help. Thanks in advance.


Kind regards,


Niels

Accepted solution:

There are a couple of ways to go about this.

One way is with Custom Reports. In the Apigee Edge UI:

  1. Open the Analytics menu.
  2. Click Reports.
  3. Click + Custom Report.
  4. Under Metrics, select "Traffic". "Sum" is selected automatically.
  5. Under Dimensions, select "Proxy Path suffix".
  6. Optionally, add a filter, for example to restrict the report to a particular API proxy.
  7. Done.

That gives you traffic per proxy path suffix, which is not quite what you want: you want the count of unique proxy path suffixes. But if you have a small number of device IDs (= paths), the count will be obvious from the chart: it's just the number of buckets, or vertical bars. (The sum of traffic for each path is indicated by the height of its bar.)

But maybe you don't want a visual report, and want raw data instead. For that you can use the stats API, like this:

curl -i -n "$mgmtserver/v1/o/${ORG}/environments/${ENV}/stats/proxy_pathsuffix?select=sum(message_count)&timeRange=03/01/2018%2000:00~03/05/2018%2000:00&timeUnit=day&sortby=sum(message_count)&sort=DESC"

The query is:

select=sum(message_count)&timeRange=03/01/2018%2000:00~03/05/2018%2000:00&timeUnit=day&sortby=sum(message_count)&sort=DESC

I'm not in love with the query syntax, because constructing the time range is kind of awkward. But anyway, I hope you can see the parameters:

  • select=sum(message_count)
  • timeRange=03/01/2018%2000:00~03/05/2018%2000:00
  • timeUnit=day
  • sortby=sum(message_count)
  • sort=DESC
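Since the time range is the awkward part, here is a minimal Python sketch that builds the same query string from two datetime values. The helper names (`time_range_param`, `stats_query`, `last_week`) are hypothetical; the `MM/DD/YYYY HH:MM~MM/DD/YYYY HH:MM` format and the parameter names are taken from the example query above.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from urllib.parse import quote


def time_range_param(start: datetime, end: datetime) -> str:
    """Format start/end as the timeRange value used in the example query:
    MM/DD/YYYY HH:MM~MM/DD/YYYY HH:MM, with the spaces percent-encoded."""
    fmt = "%m/%d/%Y %H:%M"
    raw = f"{start.strftime(fmt)}~{end.strftime(fmt)}"
    # Keep '/', ':' and '~' literal so the result matches the hand-written query.
    return quote(raw, safe="/:~")


def stats_query(start: datetime, end: datetime, time_unit: str = "day") -> str:
    """Build the query string: sum of message_count, sorted descending."""
    params = [
        ("select", "sum(message_count)"),
        ("timeRange", time_range_param(start, end)),
        ("timeUnit", time_unit),
        ("sortby", "sum(message_count)"),
        ("sort", "DESC"),
    ]
    return "&".join(f"{name}={value}" for name, value in params)


def last_week(end: datetime) -> tuple[datetime, datetime]:
    """Hypothetical helper: the 7-day window ending at `end`."""
    return end - timedelta(days=7), end


if __name__ == "__main__":
    print(stats_query(datetime(2018, 3, 1), datetime(2018, 3, 5)))
    # select=sum(message_count)&timeRange=03/01/2018%2000:00~03/05/2018%2000:00&timeUnit=day&sortby=sum(message_count)&sort=DESC
```

If you want one bucket per week rather than per day, you could try `timeUnit=week`; treat that value as an assumption and check it against the stats API reference.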

The output of that query is something like this:

{
  "environments" : [ {
    "dimensions" : [ {
      "metrics" : [ {
        "name" : "sum(message_count)",
        "values" : [ {
          "timestamp" : 1520121600000,
          "value" : "377036.0"
        }, {
          "timestamp" : 1520035200000,
          "value" : "314971.0"
        }, {
          "timestamp" : 1519948800000,
          "value" : "499420.0"
        }, {
          "timestamp" : 1519862400000,
          "value" : "285603.0"
        } ]
      } ],
      "name" : "/t1"
    }, {
      "metrics" : [ {
        "name" : "sum(message_count)",
        "values" : [ {
          "timestamp" : 1520121600000,
          "value" : "115926.0"
        }, {
          "timestamp" : 1520035200000,
          "value" : "113718.0"
        }, {
          "timestamp" : 1519948800000,
          "value" : "190219.0"
        }, {
          "timestamp" : 1519862400000,
          "value" : "100976.0"
        } ]
      } ],
      "name" : "/status"
    }, {
      "metrics" : [ {
        "name" : "sum(message_count)",
        "values" : [ {
          "timestamp" : 1520121600000,
          "value" : "62527.0"
        }, {
          "timestamp" : 1520035200000,
          "value" : "56846.0"
        }, {
          "timestamp" : 1519948800000,
          "value" : "95124.0"
        }, {
          "timestamp" : 1519862400000,
          "value" : "50487.0"
        } ]
      } ],
      "name" : "/token"
    } ],
    "name" : "prod"
  } ],
  "metaData" : {
    "errors" : [ ],
    "notices" : [ "query served by:825fb13d-6acc-40fc-991d-1269bcff154d", "Source:Big Query", "Table used: uap-prod-gcp-us-east1-3.edge.edge_api_raxgroup026_fact" ]
  }
}

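Post-processing THAT, you could just count the number of items in the dimensions array: each entry there is one distinct path, so that is the count of unique paths used in that time period. (The metrics array inside each entry holds one item per selected metric.)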
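A minimal Python sketch of that post-processing, assuming the response has the shape shown above. It walks `environments[].dimensions[]`, treating each dimension entry as one distinct path, and sums the `sum(message_count)` values per path. The sample is a trimmed copy of the output above (one daily value per path), and the function name is hypothetical.

```python
from __future__ import annotations

import json

# Trimmed copy of the example response above: one daily value per path.
SAMPLE_RESPONSE = """
{
  "environments": [{
    "name": "prod",
    "dimensions": [
      {"name": "/t1",
       "metrics": [{"name": "sum(message_count)",
                    "values": [{"timestamp": 1520121600000, "value": "377036.0"}]}]},
      {"name": "/status",
       "metrics": [{"name": "sum(message_count)",
                    "values": [{"timestamp": 1520121600000, "value": "115926.0"}]}]},
      {"name": "/token",
       "metrics": [{"name": "sum(message_count)",
                    "values": [{"timestamp": 1520121600000, "value": "62527.0"}]}]}
    ]
  }]
}
"""

METRIC = "sum(message_count)"


def traffic_per_path(response: dict) -> dict[str, float]:
    """Map each dimension name (a proxy path suffix) to its total message count."""
    totals: dict[str, float] = {}
    for env in response.get("environments", []):
        for dim in env.get("dimensions", []):
            for metric in dim.get("metrics", []):
                if metric.get("name") != METRIC:
                    continue  # only sum the metric we selected
                # Values come back as strings such as "377036.0".
                subtotal = sum(float(v["value"]) for v in metric.get("values", []))
                totals[dim["name"]] = totals.get(dim["name"], 0.0) + subtotal
    return totals


if __name__ == "__main__":
    totals = traffic_per_path(json.loads(SAMPLE_RESPONSE))
    print(len(totals))  # 3 distinct paths
```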

To learn more about the stats API, read the documentation: the guide to the analytics API, and the API reference information.

As you can see, some paths in the result above are "edge cases" rather than the paths you actually care about, and in general you may also see blank paths, test paths, or whatever. In your case, there might be similar paths that don't contain a device ID, and you want to filter those out somehow. You could do that while post-processing the JSON, but there's a more direct way to do it.
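If you do filter while post-processing, one option is to pull the device ID out of each path suffix with a regular expression and count distinct IDs rather than distinct paths. That also merges different paths that belong to the same device. The pattern below is an assumption based on the IDs in the question (letters, a dot, then digits, as the first path segment); adjust it to your real ID format.

```python
from __future__ import annotations

import re

# Hypothetical pattern based on the IDs in the question (abc.11111, def.22222):
# letters, a dot, then digits, as the first segment of the path suffix.
DEVICE_ID = re.compile(r"^/([A-Za-z]+\.\d+)(?:/|$)")


def device_id_from_path(path_suffix: str) -> str | None:
    """Return the device ID at the start of a proxy path suffix, or None."""
    match = DEVICE_ID.match(path_suffix)
    return match.group(1) if match else None


def count_distinct_devices(path_suffixes) -> int:
    """Count distinct device IDs, ignoring paths without one (/t1, /token, ...)."""
    ids = {device_id_from_path(p) for p in path_suffixes}
    ids.discard(None)
    return len(ids)


if __name__ == "__main__":
    # Example paths; "/abc.11111/Config" is a hypothetical second path for device A.
    paths = ["/abc.11111/Status", "/abc.11111/Config", "/def.22222/", "/t1", "/token", ""]
    print(count_distinct_devices(paths))  # 2
```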

If you used the StatisticsCollector policy within your API proxy to record the actual device ID, you could query on that directly. I suppose you'd use an ExtractVariables policy to get the device ID out of the path and into a context variable. Maybe you already do that? Anyway, once the device ID is in a context variable, a StatisticsCollector policy that records that variable for every appropriate request adds it to the list of queryable dimensions in Apigee Edge Analytics. I see you've tagged your question with "statistics collector", so maybe you're already doing this?

In that case you could query on the custom dimension you've added, like so:

curl -i -n $mgmtserver/v1/o/${ORG}/environments/${ENV}/stats/device_id...

That would eliminate any paths that don't include the device ID.
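The question asks for distinct devices per developer app or developer email. Assuming you can get rows that pair a developer with a device ID for the window you care about (for example from a report that includes both a developer dimension and your custom device ID dimension; how you export those rows is an assumption here), the counting step itself is one set per developer. The function name and the sample rows are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def distinct_devices_per_developer(rows: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Given (developer, device_id) rows, count distinct devices per developer.

    Repeated calls to the same device by the same developer count once.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for developer, device_id in rows:
        if device_id:  # skip rows where no device ID was recorded
            seen[developer].add(device_id)
    return {developer: len(devices) for developer, devices in seen.items()}


if __name__ == "__main__":
    # Hypothetical rows for one week of traffic.
    rows = [
        ("app-one", "abc.11111"),
        ("app-one", "abc.11111"),
        ("app-one", "def.22222"),
        ("app-two", "abc.11111"),
        ("app-two", ""),
    ]
    print(distinct_devices_per_developer(rows))  # {'app-one': 2, 'app-two': 1}
```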

Helpful?


Thanks very much Dino! That's definitely a great nudge in the right direction. I did some reading on this subject and suspected the statistics collector was possibly involved somehow, but I couldn't quite piece it together.