Questions about getting analytics data from the stats API

Hi,

I have some questions regarding Apigee's stats API and how it works.

To give some context: we want to be able to get consistent and complete data from the stats API. Some challenges are the 14,400-row limit on returned data (see reference 1), whether the Spark engine or Postgres is invoked, and whether the offset parameter can be used (see reference 2, the Apigee documentation).

Question 1:

Are there any documented business rules that state when the Spark engine will be used versus when the Postgres data source is used? Can this be known before the data is returned?

I have observed that when request_uri is included in the dimensions and the fact is sum(message_count), the source of the returned data is Spark. When dimensions like developer_email and developer_app are used with the same fact, sum(message_count), the source is Postgres.

Question 2:

When the Spark engine is used and 14,400 records are returned, what is the best method of making multiple calls to retrieve a full data set of more than 14,400 rows, given that offset will not work?

The only suggestion I can think of is to programmatically limit the input time range so that the number of rows stays below 14,400, record the end date/time, make that the start date/time for the next call, and loop.
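For illustration, here is a minimal Python sketch of that loop, assuming the documented stats path and timeRange format; the org name, credentials, dimension, and window size are placeholder assumptions, not values from this thread:

```python
import requests
from datetime import datetime, timedelta

# Placeholders: substitute your own org, environment, and credentials.
BASE = "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/prod"
AUTH = ("user@example.com", "password")
TIME_FMT = "%m/%d/%Y %H:%M"  # timeRange format used by the stats API

def fetch_window(start, end):
    """Fetch one time window; the window must be small enough that the
    result stays under the 14,400-row limit."""
    params = {
        "select": "sum(message_count)",
        "timeRange": f"{start.strftime(TIME_FMT)}~{end.strftime(TIME_FMT)}",
    }
    resp = requests.get(f"{BASE}/stats/apiproxy", params=params, auth=AUTH)
    resp.raise_for_status()
    return resp.json()

def fetch_range(start, end, window=timedelta(hours=4)):
    """Walk the full range in fixed windows, making each window's end
    the next window's start, and collect the partial results."""
    pages = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + window, end)
        pages.append(fetch_window(cursor, window_end))
        cursor = window_end  # next call starts where this one ended
    return pages

pages = fetch_range(datetime(2019, 1, 1), datetime(2019, 1, 2))
```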

Question 3:

On the documentation page, the accuracy parameter mentions getting data from the raw data set. Is the default behavior (i.e., the accuracy parameter not being used) the same as accuracy=0?

Question 4:

The documentation states that if the realtime value is set to false, there will be a 3-hour delay in the availability of query data. If no realtime value is set, what is the delay? How does this relate to the 10-minute delay for data that appears in management API calls? (See reference 2.)

Thanks in advance for your help.

Reference 1:

https://community.apigee.com/articles/25311/limit-parameter-restriction-and-using-the-offsetli.html

---------------------------------------------------------------

Reference 2:

https://apidocs.apigee.com/management/apis/get/organizations/%7Borg_name%7D/environments/%7Benv_name...

---------------------------------------------------------------

ACCEPTED SOLUTION

Hi @brucejenkins,

Here are some answers to your questions. Can you tell us which region (NA, EU, APAC, etc.) your data is in? That may have some bearing on the answers.

1) When Spark is used vs. when Postgres is used.

Actually, the real issue for us is the following: can a query be routed to the aggregation tables (in Postgres), or does it need to be routed to the fact data? The fact data is queried by Google BigQuery in NA, EU, etc., and by Spark in APAC. Routing is done by analyzing the query parameters: if a query can be answered from an aggregation, it will be routed to that table. "request_uri" is not an attribute of our aggregations, hence your query was routed to the fact tables for processing.
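To make that rule concrete, here is a purely illustrative model of the routing decision, not Apigee's actual implementation; the set of aggregated dimensions below is an assumed example, since the real aggregation schema is internal:

```python
# Illustrative model of the routing described above -- NOT actual
# platform code. AGGREGATED_DIMENSIONS is an assumed example set.
AGGREGATED_DIMENSIONS = {"apiproxy", "developer_email", "developer_app"}

def routed_source(query_dimensions, region):
    # If every requested dimension is covered by an aggregation,
    # the aggregation table (Postgres) can answer the query.
    if set(query_dimensions) <= AGGREGATED_DIMENSIONS:
        return "postgres aggregation"
    # Otherwise the query falls through to the raw fact data.
    return "spark" if region == "APAC" else "bigquery"

# request_uri is not in the aggregations, so it goes to fact data:
print(routed_source({"request_uri"}, "APAC"))                       # spark
print(routed_source({"developer_email", "developer_app"}, "APAC"))  # postgres aggregation
```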

The response JSON includes a notices field that indicates the source of processing and the DB engine. The same query template will always be processed by the same engine; it does not change between invocations.
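So you can inspect the serving engine programmatically. This sketch assumes the notices appear under metaData, as in typical stats responses; the notice strings themselves are free-form and vary by installation, so the substring checks are an assumption rather than a stable contract:

```python
def engine_from_notices(stats_json):
    """Best-effort guess at the serving engine from a stats response."""
    notices = stats_json.get("metaData", {}).get("notices", [])
    text = " ".join(notices).lower()
    if "spark" in text:
        return "spark"
    if "pg" in text or "postgres" in text:
        return "postgres"
    return "unknown"
```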

2) The Spark engine is used for the "stats" API only in APAC, as I mentioned above. If you see this and would like to retrieve more than 14,400 result rows, one new option is our asynchronous query processing alpha, which can be enabled for any paid customer who would like to run more complex queries.

With the "stats" query call, your idea of breaking the query down into windows of at most 14,400 rows is the viable approach for the Spark engine.

All other engines support limit and offset.
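On those engines, paging with limit and offset could look like the following sketch. The placeholders match the earlier example, and the row-count extraction assumes the documented environments/dimensions response shape:

```python
import requests

# Same hypothetical placeholders as in the earlier sketch.
BASE = "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/prod"
AUTH = ("user@example.com", "password")

def count_rows(page):
    """Count dimension rows in a stats response, assuming the
    documented environments -> dimensions response shape."""
    return sum(len(env.get("dimensions", []))
               for env in page.get("environments", []))

def fetch_paged(page_size=1000, max_pages=1000):
    """Page through a stats query with limit/offset. This only works
    on the engines that honor offset (i.e. not Spark)."""
    pages = []
    for page_no in range(max_pages):
        params = {
            "select": "sum(message_count)",
            "timeRange": "01/01/2019 00:00~01/02/2019 00:00",
            "limit": page_size,
            "offset": page_no * page_size,
        }
        resp = requests.get(f"{BASE}/stats/developer_email,developer_app",
                            params=params, auth=AUTH)
        resp.raise_for_status()
        page = resp.json()
        pages.append(page)
        if count_rows(page) < page_size:  # short page: we're done
            break
    return pages
```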

3) Accuracy is always 100% now, and this parameter is deprecated internally. We do need to remove it from the docs.

4) All our data is processed and made available within 10 minutes (or less) in all regions today. The realtime parameter is also deprecated for this reason.

REPLIES

I've removed the accuracy query param from the API reference. Thanks for the update, @spadmanabhan. Should "realtime" also be removed?

Thanks for the comprehensive and interesting response. We operate out of APAC, which lines up with the Spark engine responses I am observing.

We are paying customers so I will look into the Asynchronous query processing options.

This answer has helped a lot.

@brucejenkins, you are welcome. Please contact our support to enable asynchronous query processing and we will do so.