Limit parameter restriction and using the Offset+Limit parameters for large result sets for analytics

0 2 2,839

Please Read: Limit parameter Maximum enforcement and using the Offset parameter to fetch rows exceeding limits.

We have implemented a restriction of 14400 rows on the limit parameter in analytics API calls. Prior to this restriction, the limit clause could be arbitrary and it caused failures when processing arbitrarily large result sets from certain queries. The failures were severe and brought down the management server where this result was being processed.

With this restriction, an error response of 400 Bad Request is returned when an API query is submitted that had the limit parameter set to a number greater than 14400.

Example:

GET /v1/organizations/{org}/environments/(env}/stats/request_uri?filter=(target_response_code+ge+0)&limit=20000&select=sum(target_error),sum(message_count)&sort=DESC&sortby=sum(target_error)&timeRange=05/22/2016+15:00:00~05/24/2016+16:00:00&timeUnit=hour&tsAscending=true

Response

< HTTP/1.1 400 Bad Request

< Content-Type: application/json

< Date: Fri, 27 May 2016 17:17:35 GMT

< Content-Length: 125

<

{

"code" : "Limit value should not exceed 14400",

"message" : "Limit value should not exceed 14400",

"contexts" : [ ]

* Connection #0 to host 0 left intact

}

We realize that some analytics queries do require a result set larger than 14400 rows to be returned. This is where the offset parameter can be used in combination with limit. The offset parameter allows for pagination of results enabling larger result sets to be fetched.

Using the same example as provided above, assume that the given query API expects a large result set. Then, the following pseudo-code snippet provides an approach to fetching the paginated results.

//Initialize Offset to zero

Offset=0

Limit = <Limit>

//Fetch until Done

While Not (Done)

{

// Issue API call with offset and limit. Offset is from the offset variable.

GET /v1/organizations/{org}/environments/(env}/stats/request_uri?filter=(target_response_code+ge+0&select=sum(target_error),sum(message_count)&sort=DESC&sortby=sum(target_error)&timeRange=05/22/2016+15:00:00~05/24/2016+16:00:00&timeUnit=hour&tsAscending=true)&limit= <Limit>&offset=<Offset>

If the Response is empty then Done = True

Else

{ process results;

set Offset = Offset + Limit;

}

}

An example empty response is shown below:

{

"environments" : [ {

"metrics" : [ ],

"name" : "prod"

} ],

"metaData" : {

"errors" : [ ],

"notices" : [ "source dw:rea1redshift002.us-ea.4.apigee.com", "query served by:05c5d2e0-5875-40f3-9533-065c5caa9b85", "Table used: <org>.<env>.fact" ]

}

* Connection #0 to host 0 left intact

}

The limit and offset approach is supported in Analytics APIs by our database servers. Please let us know of any feedback on this through this community or via our support channel.

We are introducing a Big Data framework for analytics. As this rolls out shortly, please stay tuned for updates to this topic about limit clause.

Comments
Not applicable

We changed our code to use Offset + Limit work around. We would increase Offset in a while loop, and break our of the loop when no more dimensions are returned.This approach worked for couple of weeks. But now it goes into indefinite loop due to the fact that it always returns the same dimension. Did you guys, again implemented some change to apigee? It breaks our dev and qa environment. Prod is still ok.

mw970
Staff

Note: if no limit= parameter is specified, by default the query will be limited to 1000 records.

Version history
Last update:
‎05-27-2016 11:41 AM
Updated by: