Get the number of pages in a PDF and dynamically update the Quota policy

A user could hit an image for a PDF (multi-page), and based on the number of pages in the PDF, I want to update the Quota policy.

If the number of pages exceeds N, then I need to reject the request with a custom status code and error message.

I wrote a small Python snippet to get the page count from a PDF:

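import re

def get_page_count_pdf():
	# Heuristic: scan the raw PDF bytes for "/Count <n>"; the last match wins.
	vPages = 0
	with open(PDF_VARIABLE, 'rb') as vFileOpen:
		for vLine in vFileOpen:
			# Require at least one digit so int() never sees an empty value.
			vMatch = re.search(rb"/Count (\d+)", vLine)
			if vMatch:
				vPages = int(vMatch.group(1))
	return vPages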

My Queries

1. How and where do I add this code in the PreFlow?

2. How do I update the quota dynamically based on the page count?

3. There could be concurrent PDF requests. Can I use flow variable(s)?

4. How do I reject the request if the number of pages is greater than N?


An example would be really helpful.

I am pretty new to Apigee, so I appreciate the help and guidance 🙂

Thanks!


There are some subtleties in the solution.

  1. First, I guess your Python code runs by reading a file in the filesystem.

    Apigee API Proxies can run custom policies that use logic defined in Python, but Python scripts running within Apigee API Proxies cannot read a file from the filesystem. So that will need to change.

    How will the PDF file be provided to the proxy? Is it passed as a file attachment? As a simple octet-stream? If so, it MAY be possible to rework your code to read directly from the HTTP payload to extract the PDF file. Maybe. I have no experience with that.

    An alternative approach, instead of building the python logic in the Apigee proxy itself, is to embed the page-counting logic in a microservice which is installed adjacent to the Apigee proxy (maybe in Google App Engine, or as a Google Cloud Function) and then call to that remote service from the Apigee proxy using a ServiceCallout.

  2. Now, supposing that you can get this Python code in shape so that it works within an Apigee proxy, where should you attach it? That's up to you: Request PreFlow, a conditional flow, and so on. If you're not clear on the reasons to choose one over the other, then check the documentation.
  3. Regarding how to use a dynamic value for a Quota, check the documentation on the Quota policy and specifically the MessageWeight configuration.
  4. Concurrency in Apigee is assumed. Context variables are scoped to the specific request that is being handled.
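
To make the "read from the payload rather than the filesystem" idea concrete, here is a minimal sketch in plain Python 3 of the same /Count heuristic applied to bytes already held in memory, for example inside the adjacent microservice. The names `count_pdf_pages`, `check_page_limit`, and `MAX_PAGES`, and the status codes 400 and 413, are assumptions for illustration and are not part of Apigee or of the original code. It takes the largest /Count value rather than the last one, on the assumption that the root of the page tree carries the total. This remains a heuristic: /Count also appears in outline dictionaries, and PDFs that store their page tree in compressed object streams will yield 0.

```python
import re

# Hypothetical limit; the question only calls it "N".
MAX_PAGES = 50

# Matches entries such as "/Count 12" anywhere in the raw PDF bytes.
COUNT_PATTERN = re.compile(rb"/Count\s+(\d+)")


def count_pdf_pages(pdf_bytes: bytes) -> int:
    """Heuristic page count: the largest /Count value found, or 0 if none."""
    counts = [int(m.group(1)) for m in COUNT_PATTERN.finditer(pdf_bytes)]
    return max(counts, default=0)


def check_page_limit(pdf_bytes: bytes, max_pages: int = MAX_PAGES):
    """Return (http_status, message) for a hypothetical validation endpoint."""
    pages = count_pdf_pages(pdf_bytes)
    if pages == 0:
        # No uncompressed /Count entry was found, so the count is unknown.
        return 400, "could not determine page count"
    if pages > max_pages:
        return 413, f"PDF has {pages} pages; limit is {max_pages}"
    return 200, str(pages)
```

If the service returns the page count in the response body, the proxy could copy it into a context variable and reference that variable from the Quota policy's MessageWeight, and reject the request when the service reports that the limit was exceeded.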

The challenge is a little subtle. Break it down into parts to get the solution.

Hi @Dino-at-Google

The PDF will be passed as a file attachment and I am trying to read it from the request payload, which is not possible as per this post.

I did consider the same alternative approach, but:

1. A Cloud Function (2 GB) won't be able to handle that much traffic.

2. Having multiple F4_HIGHMEM App Engine instances in production for a small validation check is not very cost-effective.

2. I plan to attach it in the Request PreFlow.

I'll check the documentation before finalizing it.

3. Already in place

4. Noted.