Topological Sort: The Curious Case of Apigee Shared Flows

"What has been will be again, what has been done will be done again; there is nothing new under the sun." 

-- Ecclesiastes 1:9

"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

-- Doug McIlroy, Unix philosophy

TL;DR

Apigee X/hybrid fails to deploy a shared flow that depends on another shared flow if the dependency is not already deployed. For big collections of shared flows to deploy, it is desirable to automate your CI/CD task and deploy shared flow collection in the correct sequence to avoid manual tracking of dependencies. 

The gensfds.sh utility does that and generates a dot file that lets you visualize the dependencies for comprehension and documentation purposes. 

The Problem Statement

On the evening of November 11, 2021, I was looking for an Apigee implementation of Dynamic Client Registration protocol. After some googling, I found the one as a part of an implementation of the Apigee Reference Implementation of the Australian CDS (Consumer Data Standard), which, apart of DCR implementation, contains troves of experiences captured in the form of proxies written and maintained by long-time Apigeek Debora Elkin.

Problem? It is written for Apigee Edge and I needed an Apigee hybrid Version. And thus my Proxy Migration Journey started. And one of the obstacles was the strictness of checks during shared flow bundle import operation. Reusable Shared Flows are cool. They make you DRY. They capture experience and increase re-use. CDS-AU Apigee project has many of them. Seventeen, to be exact: https://github.com/apigee/consumer-data-standards-au/tree/master/src/shared-flows. And as shared flows are meant to be used by proxies, I need to deploy them before deploying proxies that depend on them. 

Surprise it was when I moved into a directory of the shared flows and decided to deploy them one after the other. Apparently, some Shared Flows were "DRYied" as well and contained references to other shared flows and one of the subtle changes in Apigee X/hybrid import/deploy behavior was an error happening if you try to deploy a shared flow while its dependency is not deployed yet. If that would be 3-4 bundles, everyone would be tempted to trace the dependencies manually, come up with a correct sequence and  execute it. For a dozen and a half bundles there was a case to come up with a more intelligent approach. 

I brushed off the dust from my trusted 3 volume set of The Art of Computer Programming by Donald Knuth and quickly found the right piece of theory: the topological sort! 

The collection of shared flows is a Directed Acyclic Graph (DAG), the vertices are the flows and the edges are their dependencies. As long as the cycles are absent, the result of toposort operation would give me a nicely ordered sequence of bundle deployment, that would automatically avoid any missed dependency errors.

Written in pseudo-language, it would require an implementation in something more executable. It would take time and effort. Not that I ever miss a good coding exercise, but as the great Russian surgeon Nikolay Pirogov once said, "The best surgical operation is the one that did not need to be performed". The next best thing was to find a readily available implementation in a convenient popular language. Still, after some googling I was able to identify an even better solution: as it frequently happens, if you need some clever algorithm, it might already be implemented as an Unix utility. To my amazement, it was. A topological search utility tsort, first written for Version 7 Unix in early 80s of the last century, was doing exactly what was required. Not only has it remained practically unchanged since that time, but in 2017 it became a part of the POSIX.1 standard, which meant that it is readily available as a part of any Linux/Unix distribution. Even my OSX has it. Quelle coïncidence ! 

Shared Flow Source Parsing

A simple effective heuristic to identify a shared flow invocation structure is to look for an FlowCallout attribute in a policy file.

yuriyl_0-1653299980296.png

A structure of the directory with shared flow. 

The sharedflowbundle/policies directory contains a collection of policies.  If we open a Flow Callout policy, we can identify two markers we can rely on: the top-level attribute "<FlowCallout " and a nested <SharedFlowBundle/> attribute that contains a name of the flow to invoke.

Typical FlowCallout policy:

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<FlowCallout async="false" continueOnError="false" enabled="true" name="FC-CheckTokenNotReused">
    <DisplayName>FC-CheckTokenNotReused</DisplayName>
    <FaultRules/>
    <Properties/>
    <SharedFlowBundle>check-token-not-reused</SharedFlowBundle>
    <Parameters>
        <Parameter name="jwtPayload" ref="jwt.JWT-VerifyPrivateKeyJWT.payload-json"/>
        <Parameter name="jwtType">Bearer</Parameter>
        <Parameter name="cacheTime" ref="jwt.JWT-VerifyPrivateKeyJWT.seconds_remaining"/>
    </Parameters>
</FlowCallout>

File: FC-CheckTokenNotReused.xml

 

When we iterate through the list of directories in a shared-flows directory, we collect the shared flows we need to deploy. For each directory, we look for FlowCallout policies and extract the called Shared Flow. This information is sufficient to perform a topological sort and generate the dot file.

Topological Sort And Graphviz Utilities

To paraphrase a wikipedia page, a topological sort is an ordering of a directed graph into a linear ordering of its vertices so that their dependencies are respected during the graph traversal operations. The graph shall be a DAG, directed acyclic graph. Thus no cyclic references are permitted.

Let's use an example from a `tsort` man page. 

 

tsort <<EOF
a b c c d e
g g
f g e f
h h
EOF

 

produces the output:

 

a
b
c
d
e
f
g
h

 

That is pretty close to what we need for our use case to produce the sequence of Shared Flows deployment actions to avoid an Apigee X/hybrid deployment error. To make it exactly what we need, we need to reverse the list of items. To do that in Linux, we are going to use the tac utility. tac is the reverse of cat does just that: takes an input of string and produces the output in a reverse order.

As we captured the structure of the shared flow dependencies, we can also present it graphically. Thanks to the Graphviz/dot wizardry, this is easy to achieve.

For the DAG above, the dot file we have to produce would look like this:

 

digraph G {
   rankdir=LR
   node [shape=box,fixedsize=true,width=3]
   "a" -> "b";
   "a" -> "c";
   "a" -> "c";
   "a" -> "d";
   "a" -> "e";
   "g" -> "g";
   "f" -> "g";
   "f" -> "e";
   "f" -> "f";
   "h" -> "h";
}

 

Using an online dot file renderer, https://dreampuf.github.io/GraphvizOnline we can produce now

yuriyl_1-1653299980197.png

That in essence is what the gensfds.sh script does.

The Shared Flow Dependencies Walkthrough

Let's now see how the utility is used. As an example, we are going to use https://github.com/apigee/consumer-data-standards-au repository. The shared flows collection can be inspected at the following link https://github.com/apigee/consumer-data-standards-au/tree/master/src/shared-flows.

Clone it locally. 

To install a utility, go to your favorite bin folder and run the following command:

 

https://raw.githubusercontent.com/apigee/devrel/main/tools/sf-dependency-list/src/gensfds.sh

 

This command will download the script into a current directory. Then will make it executable.

To generate the install sequence, execute

 

gensfds.sh . tsort

 

Output:

 

get-jwks-from-dynamic-uri
check-token-not-reused
decide-if-customer-present
validate-audience-in-jwt
add-response-fapi-interaction-id
verify-idp-id-token
validate-ssa
check-request-headers
authenticate-with-private-key-jwt
add-response-headers-links-meta
apply-traffic-thresholds
collect-performance-slo
get-ppid
manage-tokens-by-consent-id
paginate-backend-response
validate-request-params
verify-mtls-and-hok

 

To generate a dot file 

 

gensfds.sh . dot

 

Output:

 

digraph G {
    rankdir=LR
    node [shape=box,fixedsize=true,width=3]
"add-response-fapi-interaction-id";
"add-response-headers-links-meta" -> "add-response-fapi-interaction-id";
"apply-traffic-thresholds";
"authenticate-with-private-key-jwt" -> "check-token-not-reused";
"authenticate-with-private-key-jwt" -> "get-jwks-from-dynamic-uri";
"authenticate-with-private-key-jwt" -> "validate-audience-in-jwt";
"check-request-headers" -> "decide-if-customer-present";
"check-token-not-reused";
"collect-performance-slo";
"decide-if-customer-present";
"get-jwks-from-dynamic-uri";
"get-ppid";
"manage-tokens-by-consent-id";
"paginate-backend-response";
"validate-audience-in-jwt";
"validate-request-params";
"validate-ssa" -> "check-token-not-reused";
"verify-idp-id-token" -> "get-jwks-from-dynamic-uri";
"verify-mtls-and-hok";
}

 

You can either use an online dot visualizer or generate an svg file locally. For Debian, that would be

 

sudo apt install graphviz -y

gensfds.sh . dot >/tmp/sfds.dot
dot -Tsvg /tmp/sfds.dot

 

The resulting diagram will look like this:

yuriyl_2-1653299980261.png

Summary

This is a story that reflects the basic values and principles, the elegance and richness of the Unix utility ecosystem that lets you take a complex problem and solve it in a short and succinct script.

Contributors
Version history
Last update:
‎05-25-2022 07:29 AM
Updated by: