Stripping the HTML part from an API endpoint response that mixes JSON and HTML

Hello all,

We have a backend whose API response is a mix of JSON and HTML. In our company we use Apigee as a proxy for all client calls. We need to strip off the HTML part and return only the JSON part to the client.

Unfortunately the endpoint is controlled by a vendor tool, and for now we cannot switch off the XML part before the response is sent back to Apigee. Right now the endpoint returns the response with Content-Type: text/html and content like this:

{
   key1: value1,
   key2: value2,
   ...
}
<webresult>
 ...
</webresult>

I have written some simple JS code and put it into a JavaScript policy to strip off <webresult>. However, when I attach it to the response PostFlow of the Target Endpoint, the value of response.content is always empty.

I have tried to get response in the following ways but without any luck:

var content1 = response.content;

However, when I tried to display response.status.code, it worked.

I have also tried adding a policy before the JS policy to set Content-Type to text/plain, but that did not help.

Trying to get the content as JSON does not work either; it returns empty JSON:

print("json response:" + response.content.asJSON);

It seems that response.content is empty, but that cannot be right, because the response contains JSON and the tags shown above, so something is either not set or misconfigured in my case.

Can someone advise whether Apigee will allow us to process a mixed JSON+HTML response and modify it to strip off the HTML part and keep only the JSON as the final response? I also looked at the ExtractVariables policy, but my first trials were not successful.

Thanks for support,

10 REPLIES

I have tried to get response in the following ways but without any luck:

Can you try this?

var content1 = targetResponse.content;

or this?

var content1 = context.getVariable('response.content');

Can someone advise whether Apigee will allow us to process a mixed JSON+HTML response and modify it to strip off the HTML part and keep only the JSON as the final response?

Sure, Apigee can do that. Apigee can manipulate text responses. Using a JavaScript callout to do that is a reasonable thing to do. You're on the right track.
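
A minimal sketch of the JavaScript callout approach, assuming the payload always consists of the JSON document followed by a single <webresult> block, as in the sample from the question. The helper name stripWebresult is hypothetical; context.getVariable and context.setVariable are the calls already used in this thread. It can only work when response.content is actually populated at the point where the policy runs.

```javascript
// Hypothetical helper: returns the text that precedes the first <webresult> tag.
function stripWebresult(body) {
  if (body === null || body === undefined) {
    return '';
  }
  // Coerce to a JavaScript string in case the runtime hands back a wrapped value.
  var text = String(body);
  var idx = text.indexOf('<webresult>');
  var jsonPart = (idx === -1) ? text : text.substring(0, idx);
  // Trim surrounding whitespace, including the newline before the footer.
  return jsonPart.replace(/^\s+|\s+$/g, '');
}

// `context` is provided by the Apigee runtime; the guard lets the helper
// be exercised outside Apigee as well.
if (typeof context !== 'undefined') {
  var cleaned = stripWebresult(context.getVariable('response.content'));
  JSON.parse(cleaned); // throws, and so fails the policy, if the remainder is not valid JSON
  context.setVariable('response.content', cleaned);
  context.setVariable('response.header.Content-Type', 'application/json');
}
```

Cutting at the first <webresult> tag rather than parsing the XML keeps the script short; if the vendor output can contain that literal string inside a JSON value, a stricter approach would be needed.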

Thank you dchiesa1. I just tried calling targetResponse.content, but I am getting a fault:

{"fault":{"faultstring":"Execution of JS-strip-webresult failed with errors: Javascript runtime error: \"ReferenceError:\"targetResponse\" is not defined. (StripWebresultFooter.js:2)\"","detail":{"errorcode":"steps.javascript.ScriptExecutionFailed"}}}

 

I also tried context.getVariable("response.content"), but an empty string is still returned.

Please note that no other policies are present in the PreFlow of the Target Endpoint apart from the content-type policy and the JS policy. I also tried removing the content-type step and keeping only the JS policy, but the results were the same as described above.

When I tried to call:

print("content:" + context.targetResponse.content)

I got no error, but the value is empty.

Sorry! I meant this:

var content1 = context.getVariable("targetResponse.content");

If you are getting an empty string, then... there is no response. If that is what you are observing, then the probable cause is that you've attached the JS policy to the request flow, which runs before the response is received. Check your attachment point.

If you think you have attached the policy in the correct place, please show it.  Show the configuration that indicates where you have attached this policy.  Or, please collect a trace session, and attach it here.

Thanks. It seems to be correctly attached, because at the same time I can successfully read response.status.code and correctly get 200.

Trying context.getVariable("targetResponse.content") gives null:

print("context.targetResponse.content"+context.getVariable("targetResponse.content"));

In the trace, stepExecution-stdout shows: context.targetResponse.content=null

I can't attach a picture, so please find the extracted configuration below.

Target Endpoint configuration:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TargetEndpoint name="default">
    <Description/>
    <FaultRules/>
    <PreFlow name="PreFlow">
        <Request>
            <Step>
                <Name>DisablePathCopy</Name>
            </Step>
            <Step>
                <Name>SetURLPath</Name>
            </Step>
            <Step>
                <Name>AM-SetExternalJWTHeaders</Name>
            </Step>
        </Request>
        <Response/>
    </PreFlow>
    <PostFlow name="PostFlow">
        <Request/>
        <Response>
            <Step>
                <Name>JS-strip-webresult</Name>
            </Step>
        </Response>
    </PostFlow>
    <Flows/>
    <HTTPTargetConnection>
        <Path>/pentaho/kettle/executeTrans/test</Path>
        <LoadBalancer>
            <Server name="test_server"/>
        </LoadBalancer>
        <Properties>
            <Property name="response.streaming.enabled">true</Property>
            <Property name="success.codes">2XX,500</Property>
        </Properties>
    </HTTPTargetConnection>
</TargetEndpoint>

General proxy configuration:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<APIProxy revision="4" name="test-1">
    <Basepaths>/test/v1</Basepaths>
    <ConfigurationVersion majorVersion="1" minorVersion="0"/>
    <CreatedAt>1637837540262</CreatedAt>
    <CreatedBy>noone</CreatedBy>
    <Description>By Contour-1.0.0 on behalf of noone</Description>
    <DisplayName>test_v1</DisplayName>
    <LastModifiedAt>1643034575373</LastModifiedAt>
    <LastModifiedBy>noone</LastModifiedBy>
    <ManifestVersion>SHA-325382572385723fgg47535345</ManifestVersion>
    <Policies>
        <Policy>AM-SetContentType</Policy>
        <Policy>AM-SetExternalJWTHeaders</Policy>
        <Policy>JS-strip-webresult</Policy>
        <!-- other policies used in the request part go here -->
    </Policies>
    <ProxyEndpoints>
        <ProxyEndpoint>default</ProxyEndpoint>
    </ProxyEndpoints>
    <Resources>
        <Resource>jsc://StripWebresultFooter.js</Resource>
    </Resources>
    <Spec/>
    <TargetServers/>
    <TargetEndpoints>
        <TargetEndpoint>default</TargetEndpoint>
    </TargetEndpoints>
</APIProxy>

 

 

The StripWebresultFooter.js resource is as simple as:

print("response.status.code="+response.status.code);
print("context.targetResponse.content"+context.getVariable("targetResponse.content"));

And here is the full raw response from SoapUI, which I use to send the request:

HTTP/1.1 200 
Date: Mon, 24 Jan 2022 15:45:36 GMT
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: JSESSIONID=48000B640102CF78D261CA88724DC202; Path=/pentaho; Secure; HttpOnly
Strict-Transport-Security: max-age=0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block

{"rows":{...},"header":[{"filter":["true"],"limitRows":"5","offset":"0","dateTime":"2022-01-24 15:45:36"}]}
<webresult>
  <result>OK</result>
  <message>Execution of transformation finished</message>
  <id/>
</webresult>

If JS-strip-webresult is the policy, then it appears to be attached in the right place.

In my experience, context.getVariable('response.content') returns a string containing the response content.

What is the content-type of the response?

JavaScript can handle text content, like text/html, text/plain, application/json, that sort of thing.

If it is a multi-part response, then you cannot parse it directly with JavaScript.

 

The Content-Type is text/html.

I just extracted part of the trace. I would expect to see a <content> tag under <ResponseMessage>, but it is not present, even though I get the reply body in SoapUI.

Part of the trace:

    <Point id="Paused"/>
    <Point id="Resumed"/>
    <Point id="StateChange">
        <DebugInfo>
            <Timestamp>24-01-22 15:45:36:815</Timestamp>
            <Properties>
                <Property name="To">REQ_SENT</Property>
                <Property name="From">TARGET_REQ_FLOW</Property>
            </Properties>
        </DebugInfo>
        <ResponseMessage>
            <Headers>
                <Header name="Content-Type">text/html;charset=utf-8</Header>
                <Header name="Date">Mon, 24 Jan 2022 15:45:36 GMT</Header>
                <Header name="Set-Cookie">JSESSIONID=48000B640102CF78D261CA88724DC202; Path=/pentaho; Secure; HttpOnly</Header>
                <Header name="Strict-Transport-Security">max-age=0</Header>
                <Header name="Transfer-Encoding">chunked</Header>
                <Header name="X-Content-Type-Options">nosniff</Header>
                <Header name="X-Frame-Options">SAMEORIGIN</Header>
                <Header name="X-XSS-Protection">1; mode=block</Header>
            </Headers>
            <ReasonPhrase></ReasonPhrase>
            <StatusCode>200</StatusCode>
        </ResponseMessage>
    </Point>

 

SoapUI response after sending the POST request:

HTTP/1.1 200 
Date: Mon, 24 Jan 2022 15:45:36 GMT
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: JSESSIONID=48000B640102CF78D261CA88724DC202; Path=/pentaho; Secure; HttpOnly
Strict-Transport-Security: max-age=0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block

{"rows":{...},"header":[{"filter":["true"],"limitRows":"5","offset":"0","dateTime":"2022-01-24 15:45:36"}]}
<webresult>
  <result>OK</result>
  <message>Execution of transformation finished</message>
  <id/>
</webresult>

 

I just checked, and it seems the response from our API endpoint has no Content-Length set. Looking at "Solved: Response Body missing from call to API proxy altho... - Google Cloud Community", can this be the reason why the response body is missing?

I am still confused about how it can be that I cannot read response.content inside Apigee, whereas the final response with the body content is properly returned to the client (see my last comments).

I don't know about that article. I haven't looked at it.

The reason there is no content-length is that this is a chunked reply from your upstream. The response headers show this:

Transfer-Encoding: chunked

I have not seen that as a reason for the response.content to be unavailable.

I've just tried this in my own Apigee Edge organization, and in my case, the policies attached to the target response flow can see the content from a chunked response. I don't know what's going on with your case, but it is working for me.
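
One possibility, which nobody in this thread confirmed: the TargetEndpoint configuration posted earlier sets response.streaming.enabled to true. As far as the Apigee streaming documentation describes it, a streamed payload is passed through without being made available to policies, which would be consistent with a readable status code and headers but an empty response.content. A small diagnostic sketch in JavaScript that prints what the policy can actually see may help compare a run with that property set to true against a run with it removed. The helper name describeResponse is hypothetical; the flow variable names are the standard response.* ones.

```javascript
// Hypothetical helper: summarises what a response-flow policy can see.
// `getVar` is any function that maps a flow variable name to its value.
function describeResponse(getVar) {
  var content = getVar('response.content');
  // Treat null/undefined as "no payload visible"; coerce anything else to a string.
  var seen = (content === null || content === undefined) ? 0 : String(content).length;
  return 'status=' + getVar('response.status.code') +
         ' content-type=' + getVar('response.header.Content-Type') +
         ' transfer-encoding=' + getVar('response.header.Transfer-Encoding') +
         ' content-length-seen=' + seen;
}

// Inside Apigee, `context` and `print` are provided by the runtime;
// the output shows up under stepExecution-stdout in the trace.
if (typeof context !== 'undefined') {
  print(describeResponse(function (name) {
    return context.getVariable(name);
  }));
}
```

If content-length-seen is 0 with streaming on and non-zero with streaming off, the property is the likely cause; if it stays 0 in both cases, the problem lies elsewhere.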

Attached please find an API proxy that does very little; it only passes through to a service that returns an HTTP chunked response.

If you import and deploy it to an Apigee organization (be sure to modify the target endpoint) and then turn on tracing, you can test this against your own service that returns chunked data.

Here's a screencast showing my test.

Maybe the content you're receiving is very very large? Can you give me an estimate for how large the chunked response is? I've tested it here only with small file sizes. (And all text content).

BTW, something that is confusing me: you say the response is text/html, but, it sure looks like some JSON mixed with XML to me. None of what you showed looks like HTML.

Thanks @dchiesa1

The response is not large, a couple of bytes. We have managed to rework how the backend formats the output response by switching from html to json. This stripped off the additional html/xml block. What is surprising is that response.content is now non-empty, but only if we add an ExtractVariables policy with a JSONPath. When I tried the same with an AssignMessage policy using AssignVariable, as in the screencast, response.content was empty. I will continue debugging to get a clear picture of what is happening (there is no magic). Unfortunately, due to restrictions, I cannot record a screencast from the office, but I will definitely share the outcome of my further investigation.

Thank you again @dchiesa1 for prompt responses.

FYI: After discussions with the Pentaho HV vendor, who provides the backend engine that encodes the response, it was accepted that returning text/html content as a mix of JSON and XML tags is not valid and must be corrected. [PDI-19320] XML output with servlet with carte.sh - Pentaho Platform Tracking