How can I use ExtractVariables to extract elements from within a CDATA section?

Hi @Dino-at-Google

I'm sending a SOAP request to an Apigee proxy. The request contains:

<java:ExtraInfoXml xmlns:java="java:com.test.model.ws">
                    <![CDATA[<tns:easr-milestone xmlns:tns="http://easr.model.milestone.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://easr.model.milestone.com eASRMilestone.xsd "><pon>2007</pon><version>01</version><lec-ddd>0001-01-01</lec-ddd><received-date>2020-07-08</received-date><foc-date>0001-01-01</foc-date><sent-date>0001-01-01</sent-date><supp-type></supp-type><asr-type> </asr-type><ether-evc-asr>0</ether-evc-asr><vlan></vlan></tns:easr-milestone>]]>
                </java:ExtraInfoXml>

 

After extraction, the result I get is:

<![CDATA[2007010001-01-012020-07-080001-01-010001-01-01 0]]>

My expectation is to extract each element separately with the ExtractVariables policy.

I tried the following two ways, but neither works. I need a solution.

<Variable name="pon" type="String">
            <XPath>//*[local-name() = 'Body']//*[local-name() = 'ExtraInfoXml']//*[local-name() = 'easr-milestone']//*[local-name() = 'pon']</XPath>
</Variable>
<Variable name="pon" type="String">
            <XPath>//*[local-name() = 'Body']//*[local-name() = 'ExtraInfoXml']//*[local-name() = 'pon']</XPath>
</Variable>
1 ACCEPTED SOLUTION

I think you want to extract the text values of the elements in the XML document and make them available as context variables. And I think the input is an XML fragment wrapped in a CDATA section inside an outer XML document.

You can accomplish this with two passes of ExtractVariables.

The key realization here is that anything within a CDATA section in an XML document is just text. Though our eyes can see that it is XML, any processor of the containing XML document treats it as text. Because of that there is no way (as far as I know) to use XPath to reach inside a CDATA section. You can use XPath to get the text within the CDATA section, but you cannot use XPath to query *into* the CDATA section. To use XPath you must have an XML document, and the CDATA is just... text! So you need two passes of ExtractVariables, with an AssignMessage step in between:

  1. Use one ExtractVariables policy to extract the CDATA text. Let's suppose we extract it into a variable named "cdata_text".
  2. Use AssignMessage to create a new "artificial" message. It's just a container for the CDATA text, necessary because ExtractVariables requires an input of type "message". Assign the cdata_text variable as the payload of that message. This creates a new "document" that holds what was previously plain text.
  3. Use another ExtractVariables policy to extract all the elements from that new message (document).
    This works if you know the structure of the XML within the CDATA section. It won't work if you have arbitrarily structured XML there.

Here is the specific set of policies I used. It works with your example.

EV-1

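<ExtractVariables name='EV-CDATA'>
  <!-- contrivedMessage is the test message used here; point Source at the
       message that holds the outer XML, for example request -->
  <Source>contrivedMessage</Source>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <XMLPayload>
    <Namespaces>
      <Namespace prefix='ws'>java:com.test.model.ws</Namespace>
    </Namespaces>
    <!-- text() returns the CDATA content as one plain string. This path
         assumes ExtraInfoXml is the root element; if it is nested, e.g. inside
         a SOAP Body, a path such as //ws:ExtraInfoXml/text() may be needed -->
    <Variable name='cdata_text' type='string'>
      <XPath>/ws:ExtraInfoXml/text()</XPath>
    </Variable>
  </XMLPayload>
</ExtractVariables>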

AM-1

<AssignMessage name='AM-ArtificialMessage'>
  <AssignTo createNew='true' transport='http' type='request'>artificialMessage</AssignTo>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <Set>
    <Payload contentType='text/xml'>{cdata_text}</Payload>
    <Verb>POST</Verb>
  </Set>
</AssignMessage>

EV-2

<ExtractVariables name='EV-Values'>
  <Source>artificialMessage</Source> <!-- from previous step -->
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <XMLPayload>
    <Namespaces>
      <Namespace prefix='ns1'>http://easr.model.milestone.com</Namespace>
    </Namespaces>
    <Variable name='pon' type='string'>
      <XPath>/ns1:easr-milestone/pon/text()</XPath>
    </Variable>
    <Variable name='version' type='string'>
      <XPath>/ns1:easr-milestone/version/text()</XPath>
    </Variable>
    <Variable name='lec-ddd' type='string'>
      <XPath>/ns1:easr-milestone/lec-ddd/text()</XPath>
    </Variable>
    <Variable name='received-date' type='string'>
      <XPath>/ns1:easr-milestone/received-date/text()</XPath>
    </Variable>
    <Variable name='foc-date' type='string'>
      <XPath>/ns1:easr-milestone/foc-date/text()</XPath>
    </Variable>
    <Variable name='sent-date' type='string'>
      <XPath>/ns1:easr-milestone/sent-date/text()</XPath>
    </Variable>
    <Variable name='supp-type' type='string'>
      <XPath>/ns1:easr-milestone/supp-type/text()</XPath>
    </Variable>
    <!-- ...and so on... -->
  </XMLPayload>
</ExtractVariables>
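
For this to work, the three policies have to run in the order EV-1, AM-1, EV-2. A minimal sketch of the flow attachment, assuming the policies are attached to the request PreFlow of the proxy endpoint (that placement is an assumption on my part; any request flow that runs them in this order should behave the same way):

```xml
<PreFlow name='PreFlow'>
  <Request>
    <!-- 1. pull the CDATA text out of the outer document into cdata_text -->
    <Step>
      <Name>EV-CDATA</Name>
    </Step>
    <!-- 2. wrap cdata_text in a new message so it can be parsed as XML -->
    <Step>
      <Name>AM-ArtificialMessage</Name>
    </Step>
    <!-- 3. extract the individual element values from the new message -->
    <Step>
      <Name>EV-Values</Name>
    </Step>
  </Request>
  <Response/>
</PreFlow>
```

After the third step, the values should be readable as context variables such as pon and version.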


4 REPLIES 4

mail-2

"In an XML document or external entity, a CDATA section is a piece of element content that is marked up to be interpreted literally, as textual data, not as marked up content."

(https://en.wikipedia.org/wiki/CDATA#CDATA_sections_in_XML)

Meaning - you cannot apply XPath to XML in CDATA because for the parser it's just text and nothing else.
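
One way to see this: a CDATA section is just an alternative to escaping the markup characters. In the sketch below (the element name `wrapper` is made up for illustration), both fragments give the parser an element with a single text node whose value is the string `<pon>2007</pon>`. Neither contains a `pon` element that XPath could select.

```xml
<!-- Form 1: markup-looking content inside a CDATA section -->
<wrapper><![CDATA[<pon>2007</pon>]]></wrapper>

<!-- Form 2: the same content written with escaped characters -->
<wrapper>&lt;pon&gt;2007&lt;/pon&gt;</wrapper>
```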

Okay, but is there any possible way to extract those elements?

From your question I don't understand what exactly it is you want to do. Having said that, you might want to look into a solution involving regular expressions.
