Extract Variable adds an auto attribute

Hi all,

I have an XML document like the one below.

When I extract the Invoice element from this XML with the ExtractVariables policy, it automatically adds attributes to the element.

How can I extract it without those attributes?


XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<StandardBusinessDocument xmlns="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
    xmlns:n1="urn:oasis:names:specification:ubl:schema:xsd:CommonSignatureComponents-2"
    xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDataTypes-2"
    xmlns:sac="urn:oasis:names:specification:ubl:schema:xsd:SignatureAggregateComponents-2"
    xmlns:sbc="urn:oasis:names:specification:ubl:schema:xsd:SignatureBasicComponents-2"
    xmlns:udt="urn:oasis:names:specification:ubl:schema:xsd:UnqualifiedDataTypes-2"
    xmlns:ccts-cct="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2">
</StandardBusinessDocumentHeader>
<ns1:Invoice>
    <cbc:ID>1</cbc:ID>
</ns1:Invoice>
</StandardBusinessDocument>


ExtractVariables policy:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ExtractVariables async="false" continueOnError="false" enabled="false" name="EV-GetInvoice">
    <DisplayName>EV-GetInvoice</DisplayName>
    <Source clearPayload="false">request.content</Source>
    <XMLPayload stopPayloadProcessing="false">
        <Namespaces>
            <Namespace prefix="n2">urn:oasis:names:specification:ubl:schema:xsd:Invoice-2</Namespace>
        </Namespaces>
        <Variable name="request.content" type="nodeset">
            <XPath> (invoice) | (n2:Invoice)</XPath>
        </Variable>
    </XMLPayload>
</ExtractVariables>

Expected output

<ns1:Invoice>
    <cbc:ID>1</cbc:ID>
</ns1:Invoice>


Output after ExtractVariables

<ns1:Invoice xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDataTypes-2" xmlns:n1="urn:oasis:names:specification:ubl:schema:xsd:CommonSignatureComponents-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:sac="urn:oasis:names:specification:ubl:schema:xsd:SignatureAggregateComponents-2" xmlns:udt="urn:oasis:names:specification:ubl:schema:xsd:UnqualifiedDataTypes-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:sbc="urn:oasis:names:specification:ubl:schema:xsd:SignatureBasicComponents-2" xmlns:ccts-cct="urn:un:unece:uncefact:data:specification:CoreComponentTypeSchemaModule:2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cbc:ID>1</cbc:ID>
</ns1:Invoice>

You can't do that with ExtractVariables.

The ExtractVariables policy, when operating on XML, will extract the XML nodeset you specify with the XPath expression. Your desired output, which looks like this:

<ns1:Invoice>
    <cbc:ID>1</cbc:ID>
</ns1:Invoice>

...is not "well formed" XML. It's not good XML. Therefore ExtractVariables will never produce it when you extract a nodeset.

"Why is it not well formed?", you might ask. At first glance, it looks pretty reasonable. It looks kinda like XML. But it's not XML. There are two namespace prefixes used, ns1 and cbc, and neither is declared in that fragment.

XML rules disallow that. I understand that visually you can see that this fragment of XML is just what is "inside" the StandardBusinessDocument element (minus the StandardBusinessDocumentHeader element). But XPath doesn't do simple text extraction. That is not how XPath and XML work.
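To see the point about undeclared prefixes concretely, here is a minimal sketch in Python (not something that runs inside Apigee, just an illustration using the standard library's namespace-aware parser). The names `BARE`, `DECLARED` and `is_well_formed` are made up for this example.

```python
import xml.etree.ElementTree as ET

# The fragment the question asks for: prefixes are used but never declared.
BARE = "<ns1:Invoice><cbc:ID>1</cbc:ID></ns1:Invoice>"

# The same fragment with the two declarations it actually needs.
DECLARED = (
    "<ns1:Invoice"
    ' xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"'
    ' xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">'
    "<cbc:ID>1</cbc:ID></ns1:Invoice>"
)


def is_well_formed(text: str) -> bool:
    """Return True if a namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # For BARE the parser reports an unbound prefix.
        return False


print(is_well_formed(BARE))      # False
print(is_well_formed(DECLARED))  # True
```

Any namespace-aware XML tool is likely to behave the same way, which is why an extracted nodeset carries its declarations with it.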

The output you observe from ExtractVariables, with all those namespace declarations that maybe hurt your eyes, is *correct*. It is not minimally correct; at a minimum, well-formed XML needs just 2 namespace declarations for the sample you provided. But non-minimally correct is still correct, even if it has too many prefix declarations.

What is it that you REALLY want to do? What information do you want from within the Invoice element? Just the ID? A set of other elements? The full, well-formed Invoice element, without all the extra (unused) namespaces? All those things are solvable. For the latter, you can run the output of ExtractVariables through this XSL to get the minimal-required XML. The result will be like this:

<ns1:Invoice 
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
  xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
   <cbc:ID>1</cbc:ID>
</ns1:Invoice>

That is well-formed XML.
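As a rough illustration of "only the namespaces that are actually used", here is a sketch in Python. It is not the XSL mentioned above and it is not an Apigee policy; it swaps in a different technique, re-serializing the subtree with the standard library. The document is a cut-down version of the one in the question, and the prefix `inv` is an assumption of this sketch: ElementTree reserves prefixes of the form `ns1`, so the original prefix cannot be kept this way.

```python
import xml.etree.ElementTree as ET

INVOICE_NS = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CBC_NS = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Cut-down version of the question's document (stray closing tag removed).
DOC = (
    "<StandardBusinessDocument"
    ' xmlns="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    f' xmlns:cbc="{CBC_NS}"'
    f' xmlns:ns1="{INVOICE_NS}">'
    "<ns1:Invoice><cbc:ID>1</cbc:ID></ns1:Invoice>"
    "</StandardBusinessDocument>"
)

# "inv" is a hypothetical stand-in prefix; "ns1" is reserved by ElementTree.
ET.register_namespace("inv", INVOICE_NS)
ET.register_namespace("cbc", CBC_NS)

root = ET.fromstring(DOC)
invoice = root.find(f"{{{INVOICE_NS}}}Invoice")

# Serializing the subtree declares only the namespaces it actually uses.
minimal = ET.tostring(invoice, encoding="unicode")
print(minimal)
```

The printed element carries two declarations instead of twelve. Note that the prefix changes, so this gives equivalent XML, not the text exactly as it was submitted.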


If you can't solve the problem with the information I've given you here, give some more context, and let's see if we can figure it out.


PS: The original XML input you showed is also not well-formed. I suppose that's because you edited the XML to simplify the example, but the edits caused a problem: there is a closing tag (</StandardBusinessDocumentHeader>) for which there is no opening tag.

Thank you for your comment.

I can easily extract it with the XSL you sent and the method I mentioned above. Actually, as you said, I know that this is valid and the format I want is invalid.

I want to extract the <Invoice> element exactly as the user submitted it, without changing the request data. So if the user added an attribute to the <Invoice> tag, I want to extract it together with that attribute. If the user didn't add any attributes, I want to extract it without attributes.

I forgot to delete the </StandardBusinessDocumentHeader> closing tag while editing.

"I can easily extract it with the XSL you sent and the method I mentioned above. Actually, as you said, I know that this is valid and the format I want is invalid."

Now I'm not clear. Have you sorted your problem out? Is there some question still outstanding?

Unfortunately, I still have a problem.


I want to extract the <Invoice> element whether or not the result is valid XML on its own.

So, I want to extract the <Invoice> element exactly as the user submitted it, without changing the request body.
If the user added an attribute to the <Invoice> tag, I want to extract it together with that attribute, but if the user didn't add any attributes, I want to extract it without attributes.
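
Since XPath will not return raw text, one possible approach to "exactly as submitted" is to skip XML parsing and slice the request body as a string. Below is a minimal sketch in Python, purely illustrative: `INVOICE_RE` and `extract_raw_invoice` are made-up names, and the same idea would have to be ported to whatever scripting step the proxy uses. It assumes a single, non-nested Invoice element, no `>` inside attribute values, and no self-closing `<ns1:Invoice/>` form.

```python
import re

# Matches an element whose local name is Invoice, with any prefix or none,
# from its start tag through the end tag that carries the same name.
INVOICE_RE = re.compile(
    r"<(?P<tag>(?:[\w.-]+:)?Invoice)(?=[\s>])[^>]*>.*?</(?P=tag)\s*>",
    re.DOTALL,
)


def extract_raw_invoice(body: str):
    """Return the Invoice element as raw text, or None if it is absent."""
    match = INVOICE_RE.search(body)
    return match.group(0) if match else None


# Cut-down request body based on the sample in the question.
body = (
    "<StandardBusinessDocument"
    ' xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"'
    ' xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">\n'
    "<ns1:Invoice>\n"
    "    <cbc:ID>1</cbc:ID>\n"
    "</ns1:Invoice>\n"
    "</StandardBusinessDocument>"
)

print(extract_raw_invoice(body))
```

This returns the element untouched, attributes included if the sender wrote any. The trade-off is that the result is plain text, not a self-contained XML document, so anything downstream that parses it with a namespace-aware parser will still need the prefix declarations from the enclosing document.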