Talend tExtractXMLField

Posted 2024-10-17 02:13:27

I have this job in Talend that is supposed to retrieve a field and loop through it.

My big problem is that the code is looping through the XML fields but it's returning null.
Here is a sample of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<empresas>
    <empresa>
        <imoveis>
            <imovel>
                [-- some fields --  ]

                <fotos>
                    <nome id="" order="">photo1</nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                    <nome id="" order=""></nome>
                </fotos>
            </imovel>
            [ -- other entries here -- ]
        </imoveis>
    </empresa>
</empresas>

Now using the tExtractXMLField component I am trying to get the "fotos" element.
Here is what I have in the component:
[screenshot of the tExtractXMLField component settings; image not preserved]

I have tried changing both the XPath query and the XPath loop query, but either the job doesn't loop through the field at all, or I get null in the value field in the tMap.

Here is an image of the job:

[screenshot of the job; image not preserved]

You can see that I have retrieved 4 items from the XML, but what I get in the "nome" field is null. There must be something wrong with the XPath, but I can't seem to find the problem :(

Hope someone can help me out. Thanks.
Note: I am using Talend v4.1.2 on Ubuntu 10.10 64-bit.

Comments (5)

挽梦忆笙歌 2024-10-24 02:13:28

There are two ways to go about it. One way is to use the XML input component directly, following the instructions that bluish gave.

The other way is to continue on the path that you chose. In the XML input component, make sure that your Loop XPath query is set to "/empresas/empresa/imoveis/imovel/fotos" and that you pass the fotos element through with the Get Nodes option checked. The XPath query of your fotos element should be "../fotos" or ".".

Your tExtractXMLField component looks well configured.
Also, I don't know what tSetGlobalVar does in your design, but make sure it doesn't affect the fotos element that you're trying to pass through.
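
To make the second approach concrete, here is a rough sketch of the same two-stage idea outside Talend, using Python's standard xml.etree.ElementTree. It is only an illustration of the data flow, not what Talend generates: the function names are hypothetical, and the id/order values and the extra photo names in the sample are invented. Stage 1 loops on each fotos element and hands it on as an XML string (roughly what Get Nodes does); stage 2 parses that string and loops on its nome children.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the question's file; ids, orders and the extra
# photo names are invented for the illustration.
SAMPLE = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos>
          <nome id="1" order="1">photo1</nome>
          <nome id="2" order="2">photo2</nome>
        </fotos>
      </imovel>
      <imovel>
        <fotos>
          <nome id="3" order="1">photo3</nome>
        </fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""


def read_fotos_nodes(xml_text):
    """Stage 1: loop on each <fotos> and pass the node on as an XML string."""
    root = ET.fromstring(xml_text)  # root is the <empresas> element
    # ElementTree paths are relative to the root element, so the leading
    # "/empresas" of the Talend loop query is implied here.
    return [ET.tostring(fotos, encoding="unicode")
            for fotos in root.findall("./empresa/imoveis/imovel/fotos")]


def extract_nomes(fotos_xml):
    """Stage 2: parse one <fotos> fragment and loop on its <nome> children."""
    fotos = ET.fromstring(fotos_xml)
    return [nome.text for nome in fotos.findall("./nome")]


fragments = read_fotos_nodes(SAMPLE)
names = [extract_nomes(fragment) for fragment in fragments]
print(names)
```

The point of the split is that the second stage only ever sees one fotos fragment, so its paths start at fotos rather than at the top of the file.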

一曲琵琶半遮面シ 2024-10-24 02:13:28

[screenshot: sample Talend job; image not preserved]
I have made a test job that will definitely help you. If I'm not mistaken, you want to get every "nome" under the "fotos" tag.

醉酒的小男人 2024-10-24 02:13:28

Try changing your loop XPath to the top level of the file, "empresas". Sometimes that works for me. I have also seen the <?xml version="1.0" encoding="ISO-8859-1"?> declaration cause problems before, so you could try removing it.

Also make sure that the encoding is set correctly in the tFileInputXML.
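
As a side illustration of why the encoding setting matters, here is a small Python sketch (standard library only, not Talend code). The helper name and the sample value are invented. It parses the same one-element document twice: once with bytes that agree with the ISO-8859-1 declaration, and once with UTF-8 bytes under the same declaration.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1"?>'
NAME = "praça"  # invented value containing a non-ASCII character


def parse_nome(encoded_name):
    """Parse a one-element document whose bytes follow the declaration above."""
    document = DECLARATION + b"<nome>" + encoded_name + b"</nome>"
    return ET.fromstring(document).text


# Bytes agree with the declared encoding: the text comes back intact.
matching = parse_nome(NAME.encode("iso-8859-1"))

# Bytes are UTF-8 but the declaration says ISO-8859-1: the parser does not
# fail, it silently decodes the two UTF-8 bytes as two Latin-1 characters.
mismatched = parse_nome(NAME.encode("utf-8"))

print(matching == NAME)
print(mismatched == NAME)
```

A mismatch like this garbles characters rather than producing nulls, so it is worth checking but may not be the cause of the problem in the question.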

与他有关 2024-10-24 02:13:28

I think you are confusing reading XML and extracting XML from XML.

Reading XML:
If the XML excerpt you have provided is the file read by your tFileInputXML, you don't need tExtractXMLField. Just configure the tFileInputXML like this:

  • set the xpath loop to the <nome> elements, like this "//nome"
  • add 3 columns to the tFileInputXML component: id, order and content
  • get content column with xpath query "."
  • get id value with xpath query "@id"
  • get order value with xpath query "@order"

[screenshot of the tFileInputXML configuration; image not preserved]
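
The same mapping can be tried outside Talend. Below is a minimal sketch in Python's standard xml.etree.ElementTree, assuming a sample shaped like the question's file; the function name and the id/order values are invented. The loop query "//nome" becomes ".//nome", and the three column queries ".", "@id" and "@order" become the element's text and attributes.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the question's file; id and order values are invented.
SAMPLE = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos>
          <nome id="10" order="1">photo1</nome>
          <nome id="11" order="2"></nome>
        </fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""


def read_nome_rows(xml_text):
    """Return one row per <nome>, with the three columns from the list above."""
    root = ET.fromstring(xml_text)
    rows = []
    for nome in root.findall(".//nome"):    # loop query "//nome"
        rows.append({
            "id": nome.get("id"),           # column query "@id"
            "order": nome.get("order"),     # column query "@order"
            "content": nome.text,           # column query "."
        })
    return rows


rows = read_nome_rows(SAMPLE)
for row in rows:
    print(row)
```

Note that an empty element such as the second nome has no text, so its content comes back as None here even though the loop and the queries are right.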

Extracting XML from XML:
That is the purpose of the tExtractXMLField component:
it parses XML data contained in a database column or in another XML document as if it were a data flow of its own.

In a nutshell, tExtractXMLField creates a data flow from a column that contains XML.
It is very useful when parsing the result of a SOAP query: the server reply is usually provided as XML, like this one:

<arg2> 
  <![CDATA[
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <exportInscriptionEnLigneType>
      <date>2015-04-10</date>
      <nbDossiers>2</nbDossiers>
      <reference>20150410100</reference>
      <listeDossiers>
        <dossier>
          <numOrdre>1</numOrdre>
          <identifiantDossier>AAAAA</identifiantDossier>
        </dossier>
        <dossier>
          <numOrdre>2</numOrdre>
          <identifiantDossier>BBBBB</identifiantDossier>
        </dossier>
      </listeDossiers>
    </exportInscriptionEnLigneType>
]]>
</arg2> 

In the XML above, the <arg2> element contains an XML document that you may need to parse.

tExtractXMLField was created for this purpose.
I've written a tutorial on how to do this; please have a look at "how to extract xml from xml". It is in French, but the screenshots may help in understanding the few comments provided.
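
For readers without Talend at hand, the "XML inside XML" case can be sketched in plain Python (standard library only). This is an illustration of the idea, not of what tExtractXMLField does internally, and the function name is hypothetical. The outer document is parsed first, the CDATA content is taken as text, and that text is then parsed as a second document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the reply above: an XML document wrapped in CDATA.
REPLY = """<arg2>
  <![CDATA[
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <exportInscriptionEnLigneType>
      <nbDossiers>2</nbDossiers>
      <listeDossiers>
        <dossier>
          <numOrdre>1</numOrdre>
          <identifiantDossier>AAAAA</identifiantDossier>
        </dossier>
        <dossier>
          <numOrdre>2</numOrdre>
          <identifiantDossier>BBBBB</identifiantDossier>
        </dossier>
      </listeDossiers>
    </exportInscriptionEnLigneType>
]]>
</arg2>"""


def extract_dossiers(reply_xml):
    """Parse the outer reply, then parse the XML document held in its text."""
    outer = ET.fromstring(reply_xml)
    # The CDATA section arrives as plain text. Strip the padding so the
    # embedded XML declaration sits at the very start of the inner document.
    inner_xml = outer.text.strip()
    inner = ET.fromstring(inner_xml.encode("utf-8"))
    return [(d.findtext("numOrdre"), d.findtext("identifiantDossier"))
            for d in inner.findall("./listeDossiers/dossier")]


dossiers = extract_dossiers(REPLY)
print(dossiers)
```

The first parse only sees one text field; the dossier elements become visible to path queries after the second parse, which is the step tExtractXMLField covers in a job.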

Hope it will help.

Best regards,

躲猫猫 2024-10-24 02:13:27

If you want to loop on <nome> nodes, your Loop XPath Query has to be

"/empresas/empresa/imoveis/imovel/fotos/nome"

and the foto_nome XPath Query something like

"text()"

Note: I also corrected an error in your XML that could cause problems (the closing </imoveis> tag was missing the "s").
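
A short Python sketch (standard xml.etree.ElementTree, not Talend code) shows why the field query has to be relative to the loop node. The sample values are invented, and the "misconfigured" lookup is only a hypothetical way to end up with one null per row; it may or may not be what happened in the question. ElementTree has no "text()" step, so the element's text attribute stands in for it.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the question's file; photo names are invented.
SAMPLE = """<empresas>
  <empresa>
    <imoveis>
      <imovel>
        <fotos>
          <nome id="" order="">photo1</nome>
          <nome id="" order="">photo2</nome>
        </fotos>
      </imovel>
    </imoveis>
  </empresa>
</empresas>"""

root = ET.fromstring(SAMPLE)  # root is the <empresas> element

# Loop on the <nome> nodes; paths are relative to the root element here.
loop_nodes = root.findall("./empresa/imoveis/imovel/fotos/nome")

# Field query relative to each <nome>: its own text, as "text()" would give.
own_text = [nome.text for nome in loop_nodes]

# Hypothetical misconfiguration: a field query of "nome" looks for a <nome>
# child inside each <nome>, finds none, and yields a null for every row.
child_lookup = [nome.findtext("nome") for nome in loop_nodes]

print(own_text)
print(child_lookup)
```

Both lists have one entry per loop node, which matches the symptom of getting the right number of rows with nothing in them when the field query points one level too deep.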
