How to retain custom entities such as &ndash; while transforming XML with XSLT

Posted on 2025-02-09 01:55:28

I'm trying to transform XML using XSLT. Here is the example I'm using.

Input XML:

<!DOCTYPE printArtifactGroup [<!ENTITY ndash "&#38;#38;ndash;">]>
<group>
   <begin>
      <head>
         <text>(VOLS 0200)</text>
      </head>
      <data>
         <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>
      </data>
   </begin>
</group>

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
   <xsl:output encoding="utf8" method="xml" indent="yes" />


   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="begin">
      <xsl:copy-of select="data" />
   </xsl:template>

</xsl:stylesheet>

Python code to run the XSLT transformation:

from lxml import etree

# Load the DTD and expand entity references while parsing the source document.
parser = etree.XMLParser(load_dtd=True, resolve_entities=True, huge_tree=True)

dom = etree.parse('test.xml', parser)
xslt = etree.parse('test.xsl')
transform = etree.XSLT(xslt)
newdom = transform(dom)

# Write the serialized result tree to disk.
processed_file = 'processed.xml'
with open(processed_file, 'w') as file:
    file.write(str(newdom))
print(newdom)
print('Task Done')

Actual output:

<?xml version="1.0"?>
<group>
   <data>
         <text>Health 161&amp;ndash;1 to 16&amp;ndash;32&amp;ndash;End 2006</text>
      </data>
</group>

Expected output:

<group>
   <begin>
      <data>
         <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>
      </data>
   </begin>
</group>

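The XML parser resolves the &#38; (ampersand) character reference to &amp;, so wherever we use the custom entity &ndash; it is converted to &amp;ndash;. This is the default behavior, but we have a huge XML file, and comparing the transformed XML with the source is difficult when the entities have changed.

Is there any way we can generate the expected output while retaining the original entities?

Thanks in advance; any ideas or suggestions are really appreciated.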
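To make the behavior described above concrete, here is a minimal sketch of what the parser does with the double-escaped entity. It is an illustration under assumptions: it uses the standard library's xml.etree.ElementTree instead of the lxml parser from the question, and the document is shortened to a single text element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the input from the question; the entity is double-escaped.
SOURCE = """<!DOCTYPE printArtifactGroup [<!ENTITY ndash "&#38;#38;ndash;">]>
<group><text>Health 161&ndash;1</text></group>"""

root = ET.fromstring(SOURCE)
text = root.find("text").text

# After parsing, the entity reference is gone: the tree only holds the
# literal characters "&ndash;" as ordinary text.
print(repr(text))

# A serializer has to escape the bare ampersand to keep the output well-formed,
# which is where the &amp;ndash; in the actual output comes from.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The tree that XSLT operates on never contains an entity reference, only its expansion, so the stylesheet itself has nothing to preserve; the reference has to be recreated at serialization time.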

止于盛夏 2025-02-16 01:55:28

The (modern) XSLT way (XSLT 2 and later, available to Python through the Python API of SaxonC) would be to use

<!DOCTYPE printArtifactGroup [<!ENTITY ndash "&#8211;">]>
<group>
   <begin>
      <head>
         <text>(VOLS 0200)</text>
      </head>
      <data>
         <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>
      </data>
   </begin>
</group>

and then an xsl:character-map, e.g.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all">

    <xsl:character-map name="characters-to-entities">
      <xsl:output-character character="–" string="&amp;ndash;"/>
    </xsl:character-map>
        
    <xsl:output use-character-maps="characters-to-entities"/>
    
    <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>

That way the text element is output as e.g. <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>.

(Note: I have only addressed the entity/character-map issue; I have not tried to implement the other part of your transformation in this sample.)
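If an XSLT 2+ processor is not available (the lxml library used in the question wraps libxslt, which implements XSLT 1.0 and has no xsl:character-map), the same idea can be approximated after serialization. The following is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree, and CHARACTER_MAP and serialize_with_character_map are hypothetical names introduced for illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

# Shortened input: the entity expands to the en dash character itself.
SOURCE = """<!DOCTYPE printArtifactGroup [<!ENTITY ndash "&#8211;">]>
<group><data><text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text></data></group>"""

# Hypothetical mapping table that plays the role of xsl:character-map.
CHARACTER_MAP = {"\u2013": "&ndash;"}


def serialize_with_character_map(root, character_map):
    """Serialize the tree, then swap each mapped character for its string."""
    text = ET.tostring(root, encoding="unicode")
    for character, replacement in character_map.items():
        text = text.replace(character, replacement)
    return text


root = ET.fromstring(SOURCE)
result = serialize_with_character_map(root, CHARACTER_MAP)
print(result)
```

Like a character map, this replaces every occurrence of the character, including en dashes that were typed literally in the source. The serialized result also has no DOCTYPE, so the declaration of ndash would have to be written back for the output to parse on its own.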

任谁 2025-02-16 01:55:28

Here's a trick: replace the internal entity definition with

[<!ENTITY ndash "<e:ndash/>">]

where e is a new namespace prefix, and replace the match="begin" template with two templates:

<xsl:template match="begin">
  <xsl:apply-templates select="data" />
</xsl:template>
<xsl:template match="e:*">
  <xsl:text disable-output-escaping="yes">&amp;</xsl:text>
  <xsl:value-of select="local-name()"/>
  <xsl:text>;</xsl:text>
</xsl:template>

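During parsing of the source XML, this replaces &ndash;, which is a character entity in HTML, with an XML element <e:ndash/>. During XSLT processing, this element is then replaced with a sequence of characters. The disable-output-escaping attribute prevents the XSLT processor from outputting this as &amp;ndash;.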
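The parsing half of this trick can be observed without XSLT. The sketch below is an illustration under assumptions: it uses the standard library's xml.etree.ElementTree, declares the prefix inside the entity itself with a made-up namespace URI (urn:example:entities), and uses a hypothetical helper restore_entities as a Python stand-in for the e:* template.

```python
import xml.etree.ElementTree as ET

ENTITY_NS = "urn:example:entities"  # made-up namespace URI for the marker elements

# The entity now expands to an empty marker element instead of text.
SOURCE = """<!DOCTYPE group [<!ENTITY ndash "<e:ndash xmlns:e='urn:example:entities'/>">]>
<group><text>Health 161&ndash;1 to 16&ndash;32</text></group>"""


def restore_entities(element):
    """Rebuild the character data, turning each marker element back into &name;."""
    parts = [element.text or ""]
    for child in element:
        if child.tag.startswith("{" + ENTITY_NS + "}"):
            local_name = child.tag.split("}", 1)[1]
            parts.append("&" + local_name + ";")
        # Text that follows a child element is stored in its tail.
        parts.append(child.tail or "")
    return "".join(parts)


text_element = ET.fromstring(SOURCE).find("text")
marker_tags = [child.tag for child in text_element]
restored = restore_entities(text_element)
print(marker_tags)
print(restored)
```

Because the marker survives parsing as a real node, it can be matched by name, which is what lets the e:* template emit the reference for any entity declared this way.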
