How to prevent XSLT from introducing whitespace in HTML output

Posted 2025-01-16 11:39:27

I am generating HTML from XML sources using XSLT. The HTML contains a lot of whitespace that was not in the original XML files. Normally this is not a problem, because browsers collapse extra whitespace when rendering. But I am developing an application that relies on correct positioning of the text cursor inside the HTML page. The added whitespace messes up the offsets, making it impossible to reliably position the cursor inside an element.

My question: how can I get my XSLT not to introduce any additional whitespace into text nodes? I am using <xsl:strip-space elements="*"/>, but that does not stop the processor from introducing lots of whitespace. It looks like some pretty-printing is applied to the HTML, and I have no idea where it comes from. I am currently using Saxon PE 9.9.1.7.

[Edit]

I created a simple example that shows the same strange behaviour. First the XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <p>This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT. This manual describes the spacecraft, safety aspects, usage and maintenance procedures. Make sure the manual is available to anyone who will be using the product.</p>
</root>

Here is the simplified XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0">

    <xsl:output method="html" encoding="UTF-8"/>

    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="root">
        <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#xD;</xsl:text>
        <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <xsl:apply-templates select="*"/>
                <script src="cursor.js"></script>
            </body>
        </html>
    </xsl:template> 

    <xsl:template match="p">
        <p contenteditable="true" id="p1" onclick="show_position()">
            <xsl:value-of select="."/>
        </p>
    </xsl:template>

</xsl:stylesheet>

The JavaScript file to show the current cursor position:

function show_position( )
{
    alert('position: ' + document.getSelection().anchorOffset );
}

The HTML that is generated by the XSLT looks like this (shown in oXygen):

<!DOCTYPE html>
<html>
   <head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
       <title>Test</title>
   </head>
   <body>
      <p contenteditable="true" id="p1" onclick="show_position()">This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT.
         This manual describes the spacecraft, safety aspects, usage and maintenance procedures.
         Make sure the manual is available to anyone who will be using the product.</p><script src="cursor.js"></script></body>
</html>

Viewing the HTML in a browser collapses all the extra whitespace into a single space, as expected. Clicking inside the paragraph shows the current offset from the start of the paragraph. Clicking immediately before 'This manual' shows position 86; clicking one character to the right shows position 96. The jump of 10 instead of 1 comes from the line break and indentation that now sit in the text node where the source had a single space: the browser hides them, but anchorOffset still counts them. The same extra whitespace is introduced before the sentence starting with 'Make sure'.

I tried Chrome and Safari, and both show identical results. It does not seem to be a browser problem but an issue with the HTML generated by the XSLT processor. I have tried other Saxon versions, but the resulting HTML is always the same.

Any further info on how to prevent these extra whitespace characters in my HTML output would be highly appreciated.

Comments (1)

∞琼窗梦回ˉ 2025-01-23 11:39:27

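The default for output method="html" is indent="yes" (this is what the XSLT 1.0 specification prescribes), so set indent="no" explicitly on your xsl:output declaration. Note that xsl:strip-space does not help here. It only removes whitespace-only text nodes from the source tree and has no influence on the whitespace the serializer adds when it writes the output.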
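A minimal sketch of the question's stylesheet with that change applied. It is trimmed slightly: the unused xs namespace and the match="/" template are dropped, because the built-in template for the document node already applies templates to root. Apart from that trimming, the only change is indent="no" on xsl:output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- indent="no" switches off the serializer's pretty-printing, which is
         what put line breaks and indentation into the text of <p> -->
    <xsl:output method="html" encoding="UTF-8" indent="no"/>

    <!-- only affects whitespace-only text nodes in the source tree -->
    <xsl:strip-space elements="*"/>

    <!-- the built-in template for "/" already applies templates to <root> -->
    <xsl:template match="root">
        <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#xD;</xsl:text>
        <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <xsl:apply-templates select="*"/>
                <script src="cursor.js"></script>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="p">
        <!-- whitespace-only text between these stylesheet elements is ignored,
             so the paragraph receives only the string value of the source <p> -->
        <p contenteditable="true" id="p1" onclick="show_position()">
            <xsl:value-of select="."/>
        </p>
    </xsl:template>

</xsl:stylesheet>
```

With this setting the generated <p> should contain the sentences separated by single spaces, as in the XML source, so the offsets reported by anchorOffset should line up with the visible text.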

Additionally, since you are using Saxon PE 9.9, you have access to XSLT 3.0 features such as suppress-indentation="p". You can also use Saxon PE/EE-specific serialization settings to set a very high normal line length; check the documentation for saxon:line-length or similar.
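If you want to keep the rest of the HTML indented, the XSLT 3.0 route could look roughly like the sketch below. It assumes the stylesheet's version is raised to 3.0, since Saxon PE 9.9 is an XSLT 3.0 processor. The saxon:line-length alternative is left commented out because its name and effect are taken from the answer above and have not been checked here; verify them in the Saxon documentation for your version before relying on them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon"
    version="3.0">

    <!-- XSLT 3.0 serialization: indent the document as a whole, but do not
         add indentation whitespace inside <p> elements -->
    <xsl:output method="html" encoding="UTF-8" indent="yes"
                suppress-indentation="p"/>

    <!-- Saxon-PE/EE alternative (assumed parameter, check the Saxon docs):
         <xsl:output method="html" encoding="UTF-8" indent="yes"
                     saxon:line-length="100000"/>
    -->

    <!-- templates for "root" and "p" unchanged from the question -->

</xsl:stylesheet>
```

Of the two, suppress-indentation is standard XSLT 3.0, while a saxon: serialization parameter is ignored or rejected by other processors (and by Saxon-HE). indent="no" remains the simplest option when the HTML does not need to be human-readable.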
