How to track the source line (location) of an XML element?

Posted on 2024-10-07 19:47:55

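I suspect there is probably no satisfactory answer to this question, but I'll ask anyway in case I've missed something.

Basically, given an element instance, I want to find the line in the source document that a given XML element came from. I only want this for better diagnostic error messages: the XML is part of a configuration file, and if something is wrong with it, I'd like to point the reader of the error message to the right place in the XML document so they can fix the mistake.

As I understand it, the standard Scala XML support probably has no such built-in feature. After all, annotating every NodeSeq instance with this information would be wasteful, and not every XML element even has a source document it was parsed from. It seems to me that the standard Scala XML parser discards the line information, and there is no way to retrieve it later.

But switching to another XML framework is not an option. Adding another library dependency "just" for better diagnostic error messages doesn't feel right to me. Also, despite some shortcomings, I really like the built-in pattern matching support for XML.

My only hope is that you can show me a way to change or subclass the standard Scala XML parser so that the nodes it produces are annotated with source line numbers. Maybe a special NodeSeq subclass could be created for this. Or maybe only Atom can be subclassed, because NodeSeq is too dynamic? I don't know.

Anyway, my hopes are close to zero. I don't think there is a place in the parser where we can hook in to change how nodes are created, and where the line information is available at that point. Still, I wonder why I haven't come across this question before. If this is a duplicate, please point me to the original.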

黯淡〆 2024-10-14 19:47:55

I had no idea how to do that, but Pangea showed me the way. First, let's create a trait to handle location:

import org.xml.sax.{helpers, Locator, SAXParseException}
trait WithLocation extends helpers.DefaultHandler {
    var locator: org.xml.sax.Locator = null
    def printLocation(msg: String): Unit = {
        // Parsers are not required to supply a Locator, so guard against null
        if (locator == null) println(msg)
        else println(s"$msg at line ${locator.getLineNumber}, column ${locator.getColumnNumber}")
    }

    // Get location
    abstract override def setDocumentLocator(locator: Locator): Unit = {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    // Display location messages
    abstract override def warning(e: SAXParseException): Unit = {
        printLocation("warning")
        super.warning(e)
    }
    abstract override def error(e: SAXParseException): Unit = {
        printLocation("error")
        super.error(e)
    }
    abstract override def fatalError(e: SAXParseException): Unit = {
        printLocation("fatal error")
        super.fatalError(e)
    }
}

Next, let's create our own loader overriding XMLLoader's adapter to include our trait:

import scala.xml.{factory, parsing, Elem}
object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with WithLocation
}

And that's all there is to it! The XML object adds little to XMLLoader (basically, the save methods). You might want to look at its source code if you feel the need for a full replacement. But that's only if you want to handle all of this yourself, since Scala already has a trait that reports errors:

object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler
}

By the way, the ConsoleErrorHandler trait extracts its line and column information from the exception. For our purposes, we need the location outside exceptions too (I'm assuming).
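To see where that exception-based information lives, here is a minimal sketch using only the JDK's SAX classes (no scala-xml involved). The name ParseErrors.firstErrorLocation is hypothetical. It relies on DefaultHandler.fatalError, whose documented default is to throw the SAXParseException; the exception's getLineNumber and getColumnNumber return -1 when the parser has no position to report.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object ParseErrors {
  // Returns (line, column) of the first fatal parse error, or None if the
  // document is well-formed. The plain DefaultHandler rethrows fatal errors.
  def firstErrorLocation(xml: String): Option[(Int, Int)] =
    try {
      val parser = SAXParserFactory.newInstance().newSAXParser()
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException => Some((e.getLineNumber, e.getColumnNumber))
    }
}
```

For instance, firstErrorLocation("<a>\n<b>\n</a>") should report line 3, where the mismatched </a> is found. This only helps for documents that fail to parse, though: once parsing succeeds there is no exception to ask, which is why a Locator is needed for the configuration errors the question is about.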

Now, to modify node creation itself, look at the scala.xml.factory.FactoryAdapter abstract methods. I have settled on createNode, but I'm overriding at the NoBindingFactoryAdapter level, because that returns Elem instead of Node, which enables me to add attributes. So:

import org.xml.sax.Locator
import scala.xml._
import parsing.NoBindingFactoryAdapter
trait WithLocation extends NoBindingFactoryAdapter {
    var locator: org.xml.sax.Locator = null

    // Get location
    abstract override def setDocumentLocator(locator: Locator): Unit = {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    // createNode is called from endElement, so the locator points at the closing tag.
    // Assumes the parser called setDocumentLocator (otherwise locator is null).
    abstract override def createNode(pre: String, label: String, attrs: MetaData, scope: NamespaceBinding, children: List[Node]): Elem = (
        super.createNode(pre, label, attrs, scope, children)
        % Attribute("line", Text(locator.getLineNumber.toString), Null)
        % Attribute("column", Text(locator.getColumnNumber.toString), Null)
    )
}

object MyLoader extends factory.XMLLoader[Elem] {
    // Keeping ConsoleErrorHandler for good measure
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler with WithLocation
}

Result:

scala> MyLoader.loadString("<a><b/></a>")
res4: scala.xml.Elem = <a line="1" column="12"><b line="1" column="8"></b></a>

Note that it got the last location, the one at the closing tag. That's one thing that can be improved by overriding startElement to keep track of where each element started in a stack, and endElement to pop from this stack into a var used by createNode.
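A minimal sketch of that start-tag improvement, assuming the scala-xml module (1.x or 2.x) is on the classpath, that FactoryAdapter.endElement is what calls createNode, and that NoBindingFactoryAdapter.createNode returns Elem as described above. The names WithStartLocation, StartLocLoader and describe are hypothetical. Note that a SAX Locator reports where the current event ends, so the recorded column is just after the opening tag's `>`, not at its `<`.

```scala
import org.xml.sax.{Attributes, Locator}
import scala.xml.{Attribute, Elem, MetaData, NamespaceBinding, Node, Null, Text}
import scala.xml.factory.XMLLoader
import scala.xml.parsing.NoBindingFactoryAdapter

// Hypothetical trait: tags every Elem with the position of its *start* tag.
trait WithStartLocation extends NoBindingFactoryAdapter {
  private var startLocator: Locator = null
  // Start-tag positions of the elements that are still open, innermost first.
  private var openStartTags: List[(Int, Int)] = Nil
  // Start-tag position of the element being closed right now; read by createNode.
  private var closingStart: (Int, Int) = (-1, -1)

  // (-1, -1) when the parser supplied no Locator.
  private def currentPosition: (Int, Int) =
    if (startLocator == null) (-1, -1)
    else (startLocator.getLineNumber, startLocator.getColumnNumber)

  override def setDocumentLocator(locator: Locator): Unit = {
    startLocator = locator
    super.setDocumentLocator(locator)
  }

  override def startElement(uri: String, localName: String, qName: String,
                            attributes: Attributes): Unit = {
    openStartTags = currentPosition :: openStartTags
    super.startElement(uri, localName, qName, attributes)
  }

  override def endElement(uri: String, localName: String, qName: String): Unit = {
    // Pop this element's start position before super.endElement builds the node.
    closingStart = openStartTags.head
    openStartTags = openStartTags.tail
    super.endElement(uri, localName, qName)
  }

  override def createNode(pre: String, label: String, attrs: MetaData,
                          scope: NamespaceBinding, children: List[Node]): Elem = {
    val (line, column) = closingStart
    super.createNode(pre, label, attrs, scope, children) %
      Attribute("line", Text(line.toString), Null) %
      Attribute("column", Text(column.toString), Null)
  }
}

object StartLocLoader extends XMLLoader[Elem] {
  override def adapter = new NoBindingFactoryAdapter with WithStartLocation

  // Hypothetical helper for building diagnostic messages.
  def describe(n: Node): String =
    s"<${n.label}> at line ${n \@ "line"}, column ${n \@ "column"}"
}
```

For example, StartLocLoader.loadString("<a>\n  <b/>\n</a>") should tag `<a>` with line="1" and `<b>` with line="2", whereas the end-tag version above reports line 3 for `<a>`. Like the original, this stores positions as attributes, so an element that already has a line or column attribute gets it overwritten; a java.util.IdentityHashMap keyed by node would avoid that, since scala.xml nodes compare structurally.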

Nice question. I learned a lot! :-)

中二柚 2024-10-14 19:47:55

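I see that Scala uses SAX internally for parsing. SAX lets you receive a Locator in your ContentHandler, which can be used to retrieve the current position, for example where an error occurred. But I'm not sure how to tap into Scala's internals. Here is an article I found that may help you see whether this is feasible.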
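To illustrate the Locator mechanism on its own, here is a minimal sketch using only the JDK's built-in SAX API (javax.xml.parsers and org.xml.sax), without scala-xml. The object StartTagLines and its startLines method are hypothetical names. SAX parsers are strongly encouraged, but not required, to call setDocumentLocator, so the sketch falls back to -1 when no Locator arrives.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, Locator}
import org.xml.sax.helpers.DefaultHandler

object StartTagLines {
  // Parses `xml` with the JDK's SAX parser and returns (element name, line)
  // for every start tag, in document order.
  def startLines(xml: String): List[(String, Int)] = {
    val found = List.newBuilder[(String, Int)]
    val handler = new DefaultHandler {
      private var locator: Locator = null

      // If supported, the parser calls this before any other ContentHandler event.
      override def setDocumentLocator(l: Locator): Unit = {
        locator = l
      }

      override def startElement(uri: String, localName: String, qName: String,
                                atts: Attributes): Unit = {
        // getLineNumber is the line where the current event (the start tag) ends.
        val line = if (locator == null) -1 else locator.getLineNumber
        found += ((qName, line))
      }
    }
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xml)), handler)
    found.result()
  }
}
```

Since scala-xml's FactoryAdapter is itself a SAX DefaultHandler, the same setDocumentLocator and startElement overrides can be mixed into it.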

那些过往 2024-10-14 19:47:55

I don't know anything about Scala, but the same issue pops up in other environments. For example, an XML transformation sends its results down a SAX pipeline to a validator, and when the validator tries to find line numbers for its validation errors, they're gone. Or the XML in question was never serialized or parsed, and therefore never had line numbers.

One way to address the problem is by generating (human-readable) XPath expressions to say where the error occurred. These are not as easy to use as line numbers but they're a lot better than nothing: they uniquely identify a node, and they're often pretty easy for humans to interpret (especially if they have an XML editor).

For example, this XSLT template by Ken Holman (I think) used by Schematron generates an XPath expression to describe the location/identity of the context node:

<xsl:template match="node() | @*" mode="schematron-get-full-path-2">
   <!--report the element hierarchy-->
   <xsl:for-each select="ancestor-or-self::*">
      <xsl:text>/</xsl:text>
      <xsl:value-of select="name(.)"/>
      <xsl:if test="preceding-sibling::*[name(.)=name(current())]">
         <xsl:text>[</xsl:text>
         <xsl:value-of
            select="count(preceding-sibling::*[name(.)=name(current())])+1"/>
         <xsl:text>]</xsl:text>
      </xsl:if>
   </xsl:for-each>
   <!--report the attribute-->
   <xsl:if test="not(self::*)">
      <xsl:text/>/@<xsl:value-of select="name(.)"/>
   </xsl:if>
</xsl:template>

I don't know if you can use XSLT in your scenario, but you could apply the same principle with whatever tools you have available.
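Since the question is about Scala, here is a minimal sketch of the same principle over a scala.xml tree, assuming the scala-xml module is on the classpath; the object XPathLocator and its pathTo method are hypothetical names. scala.xml nodes have no parent pointers, so it searches down from the root and matches the target with eq, because scala.xml node equality is structural and two identical elements would otherwise be indistinguishable. Like the XSLT template, it appends [n] only when an earlier sibling has the same name; unlike it, it handles elements only, not attributes.

```scala
import scala.xml.{Elem, Node}

object XPathLocator {
  // Qualified name, mirroring XPath's name(.): "prefix:label" or just "label".
  private def qname(e: Elem): String =
    if (e.prefix == null) e.label else s"${e.prefix}:${e.label}"

  // Path such as /config/server[2]/port to `target` inside `root`,
  // or None if `target` (compared by reference) is not in the tree.
  def pathTo(root: Elem, target: Node): Option[String] = {
    def search(node: Elem, path: String): Option[String] =
      if (node eq target) Some(path)
      else {
        val elems = node.child.collect { case e: Elem => e }
        elems.iterator.zipWithIndex.map { case (e, i) =>
          // Same rule as the XSLT: index only if a preceding sibling shares the name.
          val before = elems.take(i).count(s => qname(s) == qname(e))
          val step = if (before > 0) s"/${qname(e)}[${before + 1}]" else s"/${qname(e)}"
          search(e, path + step)
        }.collectFirst { case Some(p) => p }
      }
    search(root, "/" + qname(root))
  }
}
```

With <config><server><port>80</port></server><server><port>81</port></server></config>, the port inside the second server should come out as /config/server[2]/port and the first one as /config/server/port.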