How to track the source line (location) of an XML element?

Posted on 2024-10-07 19:47:55

I assume that there is probably no satisfactory answer to this question, but I ask it anyway in case I missed something.

Basically, I want to find out the line in the source document from which a certain XML element originated, given the element instance. I want this only for better diagnostic error messages - the XML is part of a configuration file, and if there is something wrong with it, I want to be able to point the reader of the error message to exactly the right place in the XML document so he can correct the error.

I understand that the standard Scala XML support probably has no built-in feature like this. After all, it would be wasteful to annotate every single NodeSeq instance with such information, and not every XML element even has a source document from which it has been parsed. It seems to me that the standard Scala XML parser throws the line information away, and later on there is no way to retrieve it.

But switching to another XML framework is not an option. Adding another library dependency "only" for the sake of better diagnostic error messages seems inappropriate to me. Also, despite some shortcomings, I really like the built-in pattern matching support for XML.

My only hope is that you can show me a way to alter or subclass the standard Scala XML parser such that the nodes it produces will be annotated with the number of the source line. Maybe a special subclass of NodeSeq can be created for this. Or maybe only Atom can be subclassed because NodeSeq is too dynamic? I don't know.

Anyway, my hopes are close to zero. I don't think there is a place in the parser where we can hook in to change the way nodes are created and where the line information is also available. Still, I wonder why I have not found this question before. Please point me to the original if this is a duplicate.

Comments (4)

黯淡〆 2024-10-14 19:47:55

I had no idea how to do that, but Pangea showed me the way. First, let's create a trait to handle location:

import org.xml.sax.{helpers, Locator, SAXParseException}
trait WithLocation extends helpers.DefaultHandler {
    var locator: org.xml.sax.Locator = _
    def printLocation(msg: String): Unit = {
        println(s"$msg at line ${locator.getLineNumber}, column ${locator.getColumnNumber}")
    }

    // Get location: the parser hands over its Locator before any other event
    abstract override def setDocumentLocator(locator: Locator): Unit = {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    // Display location messages
    abstract override def warning(e: SAXParseException): Unit = {
        printLocation("warning")
        super.warning(e)
    }
    abstract override def error(e: SAXParseException): Unit = {
        printLocation("error")
        super.error(e)
    }
    abstract override def fatalError(e: SAXParseException): Unit = {
        printLocation("fatal error")
        super.fatalError(e)
    }
}

Next, let's create our own loader overriding XMLLoader's adapter to include our trait:

import scala.xml.{factory, parsing, Elem}
object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with WithLocation
}

And that's all there is to it! The object XML adds little to XMLLoader -- basically, the save methods. You might want to look at its source code if you feel the need for a full replacement. But this is only necessary if you want to handle all of this yourself, since Scala already has a trait to produce errors:

object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler
}

The ConsoleErrorHandler trait extracts its line and column information from the exception, by the way. For our purposes, we need the location outside exceptions too (I'm assuming).

Now, to modify node creation itself, look at the scala.xml.factory.FactoryAdapter abstract methods. I have settled on createNode, but I'm overriding at the NoBindingFactoryAdapter level, because that returns Elem instead of Node, which enables me to add attributes. So:

import org.xml.sax.Locator
import scala.xml._
import parsing.NoBindingFactoryAdapter
trait WithLocation extends NoBindingFactoryAdapter {
    var locator: org.xml.sax.Locator = _

    // Get location
    abstract override def setDocumentLocator(locator: Locator): Unit = {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    // Add the current parser position to every element as it is created
    abstract override def createNode(pre: String, label: String, attrs: MetaData, scope: NamespaceBinding, children: List[Node]): Elem = (
        super.createNode(pre, label, attrs, scope, children)
        % Attribute("line", Text(locator.getLineNumber.toString), Null)
        % Attribute("column", Text(locator.getColumnNumber.toString), Null)
    )
}

object MyLoader extends factory.XMLLoader[Elem] {
    // Keeping ConsoleErrorHandler for good measure
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler with WithLocation
}

Result:

scala> MyLoader.loadString("<a><b/></a>")
res4: scala.xml.Elem = <a line="1" column="12"><b line="1" column="8"></b></a>

Note that each element got the last location reported for it, the one at its closing tag, because createNode is called from endElement. That's one thing that can be improved by overriding startElement to push the position where each element started onto a stack, and endElement to pop from this stack into a var used by createNode.
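The improvement described above could look roughly like the following sketch. It assumes that FactoryAdapter's endElement is what calls createNode, and that the parser supplies a Locator. The names WithStartLocation, StartLoader, docLocator, starts and current are made up for illustration and are not part of scala-xml.

```scala
import org.xml.sax.{Attributes, Locator}
import scala.xml._
import scala.xml.parsing.{FactoryAdapter, NoBindingFactoryAdapter}

// Sketch: remember the locator position at each startElement and use it in createNode.
trait WithStartLocation extends NoBindingFactoryAdapter {
  private var docLocator: Locator = null
  private var starts: List[(Int, Int)] = Nil  // stack of (line, column), innermost element first
  private var current: (Int, Int) = (-1, -1)  // position of the element being created

  override def setDocumentLocator(l: Locator): Unit = {
    docLocator = l
    super.setDocumentLocator(l)
  }

  override def startElement(uri: String, localName: String, qname: String, attributes: Attributes): Unit = {
    // Fall back to (-1, -1) if the parser never supplied a locator.
    val pos = Option(docLocator).map(l => (l.getLineNumber, l.getColumnNumber)).getOrElse((-1, -1))
    starts = pos :: starts
    super.startElement(uri, localName, qname, attributes)
  }

  override def endElement(uri: String, localName: String, qname: String): Unit = {
    // Pop this element's start position before the superclass builds the node.
    current = starts.head
    starts = starts.tail
    super.endElement(uri, localName, qname)
  }

  override def createNode(pre: String, label: String, attrs: MetaData, scope: NamespaceBinding, children: List[Node]): Elem =
    super.createNode(pre, label, attrs, scope, children) %
      Attribute("line", Text(current._1.toString), Null) %
      Attribute("column", Text(current._2.toString), Null)
}

object StartLoader extends factory.XMLLoader[Elem] {
  override def adapter: FactoryAdapter = new NoBindingFactoryAdapter with WithStartLocation
}
```

SAX defines the locator position as the point just after the text of the current event, so the recorded column is the end of the start tag rather than its first character. For a start tag that fits on one line, the line number is the one where the element begins, which is usually what an error message needs.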

Nice question. I learned a lot! :-)

中二柚 2024-10-14 19:47:55

I see that Scala uses SAX internally for parsing. SAX allows you to set a Locator on the ContentHandler, which can be used to retrieve the current location, for example where an error occurred. I am not sure how you can tap into Scala's internal workings, though. Here is one article I found that might help determine whether this is doable.
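To illustrate the Locator mechanism on its own, here is a minimal sketch that uses only the SAX classes shipped with the JDK, without scala-xml. The names LineRecorder and LocatorDemo are hypothetical; the handler simply records the line reported by the locator for each start tag.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, Locator}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Collects (element name, line) pairs while the document is parsed.
class LineRecorder extends DefaultHandler {
  private var locator: Locator = null
  val seen = ListBuffer.empty[(String, Int)]

  // The parser passes its Locator here before reporting any other event.
  override def setDocumentLocator(l: Locator): Unit = {
    locator = l
  }

  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit = {
    // A parser is not required to supply a locator, so guard against null.
    if (locator != null) seen += ((qName, locator.getLineNumber))
  }
}

object LocatorDemo {
  def linesOf(xml: String): List[(String, Int)] = {
    val handler = new LineRecorder
    val parser = SAXParserFactory.newInstance().newSAXParser()
    parser.parse(new InputSource(new StringReader(xml)), handler)
    handler.seen.toList
  }
}
```

Calling LocatorDemo.linesOf on a small document returns the element names in document order together with the line on which each start tag ends.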

那些过往 2024-10-14 19:47:55

I don't know anything about Scala, but the same issue pops up in other environments. For example, an XML transformation sends its results down a SAX pipeline to a validator, and when the validator tries to find line numbers for its validation errors, they're gone. Or the XML in question was never serialized or parsed, and therefore never had line numbers.

One way to address the problem is by generating (human-readable) XPath expressions to say where the error occurred. These are not as easy to use as line numbers but they're a lot better than nothing: they uniquely identify a node, and they're often pretty easy for humans to interpret (especially if they have an XML editor).

For example, this XSLT template by Ken Holman (I think) used by Schematron generates an XPath expression to describe the location/identity of the context node:

<xsl:template match="node() | @*" mode="schematron-get-full-path-2">
   <!--report the element hierarchy-->
   <xsl:for-each select="ancestor-or-self::*">
      <xsl:text>/</xsl:text>
      <xsl:value-of select="name(.)"/>
      <xsl:if test="preceding-sibling::*[name(.)=name(current())]">
         <xsl:text>[</xsl:text>
         <xsl:value-of
            select="count(preceding-sibling::*[name(.)=name(current())])+1"/>
         <xsl:text>]</xsl:text>
      </xsl:if>
   </xsl:for-each>
   <!--report the attribute-->
   <xsl:if test="not(self::*)">
      <xsl:text/>/@<xsl:value-of select="name(.)"/>
   </xsl:if>
</xsl:template>

I don't know if you can use XSLT in your scenario, but you could apply the same principle with whatever tools you have available.
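As a rough illustration of the same principle without XSLT, the sketch below walks a scala.xml tree and builds an XPath-like path for every element. It is a simplified variant of the template, not a port of it: it uses the unprefixed label, ignores attributes, and adds a positional index to every element that has same-named siblings. The name XPathOf is made up for this example.

```scala
import scala.collection.mutable
import scala.xml.Elem

object XPathOf {
  // Returns (path, element) for every element in document order.
  def paths(root: Elem): List[(String, Elem)] = {
    def walk(e: Elem, path: String): List[(String, Elem)] = {
      val kids = e.child.collect { case c: Elem => c }.toList
      // How many children carry each label, to decide whether an index is needed.
      val total = kids.groupBy(_.label).map { case (label, group) => label -> group.size }
      val seen = mutable.Map.empty[String, Int].withDefaultValue(0)
      (path -> e) :: kids.flatMap { c =>
        seen(c.label) += 1
        // XPath positions are 1-based.
        val idx = if (total(c.label) > 1) s"[${seen(c.label)}]" else ""
        walk(c, s"$path/${c.label}$idx")
      }
    }
    walk(root, "/" + root.label)
  }
}
```

An error message can then look up the offending element in the returned list and print its path, for example /config/item[2]/name.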

古镇旧梦 2024-10-14 19:47:55

Although you indicated that you do not want to use a different library or framework, it is worth noting that all good Java streaming parsers (Xerces for SAX, Woodstox and Aalto for StAX) do make location information available for all events/tokens they serve.

This information is not always retained by higher-level abstractions like DOM trees, because of the additional storage needed. Performance isn't a big concern, since location information is always tracked anyway for error reporting. So this may be easy, or at least possible, to fix.
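For the StAX side, a minimal sketch using the javax.xml.stream API bundled with the JDK might look like this. The name StaxLocations is made up for this example. Whether the reported position is the start or the end of the tag depends on the parser implementation, so treat the column value with some caution.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object StaxLocations {
  // Returns (element name, line, column) for each start tag in document order.
  def startTags(xml: String): List[(String, Int, Int)] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    val out = List.newBuilder[(String, Int, Int)]
    try {
      while (reader.hasNext) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          // The Location is only valid until the next call to next().
          val loc = reader.getLocation
          out += ((reader.getLocalName, loc.getLineNumber, loc.getColumnNumber))
        }
      }
    } finally reader.close()
    out.result()
  }
}
```

A tree builder that wanted to keep positions could copy these values into each node as it is created, at the storage cost mentioned above.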
