Is there a way to run XPath string queries with Scala's XML library?

Posted on 2024-10-03 01:42:03

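Given a Scala XML object, can I run an XPath string query such as "//entries[@title='scala']" against it?

Ideally, it would look like this:

.xpath("//b[@name='n1']")

I can't manually convert every XPath query into Scala's built-in XPath-like method calls, because my program accepts XPath queries dynamically.

Also, the built-in Java XML library is very verbose, so I would like to avoid it.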

凉墨 2024-10-10 01:42:04

Your best bet is (and always was, even in Java) to use JDOM. I've pimped JDOM with the following library to make it a bit more Scala-friendly:

import org.jdom._
import org.jdom.xpath._
import scala.collection.JavaConversions
import java.util._
import scala.collection.Traversable


package pimp.org.jdom{
   object XMLNamespace{
      def apply(prefix:String,uri:String) = Namespace.getNamespace(prefix,uri)
      def unapply(x:Namespace) = Some( (x.getPrefix, x.getURI) )
   }
   object XMLElement{
      implicit def wrap(e:Element) = new XMLElement(e)
      def unapply(x:Element) = Some( (x.getName, x.getNamespace) )
   }
   class XMLElement(underlying:Element){
      def attributes:java.util.List[Attribute] =
         underlying.getAttributes.asInstanceOf[java.util.List[Attribute]]
      def children:java.util.List[Element] =
         underlying.getChildren.asInstanceOf[java.util.List[Element]]
      def children(name: String): java.util.List[Element] =
         underlying.getChildren(name).asInstanceOf[java.util.List[Element]]
      def children(name: String, ns: Namespace): java.util.List[Element] =
         underlying.getChildren(name, ns).asInstanceOf[java.util.List[Element]]
   }
}

package pimp.org.jdom.xpath{
   import pimp.org.jdom._

   //instances of these classes are not thread safe when xpath variables are used

   class SingleNodeQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.selectSingleNode(startFrom).asInstanceOf[NType]
      }
   }

   class NodesQuery[NType](val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.selectNodes(startFrom).asInstanceOf[java.util.List[NType]]
      }
   }

   class NumberValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.numberValueOf(startFrom).intValue
      }
   }

   class ValueQuery(val expression:String)(implicit namespaces:Traversable[Namespace]=null){
      private val compiled=XPath.newInstance(expression)

      if (namespaces!=null){
         for ( ns <- namespaces ) compiled.addNamespace(ns.getPrefix,ns.getURI)
      }

      def apply(startFrom:Any,variables:(String,String)*)={
         variables.foreach{ x=> compiled.setVariable(x._1,x._2)}
         compiled.valueOf(startFrom)
      }
   }

}

My idea when I wrote this was that, in general, you want to compile each XPath query in advance (so that it can be reused), and that you want to specify the type the query returns at the point where you write the query text (unlike JDOM's XPath class, which makes you pick one of four methods to call at execution time).

Namespaces should be passed around implicitly (so you can specify them once and then forget about them), and XPath variable binding should be available at query time.

You'd use the library like this: (Explicit type annotations can be inferred -- I've included them for illustration only.)

val S = XMLNamespace("s","http://www.nist.gov/speech/atlas")
val XLink = XMLNamespace("xlink", "http://www.w3.org/1999/xlink")
implicit val xmlns= List(S, XLink)

private val anchorQuery=new ValueQuery("s:AnchorRef[@role=$role]/@xlink:href")

val start:String=anchorQuery(region,"role"->"start")
val end:String=anchorQuery(region,"role"->"end")

//or

private val annotationQuery=new NodesQuery[Element]("/s:Corpus/s:Analysis/s:AnnotationSet/s:Annotation")

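//iterating a java.util.List in a for comprehension needs the implicit
//conversions in scope: import scala.collection.JavaConversions._
for(annotation:Element <- annotationQuery(doc)) {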
  //do something with it
}

I guess I should come up with some way of releasing this to the public.
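The "compile once, fix the return type up front, bind variables per call" idea does not depend on JDOM. Below is a rough sketch of the same shape on top of the JDK's javax.xml.xpath package instead, so this is a swapped-in technique and not the library above. The names CompiledQuery, StringQuery and NodeSeqQuery are hypothetical, namespace handling is left out, and the input is assumed to be an XML string that is re-parsed on every call.

```scala
import java.io.StringReader
import javax.xml.namespace.QName
import javax.xml.xpath.{XPathConstants, XPathFactory, XPathVariableResolver}
import org.w3c.dom.{Node, NodeList}
import org.xml.sax.InputSource

// Hypothetical helper: the expression is compiled once and the return type
// is fixed when the query object is created.
// Not thread safe: the variable bindings are mutable state shared by all calls.
abstract class CompiledQuery[A](expression: String, returnType: QName) {
  private var bindings = Map.empty[String, String]

  private val compiled = {
    val xpath = XPathFactory.newInstance().newXPath()
    // The resolver is attached before compiling and looks up whatever
    // bindings the most recent call to apply stored.
    xpath.setXPathVariableResolver(new XPathVariableResolver {
      def resolveVariable(name: QName): AnyRef =
        bindings.getOrElse(name.getLocalPart, null)
    })
    xpath.compile(expression)
  }

  // Converts the untyped JAXP result into the type promised by the subclass.
  protected def convert(raw: AnyRef): A

  def apply(xml: String, variables: (String, String)*): A = {
    bindings = variables.toMap
    convert(compiled.evaluate(new InputSource(new StringReader(xml)), returnType))
  }
}

// Query whose result is the string value of the selection.
final class StringQuery(expression: String)
    extends CompiledQuery[String](expression, XPathConstants.STRING) {
  protected def convert(raw: AnyRef): String = raw.asInstanceOf[String]
}

// Query whose result is a sequence of DOM nodes.
final class NodeSeqQuery(expression: String)
    extends CompiledQuery[Seq[Node]](expression, XPathConstants.NODESET) {
  protected def convert(raw: AnyRef): Seq[Node] = {
    val list = raw.asInstanceOf[NodeList]
    (0 until list.getLength).map(list.item)
  }
}

object CompiledQueryDemo extends App {
  val xml = "<a><b name='n1'/><b name='n2'/></a>"

  // The query text could just as well come from user input at runtime.
  val byName = new NodeSeqQuery("//b[@name=$name]")
  val hits = byName(xml, "name" -> "n1")
  println(hits.map(_.getNodeName).mkString(","))

  val secondName = new StringQuery("/a/b[2]/@name")
  println(secondName(xml))
}
```

Because the query text is an ordinary String, this also covers the case where queries are only known at runtime; the cost is that results are org.w3c.dom nodes and not scala.xml ones.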

梦境 2024-10-10 01:42:04

kantan.xpath does just that. Here's something I just typed in the REPL:

import kantan.xpath._
import kantan.xpath.ops._

"<a><b name='n1'></b></a>".evalXPath[Node]("//b[@name='n1']")

Here the Node type parameter describes the type one expects to extract from the XML document. A perhaps more explicit example would be:

import java.net.URI

new URI("http://stackoverflow.com").evalXPath[List[URI]]("//a/@href")

This would download the Stack Overflow homepage, evaluate it as an XML document (there's a NekoHTML module for HTML sanitisation) and extract the targets of all links.
