How to extract an embedded value from a previously selected attribute value in an XPath query?

Posted on 2024-11-18 07:00:45

I'm trying to "select" the link from the onclick attribute in the following portion of HTML:

    <span onclick="Javascript:document.quickFindForm.action='/blah_blah'" 
     class="specialLinkType"><img src="blah"></span>

but I can't get any further than the following XPath:

    //span[@class="specialLinkType"]/@onclick

which only returns:

    Javascript:document.quickFindForm.action

Any ideas on how to pick out that link inside of the quickFindForm.action with an XPath?

Answers (3)

z祗昰~ 2024-11-25 07:00:45

I tried the XPath in a Java application and it worked ok:

    import java.io.IOException;
    import java.io.StringReader;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class Teste {

        public static void main(String[] args) throws Exception {
            // Parse the snippet from the question (<img/> is self-closed so the string is well-formed XML)
            Document doc = stringToDom("<span onclick=\"Javascript:document.quickFindForm.action='/blah_blah'\" class=\"specialLinkType\"><img src=\"blah\"/></span>");
            XPath newXPath = XPathFactory.newInstance().newXPath();
            // Same expression as in the question
            XPathExpression xpathExpr = newXPath.compile("//span[@class=\"specialLinkType\"]/@onclick");
            // evaluate(Object) returns the string value of the selected attribute, i.e. the whole onclick text
            String result = xpathExpr.evaluate(doc);
            System.out.println(result);
        }

        // Parses an XML string into a DOM Document
        public static Document stringToDom(String xmlSource) throws SAXException, ParserConfigurationException, IOException {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xmlSource)));
        }
    }

Result:

    Javascript:document.quickFindForm.action='/blah_blah'

乖乖公主 2024-11-25 07:00:45

If Scrapy supports XPath string functions, this will work:

    substring-before(
       substring-after(
          //span[@class="specialLinkType"]/@onclick,"quickFindForm.action='")
       ,"'")
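To check the string-function approach outside Scrapy, here is a minimal Java sketch (an illustration, not part of the original answer) that evaluates the same substring-before/substring-after expression with the JDK's built-in javax.xml.xpath API, which implements XPath 1.0. It assumes the snippet has been made well-formed XML (the <img> tag is self-closed), as in the first answer; the class and method names are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExtractActionWithXPath {

    // Snippet from the question, made well-formed XML (<img/> is self-closed)
    public static final String SNIPPET =
            "<span onclick=\"Javascript:document.quickFindForm.action='/blah_blah'\" "
            + "class=\"specialLinkType\"><img src=\"blah\"/></span>";

    // XPath 1.0: take the text after "quickFindForm.action='" and cut it at the next single quote
    static final String EXPR =
            "substring-before("
            + "substring-after(//span[@class=\"specialLinkType\"]/@onclick, \"quickFindForm.action='\"), "
            + "\"'\")";

    /** Parses the XML string and returns the value of EXPR ("" if nothing matches). */
    public static String extractAction(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) converts the result of the expression to a string
            return xpath.evaluate(EXPR, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the input or evaluate the XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractAction(SNIPPET)); // prints /blah_blah
    }
}
```

If no matching span exists, or its onclick value does not contain quickFindForm.action=', both XPath functions yield an empty string rather than an error, so the sketch returns "".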

It looks like Scrapy also supports regular expressions. Something like this should work:

    .select('//span[@class="specialLinkType"]/@onclick').re(r'quickFindForm.action=\'(.*?)\'')

Caveat: I can't test the second solution, so you will have to check that \' is the proper escape sequence for single quotes in this case.
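On the escaping question: in a Python raw string, \' keeps its backslash, and the regex engine reads \' as a literal quote, so the pattern above should still match; writing it as r"quickFindForm\.action='(.*?)'" avoids the issue. The same regex step is sketched below in Java, assuming the onclick value has already been selected (for example with the XPath from the question). This is plain java.util.regex, not Scrapy's .re() API, and the names are hypothetical. The dot in quickFindForm.action is escaped so it only matches a literal dot, and [^']* spells out "everything up to the next quote".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractActionWithRegex {

    // In a Java string literal the single quote needs no escaping; the dot is escaped for the regex
    private static final Pattern ACTION =
            Pattern.compile("quickFindForm\\.action='([^']*)'");

    /** Returns the quoted action path from an onclick value, if present. */
    public static Optional<String> extractAction(String onclick) {
        Matcher m = ACTION.matcher(onclick);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The attribute value selected by //span[@class="specialLinkType"]/@onclick
        String onclick = "Javascript:document.quickFindForm.action='/blah_blah'";
        System.out.println(extractAction(onclick).orElse("(no match)")); // prints /blah_blah
    }
}
```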

天荒地未老 2024-11-25 07:00:45

I used XQuery, but it should be the same in XPath 2.0 (fn:tokenize does not exist in XPath 1.0). I used the function "tokenize", which splits a string based on a regular expression (http://www.xqueryfunctions.com/xq/fn_tokenize.html).
In this case I split the string on the single quote character ( ' ):

        xquery version "1.0";
        let $x := //span[@class="specialLinkType"]/@onclick
        let $c := fn:tokenize( $x, '''' )
        return $c[2]

In XPath, that should be:

        fn:tokenize(//span[@class="specialLinkType"]/@onclick, '''' )[2]
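If you are stuck with an XPath 1.0 engine, such as the JDK's javax.xml.xpath used in the first answer, the same split can be done on the selected attribute value in the host language. A minimal Java sketch of the split-and-take-the-second-piece logic; the names are hypothetical, and this is plain String.split, not an XPath function.

```java
public class ExtractActionWithSplit {

    /**
     * Java counterpart of fn:tokenize($onclick, "'")[2]: split on the single quote
     * and take the second piece. XPath sequences are 1-based and Java arrays are
     * 0-based, so [2] becomes [1].
     */
    public static String secondQuotedToken(String onclick) {
        // split takes a regex; a lone ' has no special meaning, so no escaping is needed
        String[] parts = onclick.split("'");
        return parts.length > 1 ? parts[1] : "";
    }

    public static void main(String[] args) {
        String onclick = "Javascript:document.quickFindForm.action='/blah_blah'";
        System.out.println(secondQuotedToken(onclick)); // prints /blah_blah
    }
}
```

The sketch returns "" when the value contains no single quote, roughly mirroring the empty result of tokenize(...)[2] in that case.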