Using YQL multi-query & XPath to parse HTML, how to escape nested quotes?
The title is more complicated than it has to be; here's the problem query.
SELECT *
FROM query.multi
WHERE queries="
SELECT *
FROM html
WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com'
AND xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span';
SELECT *
FROM xml
WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
SELECT *
FROM xml
WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
Specifically, this line:
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'
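It's problematic because of the quoting: I have to nest quotes three levels deep (the outer queries="..." string, the xpath='...' string inside it, and the attribute values inside the XPath), and I've run out of quote characters to use. I've tried the following variations without success: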
//no attribute quoting
xpath='//li[@class=listLi]/div[@class=views]/a/span'
//try to quote attribute w/ backslash & single quote
xpath='//li[@class=\'listLi\']/div[@class=\'views\']/a/span'
//try to quote attribute w/ backslash & double quote
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'
//try to quote attribute with double single quotes, like SQL
xpath='//li[@class=''listLi'']/div[@class=''views'']/a/span'
//try to quote attribute with double double quotes, like SQL
xpath='//li[@class=""listLi""]/div[@class=""views""]/a/span'
//try to quote attribute with quote entities
xpath='//li[@class=&quot;listLi&quot;]/div[@class=&quot;views&quot;]/a/span'
//try to surround XPath with backslash & double quote
xpath=\"//li[@class='listLi']/div[@class='views']/a/span\"
//try to surround XPath with double double quote
xpath=""//li[@class='listLi']/div[@class='views']/a/span""
All without success.
I don't see much out there about escaping XPath strings, but everything I've found seems to be a variation on using concat (which won't help, because neither ' nor " is available) or HTML entities. Leaving the attribute values unquoted doesn't throw an error, but it fails because it's not the actual XPath string I need.
I don't see anything in the YQL docs about how to handle escaping. I'm aware of how edge-casey this is, but I was hoping they'd have some sort of escaping guide.
Comments (2)
You need to escape whatever character is delimiting your XPath query with a double backslash... in other words:
(try this in the YQL console)
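The query that went with this answer did not survive on this page, so here is only a rough sketch of one way to read the advice, written in Python purely to build the query string. It assumes the character "delimiting your XPath query" is the single quote around the xpath='...' value, and that each occurrence of that character inside the XPath gets two backslashes in front of it. The helper names `escape_delimiter` and `build_multi_query` are made up for illustration, and none of this has been checked against a live YQL console.

```python
def escape_delimiter(xpath: str, delimiter: str) -> str:
    """Put two backslash characters in front of every delimiter in the XPath."""
    # "\\\\" in Python source is a string of exactly two backslashes.
    return xpath.replace(delimiter, "\\\\" + delimiter)


def build_multi_query(url: str, xpath: str) -> str:
    """Wrap one html sub-query in a query.multi statement (hypothetical helper)."""
    # Assumption: the XPath value is delimited by single quotes, as in the
    # question, so single quotes inside the XPath are the ones to escape.
    inner = (
        "SELECT * FROM html "
        f"WHERE url='{url}' "
        f"AND xpath='{escape_delimiter(xpath, chr(39))}'"
    )
    # The whole sub-query sits inside the double-quoted queries="..." string.
    return f'SELECT * FROM query.multi WHERE queries="{inner}"'


query = build_multi_query(
    "http://www.stumbleupon.com/url/http://www.guildwars2.com",
    "//li[@class='listLi']/div[@class='views']/a/span",
)
print(query)
```

If the attribute values are written with double quotes instead, the same helper can be called with `'"'` as the delimiter; which of the two forms YQL accepts is exactly the thing to try in the console.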
I've come up with a solution that doesn't really answer my original question but does solve the problem.
The data.html.cssselect table will take a CSS selector and translate it into an XPath expression, avoiding the nasty escaping issues.
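As a rough sketch of what that looks like, the Python below just builds the sub-query string. It assumes the table takes `url` and `css` keys, and that `li.listLi > div.views > a > span` is an acceptable stand-in for the original XPath (a class selector matches any element carrying that class, which is slightly looser than an exact `@class` comparison). The function name `build_cssselect_query` is hypothetical, and since data.html.cssselect is, as far as I know, a community table, it may need community tables enabled.

```python
def build_cssselect_query(url: str, css: str) -> str:
    """Build a data.html.cssselect sub-query that needs no quote escaping."""
    # Assumption: the table is keyed on `url` and `css`.
    for value in (url, css):
        if "'" in value or '"' in value:
            # Refuse anything that would bring the nesting problem back.
            raise ValueError("value contains a quote character: " + value)
    return (
        "SELECT * FROM data.html.cssselect "
        f"WHERE url='{url}' AND css='{css}'"
    )


css_query = build_cssselect_query(
    "http://www.stumbleupon.com/url/http://www.guildwars2.com",
    "li.listLi > div.views > a > span",
)
print(css_query)
```

Because the selector contains no quote characters at all, the resulting string can be dropped into the double-quoted queries="..." value of query.multi as-is.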