Straight lxml or PyQuery
Does anyone have experience scraping with straight lxml vs. PyQuery? I just came across the latter recently and was intrigued. I haven't been able to find many comments about the library yet, so I'm curious how robust it is.
I'm familiar with lxml and generally enjoy it. It would be nice, however, to use jQuery selector syntax.
Is the switch worth it?
Thanks!
Comments (2)
lxml supports XPath, which is similar to CSS selectors.
Would that meet your needs?
Only you can answer the question of whether it's worth it.
It simply depends on whether you want to use an extra dependency in order to get jQuery's custom CSS selectors.
Here are the things jQuery adds on top of the standard CSS selectors: http://api.jquery.com/category/selectors/jquery-selector-extensions/
And here is the translation of those selectors to normal CSS selectors in PyQuery: https://bitbucket.org/olauzanne/pyquery/src/c2bf08a8f4e7/pyquery/cssselectpatch.py
I don't see why it should be any less robust than using plain CSS selectors with lxml. It's simply translating special jQuery selectors into CSS selectors.