How to search Stack Overflow questions from a script?
Given a string of keywords, such as "Python best practices", I would like to obtain the first 10 Stack Overflow questions that contain those keywords, sorted by relevance (?), say from a Python script. My goal is to end up with a list of (title, URL) tuples.
How can I accomplish this? Would you consider querying Google instead? (How would you do it from Python?)
Converting this to a function should be trivial.
EDIT: Heck, I'll do it...
Since Stack Overflow already has this feature, you just need to fetch the search results page and scrape the information you need. Here is the URL for a search by relevance:
If you View Source, you'll see that the information you need for each question is on a line like this:
So you should be able to get the first ten by doing a regex search for a string of that form.
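A minimal sketch of the scrape-and-regex approach. It assumes the relevance search URL has the form `https://stackoverflow.com/search?q=...&tab=relevance` and that each result title is an anchor whose `href` starts with `/questions/<id>/`. Both are assumptions (the markup line is not reproduced above and the page layout changes over time), so check View Source and adjust `QUESTION_LINK` accordingly. The names `search_url`, `extract_questions` and `top_questions` are hypothetical.

```python
import html
import re
import urllib.parse
import urllib.request

# Assumed URL for a relevance-sorted search; verify it in a browser first.
SEARCH_URL = "https://stackoverflow.com/search"

# Hypothetical pattern: an anchor pointing at /questions/<id>/<slug>.
QUESTION_LINK = re.compile(r'<a href="(/questions/\d+/[^"#?]+)"[^>]*>(.*?)</a>', re.S)


def search_url(keywords):
    """Build the search URL; urlencode escapes spaces and special characters."""
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": keywords, "tab": "relevance"})


def extract_questions(page, limit=10):
    """Return up to `limit` (title, absolute URL) tuples found in `page`."""
    results = []
    seen = set()
    for path, raw_title in QUESTION_LINK.findall(page):
        if path in seen:
            continue  # the same question may be linked more than once
        seen.add(path)
        # Drop any nested tags, then decode entities such as &amp;.
        title = html.unescape(re.sub(r"<[^>]+>", "", raw_title)).strip()
        results.append((title, urllib.parse.urljoin("https://stackoverflow.com", path)))
        if len(results) == limit:
            break
    return results


def top_questions(keywords, limit=10):
    """Fetch the search page (network access required) and scrape it."""
    request = urllib.request.Request(search_url(keywords), headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_questions(page, limit)
```

Keeping the parsing in `extract_questions` separate from the download makes it easy to test the regex against a saved copy of the page.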
Suggest that a REST API be added to SO. http://stackoverflow.uservoice.com/
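Such an API now exists as the Stack Exchange API. The sketch below assumes version 2.3 of its `/search/advanced` endpoint with the `q`, `sort`, `order`, `site` and `pagesize` parameters, and the `items`, `title` and `link` fields in the JSON response; check the current API documentation before relying on them. Anonymous requests are rate-limited. The function names are hypothetical.

```python
import gzip
import html
import json
import urllib.parse
import urllib.request

API_URL = "https://api.stackexchange.com/2.3/search/advanced"


def api_url(keywords, limit=10):
    """Build a relevance-sorted search request for the Stack Exchange API."""
    params = {
        "q": keywords,
        "sort": "relevance",
        "order": "desc",
        "site": "stackoverflow",
        "pagesize": limit,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_items(payload):
    """Turn the API's JSON payload into (title, URL) tuples."""
    data = json.loads(payload)
    # Titles come back HTML-encoded (e.g. &#39; for an apostrophe).
    return [(html.unescape(item["title"]), item["link"]) for item in data.get("items", [])]


def top_questions(keywords, limit=10):
    """Query the API over the network and return (title, URL) tuples."""
    with urllib.request.urlopen(api_url(keywords, limit)) as response:
        body = response.read()
        # The API compresses its responses; urllib does not decompress them itself.
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return parse_items(body.decode("utf-8"))
```

This avoids scraping HTML entirely, so it does not break when the site's markup changes.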
You could screen scrape the returned HTML from a valid HTTP request. But that would result in bad karma, and the loss of the ability to enjoy a good night's sleep.
I would just use PycURL to concatenate the search terms onto the query URI.
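A rough sketch of that idea. The query URI in `BASE_URI` is an assumption, and the terms are escaped with `urllib.parse.quote_plus` before concatenation so that characters like `#` or `&` do not break the URI. PycURL is a third-party package (`pip install pycurl`), so it is imported only inside the hypothetical `fetch` helper.

```python
import urllib.parse
from io import BytesIO

# Assumed query URI; the search terms are appended to the end.
BASE_URI = "https://stackoverflow.com/search?q="


def build_uri(terms):
    """Append the search terms to the query URI, escaping them first."""
    # quote_plus turns spaces into '+' and percent-encodes characters such as '#' and '&'.
    return BASE_URI + urllib.parse.quote_plus(terms)


def fetch(uri):
    """Download `uri` with PycURL and return the body as text."""
    import pycurl  # third-party dependency

    buffer = BytesIO()
    curl = pycurl.Curl()
    try:
        curl.setopt(pycurl.URL, uri)
        curl.setopt(pycurl.WRITEDATA, buffer)
        curl.setopt(pycurl.FOLLOWLOCATION, True)
        curl.perform()
    finally:
        curl.close()
    return buffer.getvalue().decode("utf-8", errors="replace")
```

The returned HTML still has to be parsed to get the (title, URL) tuples.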