
How do I search for a specific file type with the Yahoo Search API?

Posted on 2024-07-13 03:39:52 | 143 characters | 16 views | 0 comments

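Does anyone know whether Yahoo's programmatic search accepts a parameter that restricts results to links to files of a specific type (PDF, for example)? This is possible in the GUI, but how can it be done through the API?

I'd very much appreciate sample code in Python, but any other solution might be helpful as well.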


尐籹人 2024-07-20 03:39:52

Yes, there is:

http://developer.yahoo.com/search/boss/boss_guide/Web_Search.html#id356163


瘫痪情歌 2024-07-20 03:39:52

Thank you.
I found out myself that something like this works fine (the file type is the first argument, the query is the second):

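import sys

# create_search and app_id are not shown in this post and must be defined elsewhere
format = sys.argv[1]            # file type, e.g. "pdf"
query = " ".join(sys.argv[2:])  # all remaining arguments form the query
srch = create_search("Web", app_id, query=query, format=format)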
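For readers who do not have that helper available, here is a minimal sketch of the same idea (file type first, query second) that only builds the request URL and sends nothing over the network. It assumes the BOSS URL layout quoted in the answer further down this thread, and that the file type maps to that URL's `type` parameter. `parse_args` and `build_search_url` are hypothetical names, not part of any Yahoo library.

```python
from urllib.parse import quote, urlencode

# Assumption: URL layout copied from the BOSS example later in this thread.
BOSS_TEMPLATE = "http://boss.yahooapis.com/ysearch/web/v1/{query}?{params}"


def parse_args(argv):
    """Split argv into (file_type, query): the first argument is the type, the rest is the query."""
    if len(argv) < 3:
        raise ValueError("usage: search.py FILETYPE QUERY...")
    return argv[1], " ".join(argv[2:])


def build_search_url(app_id, query, file_type, start=0, count=10):
    """Hypothetical helper: build the request URL with the query percent-encoded."""
    params = urlencode({"start": start, "count": count, "type": file_type, "appid": app_id})
    return BOSS_TEMPLATE.format(query=quote(query, safe=""), params=params)


# Example with a hand-written argument list instead of sys.argv.
file_type, query = parse_args(["search.py", "pdf", "annual", "report"])
url = build_search_url("YOUR_APP_ID", query, file_type)
print(url)
```

Percent-encoding the query matters as soon as it contains spaces, since the query is part of the URL path rather than a query-string parameter.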


沉溺在你眼里的海 2024-07-20 03:39:52


Here's what I do for this sort of thing. It exposes more of the parameters so you can tune it to your needs. This should print out the first ten PDF URLs for the query "resume" [mine's not one of them ;) ]. You can download those URLs however you like.

The JSON dictionary that gets returned from the query is a little gross, but this should get you started. Be aware that in real code you will need to check whether some of the keys in the dictionary exist. When there are no results, this code will probably throw an exception.

The link that Tiago provided is good for finding out which values the "type" parameter supports.

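from yos.crawl import rest

APPID = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # replace with your own application ID
# URL template fields: vertical, API version, query, start offset, result count, file type
base_url = "http://boss.yahooapis.com/ysearch/%s/v%d/%s?start=%d&count=%d&type=%s" + "&appid=" + APPID
querystr = "resume"  # URL-encode the query if it contains spaces or special characters
start = 0
count = 10
file_type = "pdf"  # renamed from "type" so the built-in is not shadowed
search_url = base_url % ("web", 1, querystr, start, count, file_type)
json_result = rest.load_json(search_url)
# Print the URL of every web result; raises KeyError if the expected keys are missing
for rec in json_result['ysearchresponse']['resultset_web']:
    print(rec['url'])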
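The warning about missing keys can be handled with a small defensive helper. What follows is only a sketch: `extract_urls` is a hypothetical name, and the sample dictionaries are made-up data shaped like the `ysearchresponse` / `resultset_web` structure used above, not real API output.

```python
def extract_urls(response):
    """Return result URLs from a BOSS-style response dict, or [] when keys are missing."""
    body = response.get("ysearchresponse") or {}
    results = body.get("resultset_web") or []
    # Skip any record that does not carry a "url" key instead of raising KeyError.
    return [rec["url"] for rec in results if "url" in rec]


# Made-up sample data: one usable record, one record without a URL.
sample = {"ysearchresponse": {"resultset_web": [{"url": "http://example.com/a.pdf"},
                                                {"title": "no url here"}]}}
# Made-up sample data: a response with no result set at all.
empty = {"ysearchresponse": {"totalhits": "0"}}

print(extract_urls(sample))
print(extract_urls(empty))
```

Returning an empty list for a response without results lets the calling loop run zero times instead of failing.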
