Searching multiple websites with C#

Posted on 2024-12-01 04:05:31

Can I use C# to automatically search websites and return the search results?

Is there a web crawler that would do the same thing if I give it a top-level domain? For example, I tell it to find the word "funny" on stackoverflow.com, and it tells me every place "funny" appears.

These websites allow searching via their search bar.

Do I need the websites' cooperation to automate searches?

NOTE: I only plan to run one or two searches a day, so I doubt I'll be blocked or asked to authenticate myself.

Comments (3)

笑脸一如从前 2024-12-08 04:05:31

If you are planning to crawl an entire website to count words like that, you will essentially be requesting every page of the site, and unless you cache the results you will get blocked. Perhaps consider integrating Google's domain search instead?

Here is a link to Google's page detailing how to interface with it from C#:

http://code.google.com/apis/gdata/client-cs.html

EDIT: Sorry, that wasn't quite right: http://gsalib.codeplex.com/
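To illustrate the caching point, here is a minimal sketch, not a full crawler. The `CachedPageCounter` class and its members are hypothetical names made up for this example. It assumes you already know which URLs you want to check, keeps each page body in memory after the first download, and counts whole-word, case-insensitive matches in the raw response (markup included).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper: caches page bodies so repeated counts do not re-request the site.
public sealed class CachedPageCounter
{
    private readonly Func<string, Task<string>> _fetch;
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // The fetch delegate is injected so the class can be exercised without network access.
    public CachedPageCounter(Func<string, Task<string>> fetch)
    {
        _fetch = fetch;
    }

    // Downloads the page only on a cache miss, then counts the word in the cached body.
    public async Task<int> CountAsync(string url, string word)
    {
        if (!_cache.TryGetValue(url, out string body))
        {
            body = await _fetch(url);
            _cache[url] = body;
        }
        return CountWord(body, word);
    }

    // Counts whole-word, case-insensitive matches in the raw text.
    public static int CountWord(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

In real use the delegate could wrap `HttpClient`, for example `new CachedPageCounter(url => client.GetStringAsync(url))`. An in-memory dictionary is only the simplest possible cache; a cache that survives restarts would need to be stored on disk or in a database.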

http://answers.oreilly.com/topic/2165-how-to-search-google-and-bing-in-c/

霓裳挽歌倾城醉 2024-12-08 04:05:31

I would look into building an RSS aggregator. RSS is standardized, so that's probably the most reliable way to collect search results from various sources.
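As a rough sketch of the aggregator idea, the following filters the items of one feed by a search term. It assumes a plain RSS 2.0 document whose `item`, `title`, `link`, and `description` elements carry no XML namespace (Atom feeds are laid out differently), and `RssSearch.FindItems` is a hypothetical name used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssSearch
{
    // Returns (title, link) pairs for RSS 2.0 items whose title or description contains the term.
    public static List<(string Title, string Link)> FindItems(string rssXml, string term)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
            .Select(item => new
            {
                // The explicit cast yields null for a missing element, so fall back to "".
                Title = (string)item.Element("title") ?? "",
                Link = (string)item.Element("link") ?? "",
                Description = (string)item.Element("description") ?? ""
            })
            .Where(i => i.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
                     || i.Description.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(i => (i.Title, i.Link))
            .ToList();
    }
}
```

The feed text itself would still have to be downloaded first, for example with `HttpClient.GetStringAsync`. Running the same filter over several feeds and merging the results gives a simple aggregator.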

EDIT: For sites that don't support RSS, you can look into using a screen scraper. Check out this article on The Code Project to get you started:

http://www.codeproject.com/KB/aspnet/weather.aspx

谁的新欢旧爱 2024-12-08 04:05:31

...web sites allow searching via their search bar ... Can I use C# to auto search websites, then return the search results?

Yes, if the website exposes a search URL that takes the search term as a query-string argument:

          http://yourTargetDomain?searchterm=foo

But unless the website has specifically designed the search results at that URL to be structured data, it won't "tell [you] all the times 'funny' appeared". Instead it will send back a response meant for a browser to display, so you would have to parse the results out of that stream of HTML.

For example:

http://philadelphia.craigslist.org/search/tls?query=ladder&srchType=A&minAsk=&maxAsk=
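A minimal sketch of that approach might look like the following. The `SiteSearch` class and its method names are hypothetical, the query parameter name differs from site to site, and the regular expression is only a rough stand-in for a proper HTML parser: it handles double-quoted `href` attributes and nothing else.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class SiteSearch
{
    private static readonly HttpClient Client = new HttpClient();

    // Appends the escaped term as a query-string argument; paramName is site-specific.
    public static string BuildSearchUrl(string searchPage, string paramName, string term)
    {
        return searchPage + "?" + Uri.EscapeDataString(paramName) + "=" + Uri.EscapeDataString(term);
    }

    // Pulls double-quoted href values out of anchor tags in the returned HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, "<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    // Requests the search page and returns every link found in the response.
    public static async Task<List<string>> SearchAsync(string searchPage, string paramName, string term)
    {
        string html = await Client.GetStringAsync(BuildSearchUrl(searchPage, paramName, term));
        return ExtractLinks(html);
    }
}
```

`ExtractLinks` returns every link on the page, including navigation and ads, so in practice you would narrow it to whatever markup the target site wraps around each result. For anything beyond a quick experiment, a dedicated HTML parsing library is likely to be more robust than a regex.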
