Daily deal site aggregator

Posted 2024-10-12 17:15:49

Recently I came across a few sites (e.g. http://dealery.com) that provide one-stop shopping for group-buying coupons. I am wondering how these sites get deal info from the various daily deal sites. I am sure they are not using any APIs, because not all daily deal sites provide them.

Are they doing screen scraping, or are they using RSS feeds to build their own database?
If anyone knows how this is done, please share. I would greatly appreciate it.

Thanks.

3 Answers

裸钻 2024-10-19 17:15:49

I know this is a little bit old, but I thought I'd take the time to answer your question here. As Logan pointed out, deal aggregation scripts like the one Agriya has developed scrape the data from daily deal websites in three ways:

  1. It parses the data out of the RSS feed where available
  2. It parses the data out of an XML feed provided by an affiliate network like Commission Junction
  3. It uses regular expressions to parse the required data out of the HTML pages of the deal websites

Options 1 and 2 are fairly easy to achieve, but option 3 requires you to be pretty good at writing regular expressions. You've also got the added headache that if the deal site makes even the slightest change to its HTML, the regular expression needs to be redone.
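For concreteness, here is a minimal Python sketch of methods 1 and 3 above, using only the standard library. The feed URL, page URL, and regex pattern are made up for illustration; a real aggregator would maintain one such pattern (or a dedicated parser) per deal site.

```python
# Minimal sketch of methods 1 and 3 described above.
# The feed URL, page URL, and regex pattern are hypothetical examples.
import re
import urllib.request
import xml.etree.ElementTree as ET


def deals_from_rss(feed_url):
    """Method 1: pull title/link pairs out of a standard RSS 2.0 feed."""
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.fromstring(resp.read())
    deals = []
    for item in root.findall("./channel/item"):
        deals.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return deals


def deals_from_html(page_url, pattern):
    """Method 3: pull deal data out of raw HTML with a site-specific regex."""
    with urllib.request.urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Each match is expected to expose named groups such as (?P<title>...).
    return [m.groupdict() for m in re.finditer(pattern, html)]


if __name__ == "__main__":
    # Hypothetical deal-site endpoints, purely for illustration.
    print(deals_from_rss("https://example-deals.com/deals.rss"))

    deal_pattern = re.compile(
        r'<div class="deal">\s*<a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>'
        r'\s*<span class="price">(?P<price>[^<]+)</span>',
        re.S,
    )
    print(deals_from_html("https://example-deals.com/today", deal_pattern))
```

The regex approach works, but as noted above it breaks as soon as the target markup changes, which is why most aggregators prefer the feed-based options when they exist.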

橙幽之幻 2024-10-19 17:15:49

We run Australia's largest deal aggregator. We use the following methods to get our data:

  • Parse data from a site's XML feed (preferred)
  • Parse data from a site's RSS feed
  • Custom screen scraping

As Peter mentioned, screen scraping can be a bit of a pain when sites change their code; however, this doesn't happen that often. We maybe have to update 1 or 2 sites a month out of the 100 or so we list.
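As an illustration of this per-site screen-scraping setup, the sketch below keeps a small registry of site-specific parser functions, so a markup change on one deal site only means updating that one function. The site name, URL, and CSS selectors are hypothetical, and it assumes the third-party requests and beautifulsoup4 packages.

```python
# Sketch of a "one scraper per site" layout; names, URLs and selectors are invented.
import requests
from bs4 import BeautifulSoup


def scrape_example_site(html):
    """Site-specific parser; the CSS selectors here are hypothetical."""
    soup = BeautifulSoup(html, "html.parser")
    deals = []
    for card in soup.select("div.deal-card"):
        link = card.select_one("a.deal-link")
        price = card.select_one("span.price")
        if link is None:
            continue
        deals.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "price": price.get_text(strip=True) if price else None,
        })
    return deals


# Registry mapping each aggregated site to its listing URL and parser function.
SITES = {
    "example-deals": ("https://example-deals.com/today", scrape_example_site),
}


def collect_all_deals():
    all_deals = {}
    for name, (url, parser) in SITES.items():
        html = requests.get(url, timeout=10).text
        all_deals[name] = parser(html)
    return all_deals


if __name__ == "__main__":
    print(collect_all_deals())
```

When a listed site redesigns its pages, only its entry in the registry needs a new parser, which matches the "1 or 2 sites a month" maintenance load described above.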

望她远 2024-10-19 17:15:49

Dealery might use RSS feeds or APIs, because the sites I've seen them aggregate actually offer RSS feeds and APIs.

Other sites might be doing screen scraping, because I can't find RSS feeds or APIs for some of the sites they aggregate from.
