Ruby Anemone: add a tag to every URL the spider visits
I have a crawl set up:
require 'anemone'

Anemone.crawl("http://www.website.co.uk", :depth_limit => 1) do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
However, I want the spider to use a Google Analytics anti-tracking tag on every URL it visits, without necessarily actually clicking the links.
I could run the spider once, store all of the URLs, and then use Watir to run through them adding the tag, but I want to avoid that because it is slow, and I like the skip_links_like and page-depth functions.
How could I implement this?
1 Answer
You want to add something to the URL before you load it, correct? You can use focus_crawl for that.

The focus_crawl method is intended to filter the URL list, but you can use it as a general-purpose URL rewriter as well. For example, if you wanted to add atm_source=SiteCon&atm_medium=Mycampaign to all the links, you would map over page.links inside focus_crawl and append that query string to each URL. If your atm_source or atm_medium values contain non-URL-safe characters, URI-encode them first.
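A minimal sketch of that rewrite, using the atm_source/atm_medium values from the question (the add_campaign_tag helper name is my own; Anemone's page.links yields URI objects, so the query can be set on them directly):

```ruby
require 'uri'

TAG = "atm_source=SiteCon&atm_medium=Mycampaign"

# Append the campaign tag to a URI, preserving any existing query string.
def add_campaign_tag(uri)
  uri.query = uri.query ? "#{uri.query}&#{TAG}" : TAG
  uri
end

puts add_campaign_tag(URI("http://www.website.co.uk/page"))
# => http://www.website.co.uk/page?atm_source=SiteCon&atm_medium=Mycampaign
puts add_campaign_tag(URI("http://www.website.co.uk/page?x=1"))
# => http://www.website.co.uk/page?x=1&atm_source=SiteCon&atm_medium=Mycampaign
```

Plugged into the crawl from the question, that becomes anemone.focus_crawl { |page| page.links.map { |uri| add_campaign_tag(uri) } }; the array returned from the block is the set of links Anemone follows, so on_every_page sees the tagged URLs.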