A way to detect parked pages?

Posted on 2024-07-12 01:06:07

Does anyone know of a way to programmatically detect a parked web page? That is, one of those pages you reach by mistyping a domain (or sometimes on purpose) that is hosted by a domain parking service and contains nothing but ads.

I am working on a linking network and want to make sure that sites whose domains expire don't get snatched up by someone else and end up in the network as parked pages.

Comments (4)

倚栏听风 2024-07-19 01:06:07

Here is a test that I think may catch a decent number of them. It takes advantage of the fact that you don't actually want to run real web sites on your parked domains, so it looks for wildcarding of both the subdomain and the path. Let's say we have this URL in our system:

http://www.example.com/method-to-detect-parked

First I would fetch the actual URL and hash the response or keep a copy for comparison.

My second check would be to request

http://random.example.com/random

If that response matches the original page, or even if the request merely succeeds, you have a pretty good indicator that the domain is parked. If it fails, I might check the subdomain and the path individually. If the page randomly changes some elements, you may want to choose a few items to compare instead of the whole page: for example, make a list of the links included in the page and compare those, or compare the title tag.
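
A minimal sketch of that wildcard test, under a few assumptions that are not in the answer: `fetch` is a hypothetical callable you supply that returns the page HTML or `None` on failure, the registered domain is naively taken to be the last two host labels, and the comparison uses only the title and link list so that rotating ad text is ignored.

```python
import hashlib
import secrets
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _PageSummary(HTMLParser):
    """Collects the <title> text and every <a href> value on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fingerprint(html):
    """Hash only the title and the sorted link list of a page."""
    parser = _PageSummary()
    parser.feed(html)
    summary = parser.title.strip() + "\n" + "\n".join(sorted(parser.links))
    return hashlib.sha256(summary.encode("utf-8")).hexdigest()


def random_probe_url(url):
    """Build scheme://<random>.<domain>/<random> from the stored URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive assumption: the registered domain is the last two labels.
    domain = ".".join(host.split(".")[-2:])
    return f"{parts.scheme}://{secrets.token_hex(8)}.{domain}/{secrets.token_hex(8)}"


def wildcard_check(url, fetch):
    """Return 'match', 'probe-succeeded' or 'probe-failed'.

    fetch(url) is a caller-supplied function (hypothetical here) that
    returns the page HTML as a string, or None if the request failed.
    """
    original = fetch(url)
    probe = fetch(random_probe_url(url))
    if probe is None:
        return "probe-failed"
    if original is not None and fingerprint(original) == fingerprint(probe):
        return "match"
    return "probe-succeeded"
```

Both `match` and `probe-succeeded` would count as signs of parking by the reasoning above. On `probe-failed`, the subdomain and the path could be probed separately.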

吾家有女初长成 2024-07-19 01:06:07

I would say that you'll have to examine the WHOIS records for the sites in question and/or the actual content of the pages and develop some heuristics as to what constitutes a "parked page".

Take goooogle.com: its WHOIS record shows that it is owned by "Privacy Protection" and that its DNS servers are ns1/ns2.fastpark.net. If you look at the source for the site, they're silly enough to have a CSS file named "style_park.css" :)

All in all, I don't think you'll be able to come up with a generic way to do it. You'll probably end up with some ever-evolving rule base or blacklist.
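
As a rough illustration of what such a rule base might look like, here is a sketch seeded only with the two signals from the goooogle.com example. It assumes the nameserver list and the page source have already been obtained elsewhere (the WHOIS/DNS lookup is not shown), and the function and constant names are made up for this example.

```python
# Illustrative rules only: both entries come from the goooogle.com example
# and would need to grow into a maintained, evolving list.
PARKING_NAMESERVER_SUFFIXES = ("fastpark.net",)
PARKING_CONTENT_MARKERS = ("style_park.css",)


def parking_signals(nameservers, html):
    """Return human-readable reasons why the site looks parked.

    nameservers: iterable of nameserver host names for the domain.
    html: the page source as a string.
    An empty list means no rule matched, not that the site is fine.
    """
    reasons = []
    for ns in nameservers:
        host = ns.lower().rstrip(".")
        for suffix in PARKING_NAMESERVER_SUFFIXES:
            if host == suffix or host.endswith("." + suffix):
                reasons.append(f"nameserver {host} belongs to {suffix}")
    lowered = html.lower()
    for marker in PARKING_CONTENT_MARKERS:
        if marker in lowered:
            reasons.append(f"page source mentions {marker}")
    return reasons
```

Returning reasons rather than a bare boolean makes it easier to review why a link was flagged as the rules change.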

朕就是辣么酷 2024-07-19 01:06:07

Look at the creation date of the DNS/WHOIS record, and compare it to the date the link was added. If the record is newer than the link, that's a link that needs manual checking.
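
The date comparison itself is a one-liner. The sketch below assumes you already have both dates as `datetime` objects (obtaining the creation date from WHOIS is not shown), and the function name is made up for this example.

```python
from datetime import datetime, timezone


def needs_manual_check(domain_created, link_added):
    """True when the registration is newer than the link.

    Both arguments must be datetimes of the same kind (both timezone-aware
    or both naive), otherwise Python raises TypeError on comparison.
    """
    return domain_created > link_added


# Example: the link was added in 2020, but the domain record dates from 2023.
flagged = needs_manual_check(
    datetime(2023, 3, 1, tzinfo=timezone.utc),
    datetime(2020, 5, 1, tzinfo=timezone.utc),
)
```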

Or: check http://example.com/ and http://example.com/xxxxxxrandomstringxxxxx . If those two pages are identical, you've got some sort of problem that needs manual checking. Either the primary page you wanted to link to is broken, or the domain is parked and all pages return the same content. This test is not 100% reliable, because some parked pages echo back elements from the URL.
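
A small sketch of this comparison, assuming a caller-supplied `fetch` function (hypothetical) that returns the response body as a string or `None` on failure. To soften the echo problem, it strips the random token from the probe response before comparing. That only helps when the page echoes the token verbatim, so it is a partial mitigation at best.

```python
import secrets
from urllib.parse import urljoin


def same_page_for_random_path(base_url, fetch):
    """True if the home page and a random path return the same body.

    fetch(url) is a caller-supplied function (hypothetical here) that
    returns the response body as a string, or None if the request failed.
    """
    token = "x" + secrets.token_hex(12)
    home = fetch(base_url)
    probe = fetch(urljoin(base_url, "/" + token))
    if home is None or probe is None:
        return False
    # Some parked pages echo the requested path back; drop the token so an
    # echoed URL does not hide an otherwise identical page.
    return home == probe.replace(token, "")
```

A `True` result only means the link deserves a manual look, since a broken primary page can produce the same outcome.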

If you just want to check an existing website, a service like http://www.linkalarm.com/ does this well.

兮颜 2024-07-19 01:06:07

You could just rely on your users to "Report this link"... which would put it into a queue to review later?
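
One possible shape for such a queue, as an in-memory sketch only. The class and method names are made up for this example, and a real system would presumably persist the reports.

```python
from collections import deque


class ReportQueue:
    """FIFO queue of reported links; repeat reports only bump a counter."""

    def __init__(self):
        self._pending = deque()
        self._counts = {}

    def report(self, url):
        """Record a user's "Report this link" click."""
        if url not in self._counts:
            self._pending.append(url)
            self._counts[url] = 0
        self._counts[url] += 1

    def next_for_review(self):
        """Return (url, report_count) for the oldest report, or None."""
        if not self._pending:
            return None
        url = self._pending.popleft()
        return url, self._counts.pop(url)
```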
