Website content scraping
We have a Business Listings directory hosted on IIS 6 Windows 2003. Our competitors crawl and steal our content and customers.
We have tried IP blocking using honeypot URLs and log parsing without much success. Is anyone aware of a network device or a proxy server that I can run in front of my web server to minimize this issue?
All suggestions are highly appreciated.
Comments (2)
You could try a spider trap, but they could add a check for that.
You could also add a rate limiter, and after a certain rate force them to solve a CAPTCHA, but you might also annoy your regular users.
But really, anything you create they can probably adapt to and work around. Your best bet might just be what Developer Art said: get a lawyer.
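To make the rate limiter idea concrete, here is a minimal sketch of a per-IP sliding-window counter that signals when a CAPTCHA should be shown. The class name `SlidingWindowLimiter`, the thresholds, and the `"allow"`/`"captcha"` return values are all hypothetical; wiring it into IIS (or a proxy in front of it) is left out, and state is kept in memory only.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Counts requests per IP inside a sliding time window (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # assumed threshold, tune per site
        self.window_seconds = window_seconds  # assumed window length
        self._hits = defaultdict(deque)       # ip -> timestamps of recent requests

    def check(self, ip, now):
        """Record a request at time `now` (seconds) and return a decision."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        # Over the limit: ask the client to solve a CAPTCHA instead of blocking.
        return "captcha" if len(hits) > self.max_requests else "allow"
```

Passing `now` in explicitly (for example `time.time()`) keeps the sketch easy to test. A generous threshold reduces the chance of annoying regular users, at the cost of letting slow scrapers through.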
If there are many pages of data, you can monitor the IPs of visitors and make sure a given IP sees no more than a fraction of your pages per day.
Ultimately what you want is a contradiction: you do want people to download it to their computers (to view it now); but you don't want them to download it to their computers (to view it later).
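The per-IP daily cap could be sketched roughly as follows. `DailyPageQuota`, `total_pages`, and `max_fraction` are made-up names for illustration, and the counts live in memory; a real deployment would need shared storage and would still be defeated by a scraper that rotates IPs.

```python
from collections import defaultdict


class DailyPageQuota:
    """Limits how many distinct pages one IP may view per day (illustrative only)."""

    def __init__(self, total_pages, max_fraction):
        # Largest number of distinct pages a single IP may see in one day.
        self.limit = int(total_pages * max_fraction)
        self._seen = defaultdict(set)  # (ip, day) -> set of page ids already served

    def allow(self, ip, day, page_id):
        """Return True if this IP may view page_id on the given day."""
        seen = self._seen[(ip, day)]
        if page_id in seen:
            return True   # re-reading a page does not consume quota
        if len(seen) >= self.limit:
            return False  # quota of distinct pages used up for today
        seen.add(page_id)
        return True
```

Counting distinct pages rather than raw requests means a visitor who reloads the same listing is not penalised, while a crawler walking the whole directory hits the cap quickly.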