Hunt cheaters in a voting competition
We are currently running a competition, and it is going very well. Unfortunately, the cheaters are back in business, running scripts that automatically vote for their entries. We have already spotted some cheaters by looking through the database entries by hand: 5-star ratings from the same browser exactly every 70 minutes, for example. Now that the user base is growing, it gets harder and harder to identify them.
What we do so far:
- We store the IP and the browser and block that combination for one hour. Cookies won't help against these guys.
- We also use a CAPTCHA, which has been broken.
Does anyone know how we could find patterns in our database with a PHP script, or how we could block them more efficiently?
Any help would be much appreciated...
Direct feedback elimination
This is more of a general strategy that can be combined with many of the other methods. Don't let the spammer know if he succeeds.
You can hide the current results altogether, show only percentages without the absolute number of votes, or delay the display of the votes.
Vote flagging
Also a general strategy. If you have some reason to assume that a vote comes from a spammer, count the vote anyway but mark it as invalid, then delete the invalid votes at the end.
Captcha
Use a CAPTCHA. If your CAPTCHA is broken, use a better one.
IP checking
Limit the number of votes an IP address can cast in a timespan.
Referrer checking
If you assume that one user maps to one IP address, you can limit the number of votes from that IP address. However, this assumption usually only holds true for private households.
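A per-IP limit over a time span could look roughly like the sketch below. It is a minimal in-memory illustration, not a production design: the class name `VoteRateLimiter`, its parameters, and the example addresses are all assumptions made for this example, and a real site would keep the timestamps in its database. The same logic ports directly to PHP.

```python
import time
from collections import defaultdict, deque


class VoteRateLimiter:
    """Allow at most max_votes per IP within a sliding window (hypothetical helper)."""

    def __init__(self, max_votes=1, window_seconds=3600):
        self.max_votes = max_votes
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of accepted votes

    def allow(self, ip, now=None):
        """Return True and record the vote if this IP is still under its limit."""
        now = time.time() if now is None else now
        stamps = self._history[ip]
        # Forget votes that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_votes:
            return False
        stamps.append(now)
        return True
```

Because whole offices or schools can share one address, rejected votes are probably better flagged than silently dropped.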
Email Confirmation
Use Email confirmation and only allow one vote per Email. Check your database manually to see if they are using throwaway-emails.
Note that you can add +foo to the user name in an email address. The plain address and the +foo variant will both deliver the mail to the same account, so remember that when checking whether somebody has already voted.
HTML Form Randomization
Randomize the order of choices. This might take a while for them to find out.
HTTPS
One method of vote faking is to capture the HTTP request from a valid browser like Firefox and mimic it with a script. This doesn't work as easily when you use encryption.
Proxy checking
If the spammer votes via proxy, you can check for the X-Forwarded-For header.
Cache checking
Try to see if the client loads all the uncached resources.
Many spambots don't do this. I have never tried it myself; I just know that voting sites usually don't check for it.
An example would be embedding
<img src="a.gif" />
in your HTML, with a.gif being some 1x1 pixel image. Then you have to set the HTTP header Cache-Control "no-cache, must-revalidate" on the response to the request GET /a.gif. You can set the HTTP headers in Apache with your .htaccess file like this. (thanks Jacco)
[Edit 2010-09-22]
Evercookie
Have you tried browser fingerprinting?
Check out this open-source project from the EFF:
https://panopticlick.eff.org/
It could be used to narrow a visitor down to a group of about 500-1500 similar people in the world (!).
You could add a CAPTCHA to the voting form. Requiring e-mail confirmation would also be useful.
If you're really worried about it, then you have to do something like email verification, which might be sufficient to block most cheaters.
It also depends on whether multiple people behind a NAT are likely to want to vote for the same option (e.g. favourite school).
Any scheme you create can be gamed.
EDIT: As everyone else has suggested, you can use a CAPTCHA such as reCAPTCHA to block automated bots and make humans less likely to vote repeatedly, at the cost of making humans less likely to vote at all.
The Vote to Promote pattern (you may be aware of it) has a section on how to mitigate gaming, but gaming is tricky to avoid altogether. Given your actions to date, I would consider using weighting: decide on a reasonable level of voting over a time period, say 10 votes per item per hour (just an example, not a guide), and for surplus votes weight the next 10 at 90% (i.e. only count 9), the next 10 at 80%, and so on. Yahoo's description of the pattern includes its own advice on gaming.
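To illustrate the band weighting described in this answer (a sketch of the idea above, not Yahoo's text), here is a small example. The function name `weighted_vote_count` and its parameters are made up for this example, and the band size and step are only the example figures from the answer. It uses integer percentages internally to avoid floating-point drift.

```python
def weighted_vote_count(raw_votes, band_size=10, step_percent=10):
    """Count one hour of votes for one item in bands of decreasing weight.

    The first band counts at 100%, the next at 90%, then 80%, and so on.
    Once the weight reaches 0%, further votes add nothing.
    """
    counted_percent = 0      # running total, in hundredths of a vote
    remaining = raw_votes
    weight_percent = 100
    while remaining > 0 and weight_percent > 0:
        in_band = min(band_size, remaining)
        counted_percent += in_band * weight_percent
        remaining -= in_band
        weight_percent -= step_percent
    return counted_percent / 100
```

With these defaults, 20 raw votes in an hour count as 19, and no item can gain more than 55 counted votes per hour however many raw votes arrive.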
Check out Asirra: http://research.microsoft.com/en-us/um/redmond/projects/asirra/
It's still in beta, but it's pretty cool.
To prevent bots from voting, you can use a CAPTCHA.
The only thing that comes to mind is using a CAPTCHA: either an elaborate one with pictures and noise, like the reCAPTCHA service, or a very simple and unobtrusive one like "What is seven plus three?" or (if you're located in the US) "What is the last name of our President?", i.e. simple common-sense questions everybody can answer. If you change them often enough, this could be even more effective than a classic image-based CAPTCHA.
CAPTCHAs aren't a silver bullet: users could have their script display the CAPTCHA to them and solve it manually, which still allows at least several votes per minute.
You need to use them in combination with other techniques mentioned here.
You could add a honeypot field like in Django. Most likely this will not protect you from cheaters who deliberately want to skew your competition, but at least you will have fewer 'drive-by' spammers to deal with on top of that.
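A honeypot check needs very little server-side code. The sketch below is framework-agnostic and does not use any Django API; the field name `website` and the function `is_probable_bot` are assumptions chosen for illustration. The idea is that the form contains an extra input hidden from humans with CSS, so only a script that fills in every field will send a value for it.

```python
HONEYPOT_FIELD = "website"  # hypothetical field name, hidden from humans with CSS


def is_probable_bot(form_data):
    """Return True when the hidden honeypot field came back non-empty.

    form_data is assumed to be a dict of submitted field names to string values.
    """
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

Combined with the vote-flagging idea from earlier in the thread, a positive result would mark the vote as invalid rather than show an error, so the bot gets no feedback.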
Sorry for the double post, but I wasn't allowed to post two URLs in the same post...
If you're looking at building your own tracking, maybe this link might provide some inspiration: https://panopticlick.eff.org/
Turns out that a lot of browsers can be uniquely identified, even without any form of tracking cookies. I'm guessing a vote-bot might give a very specific fingerprint?
So if anyone ever wants to run a competition where people can win something and wants to use a community-driven rating system... here I share some experiences:
The bad:
1) First, it can't be made 100% secure
2) Reaching a mass of users large enough to filter out all the nonsense ratings is very hard
3) Forget about star ratings in that case... there are always either 5 stars or 1 star
The good:
1) Don't give them any orientation about where they stand... We replaced the "Order by place" view with a random presentation of the TOP 100 (only the top 30 will win a prize)... This really helped, because a lot of users lost interest as soon as they couldn't see where they stood.
2) Don't allow voting patterns like 1x5_Stars 40x1_Star... Only allow users who vote in a fair way...
3) Most of them act a little bit stupid... You'll see them in your logs and can trace who votes fairly and who doesn't... Search for patterns...
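One way to search for the "1x5_Stars 40x1_Star" pattern is to look at each voter's rating distribution. The sketch below is only a heuristic under stated assumptions: the function name `looks_like_downvote_campaign` and the thresholds `min_votes` and `one_star_share` are invented for this example and would need tuning against real data.

```python
from collections import Counter


def looks_like_downvote_campaign(ratings, min_votes=10, one_star_share=0.9):
    """Flag a voter whose ratings are almost exclusively 1-star.

    ratings is the list of star values (1 to 5) that one voter has cast.
    Voters with fewer than min_votes ratings are never flagged.
    """
    if len(ratings) < min_votes:
        return False
    counts = Counter(ratings)
    return counts[1] / len(ratings) >= one_star_share
```

A voter with one 5-star rating and forty 1-star ratings is flagged, while a voter with an even spread of ratings is not.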
**GOOD LUCK ;-) **
A CAPTCHA is always good, though it might be "disturbing" for some users.
reCAPTCHA is a fairly widely used service.
How about only allowing users who have logged in with OpenID and passed a reCAPTCHA before submitting the vote, and monitoring the list of submitters that share the same IP address?
We use a combination of CAPTCHA and email. The user receives a link with a GUID by mail.
The GUID must be unique for each user who tries to vote.
www.votesite.com/vote.aspx?guid=.....
Following this link confirms the vote; otherwise it stays unconfirmed. In the database we check that the combination of email address and GUID is unique.
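The confirmation flow could be sketched as below, in Python rather than the ASP.NET the link suggests. Everything here is illustrative: `PendingVotes` and `normalize_email` are hypothetical names, and storage is an in-memory dict standing in for the database. The normalization step folds +foo aliases (mentioned earlier in the thread) into one address. It assumes the mail provider ignores the +suffix and treats addresses case-insensitively, which is common but not universal.

```python
import uuid


def normalize_email(address):
    """Lower-case the address and strip a +suffix from the user name (assumption)."""
    local, _, domain = address.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


class PendingVotes:
    """In-memory stand-in for the table of votes awaiting email confirmation."""

    def __init__(self):
        self._by_token = {}   # token -> (email, choice)
        self._emails = set()  # normalized emails that already requested a vote

    def request_vote(self, email, choice):
        """Return a fresh token to mail out, or None if this email already voted."""
        email = normalize_email(email)
        if email in self._emails:
            return None
        token = uuid.uuid4().hex
        self._emails.add(email)
        self._by_token[token] = (email, choice)
        return token

    def confirm(self, token):
        """Consume the token; returns (email, choice) once, then None."""
        return self._by_token.pop(token, None)
```

The token is single-use, so replaying the confirmation link does not cast a second vote.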
I use a combination of CAPTCHA, IP verification and LSO (Flash Local Shared Objects, hard to find and delete for common people).
1. Use reCAPTCHA.
2. Yes, randomize your voting options, but not like this:
-> from vote_id_1 to asdsasd_1, grdsgsdg_2,
Instead, use session variables to set a mask from vote_id_1 to something like asgjdas87th2ad in the vote form, so the field name carries no recognizable suffix.
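A session-based mask of this kind could be sketched as follows. The session is modelled as a plain dict, and the function names `build_vote_form` and `resolve_vote` and the key `vote_mask` are assumptions for illustration. A real application would use its framework's session object instead.

```python
import secrets


def build_vote_form(session, option_ids):
    """Give every option a fresh random field name and store the mapping in the session.

    Returns the masked names in the same order as option_ids, ready to be
    used as the field names or values in the rendered form.
    """
    mask = {secrets.token_hex(8): option_id for option_id in option_ids}
    session["vote_mask"] = mask
    return list(mask.keys())


def resolve_vote(session, submitted_name):
    """Translate a submitted masked name back to the real option id.

    The mask is removed from the session on first use, so a captured
    request cannot simply be replayed. Returns None for unknown names.
    """
    mask = session.pop("vote_mask", {})
    return mask.get(submitted_name)
```

Since the names change on every page load, a script has to fetch and parse the form each time instead of posting a fixed field name.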
What about some post hoc stochastic analysis, like time series analysis: looking for periodicity in events with a particular (ip, browser, vote) combination? You could then assign each such group of events a probability that it belongs to one person, and either discard all groups above some probability level or use some kind of weighting to lower their weight according to that probability. Look into R; it contains A LOT of useful analysis packages.
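A very simple stand-in for a full time series analysis is to check how regular the gaps between votes are within each (ip, browser, vote) group. The sketch below flags groups whose inter-vote intervals have a low coefficient of variation, as a script voting "exactly every 70 minutes" would. The function name `find_periodic_voters`, the tuple layout, and the thresholds are assumptions for this example, and it is written in Python rather than the PHP or R mentioned in the thread.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_periodic_voters(votes, min_votes=5, max_cv=0.05):
    """Find (ip, browser, entry) groups that vote at suspiciously regular intervals.

    votes is an iterable of (ip, browser, entry, unix_timestamp) tuples.
    Returns a dict mapping each suspicious group to its mean interval in seconds.
    """
    by_key = defaultdict(list)
    for ip, browser, entry, timestamp in votes:
        by_key[(ip, browser, entry)].append(timestamp)

    suspects = {}
    for key, stamps in by_key.items():
        if len(stamps) < min_votes:
            continue  # too few events to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average = mean(gaps)
        # Coefficient of variation: spread of the gaps relative to their mean.
        if average > 0 and pstdev(gaps) / average <= max_cv:
            suspects[key] = average
    return suspects
```

Scripts that add random jitter would slip past this check, so it is best treated as one signal among several.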
Check the domain details of the email they are using. I had the same problem and found that all of them were registered to the same registrant. I wrote it up here: http://tincan.co.uk/659/news/competition-spammers.html
Now, I filter on the DNS information for the email used in the registration.