How should I handle bad user behavior on my website?
I am working on a project with a group: an experimental site built around heavy user posting and commenting. Given the theme of the site, we expect to get controversial posts and, most likely, offensive material.
My question is what algorithms, methods, etc. we can use to monitor and handle these "bad user" interactions with our website.
So far, the only thing we have come up with is checking posts against a database of person, college, and business names. This would make the posts somewhat anonymous and take some of the offense out of them. What else should or can we build into our design to accomplish this?
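For illustration, the name check described above could look roughly like the sketch below. Everything in it is an assumption rather than the actual implementation: the `KNOWN_NAMES` list stands in for the names database, and `build_name_pattern`, `redact_names`, and the placeholder text are hypothetical.

```python
import re

# Hypothetical sample; a real deployment would load these from the names database.
KNOWN_NAMES = ["John Smith", "Acme Corp", "State University"]


def build_name_pattern(names):
    """Compile one case-insensitive pattern that matches any listed name."""
    # Longest names first, so a longer entry wins over a shorter overlapping one.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def redact_names(text, pattern, placeholder="[name removed]"):
    """Replace every matched name with a neutral placeholder."""
    return pattern.sub(placeholder, text)


NAME_PATTERN = build_name_pattern(KNOWN_NAMES)
print(redact_names("John Smith from Acme Corp cheated.", NAME_PATTERN))
```

A single compiled alternation is fine for a few thousand names; a much larger list would probably call for a different lookup structure.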
Solution:
Everybody had really good suggestions, and I'm going to research them a little more. Regarding making a list: I have been experimenting with a small script I wrote that takes a collection of websites containing directories of names, with a substantial amount of data (3000-4000 names), parses the HTML, and stores each value in a database to be run against the user posts. This is a little "makeshift", but it will serve as a good tester for the time being.
Comments (3)
For some good background on the problem, along with some general suggestions, check out this transcript of a speech by Clay Shirky: "A Group Is Its Own Worst Enemy".
To steal directly from the StackOverflow podcast, rate limiting is one of the most effective methods. Set a reasonable minimum on how much time must elapse between a user's comments, and if someone posts faster than that, put them into a temporary "cool-off" period where they can't interact for a few minutes. If they keep bouncing against this limit, you may have a pathological abuser, and you might cool them off for longer, ask them nicely to refrain, etc.
Rate limiting will reduce flaming because one of the primary contributors to flame wars is that people get angry and start posting personal attacks rather than rational arguments. A forced pause will reduce this behavior somewhat.
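As a rough sketch of the cool-off idea, something like the following would do. The class name, the 30-second interval, the 2-minute base cool-off, and the doubling for repeat offenders are all my assumptions, not numbers from the podcast, and a real site would keep this state in a shared store rather than in process memory.

```python
import time


class CommentRateLimiter:
    """In-memory per-user rate limiter with escalating cool-off periods."""

    def __init__(self, min_interval=30.0, base_cooloff=120.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds required between comments
        self.base_cooloff = base_cooloff  # first cool-off length in seconds
        self.clock = clock                # injectable so the logic is testable
        self.last_post = {}               # user_id -> time of last accepted comment
        self.blocked_until = {}           # user_id -> time the cool-off ends
        self.strikes = {}                 # user_id -> violations so far

    def try_post(self, user_id):
        """Return True if the comment is accepted, False if it is rejected."""
        now = self.clock()
        blocked = self.blocked_until.get(user_id)
        if blocked is not None and now < blocked:
            return False  # still cooling off
        last = self.last_post.get(user_id)
        if last is not None and now - last < self.min_interval:
            # Posted too fast: start a cool-off that doubles with each strike.
            strikes = self.strikes.get(user_id, 0) + 1
            self.strikes[user_id] = strikes
            self.blocked_until[user_id] = now + self.base_cooloff * 2 ** (strikes - 1)
            return False
        self.last_post[user_id] = now
        return True
```

The strike count never decays here; whether and how quickly to forgive is a policy decision this sketch leaves open.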
Allowing people to flag offensive material is also valuable (allow each user to flag an item only once), but I would only show flagged items to moderators when the rate of flagging is fairly high. You need to filter out the "background noise", because almost anything you post is going to offend someone.
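Here is a minimal sketch of that flagging rule. I am assuming "rate of flagging" means distinct flaggers divided by views, with a minimum flag count as a floor; the `FlagTracker` class and both thresholds are hypothetical and would need tuning against real traffic.

```python
class FlagTracker:
    """Tracks one flag per user per post and decides when moderators should look."""

    def __init__(self, min_flags=3, min_rate=0.05):
        self.min_flags = min_flags  # ignore posts with fewer distinct flaggers
        self.min_rate = min_rate    # required flags-per-view ratio
        self.flaggers = {}          # post_id -> set of user ids who flagged it
        self.views = {}             # post_id -> number of views

    def record_view(self, post_id):
        self.views[post_id] = self.views.get(post_id, 0) + 1

    def flag(self, post_id, user_id):
        """Record a flag; return False if this user already flagged the post."""
        users = self.flaggers.setdefault(post_id, set())
        if user_id in users:
            return False
        users.add(user_id)
        return True

    def needs_review(self, post_id):
        """True when the flag rate is high enough to rise above background noise."""
        flags = len(self.flaggers.get(post_id, ()))
        if flags == 0 or flags < self.min_flags:
            return False
        # Every flagger must have seen the post, so views is at least flags.
        views = max(self.views.get(post_id, 0), flags)
        return flags / views >= self.min_rate
```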
It depends on how many users you have, how much tolerance you have for silliness (is it OK for an offensive post to stay up for a little while?), etc.
One possibility would be to require users to create user accounts (suitably CAPTCHAed to prevent automated account creation) before they can post. Then delete offensive posts (and the corresponding accounts) as necessary.
There are different ways to identify offensive posts. One standard Web 2.0 technique is to let users flag each other's posts as offensive. This can make it easier for admins to catch them.
To stop angry people, I'm a huge fan of the "Flag this post" link. Your community will do most of the moderation for you.
To stop reasonable people from posting something inflammatory, you can try being clever. Make a long list of really strong words (curse words being the strongest, obviously) and score each appropriately. If a post's word strength score (adjusted for post word count) crosses a threshold, display a big red warning and suggest that the poster consider rewording. If they hit submit anyway, put the post into the moderation queue instead of publishing it immediately.
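A quick sketch of that scoring scheme follows. The word list, the weights, and the 0.2 threshold are made-up placeholders, and dividing by word count is just one way to read "adjusted for post word count".

```python
import re

# Hypothetical weights; a real list would be much longer and carefully tuned.
WORD_STRENGTH = {"idiot": 3, "moron": 3, "stupid": 2, "hate": 2}


def strength_score(text, weights=WORD_STRENGTH):
    """Sum of word weights divided by the number of words in the post."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(weights.get(word, 0) for word in words) / len(words)


def route_submission(text, confirmed, threshold=0.2):
    """Decide what happens to a post.

    'publish'          -> score is at or under the threshold
    'warn'             -> over the threshold, poster has not yet confirmed
    'moderation_queue' -> over the threshold and the poster submitted anyway
    """
    if strength_score(text) <= threshold:
        return "publish"
    return "moderation_queue" if confirmed else "warn"
```

Normalizing by length keeps a long, mostly calm post with one strong word from tripping the warning, while a short outburst still does.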
To stop spammers, I'm a huge fan of the cryptographic nonce + hashing function performed in JavaScript + cookie replay technique. It needs no visual space for an ugly CAPTCHA and performs equivalently in practice. I've yet to see a spammer go through the hurdles required to defeat it in an automated way. I have, though, seen confused spammers enter spam by hand after their automated systems were rejected every single time.
And totally read that Clay Shirky link from the other answer. Understanding community dynamics is key.
Addendum: Implementing a non-interactive CAPTCHA.
From the page, make an AJAX query to the server for a nonce. The server sends back a JSON response containing the nonce and also sets a cookie containing the nonce value. In JavaScript, calculate the SHA1 hash of the nonce and copy the value into a hidden form field. When the user POSTs the form, the browser sends the cookie back with the nonce value. On the server, calculate the SHA1 hash of the nonce from the cookie, compare it to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
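The server side of those steps can be sketched like this. It is framework-agnostic and makes several assumptions: a plain dict stands in for memcached, the function names are hypothetical, and treating each nonce as single-use is my addition rather than part of the description above. The browser-side half (fetch the nonce, hash it in JavaScript, fill the hidden field) is not shown.

```python
import hashlib
import hmac
import secrets
import time

NONCE_TTL_SECONDS = 15 * 60  # nonces are valid for 15 minutes

# Stand-in for memcached: nonce -> time it was issued.
_issued_nonces = {}


def issue_nonce(now=None):
    """Create a nonce; the caller returns it as JSON and sets it as a cookie."""
    now = time.time() if now is None else now
    nonce = secrets.token_hex(16)
    _issued_nonces[nonce] = now
    return nonce


def expected_hash(nonce):
    """SHA1 hex digest the browser's JavaScript is expected to compute."""
    return hashlib.sha1(nonce.encode("utf-8")).hexdigest()


def verify_submission(cookie_nonce, hidden_field, now=None):
    """Check the posted form: known nonce, not expired, and matching hash."""
    now = time.time() if now is None else now
    # pop() makes each nonce single-use (an assumption of this sketch).
    issued_at = _issued_nonces.pop(cookie_nonce, None)
    if issued_at is None or now - issued_at > NONCE_TTL_SECONDS:
        return False
    return hmac.compare_digest(
        expected_hash(cookie_nonce).encode("utf-8"),
        hidden_field.encode("utf-8"),
    )
```

A submission that fails `verify_submission` does not have to be dropped outright; it can go to the moderation queue instead.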
This technique requires that the spammer sit down and figure out what's going on, and once they do, they still have to fire off multiple requests and maintain state to get a comment through. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam, which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.