How to continually filter data that is interesting to the user?
Take an example of a question/answer site with a 'browse' slideshow that will show one question/answer page at a time. The user clicks the 'next' button and a new question/answer is presented to him.
I need to decide which pages should be returned each time the user clicks 'next'. Some things I don't want and reasons why:
Showing 'newest' questions in descending order:
Say 100 questions get entered; no user is going to click through to the 100th item, so it will never get any responses. It also means that if no new questions were asked recently, the user will see the same stale data every time he visits the site.
Showing 'most active' questions, judged by a lot of suggested answers/comments:
This won't return the questions that have low activity, which are exactly the ones that need more visibility.
Showing 'low activity' questions, judged by not a lot of answers/comments:
Once a question starts getting activity, it'll stop being shown. This will stymie the activity on a question, when I'd really like to encourage discussion.
I feel that a mix of these would work well, but I'm unsure of how to judge which pages should be returned. I'll stress that I don't want the user to have to choose which category of items to view (like how SO has the unanswered/active/newest filters).
Are there any common practices for doing this, or any ideas for how it might be done?
Thanks!
Edit:
Here's what I'm leaning towards so far, with much thanks to Tim's comment:
So far I'm thinking of ranking pages by Activity Count / View Count, where the activity count is incremented each time a user performs an action on a page, like a vote, comment, answer, etc. The view count is incremented each time a person views the page.
I'll then rank all pages by their activity/view ratio and show pages with a high ratio more often. This way pages with low activity and high views will be shown the least, while ones with high activity and low views will be shown most frequently. Low activity/low views and high activity/high views will be somewhere in the middle I imagine, but I'll have to keep a close eye on this in the beta release. I also plan on storing which pages the user has viewed in the past 24 hours so they won't see any repeats in the slideshow in a given day.
Some ideas for preventing 'stale' data (if all the above doesn't seem to prevent it): Perhaps run a cron job which will periodically check for pages that haven't been viewed recently and boost their ratio to put them at the top.
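A minimal sketch of the ranking idea described in the edit above, to make the moving parts concrete. Everything named here (`Page`, `ratio`, `next_page`, the `+ 1` in the denominator, the 7-day staleness window and the size of the boost) is an assumption made for illustration, not something taken from an existing site. For simplicity it always returns the highest-ratio page the user has not seen in the last 24 hours, and it folds the staleness boost into the ratio at query time instead of running a cron job.

```python
import time
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass
class Page:
    page_id: int
    activity: int = 0         # votes, comments, answers, ...
    views: int = 0            # total number of views
    last_viewed: float = 0.0  # Unix timestamp of the most recent view


def ratio(page, now, stale_after=7 * DAY_SECONDS, stale_boost=1.0):
    """Activity/view ratio, boosted for pages nobody has viewed recently."""
    # The +1 avoids division by zero for never-viewed pages (assumption).
    base = page.activity / (page.views + 1)
    if now - page.last_viewed > stale_after:
        base += stale_boost
    return base


def next_page(pages, seen_at, now=None):
    """Return the best page this user has not seen in the last 24 hours.

    seen_at maps page_id -> timestamp at which this user last saw the page.
    Returns None when every page was seen within the last day.
    """
    if now is None:
        now = time.time()
    candidates = [
        p for p in pages
        if now - seen_at.get(p.page_id, float("-inf")) >= DAY_SECONDS
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: ratio(p, now))
```

On each click of 'next', the caller would record `seen_at[page.page_id] = now` and increment `page.views` for the returned page, so the same page is skipped for the rest of the day and its ratio drops as views accumulate without activity.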
Comments (1)
As I see it, you are touching upon two interesting questions:
How to define that a post is interesting to a user: Here you could take a weighted combination of various factors that contribute to the interestingness of a post: the amount of activity, how fresh the entry is, whether the item matches the user's interests (if you have a way of knowing that), and so on. You could pick the weights based on intuition and see how well the result matches your expectations. If you have the time and inclination, you could collect data on how well your users respond to the entries and try to learn the optimum weight for each factor using machine learning techniques.
How to give new posts a chance, otherwise known as the exploration-exploitation tradeoff.
Basically, if you only ever show entries that are already known to be interesting, you will maximize instantaneous user happiness, but you will never learn about new interesting stuff, so in the long run your users end up less happy.
This is a very well-studied problem, and depending upon how much you want to get into it, you can read up on the literature on things like k-armed bandit problems.
But a simple solution would be to not pick the entry with the highest score, but to pick an entry from a probability distribution in which high-scoring entries have a higher probability of showing up. This way you show interesting stuff most of the time, but every post has a chance to show up occasionally.
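Both points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factor names, the default weights, the freshness formula and the `floor` parameter are all made up for the example, and the sampling uses the standard library's `random.choices`. The `floor` is added to every score so that even a zero-score post keeps a small, nonzero chance of being shown.

```python
import random


def interest_score(activity, age_hours, topic_match, weights=(1.0, 1.0, 1.0)):
    """Weighted combination of factors; the factors and weights are guesses."""
    w_activity, w_fresh, w_match = weights
    freshness = 1.0 / (1.0 + age_hours)  # 1.0 for a brand-new post, decays with age
    return w_activity * activity + w_fresh * freshness + w_match * topic_match


def pick_entry(scores, floor=0.1, rng=random):
    """Sample one entry id with probability proportional to score + floor.

    scores maps entry_id -> non-negative score. High scorers appear most
    often, but the floor gives every entry an occasional chance.
    """
    ids = list(scores)
    weights = [max(scores[i], 0.0) + floor for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

For example, with scores `{"a": 9.0, "b": 0.9, "c": 0.0}` and the default floor, entry `"a"` is drawn roughly 89% of the time, `"b"` about 10%, and `"c"` about 1%. Raising `floor` shifts the balance toward exploration; lowering it favors the known high scorers.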