自动完成最近/最常联系的算法?

发布于 2024-07-07 14:31:33 字数 519 浏览 9 评论 0原文

我们有一个自动完成列表,当您向某人发送电子邮件时,该列表就会填充,这一切都很好,直到列表变得非常大,您需要输入越来越多的地址才能到达您想要的地址,这就是针对自动完成的目的,

我认为应该添加一些逻辑,以便自动完成结果应该按最近接触或最常接触的某些功能排序,而不仅仅是按字母顺序排序。

我想知道是否有任何已知的良好算法可用于此类搜索,或者是否有人有任何建议。

我在想的是一个积分系统,比如当天是 5 分,最近三天是 4 分,上周是 3 分,上个月是 2 分,过去 6 个月是 1 分。 那么大多数情况下,25+ 是 5 分,15+ 是 4 分,10+ 是 3 分,5+ 是 2 分,2+ 是 1 分。除了这些数字“感觉”正确之外,没有真正的逻辑。

除了任意挑选的数字之外,还有人有任何输入吗? 如果您能给出一个您认为它们比我的更好的理由,也欢迎其他数字

编辑:这主要是在商业环境中,其中最近性(是的,用于造词)通常与频率一样重要。 而且,超过某个点后,与你交谈过 80 次的人与与你交谈过 30 次的人实际上并没有太大区别。

We have an auto-complete list that's populated when an you send an email to someone, which is all well and good until the list gets really big you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete

I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.

What I want to know is if there's any known good algorithms for this kind of search, or if anyone has any suggestions.

I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.

Other than just arbitrarily picked numbers does anyone have any input? Other numbers also welcome if you can give a reason why you think they're better than mine

Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

-黛色若梦 2024-07-14 14:31:33

看一下自组织列表。

快速而肮脏的外观:

移至最前面启发式:
链表,每当选择一个节点时,它就会移动到链表的前面。

频率启发式:
链表,每当选择一个节点时,其频率计数就会递增,然后该节点向链表的前面冒泡,因此最常访问的位于链表的头部。

看来转向前端实施最适合您的需求。

编辑:选择地址后,将其频率加一,并移动到具有相同权重的节点组的前面(或用于课程分组的(权重 div x))。 我认为老化是您提议的实施中的一个真正问题,因为它需要计算每个项目的权重。 自组织列表是一个很好的方法,但算法需要进行一些调整才能完成您想要的操作。

进一步编辑:
老化是指权重随着时间的推移而减少,这意味着您需要知道每次使用地址的情况。 这意味着,在构建列表时,您必须拥有可用的完整电子邮件历史记录。

问题是我们希望仅在实际访问节点时才在节点上执行计算(而不是搜索)——这为我们提供了良好的统计性能。

Take a look at Self organizing lists.

A quick and dirty look:

Move to Front Heuristic:
A linked list, Such that whenever a node is selected, it is moved to the front of the list.

Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.

It looks like the move to front implementation would best suit your needs.

EDIT: When an address is selected, add one to its frequency, and move to the front of the group of nodes with the same weight (or (weight div x) for courser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.

Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.

The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.

苍风燃霜 2024-07-14 14:31:33

这种事情看起来类似于 Firefox 在提示您正在输入的网站是什么时所做的事情。

不幸的是,我不知道 Firefox 到底是如何做到的,积分系统似乎也不错,也许你需要平衡你的积分:)

我会选择类似的东西:

NoM = 邮件数量

(NoM 今天发送到 X) ) + 1/2 *(上周发送给 X 的 NoM)/7 + 1/3 *(上个月发送给 X 的 NoM)/30

上个月您未写过的联系人(可以更改)将会得到 0 分。 您可以开始对总共发送的 NoM 进行排序(因为它位于联系人列表中:)。 这些将在与点 > 接触后显示。 0

这只是一个想法,无论如何它是对最多和刚刚邮寄的联系人给予不同的重视。

This kind of thing seems similar to what is done by firefox when hinting what is the site you are typing for.

Unfortunately I don't know exactly how firefox does it, point system seems good as well, maybe you'll need to balance your points :)

I'd go for something similar to:

NoM = Number of Mail

(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30

Contacts you did not write during the last month (it could be changed) will have 0 points. You could start sorting them for NoM sent in total (since it is on the contact list :). These will be showed after contacts with points > 0

It's just an idea, anyway it is to give different importance to the most and just mailed contacts.

娇女薄笑 2024-07-14 14:31:33

如果您想变得疯狂,请通过以下几种方式之一标记最“活跃”的电子邮件:

  • 上次访问
  • 使用频率
  • 与待销售的联系人
  • 直接老板
  • 等等

然后,在列表顶部显示活动电子邮件。 注意您的用户最常使用哪个“组”。 在收集到足够的数据后,专门切换到该排序策略。

这是很多工作,但也很有趣……

If you want to get crazy, mark the most 'active' emails in one of several ways:

  • Last access
  • Frequency of use
  • Contacts with pending sales
  • Direct bosses
  • Etc

Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.

It's a lot of work but kind of fun...

情绪 2024-07-14 14:31:33

也许可以计算发送到每个地址的电子邮件数量。 然后:

ORDER BY EmailCount DESC, LastName, FirstName

这样,您最常用的地址就会排在第一位,即使它们已经几天没有使用过。

Maybe count the number of emails sent to each address. Then:

ORDER BY EmailCount DESC, LastName, FirstName

That way, your most-often-used addresses come first, even if they haven't been used in a few days.

鱼忆七猫命九 2024-07-14 14:31:33

我喜欢基于积分的系统的想法,其中包含最近使用的积分、使用频率和潜在的其他因素(更喜欢本地域中的联系人?)。

我曾经开发过一些这样的系统,“最近使用的”和“最常用的”都效果不佳。 如果您不小心输错了一次,那么“最近”可能会很痛苦。 或者,“最常用”不会随着时间的推移而发生太大变化,例如,如果您去年与某人有很多接触,但现在您的工作发生了变化。

获得想要使用的一组测量值后,您可以创建一个交互式应用程序来测试不同的权重,并查看哪些权重可为某些样本数据提供最佳结果。

I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).

I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.

Once you have the set of measurements you want to use, you could create an interactive apoplication to test out different weights, and see which ones give you the best results for some sample data.

执手闯天涯 2024-07-14 14:31:33

本文介绍了单参数缓存逐出系列包括最近最少使用和最不经常使用的策略作为特殊情况的策略。

参数 lambda 的范围从 0 到 1。当 lambda 为 0 时,它的执行方式与 LFU 缓存完全相同,当 lambda 为 1 时,它的执行方式与 LRU 缓存完全相同。 在 0 和 1 之间,它以自然的方式结合了新近度和频率信息。

This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.

The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache, when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.

长发绾君心 2024-07-14 14:31:33

尽管已经选择了答案,但我仍想提交我的方法以供考虑和反馈。

我会通过每次使用时增加一个计数器来解释频率,但增加一些大于 1 的值,例如 10(为了增加第二点的精度)。

我会通过将所有定期间隔(例如,24 小时)的计数器乘以某个递减器(例如,0.9)来考虑新近度。

每次使用:

UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = '[email protected]'

每个间隔:

UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)

通过这种方式,我将频率和新近度折叠到一个字段,避免需要保留详细的历史记录来导出{最后一天、上周、上个月}并保持数学(大部分)整数。

当然,增量和减量必须根据偏好进行调整。

In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.

I would account for frequency by incrementing a counter each use, but by some larger-than-one value, like 10 (To add precision to the second point).

I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).

Each use:

UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = '[email protected]'

Each interval:

UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)

In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.

The increment and diminisher would have to be adjusted to preference, of course.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文