Aggregating Tally counters
Many times I find myself counting occurrences with Tally[ ] and then, once I have discarded the original list, having to add (and join) the results from another list to that list of counters.
This typically happens when I am counting configurations, occurrences, doing some discrete statistics, etc.
So I defined a very simple but handy function for Tally aggregation:
aggTally[listUnTallied__List:{},
listUnTallied1_List,
listTallied_List] :=
Join[Tally@Join[listUnTallied, listUnTallied1], listTallied] //.
{a___, {x_, p_}, b___, {x_, q_}, c___} -> {a, {x, p + q}, b, c};
Such that
l = {x, y, z}; lt = Tally@l;
n = {x};
m = {x, y, t};
aggTally[n, {}]
{{x, 1}}
aggTally[m, n, {}]
{{x, 2}, {y, 1}, {t, 1}}
aggTally[m, n, lt]
{{x, 3}, {y, 2}, {t, 1}, {z, 1}}
This function has two problems:
1) Performance
Timing[Fold[aggTally[Range@#2, #1] &, {}, Range[100]];]
{23.656, Null}
(* functionally equivalent to *)
Timing[s = {}; j = 1; While[j < 100, s = aggTally[Range@j, s]; j++]]
{23.047, Null}
2) It does not validate that the last argument is a genuine tally list or an empty list (less important to me, though)
Is there a simple, elegant, faster and more effective solution? (I understand that these are too many requirements, but wishing is free)
Comments (4)
Perhaps, this will suit your needs?
The timings are much better, and there is a pattern-based check on the last arg.
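The code for this answer did not survive on this page, so here is only a rough sketch of the general idea (merge everything once, group by item, and check the last argument with a pattern). The names tallyListQ and aggTallySketch are made up for illustration and are not the answerer's functions.

```mathematica
(* Hypothetical helper: a tally list is a (possibly empty) list of
   {item, count} pairs with integer counts. *)
tallyListQ[t_] := MatchQ[t, {{_, _Integer} ...}]

(* Hypothetical sketch: tally all raw lists in one go, append the
   already-tallied list, group pairs by item and add up the counts.
   The definition only applies if the last argument passes tallyListQ. *)
aggTallySketch[raw___List, tallied_?tallyListQ] :=
 {#[[1, 1]], Total[#[[All, 2]]]} & /@
  GatherBy[Join[Tally[Join[raw]], tallied], First]

(* Same inputs as in the question *)
m = {x, y, t}; n = {x}; lt = Tally[{x, y, z}];
aggTallySketch[n, {}]
(* {{x, 1}} *)
aggTallySketch[m, n, lt]
(* {{x, 3}, {y, 2}, {t, 1}, {z, 1}} *)
```

Because GatherBy makes a single pass over the joined list, this avoids the repeated rescanning that //. does with three ___ patterns. With an invalid last argument the call simply stays unevaluated.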
EDIT:
Here is a faster version:
The timings for it:
The following solution is just a small modification of your original function. It applies Sort before using ReplaceRepeated and can thus use a less general replacement pattern, which makes it much faster:
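The code itself is missing here, so the following is a minimal sketch of the Sort-then-ReplaceRepeated idea under my own assumptions; aggTallySorted is a hypothetical name. After sorting, pairs with the same item sit next to each other, so the rule only has to match adjacent pairs and the middle b___ from the question is no longer needed.

```mathematica
(* Hypothetical sketch: sort the joined tally pairs so equal items are
   adjacent, then repeatedly merge neighbouring pairs with the same item. *)
aggTallySorted[raw___List, tallied_List] :=
 Sort[Join[Tally[Join[raw]], tallied]] //.
  {a___, {k_, p_}, {k_, q_}, c___} :> {a, {k, p + q}, c}

aggTallySorted[{x, y, t}, {x}, Tally[{x, y, z}]]
(* {{t, 1}, {x, 3}, {y, 2}, {z, 1}} *)
```

Note that the result comes back in sorted order rather than in order of first appearance, which differs from the question's output. I used :> instead of -> so that the right-hand side is not evaluated before the pattern variables are bound.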
Here's the fastest thing I've come up with yet, (ab)using the tagging available with Sow and Reap:
Not going to win any beauty contests, but it's all about speed, right? =)
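The original code is not shown on this page; the sketch below only illustrates how Sow tags can do the grouping, and is not necessarily what the answer used. The name aggTallyReap is hypothetical, and the sketch assumes the items being counted are not themselves lists, because Sow treats a list given as its second argument as a list of several tags.

```mathematica
(* Hypothetical sketch: sow a 1 for every raw element and the stored
   count for every tallied pair, using the item itself as the tag.
   Reap then collects the sown values per tag and Total adds them up.
   Assumption: items are not lists (Sow would read a list as many tags). *)
aggTallyReap[raw___List, tallied_List] :=
 Last@Reap[
   Scan[Sow[1, #] &, Join[raw]];
   Scan[Sow[#[[2]], #[[1]]] &, tallied],
   _, {#1, Total[#2]} &]

aggTallyReap[{x, y, t}, {x}, Tally[{x, y, z}]]
(* {{x, 3}, {y, 2}, {t, 1}, {z, 1}} *)
```

Tags come back in the order in which they were first sown, so the result keeps the order of first appearance.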
If you stay purely symbolic, you may try something along the lines of
for joining tally lists. This is stupid fast but returns something that isn't a tally list, so it needs some work (after which it may not be so fast anymore ;) ).
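The snippet referred to above is missing from this page. One way to read the idea, as a sketch under my own assumptions: encode each {symbol, count} pair as count*symbol and let Plus collect like terms. The names joinSymbolic and fromSymbolic are hypothetical, and this only makes sense when the items are plain symbols without values.

```mathematica
(* Hypothetical sketch: turn every {symbol, count} pair into
   count*symbol and add everything up; Plus merges equal symbols. *)
joinSymbolic[t1_List, t2_List] := Total[Times @@@ Join[t1, t2]]

s = joinSymbolic[{{x, 2}, {y, 1}}, {{x, 1}, {z, 1}}]
(* 3 x + y + z *)

(* Hypothetical way back to a tally list when the symbols are known *)
fromSymbolic[sum_, vars_List] := {#, Coefficient[sum, #]} & /@ vars

fromSymbolic[s, {x, y, z}]
(* {{x, 3}, {y, 1}, {z, 1}} *)
```

The sum is indeed not a tally list, and converting it back requires knowing (or extracting) the symbols, which is where the extra work and the lost speed come in.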
EDIT: So I've got a working version:
Using a couple of random symbolic tables I get
This version only adds tally lists, doesn't check anything, still returns some integers, and comparing to Leonid's function:
it's already a couple of seconds slower :-(.
Oh well, nice try.