节省时间的部分倒排索引构建
我需要构建部分倒排索引
。比如:
l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)
我认为它的作用非常清楚。在输入列表中,{x, y ...} 是唯一的,而 {a, b, c, ..} 不是唯一的。输出应按#[[1]]
排序。
现在,我正在这样做:
iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@
(Union@Flatten@Last@Transpose@list)
但是对于这样一个简单的任务来说,它看起来太复杂了,似乎太慢了,我应该能够应付军团。
试驾来比较您的结果:
words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)
那么,有什么加速的想法吗?
I need to build a partial Inverted Index
. Something like:
l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)
I think it is pretty clear what it does. In the input list, the {x, y ...} are unique, while the {a, b, c, ..} are not. The output ought to be ordered by #[[1]]
.
Right now, I am doing this:
iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@
(Union@Flatten@Last@Transpose@list)
But it looks too convoluted for such an easy task, seems too slow, and I should be able to cope with Legion.
A test drive to compare your results:
words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)
So, any ideas for an speedup?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
似乎是
Reap
-Sow
的经典任务(由于@Heike 对最终版本进行了改进):,
然后
编辑
这是一个替代版本具有类似(稍差)的性能:
有趣的是,这里的
Reap
-Sow
提供了比基于结构操作的解决方案稍快的解决方案。编辑 2
只是为了说明 - 对于那些喜欢基于规则的解决方案的人,这里是一个基于
Dispatch
和ReplaceList
组合的解决方案:不过,比其他两个慢大约2-3倍。
Seems a classic task for
Reap
-Sow
(improvement in the final version due to @Heike):Then,
and
EDIT
Here is an alternative version with a similar (slightly worse) performance:
It is interesting that
Reap
-Sow
here gives an even slightly faster solution than the one based on structural operations.EDIT 2
Just for an illustration - for those who prefer rule-based solutions, here is one based on a combination of
Dispatch
andReplaceList
:It is about 2-3 times slower than the other two, though.