节省时间的部分倒排索引构建

发布于 2024-12-09 13:55:46 字数 837 浏览 2 评论 0原文

我需要构建部分倒排索引。比如:

l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)

我认为它的作用非常清楚。在输入列表中,{x, y ...} 是唯一的,而 {a, b, c, ..} 不是唯一的。输出应按#[[1]]排序。

现在,我正在这样做:

iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@ 
                     (Union@Flatten@Last@Transpose@list)

但是对于这样一个简单的任务来说,它看起来太复杂了,似乎太慢了,我应该能够应付军团。

试驾来比较您的结果:

words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)

那么,有什么加速的想法吗?

I need to build a partial Inverted Index. Something like:

l = {{x, {h, a, b, c}}, {y, {c, d, e}}}
iI[l]
(*
-> {{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}
*)

I think it is pretty clear what it does. In the input list, the {x, y ...} are unique, while the {a, b, c, ..} are not. The output ought to be ordered by #[[1]].

Right now, I am doing this:

iI[list_List] := {#, list[[Position[list, #][[All, 1]]]][[All, 1]]} & /@ 
                     (Union@Flatten@Last@Transpose@list)

But it looks too convoluted for such an easy task, seems too slow, and I should be able to cope with Legion.

A test drive to compare your results:

words = DictionaryLookup[];
abWords = DictionaryLookup["ab" ~~ ___];
l = {#, RandomChoice[abWords, RandomInteger[{1, 30}]]} & /@ words[[1 ;; 3000]];
First@Timing@iI[l]
(*
-> 5.312
*)

So, any ideas for an speedup?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

陌路终见情 2024-12-16 13:55:46

似乎是 Reap-Sow 的经典任务(由于@Heike 对最终版本进行了改进):

iI[list_] := Sort[Reap[Sow @@@ list, _, List][[2]]] 

iI[l]

{{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}

然后

In[22]:= 
words=DictionaryLookup[];
abWords=DictionaryLookup["ab"~~___];
l={#,RandomChoice[abWords,RandomInteger[{1,30}]]}&/@words[[1;;3000]];
First@Timing@iI[l]
Out[25]= 0.047

编辑

这是一个替代版本具有类似(稍差)的性能:

iIAlt[list_] :=
   Sort@Transpose[{#[[All, 1, 2]], #[[All, All, 1]]}] &@
           GatherBy[Flatten[Thread /@ list, 1], Last];

有趣的是,这里的 Reap - Sow 提供了比基于结构操作的解决方案稍快的解决方案。

编辑 2

只是为了说明 - 对于那些喜欢基于规则的解决方案的人,这里是一个基于 DispatchReplaceList 组合的解决方案

iIAlt1[list_] :=
   With[{disp = Dispatch@Flatten[Thread[Rule[#2, #]] & @@@ list]},
       Map[{#, ReplaceList[#, disp]} &, Union @@ list[[All, 2]]]]

:不过,比其他两个慢大约2-3倍。

Seems a classic task for Reap-Sow (improvement in the final version due to @Heike):

iI[list_] := Sort[Reap[Sow @@@ list, _, List][[2]]] 

Then,

iI[l]

{{a, {x}}, {b, {x}}, {c, {x, y}}, {d, {y}}, {e, {y}}, {h, {x}}}

and

In[22]:= 
words=DictionaryLookup[];
abWords=DictionaryLookup["ab"~~___];
l={#,RandomChoice[abWords,RandomInteger[{1,30}]]}&/@words[[1;;3000]];
First@Timing@iI[l]
Out[25]= 0.047

EDIT

Here is an alternative version with a similar (slightly worse) performance:

iIAlt[list_] :=
   Sort@Transpose[{#[[All, 1, 2]], #[[All, All, 1]]}] &@
           GatherBy[Flatten[Thread /@ list, 1], Last];

It is interesting that Reap - Sow here gives an even slightly faster solution than the one based on structural operations.

EDIT 2

Just for an illustration - for those who prefer rule-based solutions, here is one based on a combination of Dispatch and ReplaceList:

iIAlt1[list_] :=
   With[{disp = Dispatch@Flatten[Thread[Rule[#2, #]] & @@@ list]},
       Map[{#, ReplaceList[#, disp]} &, Union @@ list[[All, 2]]]]

It is about 2-3 times slower than the other two, though.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文