How to select based on partial string matching in Mathematica
Say I have a matrix that looks something like this:
{{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
{"faabian", 88}, {"foobar", 27}, {"fiijii", 52}}
and a list like this:
{"foo", "faa"}
Now I would like to add up the numbers in the matrix, grouping each row by which string from the list partially matches its first element, so that I end up with this:
{{"foo", 126}, {"faa", 177}}
I assume I need to map a Select command, but I am not quite sure how to do that while matching only part of the string. Can anybody help me? My real matrix has around 1.5 million rows, so something that isn't too slow would be a bonus.
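The "map a Select" idea can be sketched in Python to make it concrete. This is an illustration of the technique, not Mathematica code. It assumes a "partial match" means the string starts with the key. Every key scans every row, so the cost grows as keys × rows.

```python
# Example data from the question, as (string, number) pairs.
ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]
KEYS = ["foo", "faa"]


def total_by_prefix(rows, keys):
    """For each key, select the rows whose string starts with it and sum their numbers.

    This mirrors mapping a Select over the key list: every key scans every row.
    """
    return [(key, sum(n for s, n in rows if s.startswith(key))) for key in keys]


print(total_by_prefix(ROWS, KEYS))  # [('foo', 126), ('faa', 177)]
```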
Here is a starting point:
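Below is one possible single-pass starting point, sketched in Python. It is a regular-expression alternation and not necessarily the technique this answer used. All keys are compiled into one pattern anchored at the start of each string, so each row is inspected once. Keys are tried longest first, which means a row counts toward the longest matching key only. That is an assumption about how overlapping keys should behave.

```python
import re

ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]
KEYS = ["foo", "faa"]


def total_by_regex(rows, keys):
    """Sum numbers per key in one pass, matching keys at the start of each string."""
    if not keys:
        return []
    # Longest keys first: Python's alternation takes the first alternative that matches.
    ordered = sorted(keys, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    totals = dict.fromkeys(keys, 0)  # preserves the order of `keys`
    for s, n in rows:
        m = pattern.match(s)  # .match only matches at the beginning of the string
        if m:
            totals[m.group(0)] += n
    return list(totals.items())


print(total_by_regex(ROWS, KEYS))  # [('foo', 126), ('faa', 177)]
```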
Here is yet another approach. It is reasonably fast, and also concise.
You can use ruebenko's pre-processing for greater speed.
This is about twice as fast as his method on my system:
Notice that in this version I also match substrings that are not at the beginning, to reproduce ruebenko's output. If you only want to match at the beginning of strings (which is what I assumed in the first function), it will be faster still.
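Here is a Python sketch of two ideas from this answer. The first is a pre-processing step: total repeated strings before matching, so each distinct string is tested only once. This is a generic step assumed here and not necessarily ruebenko's actual pre-processing. It pays off when the 1.5 million rows contain many duplicates. The second is a hypothetical `anywhere` flag that switches between prefix matching and substring matching.

```python
from collections import defaultdict

ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]


def preaggregate(rows):
    """Total the numbers per distinct string, so each distinct string is tested only once."""
    totals = defaultdict(int)
    for s, n in rows:
        totals[s] += n
    return totals


def total_by_match(rows, keys, anywhere=False):
    """Sum numbers per key, matching at the start of the string or, if anywhere=True, at any position."""
    distinct = preaggregate(rows)
    result = []
    for key in keys:
        total = sum(n for s, n in distinct.items()
                    if (key in s if anywhere else s.startswith(key)))
        result.append((key, total))
    return result


print(total_by_match(ROWS, ["foo", "faa"]))          # [('foo', 126), ('faa', 177)]
print(total_by_match(ROWS, ["bar"], anywhere=True))  # [('bar', 185)]
```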
Make the data:
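The data-making step has a hypothetical Python counterpart: generate random test rows of the same shape for benchmarking. The alphabet, string length and value range are arbitrary assumptions and are not taken from this answer.

```python
import random


def make_data(n_rows, seed=0):
    """Hypothetical test-data generator: random 6-letter strings paired with numbers 1..100."""
    rng = random.Random(seed)  # seeded, so runs are reproducible
    letters = "abfiou"  # small alphabet so prefixes like "foo" and "faa" actually occur
    return [("".join(rng.choice(letters) for _ in range(6)), rng.randint(1, 100))
            for _ in range(n_rows)]


sample = make_data(5)
```

Calling `make_data(1_500_000)` gives a list at the scale mentioned in the question.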
Now select:
which gives:
I'll try to make it more functional/general if I can...
Edit (1)
The code below makes it more general (using the same data as above):
which gives:
So now the same code above will work if lst is changed to 3 items instead of 2:
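A Python sketch of a different technique that also handles any number of keys: sort the rows once and keep running totals. All strings with a given prefix then form one contiguous block, and its sum takes two binary searches. This is a substitute technique, named plainly, not the code from this answer. It assumes prefix matching and non-empty keys whose last character is not U+10FFFF.

```python
from bisect import bisect_left
from itertools import accumulate

ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]


def totals_sorted(rows, keys):
    """Sum numbers per prefix key using one sort plus two binary searches per key."""
    ordered = sorted(rows)                  # sort once by string
    strings = [s for s, _ in ordered]
    cumulative = [0, *accumulate(n for _, n in ordered)]  # cumulative[i] = sum of first i numbers
    result = []
    for key in keys:
        lo = bisect_left(strings, key)
        # Bumping the last character gives a bound just above every string starting with `key`.
        upper = key[:-1] + chr(ord(key[-1]) + 1)
        hi = bisect_left(strings, upper)
        result.append((key, cumulative[hi] - cumulative[lo]))
    return result


print(totals_sorted(ROWS, ["foo", "faa", "fii"]))  # [('foo', 126), ('faa', 177), ('fii', 52)]
```

The sort costs O(R log R) once. Each additional key then costs only O(log R), which suits a long row list with a growing key list.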
How about:
I am sure I have read a post, maybe here, in which someone described a function that effectively combined sorting and splitting, but I cannot remember it. Maybe someone else can add a comment if they know of it.
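A Python sketch of the "sort, then split" combination. First label each row with its matching key (the helper `matching_key` is hypothetical and assumes prefix matching). Then sort by that label and let `itertools.groupby` split the list into runs of equal labels. `groupby` only merges adjacent items, which is why the sort is needed first.

```python
from itertools import groupby

ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]
KEYS = ["foo", "faa"]


def matching_key(s, keys):
    """Hypothetical helper: return the first key that `s` starts with, or None."""
    for key in keys:
        if s.startswith(key):
            return key
    return None


def totals_sort_split(rows, keys):
    """Label rows by key, sort by label, then split into runs and total each run."""
    labelled = [(matching_key(s, keys), n) for s, n in rows]
    labelled = [(k, n) for k, n in labelled if k is not None]  # drop unmatched rows
    labelled.sort(key=lambda kn: kn[0])
    return [(k, sum(n for _, n in run))
            for k, run in groupby(labelled, key=lambda kn: kn[0])]


print(totals_sort_split(ROWS, KEYS))  # [('faa', 177), ('foo', 126)]
```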
Edit
OK, it must be bedtime. How could I forget GatherBy?
Note that for a dummy list of 1.4 million pairs this took a couple of seconds, so it is not exactly a super-fast method.
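Mathematica's `GatherBy` groups elements by the value of a function without sorting, and it keeps groups in order of first appearance. The Python sketch below imitates that behaviour with a dictionary of lists. The names `gather_by` and `label` are hypothetical, and prefix matching is assumed. It illustrates the idea only; no timing claims are made for it.

```python
from collections import defaultdict

ROWS = [("foobar", 77), ("faabar", 81), ("foobur", 22), ("faabaa", 8),
        ("faabian", 88), ("foobar", 27), ("fiijii", 52)]
KEYS = ["foo", "faa"]


def gather_by(items, f):
    """Group items by f(item) in first-seen order, without sorting (like GatherBy)."""
    groups = defaultdict(list)  # dicts keep insertion order in Python 3.7+
    for item in items:
        groups[f(item)].append(item)
    return list(groups.values())


def totals_gather(rows, keys):
    """Gather rows by their matching key, then total each group; unmatched rows are skipped."""
    def label(row):
        return next((k for k in keys if row[0].startswith(k)), None)

    result = []
    for group in gather_by(rows, label):
        key = label(group[0])
        if key is not None:
            result.append((key, sum(n for _, n in group)))
    return result


print(totals_gather(ROWS, KEYS))  # [('foo', 126), ('faa', 177)]
```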