How to select based on a partial string match in Mathematica

Posted on 2024-12-26 18:12:49

Say I have a matrix that looks something like this:

{{foobar, 77},{faabar, 81},{foobur, 22},{faabaa, 8},
{faabian, 88},{foobar, 27}, {fiijii, 52}}

and a list like this:

{foo, faa}

Now I would like to add up the numbers for each line in the matrix based on the partial match of the strings in the list so that I end up with this:

{{foo, 126},{faa, 177}}

I assume I need to map a Select command, but I am not quite sure how to do that and match only the partial string. Can anybody help me? Now my real matrix is around 1.5 million lines so something that isn't too slow would be of added value.

Comments (4)

夏花。依旧 2025-01-02 18:12:49

Here is a starting point:

data={{"foobar",77},{"faabar",81},{"foobur",22},{"faabaa",8},{"faabian",88},{"foobar",27},{"fiijii",52}};

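(* split the pairs into a list of strings and a packed array of values *)
{str,vals}=Transpose[data];
vals=Developer`ToPackedArray[vals];
(* indices of the strings that contain strPat anywhere *)
findValPos[str_List,strPat_String]:=
    Flatten[Developer`ToPackedArray[
         Position[StringPosition[str,strPat],Except[{}],{1},Heads->False]]]

(* total of the values whose string contains "faa" *)
Total[vals[[findValPos[str,"faa"]]]]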
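
To get the result in the form the question asks for, the same lookup can be mapped over every pattern. This is only a minimal sketch that reuses findValPos as defined above; the name patterns is introduced here for illustration and is not part of the original answer.

```mathematica
data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
   {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};

{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];

(* same helper as in the answer: indices of the strings that contain strPat *)
findValPos[str_List, strPat_String] :=
  Flatten[Developer`ToPackedArray[
    Position[StringPosition[str, strPat], Except[{}], {1}, Heads -> False]]]

(* hypothetical name: the substrings to look for *)
patterns = {"foo", "faa"};

(* pair each pattern with the total of the matching values *)
{#, Total[vals[[findValPos[str, #]]]]} & /@ patterns
```

For the sample data this should give {{"foo", 126}, {"faa", 177}}. A pattern that matches nothing yields an empty index list, so its total comes out as 0.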

岛徒 2025-01-02 18:12:49

Here is yet another approach. It is reasonably fast, and also concise.

data =
 {{"foobar", 77},
  {"faabar", 81},
  {"foobur", 22},
  {"faabaa", 8},
  {"faabian", 88},
  {"foobar", 27},
  {"fiijii", 52}};

match = {"foo", "faa"};

f = {#2, Tr @ Pick[#[[All, 2]], StringMatchQ[#[[All, 1]], #2 <> "*"]]} &;

f[data, #]& /@ match
{{"foo", 126}, {"faa", 177}}

You can use ruebenko's pre-processing for greater speed.
This is about twice as fast as his method on my system:

{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];

f2 = {#, Tr @ Pick[vals, StringMatchQ[str, "*" <> # <> "*"]]} &;

f2 /@ match

Notice that this version also matches substrings that are not at the beginning of the string, to match ruebenko's output. If you only want to match at the beginning of strings, which is what I assumed in the first function, it will be faster still.
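
As a sketch of the start-of-string variant with the same pre-processing: the version below assumes StringStartsQ is available, which is only the case in newer releases (10.1 and later, as far as I know), and f3 is a name made up here. I have not timed it, so how it compares with the StringMatchQ versions is something to measure on the real data.

```mathematica
data = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
   {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};

match = {"foo", "faa"};

{str, vals} = Transpose[data];
vals = Developer`ToPackedArray[vals];

(* assumption: StringStartsQ exists in the version in use;
   it threads over the list of strings and returns a list of True/False *)
f3 = {#, Total @ Pick[vals, StringStartsQ[str, #]]} &;

f3 /@ match
```

On older versions, StringMatchQ[str, # <> "*"] from the first function plays the same role.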

小梨窩很甜 2025-01-02 18:12:49

Make the data:

mat = {{"foobar", 77},
   {"faabar", 81},
   {"foobur", 22},
   {"faabaa", 8},
   {"faabian", 88},
   {"foobar", 27},
   {"fiijii", 52}};
lst = {"foo", "faa"};

Now select:

r1 = Select[mat, StringMatchQ[lst[[1]], StringTake[#[[1]], 3]] &];
r2 = Select[mat, StringMatchQ[lst[[2]], StringTake[#[[1]], 3]] &];
{{lst[[1]], Total@r1[[All, 2]]}, {lst[[2]], Total@r2[[All, 2]]}}

gives

{{"foo", 126}, {"faa", 177}}

I'll try to make it more functional/general if I can...

Edit (1)

The following makes it more general (using the same data as above):

foo[mat_, lst_] := Select[mat, StringMatchQ[lst, StringTake[#[[1]], 3]] &]
r = Map[foo[mat, #] &, lst];
MapThread[ {#1, Total[#2[[All, 2]]]} &, {lst, r}]

gives

{{"foo", 126}, {"faa", 177}}

So now the same code will work if lst is changed to 3 items instead of 2:

lst = {"foo", "faa", "fii"};
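
The code above compares against exactly the first three characters, so it only works for three-letter keys, and StringTake will complain about strings shorter than three characters. One possible way around both, sketched here with a hypothetical helper named rowsWithPrefix, is to match each key as a prefix pattern instead:

```mathematica
mat = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
   {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};
lst = {"foo", "faa", "fii"};

(* hypothetical helper: rows whose string starts with key, whatever its length *)
rowsWithPrefix[mat_, key_String] :=
  Select[mat, StringMatchQ[#[[1]], key ~~ ___] &]

(* pair each key with the total of the second column of its rows *)
{#, Total[Last /@ rowsWithPrefix[mat, #]]} & /@ lst
```

For the sample data this should give {{"foo", 126}, {"faa", 177}, {"fii", 52}}. It still scans the whole matrix once per key, so it is a readability tweak rather than a speed-up.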

去了角落 2025-01-02 18:12:49

How about:

list = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 
    8}, {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};

t = StringTake[#[[1]], 3] &;

{t[#[[1]]], Total[#[[All, 2]]]} & /@ SplitBy[SortBy[list, t], t]

{{"faa", 177}, {"fii", 52}, {"foo", 126}}

I am sure I have read a post, maybe here, in which someone described a function that effectively combined sorting and splitting but I cannot remember it. Maybe someone else can add a comment if they know of it.

Edit

OK, it must be bedtime. How could I forget GatherBy?

{t[#[[1]]], Total[#[[All, 2]]]} & /@ GatherBy[list, t]

{{"foo", 126}, {"faa", 177}, {"fii", 52}}

Note that for a dummy list of 1.4 million pairs this took a couple of seconds, so it is not exactly a super fast method.
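
In versions that have associations (10 and later), the grouping and the totalling can be written in one step with GroupBy. This is a sketch under that assumption rather than part of the original answer; totals is a name chosen here, and I have not timed it on a large list.

```mathematica
list = {{"foobar", 77}, {"faabar", 81}, {"foobur", 22}, {"faabaa", 8},
   {"faabian", 88}, {"foobar", 27}, {"fiijii", 52}};

(* group by the first three characters, keep only the numbers, total each group *)
totals = GroupBy[list, (StringTake[First[#], 3] &) -> Last, Total];

(* keep only the prefixes of interest and turn them back into pairs *)
KeyValueMap[List, KeyTake[totals, {"foo", "faa"}]]
```

For the sample data this should give {{"foo", 126}, {"faa", 177}}. Like the GatherBy version, it groups on a fixed three-character prefix, so the keys in the lookup list need to be three characters long.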
