
Conditional Data Manipulation in Mathematica

Posted on 2024-11-09 10:55:40

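I am trying to put together the best tools for efficient data analysis in Mathematica. I have approximately 300 columns and 100,000 rows.

What would be the best tricks to:

"Remove", "Extract" or simply "Consider" parts of the data structure, for plotting, for example?

One of the trickiest examples I could think of is:

Given a data structure,
extract columns 1 to 3, 6 to 9, as well as the last one, for every line where the value in column 2 is equal to x and the value in column 8 is different from y.

I also welcome any general advice on data manipulation.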


半枫 2024-11-16 10:55:40

For generic manipulation of data in a table with named columns, I refer you to a solution of mine to a similar question. For any particular case, it might be easier to write a function for Select manually. However, with many columns and many different queries, the chances of messing up the indices are high. Here is a modified version of the solution from that post, which provides a friendlier syntax:

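Clear[getIds];
(* helper: first column of every data row; not used by select below *)
getIds[table : {colNames_List, rows__List}] := {rows}[[All, 1]];

ClearAll[select, where];
(* keep the condition unevaluated until the column names have been replaced *)
SetAttributes[where, HoldAll];
select[cnames_List, from[table : {colNames_List, rows__List}], where[condition_]] :=
  With[{
    (* column name -> Slot[i], used to turn the condition into a pure function *)
    colRules = Dispatch[Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]],
    (* column name -> column position, used to pick the requested columns *)
    indexRules = Dispatch[Thread[colNames -> Range[Length[colNames]]]]},
   With[{selF = Apply[Function, Hold[condition] /. colRules]},
    Select[{rows}, selF @@ # &][[All, cnames /. indexRules]]]];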

What happens here is that the function used in Select gets generated automatically from your specifications. For example (using @Yoda's example):

rows = Array[#1 #2 &, {5, 15}];

We need to define the column names (must be strings or symbols without values):

In[425]:= 
colnames = "c" <> ToString[#] & /@ Range[15]

Out[425]= {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10", "c11", "c12", 
"c13", "c14", "c15"}

(In practice, names are usually more descriptive, of course.) Here is the table:

table = Prepend[rows, colnames];

Here is the select statement you need (I picked x = 4 and y = 2):

select[{"c1", "c2", "c3", "c6", "c7", "c8", "c9", "c15"}, from[table],
    where["c2" == 4 && "c8" != 2]]

{{2, 4, 6, 12, 14, 16, 18, 30}}

Now, for a single query, this may look like a complicated way to do this. But you can do many different queries, such as

In[468]:= select[{"c1", "c2", "c3"}, from[table], where[EvenQ["c2"] && "c10" > 10]]

Out[468]= {{2, 4, 6}, {3, 6, 9}, {4, 8, 12}, {5, 10, 15}}

and similar.
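
To make the mechanism concrete, the following sketch rebuilds by hand the pure function that select generates for a small condition. The three-column name list and the sample row are made up for illustration; the construction itself is the one used inside the definition of select.

```mathematica
(* illustrative column names, not taken from the table above *)
colNames = {"c1", "c2", "c3"};

(* name -> Slot[i] rules, built the same way as inside select *)
colRules = Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]];

(* replace the names inside the held condition, then wrap the result in Function *)
selF = Apply[Function, Hold["c2" == 4 && "c3" != 2] /. colRules];

selF                          (* #2 == 4 && #3 != 2 & *)
testRow = selF @@ {2, 4, 6}   (* True, since 4 == 4 and 6 != 2 *)
```

Because the condition is wrapped in Hold before the replacement, it is only evaluated once real row values are supplied through the slots.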

Of course, if there are specific correlations in your data, you might find a particular special-purpose algorithm which will be faster. The function above can be extended in many ways, to simplify common queries (include "all", etc), or to auto-compile the generated pure function (if possible).
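
As one possible reading of the "all" extension mentioned above, here is a minimal sketch in which All in place of the column list returns every column. The second definition and the small table are assumptions added for illustration, not part of the original solution.

```mathematica
ClearAll[select, where, from];
SetAttributes[where, HoldAll];

(* select as defined in the answer *)
select[cnames_List, from[table : {colNames_List, rows__List}], where[condition_]] :=
  With[{colRules = Dispatch[Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]],
    indexRules = Dispatch[Thread[colNames -> Range[Length[colNames]]]]},
   With[{selF = Apply[Function, Hold[condition] /. colRules]},
    Select[{rows}, selF @@ # &][[All, cnames /. indexRules]]]];

(* assumed extension: All as the column list means "every column" *)
select[All, from[table : {colNames_List, __List}], w_where] :=
  select[colNames, from[table], w];

(* small made-up table for illustration *)
smallTable = Prepend[Array[#1 #2 &, {3, 4}], {"a", "b", "c", "d"}];
allCols = select[All, from[smallTable], where["b" > 2]]
(* {{2, 4, 6, 8}, {3, 6, 9, 12}} *)
```

The extra rule simply forwards to the original definition with the full list of column names, so the held condition is passed through untouched.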

EDIT

On a philosophical note, I am sure that many Mathematica users (myself included) have found themselves writing similar code again and again. Mathematica's concise syntax often makes it very easy to write the code for any particular case. However, as long as one works in some specific domain (for example, data manipulation in a table), the cost of repeating yourself will be high for many operations. What my example illustrates, in a very simple setting, is one possible way out: create a Domain-Specific Language (DSL). For that, one generally needs to define a syntax/grammar for it and write a compiler from it to Mathematica (to generate Mathematica code automatically). The example above is a very primitive realization of this idea, but my point is that Mathematica is generally very well suited to DSL creation, which I think is a very powerful technique.


时光礼记 2024-11-16 10:55:40

data = RandomInteger[{1, 20}, {40, 20}]

x = 5;
y = 8;
Select[data, (#[[2]] == x && #[[8]] != y &)][[All, {1, 2, 3, 6, 7, 8, 9, -1}]]

==> {{5, 5, 1, 4, 18, 6, 3, 5}, {10, 5, 15, 3, 15, 14, 2, 5}, {18, 5, 6, 7, 7, 19, 14, 6}}

Some useful commands for getting pieces of matrices and lists are Span (;;), Drop, Take, Select, Cases and more. See tutorial/GettingAndSettingPiecesOfMatrices and guide/PartsOfMatrices.

Part ([[...]]) in combination with ;; can be quite powerful. a[[All, 1;;-1;;2]], for instance, means take all rows and all odd columns (-1 having the usual meaning of counting from the end).
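
A small deterministic matrix makes the Span behaviour easy to check. The matrix below is made up for illustration; each entry encodes its own row and column as 10*row + column.

```mathematica
(* illustrative 3 x 6 matrix: entry = 10*row + column *)
a = Array[10 #1 + #2 &, {3, 6}];

(* all rows, every second column starting from the first *)
oddCols = a[[All, 1 ;; -1 ;; 2]]
(* {{11, 13, 15}, {21, 23, 25}, {31, 33, 35}} *)

(* all rows, the last two columns counted from the end *)
lastTwo = a[[All, -2 ;; -1]]
(* {{15, 16}, {25, 26}, {35, 36}} *)
```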

Select can be used to pick elements from a list (and remember a matrix is a list of lists), based on a logical function. Its twin brother is Cases, which does selection based on a pattern. The function I used here is a 'pure' function, where # refers to the argument to which this function is applied (the elements of the list in this case). Since the elements are lists themselves (the rows of the matrix), I can refer to the columns by using the Part ([[..]]) function.
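
To compare the two, here is a minimal sketch that runs the same row filter once with Select and once with Cases. It uses a deterministic multiplication table instead of random data so that the result is reproducible; the variable names and the values of x and y are chosen for illustration.

```mathematica
(* deterministic 5 x 15 table: entry = row * column *)
data = Array[#1 #2 &, {5, 15}];
x = 4; y = 2;

(* Select takes a test function applied to each row *)
bySelect = Select[data, #[[2]] == x && #[[8]] != y &];

(* Cases takes a pattern; the condition is attached with /; *)
byCases = Cases[data, row_ /; row[[2]] == x && row[[8]] != y];

bySelect === byCases   (* True *)

picked = bySelect[[All, {1, 2, 3, 6, 7, 8, 9, -1}]]
(* {{2, 4, 6, 12, 14, 16, 18, 30}} *)
```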


ぇ气 2024-11-16 10:55:40

To pull out columns (or rows), you can use Part indexing:

data = Array[#1 #2 &, {5, 15}];
data[[All, Flatten@{Range@3, Range @@ {6, 9}, -1}]]

MatrixForm@%

The last line just displays the result in a readable matrix layout.

As Sjoerd mentioned in his comment (and in the explanation in his answer), indexing a single range can be easily done with the Span (;;) command. If you are joining multiple disjoint ranges, using Flatten to combine the separate ranges created with Range is easier than entering them by hand.
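
The difference between the two approaches can be seen on the same table. This is a short sketch with illustrative variable names: it builds the combined index list once, then checks that a single contiguous range gives the same result with Span and with Range.

```mathematica
data = Array[#1 #2 &, {5, 15}];

(* disjoint ranges plus the last column, combined into one index list *)
cols = Flatten@{Range@3, Range[6, 9], -1}
(* {1, 2, 3, 6, 7, 8, 9, -1} *)

secondRow = data[[2, cols]]
(* {2, 4, 6, 12, 14, 16, 18, 30} *)

(* for one contiguous range, Span and Range are interchangeable *)
sameRange = data[[All, 6 ;; 9]] === data[[All, Range[6, 9]]]
(* True *)
```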


谜泪 2024-11-16 10:55:40

I read:

Extract Column 1 to 3, 6 to 9 as well as the last One for every lines where the value in Column 2 is equal to x and the value in column 8 is different than y

as meaning that we want:

elements 1-3 and 6-9 from each row

AND

the last element from rows wherein [[2]] == x && [[8]] != y.

This is what I hacked together:

a = RandomInteger[5, {20, 10}];          (*define the array*)
x = 4; y = 0;                            (*define the test values*)

Join @@ Range @@@ {1 ;; 3, 6 ;; 9};      (*define the column ranges*)

#2 == x && #8 != y & @@@ a;              (*test the rows*)

Append[%%, #] & /@ % /. {True -> -1, False :> Sequence[]};  (*complete the ranges according to the test*)

MapThread[Part, {a, %}] // TableForm     (*extract and display*)
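
The steps above rely on % and %%, which depend on the order in which inputs are evaluated. The following sketch follows the same interpretation with named intermediate results and a small fixed array, so the outcome can be checked by hand. The array and the variable names are made up for illustration, and If replaces the original rule-based substitution.

```mathematica
(* fixed 3 x 10 array: only the first row passes the test *)
a = {{1, 4, 3, 4, 5, 6, 7, 8, 9, 10},
     {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
     {1, 4, 3, 4, 5, 6, 7, 0, 9, 10}};
x = 4; y = 0;

(* columns 1-3 and 6-9, taken from every row *)
cols = Join @@ Range @@@ {1 ;; 3, 6 ;; 9};

(* True for rows whose 2nd element equals x and whose 8th differs from y *)
tests = (#2 == x && #8 != y &) @@@ a;

(* add the last column only for rows that pass the test *)
indices = If[#, Append[cols, -1], cols] & /@ tests;

result = MapThread[Part, {a, indices}]
(* {{1, 4, 3, 6, 7, 8, 9, 10}, {1, 2, 3, 6, 7, 8, 9}, {1, 4, 3, 6, 7, 0, 9}} *)
```

Note that the result is a ragged list: rows that pass the test have one more element than the others.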
