Chained verbs in J

Posted 2024-11-27 12:42:36

Suppose we have a boxed matrix containing various types:

matrix =: ('abc';'defgh';23),:('foo';'bar';45)
matrix

+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

And a column descriptor:

columnTypes =: 'string';'string';'num'

I want to apply verbs on this matrix by column according to types. I'll be using verbs DoString and DoNum:

chain =: (('string';'num') i. columnTypes) { DoString`DoNum

EDIT: The column descriptors are important: the decision on which verb to use is based on them, not on the type itself. In reality, I could have several types of strings, numerics, and even dates (which would be numeric in J).

How do I apply chain to each row of matrix? The verbs themselves can take care of whether the passed value is boxed or not, that's fine. Also, I'd rather avoid transposing the matrix (|:) as it could be quite large.

差↓一点笑了 2024-12-04 12:42:36

The standard method for doing this is:

(1) Convert your row (cell)-oriented structure to a column-oriented structure
(2) Apply the correct verb to each column (just once)

Step (1) is easy. Step (2) is also easy, but not as obvious. There's a little trick that helps.

The trick is knowing that a number of primitive operators accept a gerund as a left argument and produce a function which cycles over the gerund, applying each verb in turn. IMO, the most useful operator in this category is ;. . Here's an example implementation using it:

Step (0), inputs:

   matrix      =:  ('abc';'defgh';23),:('foo';'bar';45)

   columnTypes =:  'string';'string';'num'

   DoString    =:  toupper
   DoNum       =:  0&j.

   matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

Step (1), columnify the data:

   columnify   =:  <@:>"1@:|: :. rowify =: <"_1&>
   columnify matrix
+---+-----+-----+
|abc|defgh|23 45|
|foo|bar  |     |
+---+-----+-----+

Note that columnify is provided with an inverse which will re-"rowify" the data, though you shouldn't do that: see below.

Step (2), apply the correct verb to each column (exactly once), using the verb-cycling feature of ;.:

   homogenize  =:  ({. foo&.>@:{.`'') [^:('foo'-:])L:0~ ]
   chain       =:  DoString`DoNum`] homogenize@{~  ('string';'num')&i.

Note that the default transformation for unknown column-types is the identity function, ].

The verb homogenize normalizes the input and output of each column-processor (i.e. it abstracts out the pre- and post-processing so that the user only has to provide the dynamic "core" of the transformation). The verb chain takes a list of column-types as input and derives a gerund appropriate for use as the left-hand argument to ;. (or a similar operator).

Thus:

   1 (chain columnTypes);.1  columnify matrix
+---+-----+---------+
|ABC|DEFGH|0j23 0j45|
|FOO|BAR  |         |
+---+-----+---------+

Or, if you really must have an NxM table of boxed cells, apply the cut "under" columnify:

   1 (chain columnTypes);.1&.columnify matrix
+-----+-----+
|ABC  |FOO  |
+-----+-----+
|DEFGH|BAR  |
+-----+-----+
|0j23 |0j45 |
+-----+-----+

But note it is much more appropriate, in a J context, to keep the table as a list of homogeneous columns, for both performance and notational reasons.

J works best when processing arrays "in toto"; the rule of thumb is that you should let each primitive or user-defined name see as much data as possible at each application. That's the major benefit of this "columnification" approach: if you store your data as a list of homogeneous columns, it will be faster and easier to manipulate later.
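As a small illustration of that rule of thumb, here is a sketch in which each column is held as one homogeneous array and each verb is applied once per column. The names strCol, numCol, upper and dbl are made up for this example, and +: (double) merely stands in for whatever numeric transformation is wanted.

```j
NB. Hypothetical columns: each one is a single homogeneous array.
strCol =: > 'abc';'foo'      NB. a 2-by-3 character table
numCol =: 23 45              NB. a plain numeric list

NB. One application of each verb sees the whole column at once.
upper  =: toupper strCol
dbl    =: +: numCol
```

Here upper is the character table with rows ABC and FOO, and dbl is 46 90; neither verb is ever invoked on a single cell.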

However, if your use-case really demands that you keep the data as an NxM table of boxed cells, then converting your data to and from column-normal form is an expensive no-op. In that case, you should stick with your original solution,

   1 chain\"1 matrix

which (because you asked) actually works on the same premise as the ;. approach. In particular, \ is another of those primitive operators which accepts a gerund argument and applies each verb in succession (i.e. to each new window of data, cyclically).
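The cycling behaviour of \ is easiest to see on a plain numeric list. The following is only a minimal sketch; the names g, r and flat are invented for the illustration.

```j
NB. A two-verb gerund: double, then halve.
g =: +:`-:

NB. Infix with window width 1: every atom of the list is its own window,
NB. and the verbs of g are applied to successive windows cyclically.
r =: 1 g\ 10 20 30 40

NB. Each window is a 1-item list, so r is a 4-by-1 table; ravel it to a list.
flat =: , r
```

With these definitions flat is 20 10 60 20: the first and third windows were doubled, the second and fourth halved.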

In effect, what 1 chain\"1 matrix does is break the matrix into rows ("1) and, for each row, create a 1-wide moving window (1 f\ matrix), applying the verbs of chain to each of those 1-wide windows cyclically (i.e. f changes with every 1-wide data window of each row of the matrix).

Since the moving 1-window of a row (a rank-1 vector) is the atoms of the row, in order, and the verbs of chain are given in the same order, in effect you're applying those verbs to the columns of the matrix, one. atom. at. a. time.
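For reference, a self-contained sketch of that row-wise application follows. It assumes placeholder definitions of DoString and DoNum that work inside the box (the question leaves both verbs unspecified), and the name result is made up here.

```j
matrix      =: ('abc';'defgh';23),:('foo';'bar';45)
columnTypes =: 'string';'string';'num'

NB. Placeholder verbs (assumptions): each one works under open, on boxed cells.
DoString =: toupper&.>
DoNum    =: 0&j.&.>

NB. One verb per column, selected by the column descriptors.
chain  =: (('string';'num') i. columnTypes) { DoString`DoNum

NB. Each 1-wide window yields a 1-item list, so the raw result has
NB. shape 2 3 1; ,"2 flattens it back to a 2-by-3 table of boxes.
result =: ,"2 (1 chain\"1 matrix)
```

The value of result is the original table with the two string columns upper-cased and the numeric column turned into 0j23 and 0j45, but every verb was invoked once per cell rather than once per column.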

In short: 1 chain\"1 matrix is analogous to foo"0 matrix, except foo changes for each atom. And it should be avoided for the same reason foo"0 matrix should be avoided in general: because applying functions at small rank works against the grain of J, incurring a performance penalty.

In general, it's better to apply functions at higher ranks whenever you can, which in this case calls for converting matrix to column-normal form (and keeping it that way).

In other words, here, ;. is to "1 as \ is to "0. If you find the whole columnify/homogenize thing too lengthy or bulky (compared to 1 chain\"1 matrix), you can import the script provided at [1], which packages up those definitions as re-usable utilities, with extensions. See the page for examples and instructions.

[1] Related utility script:
http://www.jsoftware.com/jwiki/DanBron/Snippets/DOOG

涫野音 2024-12-04 12:42:36

If these calculations depend only on the data inside individual boxes (and, perhaps, on global values), it is possible to use Agenda with Under Open (aka Each). An application of this technique is shown below:

   doCells  =: (doNum`doString @. isLiteral)&.>
   isLiteral=: 2 -: 3!:0

   doNum    =: +:   NB. Double
   doString =: toupper

   doCells matrix
┌───┬─────┬──┐
│ABC│DEFGH│46│
├───┼─────┼──┤
│FOO│BAR  │90│
└───┴─────┴──┘

(In this example I've put in arbitrary meanings for doNum and doString to help make the viability plain.)

The version of isLiteral used here may well suffice, but it will fail if either sparse literal or unicode values are involved.
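One possible way to widen the test, sketched here under assumptions, is to accept the unicode datatype codes reported by 3!:0 as well as the byte-literal code. The names isText and doCells2 are hypothetical; sparse arrays are deliberately left out, and whether doString itself copes with unicode data is a separate question.

```j
NB. 3!:0 reports 2 for byte literals and 131072 or 262144 for unicode.
NB. isText (a made-up name) is 1 for any of these, 0 otherwise.
isText   =: 2 131072 262144 e.~ 3!:0

doNum    =: +:   NB. Double
doString =: toupper

NB. Same shape as doCells, with the wider test selecting the verb.
doCells2 =: (doNum`doString @. isText)&.>
```

For example, isText 'abc' and isText 2 u: 'abc' are both 1 while isText 23 is 0, so doCells2 still doubles numbers and upper-cases byte strings.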

If the calculations need to involve more of the matrix than a single box, this won't be the answer to your question. If the calculation needs to occur by line instead, the solution may involve applying a verb at rank _1 (i.e. to each item of the highest axis).
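A minimal sketch of rank _1 on a small numeric table (the names t, rowSums and rows are invented for the illustration):

```j
t       =: i. 2 3        NB. a 2-by-3 table with rows 0 1 2 and 3 4 5
rowSums =: +/"_1 t       NB. +/ applied to each item (row) separately
rows    =: <"_1 t        NB. box each row as a unit
```

Here rowSums is 3 12 and rows is a list of two boxes, one per row. The same "_1 applies to a boxed table such as matrix, where each item is a row of boxes.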
