R: select the first occurrence of a value

Posted on 2024-12-22 12:40:43


I need to select specific values in a dataset in R and can't figure out how to actually do it.

(As a side note: what does the data represent?)

This is a measurement of a network node connecting to three other nodes.
'time' is a relative time (when something happened).
'tm1,tm2,tm3' show which protocol state the connections are currently in.
'known' is the absolute number of nodes that should be connected (here at most 3).

First, some test data as CSV:

time,tm1,tm2,tm3,known
4,1,0,0,1
12,2,0,0,2
206,3,0,0,3
238,2,1,0,3
255,1,2,0,3
297,1,1,1,3
309,1,0,2,3
401,0,1,2,3
446,0,0,3,3
3,1,0,0,1
58,2,0,0,2
235,1,1,0,2
258,1,0,1,2
320,0,1,1,2
335,0,0,2,2
874,1,0,2,3
1024,0,1,2,3
1030,0,0,3,3
3,1,0,0,1
20,2,0,0,2
132,1,1,0,2
151,1,0,1,2
207,0,1,1,2
228,0,0,2,2
852,1,0,2,3
929,0,1,2,3
938,0,0,3,3

Each line means "something" has changed in the system, but I have to filter this data.
For example, one frame is:

time,tm1,tm2,tm3,known
 3,1,0,0,1  <- ignore
20,2,0,0,2  <- ignore
132,1,1,0,2 <- ignore
151,1,0,1,2 <- select  (first appearance of 1 in tm3)
207,0,1,1,2 <- ignore
228,0,0,2,2 <- select  (first appearance of 2 in tm3)
852,1,0,2,3 <- ignore
929,0,1,2,3 <- ignore
938,0,0,3,3 <- select  (first appearance of 3 in tm3)

The selected data always contains 3 interesting entries, and each "frame" holds 9 lines. This is true for all frames (a new frame starts wherever the time jumps back to a smaller value in the CSV data).
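The frame boundaries described above can be recovered from the time column alone. This is only a minimal sketch, assuming a frame starts exactly where the time decreases; the names `frame_start` and `frame` are made up for illustration.

```r
# The time column from the test data in the question.
time <- c(4, 12, 206, 238, 255, 297, 309, 401, 446,
          3, 58, 235, 258, 320, 335, 874, 1024, 1030,
          3, 20, 132, 151, 207, 228, 852, 929, 938)

# diff() is negative exactly where the time jumps back,
# i.e. where a new frame starts. The first row always
# starts a frame, hence the leading TRUE.
frame_start <- c(TRUE, diff(time) < 0)

# A running count of frame starts gives a frame number per row.
frame <- cumsum(frame_start)

# Number of rows in each frame.
table(frame)
```

With the frame number available as a grouping variable, per-frame operations do not need to hard-code the 9 lines per frame.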

Maybe I need to plot more data, but that gets even more complicated, and I'm not sure how to plot such a state machine correctly. Running with 3 "nodes" is surely manageable with conditions, but as soon as I need to do this for, say, 6 nodes, it will surely get out of hand. I'm relatively new to R, so I'm not sure whether R is the right choice; I just wanted to quickly plot some measurement data, and for that R fits perfectly.
But how much work is it to plot such state machines properly in R? I don't think I can invest that much time in it, so I may have to stick with simple data plotting.

Unfortunately, I cannot estimate how complicated it will get.
Maybe someone can enlighten me, or has already visualized a state machine and has some tips on how to do it.

A bit more detail on visualizing the states:

Suppose there are 3 nodes in a network which should be connected. Each line then represents a change in the network connection state.

The protocol consists of three stages, which need to be executed for each connection.

So, for example, a line like:

4,1,0,0,1

means that a message was received from a node and the protocol now proceeds.
A following line is then:

100,0,1,0,1

If this stage succeeds, the next protocol stage happens:

200,0,0,1,1

If we reach this state, the other node is authenticated, verified and connected.
As you can see in my example data, these messages do not appear in any particular order.
Although there is never a transition from:

4,1,0,0,1

directly to

200,0,0,1,1

messages from other nodes can appear between any two other messages.

The interesting part of this is actually not the visualization, but the time it takes for each protocol stage transition. This is the data I actually need to get.

As I write this, I think the data as it currently stands does not contain enough information to visualize this.

An easy way could be to introduce a unique identifier for each node and sort the list by identifier and time. Then we automatically get a somewhat ordered list: no protocol stage can be skipped for a single node, so each node's entries are correctly ordered by time. Because there is a fixed number of entries for each node, I guess this data could then be easily calculated and visualized.

I currently see no other way. The problem is that this data is not so easy to obtain, but if I need to visualize it, it seems I cannot get around the unique identifiers.
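To illustrate the idea, here is a rough sketch of how per-node stage transition times could be computed once such an identifier exists. Everything in the data below is hypothetical: the `node` column, the node names, the `stage` column and the times are invented for illustration and are not part of the measured data.

```r
# Hypothetical log: one row per stage change, tagged with the
# node that caused it. Names and times are invented.
events <- data.frame(
  node  = c("A", "B", "A", "C", "B", "A", "C", "B", "C"),
  time  = c(4, 12, 100, 150, 180, 200, 260, 300, 410),
  stage = c(1, 1, 2, 1, 2, 3, 2, 3, 3)
)

# Sort by node, then by time, so each node's stages appear in order.
events <- events[order(events$node, events$time), ]

# Time taken to get from the previous stage to this one, per node.
# The first stage of each node has no predecessor, hence NA.
events$elapsed <- ave(events$time, events$node,
                      FUN = function(t) c(NA, diff(t)))

events
```

Because a node cannot skip a stage, sorting by node and time is enough to line up consecutive stages, and `diff()` within each node gives the transition times directly.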


空名 2024-12-29 12:40:43


You can use the diff() function and some logical indexing to get the first row each time there is a nonzero difference in the column of choice. For example, assuming your data is in a data.frame named dat:

> dat[c(dat$tm3[1], diff(dat$tm3)) != 0,]
   time tm1 tm2 tm3 known
6   297   1   1   1     3
7   309   1   0   2     3
9   446   0   0   3     3
10    3   1   0   0     1
13  258   1   0   1     2
15  335   0   0   2     2
18 1030   0   0   3     3
19    3   1   0   0     1
22  151   1   0   1     2
24  228   0   0   2     2
27  938   0   0   3     3

Try running this part by itself to see the logical vector it creates: c(dat$tm3[1], diff(dat$tm3)) != 0

With this data the code above does not return the first row, because tm3 is 0 there. If you need the first row regardless, change the logical index to this instead: dat[c(TRUE, diff(dat$tm3)) != 0,]

This was also answered on Cross Validated.
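Note that the diff() index above also picks rows 10 and 19, where tm3 drops back to 0 at the start of a new frame, while the question only asks for the first appearance of 1, 2 and 3. A possible refinement, given here only as a sketch, is to additionally require a nonzero tm3. It assumes that tm3 is 0 at the start of every frame, as it is in the test data; the names `csv_text`, `changed` and `selected` are chosen for illustration.

```r
# Test data copied from the question.
csv_text <- "time,tm1,tm2,tm3,known
4,1,0,0,1
12,2,0,0,2
206,3,0,0,3
238,2,1,0,3
255,1,2,0,3
297,1,1,1,3
309,1,0,2,3
401,0,1,2,3
446,0,0,3,3
3,1,0,0,1
58,2,0,0,2
235,1,1,0,2
258,1,0,1,2
320,0,1,1,2
335,0,0,2,2
874,1,0,2,3
1024,0,1,2,3
1030,0,0,3,3
3,1,0,0,1
20,2,0,0,2
132,1,1,0,2
151,1,0,1,2
207,0,1,1,2
228,0,0,2,2
852,1,0,2,3
929,0,1,2,3
938,0,0,3,3"
dat <- read.csv(text = csv_text)

# TRUE wherever tm3 differs from the previous row
# (same index as in the answer above).
changed <- c(dat$tm3[1], diff(dat$tm3)) != 0

# Keep only changes to a nonzero value, which drops the
# resets to 0 at the start of each frame.
selected <- dat[changed & dat$tm3 != 0, ]
selected
```

For the test data this keeps three rows per frame, matching the rows marked "select" in the question.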
