Refer to the row above when using mutate() in R
I want to create a new variable in a dataframe that refers to the value of the same new variable in the row above. Here's an example of what I want to do:
A horse is in a field divided into four zones. The horse is wearing a beacon that signals every minute, and the signal is picked up by one of four sensors, one for each zone. The field has a fence that runs most of the way down the middle, such that the horse can pass easily between zones 2 and 3, but to get between zones 1 and 4 it has to go via 2 and 3. The horse cannot jump over the fence.
|________________|
| |
sensor 2 | X | | sensor 3
| | |
| | |
| | |
sensor 1 | Y| | sensor 4
| | |
|----------------|
In the schematic above, if the horse is at position X, it will be picked up by sensor 2. If the horse is near the middle fence at position Y, however, it may be picked up by either sensor 1 or sensor 4, the ranges of which overlap slightly.
In the toy example below, I have a dataframe where I have location data each minute for 20 minutes. In most cases, the horse moves one zone at a time, but in several instances, it switches back and forth between zone 1 and 4. This should be impossible: the horse cannot jump the fence, and neither can it run around in the space of a minute.
I therefore want to calculate a new variable in the dataset that provides the "true" location of the animal, accounting for the impossibility of travelling between 1 and 4.
Here's the data:
library(tidyverse)
library(reshape2)
example <- data.frame(time = seq(as.POSIXct("2022-01-01 09:00:00"),
                                 as.POSIXct("2022-01-01 09:20:00"),
                                 by = "1 mins"),
                      location = c(1,1,1,1,2,3,3,3,4,4,4,3,3,2,1,1,4,1,4,1,4))
example
Create two new variables: "prevloc" is where the animal was in the previous minute, and "diffloc" is the absolute numeric difference between the animal's current and previous location.
example <- example %>% mutate(prevloc = lag(location),
                              diffloc = abs(location - prevloc))
example
Next, just change the first value of "diffloc" from NA to zero:
example <- example %>% mutate(diffloc = ifelse(is.na(diffloc), 0, diffloc))
example
Now we have a dataframe where diffloc is either 0 (animal didn't move), 1 (animal moved one zone), or 3 (animal apparently moved from zone 1 to zone 4 or vice versa). Where diffloc = 3, I want to create a "true" location taking account of the fact that such a change in location is impossible.
In my example, the animal went from zone 1 -> 4 -> 1 -> 4 -> 1 -> 4. Based on the fact that the animal started in zone 1, my assumption is that the animal just stayed in zone 1 the whole time.
My attempt to solve this is below, but it doesn't work:
example <- example %>%
  mutate(returnloc = ifelse(diffloc < 3, location, lag(returnloc)))
I wonder whether anyone can help me to solve this? I've been trying for a couple of days and haven't even got close...
Best wishes,
Adam
Comments (2)
One possible solution is, when diffloc == 3, to look at the most recent previous value that is neither 1 nor 4. If it is 2, then the horse must be in zone 1 afterwards; if it is 3, then the horse must be in zone 4.
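A minimal sketch of this idea with dplyr and tidyr, under a few assumptions: the helper column `gate` and the output column `trueloc` are names made up for illustration, and the rule is applied to every zone 1 or zone 4 reading rather than only where diffloc == 3, so that runs of wrong readings are covered too. Readings that come before the first zone 2 or 3 observation have nothing to look back at and are left unchanged.

```r
library(dplyr)
library(tidyr)

example <- data.frame(
  location = c(1,1,1,1,2,3,3,3,4,4,4,3,3,2,1,1,4,1,4,1,4)
)

example <- example %>%
  # keep only the unambiguous readings (zones 2 and 3); everything else is NA
  mutate(gate = ifelse(location %in% c(2, 3), location, NA)) %>%
  # carry the most recent zone 2/3 reading forward to the following rows
  fill(gate, .direction = "down") %>%
  mutate(trueloc = case_when(
    location %in% c(2, 3) ~ location,  # zones 2 and 3 are trusted as-is
    gate == 2 ~ 1,                     # last came through zone 2: must be in 1
    gate == 3 ~ 4,                     # last came through zone 3: must be in 4
    TRUE ~ location                    # no earlier 2/3 reading: keep raw value
  ))

example$trueloc
# 1 1 1 1 2 3 3 3 4 4 4 3 3 2 1 1 1 1 1 1 1
```

Because the corrected value depends only on the raw `location` column, this version needs no reference to the previous row of the new variable, which is why it fits inside an ordinary mutate() call.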
Here is an approach using a function containing a for-loop.
You cannot rely on diff, because it will not pick up runs of (wrong) zone 4 readings: c(1,1,4,4,4,1,1,1) should be converted to c(1,1,1,1,1,1,1,1), if I understand your question correctly. So you need to iterate (I think).
NB: this function only works if the location in the first row is correct. If the data start with (wrong) zone 4 readings, it will go awry.
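The iteration described here can be sketched roughly as follows. This is an illustrative reconstruction in base R, not the answerer's original code: the function name `fix_location` is hypothetical, and it assumes that the only impossible move is a direct jump between zones 1 and 4 (a difference of 3), as stated in the question.

```r
# Walk through the readings in order, comparing each raw reading with the
# previous *corrected* location. A jump of 3 (zone 1 <-> zone 4) is treated
# as a sensor error and replaced by the previous corrected location.
fix_location <- function(location) {
  trueloc <- location
  for (i in seq_along(location)[-1]) {
    if (abs(location[i] - trueloc[i - 1]) == 3) {
      trueloc[i] <- trueloc[i - 1]
    }
  }
  trueloc
}

fix_location(c(1,1,4,4,4,1,1,1))
# 1 1 1 1 1 1 1 1

location <- c(1,1,1,1,2,3,3,3,4,4,4,3,3,2,1,1,4,1,4,1,4)
fix_location(location)
# 1 1 1 1 2 3 3 3 4 4 4 3 3 2 1 1 1 1 1 1 1
```

Since the function takes a vector and returns a vector of the same length, it could also be called inside mutate(), for example mutate(returnloc = fix_location(location)). As noted above, the result is only as good as the first reading, because every later correction is anchored to it.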