Adding noise to dummy variables

Posted on 2025-01-20 10:32:24

I am trying to run a kNN regression. However, I have a lot of dummy variables and therefore a lot of ties. To solve this problem, I want to add noise to the dummies: rows with a 1 on a given variable should get a random value between 0.99 and 1, and rows with a 0 should get a random value between 0 and 0.01. Can somebody help me with an efficient way to transform my dummy variables?
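A minimal sketch of the transformation described in the question, assuming the dummies are stored as numeric 0/1 columns. For x in {0, 1} and u drawn uniformly from [0, 0.01], the expression |x - u| lands in [0.99, 1] when x = 1 and in [0, 0.01] when x = 0, so a single vectorised expression covers both cases. The helper add_dummy_noise, its width argument, and the column names d1, d2 and dummy_cols are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: one-sided uniform noise for 0/1 dummies.
# For x = 1 the result is in [1 - width, 1]; for x = 0 it is in [0, width].
add_dummy_noise <- function(x, width = 0.01) {
  stopifnot(all(x %in% c(0, 1)))
  # One random draw per element, so no two rows share the same noise value
  abs(x - runif(length(x), min = 0, max = width))
}

set.seed(1)
df <- data.frame(
  y  = rnorm(6),
  d1 = c(1, 0, 1, 1, 0, 0),  # hypothetical dummy columns
  d2 = c(0, 0, 1, 0, 1, 1)
)
orig <- df  # keep a copy of the original 0/1 coding for comparison

# Overwrite every dummy column in place
dummy_cols <- c("d1", "d2")
df[dummy_cols] <- lapply(df[dummy_cols], add_dummy_noise)
df
```

Because the whole column is transformed at once, this scales to many dummy columns without a row-wise loop.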

Answers (3)

溇涏 2025-01-27 10:32:24

There is a great function in base R for this, called jitter():

jitter(x = c(rep(0, 10), rep(1, 10)), factor = 0.01)
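A short check of how much noise that call actually adds. According to the jitter() documentation, when amount is left as NULL the noise is uniform on [-a, a] with a = factor * d / 5, where d is the smallest gap between distinct values (here 1), so factor = 0.01 gives roughly plus or minus 0.002. Passing amount directly sets the half-width instead. Either way the noise is symmetric, so 1s can exceed 1 and 0s can go negative, unlike the one-sided ranges asked for in the question. The seed value is arbitrary.

```r
set.seed(42)
x <- c(rep(0, 10), rep(1, 10))

# amount = NULL: half-width is factor / 5 * d = 0.01 / 5 * 1 = 0.002
j <- jitter(x, factor = 0.01)
range(j - x)

# amount given explicitly: noise is uniform on [-0.01, 0.01]
j2 <- jitter(x, amount = 0.01)
range(j2[x == 0])  # may include negative values
```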

晚雾 2025-01-27 10:32:24

You can use an ifelse() call to transform your dummy variables:

set.seed(4)
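df <- data.frame(letter = letters[1:10], dummy = sample(0:1, 10, replace = TRUE))
# Draw one random value per row: runif(1, ...) would give every row the same value
df$newdummy <- ifelse(df$dummy == 1, runif(nrow(df), 0.99, 1), runif(nrow(df), 0, 0.01))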

Here I add a new column, but you can overwrite the existing one by assigning the result of ifelse() to the old dummy variable.
However, I agree with @SamR's answer about dummy variables: it is not very clear what you want to do with them.
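One detail matters here: runif(1, ...) draws a single number, and ifelse() recycles it to every row, so all 1s end up with the same value and all 0s with another, which leaves the ties in place. A small comparison, using the same seed and sampling as the answer (the variable names tied and untied are illustrative):

```r
set.seed(4)
dummy <- sample(0:1, 10, replace = TRUE)

# Scalar draws: the same value is recycled, so at most two distinct values remain
tied <- ifelse(dummy == 1, runif(1, 0.99, 1), runif(1, 0, 0.01))

# Vectorised draws: one value per element, so ties are broken
untied <- ifelse(dummy == 1,
                 runif(length(dummy), 0.99, 1),
                 runif(length(dummy), 0, 0.01))

length(unique(tied))
length(unique(untied))
```

Note that ifelse() evaluates both the yes and no vectors in full and then picks element-wise, which is why both need length(dummy) draws.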

风渺 2025-01-27 10:32:24

To simply add noise, you can do something like:

x <- rep(1, 1000)
# Add small Gaussian noise (mean 0, sd 1e-6) to every element
noisy_x <- x + rnorm(n = 1000, mean = 0, sd = 0.000001)
noisy_x
# Example output (no seed is set, so values differ between runs; first 32 of 1000 shown):
#    [1] 1.0000010 1.0000004 1.0000014 0.9999998 0.9999998 1.0000007 0.9999990 1.0000006 1.0000006 0.9999989 1.0000007 0.9999998 0.9999992 1.0000002 0.9999989 0.9999994
#   [17] 0.9999987 0.9999997 1.0000000 0.9999993 1.0000000 0.9999997 1.0000013 0.9999991 0.9999987 0.9999994 0.9999983 0.9999992 0.9999982 1.0000004 1.0000000 1.0000009

However, I would question whether this is the right approach. Dummy variables do not usually need noise added to them. What do you mean when you say you are getting ties? In general, if you have a variable representing n levels of a factor, you only need n-1 dummy variables. Is this what you are referring to?
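To make the n-1 point concrete: base R's model.matrix() with the default treatment contrasts encodes a 3-level factor as an intercept plus two dummy columns. The sketch below also shows why exact ties are common with dummy-only features: any two rows with the same level have identical dummy vectors, so their distance is exactly 0. The factor f and its levels are made-up example data.

```r
f <- factor(c("a", "b", "c", "a"))  # hypothetical 3-level factor

# Intercept plus n - 1 = 2 dummy columns ("a" is the reference level)
mm <- model.matrix(~ f)
colnames(mm)

# Euclidean distances on the dummy columns only
d <- as.matrix(dist(mm[, -1]))
d[1, 4]  # rows 1 and 4 share level "a", so they are tied at distance 0
```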
