
R: Select a (numeric) vector from a data frame, draw n = 10 samples of subset size i = 5 and i = 10 from the vector, and calculate the mean of each sample

Posted on 2025-01-24 15:41:27


I have the following problem:

1. I have a data frame with 20 rows and 2 columns: "Name" (text) and "Values" (numeric).
2. I want to extract "Values", randomly sample (with equal weight) a subset of size 5 from it 10 times, and calculate the mean of each subset. I want to store those means in another vector (10x1).
3. I want to do the same as step 2, but with larger subsets of 15 (out of the 20 values): take those 15 values, calculate the mean, repeat this 10 times, and store the results in a new vector (10x1).
4. Ultimately, I want to compare some descriptive statistics between these two vectors, expecting that the vector from the smaller subset size will have fatter tails, be more negatively skewed, etc.

Creating the data frame as a start

Name <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06, 0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)
data <- data.frame(Name, Values)

The relevant part:

# extract Values column
Values <- data$Values

# define sizes of subset and number of iterations
n_small <- 5
n_large <- 15
n_iterations <- 10

set.seed(123456)

# Initialize result vector
Averages_small <- NULL
Averages_large <- NULL

# Calculate average of the subset and allocate it to the result vector
for (i in n_iterations) {
  Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE))
  Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE))
}

Somehow this gives me nine NAs and one number. What am I doing wrong? And is there a better way than a for loop? The above is only an example with no NA values, but the original data set has 20k rows and may contain missing values.

FYI, for background: the Values are investment returns, and the question is whether holding a larger number of investments helps diversification.

Thank you very much for your help!


秋叶绚丽 2025-01-31 15:41:27


You can use replicate to get 10 draws of your sample. This returns a matrix with the samples in columns, so the colMeans of this matrix gives you the vector you are looking for:

set.seed(1) # For reproducibility

vec5  <- colMeans(replicate(10, sample(data$Values, 5)))
vec15 <- colMeans(replicate(10, sample(data$Values, 15)))

vec5
#> [1] -0.014  0.148  0.044 -0.026  0.062  0.020 -0.032 -0.130  0.166  0.040

vec15
#> [1]  0.058000000  0.024666667  0.051333333  0.045333333  0.024000000
#> [6]  0.010666667  0.022666667 -0.010000000  0.003333333 -0.001333333

You can see that the standard deviation of vec5 is indeed larger:

sd(vec5)
#> [1] 0.08711908

sd(vec15)
#> [1] 0.02297406
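
The question also mentions that the real data may contain missing values, which the draws above do not handle. A minimal sketch, assuming it is acceptable to drop missing values before sampling; the helper name `sample_means` and the toy vector `returns` are made up for illustration:

```r
# Hypothetical helper: means of `n_draws` random subsets of size `size`,
# drawn without replacement after dropping missing values.
sample_means <- function(x, size, n_draws = 10) {
  x <- x[!is.na(x)]                # drop NAs up front
  stopifnot(size <= length(x))     # cannot draw more values than remain
  vapply(
    seq_len(n_draws),
    function(i) mean(x[sample.int(length(x), size)]),  # sample by index
    numeric(1)
  )
}

set.seed(1)
returns <- c(0.1, NA, 0.03, 0.06, -0.1, -0.3, NA, 0.5, 0.12, 0.06)  # toy data
m3 <- sample_means(returns, size = 3)
m6 <- sample_means(returns, size = 6)

length(m3)   # 10
anyNA(m3)    # FALSE
```

Passing `na.rm = TRUE` to `mean()` instead would leave the NAs in the sampling pool, so some subsets would be averaged over fewer than `size` values.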


手心的海 2025-01-31 15:41:27


I know that this question has already been answered, but I have found the mistake in your original code that caused it to not work.
The code as you wrote it can actually work as you want it to, but the for loop only fired once; for (i in v) loops over a vector, repeating with each value listed. Remember that you set

n_iterations <- 10

So in your loop, you effectively had for (i in 10), such that the loop was only called once, meaning that the whole structure ended up being

Averages_small[10] <- mean(sample(Values, n_small, replace = FALSE))
Averages_large[10] <- mean(sample(Values, n_large, replace = FALSE))
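
This is also where the nine NA values come from: assigning to position 10 of an empty vector makes R extend it to length 10 and fill the untouched positions with NA. A small illustration (the variable name `x` is arbitrary):

```r
x <- NULL
x[10] <- 0.5     # only position 10 is assigned

length(x)        # 10
sum(is.na(x))    # 9
x[10]            # 0.5
```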

What you want is for (i in 1:10), which loops over a vector of ten values. You can fix this either by defining n_iterations <- 1:10 and keeping for (i in n_iterations), or (using your original setup) by writing

set.seed(123456)
for (i in 1:n_iterations) {
  Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE))
  Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE))
}
Averages_small
#> [1] -0.066  0.042  0.036  0.018  0.080  0.016 -0.038 -0.180  0.132  0.042
Averages_large
#> [1] -0.02600000 -0.01266667  0.02000000  0.04666667  0.03533333 -0.02200000 -0.01533333 -0.00400000  0.03266667  0.07333333

I know that for loops are generally not optimal, and a solution that does not rely on one is probably superior, but I also thought that you would appreciate an explanation of why your code did not function correctly in the first place.
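
If you keep the loop, a slightly more defensive variant preallocates the result vectors and iterates with `seq_len()`, which also behaves sensibly when the iteration count is 0 (`1:0` would instead run over 1 and 0). A sketch reusing the names and data from the question:

```r
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06,
            0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)
n_small <- 5
n_large <- 15
n_iterations <- 10

# Preallocate the result vectors instead of growing them from NULL
Averages_small <- numeric(n_iterations)
Averages_large <- numeric(n_iterations)

set.seed(123456)
for (i in seq_len(n_iterations)) {
  # sample() draws without replacement by default
  Averages_small[i] <- mean(sample(Values, n_small))
  Averages_large[i] <- mean(sample(Values, n_large))
}

anyNA(Averages_small)    # FALSE
length(Averages_large)   # 10
```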
