I have the following problem:
- I have a data frame containing two vectors, "Name" (text) and "Values" (numeric), with 20 rows and 2 columns.
- I want to extract "Values", randomly sample (with equal weight) a subset of size 5 from it 10 times, and calculate the mean of each subset. I want to capture those mean values in another 10x1 vector.
- I want to do the same as step 2, but instead of sampling a subset of size 5 I want more observations, i.e. 15 (of the 20 values). I take those 15 values, calculate the mean, repeat this step 10 times, and log the results in a new 10x1 vector.
(4. Ultimately, I want to compare some descriptive statistics between these two vectors, expecting that the vector from the smaller subset size will have fatter tails, be more negatively skewed, etc.)
Creating the data frame as a start
Name <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06, 0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)
data <- data.frame(Name, Values)
The relevant part:
# extract Values column
Values <- data$Values
# define sizes of subset and number of iterations
n_small <- 5
n_large <- 15
n_iterations <- 10
set.seed(123456)
# Initialize result vector
Averages_small <- NULL
Averages_large <- NULL
# Calculate average of the subset and allocate it to the result vector
for (i in n_iterations) {
Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE))
Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE))
}
Somehow this gives me 9 NAs and one number. What am I doing wrong? And is there a better way than a for loop? The above is only an example without NA values; the original data set has 20k rows and might contain missing values.
FYI, some background: the Values are return figures of investments, and the question is whether holding a higher number of investments helps diversification.
Thank you very much for your help!
You can use `replicate` to get 10 draws of your sample. This returns a matrix with the samples in columns, so the `colMeans` of this matrix gives you the vector you are looking for.
Comparing the two results, the standard deviation of `vec5` is indeed larger.
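The `replicate` approach described above can be sketched roughly as follows. Only the name `vec5` comes from the answer; `vec15` is an assumed name for the size-15 counterpart, and `na.rm = TRUE` is an addition here to cover the missing values mentioned in the question.

```r
# Data from the question
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06,
            0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)

set.seed(123456)

# replicate() returns a matrix with one sample per column
# (5 x 10 and 15 x 10 here), so colMeans() yields one mean per draw.
vec5  <- colMeans(replicate(10, sample(Values, 5)),  na.rm = TRUE)
vec15 <- colMeans(replicate(10, sample(Values, 15)), na.rm = TRUE)

# Compare the spread of the two vectors of sample means
sd(vec5)
sd(vec15)
```

Because no result vector is grown inside a loop, this should also scale reasonably to the 20k-row data set; only the two sizes and the number of draws need to change.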
I know that this question has already been answered, but I have found the mistake in your original code that caused it not to work.
The code as you wrote it can actually work as you want it to, but the for loop only fired once. `for (i in v)` loops over a vector, repeating once for each value listed. Remember that you set `n_iterations <- 10`, so in your loop you effectively had `for (i in 10)`. The loop body therefore ran only once, with `i` equal to 10, so only the tenth element of each result vector was assigned and the first nine were left as NA.
What you want is `for (i in 1:10)`, where `1:10` creates a vector of the values 1 through 10. This can be solved either by defining `n_iterations <- 1:10`, or (using your original setup) by looping over `1:n_iterations`.
I know that for loops are generally not optimal, and a solution that does not rely on one is probably superior, but I also thought that you would appreciate an explanation of why your code did not function correctly in the first place.
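As a rough illustration of the explanation above, the sketch below first reproduces the single-iteration behaviour on a throwaway vector `x` (an assumed name), then runs the corrected loop. Using `seq_len(n_iterations)` instead of `1:n_iterations`, preallocating with `numeric()`, and passing `na.rm = TRUE` are choices made for this example, not part of the original answer.

```r
# Data and settings from the question
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06,
            0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)
n_small <- 5
n_large <- 15
n_iterations <- 10

# Why the original loop failed: `for (i in 10)` runs once with i = 10,
# so only element 10 is assigned and elements 1 to 9 are filled with NA.
x <- NULL
for (i in n_iterations) x[i] <- 1
x  # nine NA values followed by 1

set.seed(123456)

# Preallocate the result vectors instead of growing them from NULL
Averages_small <- numeric(n_iterations)
Averages_large <- numeric(n_iterations)

# seq_len(n_iterations) expands to 1, 2, ..., 10, so the body runs 10 times
for (i in seq_len(n_iterations)) {
  Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE), na.rm = TRUE)
  Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE), na.rm = TRUE)
}
```

`seq_len()` is used here because it yields an empty sequence when the count is 0, whereas `1:0` would iterate over 1 and 0.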