Randomly selecting increasing subsets of data to see where the mean levels off
Could anyone please advise on the best way to do the following?
I have three variables (X, Y and Z) and four groups (1, 2, 3 and 4). I have been using discriminant function analysis in SPSS to predict group membership from data with known groups, for use with future ungrouped data.
Ideally, I would like to be able to randomly sample subsets of the data of increasing size to see how many observations are required to reach a desired percentage of correct classifications.
However, I understand this might be difficult. Therefore, I'm looking to do this for the means first.
For example, let's say variable X has a mean of 141 for group 1. This mean might have been calculated from 2000 observations. However, the mean may already have settled at that value by, say, 700 observations. I would like to be able to work out at what number of observations/cases the mean levels off in my data. For example, perhaps starting with 10 observations and repeating the random draw, say, 50 or 100 times, then increasing to 20 observations, and so on.
I understand this is a form of Monte Carlo testing. I have access to SPSS 15, 17 and 18 and Excel. I also have access to Minitab 15 and 16 and AMOS 17, and have downloaded R, but I'm not familiar with these. My experience is with SPSS and Excel. I have tried some SPSS syntax modified from http://pages.infinit.net/rlevesqu/Syntax/RandomSampling/Select2CasesFromEachGroup.txt but it would still be quite time-consuming for me to enter the subset sizes and so on.
Hope someone can help.
Thanks for reading.
Andy
Comments (2)
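The text you linked to is a good start (you can also use the SAMPLE command in SPSS, but IMO the Raynald script you linked to is more flexible when you think about constructing the sample that way). In pseudo-code, the process might look like this: for each sample size, repeat 100 times: draw a random sample of that size, compute the statistics you want, and save them.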
Here is where SPSS's macro language comes into play (I think this document is a good introduction, plus you can examine other references on the SPSS tag wiki). Basically, once you figure out how to draw the sample and compute the stats you want, you just need to figure out how to write a macro so you can loop through the process (and pass it the sample size as a parameter). I run the loop 100 times because you want to be able to make some type of estimate of the error associated with each sample size.
If you give an example of how you compute the statistics I may be able to give examples of how to make that into a macro function and loop through the desired number of times.
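For comparison, the same loop can be sketched outside SPSS. The following is only a minimal Python illustration, not the SPSS macro itself: the function names (`sample_means`, `mean_stability`), the simulated data, and the choice to sample without replacement are all assumptions made for the example.

```python
import random
import statistics


def sample_means(values, sample_size, repeats=100, rng=None):
    """Draw `repeats` random samples of `sample_size` cases (without
    replacement) and return the mean of each sample."""
    rng = rng or random.Random()
    return [statistics.fmean(rng.sample(values, sample_size))
            for _ in range(repeats)]


def mean_stability(values, sizes, repeats=100, seed=0):
    """For each sample size, summarise how much the sample mean varies
    across the repeated draws. Returns one row (dict) per sample size."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rows = []
    for n in sizes:
        means = sample_means(values, n, repeats, rng)
        rows.append({
            "n": n,
            "mean_of_means": statistics.fmean(means),
            "sd_of_means": statistics.stdev(means),  # spread of the 100 means
        })
    return rows


if __name__ == "__main__":
    # Simulated stand-in for variable X in group 1 (hypothetical data).
    data_rng = random.Random(42)
    x_group1 = [data_rng.gauss(141, 15) for _ in range(2000)]
    for row in mean_stability(x_group1, sizes=range(10, 201, 10)):
        print(f'{row["n"]:4d}  {row["mean_of_means"]:8.2f}  {row["sd_of_means"]:6.2f}')
```

The sample size at which `sd_of_means` stops shrinking noticeably is one rough way to read off where the mean has levelled off.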
@Andy W
@Oliver
Thanks for your suggestions, guys. I've managed to find a workaround using the macro from http://www.spsstools.net/Syntax/Bootstrap/GetRandomSampleOfVariousSizeCalcStats.txt However, for this I need to copy and paste the variable data for a given group into a new data window. That's not too much of a problem. To take this further, would anyone know how I could:
1/ get other statistics recorded, e.g. standard error, standard deviation, etc.;
2/ use other analyses, ideally discriminant function analysis, and record the percentage of correct classifications in a new data window rather than having lots of output tables;
3/ avoid copying and pasting the variables for each group, so that I can just run the macro specifying n samples for variable X on groups 1, 2, 3 and 4.
Thanks again.
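Points 1/ and 3/ can be illustrated with a short sketch. This is Python rather than SPSS syntax, and the record layout (a list of dicts with a `group` key), the function name `resample_by_group` and the simulated data are assumptions for the example. It does not cover the discriminant analysis in point 2/.

```python
import math
import random
import statistics
from collections import defaultdict


def resample_by_group(records, variable, sample_size, repeats=100, seed=0):
    """For every group, draw `repeats` random samples of `sample_size` cases
    and record the mean, SD and standard error of `variable` as one row per
    draw. `sample_size` must be at least 2 and no larger than the smallest
    group."""
    rng = random.Random(seed)

    # Split the variable by group once, so no manual copy and paste is needed.
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec["group"]].append(rec[variable])

    results = []
    for group in sorted(by_group):
        values = by_group[group]
        for rep in range(1, repeats + 1):
            sample = rng.sample(values, sample_size)  # without replacement
            sd = statistics.stdev(sample)
            results.append({
                "group": group,
                "n": sample_size,
                "rep": rep,
                "mean": statistics.fmean(sample),
                "sd": sd,
                "se": sd / math.sqrt(sample_size),  # standard error of the mean
            })
    return results


if __name__ == "__main__":
    # Simulated data: four groups, 500 cases each (hypothetical values).
    data_rng = random.Random(7)
    records = [{"group": g, "X": data_rng.gauss(130 + 5 * g, 15)}
               for g in (1, 2, 3, 4) for _ in range(500)]
    for row in resample_by_group(records, "X", sample_size=50, repeats=3):
        print(row)
```

Each row is a plain dict, so the results could be written to a CSV file and read back into SPSS or Excel instead of being collected from output tables.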