Randomly selecting increasing subsets of data to see where the mean levels off

Posted 2024-11-28 01:16:31

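Can anyone suggest the best way to do the following?

I have three variables (X, Y and Z) and four groups (1, 2, 3 and 4). I have been using discriminant function analysis in SPSS to predict group membership from data whose groups are known, so that it can be applied to future ungrouped data.

Ideally, I would like to randomly sample increasingly large subsets of the data to find out how many observations are needed to reach the desired percentage of correct classifications.

However, I realise this might be difficult, so I am looking for ways to approach it.

For example, suppose the mean of variable X for group 1 is 141. That mean may have been calculated from 2000 observations, but the same mean might already be reached with 700 observations. I would like to work out at what number of observations/cases in my data the mean levels off. For example, perhaps start with 10 observations, randomly repeated 50 or 100 times, then increase to 20 observations, and so on.

I understand this is a form of Monte Carlo testing. I have access to SPSS 15, 17 and 18 as well as Excel. I also have access to Minitab 15 & 16 and Amos 17, and have downloaded R, but I am not familiar with these. My experience is with SPSS and Excel. I have tried some syntax in SPSS, modified from this: http://pages.infinit.net/rlevesqu/Syntax/RandomSampling/Select2CasesFromEachGroup.txt, but entering the subset numbers etc. is still quite time-consuming for me.

Hope someone can help.

Thanks for reading.

Andy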


雾里花 2024-12-05 01:16:31

The text you linked to is a good start (you can also use the SAMPLE command in SPSS, but IMO the Raynald script you linked to is more flexible when you think about constructing the sample that way).

In pseudo-code, the process might look like:

for each sample size n from a to b
    repeat 100 times
        draw a random sample of size n
        compute (& save) the statistics
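The pseudo-code above translates almost line for line into Python. This is only a sketch of the loop structure, not SPSS syntax; the function name `subsample_means`, the seed, and the simulated data (2000 cases with a mean around 141, echoing the example in the question) are assumptions for illustration. Like sorting on a random number and keeping the first n cases in SPSS, it samples without replacement.

```python
import random
import statistics

def subsample_means(data, sizes, reps=100, seed=1):
    """For each sample size n, draw `reps` random samples of n cases
    (without replacement) and record the mean of each sample."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = {}
    for n in sizes:                                  # for each sample size n
        means = []
        for _ in range(reps):                        # repeat 100 times
            sample = rng.sample(data, n)             # draw a sample of size n
            means.append(statistics.mean(sample))    # compute & save the statistic
        results[n] = means
    return results

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for one group's X values (2000 cases, mean ~141).
    x = [rng.gauss(141, 20) for _ in range(2000)]
    res = subsample_means(x, sizes=range(10, 101, 10), reps=100)
    for n, means in res.items():
        # sample size, average of the 100 means, spread of the 100 means
        print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 2))
```

Swapping `statistics.mean` for another function is all it takes to track a different statistic.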

Here is where SPSS's macro language comes into play (I think this document is a good introduction, plus you can examine other references on the SPSS tag wiki). Basically, once you figure out how to draw the sample and compute the stats you want, you just need to figure out how to write a macro so you can loop through the process (and pass it the sample size parameter). I include the 100-iteration loop because you want to be able to estimate the sampling error associated with each sample size.
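One way to turn the 100 replicates into an error estimate is to look at how much the replicate means vary at each sample size: their standard deviation is the typical error of a single sample of that size, and the first size at which it drops below a tolerance you care about is one reading of "where the mean levels off". A minimal Python sketch, assuming the replicate statistics have already been collected per sample size; `summarise`, `smallest_stable_n` and the tolerance rule are hypothetical, not a standard criterion.

```python
import statistics

def summarise(results):
    """results maps sample size n -> list of replicate statistics (e.g. the
    100 sample means drawn at that size). Returns n -> (average of the
    replicates, standard deviation of the replicates), in ascending n."""
    summary = {}
    for n in sorted(results):
        reps = results[n]
        summary[n] = (statistics.mean(reps), statistics.stdev(reps))
    return summary

def smallest_stable_n(summary, tolerance):
    """Hypothetical stopping rule: the smallest sample size whose replicate
    SD is at or below `tolerance`, in the variable's own units."""
    for n, (_, sd) in summary.items():
        if sd <= tolerance:
            return n
    return None  # none of the sizes tried met the tolerance

if __name__ == "__main__":
    # Made-up replicate means, only to show the shape of the input.
    demo = {10: [130.0, 150.0, 141.0], 50: [139.0, 143.0, 141.0]}
    s = summarise(demo)
    print(s)
    print(smallest_stable_n(s, tolerance=5.0))  # -> 50
```

Plotting the replicate SD against n (or box plots of the replicates by n, as the EXAMINE step in the spsstools macro does) usually makes the levelling-off point easier to judge than any single threshold.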

If you give an example of how you compute the statistics I may be able to give examples of how to make that into a macro function and loop through the desired number of times.


懵少女 2024-12-05 01:16:31

@Andy W
@Oliver

Thanks for your suggestions, guys. I've managed to find a workaround using the following macro from http://www.spsstools.net/Syntax/Bootstrap/GetRandomSampleOfVariousSizeCalcStats.txt. However, for this I need to copy and paste the variable data for a given group into a new data window. That's not too much of a problem. To take this further, would anyone know how I could:

1/ get other statistics recorded, e.g. std error, std dev, etc.
2/ use other analyses, ideally discriminant function analysis, and record the percentage of correct classifications in a new data window rather than having lots of output tables
3/ avoid copying and pasting variables for each group, so I can just run the macro specifying n samples for variable x on groups 1, 2, 3 & 4.
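For 1/ and 3/, here is a hedged Python sketch of the same idea (not a change to the SPSS macro): it loops over groups as well as sample sizes and records the mean, SD and standard error (SD / sqrt(n)) of every sample as one row, so the results form a single table that can be saved as CSV and opened in SPSS or Excel instead of many output tables. The column names `ss` and `samplenb` mirror the variables in the spsstools macro; the function name and the simulated four-group data are assumptions. Inside SPSS itself, AGGREGATE also offers an SD() function and SELECT IF can restrict the file to one group, which may be the shorter route, though I have not tested that change.

```python
import csv
import math
import random
import statistics
import sys

def subsample_stats_by_group(values_by_group, sizes, reps=100, seed=1):
    """values_by_group maps a group label to the list of one variable's
    values (e.g. X) in that group. For every group, sample size n and
    replicate, draw n cases without replacement and record mean, SD and
    standard error as one row. Requires 2 <= n <= group size."""
    rng = random.Random(seed)
    rows = []
    for group, values in values_by_group.items():
        for n in sizes:
            for rep in range(1, reps + 1):
                sample = rng.sample(values, n)
                sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
                rows.append({
                    "group": group,
                    "ss": n,          # sample size, as in the macro
                    "samplenb": rep,  # replicate number, as in the macro
                    "mean": statistics.mean(sample),
                    "sd": sd,
                    "se": sd / math.sqrt(n),  # standard error of the mean
                })
    return rows

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 4 groups of X values with different centres.
    data = {g: [rng.gauss(100 + 15 * g, 20) for _ in range(500)] for g in (1, 2, 3, 4)}
    rows = subsample_stats_by_group(data, sizes=[10, 20, 50], reps=3)
    # One row per sample, written as CSV to standard output.
    writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

To analyse Y or Z as well, call the function once per variable, or add a variable column to each row.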

Thanks again.

DEFINE !sample(myvar !TOKENS(1) 
        /nbsampl !TOKENS(1)
        /size !CMDEND).
* myvar = the variable of interest (here we want the mean of salary).
* nbsampl = number of samples.
* size = the size of each sample.

!LET !first='1'
!DO !ss !IN (!size)
!DO !count = 1 !TO !nbsampl.

GET FILE='c:\Program Files\SPSS\employee data.sav'.

COMPUTE draw=uniform(1).
SORT CASES BY draw.
N OF CASES !ss.

COMPUTE samplenb=!count. 
COMPUTE ss=!ss.

AGGREGATE
  /OUTFILE=*
  /BREAK=samplenb
  /!myvar = MEAN(!myvar) /ss=FIRST(ss).

!IF (!first !NE '1') !THEN
ADD FILES /FILE=*  /FILE='c:\temp\sample.sav'.
!IFEND
SAVE OUTFILE='c:\temp\sample.sav'.
!LET !first='0'

!DOEND. 
!DOEND. 

VARIABLE LABEL ss 'Sample size'.
EXAMINE
  VARIABLES=!myvar BY ss /PLOT=BOXPLOT/STATISTICS=NONE/NOTOTAL
  /MISSING=REPORT.

!ENDDEFINE.
* ----------------END OF MACRO ----------------------------------------------.


* Call macro (parameters are the number of samples, here 20, and the sample sizes, here 5, 10, 15, 30, 50).
* Thus 20 samples of size 5.
* Then 20 samples of size 10, etc.
!sample myvar=salary nbsampl=20 size= 5 10 15 30 50.
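For 2/, the same subsampling loop can wrap a discriminant analysis and keep only the percentage correctly classified per sample. The sketch below is Python, not SPSS, and implements linear classification functions from scratch (pooled within-group covariance matrix, equal priors); SPSS's DISCRIMINANT may differ in details such as priors or stepwise variable selection depending on your settings. Two design choices are assumptions: it draws n cases from each group so every group is present, and it scores the held-out cases, since the goal is predicting future ungrouped data (SPSS's classification table by default reports the cases used to build the functions). All function names and the simulated data are hypothetical.

```python
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix; use a larger n")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_lda(cases):
    """cases: list of (group, [x, y, z]). Linear classification functions with
    a pooled within-group covariance matrix and equal prior probabilities."""
    groups = sorted({g for g, _ in cases})
    p = len(cases[0][1])
    means = {}
    for g in groups:
        rows = [v for h, v in cases if h == g]
        means[g] = [sum(col) / len(rows) for col in zip(*rows)]
    s = [[0.0] * p for _ in range(p)]  # pooled within-group sums of squares
    for g, v in cases:
        d = [v[i] - means[g][i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    dof = len(cases) - len(groups)
    s_inv = invert([[x / dof for x in row] for row in s])
    coefs = {}
    for g in groups:
        w = [sum(s_inv[i][j] * means[g][j] for j in range(p)) for i in range(p)]
        coefs[g] = (w, -0.5 * sum(w[i] * means[g][i] for i in range(p)))
    return coefs

def classify(coefs, x):
    """Assign x to the group with the largest classification score."""
    return max(coefs, key=lambda g: sum(w * xi for w, xi in zip(coefs[g][0], x)) + coefs[g][1])

def percent_correct_by_size(cases, sizes, reps=100, seed=1):
    """For each n, draw n cases at random from every group, fit on them and
    record the % of the remaining (held-out) cases classified correctly.
    Needs 2 <= n < smallest group size."""
    rng = random.Random(seed)
    by_group = {}
    for i, (g, _) in enumerate(cases):
        by_group.setdefault(g, []).append(i)
    results = {}
    for n in sizes:
        pcts = []
        for _ in range(reps):
            train_idx = set()
            for idx in by_group.values():
                train_idx.update(rng.sample(idx, n))
            train = [cases[i] for i in train_idx]
            test = [c for i, c in enumerate(cases) if i not in train_idx]
            coefs = fit_lda(train)
            hits = sum(classify(coefs, v) == g for g, v in test)
            pcts.append(100.0 * hits / len(test))
        results[n] = pcts
    return results

if __name__ == "__main__":
    rng = random.Random(7)
    centres = {1: (0, 0, 0), 2: (3, 0, 0), 3: (0, 3, 0), 4: (0, 0, 3)}
    # Hypothetical X, Y, Z data: 200 cases per group, unit-variance noise.
    cases = [(g, [rng.gauss(m, 1) for m in c]) for g, c in centres.items() for _ in range(200)]
    for n, pcts in percent_correct_by_size(cases, [5, 10, 20, 50], reps=20).items():
        print(n, round(sum(pcts) / len(pcts), 1))  # n per group, mean % correct
```

The per-size lists of percentages can be summarised the same way as the replicate means (average and spread across replicates) to see at what n the classification rate reaches the level you need.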
