Fisher's exact test on aggregated data using Stata
I have a set of data like below:
A B C D
1 2 3 4
2 3 4 5
These are aggregated data in which A, B, C and D constitute a 2x2 table. I need to run Fisher's exact test on each row and add a new column holding the p-value of the test for that row.
I can use `fisher.test` and a loop to do it in R, but I can't find a command in Stata for Fisher's exact test.
Comments (3)
You are thinking in R terms, and that is often fruitless in Stata (just as it is impossible for a Stata guy to figure out how to do `by ... : regress` in R; every package has its own paradigm and its own strengths). There are no objects to add columns to. Maybe you could say a little more about what you eventually need to do with your p-values, so as to find an appropriate solution that your Stata collaborators would sympathize with.
If you really want to add a new column (`generate` a new variable, speaking Stata), then you might want to look at `tabulate` and its returned values. I assume that your A B C D stand for two binary variables, and the numbers are frequencies in the data. You have to `clear` the memory, as Stata thinks about one data set at a time. Then you could `tabulate` the results and `generate` new variables containing p-values, although creating variables that contain a constant value would be a major waste of memory.
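The code that originally accompanied this step is not preserved here, so what follows is only a minimal sketch of the idea. The variable names `x`, `y`, `f1`, `f2` and `p1` are assumptions chosen for illustration: `x` and `y` index the cells of the 2x2 table, and `f1` and `f2` hold the cell counts of the first and second tables from the question.

```stata
* Hypothetical layout: one observation per cell of the 2x2 table.
* f1 holds the counts A B C D = 1 2 3 4, f2 holds 2 3 4 5.
clear
input x y f1 f2
0 0 1 2
0 1 2 3
1 0 3 4
1 1 4 5
end

* Fisher's exact test for the first table, using f1 as frequency weights
tabulate x y [fw=f1], exact

* Show what tabulate left behind in r()
return list

* Store the two-sided exact p-value as a (constant) variable
generate p1 = r(p_exact)
```

Running `return list` right after `tabulate` is the quickest way to check which results are available before referring to them.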
The `[fw=variable]` qualifier is a way to specify frequency weights; I typed `return list` to find out what kind of information Stata stores as the result of the procedure. THAT'S the object-like thing Stata works with. R would return the test results in the `fisher.test()$p.value` component, whereas Stata creates returned values: `r(component)` for simple commands and `e(component)` for estimation commands. If you want a loop solution (useful if you have many sets), the same steps can be wrapped in a loop.
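A loop over the frequency variables might look like the sketch below. It is a reconstruction under assumptions, not the code originally posted: it assumes the counts are stored in variables named `f1` and `f2`, and the output names `p1` and `p2` are likewise illustrative.

```stata
* Hypothetical layout: one observation per cell, one count variable per table
clear
input x y f1 f2
0 0 1 2
0 1 2 3
1 0 3 4
1 1 4 5
end

* Repeat the test for f1 and f2, storing each p-value in p1 and p2
forvalues k = 1/2 {
    quietly tabulate x y [fw=f`k'], exact
    generate p`k' = r(p_exact)
}
list
```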
That's the scripting capacity in which Stata, IMHO, is way stronger than R (although it can be argued that this is an extremely dirty programming trick). The local macro `k` takes values from 1 to 2, and its value is substituted wherever `` `k' `` appears inside the curly-bracketed piece of code. Alternatively, you can keep the results in Stata's short-term memory as scalars.
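The scalar variant could be sketched as follows, again assuming hypothetical count variables `f1` and `f2`; the scalar names `p1` and `p2` are illustrative only.

```stata
* Hypothetical layout: one observation per cell, one count variable per table
clear
input x y f1 f2
0 0 1 2
0 1 2 3
1 0 3 4
1 1 4 5
end

* Keep each p-value as a scalar instead of a constant variable
forvalues k = 1/2 {
    quietly tabulate x y [fw=f`k'], exact
    scalar p`k' = r(p_exact)
}

* Display every scalar currently held in memory
scalar list
```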
However, the scalars are not associated with the data set, so you cannot save them with the data.
Immediate commands like `cci`, suggested here, would also have returned values that you can retrieve in a similar way. HTH, Stas
Have a look at the `cci` command with the `exact` option. It is one of the so-called "immediate" commands, which let you do computations directly from the arguments rather than from data stored in memory. Have a look at `help immediate`.
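As a minimal sketch, the first table from the question (counts 1 2 3 4) can be passed straight to `cci`; the two-sided exact p-value is then available as `r(p_exact)`.

```stata
* The four cell counts are given as arguments; no data set is needed
cci 1 2 3 4, exact

* Retrieve the two-sided Fisher exact p-value from the returned results
display r(p_exact)
```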
Each observation in the poster's original question apparently consists of the four counts in one traditional 2 x 2 table. Stas's code applies to data on individual observations. Nick pointed out that `cci` can analyze a b c d data. Applying `cci` to each table in a loop will, like Stas's code, add the p-values to the data set. In such a loop, the `` forvalues i = 1/`=_N' `` statement tells Stata to run from the first to the last observation, and `` a[`i'] `` refers to the value of the variable `a` in the i-th observation. Note that there is no problem in giving a local macro the same name as a variable.
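The approach described above can be sketched as follows. This is a reconstruction from the description rather than the code originally posted; the variable name `exactp` is an assumption, and the counts are assumed to sit in variables named `a`, `b`, `c` and `d`.

```stata
* One observation per 2x2 table, as in the question
clear
input a b c d
1 2 3 4
2 3 4 5
end

* Hypothetical name for the new variable that will hold the p-values
generate double exactp = .

forvalues i = 1/`=_N' {
    * Copy the i-th observation's counts into local macros
    local a = a[`i']
    local b = b[`i']
    local c = c[`i']
    local d = d[`i']
    * Run Fisher's exact test on this table and store its p-value
    quietly cci `a' `b' `c' `d', exact
    replace exactp = r(p_exact) in `i'
}
list
```

Filling `exactp` one observation at a time keeps the data in the one-row-per-table layout, so the p-values can be saved with the data set.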