Fisher exact test on aggregated data using Stata

Posted 2024-11-28 08:44:51

I have a set of data like below:

A B C D
1 2 3 4
2 3 4 5

They are aggregated data in which A, B, C and D constitute a 2x2 table. I need to run a Fisher exact test on each row and add a new column holding the p-value of that row's test.

I can do this in R with fisher.test and a loop, but I can't find a Stata command for the Fisher exact test.

凹づ凸ル 2024-12-05 08:44:51

You are thinking in R terms, and that is often fruitless in Stata (just as it is impossible for a Stata guy to figure out how to do by ... : regress in R; every package has its own paradigm and its own strengths).

There are no objects to add columns to. Maybe you could say a little more about what you eventually need to do with your p-values, so as to find an appropriate solution that your Stata collaborators would sympathize with.

If you really want to add a new column (generate a new variable, speaking Stata), then you might want to look at tabulate and its returned values:

    clear
    input x y f1 f2
    0 0 5 10
    0 1 7 12
    1 0 3 8
    1 1 9 5
    end

I assume that your A B C D stand for two binary variables, and the numbers are frequencies in the data. You have to clear the memory, as Stata thinks about one data set at a time.

Then you could tabulate the results and generate new variables containing p-values, although that would be a major waste of memory to create variables that contain a constant value:

    tabulate x y [fw=f1], exact
    return list
    generate p1 = r(p_exact)
    tabulate x y [fw=f2], exact
    generate p2 = r(p_exact)

Here, [fw=variable] is a way to specify frequency weights; I typed return list to find out what kind of information Stata stores as the result of the procedure. THAT'S the object-like thing Stata works with. R would return the test results in the fisher.test()$p.value component, and Stata creates returned values, r(component) for simple commands and e(component) for estimation commands.
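To make the r() versus e() distinction concrete, here is a minimal sketch. It assumes the x, y, f1 data entered above are still in memory; logit is only a stand-in for "some estimation command" and is not part of the original problem.

    * r-class: results left behind by the most recent general command
    quietly tabulate x y [fw=f1], exact
    display "two-sided exact p = " r(p_exact)
    display "one-sided exact p = " r(p1_exact)

    * e-class: results left behind by the most recent estimation command
    * (logit is used here purely for illustration)
    quietly logit y x [fw=f1]
    ereturn list
    display "log likelihood = " e(ll)

Keep in mind that r() results are overwritten by the next r-class command, so copy anything you need into a variable, scalar or macro right away.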

If you want a loop solution (if you have many sets), you can do this:

    forvalues k = 1/2 {
        tabulate x y [fw=f`k'], exact
        generate p`k' = r(p_exact)
    }

That's the scripting capacity in which Stata, IMHO, is way stronger than R (although it can be argued that this is an extremely dirty programming trick). The local macro k takes the values 1 and 2, and each occurrence of `k' inside the curly braces is replaced by its current value.

Alternatively, you can keep the results in Stata short term memory as scalars:

    tabulate x y [fw=f1], exact
    scalar p1 = r(p_exact)
    tabulate x y [fw=f2], exact
    scalar p2 = r(p_exact)

However, the scalars are not associated with the data set, so you cannot save them with the data.
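A short sketch of working with those scalars, assuming p1 and p2 were defined as above. The variable name p1var is made up for the example.

    * Show every scalar currently defined in memory
    scalar list

    * scalar() forces Stata to read the scalar even if a variable
    * with the same name (say, p1 from the earlier generate) exists
    display "p1 = " scalar(p1) ", p2 = " scalar(p2)

    * If a value must travel with the data after all, copy it into a variable
    generate p1var = scalar(p1)

Without scalar(), a bare name in an expression is resolved as a variable first, which matters here because the earlier examples created variables called p1 and p2.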

Immediate commands such as cci, suggested in another answer here, also leave behind returned values that you can retrieve in the same way.

HTH, Stas

捎一片雪花 2024-12-05 08:44:51

Have a look at the cci command with the exact option:

    cci 10 15 30 10, exact

It is one of the so-called "immediate" commands. They allow you to do computations directly from the arguments rather than from data stored in memory. Have a look at help immediate.
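As a rough illustration of the immediate-command idea, the same four counts can also be fed to tabi, the immediate form of tabulate. This sketch reuses the counts from the cci call above; both commands analyze the same 2 x 2 table, so the two-sided exact p-values should agree.

    * tabi takes the table row by row, rows separated by a backslash
    tabi 10 15 \ 30 10, exact
    return list
    display "two-sided exact p (tabi) = " r(p_exact)

    * cci takes the four cell counts directly
    quietly cci 10 15 30 10, exact
    display "two-sided exact p (cci)  = " r(p_exact)

Neither call touches the data set in memory, which is what makes immediate commands convenient inside loops.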

十秒萌定你 2024-12-05 08:44:51

Each observation in the poster's original question apparently consists of the four counts of one conventional 2 x 2 table. Stas's code applies to data on individual observations. Nick pointed out that -cci- can analyze a b c d data. Here's code that applies -cci- to each table and, like Stas's code, adds the p-values to the data set. The forvalues i = 1/`=_N' statement tells Stata to run the loop from the first to the last observation. a[`i'] refers to the value of the variable a in the i-th observation.

    clear
    input a b c d
    10 2 8 4
    5 8 2 1
    end

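    * Placeholders for the p-values, filled in one observation at a time
    gen exactp1 = .
    gen exactp2 = .
    label var exactp1 "1-sided exact p"
    label var exactp2 "2-sided exact p"
    forvalues i = 1/`=_N' {
      * Copy the four cell counts of the current observation into local macros
      local a = a[`i']
      local b = b[`i']
      local c = c[`i']
      local d = d[`i']
      * Run the immediate command on those counts and store its results
      qui cci `a' `b' `c' `d', exact
      replace exactp1 = r(p1_exact) in `i'
      replace exactp2 = r(p_exact) in `i'
    }
    list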

Note that there is no problem in giving a local macro the same name as a variable.
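A tiny sketch of why the names do not clash, assuming the a b c d data set from the code above is still in memory: the macro is only ever referenced with the `a' quoting, while a bare a always means the variable.

    * The macro `a' gets the value of variable a in observation 2
    local a = a[2]

    * Macro reference: expanded to its contents before the command runs
    display `a'

    * Variable reference: the value of a in observation 1
    display a[1]

With the example data the first display shows 5 and the second shows 10.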
