What makes these two R data frames not identical?

Posted 2024-08-29 10:12:26

I have two small data frames, this_tx and last_tx. They are, in every way that I can tell, completely identical. this_tx == last_tx results in a frame of identical dimensions, all TRUE. this_tx %in% last_tx, two TRUEs. Inspected visually, clearly identical. But when I call

identical(this_tx, last_tx)

I get a FALSE. Hilariously, even

identical(str(this_tx), str(last_tx))

will return a TRUE. If I set this_tx <- last_tx, I'll get a TRUE.

What is going on? I don't have the deepest understanding of R's internal mechanics, but I can't find a single difference between the two data frames. If it's relevant, the two variables in the frames are both factors - same levels, same numeric coding for the levels, both just subsets of the same original data frame. Converting them to character vectors doesn't help.

Background (because I wouldn't mind help on this, either): I have records of drug treatments given to patients. Each treatment record essentially specifies a person and a date. A second table has a record for each drug and dose given during a particular treatment (usually, a few drugs are given each treatment). I'm trying to identify contiguous periods during which the person was taking the same combinations of drugs at the same doses.

The best plan I've come up with is to check the treatments chronologically. If the combination of drugs and doses for treatment[i] is identical to the combination at treatment[i-1], then treatment[i] is a part of the same phase as treatment[i-1]. Of course, if I can't compare drug/dose combinations, that's right out.
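One way to sidestep comparing data frames altogether is to reduce each treatment to a single canonical string and compare those. The sketch below is only an illustration under stated assumptions: the `doses` table, its column names, and the helper `make_signature()` are hypothetical, and `treatment_id` is assumed to already be in chronological order for a single person.

```r
# Hypothetical example data: one row per drug given at a treatment.
doses <- data.frame(
  treatment_id = c(1, 1, 2, 2, 3, 4, 4),
  drug = c("A", "B", "B", "A", "A", "A", "B"),
  dose = c(10, 20, 20, 10, 10, 10, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sorted "drug:dose" pairs joined into one string,
# so row order, row names, and factor attributes no longer matter.
make_signature <- function(drug, dose) {
  paste(sort(paste(drug, dose, sep = ":")), collapse = "|")
}

# One signature per treatment.
pieces <- split(doses, doses$treatment_id)
signature <- vapply(
  pieces,
  function(d) make_signature(d$drug, d$dose),
  character(1)
)

# A new phase starts whenever the signature differs from the previous one.
changed <- c(TRUE, signature[-1] != signature[-length(signature)])
phase <- unname(cumsum(changed))

result <- data.frame(
  treatment_id = names(signature),
  signature = unname(signature),
  phase = phase
)
print(result)
# phase is 1 1 2 3: treatments 1 and 2 share a phase, 3 and 4 each start a new one
```

With more than one person in the data, the same comparison would need to be run within each person's treatments separately.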

Comments (2)

赏烟花じ飞满天 2024-09-05 10:12:26

Generally, in this situation it's useful to try all.equal, which will give you some information about why two objects are not equivalent.
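As a rough illustration of what all.equal can reveal, the sketch below uses made-up stand-in data, since the original this_tx and last_tx were never posted. It assumes one plausible cause: two subsets of the same data frame keep different row names, which == ignores but identical() does not. Whether that was the actual cause in the question is not confirmed.

```r
# Hypothetical stand-in data; not the asker's real frames.
tx <- data.frame(
  drug = factor(c("A", "B", "A", "B")),
  dose = factor(c("10mg", "20mg", "10mg", "20mg"))
)

last_tx <- tx[1:2, ]  # keeps row names 1, 2
this_tx <- tx[3:4, ]  # keeps row names 3, 4

# Element-wise comparison looks only at the values.
all(this_tx == last_tx)

# identical() also compares attributes such as row names.
identical(this_tx, last_tx)

# all.equal() returns TRUE or a character vector describing the differences.
diff_report <- all.equal(this_tx, last_tx)
print(diff_report)

# Assumption: row names are the only difference, so resetting them
# should make the two frames identical.
rownames(this_tx) <- NULL
rownames(last_tx) <- NULL
identical(this_tx, last_tx)
```

Because all.equal returns a description rather than FALSE when objects differ, wrap it in isTRUE() if you use it inside an if condition.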

拥有 2024-09-05 10:12:26

Well, the jaded cry of "moar specifics plz!" may win in this case:

Check the output of dput() and post if possible. str() just summarizes the contents of an object whilst dput() dumps out all the gory details in a form that may be copied and pasted into another R interpreter to regenerate the object.
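A small sketch of the difference, again with hypothetical stand-in data rather than the asker's frames. It also shows why the str() comparison in the question is uninformative: str() prints its summary and returns NULL invisibly, so identical(str(a), str(b)) only compares NULL with NULL.

```r
# Hypothetical stand-in data; not the asker's real frames.
tx <- data.frame(
  drug = factor(c("A", "B", "A", "B")),
  dose = factor(c("10mg", "20mg", "10mg", "20mg"))
)
last_tx <- tx[1:2, ]
this_tx <- tx[3:4, ]

# str() prints a summary as a side effect; its return value is NULL.
str_value <- str(this_tx)
is.null(str_value)

# dput() writes out the full structure, attributes included.
dput(last_tx)
dput(this_tx)

# Capturing the dput() text makes the two objects easy to diff.
last_txt <- capture.output(dput(last_tx))
this_txt <- capture.output(dput(this_tx))
identical(last_txt, this_txt)
```

In this made-up case the two dput() outputs differ only in their row.names entry, which the str() summary does not show.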
