sdf_nrow(), collect() and sdf_collect() on the same sparklyr data.frame return different tables as output
I have a sparklyr data.frame, patsEnrollment.
I convert it to a native R data.frame using two different methods: one uses sdf_collect() and the other uses collect() from dplyr. I am baffled that the two resulting tables have different numbers of rows. On top of that, calling sdf_nrow() on the sparklyr data.frame gives yet another count. Here's the R code:
patsEnrollment1 <- sdf_collect(patsEnrollment)
patsEnrollment2 <- patsEnrollment %>% collect()
print(nrow(patsEnrollment1))
print(nrow(patsEnrollment2))
print(sdf_nrow(patsEnrollment))
This is the output I get in the console:
> print(nrow(patsEnrollment1))
[1] 1077142
> print(nrow(patsEnrollment2))
[1] 1293446
> print(sdf_nrow(patsEnrollment))
[1] 1213967
All three should return the same row count, but each one is different.
UPDATE:
Here's the output of glimpse():
> glimpse(patsEnrollment)
Rows: ??
Columns: 4
Database: spark_connection
$ enrolid <int64> 2805363, 20596958
$ dobyr <int> 1997, 2004
$ dtstart <dttm> 2009-01-01, 2014-03-01
$ dtend <dttm> 2009-01-31, 2014-03-31