PySpark: how to identify unmatched row values between two DataFrames
I have the two DataFrames below, and I am trying to identify the rows in the second one whose values do not match the first. This is part of a migration: I want to see the differences after the source data has been migrated to a different destination.
source_df
+---+-----+-----+
|key|val11|val12|
+---+-----+-----+
|abc|  1.1|  1.2|
|def|  3.0|  3.4|
+---+-----+-----+
dest_df
+---+-----+-----+
|key|val11|val12|
+---+-----+-----+
|abc|  2.1|  2.2|
|def|  3.0|  3.4|
+---+-----+-----+
I want to see output something like this:
key: abc,
col: val11 val12
difference: [src:1.1,dst:2.1] [src:1.2,dst:2.2]
Is there a solution for this?
Comments (1)
Or, if you want it exactly in that format:
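As a minimal sketch of one way to get that format (not necessarily the approach this answer used), the two DataFrames can be joined on `key`, filtered to rows where any value column differs, and then formatted on the driver. The helper names `format_diff` and `mismatch_report` are hypothetical. The sketch assumes both DataFrames have the same columns, that `key` is unique in each, and that only keys present on both sides need comparing (an inner join).

```python
from functools import reduce


def format_diff(key, src, dst, cols):
    """Return the report text for one key, or None if every column matches.

    `src` and `dst` map column name -> value for the same key.
    (Hypothetical helper, pure Python so it can be checked without Spark.)
    """
    changed = [c for c in cols if src[c] != dst[c]]
    if not changed:
        return None
    pairs = " ".join(f"[src:{src[c]},dst:{dst[c]}]" for c in changed)
    return f"key: {key},\ncol: {' '.join(changed)}\ndifference: {pairs}"


def mismatch_report(source_df, dest_df, key="key"):
    """Join on `key` and return one report string per mismatched row.

    Assumption: both DataFrames share the same non-key columns.
    """
    # Imported here so format_diff stays usable without a Spark installation.
    from pyspark.sql import functions as F

    cols = [c for c in source_df.columns if c != key]
    s, d = source_df.alias("s"), dest_df.alias("d")

    # Null-safe comparison: a NULL on only one side counts as a difference.
    differs = reduce(
        lambda a, b: a | b,
        [~F.col(f"s.{c}").eqNullSafe(F.col(f"d.{c}")) for c in cols],
    )

    # Keep only mismatched rows, with source and destination values side by side.
    selected = s.join(d, on=key, how="inner").where(differs).select(
        key,
        *[F.col(f"s.{c}").alias(f"src_{c}") for c in cols],
        *[F.col(f"d.{c}").alias(f"dst_{c}") for c in cols],
    )

    reports = []
    for row in selected.collect():  # assumes the number of mismatches is small
        src = {c: row[f"src_{c}"] for c in cols}
        dst = {c: row[f"dst_{c}"] for c in cols}
        text = format_diff(row[key], src, dst, cols)
        if text is not None:
            reports.append(text)
    return reports
```

With the question's `source_df` and `dest_df`, `print("\n\n".join(mismatch_report(source_df, dest_df)))` would print one block per mismatched key. Because the filtered rows are collected to the driver, this suits a migration check where mismatches are expected to be few; for large numbers of differences, writing `selected` out as a DataFrame is likely the better choice.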
Output: