How to chain explode and struct field selection?

Posted on 2025-02-11 09:39:05

The dataframe:

from pyspark.sql import functions as F
df = spark.createDataFrame([([(1, 2), (3, 4)],)], 'col_name array<struct<c1:int,c2:int>>')

df.show()
# +----------------+
# |        col_name|
# +----------------+
# |[{1, 2}, {3, 4}]|
# +----------------+

df.printSchema()
# root
#  |-- col_name: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- c1: integer (nullable = true)
#  |    |    |-- c2: integer (nullable = true)

I explode the array, which yields a column of type struct<c1:int,c2:int>,
and then select every struct field. This takes two selects:

df = df.select(
    F.explode('col_name')
).select(
    [f'col.{c}' for c in ('c1', 'c2')]
)

df.show()
# +---+---+
# | c1| c2|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+

df.printSchema()
# root
#  |-- c1: integer (nullable = true)
#  |-- c2: integer (nullable = true)

I know I can shorten the second select to just 'col.*', but that still leaves two selects.

Question: is there a way to select the struct fields right after the explode, using only one select?

As the result of the explode has schema struct<c1:int,c2:int>, I thought this would work...

df = df.select(
    [F.explode('col_name')[c] for c in ('c1', 'c2')]
)

AnalysisException: No such struct field c1 in col

長街聽風 2025-02-18 09:39:05

Use inline, which explodes an array of structs and returns each struct field as its own column, all in a single select:

df.selectExpr('inline(col_name)').show()

# +---+---+
# | c1| c2|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+
