How to chain explode and struct field selection?
The dataframe:
from pyspark.sql import functions as F
df = spark.createDataFrame([([(1, 2), (3, 4)],)], 'col_name array<struct<c1:int,c2:int>>')
df.show()
# +----------------+
# |        col_name|
# +----------------+
# |[{1, 2}, {3, 4}]|
# +----------------+
df.printSchema()
# root
#  |-- col_name: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- c1: integer (nullable = true)
#  |    |    |-- c2: integer (nullable = true)
I explode the array (the result is a column of type struct<c1:int,c2:int>) and then select every struct field, but this takes two selects:
df = df.select(
    F.explode('col_name')
).select(
    [f'col.{c}' for c in ('c1', 'c2')]
)
df.show()
# +---+---+
# | c1| c2|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+
df.printSchema()
# root
#  |-- c1: integer (nullable = true)
#  |-- c2: integer (nullable = true)
I know I can shorten the second select to just 'col.*', but I would still have 2 selects.
Question: is there a way to select the struct fields right after the explode, using only 1 select?
As the result of the explode has schema struct<c1:int,c2:int>, I thought this would work...
df = df.select(
    [F.explode('col_name')[c] for c in ('c1', 'c2')]
)
AnalysisException: No such struct field c1 in col
Use the magic inline