How to chain explode and struct field selection?

Posted 2025-02-11 09:39:05 | 1457 characters | 1 view | 0 comments

The dataframe:

from pyspark.sql import functions as F
df = spark.createDataFrame([([(1, 2), (3, 4)],)], 'col_name array<struct<c1:int,c2:int>>')

df.show()
# +----------------+
# |        col_name|
# +----------------+
# |[{1, 2}, {3, 4}]|
# +----------------+

df.printSchema()
# root
#  |-- col_name: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- c1: integer (nullable = true)
#  |    |    |-- c2: integer (nullable = true)

I explode the array (the result is a column of type struct<c1:int,c2:int>).
And then select every struct field (but I select twice):

df = df.select(
    F.explode('col_name')
).select(
    [f'col.{c}' for c in ('c1', 'c2')]
)
df.show()
# +---+---+
# | c1| c2|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+

df.printSchema()
# root
#  |-- c1: integer (nullable = true)
#  |-- c2: integer (nullable = true)

I know I can shorten the second select to just 'col.*'. But I would still have 2 selects.

Question. Is there a method to select struct fields right after the explode with only 1 select?

As the result of the explode has schema struct<c1:int,c2:int>, I thought this would work...

df = df.select(
    [F.explode('col_name')[c] for c in ('c1', 'c2')]
)

AnalysisException: No such struct field c1 in col

Comments (1)

長街聽風 2025-02-18 09:39:05

Use the magic inline

df.selectExpr('inline(col_name)').show()

+---+---+
| c1| c2|
+---+---+
|  1|  2|
|  3|  4|
+---+---+