Scala Spark：如何从Parquet文件中提取嵌套的列名并在其上添加前缀

发布于 2025-02-08 03:54:18 字数 889 浏览 0 评论 0原文

这个想法是在数据框架中读取一个镶木quet文件。然后，从架构中提取所有列名和类型。如果我们有一个嵌套的列，我想在列名之前添加一个“前缀”。

考虑到我们可以具有正确命名的子列的嵌套列，并且我们还可以拥有一个嵌套的列，只有一个数组，没有列名，而是“元素”。

val dfSource: DataFrame = spark.read.parquet("path.parquet")
val dfSourceSchema: StructType = dfSource.schema

dfsourceschema的示例（输入）：

 |-- exCar: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: binary (nullable = true)
 |-- exProduct: string (nullable = true)
 |-- exName: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- exNameOne: string (nullable = true)
 |    |    |-- exNameTwo: string (nullable = true)

所需的输出：

((exCar.prefix.prefix,binary)),(exProduct, String), (exName.prefix.exNameOne, String), (exName.prefix.exNameTwo, String) )

原文

The idea is to read a parquet file into dataFrame. Then, extract all column name's and type's from it's schema. If we have a nested columns, i would like to add a "prefix" before the column name.

Considering that we can have a nested column with sub column named properly, and we can have also a nested column with just an array of array without column name but "element".

val dfSource: DataFrame = spark.read.parquet("path.parquet")
val dfSourceSchema: StructType = dfSource.schema

Example of dfSourceSchema (Input):

 |-- exCar: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: binary (nullable = true)
 |-- exProduct: string (nullable = true)
 |-- exName: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- exNameOne: string (nullable = true)
 |    |    |-- exNameTwo: string (nullable = true)

Desired output :

((exCar.prefix.prefix,binary)),(exProduct, String), (exName.prefix.exNameOne, String), (exName.prefix.exNameTwo, String) )

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

列表为空，暂无数据

关于作者

过期以后

暂无简介

文章

28 人气

关注发私信

alipaysp_snBf0MSZIv

文章 0 评论 0

关注

梦断已成空

文章 0 评论 0

关注

瞎闹

文章 0 评论 0

关注

凯凯我们等你回来

文章 0 评论 0

关注

寄意

文章 0 评论 0

关注

似梦非梦

文章 0 评论 0

友情链接

文江博客

Scala Spark：如何从Parquet文件中提取嵌套的列名并在其上添加前缀

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

Scala Spark：如何从Parquet文件中提取嵌套的列名并在其上添加前缀

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。