How do I transpose all columns to rows in PySpark?
I am trying to transpose the columns to rows and load them into a database. My input is a JSON file.
{"09087":{"values": ["76573433","2222322323","768346865"],"values1": ["7686548898","33256768","09864324567"],"values2": ["234523723","64238793333333","75478393333"],"values3": ["87765","46389333","9234689677"]},"090881": {"values": ["76573443433","22276762322323","7683878746865"],"values1": ["768637676548898","3398776256768","0986456834324567"],"values2": ["23877644523723","64238867658793333333","754788776393333"],"values3": ["87765","46389333","9234689677"]}}
Pyspark:
df = spark.read.option("multiline", "true").format("json").load("testfile.json")
Schema:
root
|-- 09087: struct (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values1: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values2: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values3: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- 090881: struct (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values1: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values2: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- values3: array (nullable = true)
| | |-- element: string (containsNull = true)
Data:
df.show()
+--------------------+--------------------+
| 09087| 090881|
+--------------------+--------------------+
|{[76573433, 22223...|{[76573443433, 22...|
+--------------------+--------------------+
OUTPUT:
Name values values1 values2 values3
09087 76573433 7686548898 234523723 87765
09087 2222322323 33256768 64238793333333 9234689677
09087 768346865 09864324567 75478393333 46389333
090881 76573443433 768637676548898 23877644523723 87765
090881 22276762322323 3398776256768 64238867658793333333 46389333
090881 7683878746865 0986456834324567 754788776393333 9234689677
Actually I only gave 2 columns as input here, but I have a lot of them. I have been trying to do this - could someone please help me? Thanks in advance.
Comments (2)
PySpark translation of my Scala solution:
For more info on `arrays_zip`, see here.