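PySpark from_json fails with error: Cannot parse the schema in JSON format: Unrecognized token 'array': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
parquet_path = '/tmp/test-parquet'
The contents of t2.json are: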
{
"id": "OK_good2",
"some-array": [
{"array-field-1":"f1a","array-field-2":"f2a"},
{"array-field-1":"f1b","array-field-2":"f2b"}
]
}
Creating the DataFrame from t2.json and saving it to parquet:
from pyspark.sql.functions import col, from_json
df = spark.read.json('t2.json')
df = df.withColumn('some-array', col('some-array').cast('string'))
df.write.mode("overwrite").parquet(parquet_path)
Reading the schema:
schema = dict(df.dtypes)['some-array'] # o/p array<struct<array-field-1:string,array-field-2:string>>
Reading from parquet_path:
final_df = spark.read.parquet(parquet_path)
final_df.select('some-array').show(3, False)
+------------------------+
|some-array |
+------------------------+
|[{f1a, f2a}, {f1b, f2b}]|
+------------------------+
When I try to get the same structure back with from_json, using the schema string from above, it fails. I can't figure out why. Please help.
final_df.select(from_json(col('some-array'), 'array<struct<array-field-1:string,array-field-2:string>>', {'allowUnquotedFieldNames':True}).
alias('json1')).show(2, False)
This throws the error:
AnalysisException: Cannot parse the schema in JSON format: Unrecognized token 'array': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: (String)"array<struct<array-field-1:string,array-field-2:string>>"; line: 1, column: 6]
Failed fallback parsing: Cannot parse the data type:
In case anyone is interested, I am trying to follow this post.
Comments (1)
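A few issues in your code.
col('some-array').cast('string') does not produce a JSON string. Casting an array of structs to string keeps only the values and drops the field names.
So, when you read the saved parquet file, you only see this:
[{f1a, f2a}, {f1b, f2b}]
and NOT this (which is what you expect to have):
[{"array-field-1":"f1a","array-field-2":"f2a"},{"array-field-1":"f1b","array-field-2":"f2b"}]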
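To see why the string left by the cast cannot be parsed back, here is a minimal plain-Python sketch (no Spark needed). The first string literal is copied from the show() output in the question; is_valid_json is a hypothetical helper introduced only for this illustration.

```python
import json

# What .cast('string') left in the parquet file: the struct field names are gone.
cast_text = "[{f1a, f2a}, {f1b, f2b}]"

# What a real JSON serialization of the same array looks like.
rows = [
    {"array-field-1": "f1a", "array-field-2": "f2a"},
    {"array-field-1": "f1b", "array-field-2": "f2b"},
]
json_text = json.dumps(rows, separators=(",", ":"))


def is_valid_json(text):
    """Hypothetical helper: return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(cast_text))  # False
print(is_valid_json(json_text))  # True
```

No schema passed to from_json can recover the field names from the first string, because they were never written.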
The proper way to convert the array structure to a JSON string is to use to_json. Please check df.show() or df.take(1) to see the difference.