How do I "explode" data in a cell that is formatted as a string, so that I can turn its keys into columns in PySpark?
One of the datasets I'm working with has a column called json_data, which contains data like this:
{
  "eta": "",
  "eta_value": 0,
  "schedules": [{
    "open_time": "10:15:00",
    "close_time": "14:00:00"
  }, {
    "open_time": "18:00:00",
    "close_time": "20:00:00"
  }],
  "logo": "1617723892776.png",
  "score_v2": 0,
  "id": "900371722_8339714",
  "store_id": 900371722,
  "super_store_id": 900371722,
  "index": 375,
  "brand_name": "Carl's Restaurant",
  "store_type": "restaurant",
  "has_promise": false,
  "tags": [189],
  "background": "1618349497.jpg",
  "is_enabled": false,
  "friendly_url": {
    "store_id": 90037172
  }
}
This column is a string type, which means I cannot easily turn the information inside it into columns. That is what brings me here: how can I turn this data into columns, especially the nested data inside "schedules"?
I'm having a hard time with this column.
Comments (1)
A few months back, I was also struggling with a similar json structure. Glad you brought it up, it helped refresh my memory!
I followed these steps to resolve it, starting from the input data:
First, convert the `jsondata` column to `MapType`.
Next, explode the resulting `cols` column into a key column and a value column.
Once you have `col_columns` and `col_rows` as individual columns, all that is left is to pivot `col_columns` and aggregate it using the corresponding first `col_rows`. The output then has one column per key.
P.S. If you want to explode columns like `schedules` and `friendly_url` as well, you may have to repeat the above steps for them.