pyspark Hive SQL: convert array(map(varchar, varchar)) to string

Posted on 2025-02-13 09:19:29


I would like to transform a column of

   array(map(varchar, varchar))

to strings, as rows of a table on Presto DB, using PySpark Hive SQL programmatically from a Jupyter notebook (Python 3).

example

user_id     sport_ids
 'aca'       [ {'sport_id': '5818'}, {'sport_id': '6712'}, {'sport_id': '1065'} ]

expected results

  user_id     sport_ids
  'aca'       '5818'
  'aca'       '6712'
  'aca'       '1065'

I have tried

     sql_q= """
            select distinct, user_id, transform(sport_ids, x -> element_at(x, 'sport_id')
            from tab """
            
     spark.sql(sql_q)

but got error:

   '->' cannot be resolved  

I have also tried

  sql_q= """
            select distinct, user_id, sport_ids
            from tab"""
            
     spark.sql(sql_q)

but got error:

    org.apache.spark.sql.AnalysisException: Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column request_features[0] is map<string,string>;;

Did I miss something?

I have also tried these related questions, but they did not solve my problem:
hive convert array<map<string, string>> to string
Extract map(varchar, array(varchar)) - Hive SQL

thanks

Comments (2)

以可爱出名 2025-02-20 09:19:29

Let's try using higher-order functions to extract the map values and explode them into individual rows:

from pyspark.sql.functions import explode, expr

df.withColumn('sport_ids', explode(expr("transform(sport_ids, x -> map_values(x)[0])"))).show()


+-------+---------+
|user_id|sport_ids|
+-------+---------+
|    aca|     5818|
|    aca|     6712|
|    aca|     1065|
+-------+---------+
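
For a runnable starting point, here is a minimal, self-contained sketch of the same idea, assuming a SparkSession is available in the notebook; the table and column names (tab, user_id, sport_ids) are taken from the question. It registers the sample data as a temp view and shows both the DataFrame version above and an equivalent pure Spark SQL query that explodes the transformed array. Note that higher-order functions such as transform with the x -> ... lambda syntax require Spark 2.4+, which may explain the "'->' cannot be resolved" error on older versions.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, expr

spark = SparkSession.builder.getOrCreate()

# Sample data shaped like the question: one array(map(string, string)) per user.
df = spark.createDataFrame(
    [("aca", [{"sport_id": "5818"}, {"sport_id": "6712"}, {"sport_id": "1065"}])],
    "user_id string, sport_ids array<map<string,string>>",
)
df.createOrReplaceTempView("tab")

# DataFrame API: take each map's value, then explode into one row per sport id.
df.withColumn(
    "sport_ids", explode(expr("transform(sport_ids, x -> map_values(x)[0])"))
).show()

# Equivalent Spark SQL: explode the transformed array inside the query itself.
spark.sql("""
    SELECT user_id, sport_id
    FROM tab
    LATERAL VIEW explode(transform(sport_ids, x -> element_at(x, 'sport_id'))) t AS sport_id
""").show()
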
何处潇湘 2025-02-20 09:19:29

You can process the JSON data (json_parse, cast to array(json), and json_extract_scalar; see the Presto documentation on JSON functions for more) and flatten it with unnest on the Presto side:

-- sample data
WITH dataset(user_id, sport_ids) AS (
    VALUES 
        ('aca', '[ {"sport_id": "5818"}, {"sport_id": "6712"}, {"sport_id": "1065"} ]')
) 

-- query
select user_id,
    json_extract_scalar(record, '$.sport_id') sport_id
from dataset,
    unnest(cast(json_parse(sport_ids) as array(json))) as t(record)

Output:

user_id    sport_id
aca        5818
aca        6712
aca        1065
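
If you want to run that Presto-side query directly from the Jupyter notebook, a minimal sketch using the presto-python-client package (prestodb) might look like the following; the host, port, user, catalog and schema values are placeholders for your own Presto coordinator, and the answer's inline sample data is reused so the snippet stands on its own.

import prestodb

# Placeholder connection details for a hypothetical Presto coordinator.
conn = prestodb.dbapi.connect(
    host="presto-coordinator",
    port=8080,
    user="jupyter",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Same query as above, kept self-contained with the inline sample data.
cur.execute("""
WITH dataset(user_id, sport_ids) AS (
    VALUES
        ('aca', '[ {"sport_id": "5818"}, {"sport_id": "6712"}, {"sport_id": "1065"} ]')
)
SELECT user_id,
       json_extract_scalar(record, '$.sport_id') AS sport_id
FROM dataset,
     UNNEST(CAST(json_parse(sport_ids) AS array(json))) AS t(record)
""")

for user_id, sport_id in cur.fetchall():
    print(user_id, sport_id)  # aca 5818, aca 6712, aca 1065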