What is the best way to support array column types for external tables in Hive?

Posted on 2024-11-14 00:42:08

I have external tables of tab-delimited data. A simple table looks like this:

create external table if not exists categories
(id string, tag string, legid string, image string, parent string, created_date string, time_stamp int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://somewhere/';

Now I'm adding another field to the end. It will be a comma-separated list of values.

Is there a way to specify this in the same way that I specify a field terminator, or do I have to rely on one of the SerDes?

For example:

...list_of_names ARRAY<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ARRAY ELEMENTS SEPARATED BY ','
...

(I'm assuming I'll need to use a SerDe for this, but I figured there wasn't any harm in asking.)

Comments (1)

孤芳又自赏 2024-11-21 00:42:08

I don't know how to update an existing table to do this, but for creating one, what you are looking for is covered in depth at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL.
A snippet from there:

row_format
  : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]

An example from one of our table definitions:

CREATE TABLE IF NOT EXISTS visits
(
    ... Columns Removed...
)
    PARTITIONED BY (userdate STRING)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\001'
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
    STORED AS TEXTFILE
;

The line you'd be looking for is COLLECTION ITEMS TERMINATED BY char, which sets the delimiter for an array.
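
Applied to the table in the question, that might look roughly like the sketch below. It assumes the new comma-separated list is the last tab-delimited field in every row; the column name list_of_names is taken from the question, and the SELECT is only an illustrative query, not something from the original post.

```sql
-- Sketch only: the question's table with a trailing array column.
-- Assumes each row ends with a tab followed by a comma-separated list.
CREATE EXTERNAL TABLE IF NOT EXISTS categories (
    id            STRING,
    tag           STRING,
    legid         STRING,
    image         STRING,
    parent        STRING,
    created_date  STRING,
    time_stamp    INT,
    list_of_names ARRAY<STRING>
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    COLLECTION ITEMS TERMINATED BY ','
LOCATION 's3n://somewhere/';

-- Illustrative query: array elements are zero-indexed,
-- and size() returns the number of elements.
SELECT id,
       list_of_names[0]    AS first_name,
       size(list_of_names) AS name_count
FROM categories;
```

Note that IF NOT EXISTS leaves an already-defined table untouched, so the new column would not appear if categories already exists. Since the table is external, dropping it normally removes only the metadata and leaves the files in place, so dropping and recreating it may be one way around the "update an existing table" problem, but check that against your Hive version and settings before relying on it.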

hth
