Hive: writing column headers to a local file?

Published 2024-11-01 08:00:56


Hive documentation lacking again:

I'd like to write the results of a query to a local file as well as the names of the columns.

Does Hive support this?

Insert overwrite local directory 'tmp/blah.blah' select * from table_name;

Also, a separate question: is Stack Overflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...


糖果控 2024-11-08 08:00:57


Here's my take on it. Note, I'm not very well versed in bash, so improvement suggestions are welcome :)

#!/usr/bin/env bash

# Usage:
#   ./get_data.sh database.table > data.csv

INPUT=$1
TABLE=${INPUT##*.}   # text after the last dot
DB=${INPUT%.*}       # text before the last dot

# Dump the table into a local directory named after it; with
# hive.cli.print.header=true the column names come back on stdout.
HEADER=$(hive -e "
  set hive.cli.print.header=true;
  use $DB;
  INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  SELECT * FROM $TABLE;")

# Hive prefixes each column with the table name (e.g. mytable.col); strip
# that, then turn the whitespace-separated header into a comma-separated one.
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo "${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}"
cat "$TABLE"/*
傲世九天 2024-11-08 08:00:56


Try

set hive.cli.print.header=true;
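For context, a minimal sketch of how that setting is typically used to capture a headed result into a local file (the table name `table_name` and output path are placeholders; `tr` converts Hive's tab-separated output to CSV):

```shell
# Sketch, not a definitive recipe: run a query with headers enabled and
# convert Hive's tab-separated output to CSV. `table_name` is a placeholder.
hive -e 'set hive.cli.print.header=true;
         select * from table_name;' \
  | tr '\t' ',' > /tmp/table_name.csv
```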
愛放△進行李 2024-11-08 08:00:56


Yes, you can. Put set hive.cli.print.header=true; in a .hiverc file in your home directory, or in any of the other Hive user properties files.

Vague warning: be careful, since this has crashed queries of mine in the past (though I can't remember the reason).
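For reference, a .hiverc along these lines is all it takes (the path and contents here are a sketch; the Hive CLI sources this file on startup):

```
-- ~/.hiverc, read by the Hive CLI at startup
set hive.cli.print.header=true;
```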

那伤。 2024-11-08 08:00:56


Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an INSERT OVERWRITE [LOCAL] DIRECTORY ... (whether you use LOCAL or not).

Regarding the crashes described by @user1735861: there is a known bug in Hive 0.7.1 (fixed in 0.8.0) where, after doing set hive.cli.print.header=true;, any HQL command/query that produces no output causes a NullPointerException. For example:

$ hive -S
hive> use default; 
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Whereas this is fine:

$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive> 

Non-HQL commands are fine, though (set, dfs, !, etc.).

More info here: https://issues.apache.org/jira/browse/HIVE-2334
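Since non-HQL commands such as set are unaffected, one hedged workaround on 0.7.1 is to scope the flag to just the query that needs it (a sketch; dual stands in for whatever table you're reading):

```
set hive.cli.print.header=true;
select * from dual;
-- turn it back off before running HQL that produces no output (e.g. `use`)
set hive.cli.print.header=false;
```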

半窗疏影 2024-11-08 08:00:56


Hive does support writing to the local directory. Your syntax looks right for it as well.
Check out the docs on SELECTS and FILTERS for additional information.

I don't think Hive has a way to write the names of the columns to a file for the query you're running . . . I can't say for sure it doesn't, but I do not know of a way.

I think the only place better than SO for Hive questions would be the mailing list.

夏了南城 2024-11-08 08:00:56


I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section, set the header row to 0 and the data rows to 1, so I could sort by that field and ensure the header row came out on top.

create table new_table as
select 
  field1,
  field2,
  field3
from
(
  select
    0 as sort_col,  --header row gets lowest number
    'field1_name' as field1,
    'field2_name' as field2,
    'field3_name' as field3
  from
    some_small_table  --table needs at least 1 row
  limit 1  --only need 1 header row
  union all
  select
    1 as sort_col,  --original query goes here
    field1,
    field2,
    field3
  from
    main_table
) a
order by 
  sort_col  --make sure header row is first

It's a little bulky, but at least you can get what you need with a single query.

Hope this helps!
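One caveat worth hedging on: the header branch produces string literals, so if main_table's columns are not strings, older Hive versions may reject the UNION ALL over mismatched column types. Casting the data branch can work around that (a sketch of the second branch only):

```
select
  1 as sort_col,
  cast(field1 as string) as field1,  -- match the string-typed header row
  cast(field2 as string) as field2,
  cast(field3 as string) as field3
from
  main_table
```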

深居我梦 2024-11-08 08:00:56


Not a great solution, but here is what I do:

create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS 
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" 
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;

hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt

cat header.txt 000* > all.dat