Hive: writing column headers to a local file?

Published 2024-11-01 08:00:56


Hive documentation lacking again:

I'd like to write the results of a query to a local file as well as the names of the columns.

Does Hive support this?

Insert overwrite local directory 'tmp/blah.blah' select * from table_name;

Also, a separate question: is Stack Overflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...


糖果控 2024-11-08 08:00:57


Here's my take on it. Note, I'm not very well versed in bash, so improvement suggestions are welcome :)

#!/usr/bin/env bash

# Usage:
#   ./get_data.sh database.table > data.csv

INPUT=$1
TABLE=${INPUT##*.}   # text after the last dot
DB=${INPUT%.*}       # text before the last dot

# Dump the table into a local directory named after it; with
# hive.cli.print.header=true the column names come back on stdout.
HEADER=$(hive -e "
  set hive.cli.print.header=true;
  use $DB;
  INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  SELECT * FROM $TABLE;")

# Hive prefixes each column with the table name (e.g. mytable.col); strip
# that, then turn the whitespace-separated header into a comma-separated one.
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo "${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}"
cat "$TABLE"/*
傲世九天 2024-11-08 08:00:56


Try

set hive.cli.print.header=true;
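For context, a minimal sketch of how that setting is typically used to capture a headed result into a local file (the table name `table_name` and output path are placeholders; `tr` converts Hive's tab-separated output to CSV):

```shell
# Sketch, not a definitive recipe: run a query with headers enabled and
# convert Hive's tab-separated output to CSV. `table_name` is a placeholder.
hive -e 'set hive.cli.print.header=true;
         select * from table_name;' \
  | tr '\t' ',' > /tmp/table_name.csv
```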
愛放△進行李 2024-11-08 08:00:56


Yes, you can. Put set hive.cli.print.header=true; in a .hiverc file in your home directory, or in any of the other Hive user properties files.

Vague warning: be careful, since this has crashed queries of mine in the past (though I can't remember the reason).
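For reference, a .hiverc along these lines is all it takes (the path and contents here are a sketch; the Hive CLI sources this file on startup):

```
-- ~/.hiverc, read by the Hive CLI at startup
set hive.cli.print.header=true;
```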

那伤。 2024-11-08 08:00:56


Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an INSERT OVERWRITE [LOCAL] DIRECTORY ... (whether you use LOCAL or not).

Regarding the crashes described by @user1735861: there is a known bug in Hive 0.7.1 (fixed in 0.8.0) where, after doing set hive.cli.print.header=true;, any HQL command/query that produces no output causes a NullPointerException. For example:

$ hive -S
hive> use default; 
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Whereas this is fine:

$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive> 

Non-HQL commands are fine, though (set, dfs, !, etc.).

More info here: https://issues.apache.org/jira/browse/HIVE-2334
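Since non-HQL commands such as set are unaffected, one hedged workaround on 0.7.1 is to scope the flag to just the query that needs it (a sketch; dual stands in for whatever table you're reading):

```
set hive.cli.print.header=true;
select * from dual;
-- turn it back off before running HQL that produces no output (e.g. `use`)
set hive.cli.print.header=false;
```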

半窗疏影 2024-11-08 08:00:56


Hive does support writing to the local directory. Your syntax looks right for it as well.
Check out the docs on SELECTS and FILTERS for additional information.

I don't think Hive has a way to write the names of the columns to a file for the query you're running . . . I can't say for sure it doesn't, but I do not know of a way.

I think the only place better than SO for Hive questions would be the mailing list.

夏了南城 2024-11-08 08:00:56


I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section, set the header row to 0 and the data rows to 1, so I could sort by that field and ensure the header row came out on top.

create table new_table as
select 
  field1,
  field2,
  field3
from
(
  select
    0 as sort_col,  --header row gets lowest number
    'field1_name' as field1,
    'field2_name' as field2,
    'field3_name' as field3
  from
    some_small_table  --table needs at least 1 row
  limit 1  --only need 1 header row
  union all
  select
    1 as sort_col,  --original query goes here
    field1,
    field2,
    field3
  from
    main_table
) a
order by 
  sort_col  --make sure header row is first

It's a little bulky, but at least you can get what you need with a single query.

Hope this helps!
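One caveat worth hedging on: the header branch produces string literals, so if main_table's columns are not strings, older Hive versions may reject the UNION ALL over mismatched column types. Casting the data branch can work around that (a sketch of the second branch only):

```
select
  1 as sort_col,
  cast(field1 as string) as field1,  -- match the string-typed header row
  cast(field2 as string) as field2,
  cast(field3 as string) as field3
from
  main_table
```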

深居我梦 2024-11-08 08:00:56


Not a great solution, but here is what I do:

create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS 
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" 
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;

hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt

cat header.txt 000* > all.dat