对平面文件进行简单、快速的 SQL 查询

发布于 2024-08-22 00:27:13 字数 665 浏览 6 评论 0原文

有谁知道有什么工具可以使用类似 SQL 的声明性查询语言提供简单、快速的平面文件查询吗？我不想支付将文件加载到数据库中的开销，因为输入数据通常在查询运行后几乎立即被丢弃。

考虑数据文件“animals.txt”：

dog 15
cat 20
dog 10
cat 30
dog 5
cat 40

假设我想提取每个独特动物的最高值。我想写一些类似的东西：

cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"

我可以使用 sort 得到几乎相同的结果：

cat animals.txt | sort -t " " -k1,1 -k2,2nr

而且我总是可以从那里进入 awk，但这一切感觉有点awkward（无法抗拒）当类似 SQL 的语言似乎能够如此干净地解决问题时。

我考虑过为 SQLite 编写一个包装器，它会根据输入数据自动创建一个表，并且我已经研究过在单处理器模式下使用 Hive，但我不禁觉得这个问题以前已经解决了。我错过了什么吗？此功能是否已由其他标准工具实现？

哈普！

原文

Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.

Consider the data file, "animals.txt":

dog 15
cat 20
dog 10
cat 30
dog 5
cat 40

Suppose I want to extract the highest value for each unique animal. I would like to write something like:

cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"

I can get nearly the same result using sort:

cat animals.txt | sort -t " " -k1,1 -k2,2nr

And I can always drop into awk from there, but this all feels a bit awkward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.

I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?

Halp!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

泪痕残 2024-08-29 00:27:13

我编写 TxtSushi 主要是为了对平面文件进行 SQL 选择。以下是示例的命令链（所有这些命令都来自 TxtSushi）：

tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'

namecolumns 是必需的，因为 girls.txt 没有标题行。您可以通过查看示例脚本。主页底部还有类似工具的链接。

I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):

tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'

namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.

回复收藏 0 原文

幽梦紫曦～ 2024-08-29 00:27:13

使用 DBD::AnyData 的 Perl DBI

回复收藏 0 原文

你是我的挚爱i 2024-08-29 00:27:13

你可以使用sqlite。下面是一个使用 Python 的示例。

import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
    cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()

输出

$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]

you can use sqlite. Here's an example using Python.

import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
    cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()

output

$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]

回复收藏 0 原文

小兔几 2024-08-29 00:27:13

我刚刚偶然发现这个 Python 脚本它的功能与您想要的类似，尽管它只支持非常基本的查询。

回复收藏 0 原文

家住魔仙堡 2024-08-29 00:27:13

我们将拥有一个用于 sqlite 的轻量级 ORM，它可以简化此任务，而无需需要任何配置文件等。

如果您可以使用PowerShell，它具有许多用于解析和查询文本文件的强大功能（示例）。否则，使用 .NET/Mono，您可以将其切入并立即使用 LINQ。

回复收藏 0 原文

娜些时光，永不杰束 2024-08-29 00:27:13

我从来没有为我的问题找到令人满意的答案，但我至少使用 uniq 的“-f”选项找到了我的玩具问题的解决方案，这是我不知道的

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

：显然，如果输入文件是使用相反顺序的列创建的，则可以完全跳过上面的 >awk 部分。

不过，我仍然对类似 SQL 的工具抱有希望。

I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniqs "-f" option, which I had been unaware of:

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.

I'm still holding out hope for a SQL-like tool, though.

回复收藏 0 原文

幽梦紫曦～ 2024-08-29 00:27:13

我制作了一个可能有帮助的工具。
http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms
你的sql可能是“从动物中选择最大值（值）”
或者可以通过“按值描述从动物顺序中选择*”

回复收藏 0 原文

人生戏 2024-08-29 00:27:13

您可以查找HXTT JDBC 驱动程序。他们为大多数类型的平面文件、Excel 等提供 JDBC 驱动程序。

您可以对其执行简单的 SQL 查询。

他们也有试用版

回复收藏 0 原文

~没有更多了~

关于作者

云归处

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

对平面文件进行简单、快速的 SQL 查询

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（8）

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

对平面文件进行简单、快速的 SQL 查询

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（8）

关于作者

相关话题

热门标签

推荐作者

留蓝

18790681156

zach7772

Wini

ayeshaaroy

初雪

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。