对平面文件进行简单、快速的 SQL 查询

发布于 2024-08-22 00:27:13 字数 665 浏览 6 评论 0原文

有谁知道有什么工具可以使用类似 SQL 的声明性查询语言提供简单、快速的平面文件查询吗?我不想支付将文件加载到数据库中的开销,因为输入数据通常在查询运行后几乎立即被丢弃。

考虑数据文件“animals.txt”:

dog 15
cat 20
dog 10
cat 30
dog 5
cat 40

假设我想提取每个独特动物的最高值。我想写一些类似的东西:

cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"

我可以使用 sort 得到几乎相同的结果:

cat animals.txt | sort -t " " -k1,1 -k2,2nr

而且我总是可以从那里进入 awk,但这一切感觉有点awkward(无法抗拒)当类似 SQL 的语言似乎能够如此干净地解决问题时。

我考虑过为 SQLite 编写一个包装器,它会根据输入数据自动创建一个表,并且我已经研究过在单处理器模式下使用 Hive,但我不禁觉得这个问题以前已经解决了。我错过了什么吗?此功能是否已由其他标准工具实现?

哈普!

Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.

Consider the data file, "animals.txt":

dog 15
cat 20
dog 10
cat 30
dog 5
cat 40

Suppose I want to extract the highest value for each unique animal. I would like to write something like:

cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"

I can get nearly the same result using sort:

cat animals.txt | sort -t " " -k1,1 -k2,2nr

And I can always drop into awk from there, but this all feels a bit awkward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.

I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?

Halp!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

泪痕残 2024-08-29 00:27:13

我编写 TxtSushi 主要是为了对平面文件进行 SQL 选择。以下是示例的命令链(所有这些命令都来自 TxtSushi):

tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'

namecolumns 是必需的,因为 girls.txt 没有标题行。您可以通过查看 示例脚本。主页底部还有类似工具的链接。

I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):

tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'

namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.

幽梦紫曦~ 2024-08-29 00:27:13

使用 DBD::AnyData 的 Perl DBI

Perl DBI using DBD::AnyData

你是我的挚爱i 2024-08-29 00:27:13

你可以使用sqlite。下面是一个使用 Python 的示例。

import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
    cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()

输出

$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]

you can use sqlite. Here's an example using Python.

import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
    cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()

output

$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]
小兔几 2024-08-29 00:27:13

我刚刚偶然发现 这个 Python 脚本 它的功能与您想要的类似,尽管它只支持非常基本的查询。

I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.

家住魔仙堡 2024-08-29 00:27:13

我们将拥有一个用于 sqlite 的轻量级 ORM,它可以简化此任务,而无需需要任何配置文件等。

如果您可以使用PowerShell,它具有许多用于解析和查询文本文件的强大功能(示例)。否则,使用 .NET/Mono,您可以将其切入并立即使用 LINQ。

We'll I have a lightweight ORM for sqlite that would simplify this task without requiring any configuration files, etc.

If you can using PowerShell has a lot of powerful capabilities for parsing and querying text files (example here). Otherwise using .NET/Mono you can cut that up in and use LINQ in no time.

娜些时光,永不杰束 2024-08-29 00:27:13

我从来没有为我的问题找到令人满意的答案,但我至少使用 uniq 的“-f”选项找到了我的玩具问题的解决方案,这是我不知道的

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

:显然,如果输入文件是使用相反顺序的列创建的,则可以完全跳过上面的 >awk 部分。

不过,我仍然对类似 SQL 的工具抱有希望。

I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniqs "-f" option, which I had been unaware of:

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.

I'm still holding out hope for a SQL-like tool, though.

幽梦紫曦~ 2024-08-29 00:27:13

我制作了一个可能有帮助的工具。
http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms
你的sql可能是“从动物中选择最大值(值)”
或者可以通过“按值描述从动物顺序中选择*”

I made a tool that might help.
http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms
your sql could be "Select Max(value) from animals"
or it could by "Select * from animals order by value desc"

人生戏 2024-08-29 00:27:13

您可以查找HXTT JDBC 驱动程序。他们为大多数类型的平面文件、Excel 等提供 JDBC 驱动程序。

您可以对其执行简单的 SQL 查询。

他们也有试用版

You can look for HXTT JDBC Drivers. They provide JDBC drivers for most type of flat files, excel etc .

You can execute simple SQL queries on it.

They have trial versions available as well

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文