对平面文件进行简单、快速的 SQL 查询
有谁知道有什么工具可以使用类似 SQL 的声明性查询语言提供简单、快速的平面文件查询吗?我不想支付将文件加载到数据库中的开销,因为输入数据通常在查询运行后几乎立即被丢弃。
考虑数据文件“animals.txt”:
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
假设我想提取每个独特动物的最高值。我想写一些类似的东西:
cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"
我可以使用 sort
得到几乎相同的结果:
cat animals.txt | sort -t " " -k1,1 -k2,2nr
而且我总是可以从那里进入 awk
,但这一切感觉有点awk
ward(无法抗拒)当类似 SQL 的语言似乎能够如此干净地解决问题时。
我考虑过为 SQLite 编写一个包装器,它会根据输入数据自动创建一个表,并且我已经研究过在单处理器模式下使用 Hive,但我不禁觉得这个问题以前已经解决了。我错过了什么吗?此功能是否已由其他标准工具实现?
哈普!
Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.
Consider the data file, "animals.txt":
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
Suppose I want to extract the highest value for each unique animal. I would like to write something like:
cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"
I can get nearly the same result using sort
:
cat animals.txt | sort -t " " -k1,1 -k2,2nr
And I can always drop into awk
from there, but this all feels a bit awk
ward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.
I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?
Halp!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我编写 TxtSushi 主要是为了对平面文件进行 SQL 选择。以下是示例的命令链(所有这些命令都来自 TxtSushi):
namecolumns 是必需的,因为 girls.txt 没有标题行。您可以通过查看 示例脚本。主页底部还有类似工具的链接。
I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):
namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.
使用 DBD::AnyData 的 Perl DBI
Perl DBI using DBD::AnyData
你可以使用sqlite。下面是一个使用 Python 的示例。
输出
you can use sqlite. Here's an example using Python.
output
我刚刚偶然发现 这个 Python 脚本 它的功能与您想要的类似,尽管它只支持非常基本的查询。
I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.
我们将拥有一个用于 sqlite 的轻量级 ORM,它可以简化此任务,而无需需要任何配置文件等。
如果您可以使用PowerShell,它具有许多用于解析和查询文本文件的强大功能(示例)。否则,使用 .NET/Mono,您可以将其切入并立即使用 LINQ。
We'll I have a lightweight ORM for sqlite that would simplify this task without requiring any configuration files, etc.
If you can using PowerShell has a lot of powerful capabilities for parsing and querying text files (example here). Otherwise using .NET/Mono you can cut that up in and use LINQ in no time.
我从来没有为我的问题找到令人满意的答案,但我至少使用 uniq 的“-f”选项找到了我的玩具问题的解决方案,这是我不知道的
:显然,如果输入文件是使用相反顺序的列创建的,则可以完全跳过上面的 >awk 部分。
不过,我仍然对类似 SQL 的工具抱有希望。
I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using
uniq
s "-f" option, which I had been unaware of:The
awk
portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.I'm still holding out hope for a SQL-like tool, though.
我制作了一个可能有帮助的工具。
http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms
你的sql可能是“从动物中选择最大值(值)”
或者可以通过“按值描述从动物顺序中选择*”
I made a tool that might help.
http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms
your sql could be "Select Max(value) from animals"
or it could by "Select * from animals order by value desc"
您可以查找HXTT JDBC 驱动程序。他们为大多数类型的平面文件、Excel 等提供 JDBC 驱动程序。
您可以对其执行简单的 SQL 查询。
他们也有试用版
You can look for HXTT JDBC Drivers. They provide JDBC drivers for most type of flat files, excel etc .
You can execute simple SQL queries on it.
They have trial versions available as well