Row count, grep, head, and tail for Feather files
Setup: I am contemplating switching from writing large (~20GB) data files with csv to feather format, since I have plenty of storage space and the extra speed is more important. One thing I like about csv files is that at the command line, I can do a quick
wc -l filename
to get a row count, even for large data files. Also, I can quickly search for a simple string with
grep search_string filename
The head and tail commands are also very useful at times. These are straightforward and work well with csv files, but not with feather. If I try any of them on a feather file, I do not get results that make sense or are helpful.
While I certainly can read a feather file into, say, Python or R, and analyze it then, the hassle of writing out the path and importing the necessary libraries is something I'd rather dispense with.
My Question: Does there exist either a cross-platform (at least Mac and Linux) feather file reader I can use to quickly read in and view feather data (this would be in tabular format) with features corresponding to row count, grep, head, and tail? Or are there simple CLI utilities I could install that would enable me to do the equivalent of line count, grep, head, and tail?
I've seen this question, but it is very incomplete relative to my question.
Comments (1)
To work with feather files you must use Python or R programs. To work with csv you can use any of the common text manipulation utilities available to Linux/Unix users.
Linux text manipulation tools
reader: less
search: grep
converters: awk, sed
extractor: split
editor: vim
Each of the above tools requires some learning and practice.
Suggestion
If you have programming skill, create a program to manipulate your feather file.