Best (quickest) way to parse and modify files
Recently I have been working with a lot of text (CSV) files of 10-60k lines each, like this:
id1,id2
id3,id1
id81,id13
...
Most of the time, I need to extract this information as an array:
id1,id2,id3,id1,id81,id13
Or, at times, as an array of unique elements:
id1,id2,id3,id81,id13
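A minimal sketch of both extractions in plain Java, assuming the lines have already been read into a `List<String>`, that fields are separated by a bare comma with no quoting, and Java 11+ (for `String.isBlank`). The class and method names (`IdExtractor`, `allIds`, `uniqueIds`) are hypothetical. A `LinkedHashSet` drops repeats while keeping the order in which each id was first seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class IdExtractor {

    // Flattens every comma-separated line into one list, keeping duplicates and order.
    static List<String> allIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // skip empty lines, e.g. a trailing blank line in the file
            }
            for (String field : line.split(",")) {
                ids.add(field.trim()); // trim also strips a stray '\r' from Windows line endings
            }
        }
        return ids;
    }

    // Same ids, each kept once, in the order it was first seen.
    static List<String> uniqueIds(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(allIds(lines)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id1,id2", "id3,id1", "id81,id13");
        System.out.println(String.join(",", allIds(lines)));
        System.out.println(String.join(",", uniqueIds(lines)));
    }
}
```

Running `main` on the sample rows prints `id1,id2,id3,id1,id81,id13` followed by `id1,id2,id3,id81,id13`.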
The result is then used by my Java code to do something.
Currently, I usually write a Java function that does the whole task for me: reading the file, applying the logic, and returning the list of ids.
Is there a better and quicker way to achieve this, maybe via the command line?
Update:
If I were asked to build an app that reads a file and does something with it, I would certainly write that logic in Java. In my case, though, I have to go through a lot of text files that I get from the data warehouse, extract the relevant information from them, and then run it through my Java-based app.
This is only for experimenting with and evaluating my app.
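Since the work is a batch job over many warehouse files, one option is to keep the logic in Java but package it as a tiny command-line tool. This is a sketch, assuming UTF-8 files with one comma-separated row per line and Java 11+; the class name `UniqueIdsCli` and method `uniqueIdsAcross` are hypothetical. `Files.lines` reads each file lazily, and `Stream.flatMap` closes each per-file stream once its lines have been consumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: prints the unique ids found across all CSV files given as arguments.
public class UniqueIdsCli {

    // Streams every line of every file, splits on commas and keeps first occurrences.
    static String uniqueIdsAcross(Path... files) {
        return Arrays.stream(files)
                .flatMap(UniqueIdsCli::linesOf)
                .filter(line -> !line.isBlank())          // "".split(",") would yield one empty id
                .flatMap(line -> Arrays.stream(line.split(",")))
                .map(String::trim)
                .distinct()                               // ordered stream, so first-seen order is kept
                .collect(Collectors.joining(","));
    }

    // Files.lines throws a checked IOException, which a method reference cannot propagate.
    private static Stream<String> linesOf(Path file) {
        try {
            return Files.lines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path[] files = Arrays.stream(args).map(Path::of).toArray(Path[]::new);
        System.out.println(uniqueIdsAcross(files));
    }
}
```

With the Java 11+ single-file source launcher this can be run without a separate compile step, e.g. `java UniqueIdsCli.java first.csv second.csv`, and it prints one comma-separated line. Dropping `.distinct()` gives the full array with duplicates instead.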
I copied your input into a file, test.csv:
Now, with the 'tr' utility, you can do:
and you have:
Unless your Java code is doing something silly, it will be in the same speed ballpark as anything else.
There's nothing magic about command-line tools that will make them faster than your code.
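As an illustration of the kind of thing that could count as "silly" (the answer does not name a specific case, so this example is an assumption): deduplicating with `List.contains` rescans the list for every id, so the work grows quadratically with file size, while a hash set does the same job in roughly linear time. `DedupComparison` and its methods are hypothetical names, and the timings printed by `main` are a rough, unwarmed measurement, not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupComparison {

    // Quadratic: List.contains scans the whole result list for every incoming id.
    static List<String> dedupWithList(List<String> ids) {
        List<String> unique = new ArrayList<>();
        for (String id : ids) {
            if (!unique.contains(id)) {
                unique.add(id);
            }
        }
        return unique;
    }

    // Roughly linear: hash lookups, and LinkedHashSet keeps first-seen order.
    static List<String> dedupWithSet(List<String> ids) {
        return new ArrayList<>(new LinkedHashSet<>(ids));
    }

    public static void main(String[] args) {
        // Synthetic input: 20k distinct ids, the worst case for the list version,
        // because every contains call scans the full list before returning false.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            ids.add("id" + i);
        }
        long t0 = System.nanoTime();
        List<String> a = dedupWithList(ids);
        long t1 = System.nanoTime();
        List<String> b = dedupWithSet(ids);
        long t2 = System.nanoTime();
        System.out.printf("list: %d ms, set: %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a.equals(b));
    }
}
```

Both methods return the same list; only the cost of getting there differs, which is the kind of gap that matters far more than the choice between Java and a command-line tool.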