Why is awk so much faster than Python in this case?
I have a clip list with 200,000 rows; each row is of the form
<field1> <field2>
To get just field 1, I can run a script that looks like this:
import os
import sys
jump = open(sys.argv[1],"r")
clips = open("clips.list","w")
text = jump.readlines()
list_of_clips = str()
for line in text:
    clip_to_add = line.split(" ")[0]
    list_of_clips = list_of_clips + clip_to_add +'\n'
with open ('clips.list', 'w') as file:
    file.write (list_of_clips)
jump.close()
or I can just use awk '{print $1}'.
Why is awk SO much quicker? It completes the job in about 1 second.
Comments (1)
This code is poorly written from a performance point of view. .readlines() needs to read the whole file to create a list (which is mutable, a feature you do not use at all), even though in your case you do not need to know the content of the whole file to get the processing done. When reading a file you can use for line in <filehandle>: to avoid reading the whole file into memory; using this, you could print the first field of a space-separated file.txt one line at a time.
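The line-by-line loop described here might look like the following minimal sketch. The helper names first_fields and print_first_fields are illustrative and not part of the original answer; file.txt is the placeholder file name the answer uses.

```python
def first_fields(lines):
    """Yield the first space-separated field of each line."""
    for line in lines:
        # Drop the trailing newline, then split once on the first space.
        yield line.rstrip("\n").split(" ", 1)[0]


def print_first_fields(path):
    """Print the first field of every line in the file at `path`."""
    with open(path, "r") as filehandle:
        # Iterating over the handle yields one line at a time,
        # so the whole file is never held in memory as a list.
        for field in first_fields(filehandle):
            print(field)
```

Calling print_first_fields("file.txt") would then print one field per line, much like awk '{print $1}' does for single-space-separated input.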
Moreover, you import os and then do not use any of its features, and you open clips.list twice, once as clips and later as file, without ever making use of the former.

To sum up: awk '{print $1}' is correctly written AWK code, whilst the presented Python code is of very dubious quality, so comparing them gives an unreliable result.
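For a fairer comparison, the script from the question could be reworked along the lines of the points above. This is only a sketch under stated assumptions: the function name extract_first_fields is hypothetical, and the output file name clips.list is taken from the question.

```python
def extract_first_fields(src_path, dst_path="clips.list"):
    """Write the first space-separated field of each input line to dst_path."""
    # Each file is opened exactly once; the with block closes both.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # one line at a time, no readlines()
            field = line.rstrip("\n").split(" ", 1)[0]
            # Write each field directly instead of building one big string.
            dst.write(field + "\n")
```

Calling extract_first_fields(sys.argv[1]) (after import sys) at the bottom of the script would reproduce the original command-line behaviour. No timing is claimed here; how close this gets to awk would need to be measured on the actual data.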