Why is awk so much faster than Python in this case?

Posted 2025-02-13 16:14:27

I have a clip list with 200,000 rows; each row is of the form

<field 1> <field2>

To get just field 1, I can run a script that looks like this:

import os
import sys
jump = open(sys.argv[1],"r")
clips = open("clips.list","w")
text = jump.readlines()
list_of_clips = str()

for line in text: 
     clip_to_add =   line.split(" ")[0]
     list_of_clips = list_of_clips + clip_to_add +'\n' 

with open('clips.list', 'w') as file:
    file.write(list_of_clips)

jump.close()

or I can just use awk '{print $1}'

Why is awk so much quicker? It completes the job in about 1 second.

方圜几里 2025-02-20 16:14:27

import os
import sys
jump = open(sys.argv[1],"r")
clips = open("clips.list","w")
text = jump.readlines()
list_of_clips = str()

for line in text: 
     clip_to_add =   line.split(" ")[0]
     list_of_clips = list_of_clips + clip_to_add +'\n' 

with open('clips.list', 'w') as file:
    file.write(list_of_clips)

jump.close()

This code is poorly written from a performance point of view. .readlines() has to read the whole file to build a list (a mutable structure, a feature you never use), even though in your case you do not need the contents of the whole file to do the processing. When reading a file, you can use for line in <filehandle>: to avoid loading the whole file into memory. Using this, you can print the first field of a space-separated file.txt like so:

with open("file.txt","r") as f:
    for line in f:
        print(line.split(" ")[0])

Moreover, you import os without using anything from it, and you open clips.list twice, first as clips and later as file, without ever using the former.
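Putting these points together, a cleaned-up version of the script might look like the sketch below. It is only an illustration, not the asker's code: the function name extract_first_fields is hypothetical, the output name clips.list is taken from the question, and splitting on any run of whitespace (rather than a single space) is an assumption made to mirror awk's default field splitting.

```python
import sys


def extract_first_fields(src_path, dst_path):
    """Write the first whitespace-separated field of each line to dst_path.

    Hypothetical helper for illustration; returns the number of lines read.
    """
    count = 0
    # Open each file exactly once; both are closed when the block exits.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # read lazily, one line at a time
            # Split on runs of whitespace, at most once (assumption: awk-like).
            fields = line.split(None, 1)
            # A blank line gives an empty list; write an empty line for it.
            dst.write((fields[0] if fields else "") + "\n")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # "clips.list" is the output file name used in the question.
    extract_first_fields(sys.argv[1], "clips.list")
```

This version never holds the whole input or the whole output in memory, and it drops the unused import and the unused second file handle.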

In short: awk '{print $1}' is correctly written AWK code, whereas the presented Python code is of very dubious quality, so comparing them gives an unreliable result.
