Python: a concise/elegant way to reformat a set of text files?

Posted on 2024-12-19 16:38:36

I have written a Python script to process a set of ASCII files within a given directory. I wonder if there is a more concise and/or "pythonesque" way to do it, without losing readability?

Python Code

import os
import fileinput
import glob
import string

indir='./'
outdir='./processed/'

for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r')   # input file
    fout=open(outdir+filename,'w') # out: processed file

    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output

    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has it's own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))),  # the rest of the values in the line has a different format 
        fout.write('\n')

    fin.close()
    fout.close()

An example:

Input:

;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398

Processed:

;;; This line is the header line
-5.00  1.003466  0.786494  0.437988  0.087808
-4.99  1.002548  0.785774  0.437586  0.087727
-4.98  1.001632  0.785055  0.437185  0.087647
-4.97  1.000717  0.784338  0.436785  0.087567
-4.96  0.999805  0.783622  0.436386  0.087486

冷情妓 2024-12-26 16:38:36

Other than a few minor changes, due to how Python has changed through time, this looks fine.

You're mixing two different styles of next(); the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility to Python 1.x). There's no need to go through the almost useless "fileinput" module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
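As a small illustration of those three points, here is a minimal sketch. It uses io.StringIO as a stand-in for an open file (an assumption, so the snippet runs without touching the disk); a real file object returned by open() iterates the same way.

```python
import io

# io.StringIO stands in for an open text file here (an assumption made so
# the sketch runs without a real file); file objects iterate the same way.
fin = io.StringIO(";;; header\n-5.0 1.09\n-4.99 1.08\n")

header = next(fin)                         # built-in next(), not fin.next()
rows = [line.split(' ') for line in fin]   # iteration resumes after the header

print(repr(header))
print(rows)
```

Note that the last field of each row still carries the trailing newline; float() tolerates surrounding whitespace, so converting it works anyway.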

Edit: As @codeape pointed out, glob() returns paths that already include the directory prefix, so indir+filename prepends the directory a second time. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with the "%" string interpolation than string formatting.
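The glob() behaviour can be checked with a short sketch like the one below. The temporary directory and the file name sample.asc are made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as indir:
    # Create one empty sample file; the name 'sample.asc' is made up.
    open(os.path.join(indir, 'sample.asc'), 'w').close()

    matches = glob.glob(os.path.join(indir, '*.asc'))
    # Every match already starts with indir, so only the base name
    # should be joined onto an output directory.
    has_prefix = all(path.startswith(indir) for path in matches)
    basenames = [os.path.basename(path) for path in matches]

print(has_prefix)
print(basenames)
```

If you prefer to keep glob() for its pattern matching, taking os.path.basename() of each match is one way to get the name to join onto outdir.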

Here's how I would write this in more idiomatic modern Python:

import os

indir = './'
outdir = './processed/'

def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')

        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'

        fout.write(fmt % tuple(map(float, fields)))

# get a list of input ASCII files to be processed (only the *.asc files,
# so that the 'processed' subdirectory itself is skipped)
basenames = [name for name in os.listdir(indir) if name.endswith('.asc')]
for basename in basenames:
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)

The Zen of Python is "There should be one-- and preferably only one --obvious way to do it". It's interesting how you use functions which, during the last 10+ years, were "obviously" the right solution, but no longer are. :)
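One difference is worth pointing out. The question's inner loop calls val.next() inside "for x in val", so each pass consumes two values and only every second column is written. That matches the five-column sample output, whereas reformat() above writes all nine columns. The question doesn't say whether the skipping was intended. If it was, a sketch along these lines reproduces the sample; the function name reformat_line and the use of split() without an argument are my own choices, not part of the original code.

```python
def reformat_line(line):
    """Format field 0 as %6.2f and every second remaining field as %10.6f."""
    fields = line.split()   # whitespace split; also drops the trailing newline
    kept = fields[2::2]     # fields 2, 4, 6, ... as in the question's sample output
    fmt = '%6.2f' + '%10.6f' * len(kept) + '\n'
    return fmt % tuple(map(float, [fields[0]] + kept))

# First data line of the question's sample input.
sample = ("-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 "
          "0.78649408279093869 0.65599958665017222 0.4379879132749317 "
          "0.26310799350679176 0.087808018565486673\n")
print(reformat_line(sample), end='')
```

To keep every column instead, replace fields[2::2] with fields[1:].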

假面具 2024-12-26 16:38:36

fin=open(indir+filename,'r')   # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()

can be written as:

with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code

In Python 2.6, where a single with statement cannot take several context managers, you can nest them:

with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code

And the line

lines = iter(fileinput.input([indir+filename]))

is useless. You can just iterate over an open file (fin in your case).

You can also do line.split(' ') instead of string.split(line, ' ')

If you change those things, there is no need to import string and fileinput.
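Putting those changes together, a self-contained sketch might look like this. The temporary directory, the file names in.asc and out.asc, and the '%.2f' format are assumptions chosen only to keep the example short.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, 'in.asc')    # hypothetical file names
    dst = os.path.join(workdir, 'out.asc')
    with open(src, 'w') as f:
        f.write(';;; header\n1.5 2.25\n')

    # Both files are closed when the block exits, even if an exception is raised.
    with open(src, 'r') as fin, open(dst, 'w') as fout:
        fout.write(next(fin))                # copy the header line
        for line in fin:                     # iterate over the open file directly
            values = line.split(' ')         # str method, no string module needed
            fout.write(' '.join('%.2f' % float(v) for v in values) + '\n')

    closed = fin.closed and fout.closed
    with open(dst) as f:
        result = f.read()

print(closed)
print(result, end='')
```

No explicit close() calls are needed; the with block takes care of them.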

Edit: I didn't know you can use inline code. That's cool

在巴黎塔顶看东京樱花 2024-12-26 16:38:36

In my build script, I have this code:

inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()

I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
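A rough sketch of that separation is shown below. The helper copy_with_transform and the placeholder shout are made-up names (shout merely stands in for something like doKeywordSubstitution), and io.StringIO replaces real files so the snippet runs on its own.

```python
import io

def copy_with_transform(src, dst, transform):
    """Write transform(line) to dst for every line read from src."""
    for line in src:
        dst.write(transform(line))

def shout(line):
    # Hypothetical placeholder for the real per-line logic.
    return line.upper()

src = io.StringIO("alpha\nbeta\n")   # stands in for the input file
dst = io.StringIO()                  # stands in for the output file
copy_with_transform(src, dst, shout)
print(dst.getvalue(), end='')
```

The copying loop stays the same whatever the per-line logic is, so only the function passed in needs to change.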

I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).

秋意浓 2024-12-26 16:38:36

I don't understand why you use string.split(line, ' ') instead of just line.split(' ').

Well maybe I would write the string-processing part like this:

values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
fout.write(' '.join(values) + '\n')  # formatting drops the newline that was on the last field

At least for me this looks better but this might be subjective :)

Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').
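For reference, a minimal sketch of those two expressions; the variable names indir and outdir follow the question's script.

```python
import os

indir = os.curdir                               # '.' on common platforms
outdir = os.path.join(os.curdir, 'processed')   # joined with the platform's separator

# Note: outdir must already exist before files are opened for writing in it.
print(indir)
print(outdir)
```

The benefit is that the path separator comes from the os module rather than being hard-coded as "/".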
