Python: How to ignore #comment lines when reading in a file

Posted on 2024-08-11 01:48:03 · 204 words · 11 views · 0 comments

In Python, I have just read a line from a text file and I'd like to know how to write code that ignores comments marked with a hash # at the beginning of the line.

I think it should be something like this:

for 
   if line !contain #
      then ...process line
   else end for loop 

But I'm new to Python and I don't know the syntax

Comments (10)

过去的过去 2024-08-18 01:48:03

You can use startswith().

For example:

for line in open("file"):
    li = line.strip()
    if not li.startswith("#"):
        print(line.rstrip())

淡紫姑娘! 2024-08-18 01:48:03

I recommend you don't ignore the whole line when you see a # character; just ignore the rest of the line. You can do that easily with a string method called partition:

with open("filename") as f:
    for line in f:
        line = line.partition('#')[0]
        line = line.rstrip()
        # ... do something with line ...

partition returns a tuple: everything before the partition string, the partition string, and everything after the partition string. So, by indexing with [0] we take just the part before the partition string.
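As a quick illustration of that tuple, here is a small sketch. The sample strings are made up for this example; the point is that partition always returns three pieces, even when the separator is missing.

```python
# Illustrative sample lines (hypothetical data, not from the original post).
with_comment = "host02  # this host is active\n"
no_comment = "host03\n"

# partition() always returns a 3-tuple: (before, separator, after).
before, sep, after = with_comment.partition('#')
print(repr(before))  # the part we keep
print(repr(sep))
print(repr(after))

# When the separator is absent, the whole string lands in slot 0
# and the other two slots are empty strings.
print(no_comment.partition('#'))
```

Because slot 0 is the full line when there is no '#', indexing with [0] is safe for lines without comments.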

EDIT:
If you are using a version of Python that doesn't have partition(), here is code you could use:

with open("filename") as f:
    for line in f:
        line = line.split('#', 1)[0]
        line = line.rstrip()
        # ... do something with line ...

This splits the string on a '#' character, then keeps everything before the split. The 1 argument makes the .split() method stop after one split; since we are just grabbing the 0th substring (by indexing with [0]) you would get the same answer without the 1 argument, but this might be a little bit faster. (Simplified from my original code thanks to a comment from @gnr. My original code was messier for no good reason; thanks, @gnr.)
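To make the effect of the 1 argument concrete, the following sketch compares the variants on a line with two '#' characters. The input line is invented for illustration.

```python
line = "value = 1  # first note # second note\n"  # hypothetical input

# maxsplit=1 stops after the first '#', producing at most two pieces.
print(line.split('#', 1))
# Without maxsplit every '#' is a split point, but piece 0 is the same.
print(line.split('#'))

head_limited = line.split('#', 1)[0]
head_unlimited = line.split('#')[0]
head_partition = line.partition('#')[0]
```

All three expressions give the same leading text; they differ only in how much of the rest of the line gets processed.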

You could also just write your own version of partition(). Here is one called part():

def part(s, s_part):
    i0 = s.find(s_part)
    if i0 == -1:
        # separator not found: behave like str.partition()
        return (s, '', '')
    i1 = i0 + len(s_part)
    return (s[:i0], s[i0:i1], s[i1:])

@dalle noted that '#' can appear inside a string. It's not that easy to handle this case correctly, so I just ignored it, but I should have said something.

If your input file has simple enough rules for quoted strings, this isn't hard. It would be hard if you accepted any legal Python quoted string, because there are single-quoted, double-quoted, multiline quotes with a backslash escaping the end-of-line, triple quoted strings (using either single or double quotes), and even raw strings! The only possible way to correctly handle all that would be a complicated state machine.

But if we limit ourselves to just a simple quoted string, we can handle it with a simple state machine. We can even allow a backslash-quoted double quote inside the string.

c_backslash = '\\'
c_dquote = '"'
c_comment = '#'


def chop_comment(line):
    # a little state machine with two state variables:
    in_quote = False  # whether we are in a quoted string right now
    backslash_escape = False  # true if we just saw a backslash

    for i, ch in enumerate(line):
        if not in_quote and ch == c_comment:
            # not in a quote, saw a '#', it's a comment.  Chop it and return!
            return line[:i]
        elif backslash_escape:
            # we must have just seen a backslash; reset that flag and continue
            backslash_escape = False
        elif in_quote and ch == c_backslash:
            # we are in a quote and we see a backslash; escape next char
            backslash_escape = True
        elif ch == c_dquote:
            in_quote = not in_quote

    return line

I didn't really want to get this complicated in a question tagged "beginner" but this state machine is reasonably simple, and I hope it will be interesting.
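To see what the state machine does on a few inputs, here is a self-contained check. It repeats chop_comment in condensed form so the snippet runs on its own; the sample lines are invented for this sketch.

```python
def chop_comment(line):
    # Condensed restatement of the state machine described above.
    in_quote = False
    backslash_escape = False
    for i, ch in enumerate(line):
        if not in_quote and ch == '#':
            return line[:i]
        elif backslash_escape:
            backslash_escape = False
        elif in_quote and ch == '\\':
            backslash_escape = True
        elif ch == '"':
            in_quote = not in_quote
    return line


# Hypothetical sample lines covering the interesting cases.
samples = [
    'x = 1  # plain comment',
    'x = "a # b"  # hash inside quotes survives',
    r'x = "say \"hi\" # still quoted"  # real comment',
    'no comment here',
]
for s in samples:
    print(repr(chop_comment(s)))
```

Note that only double quotes are recognized; a '#' inside a single-quoted string would still be treated as the start of a comment.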

沦落红尘 2024-08-18 01:48:03

I'm coming at this late, but the problem of handling shell style (or python style) # comments is a very common one.

I've been using some code almost every time I read a text file.
The problem is that it doesn't handle quoted or escaped comments properly. But it works for simple cases and is easy.

for line in whatever:
    line = line.split('#',1)[0].strip()
    if not line:
        continue
    # process line

A more robust solution is to use shlex:

import shlex
for line in instream:
    lex = shlex.shlex(line)
    lex.whitespace = '' # if you want to strip newlines, use '\n'
    line = ''.join(list(lex))
    if not line:
        continue
    # process decommented line

This shlex approach not only handles quotes and escapes properly, it adds a lot of cool functionality (like the ability to have files source other files if you want). I haven't tested it for speed on large files, but it is zippy enough for small stuff.
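A small sketch of the quote handling, applied to a single line. The input string is hypothetical, and the snippet only illustrates that a '#' inside double quotes is kept while a trailing comment is dropped.

```python
import shlex

# Hypothetical input: one '#' inside quotes, one real trailing comment.
text = 'name = "a # b"  # trailing'

lex = shlex.shlex(text)
lex.whitespace = ''      # keep spaces as tokens so the line can be rejoined
out = ''.join(lex)       # a shlex object is iterable over its tokens
print(repr(out))
```

In this (non-POSIX) mode the surrounding quote characters stay in the output, so the line is decommented rather than parsed.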

The common case when you're also splitting each input line into fields (on whitespace) is even simpler:

import shlex
for line in instream:
    fields = shlex.split(line, comments=True)
    if not fields:
        continue
    # process list of fields 
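
As a rough illustration of the field-splitting variant, the sketch below runs shlex.split over a few invented config-style lines. The data is hypothetical; only shlex.split and its comments parameter come from the answer above.

```python
import shlex

# Hypothetical config-style lines, invented for this sketch.
lines = [
    'name "value with # inside"  # trailing comment',
    '# a whole-line comment',
    'alpha beta gamma',
]

rows = []
for line in lines:
    fields = shlex.split(line, comments=True)
    if not fields:
        continue  # blank line or comment-only line
    rows.append(fields)

print(rows)
```

The quoted field keeps its '#', the quotes themselves are removed, and the comment-only line produces an empty list that is skipped.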

自由如风 2024-08-18 01:48:03

This is the shortest possible form:

for line in open(filename):
  if line.startswith('#'):
    continue
  # PROCESS LINE HERE

The startswith() method on a string returns True if the string you call it on starts with the string you passed in.

While this is okay in some circumstances like shell scripts, it has two problems. First, it doesn't specify how to open the file. The default mode for opening a file is 'r', which means 'read the file in text mode'; 'rt' is equivalent and simply states that intent explicitly, while binary mode needs 'rb'. Since you're expecting a text file, it is clearer to open it with 'rt'. In Python 2 the text/binary distinction is irrelevant on UNIX-like operating systems but important on Windows (and on pre-OS X Macs); in Python 3 it matters everywhere, because binary mode yields bytes instead of str.

The second problem is the open file handle. The open() function returns a file object, and it's considered good practice to close files when you're done with them. To do that, call the close() method on the object. Now, Python will probably do this for you, eventually; in CPython objects are reference-counted, and when an object's reference count goes to zero its destructor (a special method called __del__) runs and the object gets freed. Note that I said probably: Python has a bad habit of not actually calling the destructor on objects that are still alive when the program finishes. I guess it's in a hurry!

For short-lived programs like shell scripts, and particularly for file objects, this doesn't matter. Your operating system will automatically clean up any file handles left open when the program finishes. But if you opened the file, read the contents, then started a long computation without explicitly closing the file handle first, Python is likely to leave the file handle open during your computation. And that's bad practice.

This version will work in any 2.x version of Python, and fixes both the problems I discussed above:

f = open(filename, 'rt')
for line in f:
  if line.startswith('#'):
    continue
  # PROCESS LINE HERE
f.close()

This is the best general form for older versions of Python.

As suggested by steveha, using the "with" statement is now considered best practice. If you're using 2.6 or above you should write it this way:

with open(filename, 'rt') as f:
  for line in f:
    if line.startswith('#'):
      continue
    # PROCESS LINE HERE

The "with" statement will clean up the file handle for you.
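To check that claim, a minimal sketch can inspect the file object's closed attribute before and after the block. The temporary file and its contents are assumptions made so the example runs anywhere.

```python
import os
import tempfile

# Create a throwaway file so the sketch runs anywhere (hypothetical content).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("# header\nreal data\n")
    path = tmp.name

kept = []
with open(path, 'rt') as f:
    for line in f:
        if line.startswith('#'):
            continue
        kept.append(line)
    print(f.closed)   # still open inside the block

print(f.closed)       # closed once the block exits
os.remove(path)
```

The file is closed as soon as the block is left, including when an exception escapes from it.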

In your question you said "lines that start with #", so that's what I've shown you here. If you want to filter out lines that start with optional whitespace and then a '#', you should strip the whitespace before looking for the '#'. In that case, you should change this:

    if line.startswith('#'):

to this:

    if line.lstrip().startswith('#'):

In Python, strings are immutable, so this doesn't change the value of line. The lstrip() method returns a copy of the string with all its leading whitespace removed.
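
A short sketch of that point, using an invented input line: lstrip() hands back a new string and leaves the original untouched.

```python
line = "   # indented comment\n"   # hypothetical input line

stripped = line.lstrip()
print(repr(line))       # unchanged: leading spaces are still there
print(repr(stripped))   # new string without the leading whitespace

is_comment_strict = line.startswith('#')             # ignores indented comments
is_comment_lenient = line.lstrip().startswith('#')   # catches them
```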

日记撕了你也走了 2024-08-18 01:48:03

I've found recently that a generator function does a great job of this. I've used similar functions to skip comment lines, blank lines, etc.

I define my function as

def skip_comments(file):
    for line in file:
        if not line.strip().startswith('#'):
            yield line

That way, I can just do

with open('testfile') as f:
    for line in skip_comments(f):
        print(line, end='')  # each line already ends with a newline

This is reusable across all my code, and I can add any additional handling/logging/etc. that I need.
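
As one example of such additional handling, the sketch below extends the idea to drop blank lines as well. The function name skip_comments_and_blanks and its token parameter are hypothetical, and io.StringIO stands in for an open file.

```python
import io


def skip_comments_and_blanks(lines, token='#'):
    """Hypothetical variant of skip_comments that also drops blank lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(token):
            continue
        yield line


# io.StringIO stands in for an open file in this sketch.
fake_file = io.StringIO("# comment\n\nfirst\n   # indented comment\nsecond\n")
result = list(skip_comments_and_blanks(fake_file))
print(result)
```

Because it accepts any iterable of strings, the same generator works on files, lists of lines, or the output of another generator.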

苍暮颜 2024-08-18 01:48:03

I know that this is an old thread, but this is a generator function that I
use for my own purposes. It strips comments no matter where they
appear in the line, as well as stripping leading/trailing whitespace and
blank lines. The following source text:

# Comment line 1
# Comment line 2

# host01  # This host commented out.
host02  # This host not commented out.
host03
  host04  # Oops! Included leading whitespace in error!
  

will yield:

host02
host03
host04

Here is documented code, which includes a demo:

def strip_comments(item, *, token='#'):
    """Generator. Strips comments and whitespace from input lines.
    
    This generator strips comments, leading/trailing whitespace, and
    blank lines from its input.
    
    Arguments:
        item (obj):  Object to strip comments from.
        token (str, optional):  Comment delimiter.  Defaults to ``#``.
    
    Yields:
        str:  Next uncommented non-blank line from ``item`` with
            comments and leading/trailing whitespace stripped.
    
    """
    
    for line in item:
        s = line.split(token, 1)[0].strip()
        if s:
            yield s
    
    
if __name__ == '__main__':
    HOSTS = """# Comment line 1
    # Comment line 2

    # host01  # This host commented out.
    host02  # This host not commented out.
    host03
      host04  # Oops! Included leading whitespace in error!""".split('\n')

    
    hosts = strip_comments(HOSTS)
    print('\n'.join(h for h in hosts))

The normal use case will be to strip the comments from a file (e.g., a hosts file, as in my example above). If this is the case, then the tail end of the above code would be modified to:

if __name__ == '__main__':
    with open('aa.txt', 'r') as f:
        hosts = strip_comments(f)

        for host in hosts:
            print('\'%s\'' % host)

七秒鱼° 2024-08-18 01:48:03

A more compact version of a filtering expression can also look like this:

for line in (l for l in open(filename) if not l.startswith('#')):
    # do something with line

(l for ... ) is called a "generator expression"; here it acts as a wrapping iterator that filters out all unneeded lines from the file while iterating over it. Don't confuse it with the same thing in square brackets, [l for ... ], which is a "list comprehension" that first reads all the lines from the file into memory and only then starts iterating over them.

Sometimes you might want to have it less one-liney and more readable:

lines = open(filename)
lines = (l for l in lines if ... )
# more filters and mappings you might want
for line in lines:
    # do something with line

All the filters will be executed on the fly in one iteration.
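
A runnable sketch of such a pipeline follows. The particular filters are assumptions chosen for illustration, and io.StringIO stands in for open(filename).

```python
import io

# Stand-in for open(filename); the content is hypothetical.
source = io.StringIO("# skip me\nalpha\n\nbeta  \n")

lines = (l for l in source if not l.startswith('#'))   # drop comment lines
lines = (l.rstrip() for l in lines)                    # trim trailing whitespace
lines = (l for l in lines if l)                        # drop blank lines

# Nothing has been read yet; the loop pulls each line through
# all three stages one at a time.
collected = []
for line in lines:
    collected.append(line)
print(collected)
```

Each stage wraps the previous one, so only one line is held in memory at a time regardless of file size.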

醉态萌生 2024-08-18 01:48:03

Use a regex such as re.compile(r"^\s*(?:#|$)") with match() to skip blank lines and comment lines.
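
A sketch of the regex approach is shown below. The pattern is a simplified one chosen for this example (an assumption, not necessarily the answer's exact expression), and the sample lines are invented.

```python
import re

# Assumed pattern for this sketch: optional leading whitespace followed by
# either '#' or the end of the line (a comment line or a blank line).
SKIP = re.compile(r"^\s*(?:#|$)")

# Hypothetical sample lines.
sample = [
    "# comment\n",
    "\n",
    "   # indented\n",
    "data  # inline stays\n",
    "  more data\n",
]
kept = [line for line in sample if not SKIP.match(line)]
print(kept)
```

This only skips whole lines; inline comments after data are left in place.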

心在旅行 2024-08-18 01:48:03

I tend to use

for line in lines:
    if '#' not in line:
        #do something

This will ignore any line that contains a # anywhere, though the answer that uses partition has my upvote, as it keeps any information that comes before the #.

将军与妓 2024-08-18 01:48:03

A handy way to get rid of comments, which works for both inline comments and whole-line comments:

def clear_coments(f):
    new_text = ''
    for line in f:
        if "#" in line:
            # keep the text before '#' and restore the newline that split() cut off
            line = line.split("#", 1)[0].rstrip() + "\n"
        new_text += line
    return new_text