Using the filter function in Python

Posted on 2024-12-18 13:24:55

I am trying to use Python's built-in filter function in order to extract data from certain columns in a CSV. Is this a good use of the filter function? Would I have to define the data in these columns first, or would Python somehow already know which columns contain what data?

Comments (2)

哆兒滾 2024-12-25 13:24:55

Python prides itself on coming with "batteries included", so for most everyday tasks someone has probably already provided a solution.
CSV handling is one of them: there is a built-in csv module.

tablib is also a very good third-party module, especially if you're dealing with non-ASCII data.

For the behaviour you described in the comment, this will do:

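import csv

# Python 3: open in text mode with newline='' so the csv module handles line endings
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column
        print(", ".join(row))

灼疼热情 2024-12-25 13:24:55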

The filter function is intended to select from a list (or in general, any iterable) those elements which satisfy a certain condition. It's not really intended for index-based selection. So although you could use it to pick out specified columns of a CSV file, I wouldn't recommend it. Instead you should probably use something like this:

import csv

with open(filename, newline='') as f:  # Python 3: text mode, not 'rb'
    for record in csv.reader(f):
        do_something_with(record[0], record[2])  # first and third columns

Depending on what exactly you are doing with the records, it may be better to create an iterator over the columns of interest:

with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator here, while the file is still open

or, if you need non-sequential processing, perhaps a list:

with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
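
As a minimal, self-contained sketch of the list and iterator variants above, the following uses an in-memory CSV (io.StringIO) in place of a real file. Python 3 is assumed, and the sample data and the names SAMPLE, first_and_third and iter_first_and_third are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "a,b,c\n1,2,3\n4,5,6\n"


def iter_first_and_third(f):
    """Lazily yield (column 0, column 2) pairs from an open file object."""
    return ((record[0], record[2]) for record in csv.reader(f))


def first_and_third(text):
    """Return (column 0, column 2) pairs for every row of CSV text."""
    with io.StringIO(text, newline='') as f:
        # Materialize the list inside the with block, while f is still open.
        return list(iter_first_and_third(f))


pairs = first_and_third(SAMPLE)
print(pairs)
```

The generator reads rows only when it is consumed, so it has to be used up before the with block closes the file. The list version has no such constraint once it has been built.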

I'm not sure what you mean by defining the data in the columns. The data are defined by the CSV file.
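
To make that concrete: csv.reader only yields lists of strings, so columns are addressed by position and Python knows nothing about what they contain. If the file happens to have a header row (an assumption, since the question does not say), csv.DictReader can use that row to address columns by name. A rough sketch with made-up column names and data:

```python
import csv
import io

# Hypothetical data; the header row supplies the column names.
SAMPLE = "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"

with io.StringIO(SAMPLE, newline='') as f:
    reader = csv.DictReader(f)
    # Every value is still a string; convert explicitly where needed.
    people = [(row["name"], int(row["age"])) for row in reader]

print(people)
```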


By comparison, here's a case in which you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records in which the numbers are in strictly increasing order within the row. You could write a function to determine whether a list of numbers is in strictly increasing order:

from itertools import pairwise  # available since Python 3.10

def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))

(see the itertools documentation for a definition of pairwise; since Python 3.10 it is available as itertools.pairwise). Then you can use this as the condition in filter:

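with open(filename, newline='') as f:
    # In Python 3, filter() returns a lazy iterator; list() materializes it while f is open
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list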

Of course, the same thing could, and usually would, be implemented as a list comprehension:

with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list

so there's little reason to use filter in practice.
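
For comparison, here is a small runnable sketch of both forms side by side, again on in-memory data rather than a real file. The sample rows and the two wrapper function names are hypothetical, and pairwise is defined locally (an assumption made so the sketch also runs on Python versions older than 3.10).

```python
import csv
import io


def pairwise(iterable):
    """Yield consecutive overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    items = list(iterable)
    return zip(items, items[1:])


def strictly_increasing(fields):
    """True if the fields, read as integers, increase strictly left to right."""
    return all(int(i) < int(j) for i, j in pairwise(fields))


# Hypothetical numeric data: only the first and last rows are strictly increasing.
SAMPLE = "1,2,3\n3,2,1\n1,1,2\n5,10,20\n"


def increasing_rows_filter(text):
    with io.StringIO(text, newline='') as f:
        # filter() is lazy in Python 3, so build the list before f is closed.
        return list(filter(strictly_increasing, csv.reader(f)))


def increasing_rows_comprehension(text):
    with io.StringIO(text, newline='') as f:
        return [record for record in csv.reader(f) if strictly_increasing(record)]


print(increasing_rows_filter(SAMPLE))
print(increasing_rows_comprehension(SAMPLE))
```

Both functions select the same rows. The int() conversion matters: compared as strings, '5' < '10' would be false.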
