Reading a CSV file and sorting it in Python

Posted on 2024-10-31 18:32:12

I am trying to read in a CSV file that looks like this:

ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60

Here is some code I have been experimenting with.

class jewel:
    def __init__(gem, name, carat, value):
        gem.name = name
        gem.carat = carat
        gem.value = value
    def __repr__(gem):
        return repr((gem.name, gem.carat, gem.value))

jewel_objects = [jewel('diamond', '1', 400),
                 jewel('ruby', '2', 200),
                 jewel('opal', '1', 600),
                ]

# sort the objects by value, lowest first
aList = sorted(jewel_objects, key=lambda jewel: (jewel.value))
print(aList)

I would like to read in the values and assign them to name, carat, and value, but I'm not sure how to do so. Once they are read in, I would like to sort them by value per carat, i.e. value/carat. I have done quite a bit of searching and have come up blank. Thank you very much for your help in advance.

浅沫记忆 2024-11-07 18:32:12

You need to do two things here. The first is actually loading the data into objects; I recommend the csv module in the Python standard library for this. It is very complete, reads each row for you, and makes the fields easily accessible.

CSV docs: http://docs.python.org/library/csv.html
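
As a minimal sketch of that loading step, assuming Python 3 and rows laid out like the ones in the question, csv.reader can feed each row straight into a class. The Jewel class and the load_jewels helper are illustrative names rather than anything the csv module provides, and the in-memory sample stands in for a real file.

```python
import csv
import io

class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)   # csv.reader yields strings, so convert
        self.value = float(value)

    def __repr__(self):
        return repr((self.name, self.carat, self.value))

def load_jewels(fileobj):
    """Build one Jewel per non-empty CSV row (hypothetical helper)."""
    return [Jewel(*row) for row in csv.reader(fileobj) if row]

# In-memory stand-in for: open('jewels.csv', newline='')
sample = io.StringIO("ruby,2,100\ndiamond,1,400\nemerald,3,250\n")
jewels = load_jewels(sample)
print(jewels)
```

With a real file, the same helper would be called inside a with open(..., newline='') block.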

I would create a list of the objects and then either implement a __cmp__ method in your class or pass a function to sorted() that defines the ordering. Note that __cmp__ is only honored by Python 2; Python 3 ignores it. You can get more info about sorting in the Python wiki.

Wiki docs: http://wiki.python.org/moin/HowTo/Sorting

You would implement the __cmp__ method like this in your class (it could be made a bit more efficient, but I'm being descriptive here):

def __cmp__(gem, other):
    # Python 2 only. float() guards against integer division and string fields.
    mine = float(gem.value) / float(gem.carat)
    theirs = float(other.value) / float(other.carat)
    if mine < theirs:
        return -1
    elif mine > theirs:
        return 1
    else:
        return 0
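
On Python 3, where __cmp__ is not consulted, the same ordering can be sketched with rich comparison methods. This is one possible translation, not the answer's original code: functools.total_ordering fills in the remaining comparisons from __eq__ and __lt__, and the value_per_carat method is a name introduced here for illustration.

```python
from functools import total_ordering

@total_ordering
class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)
        self.value = float(value)

    def value_per_carat(self):
        # hypothetical helper: the quantity the jewels are ordered by
        return self.value / self.carat

    def __eq__(self, other):
        return self.value_per_carat() == other.value_per_carat()

    def __lt__(self, other):
        return self.value_per_carat() < other.value_per_carat()

    def __repr__(self):
        return repr((self.name, self.carat, self.value))

jewels = [Jewel('diamond', 1, 400), Jewel('ruby', 2, 100), Jewel('opal', 1, 300)]
print(sorted(jewels))
```

If the class does not need to be comparable elsewhere, passing key=Jewel.value_per_carat to sorted() achieves the same result without defining any comparison methods.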

夏日浅笑〃 2024-11-07 18:32:12

Python has a csv module that should be very helpful here.

http://docs.python.org/library/csv.html
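
For a sense of what the module offers beyond plain rows, here is a small sketch using csv.DictReader, assuming Python 3. The column names passed as fieldnames are chosen here for illustration, because the file in the question has no header row, and the in-memory sample stands in for a real file.

```python
import csv
import io

# In-memory stand-in for a file opened with open('jewels.csv', newline='')
sample = io.StringIO("ruby,2,100\ndiamond,1,400\nopal,1,300\n")

# fieldnames supplies the column names, so the first row is treated as data
reader = csv.DictReader(sample, fieldnames=['name', 'carat', 'value'])
rows = list(reader)

# every field arrives as a string, so convert before dividing
rows.sort(key=lambda r: float(r['value']) / float(r['carat']))
for r in rows:
    print(r['name'], r['carat'], r['value'])
```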

不再让梦枯萎 2024-11-07 18:32:12

You can use NumPy structured arrays along with the csv module and use numpy.sort() to sort the data. The following code should work, assuming your CSV file is named geminfo.csv:

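import numpy as np
import csv

# Python 3: open in text mode with newline='', as the csv docs recommend
with open('geminfo.csv', newline='') as fileobj:
    # Convert data to a list of lists, skipping blank rows
    importeddata = [row for row in csv.reader(fileobj) if row]

# Calculate Value/Carat and add it to the imported data,
# converting each entry to a tuple of (name, carats, value, value per carat)
importeddata = [(name, float(carats), float(value), float(value) / float(carats))
                for name, carats, value in importeddata]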

One way to sort this data is to use numpy as shown below.

# create an empty structured array ('U20' holds text of up to 20 characters)
data = np.zeros(len(importeddata), dtype=[('Stone Name', 'U20'),
                                          ('Carats', 'f4'),
                                          ('Value', 'f4'),
                                          ('valuepercarat', 'f4')])
# fill it from the list of tuples, then sort on the computed field
data[:] = importeddata
datasortedbyvaluepercarat = np.sort(data, order='valuepercarat')

穿透光 2024-11-07 18:32:12

import csv
import operator

class Jewel(object):
    @classmethod
    def fromSeq(cls, seq):
        return cls(*seq)

    def __init__(self, name, carat, value):
        self.name  = str(name)
        self.carat = float(carat)
        self.value = float(value)

    def __repr__(self):
        return "{0}{1}".format(self.__class__.__name__, (self.name, self.carat, self.value))

    @property
    def valuePerCarat(self):
        return self.value / self.carat

def loadJewels(fname):
    with open(fname, newline='') as inf:  # Python 3: text mode, newline='' per the csv docs
        incsv = csv.reader(inf)
        jewels = [Jewel.fromSeq(row) for row in incsv if row]
    jewels.sort(key=operator.attrgetter('valuePerCarat'))
    return jewels

def main():
    jewels = loadJewels('jewels.csv')
    for jewel in jewels:
        print("{0!r:35} ({1:>7.2f})".format(jewel, jewel.valuePerCarat))

if __name__=="__main__":
    main()

produces

Jewel('amethyst', 2.0, 50.0)        (  25.00)
Jewel('ruby', 2.0, 100.0)           (  50.00)
Jewel('malachite', 1.0, 60.0)       (  60.00)
Jewel('emerald', 3.0, 250.0)        (  83.33)
Jewel('sapphire', 2.0, 500.0)       ( 250.00)
Jewel('opal', 1.0, 300.0)           ( 300.00)
Jewel('diamond', 1.0, 400.0)        ( 400.00)

等你爱我 2024-11-07 18:32:12

For parsing real-world CSV (comma-separated values) data you'll want to use the csv module that's included in Python's standard library.

CSV is a set of conventions rather than a strict standard. The sample data you show is simple and regular, but CSV in general has some ugly corner cases around quoting; for example, the contents of any field might contain embedded commas.
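
To make that corner case concrete, here is a small illustration, assuming Python 3. The quoted gem name is invented for the example and does not appear in the question's data.

```python
import csv

# A made-up row whose first field contains a comma inside quotes
line = '"ruby, star",2,100'

# Naive splitting breaks the quoted field in two
naive = line.split(',')

# csv.reader accepts any iterable of lines and honours the quoting
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields four pieces, while csv.reader returns the three intended fields.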

Here is a very crude program, based on your code, which does naïve parsing of the data (splitting by lines, then splitting each line on commas). It will not handle any data which doesn't split to precisely the correct number of fields, nor any where the numeric fields aren't correctly parsed by the Python int() and float() functions (object constructors). In other words this contains no error checking nor exception handling.

However, I've kept it deliberately simple so it can be easily compared to your rough notes. Also note that I've used the normal Python conventions regarding "self" references in the class definition. (About the only time one would use names other than "self" for these is when doing "meta-class" programming ... writing classes which dynamically instantiate other classes. Any other case will almost certainly cause serious concerns in the minds of any experienced Python programmers looking at your code).

#!/usr/bin/env python
class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = int(carat)
        self.value = float(value)
        assert self.carat != 0      # Division by zero would result from this
    def __repr__(self):
        return repr((self.name, self.carat, self.value))

if __name__ == '__main__':
    sample='''ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60'''

    these_jewels = list()
    for each_line in sample.split('\n'):
        gem_type, carat, value = each_line.split(',')
        these_jewels.append(Jewel(gem_type, carat, value))
        # Equivalently: 
        # these_jewels.append(Jewel(*each_line.split(',')))

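    # Decorate each jewel with its value per carat, sort, then undecorate.
    # Caveat: if two keys tie, sorted() falls back to comparing the Jewel
    # objects themselves, which raises TypeError on Python 3.
    decorated = [(x.value/x.carat, x) for x in these_jewels]
    results = [x[1] for x in sorted(decorated)]
    print('\n'.join([str(x) for x in results]))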

The parsing here is done simply using the string .split() method, and the data is extracted into names using Python's "tuple unpacking" syntax (this would fail if any line of input were to have the wrong number of fields).

The commented-out alternative to those two lines uses Python's argument-unpacking syntax (the successor to the old apply() built-in). The * prefix on the argument causes it to be unpacked into separate arguments, which are passed to the Jewel() class instantiation.
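
A tiny sketch of that equivalence, using a stand-in function named describe (invented for this example) in place of the Jewel constructor:

```python
def describe(name, carat, value):
    # stand-in for Jewel(name, carat, value); returns a readable summary
    return "%s: %s carat, worth %s" % (name, carat, value)

fields = "ruby,2,100".split(',')

# Passing the three fields one by one...
explicit = describe(fields[0], fields[1], fields[2])
# ...is the same call as unpacking the list with a * prefix
unpacked = describe(*fields)

print(explicit)
print(unpacked)
```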

This code also uses the widespread (and widely recommended) DSU (decorate, sort, undecorate) pattern for sorting on some field of your data. I "decorate" the data by creating a series of tuples: (computed value, object reference), then "undecorate" the sorted data in a way which I hope is clear to you. (It would be immediately clear to any experienced Python programmer).

Yes, the whole DSU could be reduced to a single line; I've separated it here for legibility and pedagogical purposes.
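
For comparison, here is a sketch of the single-line form next to the key= argument of sorted(), which produces the same order without building the intermediate tuples. The namedtuple is only a lightweight stand-in for the Jewel class so that the snippet runs on its own.

```python
from collections import namedtuple

# Stand-in for the Jewel class, used only to keep this snippet self-contained
Jewel = namedtuple('Jewel', ['name', 'carat', 'value'])
these_jewels = [Jewel('ruby', 2, 100.0),
                Jewel('diamond', 1, 400.0),
                Jewel('amethyst', 2, 50.0)]

# Decorate, sort, undecorate in one expression
results_dsu = [pair[1] for pair in
               sorted((j.value / j.carat, j) for j in these_jewels)]

# The same ordering with a key function
results_key = sorted(these_jewels, key=lambda j: j.value / j.carat)

print([j.name for j in results_key])
```

A key function also sidesteps the tie problem, since sorted() never compares the objects themselves when key= is given.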

Again, this sample code is purely for your edification. You should use the csv module on any real-world data, and you should introduce exception handling either in the parsing or in Jewel.__init__ (for converting the numeric data into the correct Python types).
(Also note that you should consider using Python's decimal module rather than floats for representing monetary values, or at least storing the values in cents or mils and using your own functions to represent them as dollars and cents.)
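
One way those two suggestions might look together is sketched below, assuming Python 3. The parse_row helper, its error messages, and the sample rows are all invented for illustration; real code would decide for itself whether to skip, log, or abort on bad rows.

```python
from decimal import Decimal, InvalidOperation

def parse_row(fields):
    """Return (name, carat, value) or raise ValueError (hypothetical helper)."""
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got %d: %r" % (len(fields), fields))
    name, carat, value = fields
    try:
        carat = int(carat)
        value = Decimal(value)      # exact decimal arithmetic for money
    except (ValueError, InvalidOperation) as exc:
        raise ValueError("bad numeric field in %r: %s" % (fields, exc))
    if carat <= 0:
        raise ValueError("carat must be positive in %r" % (fields,))
    return name, carat, value

good, bad = [], []
for fields in (['ruby', '2', '100'], ['opal', 'x', '300'],
               ['jade', '0', '10'], ['pearl', '1']):
    try:
        good.append(parse_row(fields))
    except ValueError as err:
        bad.append(str(err))       # collect problems instead of crashing

print(good)
print(bad)
```

Only the first sample row survives; the other three are rejected for a non-numeric carat, a zero carat, and a missing field respectively.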
