Python newbie question: converting code into classes

Posted 2024-09-11 14:57:29

I have this code:

import csv
import collections

def do_work():
      (data,counter)=get_file('thefile.csv')
      b=samples_subset1(data, counter,'/pythonwork/samples_subset3.csv',500)
      return

def get_file(start_file):

        with open(start_file, 'rb') as f:
            data = list(csv.reader(f))
            counter = collections.defaultdict(int)

            for row in data:
              counter[row[10]] += 1
            return (data,counter)

def samples_subset1(data,counter,output_file,sample_cutoff):

      with open(output_file, 'wb') as outfile:
          writer = csv.writer(outfile)
          b_counter=0
          b=[]
          for row in data:
              if counter[row[10]] >= sample_cutoff:
                 b.append(row) 
                 writer.writerow(row)
                 b_counter+=1
      return (b)

I recently started learning Python and would like to start off with good habits. Therefore, I was wondering if you could help me get started turning this code into classes. I don't know where to start.

橘香 2024-09-18 14:57:29

Per my comment on the original post, I don't think a class is necessary here. Still, if other Python programmers will ever read this code, I'd suggest bringing it in line with PEP 8, the Python style guide. Here's a quick rewrite:

import csv
import collections

def do_work():
    data, counter = get_file('thefile.csv')
    b = samples_subset1(data, counter, '/pythonwork/samples_subset3.csv', 500)

def get_file(start_file):
    with open(start_file, 'rb') as f:
        counter = collections.defaultdict(int)
        data = list(csv.reader(f))

        for row in data:
            counter[row[10]] += 1

    return (data, counter)

def samples_subset1(data, counter, output_file, sample_cutoff):
    with open(output_file, 'wb') as outfile:
        writer = csv.writer(outfile)
        b = []
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                b.append(row) 
                writer.writerow(row)

    return b

Notes:

No one ever uses more than 4 spaces to indent. Use 2 to 4 (PEP 8 recommends 4), and make all your levels of indentation match.
Use a single space after the commas between function arguments ("F(a, b, c)", not "F(a,b,c)").
Naked return statements at the end of a function are meaningless. Functions without return statements implicitly return None.
Put a single space around all operators (a = 1, not a=1).
Do not wrap single values in parentheses. It looks like a tuple, but it isn't.
b_counter wasn't used at all, so I removed it.
csv.reader returns an iterator, which you are casting to a list. That's usually a bad idea because it forces Python to load the entire file into memory at once, whereas the iterator will just return each line as needed. Understanding iterators is absolutely essential to writing efficient Python code. I've left data in for now, but you could rewrite to use an iterator everywhere you're using data, which is a list.
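
To illustrate the last note, here is a minimal sketch, assuming Python 3 and using an in-memory io.StringIO in place of a real file. The sample rows and the count_column helper are made up for illustration. It tallies one column by consuming the csv.reader iterator directly, without building a list first.

```python
import collections
import csv
import io


def count_column(lines, column):
    """Count how often each value appears in `column`, one row at a time."""
    counter = collections.defaultdict(int)
    for row in csv.reader(lines):  # rows are produced lazily, never stored
        counter[row[column]] += 1
    return counter


# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO("a,x\nb,y\nc,x\n")
counts = count_column(sample, 1)
print(counts["x"], counts["y"])  # 2 1
```

Only the counter stays in memory here; each row is discarded as soon as it has been counted.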

枫以 2024-09-18 14:57:29

Well, I'm not sure what you want to turn into a class. Do you know what a class is? You make a class to represent some type of thing. If I understand your code correctly, you want to filter a CSV to show only those rows whose row[10] value appears in at least sample_cutoff rows. Surely you could do that with an Excel filter much more easily than by reading through the file in Python?

What the guy in the other thread suggested is true, but not really applicable to your situation. You used a lot of global variables unnecessarily: if they'd been necessary to the code you should have put everything into a class and made them attributes, but as you didn't need them in the first place, there's no point in making a class.
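
For comparison, here is a rough sketch of the point at which a class would start to earn its place: the column index, the cutoff and the counts become attributes shared by several methods instead of globals. It assumes Python 3. The class name FrequencyFilter and its methods are hypothetical, not something from the original post.

```python
import collections


class FrequencyFilter:
    """Keep rows whose value in one column occurs at least `cutoff` times."""

    def __init__(self, column, cutoff):
        # State that would otherwise live in global variables.
        self.column = column
        self.cutoff = cutoff
        self.counter = collections.defaultdict(int)

    def count(self, rows):
        """First pass: tally the values seen in the chosen column."""
        for row in rows:
            self.counter[row[self.column]] += 1

    def select(self, rows):
        """Second pass: return only the rows that meet the cutoff."""
        return [row for row in rows
                if self.counter[row[self.column]] >= self.cutoff]


# Hypothetical rows; a real script would read them with csv.reader.
rows = [["a", "x"], ["b", "y"], ["c", "x"]]
frequency_filter = FrequencyFilter(column=1, cutoff=2)
frequency_filter.count(rows)
common = frequency_filter.select(rows)
```

For a one-off script like the one in the question, plain functions that pass data and counter around do the same job with less ceremony.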

Some tips on your code:

Don't convert the file to a list. That makes Python read the whole thing into memory at once, which is bad if you have a big file. Instead, simply iterate through the file itself with for row in csv.reader(f): and, when you want to go through the file a second time, call f.seek(0) to return to the top and start again.
Don't put return at the end of every function; that's just unnecessary. You don't need parentheses, either: return spam is fine.
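
Both points about return are easy to check in an interpreter. A small sketch (the function names are made up for illustration):

```python
def no_return():
    """A function with no return statement at all."""
    pass


def parenthesized():
    spam = [1, 2]
    return (spam)  # the parentheses do nothing; this is still a list


def real_tuple():
    return 1, 2  # the comma, not the parentheses, makes a tuple


print(no_return() is None)    # True
print(type(parenthesized()))  # <class 'list'>
print(type(real_tuple()))     # <class 'tuple'>
```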

Rewrite

import csv
import collections

def do_work():
    with open('thefile.csv') as f:
        # Open the file and count the rows.
        data, counter = get_file(f)

        # Go back to the start of the file.
        f.seek(0)

        # Filter to only common rows.
        b = samples_subset1(data, counter,
                            '/pythonwork/samples_subset3.csv', 500)

    return b

def get_file(f):
    counter = collections.defaultdict(int)
    data = csv.reader(f)
    
    for row in data:
        counter[row[10]] += 1

    return data, counter

def samples_subset1(data, counter, output_file, sample_cutoff):
    with open(output_file, 'wb') as outfile:
        writer = csv.writer(outfile)
        b = []
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                b.append(row) 
                writer.writerow(row)

    return b
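
The 'wb' mode above follows the Python 2 csv module. On Python 3, the csv documentation asks for files opened in text mode with newline='' instead. Here is a sketch of the same two-pass idea under that assumption. The function name filter_common_rows, the column parameter and the temporary file names are illustrative only.

```python
import collections
import csv
import os
import tempfile


def filter_common_rows(input_path, output_path, column, sample_cutoff):
    """Write rows whose `column` value occurs at least `sample_cutoff` times."""
    counter = collections.defaultdict(int)
    kept = []
    with open(input_path, newline='') as infile, \
            open(output_path, 'w', newline='') as outfile:
        # First pass: count values without storing the rows.
        for row in csv.reader(infile):
            counter[row[column]] += 1

        # Rewind and stream through the file a second time.
        infile.seek(0)
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if counter[row[column]] >= sample_cutoff:
                kept.append(row)
                writer.writerow(row)
    return kept


# Hypothetical demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'in.csv')
    target = os.path.join(tmp, 'out.csv')
    with open(source, 'w', newline='') as f:
        csv.writer(f).writerows([['a', 'x'], ['b', 'y'], ['c', 'x']])
    kept = filter_common_rows(source, target, column=1, sample_cutoff=2)
    with open(target, newline='') as f:
        written = list(csv.reader(f))
```

A fresh csv.reader is created after the seek so that each pass starts from a clean reader rather than reusing one that has already been run to the end.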
