Infinite loop when appending rows to a list inside a class in Python 3
I have a script which contains two classes. (I'm obviously deleting a lot of stuff that I don't believe is relevant to the error I'm dealing with.) The eventual task is to create a decision tree, as I mentioned in this question.
Unfortunately, I'm getting an infinite loop, and I'm having difficulty identifying why. I've identified the line of code that's going haywire, but I would have thought the iterator and the list I'm adding to would be different objects. Is there some side effect of list's .append functionality that I'm not aware of? Or am I making some other blindingly obvious mistake?
class Dataset:
    individuals = [] #Becomes a list of dictionaries, in which each dictionary is a row from the CSV with the headers as keys
    def field_set(self): #Returns a list of the fields in individuals[] that can be used to split the data (i.e. have more than one value amongst the individuals)
    def classified(self, predicted_value): #Returns True if all the individuals have the same value for predicted_value
    def fields_exhausted(self, predicted_value): #Returns True if all the individuals are identical except for predicted_value
    def lowest_entropy_value(self, predicted_value): #Returns the field that will reduce entropy the most (http://en.wikipedia.org/wiki/Entropy_%28information_theory%29)
    def __init__(self, individuals=[]):
and
class Node:
    ds = Dataset() #The data that is associated with this Node
    links = [] #List of Nodes, the offspring Nodes of this node
    level = 0 #Tree depth of this Node
    split_value = '' #Field used to split out this Node from the parent node
    node_value = '' #Value used to split out this Node from the parent Node
    def split_dataset(self, split_value): #Splits the dataset into a series of smaller datasets, each of which has a unique value for split_value. Then creates subnodes to store these datasets.
        fields = [] #List of options for split_value amongst the individuals
        datasets = {} #Dictionary of Datasets, each one with a value from fields[] as its key
        for field in self.ds.field_set()[split_value]: #Populates the keys of fields[]
            fields.append(field)
            datasets[field] = Dataset()
        for i in self.ds.individuals: #Adds individuals to the datasets.dataset that matches their result for split_value
            datasets[i[split_value]].individuals.append(i) #<---Causes an infinite loop on the second hit
        for field in fields: #Creates subnodes from each of the datasets.Dataset options
            self.add_subnode(datasets[field], split_value, field)
    def add_subnode(self, dataset, split_value='', node_value=''):
    def __init__(self, level, dataset=Dataset()):
My initialisation code is currently:
if __name__ == '__main__':
    filename = (sys.argv[1]) #Takes in a CSV file
    predicted_value = "# class" #Identifies the field from the CSV file that should be predicted
    base_dataset = parse_csv(filename) #Turns the CSV file into a list of lists
    parsed_dataset = individual_list(base_dataset) #Turns the list of lists into a list of dictionaries
    root = Node(0, Dataset(parsed_dataset)) #Creates a root node, passing it the full dataset
    root.split_dataset(root.ds.lowest_entropy_value(predicted_value)) #Performs the first split, creating multiple subnodes
    n = root.links[0]
    n.split_dataset(n.ds.lowest_entropy_value(predicted_value)) #Attempts to split the first subnode.
Comments (2)
individuals = []

Suspicious. Unless you want a static member list shared by all instances of Dataset, you shouldn't do that. If you are setting self.individuals = something in __init__, then you don't need to set individuals here too.

def __init__(self, individuals=[]):

Still suspicious. Are you assigning the individuals argument to self.individuals? If so, you are assigning the same individuals list, created once at function definition time, to every Dataset that is created with the default argument. Append an item to one Dataset's list, and all the other Datasets created without an explicit individuals argument will get that item too. Similarly:

def __init__(self, level, dataset=Dataset()):

All Nodes created without an explicit dataset argument will receive the exact same default Dataset instance.

This is the mutable default argument problem, and the kind of destructive iteration it produces seems very likely to be the cause of your infinite loop.
I suspect that you are appending to the same list that you are iterating over, causing it to grow before the iterator can reach the end of it. Try iterating over a copy of the list instead:
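The code sample that originally followed this answer did not survive extraction; applied to split_dataset, iterating over a copy would look roughly like this (a sketch of the suggestion, not necessarily the answerer's exact snippet):

for i in self.ds.individuals[:]:  # [:] takes a shallow copy, so appends to the
                                  # (shared) underlying list cannot extend this loop
    datasets[i[split_value]].individuals.append(i)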