Numpy csv script gives "ValueError: setting an array element with a sequence"
I have a Python script that successfully loads a CSV file into a 2D numpy array and then extracts the value of a desired cell based on its column and row header values. For diagnostic purposes, the script prints the contents of the data matrix before it is put into the numpy array.

The script works when the underlying CSV file contains values for all rows/columns. The problem is that it throws an error when I run it on a CSV file that apparently has a couple of empty rows/columns at the end. I tried to fix this by opening the CSV file in Notepad++ and deleting as much as it would let me from the end of the file. Notepad++ let me delete one row at the end, but did not indicate that there were any empty columns. On closer examination of how the Python printout relates to the structure of my data, I see that the print output shows two empty columns at the end of the array.

In any event, after editing the CSV file, the script still printed the same data and still threw the same error, as if I had not deleted the empty line from the end of the file. I checked that I had saved the CSV file, opened and closed it a couple of times, and closed and re-opened Python a couple of times, but the error (ValueError: setting an array element with a sequence) still persists.
Here is my question:
How do I modify the script below to avoid this error?
Here is the function I was referring to above:
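```python
import os
import time

import numpy as p  # assumption: `p` in the original script is the numpy module

def GetHSD_alpha(NumberOfColumnMeans, dfResid):
    # Path to the critical-values table in ../resources
    dirname = os.path.dirname(os.getcwd())
    resources = os.path.join(dirname, 'resources')
    inputfile = os.path.join(resources, 'CriticalValuesOfTukeysHSD_a_0_01.csv')
    separator = ','
    ColumnIndex = NumberOfColumnMeans
    RowIndex = dfResid
    cast = p.cast
    # One pre-allocated list per expected CSV column:
    # data[i] collects the i-th field of every line
    data = [[] for dummy in xrange(13)]
    for line in open(inputfile, 'r'):
        fields = line.strip().split(separator)
        for i, number in enumerate(fields):
            data[i].append(number)
    print 'data HSD alpha is: ', data
    time.sleep(2)
    # The ValueError is raised here when the lists in `data` differ in length
    CriticalValuesArray = p.array(data)
    HSD_alpha_0_01 = CriticalValuesArray[ColumnIndex, RowIndex]
    return HSD_alpha_0_01
```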
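The empty lists can be reproduced without the file by running the same column-building loop on a few in-memory lines. This is a Python 3 sketch: the sample lines are a hypothetical 4-column excerpt reconstructed from the abbreviated printout, and `build_columns`, `n_lists` and `is_rectangular` are illustrative names, not part of the original script. Pre-allocating 6 lists for 4 columns plays the role of 13 lists for an 11-column file: the extra lists stay empty, so the nested list is not rectangular, which is what a 2-D numpy array needs.

```python
def build_columns(lines, n_lists, separator=","):
    """Mimic the question's loop: pre-allocate n_lists column lists, then fill them."""
    data = [[] for _ in range(n_lists)]
    for line in lines:
        fields = line.strip().split(separator)
        for i, value in enumerate(fields):
            data[i].append(value)  # column i collects the i-th field of every line
    return data


def is_rectangular(columns):
    """True if all column lists have the same length, as a 2-D numpy array requires."""
    return len({len(col) for col in columns}) <= 1


# Hypothetical 4-column excerpt, reconstructed from the abbreviated printout.
sample = [",2,3,11\n", "5,5.7,6.98,10.48\n", "6,5.24,6.33,9.3\n", "7,4.95,5.92,8.55\n"]

too_many = build_columns(sample, n_lists=6)  # 2 lists more than there are columns
print(too_many[-2:])             # [[], []]
print(is_rectangular(too_many))  # False

just_right = build_columns(sample, n_lists=4)
print(is_rectangular(just_right))  # True
```

The first four lists of `too_many` match the printout in the question exactly, followed by the two trailing `[]`.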
Also, for reference, here is an ABBREVIATED version of the result of printing the data that throws the error. Notice the empty elements at the end, which I cannot seem to manually eliminate from my csv file before running the script:
data HSD alpha is: [['', '5', '6', '7'], ['2', '5.7', '5.24', '4.95'], ['3', '6.98', '6.33', '5.92'], ['11', '10.48', '9.3', '8.55'], [], []]
Also for reference, here is the ABBREVIATED version of the result of printing data from another CSV file that I imported into the script for diagnostic purposes. The data corresponding to the printout below did NOT cause the script to throw an error:
data HSD alpha is: [['', '1', '2', '3'], ['1', '4052', '98.49', '34.12'], ['2', '4999', '99.01', '30.81'], ['3', '5403', '99.17', '29.46']]
Again, when I open the underlying csv files in Notepad++, there do not seem to be any empty columns or rows, and I have checked those data files carefully.
Finally, I imagine that the number of empty rows/columns may vary, so any solution would need to be able to handle variable numbers of empty rows/columns.
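One way to cope with a varying number of empty rows and columns is to parse with the standard `csv` module, drop trailing empty cells and empty rows, and transpose with `zip` instead of pre-allocating a fixed number of lists. A minimal Python 3 sketch; `load_columns` is a hypothetical helper and the sample text is an invented excerpt with stray trailing commas and junk lines.

```python
import csv
import io


def load_columns(lines):
    """Hypothetical helper: parse CSV lines into column lists, ignoring any number
    of blank rows and of trailing empty columns."""
    rows = []
    for row in csv.reader(lines):
        row = [cell.strip() for cell in row]
        while row and row[-1] == "":  # drop trailing empty cells such as ",,"
            row.pop()
        if row:  # skip rows that are empty (or contained only commas)
            rows.append(row)
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad any short row so the table is rectangular, then transpose rows into columns.
    padded = [row + [""] * (width - len(row)) for row in rows]
    return [list(col) for col in zip(*padded)]


# Hypothetical excerpt with two stray empty columns and two junk lines at the end.
text = ",2,3,11,,\n5,5.7,6.98,10.48,,\n6,5.24,6.33,9.3,,\n7,4.95,5.92,8.55,,\n,,,,,\n\n"
columns = load_columns(io.StringIO(text))
print(len(columns), [len(col) for col in columns])  # 4 [4, 4, 4, 4]
```

With a real file, `with open(inputfile, newline="") as f: columns = load_columns(f)` would give equal-length lists that `numpy.array` can turn into a 2-D array.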
Thank you in advance.
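Found the answer.
I needed to change the following line of code: data = [[] for dummy in xrange(13)]
xrange needed to be set to 11, not 13. The file has 11 columns, so the two extra pre-allocated lists stayed empty; those are the two [] at the end of the printout, and they make the nested list ragged.
Simple answer, but it took a lot of digging.
This thread is answered/finished now.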
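Instead of swapping one hard-coded count for another, the number of lists can be taken from the header line, so the loop adapts if the table gains or loses columns. A minimal Python 3 sketch (`range`/`print()` instead of the script's Python 2 `xrange`/`print`); `columns_from_lines` is an illustrative name. It still assumes every data row has the same number of fields as the header: a longer row would raise `IndexError`, a shorter one would make the columns ragged again.

```python
def columns_from_lines(lines, separator=","):
    """Like the question's loop, but sized from the header line instead of xrange(13)."""
    data = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines, e.g. an empty last line
        fields = line.strip().split(separator)
        if data is None:
            # First non-blank line: one list per header field (11 for the real file).
            data = [[] for _ in range(len(fields))]
        for i, number in enumerate(fields):
            data[i].append(number)
    return data if data is not None else []


# Hypothetical excerpt reconstructed from the abbreviated printout, plus a blank last line.
sample = [",2,3,11\n", "5,5.7,6.98,10.48\n", "6,5.24,6.33,9.3\n", "7,4.95,5.92,8.55\n", "\n"]
data = columns_from_lines(sample)
print(len(data))  # 4
```

In the script itself this would be used as `with open(inputfile) as f: data = columns_from_lines(f)`.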
Why do you write your own CSV loader?
Have you tried numpy.loadtxt? Or, in your case with missing values:
numpy.genfromtxt
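A sketch of the `numpy.genfromtxt` route, assuming Python 3 and numpy. `genfromtxt` skips blank lines and, for a float array, turns empty fields into `nan`, so columns that are `nan` all the way down can then be dropped. The sample text is a hypothetical excerpt, and the layout (row headers in column 0, column headers in row 0) is inferred from the printout in the question. Note the result is indexed `table[row, col]`, the transpose of the question's `data[col][row]`; `lookup` is a hypothetical helper that finds a cell by header value rather than by position.

```python
import io

import numpy as np


def lookup(table, row_header, col_header):
    """Hypothetical helper: find a cell by its row and column header values.
    Row headers are in column 0, column headers in row 0."""
    r = np.flatnonzero(table[1:, 0] == row_header)[0] + 1
    c = np.flatnonzero(table[0, 1:] == col_header)[0] + 1
    return table[r, c]


# Hypothetical excerpt with two stray trailing columns and a blank last line.
text = ",2,3,11,,\n5,5.7,6.98,10.48,,\n6,5.24,6.33,9.3,,\n7,4.95,5.92,8.55,,\n\n"

# genfromtxt skips the blank line; empty fields become nan in the float array.
table = np.genfromtxt(io.StringIO(text), delimiter=",")

# Drop columns that are nan all the way down (the stray trailing columns).
table = table[:, ~np.all(np.isnan(table), axis=0)]

print(table.shape)          # (4, 4)
print(lookup(table, 6, 3))  # 6.33
```

The top-left cell is also `nan` (it was empty in the file), but that column survives because it holds the row headers.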