Tuples, lists, and NumPy arrays for plotting a boxplot in Python

Posted 2024-12-02 16:33:47

I am trying to plot a boxplot for a column in several csv files (without the header row, of course), but I am running into some confusion around tuples, lists, and arrays. Here is what I have so far:

    #!/usr/bin/env python

    import csv
    from numpy import *
    import pylab as p
    import matplotlib

    #open one file, until boxplot-ing works
    f = csv.reader (open('2-node.csv'))
    #get all the columns in the file
    timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,bytes,Latency = zip(*f)

    #Make list out of elapsed to pop the 1st element -- the header
    elapsed_list = list(elapsed)
    elapsed_list.pop(0)

    #Turn list back to a tuple
    elapsed = tuple(elapsed_list)

    #Turn list to a numpy array
    elapsed_array = array(elapsed_list)

    #Elapsed Column statically entered into an array
    data_array = ([4631, 3641, 1902, 1937, 1745, 8937] )

    print data_array #prints in this format: ([xx,xx,xx,xx]), .__class__ is list ... ?
    print elapsed    #prints in this format: ('xx','xx','xx','xx'), .__class__ is tuple
    print elapsed_list #prints in this format: ['xx', 'xx', 'xx', 'xx', 'xx'], .__class__ is list
    print elapsed_array #prints in this format: ['xx' 'xx' 'xx' 'xx' 'xx'] -- notice no commas, .__class__ is numpy.ndarray

    p.boxplot (data_array) #works
    p.boxplot (elapsed) # does not work, error below
    p.boxplot (elapsed_list) #does not work
    p.boxplot (elapsed_array) #does not work
    p.show()

For boxplots, the 1st argument is "an array or a sequence of vectors", so I would think elapsed_array would work ... ? Yet data_array, a "list", works ... but elapsed_list, also a "list", does not ... ? Is there a better way to do this ... ?

I am fairly new to Python, and I would like to understand what it is about the differences among a tuple, a list, and a numpy array that prevents this boxplot from working.

Example error message is:

    Traceback (most recent call last):
      File "../pullcol.py", line 32, in <module>
        p.boxplot (elapsed_list)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/pyplot.py", line 1962, in boxplot
        ret = ax.boxplot(x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.py", line 5383, in boxplot
        q1, med, q3 = mlab.prctile(d,[25,50,75])
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 946, in prctile
        return _interpolate(values[ai],values[bi],frac)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 920, in _interpolate
        return a + (b - a)*fraction
    TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Comments (2)

冰之心 2024-12-09 16:33:47

elapsed contains strings. Matplotlib needs integers or floats to plot something. Try converting each value of elapsed to an integer. You can do this like so:

    elapsed = tuple([int(i) for i in elapsed])

or, as FredL commented below:

    elapsed_list = array(elapsed_list, dtype=float)
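As a rough illustration of the point above, the sketch below reads one column with the standard `csv` module, shows that subtracting two strings fails in the same way the percentile code in the traceback does, and converts the values to floats before plotting. It is written for Python 3, unlike the Python 2 code in the question. The sample data, the helper names `read_numeric_column` and `show_boxplot`, and the three-column layout are assumptions made for the example; the real file has more columns.

```python
import csv
import io

# Hypothetical stand-in for '2-node.csv': a header row plus a few data rows.
SAMPLE_CSV = """timeStamp,elapsed,label
1322000000000,4631,home
1322000001000,3641,home
1322000002000,1902,home
1322000003000,1937,home
"""


def read_numeric_column(lines, column):
    """Return one CSV column as a list of floats, skipping the header row."""
    reader = csv.DictReader(lines)  # the first row is consumed as field names
    return [float(row[column]) for row in reader]


# The csv module always yields strings, and strings do not support
# subtraction, which is what the quartile interpolation needs to do.
a, b = "4631", "3641"
try:
    a - b
except TypeError as exc:
    string_error = str(exc)

elapsed = read_numeric_column(io.StringIO(SAMPLE_CSV), "elapsed")


def show_boxplot(values):
    """Draw the boxplot; matplotlib is imported here so the parsing runs without it."""
    import matplotlib.pyplot as plt

    plt.boxplot(values)
    plt.show()
```

Calling `show_boxplot(elapsed)` should then draw a single box. For several files, passing a list of such columns, for example `show_boxplot([elapsed_a, elapsed_b])`, gives one box per file, since `boxplot` accepts a sequence of vectors.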

葬シ愛 2024-12-09 16:33:47

I'm not familiar with numpy or matplotlib, but just from the description and what's working, it appears to be looking for a nested sequence of sequences. That would be why data_array works, as it's a tuple containing a list, whereas all your other input is only one layer deep.

As for the differences, a list is a mutable sequence of objects, a tuple is an immutable sequence of objects and an array is a mutable sequence of bytes, ints, chars (basically 1, 2, 4 or 8 byte values).

Here's a link to the Python docs about 5.6. Sequence Types, from there you can jump to more detailed info about lists, tuples, arrays or any of the other sequence types in Python.
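The distinctions above can be checked directly in an interpreter. One detail worth verifying first: parentheses without a trailing comma do not build a tuple, so `data_array` from the question is a plain list of ints rather than a tuple containing a list. This matches the `.__class__ is list` note in the question and suggests that element type, not nesting depth, is what separates the working input from the failing ones. The sketch below uses only the standard library; the variable names other than `data_array` are made up for illustration, and the `array` shown is the stdlib `array.array`, not a numpy `ndarray`.

```python
from array import array

data_array = ([4631, 3641, 1902])   # parentheses only group: this is a list
one_tuple = ([4631, 3641, 1902],)   # the trailing comma makes a 1-tuple holding a list

numbers = [1, 2, 3]
numbers.append(4)                   # lists are mutable

frozen = (1, 2, 3)
try:
    frozen[0] = 99                  # tuples reject item assignment
except TypeError:
    tuple_is_immutable = True

typed = array("i", [1, 2, 3])       # stdlib array: mutable, but one C type only
typed[0] = 10
try:
    typed.append("4")               # a string is not a valid item for type code 'i'
except TypeError:
    array_is_typed = True
```

A numpy array is also homogeneous, but when it is built from strings it takes a string dtype instead of raising an error, so the values stay strings until they are converted explicitly.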
