How do I bin a series of float values into a histogram in Python?

Posted 2024-08-11 03:37:14

I have a set of float values (never negative; they all fall within [0, 0.150)) that I want to bin into a histogram,
i.e. the bars of the histogram should together cover the range [0, 0.150).

The data I have looks like this:

With my code below, I expect to get a result that looks like this:

[0, 0.005) 5
[0.005, 0.011) 0
...etc..

I tried to do this binning with the code below,
but it doesn't seem to work. What's the right way to do it?

#! /usr/bin/env python


import fileinput, math

log2 = math.log(2)

def getBin(x):
    return int(math.log(x+1)/log2)

diffCounts = [0] * 5

for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;

    diffCounts[ str(getBin(diff)) ] += 1

maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)


for i in range(maxBin+1):
     lo = 2**i - 1
     hi = 2**(i+1) - 1
     binStr = '[' + str(lo) + ',' + str(hi) + ')'
     print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

断桥再见 2024-08-18 03:37:14

When possible, don't reinvent the wheel. NumPy has everything you need:

#!/usr/bin/env python
import numpy as np

a = np.fromfile('file', sep='\n')  # pass the filename so NumPy opens and closes the file itself
# [ 0.     0.005  0.124  0.     0.004  0.     0.111  0.112]

# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0.    0.15]

# Or, if bins is an integer, it sets the number of equal-width bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0.     0.031  0.062  0.093  0.124]
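If NumPy is not available, arbitrary-edge binning can be sketched with the standard library's `bisect` module. This is a minimal sketch, not NumPy's implementation, and `histogram_edges` is a hypothetical name. It assumes the edges are sorted in increasing order and follows the convention `np.histogram` documents: every bin is half-open `[a, b)` except the last, which also includes its right edge; values outside the edges are dropped.

```python
from bisect import bisect_right


def histogram_edges(values, edges):
    """Count values into bins defined by sorted edges (hypothetical helper).

    Bins are half-open [edges[i], edges[i+1]) except the last one,
    which is also closed on the right, like numpy.histogram.
    Values outside [edges[0], edges[-1]] are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # out of range
        if v == edges[-1]:
            counts[-1] += 1  # the right-most edge belongs to the last bin
            continue
        # bisect_right returns the position after any edge equal to v,
        # so subtracting 1 gives the bin whose left edge is <= v
        counts[bisect_right(edges, v) - 1] += 1
    return counts


# Sample values from the comment in the code above
data = [0.0, 0.005, 0.124, 0.0, 0.004, 0.0, 0.111, 0.112]
print(histogram_edges(data, [0, 0.150]))                        # [8]
print(histogram_edges(data, [0, 0.031, 0.062, 0.093, 0.124]))   # [5, 0, 0, 3]
```

For large inputs NumPy will be much faster; the point here is only to show what "arbitrary bin edges" means in terms of plain comparisons.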

溺深海 2024-08-18 03:37:14

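import matplotlib.pyplot as plt

# Read one float per line, skipping blank lines
with open('pulse_data.txt') as inf:
    data = [float(line) for line in inf if line.strip()]

# Binning: B equal-width bins spanning [min(data), max(data)]
B = 50
minv = min(data)
maxv = max(data)
span = maxv - minv
width = span / B if span > 0 else 1.0  # avoid division by zero when all values are equal
bincounts = [0] * B
for d in data:
    b = int((d - minv) / width)
    bincounts[min(b, B - 1)] += 1  # the maximum falls on the top edge; keep it in the last bin

# Plot the histogram (one point per bin)
plt.plot(bincounts, 'o')
plt.show()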
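The same equal-width arithmetic also works with fixed limits instead of the data's min and max. Below is a minimal sketch assuming 0.005-wide bins over [0, 0.150), which is one reading of the question's expected output (its second bin ends at 0.011, so the intended width is not certain). `fixed_width_histogram` and `print_histogram` are hypothetical names, and nothing is plotted.

```python
def fixed_width_histogram(values, lo=0.0, hi=0.150, width=0.005):
    """Count values into equal-width bins covering [lo, hi) (hypothetical helper)."""
    nbins = round((hi - lo) / width)  # round() absorbs small float error in the division
    counts = [0] * nbins
    for v in values:
        if not lo <= v < hi:
            continue  # values outside [lo, hi) are not counted
        # A value sitting exactly on an inner edge can land in the
        # neighbouring bin because of floating-point division;
        # min() keeps the index in range near the top.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return counts


def print_histogram(counts, lo=0.0, width=0.005):
    # One line per bin in the question's "[lo, hi) count" layout
    for i, c in enumerate(counts):
        print(f"[{lo + i * width:.3f}, {lo + (i + 1) * width:.3f}) {c}")


# Sample values from the first answer
data = [0.0, 0.005, 0.124, 0.0, 0.004, 0.0, 0.111, 0.112]
counts = fixed_width_histogram(data)
print_histogram(counts)
```

If values exactly on bin edges matter, passing explicit edges to `np.histogram` (first answer) avoids doing this index arithmetic by hand.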

涫野音 2024-08-18 03:37:14

The first error is:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers

Why are you converting an int to a str when an int is needed? Fix that, and then we get:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range

because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 21, in <module>
    maxBin = max(maxdiff)
TypeError: 'int' object is not iterable
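Hard-coding 5 or 50 buckets means guessing the count in advance. One hedged alternative is to count into a `collections.Counter` keyed by bin index, which grows as needed. This sketch keeps the question's bucketing formula (as a hypothetical `get_bin`, using `math.log2` in place of `log(x + 1) / log(2)`), assumes non-negative inputs (the log fails for x <= -1), and uses the first answer's sample values scaled by 1000 as the question does.

```python
import math
from collections import Counter


def get_bin(x):
    # Same bucketing as the question's getBin: floor(log2(x + 1)),
    # so bin i covers [2**i - 1, 2**(i+1) - 1). math.log2 is usually
    # more accurate than dividing two natural logs.
    return int(math.log2(x + 1))


def count_bins(diffs):
    # Counter adds keys on demand, so no bucket count has to be
    # chosen up front and there is no IndexError to hit.
    return Counter(get_bin(d) for d in diffs)


# Sample values from the first answer, scaled by 1000 as in the question
data = [0.0, 0.005, 0.124, 0.0, 0.004, 0.0, 0.111, 0.112]
counts = count_bins(x * 1000 for x in data)

max_bin = max(counts)  # largest bin index that received a value
for i in range(max_bin + 1):
    lo, hi = 2**i - 1, 2**(i + 1) - 1
    print(f"[{lo}, {hi})\t{counts[i]}")  # absent keys read as 0
```

Here `max()` is applied to the Counter's keys, which is a real collection, unlike the single int it is applied to in the question.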

maxdiff is a single value out of your list of ints, so what is max doing here? Remove it, and now we get:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 28, in <module>
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration

Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines from this:

 binStr = '[' + str(lo) + ',' + str(hi) + ')'
 print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

to this:

 print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])

Now it prints:

6
[0.000000, 1.000000)    3
[1.000000, 3.000000)    0
[3.000000, 7.000000)    2
[7.000000, 15.000000)   0
[15.000000, 31.000000)  0
[31.000000, 63.000000)  0
[63.000000, 127.000000) 3
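The `print` statements in this thread are Python 2 syntax. On Python 3, `print` is a function; a rough equivalent of the simplified line, with illustrative values standing in for `lo`, `hi` and `diffCounts[i]`, might look like this:

```python
# Illustrative values standing in for one row of the loop
lo, hi, count = 0, 1, 3

# Python 3: print is a function, and %-formatting still works
print("[%f, %f)\t%r" % (lo, hi, count))

# Equivalent f-string with the same layout
line = f"[{lo:f}, {hi:f})\t{count!r}"
print(line)
```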

I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve powers of two, but it isn't making sense to me...
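For reference, here is what the question's bucketing works out to. Bucket i holds scaled values in [2**i - 1, 2**(i+1) - 1); dividing by the factor of 1000 that the script multiplies by gives edges in the original units, and each bucket is twice as wide as the one before. That differs from the fixed-width bins in the question's expected output. A small sketch; `log2_bucket_edges` is a hypothetical helper name:

```python
SCALE = 1000  # the question multiplies each value by 1000 before binning


def log2_bucket_edges(n_buckets, scale=SCALE):
    """Edges of the question's buckets, converted back to original units.

    Bucket i holds scaled values in [2**i - 1, 2**(i+1) - 1),
    i.e. original values in [(2**i - 1)/scale, (2**(i+1) - 1)/scale).
    """
    return [((2**i - 1) / scale, (2**(i + 1) - 1) / scale)
            for i in range(n_buckets)]


for lo, hi in log2_bucket_edges(8):
    print(f"[{lo:.3f}, {hi:.3f})  width {hi - lo:.3f}")
```

Eight of these buckets are needed to cover [0, 0.150). If equal-width bins were the goal, explicit edges with `np.histogram`, as in the first answer, are the more direct route.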
