How to normalize a histogram in MATLAB?

Posted on 2024-10-22 01:32:56

How to normalize a histogram such that the area under the probability density function is equal to 1?

Comments (7)

七度光 2024-10-29 01:32:56

My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.

[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution

% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off

% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off

You can see for yourself which method agrees with the correct answer (red curve).

Another method (more straightforward than method 2) to normalize the histogram is to divide by sum(f * dx) which expresses the integral of the probability density function, i.e.

% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2)); % bin width (hist returns equally spaced bin centers)
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
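
The difference between the three methods can also be checked numerically instead of by eye. The following is a minimal sketch, not part of the original answer; the variable names area_sum, area_trapz and area_dx are illustrative, and the fixed seed is only there to make the run repeatable.

```matlab
% Sketch: what does each normalization integrate to?
rng(0);                                   % fixed seed (assumption, for repeatability)
[f, x] = hist(randn(10000, 1), 50);       % counts and equally spaced bin centers
dx = x(2) - x(1);                         % bin width

area_sum   = sum(f / sum(f)) * dx;        % Method 1: heights sum to 1, so the bar area is dx
area_trapz = trapz(x, f / trapz(x, f));   % Method 2: trapezoid-rule area is 1
area_dx    = sum(f / sum(f * dx)) * dx;   % Method 3: total bar (rectangle) area is 1

fprintf('%.4f  %.4f  %.4f\n', area_sum, area_trapz, area_dx);
```

Method 1 leaves an area equal to the bin width, which is why its bars sit well below the red curve unless the bins happen to be one unit wide.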

生来就爱笑 2024-10-29 01:32:56

Since R2014b, MATLAB has these normalization routines embedded natively in the histogram function (see the help file for the 6 routines this function offers). Here is an example using the PDF normalization (the total area of the bars is 1; the sum of the bar heights generally is not).

data = 2*randn(5000,1) + 5;             % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf')   % PDF normalization

The corresponding PDF is

Nbins = h.NumBins;
edges = h.BinEdges; 
x = zeros(1,Nbins);
for counter=1:Nbins
    midPointShift = abs(edges(counter)-edges(counter+1))/2;
    x(counter) = edges(counter)+midPointShift;
end

mu = mean(data);
sigma = std(data);

f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));

Plotting the two together gives

hold on;
plot(x,f,'LineWidth',1.5)
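
The bin centers and the unit-area property can also be obtained without a loop and without opening a figure. This is a sketch under the assumption that histcounts (the non-plotting counterpart of histogram) is acceptable here; the names values, widths, centers, total_area and bar_sum are illustrative.

```matlab
% Sketch: pdf normalization checked with histcounts (no figure needed)
rng(0);                                    % fixed seed (assumption, for repeatability)
data = 2*randn(5000,1) + 5;                % normal data, mean 5, std 2

[values, edges] = histcounts(data, 'Normalization', 'pdf');
widths  = diff(edges);                     % width of every bin
centers = edges(1:end-1) + widths/2;       % vectorized bin midpoints

total_area = sum(values .* widths);        % area under the bars
bar_sum    = sum(values);                  % sum of heights, which depends on the bin width

fprintf('area = %.4f, sum of heights = %.4f\n', total_area, bar_sum);
```

With automatic binning every observation falls into some bin, so the area comes out as 1 up to rounding, while the sum of the heights is roughly 1 divided by the bin width.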

An improvement that might very well be due to the success of the actual question and accepted answer!


EDIT - The use of hist and histc is no longer recommended; histogram should be used instead. Beware that none of the 6 ways of creating bins with this new function will produce the bins that hist and histc produce. There is a MATLAB script to update former code to fit the way histogram is called (bin edges instead of bin centers). By doing so, one can compare the pdf normalization methods of @abcd (trapz and sum) and MATLAB (pdf).

The three pdf normalization methods give nearly identical results (the differences are on the order of eps).

TEST:

A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];
edges(2:end) = edges(2:end)+eps(edges(2:end));

figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');

subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');

subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');


subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');

dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);

max(normalization_difference_trapz)
max(normalization_difference_sum)

The maximum difference between the new PDF normalization and the former one is 5.5511e-17.

橘亓 2024-10-29 01:32:56

hist can not only plot a histogram but also return the count of elements in each bin, so you can get that count, normalize it by dividing each bin by the total, and plot the result using bar. Example:

Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)

or if you want a one-liner:

bar(hist(Y) ./ sum(hist(Y)))

Documentation:

Edit: This solution answers the question of how to make the sum of all bins equal to 1. This approximation is valid only if your bin size is small relative to the variance of your data. The sum used here corresponds to a simple quadrature formula; more elaborate ones can be used, such as trapz as proposed by R. M.
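
To make the distinction concrete: dividing by the total gives a probability per bin, and dividing that by the bin width gives a density. The sketch below is an illustration rather than part of the original answer; p, dens and dx are illustrative names, and the sample size is larger than in the example above only to make the estimate less noisy.

```matlab
% Sketch: probability per bin versus density per bin
rng(0);                          % fixed seed (assumption, for repeatability)
Y = rand(1000, 1);               % uniform data on [0, 1]
[C, centers] = hist(Y);          % 10 bins by default
dx = centers(2) - centers(1);    % bin width

p    = C ./ sum(C);              % probability per bin: the heights sum to 1
dens = p ./ dx;                  % density estimate: the bar areas sum to 1

fprintf('sum(p) = %.4f, area = %.4f\n', sum(p), sum(dens) * dx);
bar(centers, dens)               % heights scatter around 1, the uniform density
```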

妳是的陽光 2024-10-29 01:32:56
[f,x]=hist(data)

The area of each individual bar is height*width. Since MATLAB chooses equidistant points for the bars, the width is:

delta_x = x(2) - x(1)

Now if we sum up all the individual bars the total area will come out as

A=sum(f)*delta_x

So the correctly scaled plot is obtained by

bar(x, f/sum(f)/(x(2)-x(1)))
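
The same height*width argument carries over to bins of unequal width if each count is divided by its own bin width. The following is a sketch of that generalization, not something the answer itself covers; it assumes histcounts with user-chosen edges, and the edge values are arbitrary.

```matlab
% Sketch: area normalization with unequal bin widths
rng(0);                                       % fixed seed (assumption, for repeatability)
data   = randn(10000, 1);
edges  = [-5 -2 -1 -0.5 0 0.5 1 2 5];         % arbitrary, non-uniform bin edges
counts = histcounts(data, edges);             % observations per bin
widths = diff(edges);                         % one width per bin

dens = counts ./ (sum(counts) .* widths);     % bar height = count / (total * own width)
area = sum(dens .* widths);                   % 1 by construction

fprintf('area = %.4f\n', area);
```

Note that sum(counts) only includes observations that fall inside the edges, so anything outside the outermost edges is ignored by this normalization.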

梦幻之岛 2024-10-29 01:32:56

The area under abcd's PDF is not one, which is impossible, as pointed out in many comments.
Assumptions made in many answers here:

  1. The distance between consecutive edges is assumed to be constant.
  2. The probability under the pdf should be 1. In histogram() and hist(), the normalization should be done as Normalization with probability, not as Normalization with pdf.

Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach

The maximum amplitude differs between the two approaches, which suggests that there is a mistake in the hist() approach, because the histogram() approach uses the standard normalization.
I assume the mistake in the hist() approach here is that the normalization is done only partially as pdf, not completely as probability.

Code with hist() [deprecated]

Some remarks

  1. First check: sum(f)/N gives 1 if Nbins is set manually.
  2. The pdf curve g in the graph must be scaled by the width of the bin (dx).

Code

%http://stackoverflow.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from a normal distribution (ND)

%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND scaled by the bin width dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off

Output is in Fig. 1.

Code with histogram()

Some remarks

  1. First check: a) sum(f) is 1 if histogram() is called with Normalization set to probability, b) sum(f)/N is 1 if Nbins is set manually without normalization.
  2. The pdf curve g in the graph must be scaled by the width of the bin (dx).

Code

%%METHOD 5: with histogram()
% http://stackoverflow.com/a/38809232/54964
N=10000;

figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges; 
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
    midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
    x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constant for all bins
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND scaled by the bin width dx
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogram() Normalization is probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off

Output is in Fig. 2, and the expected output is met: area 1.0000.

Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
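
For readers comparing this with the divide-by-area answers: plotting probabilities against g*dx and plotting densities against g differ only by the constant factor dx. The sketch below illustrates that relationship under the same equal-width assumption; it is not the author's code, and prob, dens, err_prob and err_dens are illustrative names.

```matlab
% Sketch: probability-vs-g*dx and density-vs-g are the same comparison, scaled by dx
rng(0);                                  % fixed seed (assumption, for repeatability)
N = 10000;
[f, x] = hist(randn(N, 1), 50);
dx = x(2) - x(1);                        % bin width
g  = 1/sqrt(2*pi) * exp(-0.5 * x.^2);    % standard normal pdf at the bin centers

prob = f / sum(f);                       % probability per bin, to be compared with g*dx
dens = f / (sum(f) * dx);                % density per bin, to be compared with g

err_prob = max(abs(prob - g*dx));        % worst mismatch on the probability scale
err_dens = max(abs(dens - g));           % worst mismatch on the density scale

fprintf('err_prob = %.3g, err_dens*dx = %.3g\n', err_prob, err_dens * dx);
```

The two printed numbers agree, so which scale to use is a matter of what the vertical axis should mean: sum(prob) is 1, whereas sum(dens)*dx is 1.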

梦一生花开无言 2024-10-29 01:32:56

For some distributions (Cauchy, I think), I have found that trapz will overestimate the area, so the pdf will change depending on the number of bins you select. In that case I do

[N,h]=hist(q_f./theta,30000); % there is a large range, but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
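
How far the two area estimates can drift apart is easy to quantify when the bin centers are equally spaced: the rectangle sum and the trapezoid rule then differ only by half the two end bins times the bin width. The sketch below checks that identity on normal data; q_f and theta from the snippet above are not available here, so randn is used as a stand-in, and the variable names are illustrative.

```matlab
% Sketch: rectangle sum versus trapz for equally spaced bin centers
rng(0);                                 % fixed seed (assumption, for repeatability)
[N, h] = hist(randn(5000, 1), 40);      % stand-in data; h holds the bin centers
dx = mean(diff(h));                     % bin width

area_rect = sum(N) * dx;                % total area of the bars
area_trap = trapz(h, N);                % trapezoid rule through the bar tops
end_term  = dx * (N(1) + N(end)) / 2;   % half of the first and last bars

gap = area_rect - area_trap;            % equals end_term up to rounding
fprintf('gap = %.6f, end term = %.6f\n', gap, end_term);
```

Because hist places the outermost bins at the minimum and maximum of the data, both end bins are non-empty and the two estimates never coincide exactly in that setting; dividing by sum(N)*mean(diff(h)) keeps the total bar area at 1 regardless of the bin count.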

清音悠歌 2024-10-29 01:32:56

There is an excellent three-part guide to Histogram Adjustments in MATLAB (broken original link, archive.org link);
the first part is on Histogram Stretching.
