估计幂律分布中的指数截止

发布于 2024-12-29 22:18:15 字数 519 浏览 2 评论 0原文

当我进行一些社交网络分析时,我偶然发现了在网络度上拟合概率分布的问题。

因此,我有一个概率分布 P(X >= x),从目视检查来看,它遵循具有指数截止的幂律,而不是纯幂律(直线)。

因此,假设具有指数截止的幂律分布方程为:

f(x) = x**alpha * exp(beta*x)

如何使用 Python 估计参数 alphabeta

我知道 scipy.stats.powerlaw 包存在,并且它们有一个 .fit() 函数,但这似乎不起作用,因为它只返回绘图的位置和比例,这似乎仅对正态分布有用吗?这个包上也没有足够的教程。

PS 我很清楚 CLauset 等人 的实现,但他们不知道似乎提供了估计替代分布参数的方法。

As I have been doing some social network analysis, I have stumbled upon the problem of fitting a probability distribution on network degree.

So, I have a probability distribution P(X >= x) which, from visual inspection, follows a power law with an exponential cutoff rather than a pure power law (a straight line).

So, given that the equation for power law distribution with exponential cutoff is:

f(x) = x**alpha * exp(beta*x)

How might I estimate the parameters alpha and beta using Python?

I know scipy.stats.powerlaw package exists and they have a .fit() function but that doesn't seem to do the job as it only returns the location and scale of the plot, which seems to be useful only for normal distribution? There are also not enough tutorials on this package.

P.S. I'm well aware of the implementation of CLauset et al but they don't seem to provide ways to estimate the parameters of alternate distributions.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

烟凡古楼 2025-01-05 22:18:15

Powerlaw库可以直接用于估计参数,如下:

  1. 安装所有python依赖项:

    pip install powerlaw mpmath scipy
    
  2. 在 python 环境中运行 powerlaw 包:

    导入幂律
    数据 = [5, 4, ... ]
    结果 = powerlaw.Fit(数据)
    
  3. 从结果中获取参数

    results.truncated_power_law.parameter1 # 幂律参数(alpha)
    results.truncated_power_law.parameter2 # 指数截止参数(beta)
    

Powerlaw library can directly be used to estimate the parameters as follows:

  1. Install all the pythons dependencies:

    pip install powerlaw mpmath scipy
    
  2. Run the powerlaw package fit in a python environment:

    import powerlaw
    data = [5, 4, ... ]
    results = powerlaw.Fit(data)
    
  3. get the parameters from the results

    results.truncated_power_law.parameter1 # power law  parameter (alpha)
    results.truncated_power_law.parameter2 # exponential cut-off parameter (beta)
    
放飞的风筝 2025-01-05 22:18:15

函数 scipy.stats.powerlaw.fit 可能仍然适用于您的目的。 scipy.stats 中的分布如何工作有点令人困惑(每个分布的文档都引用了可选参数 loc 和 scale,尽管并非所有分布都使用这些参数,并且每个分布的使用方式都不同)。如果您查看文档:

http://docs。 scipy.org/doc/scipy/reference/ generated/scipy.stats.powerlaw.html

还有第二个非可选参数“a”,即“形状参数”。在幂律的情况下,它包含单个参数。不用担心“loc”和“scale”。

编辑:抱歉,忘记了您也想要 beta 参数。最好的方法可能是自己定义所需的幂律函数,然后使用 scipy 的通用拟合算法来学习参数。例如:
http://www.scipy.org/Cookbook/FittingData#head-5eba0779a34c07f5a596bbcf99dbc7886eac18e5

The function scipy.stats.powerlaw.fit may still work for your purposes. It's a bit confusing how the distributions in scipy.stats work (the documentation for each one refers to the optional parameters loc and scale, even though not all of them use these parameters, and each uses them differently). If you look at the docs:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.powerlaw.html

there's also a second non-optional parameter "a", which is the "shape parameters". In the case of powerlaw, this contains a single parameter. Don't worry about "loc" and "scale".

Edit: Sorry, forgot that you wanted the beta parameter too. Your best best may be to define the powerlaw function you want yourself, and then use scipy's generic fitting algorithms to learn the parameters. For example:
http://www.scipy.org/Cookbook/FittingData#head-5eba0779a34c07f5a596bbcf99dbc7886eac18e5

|煩躁 2025-01-05 22:18:15

以下是通过最大化 R 中的可能性来估计具有指数截止的幂律的标度指数和指数率的方法:

# Input: Data vector, lower threshold
# Output: List, giving type ("powerexp"), scaling exponent, exponential rate, lower threshold, log-likelihood


powerexp.fit <- function(data,threshold=1,method="constrOptim",initial_rate=-1) {
  x <- data[data>=threshold]
  negloglike <- function(theta) {
    -powerexp.loglike(x,threshold,exponent=theta[1],rate=theta[2])
  }
  # Fit a pure power-law distribution
  pure_powerlaw <- pareto.fit(data,threshold)
  # Use this as a first guess at the exponent
  initial_exponent <- pure_powerlaw$exponent
  if (initial_rate < 0) { initial_rate <- exp.fit(data,threshold)$rate }
  minute_rate <- 1e-6
  theta_0 <- as.vector(c(initial_exponent,initial_rate))
  theta_1 <- as.vector(c(initial_exponent,minute_rate))
  switch(method,
    constrOptim = {
      # Impose the constraint that rate >= 0
      # and that exponent >= -1
      ui <- rbind(c(1,0),c(0,1))
      ci <- c(-1,0)
      # Can't start with values on the boundary of the feasible set so add
      # tiny amounts just in case
      if (theta_0[1] == -1) {theta_0[1] <- theta_0[1] + minute_rate}
      if (theta_0[2] == 0) {theta_0[2] <- theta_0[2] + minute_rate}
      est <- constrOptim(theta=theta_0,f=negloglike,grad=NULL,ui=ui,ci=ci)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    optim = {
      est <- optim(par=theta_0,fn=negloglike)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    nlm = {
      est.0 <- nlm(f=negloglike,p=theta_0)
      est.1 <- nlm(f=negloglike,p=theta_1)
      est <- est.0
      if (-est.1$minimum > -est.0$minimum) { est <- est.1;cat("NLM had to switch\n") }
      alpha <- est$estimate[1]
      lambda <- est$estimate[2]
      loglike <- -est$minimum},
    {cat("Unknown method",method,"\n"); alpha<-NA; lambda<-NA; loglike<-NA}
  )
  fit <- list(type="powerexp", exponent=alpha, rate=lambda, xmin=threshold,
              loglike=loglike, samples.over.threshold=length(x))
  return(fit)
}

查看 https://github.com/jeffalstott/powerlaw/ 了解更多信息

Here is a means of estimating the scaling exponent and exponential rate of power law with exponential cut-off by maximizing likelihood in R:

# Input: Data vector, lower threshold
# Output: List, giving type ("powerexp"), scaling exponent, exponential rate, lower threshold, log-likelihood


powerexp.fit <- function(data,threshold=1,method="constrOptim",initial_rate=-1) {
  x <- data[data>=threshold]
  negloglike <- function(theta) {
    -powerexp.loglike(x,threshold,exponent=theta[1],rate=theta[2])
  }
  # Fit a pure power-law distribution
  pure_powerlaw <- pareto.fit(data,threshold)
  # Use this as a first guess at the exponent
  initial_exponent <- pure_powerlaw$exponent
  if (initial_rate < 0) { initial_rate <- exp.fit(data,threshold)$rate }
  minute_rate <- 1e-6
  theta_0 <- as.vector(c(initial_exponent,initial_rate))
  theta_1 <- as.vector(c(initial_exponent,minute_rate))
  switch(method,
    constrOptim = {
      # Impose the constraint that rate >= 0
      # and that exponent >= -1
      ui <- rbind(c(1,0),c(0,1))
      ci <- c(-1,0)
      # Can't start with values on the boundary of the feasible set so add
      # tiny amounts just in case
      if (theta_0[1] == -1) {theta_0[1] <- theta_0[1] + minute_rate}
      if (theta_0[2] == 0) {theta_0[2] <- theta_0[2] + minute_rate}
      est <- constrOptim(theta=theta_0,f=negloglike,grad=NULL,ui=ui,ci=ci)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    optim = {
      est <- optim(par=theta_0,fn=negloglike)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    nlm = {
      est.0 <- nlm(f=negloglike,p=theta_0)
      est.1 <- nlm(f=negloglike,p=theta_1)
      est <- est.0
      if (-est.1$minimum > -est.0$minimum) { est <- est.1;cat("NLM had to switch\n") }
      alpha <- est$estimate[1]
      lambda <- est$estimate[2]
      loglike <- -est$minimum},
    {cat("Unknown method",method,"\n"); alpha<-NA; lambda<-NA; loglike<-NA}
  )
  fit <- list(type="powerexp", exponent=alpha, rate=lambda, xmin=threshold,
              loglike=loglike, samples.over.threshold=length(x))
  return(fit)
}

Check out https://github.com/jeffalstott/powerlaw/ for more info

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文