Calculating voting power indexes in R

Posted on 2024-09-19 21:48:23

I have a project in which I need to be able to calculate different voting power indexes in R. As a first attempt at this I wrote a small function to calculate the Banzhaf index. It takes two arguments: a data frame with two columns, which must be labelled member and vote, and the number of votes needed for a majority (quota):

library(combinat)
banzhaf <- function(data,quota){
 f <- vector()
 m <- vector()
 score <- vector()
 name <- vector()
 pivot <- vector()
 for (n in 1:nrow(data)){
  y <- as.matrix(combn(data$member,n))
  for (i in 1:ncol(y)){
   for ( j in 1:n){
    f[j] <- data[data$member == y[j,i],]$vote
    m[j] <- as.character(data[data$member == y[j,i],]$member)
    o <- data.frame(member = m, vote = f)
    }

   if (sum(o$vote) >= quota){
    for (k in 1:length(o$member)){
     t <- o[-k,]
    if (sum(t$vote) < quota){
     pivot[length(pivot) + 1] <- as.character(o$member[k])
     }
    }
   }
  }
 }

 for (l in unique(pivot)){
  score[length(score) + 1] <- sum(pivot == l)
  name[length(name) + 1] <- l
  }
 out <- data.frame(name = name, score = score/length(pivot))
 return(out)
}

The problem with this function is that it becomes incredibly slow when I have more than 8 members in the data frame. This is due to the combn() function used in the outermost loop (I think). Does anyone know how this can be made to run faster?

Best, Thomas

P.S. If you want to test it, use the following data, but beware that it might run forever!

x <- c("Germany","France","UK","Italy","Spain","Poland","Romania","Netherlands","Greece","Portugal","Belgium","Czech Rep.","Hungary","Sweden","Austria","Bulgaria","Denmark","Slovakia","Finland","Ireland","Lithuania","Latvia","Slovenia","Estonia","Cyprus","Luxembourg","Malta")
z <- c(29,29,29,29,27,27,14,13,12,12,12,12,12,10,10,10,7,7,7,7,7,4,4,4,4,4,3)

dat <- data.frame(member = as.character(x),vote = z)

oi <- banzhaf(dat, 255)
oi


幽梦紫曦～ 2024-09-26 21:48:28


Forgive me for resurrecting this post, but there's a more efficient algorithm (than trying all coalitions) to compute the Banzhaf index using dynamic programming that it would be good to note for future readers. [1]

The general idea is that the count of decisive coalitions can be rewritten into a form which can be calculated in O(nq) time, where n is the number of voters and q is the quota.

NOTE: I'm using 1-indexing, as is common math convention. You'll have to shift things round a little if you want to use 0-indexing.

Let

L(i) = { 1, 2, ..., i }
U(i) = { i, i+1, ..., n }.

Let p = L(n) be the set of all voters, w(i) the vote weight of voter i, and w(S) the collection of vote weights of the voters in S, so that ∑w(S) is the total weight of S.

The number of coalitions for which voter i is decisive is

D(i) = |{S : S ⊆ p ∧ i∉S ∧ ( q - w(i) ≤ ∑w(S) < q ) }|.

That is, the number of coalitions that do not meet quota without i, but would meet quota if i were added.

We can break the coalitions into upper and lower voter sets like this:

|{S : S ⊆ p ∧ i∉S ∧ ( q - w(i) ≤ ∑w(S ∩ L(i)) + ∑w(S ∩ U(i)) < q ) }|

as S doesn't include i. Further, that is in turn equal to

|{S : S ⊆ p∖{i} ∧ ( q - w(i) ≤ ∑w(S ∩ L(i-1)) + ∑w(S ∩ U(i+1)) < q ) }|.

However, we know L(i-1) and U(i+1) don't overlap, so we can just choose two sets S1 and S2 independently and count

|{(S1,S2) : S1 ⊆ L(i-1) ∧ S2 ⊆ U(i+1) ∧ ( q - w(i) ≤ ∑w(S1) + ∑w(S2) < q ) }|.

Lastly, we can break up the tuple set by the total weight y of S1 and the total weight z of S2:

∑[y : 0≤y<q] |{S : S ⊆ L(i-1) ∧ ∑w(S) = y}| *
  (∑[z : max(q-w(i)-y,0)≤z<q-y] |{S : S ⊆ U(i+1) ∧ ∑w(S) = z}|)

Those particular bounds on the sums are there to guarantee that

q - w(i) ≤ y + z < q.

This is where the key insight comes in. We can precompute the set counts

l(i, y) = |{S : S ⊆ L(i-1) ∧ ∑w(S) = y}|
u(i, z) = |{S : S ⊆ U(i+1) ∧ ∑w(S) = z}|

in O(nq) time, as

l(1, y)   = (if y = 0 then 1 else 0)
l(i+1, y) = l(i, y) + (if y ≥ w(i) then l(i, y-w(i)) else 0)

u(n, z)   = (if z = 0 then 1 else 0)
u(i-1, z) = u(i, z) + (if z ≥ w(i) then u(i, z-w(i)) else 0)

It also helps to compute the running sums of u all at once; define

us(i, 0)   = u(i, 0)
us(i, z+1) = u(i, z+1) + us(i, z)

You only ever need to compute these for y and z less than the quota, so you just allocate three tables of size n * q, compute l, u, and us, then return

∑[y : 0≤y<q] l(i, y) * ( us(i, q-1-y) - us(i, max(q-w(i)-y,0) - 1) )

where us(i, -1) is taken to be 0.
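
For readers who prefer code, here is a minimal R sketch of the dynamic programme, assuming positive integer weights and a positive integer quota. The names banzhaf_dp and shift are made up for this illustration and do not come from the reference. The sketch lets y run over 0..q-1 and restricts z to max(q-w(i)-y, 0) ≤ z < q-y, which is what keeps q - w(i) ≤ y + z < q.

```r
# Banzhaf index via dynamic programming (illustrative sketch).
# weights: positive integer vote weights (optionally named); quota: positive integer.
banzhaf_dp <- function(weights, quota) {
  n <- length(weights)
  q <- quota
  # l[i, y + 1]: number of subsets of voters 1..(i-1) with total weight y
  # u[i, z + 1]: number of subsets of voters (i+1)..n with total weight z
  # Only totals 0..(q-1) matter, so each table is n x q.
  l <- matrix(0, n, q)
  u <- matrix(0, n, q)
  l[1, 1] <- 1   # only the empty set lies below voter 1
  u[n, 1] <- 1   # only the empty set lies above voter n

  # Counts for subsets that additionally contain a voter of weight w:
  # move every count w places to the right and drop totals >= q.
  shift <- function(row, w) {
    if (w >= q) return(numeric(q))
    c(numeric(w), row[seq_len(q - w)])
  }

  for (i in seq_len(n - 1)) {
    l[i + 1, ] <- l[i, ] + shift(l[i, ], weights[i])
  }
  for (i in rev(seq_len(n)[-1])) {   # i = n, n-1, ..., 2
    u[i - 1, ] <- u[i, ] + shift(u[i, ], weights[i])
  }

  swings <- numeric(n)
  y <- 0:(q - 1)
  for (i in seq_len(n)) {
    us <- cumsum(u[i, ])                    # us[z + 1] = u(i, 0) + ... + u(i, z)
    hi <- q - 1 - y                         # largest admissible z
    lo <- pmax(q - weights[i] - y, 0)       # smallest admissible z
    below_lo <- ifelse(lo > 0, us[pmax(lo, 1)], 0)   # us(i, lo - 1), or 0 when lo = 0
    swings[i] <- sum(l[i, ] * (us[hi + 1] - below_lo))
  }
  names(swings) <- names(weights)
  swings / sum(swings)
}

# Wikipedia's four-voter example with quota 6
banzhaf_dp(c(A = 4, B = 3, C = 2, D = 1), 6)
# A = 5/12, B = 3/12, C = 3/12, D = 1/12
```

With the data from the question, the call would be banzhaf_dp(setNames(z, x), 255). The tables are then only 27 x 255, so no coalition is ever enumerated explicitly.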


夏花。依旧 2024-09-26 21:48:27


Your example data frame has 27 rows and you're looking at every set (except the null set) so that's 2^27 - 1 = 134 217 727 operations at least... this is going to take some time. That said, here's what I believe to be a more efficient version of your code. It seems to match the Wikipedia article at least: http://en.wikipedia.org/wiki/Banzhaf_power_index

banzhaf1 <- function(data, quota) {
  n <- nrow(data)
  vote <- data$vote
  swingsPerIndex <- numeric(n)
  for (setSize in 1:n) {
    sets <- utils::combn(n, setSize)
    numSets <- ncol(sets)
    flatSets <- as.vector(sets)
    voteMatrix <- matrix(vote[flatSets], nrow=setSize, ncol=numSets)
    totals <- colSums(voteMatrix)
    aboveQuota <- totals >= quota
    totalsMatrix <- matrix(rep(totals, each=setSize), nrow=setSize, ncol=numSets)
    winDiffs <- totalsMatrix[, aboveQuota] - voteMatrix[, aboveQuota]
    winSets <- sets[, aboveQuota]
    swingers <- as.vector(winSets[winDiffs < quota])
    swingsPerIndex <- swingsPerIndex + tabulate(swingers, n)
  }
  return(data.frame(name=data$member, score=swingsPerIndex / sum(swingsPerIndex)))
}

(I haven't tried running this on the full data set.)
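
To put the coalition count from the start of this answer into numbers, here is a quick check in base R. The variable names are only for illustration.

```r
n <- 27
sets_per_size <- choose(n, 1:n)   # number of coalitions of each size 1..n
total_sets <- sum(sets_per_size)  # every non-empty coalition
stopifnot(total_sets == 2^n - 1)  # 134217727

# The largest single batch is the one for the middle set size.
largest <- max(sets_per_size)     # choose(27, 13) = 20058300
largest
```

So even the vectorised version has to hold a matrix of roughly 13 x 20 million entries for the middle set sizes, which is likely to be the practical bottleneck on the full data set.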

I think to really approach this problem efficiently, you'll have to take advantage of the structure of the problem. For instance, once you know set X has vote sum above quota, then you know that X union Y is also above quota. I'm not sure if R will be well-suited to following such structure.


鲜肉鲜肉永远不皱 2024-09-26 21:48:26


My approach was similar to David's, using batched matrix operations to handle the size:

banzhaf = function(votes, pass=sum(votes) %/% 2 + 1, batch.size=500000, quiet=batches == 1) {
  n = length(votes)
  batches = ceiling((2^n / batch.size))
  if (!quiet)
    cat('calculating...\n')
  Reduce(`+`, lapply(1:batches, function(b) {
    if (!quiet)
      cat('-', b, '/', batches, '\n')
    i = ((b - 1) * batch.size + 1):min(2^n, b * batch.size)
    m = do.call(cbind, lapply(as.integer(2^((1:n) - 1L)), function(j, k) (k %/% j) %% 2L, i))
    x = drop(m %*% votes)
    passed = x >= pass
    colSums((outer(x[passed] - pass, votes, `<`) * m[passed, , drop=F]))
  }))
}

It uses R's name propagation instead of a data.frame, avoids loops where possible, and uses integers instead of numerics where possible. It still took over 6 minutes to run on my box:

# wikipedia examples
banzhaf(c(A=4, B=3, C=2, D=1), 6)
banzhaf(c('Hempstead #1'=9, 'Hempstead #2'=9, 'North Hempstead'=7, 'Oyster Bay'=3, 'Glen Cove'=1, 'Long Beach'=1), 16)

# stackoverflow data
system.time(banzhaf(setNames(as.integer(z), x), 255))

The thinking went something like:

There are 2^n possible outcomes (2 outcomes per player, n independent players),
represented by the numbers 1:2^n (cf. 'i').
Expressing each number in binary gives each player's vote.
Modulus and division extract the bits into a voting matrix (cf. 'm'), in lieu of bitwise ops (only added to R recently, I believe).

After that I think it plays out in the same manner as David's. The only complications were ensuring the use of integers for efficiency and adding the batching, as it's not really feasible to create a 27 x 2^27 matrix!
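
As a small illustration of the bit-extraction step, here is a sketch for n = 3 players. It builds the voting matrix with division and modulus as in the function above, and compares it with base R's bitwAnd(). The variable names are only for illustration.

```r
n <- 3
i <- 1:2^n                                   # outcome numbers 1..8
powers <- as.integer(2^(seq_len(n) - 1L))    # 1, 2, 4

# One column per player: column j holds bit (j - 1) of each outcome number.
m <- sapply(powers, function(j) (i %/% j) %% 2L)

# The same matrix using the bitwise operator from base R.
m_bitw <- sapply(powers, function(j) as.integer(bitwAnd(i, j) != 0L))
stopifnot(all(m == m_bitw))

m[5, ]   # 5 is 101 in binary: players 1 and 3 vote yes
m[8, ]   # 8 is 1000 in binary: the low 3 bits are 0, i.e. the empty coalition
```

Note that the outcome 2^n plays the role of 0 here, so the range 1:2^n still covers every coalition exactly once.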
