Calculating voting power indexes in R
I have a project in which I need to be able to calculate different voting power indexes in R. As a first attempt, I wrote a small function to calculate the Banzhaf index. It takes two arguments: a data frame with two columns, which must be labelled member and vote, and the number of votes needed for a majority (quota):
library(combinat)
banzhaf <- function(data, quota) {
  f <- vector()
  m <- vector()
  score <- vector()
  name <- vector()
  pivot <- vector()
  for (n in 1:nrow(data)) {
    y <- as.matrix(combn(data$member, n))
    for (i in 1:ncol(y)) {
      for (j in 1:n) {
        f[j] <- data[data$member == y[j, i], ]$vote
        m[j] <- as.character(data[data$member == y[j, i], ]$member)
        o <- data.frame(member = m, vote = f)
      }
      if (sum(o$vote) >= quota) {
        for (k in 1:length(o$member)) {
          t <- o[-k, ]
          if (sum(t$vote) < quota) {
            pivot[length(pivot) + 1] <- as.character(o$member[k])
          }
        }
      }
    }
  }
  for (l in unique(pivot)) {
    score[length(score) + 1] <- sum(pivot == l)
    name[length(name) + 1] <- l
  }
  out <- data.frame(name = name, score = score / length(pivot))
  return(out)
}
The problem with this function is that it becomes incredibly slow when I have more than 8 members in the data frame. I think this is due to the combn() function used in the outermost loop. Does anyone know how this can be made to run faster?
Best, Thomas
P.S: If you want to test it use the following data, but beware that it might run forever!
x <- c("Germany","France","UK","Italy","Spain","Poland","Romania","Netherlands","Greece","Portugal","Belgium","Czech Rep.","Hungary","Sweden","Austria","Bulgaria","Denmark","Slovakia","Finland","Ireland","Lithuania","Latvia","Slovenia","Estonia","Cyprus","Luxembourg","Malta")
z <- c(29,29,29,29,27,27,14,13,12,12,12,12,12,10,10,10,7,7,7,7,7,4,4,4,4,4,3)
dat <- data.frame(member = as.character(x),vote = z)
oi <- banzhaf(dat, 255)
oi
Forgive me for resurrecting this post, but there's a more efficient algorithm than trying all coalitions: the Banzhaf index can be computed using dynamic programming, which is worth noting for future readers. [1]
The general idea is that the count of decisive coalitions can be rewritten into a form which can be calculated in O(nq) time, where n is the number of voters and q is the quota.
NOTE: I'm using 1-indexing, as is common math convention. You'll have to shift things around a little if you want to use 0-indexing.
Let w(i) be the vote weight of voter i, and w(S) be the total vote weight of the voters in S. The number of decisive coalitions is specified by
[equation not preserved]
That is, the number of coalitions that do not meet quota without i, but would meet quota if i were added.
We can break the coalitions into upper and lower voter sets like this:
[equation not preserved]
as S doesn't include i. Further, that is in turn equal to
[equation not preserved]
However, we know L(i-1) and U(i+1) don't overlap, so we can just choose two sets S1 and S2 such that
[equation not preserved]
Lastly, we can break up the tuple set like this
[equation not preserved]
Those particular bounds on the sums are there to guarantee that
[equation not preserved]
This is where the key insight comes in. We can precompute the set counts
[equation not preserved]
in O(nq) time, as
[equation not preserved]
It also helps to compute the sums of u all at once; define
[equation not preserved]
You only ever need to compute these for y and z less than the quota, so you just allocate three tables of size n * q, compute l, u, and us, then return
[equation not preserved]
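The formulas of the dynamic-programming answer are not shown in this copy, so the following R sketch is a reconstruction from its prose rather than the author's own code. It assumes positive integer weights and an integer quota. The function name `banzhaf_dp` and the table names `lower`, `upper` and `upper_cum` (standing in for l, u and us) are illustrative choices. Row `i` of `lower` counts the subsets of voters 1..(i-1) by total weight, and row `i + 1` of `upper` counts the subsets of voters (i+1)..n, so voter `i` is excluded from both.

```r
# Banzhaf swings via subset-count tables, O(n * q) time and memory.
# Assumption: weights are positive integers and quota is an integer.
banzhaf_dp <- function(weights, quota) {
  n <- length(weights)
  nm <- names(weights)
  if (is.null(nm)) nm <- as.character(seq_len(n))
  w <- as.integer(weights)
  q <- as.integer(quota)

  # lower[i + 1, y + 1]: number of subsets of voters 1..i with total weight y < q
  lower <- matrix(0, n + 1, q)
  lower[1, 1] <- 1                          # the empty set
  for (i in seq_len(n)) {
    lower[i + 1, ] <- lower[i, ]            # subsets that leave voter i out
    if (w[i] < q) {
      idx <- (w[i] + 1):q                   # columns for y = w[i] .. q - 1
      lower[i + 1, idx] <- lower[i + 1, idx] + lower[i, idx - w[i]]
    }
  }

  # upper[i, z + 1]: number of subsets of voters i..n with total weight z < q
  # upper_cum[i, z + 1]: the same counts accumulated over weights 0..z
  upper <- matrix(0, n + 1, q)
  upper_cum <- matrix(0, n + 1, q)
  upper[n + 1, 1] <- 1
  upper_cum[n + 1, ] <- 1
  for (i in rev(seq_len(n))) {
    upper[i, ] <- upper[i + 1, ]
    if (w[i] < q) {
      idx <- (w[i] + 1):q
      upper[i, idx] <- upper[i, idx] + upper[i + 1, idx - w[i]]
    }
    upper_cum[i, ] <- cumsum(upper[i, ])
  }

  # Number of subsets of voters row..n with weight <= z (0 when z < 0)
  cum_at <- function(row, z) if (z < 0) 0 else upper_cum[row, min(z, q - 1) + 1]

  swings <- numeric(n)
  for (i in seq_len(n)) {
    for (y in 0:(q - 1)) {
      ly <- lower[i, y + 1]                 # subsets of voters 1..(i-1) with weight y
      if (ly == 0) next
      hi <- q - 1 - y                       # largest z with y + z < q
      lo <- q - w[i] - y                    # smallest z with y + z + w[i] >= q
      swings[i] <- swings[i] + ly * (cum_at(i + 1, hi) - cum_at(i + 1, lo - 1))
    }
  }
  data.frame(name = nm, swings = swings, index = swings / sum(swings))
}

# Small game with quota 6 and weights 4, 3, 2, 1: swings are 5, 3, 3, 1
banzhaf_dp(c(A = 4, B = 3, C = 2, D = 1), 6)
```

With the question's 27-member data and a quota of 255, the tables have 28 rows and 255 columns, which is tiny compared with enumerating 2^27 coalitions.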
Your example data frame has 27 rows and you're looking at every set (except the null set), so that's 2^27 - 1 = 134,217,727 operations at least. This is going to take some time. That said, here's what I believe to be a more efficient version of your code. It seems to match the Wikipedia article at least: http://en.wikipedia.org/wiki/Banzhaf_power_index
(I haven't tried running this on the full data set.)
I think to really approach this problem efficiently, you'll have to take advantage of the structure of the problem. For instance, once you know set X has vote sum above quota, then you know that X union Y is also above quota. I'm not sure if R will be well-suited to following such structure.
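One way to use that structure in R is a depth-first search that abandons a branch as soon as its running total reaches the quota, since every superset of that coalition is also above quota. The sketch below is an illustration under assumptions, not the code this answer originally referred to. The function name `count_swings` is made up, and the weights are assumed to be non-negative. It is still exponential in the worst case, so it only helps when many branches can be cut early.

```r
# Count, for each voter i, the coalitions S of the other voters with
# quota - w[i] <= w(S) < quota, i.e. the coalitions in which i is a swing voter.
# Assumption: all weights are non-negative, so totals only grow along a branch.
count_swings <- function(weights, quota) {
  n <- length(weights)
  swings <- numeric(n)
  names(swings) <- names(weights)
  for (i in seq_len(n)) {
    others <- unname(weights[-i])
    need <- quota - weights[[i]]            # minimum total the others must supply
    visit <- function(pos, total) {
      # Prune: this coalition already wins without i, and so does every superset
      if (total >= quota) return(0)
      # Leaf: all other voters decided; i is a swing if adding i reaches the quota
      if (pos > length(others)) return(as.numeric(total >= need))
      # Branch on leaving out or including the voter at position pos
      visit(pos + 1, total) + visit(pos + 1, total + others[pos])
    }
    swings[i] <- visit(1, 0)
  }
  swings
}

swings <- count_swings(c(A = 4, B = 3, C = 2, D = 1), 6)
swings / sum(swings)    # normalised Banzhaf index
```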
My approach was similar to David's, using batched matrix operations to handle the size:
It uses R's name propagation instead of a data.frame, avoids loops where possible, and uses integers instead of numerics where possible. It still took over 6 minutes to run on my box:
The thinking went something like:
After that I think it plays out in the same manner as David's. The only complication was ensuring use of integers for efficiency, and adding the batching, as it's not really feasible to create a matrix of 27:2^27!
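The code for this batched approach is not included in this copy. As a rough illustration of the idea only, the sketch below numbers the coalitions from 0 to 2^n - 1, decodes one batch of numbers at a time into a membership matrix, and counts swings with matrix operations. The function name `banzhaf_batched` and the `batch_size` parameter are hypothetical, and for simplicity it uses doubles where the answer used integers.

```r
# Enumerate all 2^n coalitions in batches; each batch is a 0/1 membership
# matrix with one row per coalition and one column per voter.
# Assumption: weights is a (preferably named) numeric vector.
banzhaf_batched <- function(weights, quota, batch_size = 2^16) {
  n <- length(weights)
  w <- as.numeric(weights)
  swings <- numeric(n)
  names(swings) <- names(weights)
  total <- 2^n                              # coalition ids run from 0 to 2^n - 1
  start <- 0
  while (start < total) {
    ids <- seq(start, min(start + batch_size, total) - 1)
    # Bit b of a coalition id says whether voter b + 1 is a member
    member <- outer(ids, 2^(0:(n - 1)), function(id, p) (id %/% p) %% 2)
    sums <- as.vector(member %*% w)         # total weight of each coalition
    # A member is critical if the coalition wins but loses without that member
    loses_without <- outer(sums, w, "-") < quota
    critical <- (member == 1) & (sums >= quota) & loses_without
    swings <- swings + colSums(critical)
    start <- start + batch_size
  }
  swings / sum(swings)
}

# A deliberately small batch size, just to exercise the batching
banzhaf_batched(c(A = 4, B = 3, C = 2, D = 1), quota = 6, batch_size = 5)
```

The batch size trades memory for the number of loop iterations: each batch allocates a few matrices of `batch_size` rows by `n` columns.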