社交网络演化优化模型
我正在编写一段代码来模拟社交网络的演变。这个想法是,每个人都被分配到一个节点,人与人之间的关系(网络上的边)被赋予 +1 或 -1 的权重,具体取决于关系是友好还是不友好。
使用这个简单的模型,您可以说三个人组成的三元组是“平衡”还是“不平衡”,具体取决于三元组的边的乘积是正还是负。
所以最后我想做的是实现一个 ising 类型模型。即,如果新网络比翻转之前的网络具有更多平衡三角形(较低能量),则翻转随机边并保留新关系,如果不是这种情况,则仅以一定概率保留新关系。
好吧,最后我的问题是:我已经编写了以下代码,但是我拥有的数据集包含约 120k 三元组,因此运行需要 4 天!
谁能提供有关如何优化代码的任何提示?
谢谢。
#Importing required librarys
try:
import matplotlib.pyplot as plt
except:
raise
import networkx as nx
import csv
import random
import math
def prod(iterable):
p= 1
for n in iterable:
p *= n
return p
def Sum(iterable):
p= 0
for n in iterable:
p += n[3]
return p
def CalcTriads(n):
firstgen=G.neighbors(n)
Edges=[]
Triads=[]
for i in firstgen:
Edges.append(G.edges(i))
for i in xrange(len(Edges)):
for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i)
if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.
t=[n,Edges[i][j][0],Edges[i][j][1]]
t.sort()
Triads.append(t)# Add found nodes to Triads.
new_Triads = []# Delete duplicate triads.
for elem in Triads:
if elem not in new_Triads:
new_Triads.append(elem)
Triads = new_Triads
for i in xrange(len(Triads)):# Go through list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append value to each triad.
a=G[Triads[i][0]][Triads[i][1]].values()
b=G[Triads[i][1]][Triads[i][2]].values()
c=G[Triads[i][2]][Triads[i][0]].values()
Q=prod(a+b+c)
Triads[i].append(Q)
return Triads
###### Import sorted edge data ######
li=[]
with open('Sorted Data.csv', 'rU') as f:
reader = csv.reader(f)
for row in reader:
li.append([float(row[0]),float(row[1]),float(row[2])])
G=nx.Graph()
G.add_weighted_edges_from(li)
for i in xrange(800000):
e = random.choice(li) # Choose random edge
TriNei=[]
a=CalcTriads(e[0]) # Find triads of first node in the chosen edge
for i in xrange(0,len(a)):
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
TriNei.append(a[i])
preH=-Sum(TriNei) # Save the "energy" of all the triads of which the edge is a member
e[2]=-1*e[2]# Flip the weight of the random edge and create a new graph with the flipped edge
G.clear()
G.add_weighted_edges_from(li)
TriNei=[]
a=CalcTriads(e[0])
for i in xrange(0,len(a)):
if set([e[1]]).issubset(a[i]):
TriNei.append(a[i])
postH=-Sum(TriNei)# Calculate the post flip "energy".
if postH<preH:# If the post flip energy is lower then the pre flip energy keep the change
continue
elif random.random() < 0.92: # If the post flip energy is higher then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
e[2]=-1*e[2]
I am writing a piece of code which models the evolution of a social network. The idea is that each person is assigned to a node and relationships between people (edges on the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.
Using this simple model you can say that a triad of three people is either "balanced" or "unbalanced" depending on whether the product of the edges of the triad is positive or negative.
So finally what I am trying to do is implement an ising type model. I.e. Random edges are flipped and the new relationship is kept if the new network has more balanced triangels (a lower energy) than the network before the flip, if that is not the case then the new relationship is only kept with a certain probability.
Ok so finally onto my question: I have written the following code, however the dataset I have contains ~120k triads, as a result it will take 4 days to run!
Could anyone offer any tips on how I might optimise the code?
Thanks.
#Importing required librarys
try:
import matplotlib.pyplot as plt
except:
raise
import networkx as nx
import csv
import random
import math
def prod(iterable):
p= 1
for n in iterable:
p *= n
return p
def Sum(iterable):
p= 0
for n in iterable:
p += n[3]
return p
def CalcTriads(n):
firstgen=G.neighbors(n)
Edges=[]
Triads=[]
for i in firstgen:
Edges.append(G.edges(i))
for i in xrange(len(Edges)):
for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i)
if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.
t=[n,Edges[i][j][0],Edges[i][j][1]]
t.sort()
Triads.append(t)# Add found nodes to Triads.
new_Triads = []# Delete duplicate triads.
for elem in Triads:
if elem not in new_Triads:
new_Triads.append(elem)
Triads = new_Triads
for i in xrange(len(Triads)):# Go through list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append value to each triad.
a=G[Triads[i][0]][Triads[i][1]].values()
b=G[Triads[i][1]][Triads[i][2]].values()
c=G[Triads[i][2]][Triads[i][0]].values()
Q=prod(a+b+c)
Triads[i].append(Q)
return Triads
###### Import sorted edge data ######
li=[]
with open('Sorted Data.csv', 'rU') as f:
reader = csv.reader(f)
for row in reader:
li.append([float(row[0]),float(row[1]),float(row[2])])
G=nx.Graph()
G.add_weighted_edges_from(li)
for i in xrange(800000):
e = random.choice(li) # Choose random edge
TriNei=[]
a=CalcTriads(e[0]) # Find triads of first node in the chosen edge
for i in xrange(0,len(a)):
if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
TriNei.append(a[i])
preH=-Sum(TriNei) # Save the "energy" of all the triads of which the edge is a member
e[2]=-1*e[2]# Flip the weight of the random edge and create a new graph with the flipped edge
G.clear()
G.add_weighted_edges_from(li)
TriNei=[]
a=CalcTriads(e[0])
for i in xrange(0,len(a)):
if set([e[1]]).issubset(a[i]):
TriNei.append(a[i])
postH=-Sum(TriNei)# Calculate the post flip "energy".
if postH<preH:# If the post flip energy is lower then the pre flip energy keep the change
continue
elif random.random() < 0.92: # If the post flip energy is higher then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
e[2]=-1*e[2]
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
以下建议不会对您的性能有太大提升,因为它们不是算法级别的,即不是针对您的问题非常具体。但是,它们是对性能进行轻微改进的通用建议:
除非您使用的是 Python 3,否则更改
为
后一个仅迭代从 0 到 800000 的数字,第一个创建一个巨大的数字列表,然后迭代该列表。使用
range
对其他循环执行类似的操作。另外,随后更改
为
并使用
e
而不是li[j]
。如果您确实需要索引号,请使用random.randint(0, len(li)-1)
。The following suggestions won't boost your performance that much because they are not on the algorithmic level, i.e. not very specific to your problem. However, they are generic suggestions for slight performance improvements:
Unless you are using Python 3, change
to
The latter one just iterates numbers from 0 to 800000, the first one creates a huge list of numbers and then iterates that list. Do something similar for the other loops using
range
.Also, change
to
and use
e
instead ofli[j]
subsequently. If you really need a index number, userandom.randint(0, len(li)-1)
.您可以进行语法更改来加快速度,例如将 Sum 和 Prod 函数替换为内置等效函数
sum(x[3] for x in iterable)
和reduce( (operator.mul, iterable)
- 使用内置函数或生成器表达式通常比显式循环更快。据我所知,该行:
正在测试浮点数是否在浮点数列表中。将其替换为 if e[1] in a[i]: 将消除为每次比较创建两个
set
对象的开销。顺便说一句,如果您只想使用该索引来访问元素,则不需要循环遍历数组的索引值。例如替换
为
但是我怀疑这样的更改不会对整体运行时产生很大的影响。为此,您需要使用不同的算法或切换到更快的语言。您可以尝试在 pypy 中运行它 - 在某些情况下它可能比 CPython 快得多。您还可以尝试 cython,它会将您的代码编译为 C,有时可以带来很大的性能提升,特别是如果您注释了您的代码带有 cython 类型信息的代码。我认为最大的改进可能来自于将算法更改为工作量更少的算法,但我对此没有任何建议。
顺便说一句,为什么循环 800000 次?这个数字有什么意义呢?
另外,请为变量使用有意义的名称。使用单字符名称或 shrtAbbrv 根本不会加快代码速度,而且很难理解它正在做什么。
There are syntactic changes you can make to speed things up, such as replacing your Sum and Prod functions with the built-in equivalents
sum(x[3] for x in iterable)
andreduce(operator.mul, iterable)
- it is generally faster to use builtin functions or generator expressions than explicit loops.As far as I can tell the line:
is testing if a float is in a list of floats. Replacing it with
if e[1] in a[i]:
will remove the overhead of creating twoset
objects for each comparison.Incidentally, you do not need to loop through the index values of an array, if you are only going to use that index to access the elements. e.g. replace
with
However I suspect that changes like this will not make a big difference to the overall runtime. To do that you either need to use a different algorithm or switch to a faster language. You could try running it in pypy - for some cases it can be significantly faster than CPython. You could also try cython, which will compile your code to C and can sometimes give a big performance gain especially if you annotate your code with cython type information. I think the biggest improvement may come from changing the algorithm to one that does less work, but I don't have any suggestions for that.
BTW, why loop 800000 times? What is the significance of that number?
Also, please use meaningful names for your variables. Using single character names or shrtAbbrv does not speed the code up at all, and makes it very hard to follow what it is doing.
这里有很多事情你可以改进。首先使用 cProfile 等工具对程序进行分析。这将告诉您程序的大部分时间都花在哪里,从而优化哪里可能最有帮助。作为提示,您不需要在程序的每次迭代中生成所有三元组。
在得到合适的答案之前,您还需要修复缩进。
无论如何,这个问题可能更适合代码审查。
There are quite a few things you can improve here. Start by profiling your program using a tool like cProfile. This will tell you where most of the program's time is being spent and thus where optimization is likely to be most helpful. As a hint, you don't need to generate all the triads at every iteration of the program.
You also need to fix your indentation before you can expect a decent answer.
Regardless, this question might be better suited to Code Review.
我不确定我是否准确理解您的目标,但至少有两个更改可能会有所帮助。您可能不需要每次在循环中都销毁和创建图形,因为您所做的只是翻转一个边权重符号。并且可以改进查找三角形的计算。
下面是一些代码,它生成具有随机权重的完整图,在循环中选择随机边,找到三元组并翻转边权重......
I'm not sure I understand exactly what you are aiming for, but there are at least two changes that might help. You probably don't need to destroy and create the graph every time in the loop since all you are doing is flipping one edge weight sign. And the computation to find the triangles can be improved.
Here is some code that generates a complete graph with random weights, picks a random edge in a loop, finds the triads and flips the edge weight...