使用 XML minidom 迭代图的后续操作

发布于 2024-08-06 16:54:16 字数 2214 浏览 10 评论 0原文

这是问题的后续内容（链接< /a>)

我打算做的是使用 XML 创建一个使用 NetworkX 的图表。看下面的 DOM 结构，同一节点内的所有节点之间都应该有一条边，并且参加过同一会议的所有节点都应该有一个指向该会议的节点。总而言之，共同撰写论文的所有作者都应该相互连接，并且参加过特定会议的所有作者都应该与该会议连接。

<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>

我已经弄清楚如何将作者与会议联系起来，但我不确定如何将作者彼此联系起来。我遇到的困难是如何迭代撰写同一篇论文的作者并将他们连接在一起。

    dom = parse(filepath)
    conference=dom.getElementsByTagName('conference')
    for node in conference:
        conf_name=node.getAttribute('name')
        print conf_name
        G.add_node(conf_name)

    #The nodeValue is split in order to get the name of the author 
#and to exclude the university they are part of

        plist=node.getElementsByTagName('paper')
        for p in plist:
            author=str(p.childNodes[0].nodeValue)
            author= author.split("(")
#Figure out a way to create edges between authors in the same <paper> </paper>

        alist=node.getElementsByTagName('author')
        for a in alist:
            authortext= str(a.childNodes[0].nodeValue).split("(")

            if authortext[0] in dict:
                edgeQuantity=dict[authortext[0]]
                edgeQuantity+=1
                dict[authortext[0]]=edgeQuantity
                G.add_edge(authortext[0],conf_name)

            #Otherwise, add it to the dictionary and create an edge to the conference.
            else:
                dict[authortext[0]]= 1
                G.add_node(authortext[0])
                G.add_edge(authortext[0],conf_name)
                i+=1

原文

This is a follow-up to the question (Link)

What I intend on doing is using the XML to create a graph using NetworkX. Looking at the DOM structure below, all nodes within the same node should have an edge between them, and all nodes that have attended the same conference should have a node to that conference. To summarize, all authors that worked together on a paper should be connected to each other, and all authors who have attended a particular conference should be connected to that conference.

<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>

I've figured out how to connect authors to conferences, but I'm unsure about how to connect authors to each other. The thing that I'm having difficulty with is how to iterate over the authors that have worked on the same paper and connect them together.

    dom = parse(filepath)
    conference=dom.getElementsByTagName('conference')
    for node in conference:
        conf_name=node.getAttribute('name')
        print conf_name
        G.add_node(conf_name)

    #The nodeValue is split in order to get the name of the author 
#and to exclude the university they are part of

        plist=node.getElementsByTagName('paper')
        for p in plist:
            author=str(p.childNodes[0].nodeValue)
            author= author.split("(")
#Figure out a way to create edges between authors in the same <paper> </paper>

        alist=node.getElementsByTagName('author')
        for a in alist:
            authortext= str(a.childNodes[0].nodeValue).split("(")

            if authortext[0] in dict:
                edgeQuantity=dict[authortext[0]]
                edgeQuantity+=1
                dict[authortext[0]]=edgeQuantity
                G.add_edge(authortext[0],conf_name)

            #Otherwise, add it to the dictionary and create an edge to the conference.
            else:
                dict[authortext[0]]= 1
                G.add_node(authortext[0])
                G.add_edge(authortext[0],conf_name)
                i+=1

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

顾铮苏瑾 2024-08-13 16:54:16

我不确定如何将作者彼此联系起来。

您需要生成（作者，其他作者）对，以便可以将它们添加为边。典型的方法是嵌套迭代：

for thing in things:
    for otherthing in things:
        add_edge(thing, otherthing)

这是一个简单的实现，其中包括自循环（为作者提供了将自己与自己连接起来的边缘），您可能想要也可能不想要；它还包括 (1,2) 和 (2,1)，如果您正在做无向图，则这是多余的。（在 Python 2.6 中，内置的 排列< /a> 生成器也可以执行此操作。）这是一个修复这些问题的生成器：

def pairs(l):
    for i in range(len(l)-1):
        for j in range(i+1, len(l)):
            yield l[i], l[j]

我没有使用 NetworkX，但查看文档似乎说您可以在同一节点上调用 add_node 两次（第二次没有任何反应）。如果是这样，您可以丢弃您用来尝试跟踪插入的节点的字典。另外，似乎是说，如果您向未知节点添加边，它会自动为您添加该节点。所以应该可以使代码更短：

for conference in dom.getElementsByTagName('conference'):
    var conf_name= node.getAttribute('name')
    for paper in conference.getElementsByTagName('paper'):
        authors= paper.getElementsByTagName('author')
        auth_names= [author.firstChild.data.split('(')[0] for author in authors]

        # Note author's conference attendance
        #
        for auth_name in auth_names:
            G.add_edge(auth_name, conf_name)

        # Note combinations of authors working on same paper
        #
        for auth_name, other_name in pairs(auth_names):
            G.add_edge(auth_name, otherauth_name)

I'm unsure about how to connect authors to each other.

You need to generate (author, otherauthor) pairs so you can add them as edges. The typical way to do that would be a nested iteration:

for thing in things:
    for otherthing in things:
        add_edge(thing, otherthing)

This is a naïve implementation that includes self-loops (giving an author an edge connecting himself to himself), which you may or may not want; it also includes both (1,2) and (2,1), which if you're doing an undirected graph is redundant. (In Python 2.6, the built-in permutations generator also does this.) Here's a generator that fixes these things:

def pairs(l):
    for i in range(len(l)-1):
        for j in range(i+1, len(l)):
            yield l[i], l[j]

I've not used NetworkX, but looking at the doc it seems to say you can call add_node on the same node twice (with nothing happening the second time). If so, you can discard the dict you were using to try to keep track of what nodes you'd inserted. Also, it seems to say that if you add an edge to an unknown node, it'll add that node for you automatically. So it should be possible to make the code much shorter:

for conference in dom.getElementsByTagName('conference'):
    var conf_name= node.getAttribute('name')
    for paper in conference.getElementsByTagName('paper'):
        authors= paper.getElementsByTagName('author')
        auth_names= [author.firstChild.data.split('(')[0] for author in authors]

        # Note author's conference attendance
        #
        for auth_name in auth_names:
            G.add_edge(auth_name, conf_name)

        # Note combinations of authors working on same paper
        #
        for auth_name, other_name in pairs(auth_names):
            G.add_edge(auth_name, otherauth_name)

回复收藏 0 原文