Limiting Beautiful Soup
I am trying to convert XML to JSON using Beautiful Soup, for an XML file with the structure below:
<H3 id="LinkTarget_311">ISSS.1.A1Acceptance of Overall(B)</H3>
<Standard>An organisation's Top Management.</Standard>
<Standard>The Top Management MUST define.</Standard>
<H3 id="LinkTarget_3116">ISS.2.A2Acceptance of Overall(C)</H3>
<Standard>An organisation's Top.</Standard>
<Standard>Top Management.</Standard>
<H3 id="LinkTarget_316">ISS.2.2Acceptance of Overall(D)</H3>
<Standard>An organisation's Top resource.</Standard>
<Standard>Top Management resource.</Standard>
......
.......
The code I wrote is below:
import re

extract2 = re.compile(r"[A-Z][a-z]\w*")
control_ids = {}
# bs_content is the already-parsed BeautifulSoup document
header = bs_content.find_all('h3', {'id': True})
sub = bs_content.find_all('standard')
# zip pairs the n-th h3 with the n-th standard in the whole document
for i, j in zip(header, sub):
    req_id = str.strip(re.split(extract2, i.text)[0])
    control_ids[req_id] = j.text
The result is too long, so I am not pasting all of it:
Expected result: the text of each H3 tag paired with the text of the 'Standard' tags that follow it
[{ISSS.1.A1Acceptance of Overall(B) : 'An organisation's Top Management.Top Management.'} , {ISS.2.A2Acceptance of Overall(C):'An organisation's Top.Top Management.'},....]
Comments (1)
Try something simpler:
Output, based on your sample html:
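A minimal sketch of one simpler approach, under some assumptions: the markup is parsed with `html.parser` (which lowercases tag names, so `H3` becomes `h3`), every `Standard` tag is a direct sibling of its `H3`, and the texts are joined with a single space. The function name `pair_headers_with_standards` is made up for illustration. Instead of zipping two flat lists, it walks the siblings after each `h3` and stops at the next `h3`:

```python
import json
from bs4 import BeautifulSoup

sample = """
<H3 id="LinkTarget_311">ISSS.1.A1Acceptance of Overall(B)</H3>
<Standard>An organisation's Top Management.</Standard>
<Standard>The Top Management MUST define.</Standard>
<H3 id="LinkTarget_3116">ISS.2.A2Acceptance of Overall(C)</H3>
<Standard>An organisation's Top.</Standard>
<Standard>Top Management.</Standard>
<H3 id="LinkTarget_316">ISS.2.2Acceptance of Overall(D)</H3>
<Standard>An organisation's Top resource.</Standard>
<Standard>Top Management resource.</Standard>
"""


def pair_headers_with_standards(markup):
    """Map each h3's text to the joined text of the standard tags after it."""
    # Assumption: html.parser, which lowercases tag names.
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for header in soup.find_all("h3", id=True):
        texts = []
        # True matches every tag, so plain text nodes are skipped.
        for sibling in header.find_next_siblings(True):
            if sibling.name == "h3":
                break  # the next header starts a new group
            if sibling.name == "standard":
                texts.append(sibling.get_text(strip=True))
        # Assumption: join with a space; use "" to concatenate directly.
        result[header.get_text(strip=True)] = " ".join(texts)
    return result


control_ids = pair_headers_with_standards(sample)
print(json.dumps(control_ids, indent=2))
```

This yields one dictionary keyed by the full H3 text, with each value holding all of the Standard texts up to the next H3. If a list of single-key dictionaries is needed, as in the expected result above, `[{k: v} for k, v in control_ids.items()]` would produce it.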