Beautiful Soup extract() problem
GENERAL XML OUTLINE:
<dashboards>
  <dashboard name="S1">
    <repository-location derived-from='http://dataviz.win.compete.com/workbooks/OTCSurvey_06_15_11_16_54/RT4?rev=' id='RT4' path='/workbooks/RetailFootwear' revision='' />
    <style>
    </style>
    <zones>
      <zone h='92975' id='4' param='horz' type='layout-flow' w='87842' x='12158' y='7025'>
        <zone h='92975' id='2' type='layout-basic' w='77953' x='12158' y='7025'>
          <zone h='92975' id='1' name='RT4_stk_bar_grid' w='77953' x='12158' y='7025'>
          </zone>
        </zone>
        <zone fixed-size='170' h='92975' id='3' is-fixed='true' param='vert' type='layout-flow' w='9889' x='90111' y='7025'>
          <zone h='13739' id='6' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:response:nk]' type='color' w='9889' x='90111' y='7025'>
          </zone>
        </zone>
      </zone>
      <zone h='7025' id='7' name='Q-RT4' w='87842' x='12158' y='0'>
      </zone>
      <zone h='100000' id='9' param='vert' type='layout-flow' w='12158' x='0' y='0'>
        <zone h='6818' id='5' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:crosstab_group:nk]' type='filter' w='12158' x='0' y='0'>
        </zone>
        <zone h='31921' id='10' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:question_base:nk]' type='filter' w='12158' x='0' y='6818'>
        </zone>
      </zone>
    </zones>
  </dashboard>
  <dashboard name="S2">
    <more tags>
  </dashboard>
</dashboards>
Here is the workflow for my Beautiful Soup project. I find all the dashboard elements and use extract() to remove every one whose "name" attribute is not "s1" (compared case-insensitively).
The problem is that ALL of the dashboard elements seem to be removed from the final soup before it is written out.
Am I doing something wrong?
Take my word that there IS a dashboard element with name="S1".
from BeautifulSoup import BeautifulStoneSoup  # assumes Beautiful Soup 3

# load the xml
workbook = open("C:\\Users\\rabdel.WINCMPT\\Documents\\Retail Footwear.twb")
soup = BeautifulStoneSoup(workbook, selfClosingTags=['repository-location', 'style'])
workbook.close()

# get all "dashboard" elements (children of "dashboards")
d = soup.findAll('dashboard')

# extract all but the one named "S1" (the comparison is case-insensitive)
for child in d:
    if child.get("name", "").lower() != "s1":
        child.extract()

# write out the results
modified_workbook = open("C:\\Users\\rabdel.WINCMPT\\Documents\\Footwear.xml", "w")
modified_workbook.write(soup.prettify())
modified_workbook.close()
MORE INFO:
What's most interesting is that if I write the dashboards (parent) element to a file before and after the extract, I get EXACTLY what I expect. The problem is that the soup itself seems to be different.
Comments (2)
Your code looks alright. It is impossible to tell why you don't get the expected result without seeing your XML file.
You might want to add a debug line to your loop that prints each tag's "name" attribute, and make sure that there really is a tag with the name you are looking for!
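The debug line the answer had in mind is not preserved here, so the following is only a sketch of the idea. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without Beautiful Soup installed; the SAMPLE document and the dashboard_names helper are hypothetical. Inside the original loop, the equivalent would be to print repr(child.get("name")) before the if test.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the workbook, for illustration only.
SAMPLE = """<workbook>
  <dashboards>
    <dashboard name="S1"><zones/></dashboard>
    <dashboard name="S2"><zones/></dashboard>
    <dashboard><zones/></dashboard>
  </dashboards>
</workbook>"""


def dashboard_names(xml_text):
    """Return the raw name attribute of every dashboard element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("name") for el in root.iter("dashboard")]


for name in dashboard_names(SAMPLE):
    # repr-style formatting makes stray whitespace or a missing attribute (None) visible.
    kept = (name or "").lower() == "s1"
    print("dashboard name=%r kept=%s" % (name, kept))
```

If no line reports kept=True, the filter is discarding every dashboard, which would match the symptom described in the question.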
This turns out not to be a BeautifulSoup problem after all. The problem is that the application (Tableau) does not recognize the generated XML as valid XML.
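One way to narrow this kind of problem down is to check whether the written file is at least well-formed XML before handing it to the application. The sketch below uses the standard library's xml.etree.ElementTree; the well_formedness_error helper and the two sample strings are hypothetical. Passing this check is necessary but may not be sufficient, since Tableau can impose its own requirements on a workbook that are not covered here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical inputs: one well-formed, one with a missing </dashboard>.
good = '<dashboards><dashboard name="S1"></dashboard></dashboards>'
bad = '<dashboards><dashboard name="S1"></dashboards>'

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

For the real file, the same helper could be called on the contents of Footwear.xml after it has been written.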