BeautifulSoup extract() problem

Posted on 2024-11-25 07:00:52

GENERAL XML OUTLINE:

<dashboards>
  <dashboard name="S1">
    <repository-location derived-from='http://dataviz.win.compete.com/workbooks/OTCSurvey_06_15_11_16_54/RT4?rev=' id='RT4' path='/workbooks/RetailFootwear' revision='' />
    <style>
    </style>
    <zones>
      <zone h='92975' id='4' param='horz' type='layout-flow' w='87842' x='12158' y='7025'>
        <zone h='92975' id='2' type='layout-basic' w='77953' x='12158' y='7025'>
          <zone h='92975' id='1' name='RT4_stk_bar_grid' w='77953' x='12158' y='7025'>
          </zone>
        </zone>
        <zone fixed-size='170' h='92975' id='3' is-fixed='true' param='vert' type='layout-flow' w='9889' x='90111' y='7025'>
          <zone h='13739' id='6' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:response:nk]' type='color' w='9889' x='90111' y='7025'>
          </zone>
        </zone>
      </zone>
      <zone h='7025' id='7' name='Q-RT4' w='87842' x='12158' y='0'>
      </zone>
      <zone h='100000' id='9' param='vert' type='layout-flow' w='12158' x='0' y='0'>
        <zone h='6818' id='5' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:crosstab_group:nk]' type='filter' w='12158' x='0' y='0'>
        </zone>
        <zone h='31921' id='10' name='RT4_stk_bar_grid' param='[mysql.40611.854150011575].[none:question_base:nk]' type='filter' w='12158' x='0' y='6818'>
        </zone>
      </zone>
    </zones>
  </dashboard>
  <dashboard name="S2">
    <more tags>
  </dashboard>
</dashboards>

Here is the workflow for my BeautifulSoup project. I find all the dashboard elements and use extract() to remove every one that doesn't have "s1" as the value of its "name" attribute (compared case-insensitively).
The problem is that ALL of the dashboard elements seem to be removed from the final soup before it is written out.
Am I doing something wrong?
Take my word for it that there IS a dashboard element with name="S1".

# BeautifulStoneSoup is the XML parser class from BeautifulSoup 3
from BeautifulSoup import BeautifulStoneSoup

# load the XML
workbook = open("C:\\Users\\rabdel.WINCMPT\\Documents\\Retail Footwear.twb")
soup = BeautifulStoneSoup(workbook, selfClosingTags=['repository-location', 'style'])
workbook.close()

# get all "dashboard" elements (children of "dashboards")
d = soup.findAll('dashboard')

# extract every dashboard whose name is not "s1" (case-insensitive)
for child in d:
    if child.get("name", "").lower() != "s1":
        child.extract()

# write out the results
modified_workbook = open("C:\\Users\\rabdel.WINCMPT\\Documents\\Footwear.xml", "w")
modified_workbook.write(soup.prettify())
modified_workbook.close()

MORE INFO:
What's most interesting is that if I write the dashboards (parent) element to a file before and after the extract, I get EXACTLY what I expect. The problem is that the soup itself seems to be different.

Comments (2)

再可℃爱ぅ一点好了 2024-12-02 07:00:52

Your code looks alright. It is impossible to tell why you don't get the expected result without seeing your XML file.

You might want to add a debug line to your loop, such as:

for child in d:
    name = child.get('name', '').lower()
    print('Name: "{0}"; Equal to "s1": {1}'.format(name, name == 's1'))

...and make sure that there really is a tag with the name you are looking for!
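
If you want to try the same check without the real workbook, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulStoneSoup, and the SAMPLE string and the report_names helper are made up for illustration, not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the workbook: two dashboards, trimmed down.
SAMPLE = """<dashboards>
  <dashboard name="S1"><zones/></dashboard>
  <dashboard name="S2"><zones/></dashboard>
</dashboards>"""


def report_names(root, wanted="s1"):
    """Return (lowercased name, matches wanted) for every dashboard element."""
    report = []
    for dash in root.iter("dashboard"):
        name = dash.get("name", "").lower()
        report.append((name, name == wanted))
    return report


root = ET.fromstring(SAMPLE)
names = report_names(root)
for name, matches in names:
    print('Name: "{0}"; Equal to "s1": {1}'.format(name, matches))
```

If no pair comes back with True for your real file, the comparison is the problem rather than extract().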

幸福不弃 2024-12-02 07:00:52

This actually does not seem to be a BeautifulSoup problem. The problem is that the application (Tableau) does not recognize the generated XML as valid XML.
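
One way to narrow this down is to re-parse the written output with a strict XML parser before handing it to the application. The sketch below is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, the helper names is_well_formed and keep_only are hypothetical, and passing this check only shows the text is well-formed XML, not that Tableau will accept it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def keep_only(xml_text, wanted="s1"):
    """Drop every dashboard whose name (case-insensitive) is not `wanted`."""
    root = ET.fromstring(xml_text)
    # Snapshot the parents first so the tree is not modified while iterating.
    for parent in list(root.iter("dashboards")):
        for dash in list(parent.findall("dashboard")):
            if dash.get("name", "").lower() != wanted:
                parent.remove(dash)
    return ET.tostring(root, encoding="unicode")


# Hypothetical trimmed-down input, not the real workbook.
GOOD = '<dashboards><dashboard name="S1"/><dashboard name="S2"/></dashboards>'
BROKEN = '<dashboards><dashboard name="S1"></dashboards>'

result = keep_only(GOOD)
print(result)
print(is_well_formed(result), is_well_formed(BROKEN))
```

Running is_well_formed on the file written from soup.prettify() would at least tell you whether the output is rejected for being malformed or for some application-specific reason.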
