Parsing HTML with selectorgadget.com

Posted on 2024-07-13 20:45:40

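How can I use Beautiful Soup and SelectorGadget to scrape a website? For example, I have a page (a Newegg product) and I want my script to return all of that product's specifications (the ones shown when you click "Specifications"), i.e. Intel, Desktop, …, 2.4GHz, 1066MHz, …, 3 years limited.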

After running SelectorGadget I got the string .desc

How do I use it?

Thanks :)

梦幻的味道 2024-07-20 20:45:40

Inspecting the page, I can see that the specifications are placed in a div with the ID pcraSpecs:

<div id="pcraSpecs">
  <script type="text/javascript">...</script>
  <TABLE cellpadding="0" cellspacing="0" class="specification">
    <TR>
      <TD colspan="2" class="title">Model</TD>
    </TR>
    <TR>
      <TD class="name">Brand</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Intel'));</script></TD>
    </TR>
    <TR>
      <TD class="name">Processors Type</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Desktop'));</script></TD>    
    </TR>
    ...
  </TABLE>
</div>

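desc is the class of the table cells that hold the spec values; the matching labels (Brand, Processors Type, …) are in cells with the class name.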
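SelectorGadget produces a CSS selector, and Beautiful Soup 4's select() method accepts CSS selectors, so the .desc string can be passed in directly. A minimal sketch, assuming a trimmed copy of the markup quoted above (the live Newegg page may no longer look like this) and Python's built-in html.parser:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted above (assumption: the live page may differ).
HTML = """
<div id="pcraSpecs">
  <TABLE cellpadding="0" cellspacing="0" class="specification">
    <TR><TD colspan="2" class="title">Model</TD></TR>
    <TR>
      <TD class="name">Brand</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Intel'));</script></TD>
    </TR>
    <TR>
      <TD class="name">Processors Type</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Desktop'));</script></TD>
    </TR>
  </TABLE>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() takes a CSS selector, so the ".desc" string from SelectorGadget works
# as-is; prefixing it with #pcraSpecs limits matches to the spec table.
desc_cells = soup.select("#pcraSpecs .desc")

for cell in desc_cells:
    # Each value is emitted by an inline <script>, so print that script's source.
    print(cell.script.string)
```

Scoping the selector to #pcraSpecs keeps it from picking up other elements on the page that happen to use the same class.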

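What you want to do is extract the contents of this table. Note that each value is written by an inline script (document.write(neg_specification_newline('Intel'))), so the desc cell contains a script element rather than plain text.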

soup.find(id="pcraSpecs").findAll("td") should get you started.
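Because each value is emitted by document.write(neg_specification_newline(...)) rather than stored as plain text, reading a td's text is not enough; the value has to be pulled out of the script source. The sketch below pairs each name cell with its desc cell in the same row and extracts the quoted argument with a regular expression. It rests on assumptions: cell_value, parse_specs and WRITE_ARG are hypothetical helpers for illustration, the pattern assumes values never contain an escaped quote, and downloading the page is left out.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical pattern: captures the quoted argument of
# neg_specification_newline('...') as it appears in the quoted markup.
# Assumes the value never contains an escaped single quote.
WRITE_ARG = re.compile(r"neg_specification_newline\('(.*?)'\)")


def cell_value(td):
    """Return the readable value of a spec cell (hypothetical helper)."""
    script = td.find("script")
    if script is not None and script.string:
        match = WRITE_ARG.search(script.string)
        if match:
            return match.group(1)
    # Fall back to plain text for cells that are not filled in by JavaScript.
    return td.get_text(strip=True)


def parse_specs(html):
    """Return (name, value) pairs from the #pcraSpecs table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="pcraSpecs")
    if container is None:
        return []
    specs = []
    # find_all is the current Beautiful Soup 4 spelling of findAll.
    for row in container.find_all("tr"):
        name = row.find("td", class_="name")
        desc = row.find("td", class_="desc")
        # Header rows such as "Model" use class "title" and have no pair; skip them.
        if name is None or desc is None:
            continue
        specs.append((name.get_text(strip=True), cell_value(desc)))
    return specs


# Trimmed copy of the markup quoted in the answer.
SAMPLE = """
<div id="pcraSpecs">
  <TABLE cellpadding="0" cellspacing="0" class="specification">
    <TR><TD colspan="2" class="title">Model</TD></TR>
    <TR>
      <TD class="name">Brand</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Intel'));</script></TD>
    </TR>
    <TR>
      <TD class="name">Processors Type</TD>
      <TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Desktop'));</script></TD>
    </TR>
  </TABLE>
</div>
"""

for name, value in parse_specs(SAMPLE):
    print(f"{name}: {value}")
# Brand: Intel
# Processors Type: Desktop
```

Pairing cells row by row, instead of collecting all td elements into one flat list, keeps each label attached to its value and makes it easy to skip section headers.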
