Parsing/extracting data from an API XML feed with Python and Beautiful Soup
Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BS, but with this feed I can't seem to get the data I want no matter what I try. I read the Soup documentation for hours and still can't figure it out. The XML is structured like this:
<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
<current>
....
</current>
<listings>
<element>
<id>8453458345</id>
<title>2009 Toyota Avalon XL Sedan 4D</title>
<body>...</body>
<url>...</url>
<images>
<element>...</element>
<element>...</element>
</images>
<attributes>
<features>...</features>
<mileage>32637</mileage>
<price>19999</price>
<trim>XL</trim>
<vin>9234234234234234</vin>
<year>2009</year>
</attributes>
</element>
<element>.. Next car here ..</element>
<element>..Aaaand next one here ..</element>
</listings>
<meta>...</meta>
</oodle_response>
I first make a request with urllib to grab the feed and save to a local file. Then:
from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 import

xml = open("temp.xml", "r")  # the feed saved earlier with urllib
soup = BeautifulStoneSoup(xml)
Then I'm not sure what to do next. I've tried a lot of things, but everything seems to throw back way more junk than I want, which makes it difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin. So how do I get these and speed up the process with a loop? Ideally I wanted a for loop like:
for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...
I know that obviously doesn't work, but I want something that fetches the info for each listing, stores it in a list, then moves on to the next ad. Appreciate the help, guys.
Comments (1)
You could do something like this:
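For example, here is a minimal sketch rather than a definitive implementation. It assumes the newer `bs4` package instead of the `BeautifulSoup` 3 import used in the question, and the built-in `html.parser` backend so nothing else has to be installed. The names `extract_listings` and `FIELDS`, and the second sample listing, are made up for illustration. The key part is `recursive=False`, which keeps the loop on the direct `<element>` children of `<listings>` and skips the `<element>` tags nested under `<images>`.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample shaped like the feed in the question.
# The second listing is invented purely for illustration.
SAMPLE_XML = """<oodle_response stat="ok">
<listings>
<element>
<id>8453458345</id>
<title>2009 Toyota Avalon XL Sedan 4D</title>
<images>
<element>...</element>
<element>...</element>
</images>
<attributes>
<mileage>32637</mileage>
<price>19999</price>
<vin>9234234234234234</vin>
<year>2009</year>
</attributes>
</element>
<element>
<id>1111111111</id>
<title>Hypothetical second car</title>
<attributes>
<mileage>50000</mileage>
<price>9999</price>
<year>2005</year>
</attributes>
</element>
</listings>
</oodle_response>"""

# The fields the question asks for.
FIELDS = ("id", "title", "mileage", "price", "year", "vin")


def extract_listings(xml_text):
    """Return one dict per listing, with None for any missing field."""
    # Assumption: html.parser is used only to avoid an extra dependency.
    soup = BeautifulSoup(xml_text, "html.parser")
    listings = soup.find("listings")
    if listings is None:
        return []
    cars = []
    # recursive=False: only direct children of <listings>,
    # so the <element> tags inside <images> are not picked up.
    for element in listings.find_all("element", recursive=False):
        car = {}
        for field in FIELDS:
            tag = element.find(field)  # first matching descendant, or None
            car[field] = tag.get_text(strip=True) if tag is not None else None
        cars.append(car)
    return cars


cars = extract_listings(SAMPLE_XML)
for car in cars:
    print(car)
```

To run it on the saved feed, pass `open("temp.xml").read()` to `extract_listings`. If `lxml` is installed, `BeautifulSoup(xml_text, "xml")` is the dedicated XML mode and is probably the better fit for a real feed.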