Parsing/extracting data from an API XML feed with Python and Beautiful Soup

Posted 2024-12-09 10:43:30

Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BeautifulSoup, but with this feed I can't seem to get the data I want no matter what I try. I spent hours reading the Beautiful Soup documentation and still can't figure it out. The XML is structured like this:

<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <current>
        ....
    </current>
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <body>...</body>
            <url>...</url>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <features>...</features>
                <mileage>32637</mileage>
                <price>19999</price>
                <trim>XL</trim>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>      
        <element>.. Next car here ..</element>
        <element>..Aaaand next one here ..</element>    
    </listings>
    <meta>...</meta>
</oodle_response>

I first make a request with urllib to grab the feed and save to a local file. Then:

from BeautifulSoup import BeautifulStoneSoup

xml = open("temp.xml", "r")
soup = BeautifulStoneSoup(xml)

After that I'm not sure what to do. I've tried a lot of things, but everything seems to return far more junk than I want, which makes it difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin of each listing. How do I get these, and how do I speed the process up with a loop? Ideally I wanted a for loop like:

for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...

I know that obviously doesn't work, but I want something that fetches the info for one listing, stores it in a list, and then moves on to the next ad. Appreciate the help, guys.

柳絮泡泡 2024-12-16 10:43:30

You could do something like this:

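# Iterate only over the direct <element> children of <listings>;
# soup('element') would also match the <element> tags nested under <images>.
for element in soup.listings.findAll('element', recursive=False):
    listing_id = element.id.text
    title = element.title.text
    mileage = element.attributes.mileage.text
    price = element.attributes.price.text
    year = element.attributes.year.text
    vin = element.attributes.vin.text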
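
To store each ad in a list, as the question asks, the same loop can be sketched with the standard library's `xml.etree.ElementTree` in place of Beautiful Soup (a swapped-in parser, chosen here only so the example runs without third-party packages). The `SAMPLE_XML` data and the `extract_listings` helper are hypothetical and simply mirror the structure shown in the question; the real feed may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the structure shown in the question.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <images>
                <element>img1</element>
                <element>img2</element>
            </images>
            <attributes>
                <mileage>32637</mileage>
                <price>19999</price>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>
    </listings>
</oodle_response>"""


def extract_listings(xml_bytes):
    """Return one dict per listing (hypothetical helper, not part of any API)."""
    root = ET.fromstring(xml_bytes)
    cars = []
    # "listings/element" matches only the direct <element> children of
    # <listings>, so the <element> tags under <images> are skipped.
    for element in root.findall("listings/element"):
        cars.append({
            "id": element.findtext("id"),
            "title": element.findtext("title"),
            # findtext returns None when a field is missing from a listing.
            "mileage": element.findtext("attributes/mileage"),
            "price": element.findtext("attributes/price"),
            "year": element.findtext("attributes/year"),
            "vin": element.findtext("attributes/vin"),
        })
    return cars


cars = extract_listings(SAMPLE_XML)
for car in cars:
    print(car["id"], car["title"], car["price"])
```

All values come back as strings (or `None` for a missing field), so numeric fields such as price or mileage would need an explicit `int(...)` conversion before any arithmetic.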
