Beautiful Soup parsing HTML that contains JSON

Posted 2025-02-03 08:53:20

I'm using Python 3 and trying to parse NWS weather alerts, which appear to contain JSON objects, with Beautiful Soup. I got this far: BS outputs this (snippet from the top of the output)

>>> soup.body
<body><p>{
    "@context": [
        "https://geojson.org/geojson-ld/geojson-context.jsonld",
        {
            "@version": "1.1",
            "wx": "https://api.weather.gov/ontology#",
            "@vocab": "https://api.weather.gov/ontology#"
        }
    ],
    "type": "FeatureCollection",
    "features": [
        {
            "id": "https://api.weather.gov/alerts/urn:oid:2.49.0.1.840.0.957a95b11de1ec54b622b137ccf43a662d44061f.001.1",
            "type": "Feature",
            "geometry": null,
            "properties": ....(snip)

From what I understand the "@context" tag indicates that the subsequent lines within braces are JSON data; is that correct?

How do I get at the elements inside the square and curly braces?

BS apparently has a JSON parser, but I haven't found any good how-to tutorials for someone who's a noob at this.

Pointers would be most welcome.

Comments (1)

×纯※雪 2025-02-10 08:53:20

The question should be improved with some additional details. As mentioned in the comments, the response does not look like plain HTML but rather JSON.

  1. The HTML in your soup is just wrapping added by the 'lxml' parser.

  2. You do not need Beautiful Soup for that task, and no, it is not a JSON parser.

  3. Instead, use .json() on your response (see the requests docs).

Example
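import requests
...
# Fetch the response and decode its JSON body into Python dicts and lists
json_data = requests.get('YOUR URL').json()

# 'features' is a list; each item is a dict describing one alert
for i in json_data['features']:
    print(i['id'])

...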
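To address the part of the question about getting at the elements inside the square and curly braces, here is a minimal sketch using only the standard-library json module. It assumes the response text is plain JSON; the sample string is a trimmed copy of the snippet in the question (the elided "properties" part is left out), and the variable names are illustrative only. After decoding, curly braces become Python dicts and square brackets become lists, so "@context" is simply one more key rather than a marker that JSON follows.

```python
import json

# Trimmed sample mirroring the snippet in the question; the real response
# has more keys (such as "properties"), which are deliberately left out here.
raw_text = """
{
    "@context": [
        "https://geojson.org/geojson-ld/geojson-context.jsonld",
        {
            "@version": "1.1",
            "wx": "https://api.weather.gov/ontology#",
            "@vocab": "https://api.weather.gov/ontology#"
        }
    ],
    "type": "FeatureCollection",
    "features": [
        {
            "id": "https://api.weather.gov/alerts/urn:oid:2.49.0.1.840.0.957a95b11de1ec54b622b137ccf43a662d44061f.001.1",
            "type": "Feature",
            "geometry": null
        }
    ]
}
"""

# Curly braces decode to dicts, square brackets decode to lists
data = json.loads(raw_text)

# "@context" is an ordinary key whose value is a list
context = data["@context"]
context_url = context[0]        # first list item is a string
vocab = context[1]["@vocab"]    # second list item is a dict

# "features" is a list of dicts, one per alert
first_feature = data["features"][0]
feature_ids = [feature["id"] for feature in data["features"]]

print(data["type"])
print(context_url)
print(vocab)
print(first_feature["geometry"])  # JSON null is decoded as Python None
print(feature_ids)
```

The dict returned by .json() in the example above can be indexed the same way. If the text has already gone through an HTML parser, some characters may have been altered, so decoding the raw response is the safer route.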