How do I get the first item of an <ol> with Beautiful Soup, make it a JSON key, and make the rest of the items its list value?

Posted on 2025-02-08 15:00:15

I am trying to add a separate key-value pair for each set of a show (set1, set2, encore) scraped from setlist.fm, instead of only a flat list of songs with no separation. What I cannot figure out is how to access the elements that name each set and then append the songs that follow, until the next set begins. Here is the HTML I am accessing:
[image: HTML code from setlist.fm]

Currently, my JSON file looks like this:

```json
{
    "artist": "Sample Artist",
    "day": 20,
    "month": 1,
    "songs": ["Song A", "Song B", "Song C"],
    "tour": "2000 U.S. Tour",
    "venue": "Sample Venue, Atlanta, GA, USA",
    "year": 2000
}
```

Whereas I want it to look like this:

```json
{
    "artist": "Sample Artist",
    "day": 20,
    "month": 1,
    "songs": ["Song A", "Song B", "Song C"],
    "set1": ["Song A"],
    "set2": ["Song B"],
    "encore": ["Song C"],
    "tour": "2000 U.S. Tour",
    "venue": "Sample Venue, Atlanta, GA, USA",
    "year": 2000
}
```

Here is the code I use to generate the song list in the JSON; I am not sure how to get the individual sets:

```python
def getConcertData(i, url, concerts):
    try:
        # getSoup is the asker's own helper (not shown in the question)
        soup = getSoup(url)

        dateBlock = soup.find_all("div", {"class": "dateBlock"})[0]
        infoContainer = soup.find_all("div", {"class": "infoContainer"})[0]
        headLineDiv = infoContainer.find_all("div", {"class": "setlistHeadline"})[0]
        setlistDiv = soup.find_all("div", {"class": "setlistList"})[0]

        # removed unrelated code for question
        # (artist, year, month, day, venue and tour are assigned there)

        # Flat list of every song, in the order it appears on the page
        songs = []
        for a in setlistDiv.find_all("a", {"class": "songLabel"}):
            songs.append(a.getText().strip())

        print(str(year) + "." + str(month).zfill(2) + "." + str(day).zfill(2) + ": " + venue)

        data = dict()
        data["artist"] = artist
        data["year"] = year
        data["month"] = month
        data["day"] = day
        data["venue"] = venue
        data["tour"] = tour
        data["songs"] = songs
        # data["set1"] = 0
        # data["set2"] = 0
        # data["encore"] = 0

        concerts[i] = data
    except Exception as e:
        # The question does not show its except clause; this is a placeholder
        print(f"Failed to parse {url}: {e}")
```
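One way to "append the songs after a set heading until the next heading" is a single pass over the list items in document order, remembering the most recent heading. The sketch below is a hypothetical helper, not part of the question or of setlist.fm: it assumes, based on the selectors used in the answer below, that headings are `<li class="highlight">` items and songs are `<li class="song">` items containing an `<a class="songLabel">`. `SAMPLE_HTML` is a simplified stand-in for the real page, and the key format (`set1`, `set2`, `encore`) is derived from the heading text under that assumption.

```python
from bs4 import BeautifulSoup


def heading_to_key(text):
    # "Set 1:" -> "set1", "Encore:" -> "encore" (assumed heading format)
    return "".join(text.strip(" :").split()).lower()


def split_sets(setlist_div, default_key="set1"):
    """Group songs under the most recent section heading (hypothetical helper).

    Assumes headings are <li class="highlight"> items and songs are
    <li class="song"> items with an <a class="songLabel">, in document order.
    """
    songs = []
    sets = {}
    current = default_key  # used if a song appears before any heading
    for li in setlist_div.find_all("li"):
        classes = li.get("class", [])
        if "highlight" in classes:
            # A new heading starts a new group
            current = heading_to_key(li.get_text(strip=True))
        elif "song" in classes:
            label = li.find("a", class_="songLabel")
            if label is not None:
                name = label.get_text(strip=True)
                songs.append(name)
                sets.setdefault(current, []).append(name)
    return songs, sets


# Simplified, hypothetical stand-in for the setlist.fm markup
SAMPLE_HTML = """
<div class="setlistList">
  <ol>
    <li class="highlight">Set 1:</li>
    <li class="song"><a class="songLabel">Song A</a></li>
    <li class="highlight">Set 2:</li>
    <li class="song"><a class="songLabel">Song B</a></li>
    <li class="highlight">Encore:</li>
    <li class="song"><a class="songLabel">Song C</a></li>
  </ol>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
songs, sets = split_sets(soup.find("div", class_="setlistList"))
print(songs)  # ['Song A', 'Song B', 'Song C']
print(sets)   # {'set1': ['Song A'], 'set2': ['Song B'], 'encore': ['Song C']}
```

Inside `getConcertData`, the `songs` loop could then be replaced by `songs, sets = split_sets(setlistDiv)`, followed by `data.update(sets)` after `data["songs"] = songs`, so the flat list and the per-set lists always come from the same pass.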
    


Comments (1)

柠檬色的秋千 2025-02-15 15:00:15

If I understand you correctly, you want to "group" songs to their sections:

```python
import requests
from bs4 import BeautifulSoup


url = "https://www.setlist.fm/setlist/phish/2022/ruoff-home-mortgage-music-center-noblesville-in-3b4e5a7.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")


out = {}
out["artist"] = soup.h1.a.get_text(strip=True)
out["month"] = soup.select_one(".month").text
out["day"] = soup.select_one(".day").text
out["year"] = soup.select_one(".year").text
out["venue"] = soup.select_one('a[href*="/venue/"]').text

for li in soup.select(".setlistList li.song"):
    song_name = li.a.get_text(strip=True)
    section = (
        li.find_previous("li", class_="highlight")
        .get_text(strip=True)
        .strip(" :")
    )

    out.setdefault("songs", []).append(song_name)
    out.setdefault(section, []).append(song_name)

print(out)
```

Prints (output reformatted for readability):

```
{
    "artist": "Phish",
    "month": "Jun",
    "day": "5",
    "year": "2022",
    "venue": "Ruoff Home Mortgage Music Center, Noblesville, IN, USA",
    "songs": [
        "While My Guitar Gently Weeps",
        "My Soul",
        "Rift",
        "Horn",
        "Wombat",
        "Evolve",
        "Guyute",
        "Limb by Limb",
        "Mercury",
        "The Moma Dance",
        "Sand",
        "Sigma Oasis",
        "Twenty Years Later",
        "The Mango Song",
        "Rise/Come Together",
        "Free",
        "Grind",
        "Slave to the Traffic Light",
    ],
    "Set 1": [
        "While My Guitar Gently Weeps",
        "My Soul",
        "Rift",
        "Horn",
        "Wombat",
        "Evolve",
        "Guyute",
        "Limb by Limb",
        "Mercury",
        "The Moma Dance",
    ],
    "Set 2": [
        "Sand",
        "Sigma Oasis",
        "Twenty Years Later",
        "The Mango Song",
        "Rise/Come Together",
        "Free",
    ],
    "Encore": ["Grind", "Slave to the Traffic Light"],
}
```
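In the answer's loop, `li.find_previous("li", class_="highlight")` returns `None` when no heading item precedes a song, and calling `.get_text()` on `None` raises `AttributeError`. The sketch below guards that case with a fallback section name. `section_for`, the `"Set 1"` default and `SAMPLE_HTML` (a setlist with no heading items) are hypothetical, chosen only to illustrate the guard; the selectors are the ones the answer uses.

```python
from bs4 import BeautifulSoup

# Hypothetical setlist markup with songs but no heading items
SAMPLE_HTML = """
<div class="setlistList">
  <ol>
    <li class="song"><a class="songLabel">Song A</a></li>
    <li class="song"><a class="songLabel">Song B</a></li>
  </ol>
</div>
"""


def section_for(li, default="Set 1"):
    """Heading text of the closest preceding <li class="highlight">,
    or `default` when find_previous finds none and returns None."""
    header = li.find_previous("li", class_="highlight")
    if header is None:
        return default
    return header.get_text(strip=True).strip(" :")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
out = {}
for li in soup.select(".setlistList li.song"):
    song_name = li.a.get_text(strip=True)
    out.setdefault("songs", []).append(song_name)
    out.setdefault(section_for(li), []).append(song_name)

print(out)  # {'songs': ['Song A', 'Song B'], 'Set 1': ['Song A', 'Song B']}
```

On pages that do have headings, `section_for` behaves like the original expression, so this is a drop-in change to the `section = (...)` assignment.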