Getting some garbage values when scraping

Posted on 2025-02-10 20:49:31


Hi all, please check the code below, which uses bs4 to scrape the webpage.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.nfl.com/standings/league/2019/REG'
page = requests.get(url)
page.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(page.text, 'lxml')

# Subset the HTML to only the standings table we need
table = soup.find('table', {'summary': 'Standings - Detailed View'})

# Get all the column headers of the table
headers = [th.text.strip() for th in table.find_all('th')]

# Collect every data row (skipping the header row), then build the DataFrame once
rows = []
for row in table.find_all('tr')[1:]:
    cells = row.find_all('td')
    # Keep only the full club name; this fixes the formatting issue with the team names
    team = cells[0].find('div', class_='d3-o-club-fullname').text.strip()
    rows.append([team] + [td.text.strip() for td in cells[1:]])

df = pd.DataFrame(rows, columns=headers)
df.to_csv('F:/beautiful soup/tablefg.csv')

After running the code above, I get the values shown below.

[Screenshot of the output]

In this image, where the value should be 0 I am getting 2000, and I don't know why. It should read 03-03-0, but the output shows 03-03-2000.
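One possible cause, which the thread itself does not confirm, is that the CSV file actually contains the text `3-3-0` and the spreadsheet program that opens it (for example Excel) converts that text into a date, displaying it as 03-03-2000. A minimal way to check is to read the raw CSV with Python's `csv` module, which returns every cell as a plain string and never converts it to a date. In the sketch below, the function name `find_record_cells` and the win-loss-tie pattern are hypothetical choices for illustration.

```python
import csv
import re

# Hypothetical pattern for a win-loss-tie record such as "3-3-0"
# (an assumption about which cells are affected; adjust as needed).
RECORD = re.compile(r"^\d+-\d+-\d+$")


def find_record_cells(lines):
    """Return (row_index, column_index, raw_text) for every CSV cell that
    looks like a W-L-T record. `lines` is any iterable of CSV text lines,
    such as an open file object. The csv module yields cells as strings
    exactly as stored in the file, with no date conversion."""
    hits = []
    for r, row in enumerate(csv.reader(lines)):
        for c, cell in enumerate(row):
            if RECORD.match(cell):
                hits.append((r, c, cell))
    return hits
```

For example, open `'F:/beautiful soup/tablefg.csv'` with `newline=''` and `encoding='utf-8'` and pass the file object to `find_record_cells`. If the reported cells read `3-3-0`, the scraper wrote the right value and the 2000 comes from how the spreadsheet displays it; importing the CSV with that column set to Text, rather than opening the file directly, usually avoids the conversion.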


独享拥抱 2025-02-17 20:49:31


If you're using pandas, you don't need to make table parsing so difficult. You can simply do:

import pandas as pd

df = pd.read_html('https://www.nfl.com/standings/league/2019/REG')[0]
df.to_csv('F:/beautiful soup/tablefg.csv')
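The one-liner above takes whichever table comes first on the page via `[0]`. A minimal sketch, assuming the page still contains a table with the same `summary` attribute that the BeautifulSoup code searches for: the `attrs` parameter of `pandas.read_html` can select that table explicitly. The inline HTML, its column names, and its values are made-up stand-ins so the sketch runs offline; `read_html` also needs lxml (or BeautifulSoup plus html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the standings page; only the summary attribute
# is taken from the question's BeautifulSoup code.
html = """
<table summary="Standings - Detailed View">
  <thead><tr><th>Team</th><th>Record</th></tr></thead>
  <tbody><tr><td>New England Patriots</td><td>3-3-0</td></tr></tbody>
</table>
"""

# attrs filters tables by their HTML attributes, selecting the same table as
# soup.find('table', {'summary': ...}) instead of relying on position [0].
# Recent pandas versions expect literal HTML to be wrapped in StringIO.
tables = pd.read_html(StringIO(html), attrs={"summary": "Standings - Detailed View"})
df = tables[0]
print(df)
```

With the real page, the URL string can be passed in place of `StringIO(html)`, as in the answer above.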


About the author: 亢潮
