Scraping product information from a table using Python
I'm unable to scrape the ingredients from the table with my code. Please help me fix it. I only want the ingredient names as output. I've also provided an image of the ingredients table; I only want the ingredient names marked with a red circle.
url = "https://mamaearth.in/product/mamaearth-me-deo-for-a-scent-that-s-unique-to-you-120-ml"
table1 = soup.find('div', class_='CmsItemRevamp-sc-1moss4z-0 eQqUUy CMSContent').text.strip()
table1
mydata = pd.DataFrame(columns=headers)
for j in table1.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
The information is in the HTML, but it is rendered using JavaScript. As such, you need to extract it yourself from the JSON contained inside a <script> section of the HTML. This could be done as follows:
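A minimal sketch of this approach, run against a small inline page rather than the live site. The `SAMPLE_HTML` content, the `product` and `ingredients_html` keys, and the helper name `extract_ingredient_names` are all assumptions made for illustration; the real page's JSON will be laid out differently, and you would fetch the HTML first (for example with `requests.get(url).text`).

```python
import json
from bs4 import BeautifulSoup

# Stand-in for the downloaded page source. The JSON layout is invented;
# inspect the real page to find the actual keys.
SAMPLE_HTML = """
<html><body>
<div id="root"></div>
<script id="page-data" type="application/json">
{"product": {"name": "Deo", "ingredients_html": "<table><tr><th>Ingredient</th><th>Purpose</th></tr><tr><td>Aqua</td><td>Solvent</td></tr><tr><td>Glycerin</td><td>Humectant</td></tr></table>"}}
</script>
</body></html>
"""


def extract_ingredient_names(html):
    """Return the first-column text of the table embedded in the page JSON."""
    soup = BeautifulSoup(html, "html.parser")
    # The data lives in a JSON <script> tag, not in the visible markup.
    script = soup.find("script", type="application/json")
    data = json.loads(script.string)
    # Hypothetical location of the ingredients table inside the JSON.
    embedded = data["product"]["ingredients_html"]
    # Second BeautifulSoup call: the JSON value is itself HTML.
    table = BeautifulSoup(embedded, "html.parser")
    names = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if cells:
            names.append(cells[0].get_text(strip=True))
    return names


names = extract_ingredient_names(SAMPLE_HTML)
print(names)
```

Taking only the first `<td>` of each row is what limits the result to the ingredient names and leaves out the other columns.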
I suggest you print(data) to see all the available information that is returned. The hardest part is finding the location inside the JSON structure for what you need. This would give you the following output:
Note: Some of the JSON values contain HTML, which is why a second call to BeautifulSoup is used to parse this embedded HTML.
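Since locating the right place inside the JSON tends to be the hard part, a small recursive search can narrow it down instead of reading the whole printed structure by eye. The helper name `find_key_paths` and the sample JSON below are hypothetical, intended only as a sketch of the idea.

```python
import json


def find_key_paths(node, needle, path=()):
    """Yield the path to every dict key whose name contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if needle.lower() in str(key).lower():
                yield here
            # Keep descending: matches may be nested at any depth.
            yield from find_key_paths(value, needle, here)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, needle, path + (index,))


# Invented structure standing in for the JSON loaded from the page.
data = json.loads(
    '{"props": {"items": [{"title": "Deo"}, {"ingredients_html": "<table></table>"}]}}'
)
paths = list(find_key_paths(data, "ingredient"))
print(paths)
```

Each returned path is a tuple of keys and list indexes that can be followed step by step, for example `data["props"]["items"][1]["ingredients_html"]`.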
An alternative approach would be to use something like Selenium to control your browser. This would give you the HTML after JavaScript has run, as you see it in the browser's element inspector. The downside is that it is much slower and more resource intensive.
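A rough sketch of the Selenium route, assuming Selenium 4 with a local Chrome installation. The function name `fetch_rendered_html` is hypothetical, and waiting for a `table` element is an assumption based on the ingredients being shown in a table; the real page may need a different wait condition.

```python
def fetch_rendered_html(url, wait_seconds=10):
    """Load `url` in headless Chrome and return the HTML after scripts have run."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the page is ready once a <table> is present.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned string can be passed to BeautifulSoup in the usual way, at the cost of starting a full browser for every page.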
To output the ingredients on one line:
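One way to do this, assuming the ingredient names have already been collected into a list of strings. The values in `names` are made-up placeholders, not the product's actual ingredients.

```python
# Placeholder values; in practice this list comes from the scraping step.
names = ["Aqua", " Glycerin ", "", "Sodium Bicarbonate"]

# Trim stray whitespace, drop empty entries, and join with commas.
line = ", ".join(name.strip() for name in names if name.strip())
print(line)
```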