How do I get the value next to a label inside a table?

Posted on 2025-02-10 16:14:37

<td> <label for="cp_designation">Designation : </label></td> 
                                    
                                   <td> PARTNER</td>
                                </tr>                        
                        <tr>   
                                    <td><label for="cp_category">Category : </label></td> 
                                
                                   <td>SPORTS GEARS</td>
                                </tr>
                        <tr>
                               <td> <label for="cp_address">Address : </label></td> 
                            
                               <td> A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.</td>
                            </tr>
                        <tr>  
                               <td> <label for="cp_phone">Phone  : </label></td>
                            
                               <td> 4603886,</td>
                            </tr>
                            
soup = bs(page.content, "html.parser")
for i in soup:
  label = soup.find_all('label',text='Designation : ')
  print(label.find('tr'))

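Hi all, I want to extract the value that sits next to each label in the table. I have tried many things but failed to get the value. If anyone has experience with this, help would be highly appreciated. Thanks in advance.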



复古式 2025-02-17 16:14:37

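You can iterate over the tr tags with find_all, take the label in each row as the key, and call find_next on that label to reach the next td, whose text is the value. This collects the data as key-value pairs.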

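from bs4 import BeautifulSoup

# html is assumed to hold the page markup, e.g. page.content from the question
soup = BeautifulSoup(html, "html.parser")
dict1 = {}
for row in soup.find_all("tr"):
    label = row.find("label")
    if label is None:  # skip rows that contain no label
        continue
    # The value is in the first <td> that follows the label in document order
    dict1[label.get_text(strip=True)] = label.find_next("td").get_text(strip=True)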

Output:

{'Designation :': 'PARTNER',
 'Category :': 'SPORTS GEARS',
 'Address :': 'A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.',
 'Phone  :': '4603886,'}
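
As for why the attempt in the question fails: find_all returns a list-like ResultSet, which has no find method, and the tr is an ancestor of the label rather than a descendant, so searching downward from the label cannot reach it. Below is a minimal sketch of looking up a single value by the label's for attribute. It assumes the markup from the question sits inside a table; the HTML sample and the value_for helper are illustrative names, not part of the original post.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the surrounding <table> is assumed.
HTML = """
<table>
  <tr>
    <td> <label for="cp_designation">Designation : </label></td>
    <td> PARTNER</td>
  </tr>
  <tr>
    <td><label for="cp_category">Category : </label></td>
    <td>SPORTS GEARS</td>
  </tr>
</table>
"""


def value_for(soup, label_for):
    """Return the text of the cell following the label whose `for` attribute matches."""
    # `for` is a Python keyword, so the attribute is passed through `attrs`.
    label = soup.find("label", attrs={"for": label_for})
    if label is None:
        return None
    # find_next walks forward in document order, so it skips the label's own <td>.
    cell = label.find_next("td")
    return cell.get_text(strip=True) if cell else None


soup = BeautifulSoup(HTML, "html.parser")
designation = value_for(soup, "cp_designation")
print(designation)
```

Matching on the for attribute instead of the label text avoids depending on the exact spacing in strings such as 'Designation : '.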


岁月苍老的讽刺 2025-02-17 16:14:37

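Here we collect the header cells into a list, collect the table rows into another, and zip the headers with the text of each row's td cells. Each zipped row becomes a dictionary that is appended to a list.

This isn't the most robust way to scrape, since missing cells or data in the wrong position will cause problems, but you can adapt the code below to handle those cases.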

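from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html: the page markup, e.g. page.content

all_data = []

table = soup.find('table')
# Column names come from the <th> cells, so this assumes the table has a header row
headers = [th.get_text(strip=True) for th in table.find_all('th')]
rows = table.find_all('tr')

for row in rows:
    table_data_text = [td.get_text(strip=True) for td in row.find_all('td')]
    if not table_data_text:  # skip the header row, which has no <td> cells
        continue
    # Pair each header with the cell in the same position
    output_dict = dict(zip(headers, table_data_text))
    all_data.append(output_dict)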
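
The markup shown in the question has no th cells, so the headers list above would come back empty for it; each row carries its own label instead. Here is a sketch of the same pairing idea adapted to that layout, assuming every data row has exactly two td cells. The rows_to_dict helper and the sample markup are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the surrounding <table> is assumed.
HTML = """
<table>
  <tr>
    <td><label for="cp_address">Address : </label></td>
    <td> A-148, WARD NO.4, PAINTER STREETSIALKOT-CANTT.</td>
  </tr>
  <tr>
    <td> <label for="cp_phone">Phone  : </label></td>
    <td> 4603886,</td>
  </tr>
  <tr><td>row with a single cell</td></tr>
</table>
"""


def rows_to_dict(html):
    """Map the first cell of each two-cell row to the second cell."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 2:
            # Rows that do not fit the label/value shape are skipped, not guessed at.
            continue
        key, value = cells
        # Drop the trailing " :" decoration so keys read "Address", "Phone", ...
        result[key.rstrip(" :")] = value
    return result


data = rows_to_dict(HTML)
print(data)
```

Skipping rows with an unexpected number of cells is one way to handle the missing-data case mentioned above; logging those rows instead would be a reasonable alternative.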
