BeautifulSoup - please help explain my code

Posted 2025-02-08 16:23:25 · 1577 characters · 2 views · 0 comments


Comments (1)

瘫痪情歌 2025-02-15 16:23:25


As suggested, you should run this under a debugger, or add some print statements so you can see what is happening at each line of the code.

If you do that, you will see that when the code runs link = content['href'], the string 'https://chhouk-krohom.com/%E1%9E%98%E1%9E%A0%E1%9E%B6%E1%9E%9C%E1%9E%B7%E1%9E%97%E1%9E%84%E1%9F%92%E1%9E%82%E1%9F%A1/' is stored in link.
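As an aside (my addition, not part of the original answer): the %E1%9E%... sequences in that link are just percent-encoded UTF-8, and the standard library can decode them if you want to see the readable text:

```python
from urllib.parse import unquote

# The path of the stored link is percent-encoded UTF-8 (Khmer text)
encoded = 'https://chhouk-krohom.com/%E1%9E%98%E1%9E%A0%E1%9E%B6/'
print(unquote(encoded))  # the path decodes to Khmer characters

# A single encoded unit: %E1%9E%98 is the UTF-8 encoding of U+1798
print(unquote('%E1%9E%98') == '\u1798')  # True
```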

You are iterating over a string with for l in link:. So on the first iteration l is 'h' (the first character of the string). It then tries to do s = bs(requests.get('h').content, 'html.parser'), which isn't a valid URL. Each iteration gives l a single character: 'h', then 't', then 't', 'p', 's', ':', '/', '/', 'c', 'h', and so on.
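You can confirm this in a couple of lines, with nothing beyond plain Python: iterating a string yields one character at a time, which is why every "URL" passed to requests.get() was a single letter.

```python
# Iterating over a string yields individual characters, not whole URLs
link = 'https://chhouk-krohom.com/'
chars = [l for l in link]
print(chars[:5])  # ['h', 't', 't', 'p', 's']
```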

What you want to do is first get the contents. From each content, find all the <a> tags that have an href (those are the links). Then iterate through that list of links.

from bs4 import BeautifulSoup as bs
import requests

url = 'https://chhouk-krohom.com/'
response = requests.get(url)
soup = bs(response.content, 'html.parser')

# The justified paragraphs are the ones that contain the links
contents = soup.select('p[style="text-align:justify;"]')
for content in contents:
    # Only <a> tags that actually have an href attribute
    links = content.find_all('a', href=True)

    for link in links:
        part = link.text          # the link text, reused as the file name
        url_link = link['href']   # the URL to fetch

        # Fetch the linked page (url_link), not the front page again
        s = bs(requests.get(url_link).content, 'html.parser')
        main = s.article.text     # text of the page's <article> element
        file_name = part
        with open('./{}.txt'.format(file_name), mode='wt', encoding='utf-8') as file:
            file.write(str(main))
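One caveat with the code above (my addition, not part of the original answer): part, the link text, is used directly as a file name, and link text can contain characters that are not legal in file names. A minimal sketch of a sanitizer, assuming you only need to handle the characters that are invalid on common filesystems:

```python
import re

def safe_filename(text):
    # Replace characters that are invalid in file names on common
    # filesystems (/ \ : * ? " < > |) with an underscore.
    return re.sub(r'[\\/:*?"<>|]', '_', text).strip()

print(safe_filename('Chapter 1: Intro/Part A'))  # Chapter 1_ Intro_Part A
```

You would then write file_name = safe_filename(part) before opening the file.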