BeautifulSoup - please help explain my code

Posted 2025-02-08 16:23:25 · 1577 characters · 2 views · 0 comments


Comments (1)

瘫痪情歌 2025-02-15 16:23:25


As suggested, you should run this under a debugger, or add some print statements so you can see what is happening at each line of the code.

If you do that, you will see that when the code runs link = content['href'], the string 'https://chhouk-krohom.com/%E1%9E%98%E1%9E%A0%E1%9E%B6%E1%9E%9C%E1%9E%B7%E1%9E%97%E1%9E%84%E1%9F%92%E1%9E%82%E1%9F%A1/' is stored in link.
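As an aside (my addition, not part of the original answer): the %E1%9E%... sequences in that link are just percent-encoded UTF-8, and the standard library can decode them if you want to see the readable text:

```python
from urllib.parse import unquote

# The path of the stored link is percent-encoded UTF-8 (Khmer text)
encoded = 'https://chhouk-krohom.com/%E1%9E%98%E1%9E%A0%E1%9E%B6/'
print(unquote(encoded))  # the path decodes to Khmer characters

# A single encoded unit: %E1%9E%98 is the UTF-8 encoding of U+1798
print(unquote('%E1%9E%98') == '\u1798')  # True
```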

You are iterating over a string with for l in link:. So on the first iteration l is 'h' (the first character of the string). It then tries to do s = bs(requests.get('h').content, 'html.parser'), which isn't a valid URL. Each iteration gives l a single character: 'h', then 't', then 't', 'p', 's', ':', '/', '/', 'c', 'h', and so on.
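You can confirm this in a couple of lines, with nothing beyond plain Python: iterating a string yields one character at a time, which is why every "URL" passed to requests.get() was a single letter.

```python
# Iterating over a string yields individual characters, not whole URLs
link = 'https://chhouk-krohom.com/'
chars = [l for l in link]
print(chars[:5])  # ['h', 't', 't', 'p', 's']
```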

What you want to do is first get the contents. From each content, find all the <a> tags that have an href (those are the links). Then iterate through that list of links.

from bs4 import BeautifulSoup as bs
import requests

url = 'https://chhouk-krohom.com/'
response = requests.get(url)
soup = bs(response.content, 'html.parser')

# The justified paragraphs are the ones that contain the links
contents = soup.select('p[style="text-align:justify;"]')
for content in contents:
    # Only <a> tags that actually have an href attribute
    links = content.find_all('a', href=True)

    for link in links:
        part = link.text          # the link text, reused as the file name
        url_link = link['href']   # the URL to fetch

        # Fetch the linked page (url_link), not the front page again
        s = bs(requests.get(url_link).content, 'html.parser')
        main = s.article.text     # text of the page's <article> element
        file_name = part
        with open('./{}.txt'.format(file_name), mode='wt', encoding='utf-8') as file:
            file.write(str(main))
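One caveat with the code above (my addition, not part of the original answer): part, the link text, is used directly as a file name, and link text can contain characters that are not legal in file names. A minimal sketch of a sanitizer, assuming you only need to handle the characters that are invalid on common filesystems:

```python
import re

def safe_filename(text):
    # Replace characters that are invalid in file names on common
    # filesystems (/ \ : * ? " < > |) with an underscore.
    return re.sub(r'[\\/:*?"<>|]', '_', text).strip()

print(safe_filename('Chapter 1: Intro/Part A'))  # Chapter 1_ Intro_Part A
```

You would then write file_name = safe_filename(part) before opening the file.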