Web scraper: splitting the scraped data into different files
I have been working on a Python scraper for a while. I want to save the information I get in different files: the URLs must be in one file and the captions in another.
While working with the URLs there is no issue, but when I try to scrape the names of the blogs I am searching for, I get this result:
w
a
t
a
s
h
i
n
o
s
e
k
a
i
s
w
o
r
l
d
v
-
a
-
p
-
o
-
r
-
s
-
m
-
u
-
t
b
l
a
c
k
e
n
e
d
d
e
a
t
h
e
y
e
5
h
i
n
y
8
l
a
z
e
2
o
m
b
i
e
p
o
r
y
g
o
n
-
d
i
g
i
t
a
l
v
a
p
o
r
w
a
v
e
b
o
m
b
s
u
b
t
l
e
a
n
i
m
e
v
a
p
o
r
w
a
v
e
c
o
r
p
f
i
r
m
i
m
a
g
e
I have identified the problem and I think it is related to the '\n', but I have not been able to find a solution.
This is my code:
import requests
from bs4 import BeautifulSoup

search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        # blog name shown for this post
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            # keep the 500w image URLs, dropping the width descriptor
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue

archivo = open("Sites.txt", "w")
for source, image_urls in data.items():
    for url in image_urls:
        archivo.write(url + '\n')
archivo.close()

archivo = open("Source.txt", "w")
for source, image_urls in data.items():
    for sources in source:
        archivo.write(sources + '\n')
archivo.close()
1 Answer
The problem is in the last loop: for sources in source iterates over the characters of the string source, and the write then appends a '\n' after every single character, which is why each letter ends up on its own line. Change the last loop so it writes each blog name once.
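A minimal version, keeping the original file and variable names:

archivo = open("Source.txt", "w")
for source in data:
    # source is the complete blog name, so it is written once per line
    archivo.write(source + '\n')
archivo.close()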
Then the content of Source.txt will be one blog name per line instead of one character per line. Or, using with, so the file is closed automatically:
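with open("Source.txt", "w") as archivo:
    # with closes the file automatically, even if an exception is raised
    for source in data:
        archivo.write(source + '\n')

If you prefer, both files can also be written in a single pass; a sketch using two context managers (the handle names here are just for illustration):

with open("Sites.txt", "w") as sites, open("Source.txt", "w") as names:
    for source, image_urls in data.items():
        names.write(source + '\n')   # one blog name per line
        for url in image_urls:
            sites.write(url + '\n')  # one image URL per line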