Web scraper: splitting the scraped data into different files
I have been working on a Python scraper for a while. I want to save the information I get in different files: the URLs must be in one file and the captions in another.
While working with the URLs there is no issue, but when I try to scrape the names of the blogs I am searching for, I get this result:
w
a
t
a
s
h
i
n
o
s
e
k
a
i
s
w
o
r
l
d
v
-
a
-
p
-
o
-
r
-
s
-
m
-
u
-
t
b
l
a
c
k
e
n
e
d
d
e
a
t
h
e
y
e
5
h
i
n
y
8
l
a
z
e
2
o
m
b
i
e
p
o
r
y
g
o
n
-
d
i
g
i
t
a
l
v
a
p
o
r
w
a
v
e
b
o
m
b
s
u
b
t
l
e
a
n
i
m
e
v
a
p
o
r
w
a
v
e
c
o
r
p
f
i
r
m
i
m
a
g
e
I have identified the problem and I think it is related to the '\n', but I have not been able to find a solution.
This is my code:
import requests
from bs4 import BeautifulSoup

search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="FtjPK")

data = {}
for article in articles:
    try:
        # blog name shown for this post
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            # keep the 500w image URLs, dropping the width descriptor
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue

archivo = open("Sites.txt", "w")
for source, image_urls in data.items():
    for url in image_urls:
        archivo.write(url + '\n')
archivo.close()

archivo = open("Source.txt", "w")
for source, image_urls in data.items():
    for sources in source:
        archivo.write(sources + '\n')
archivo.close()
1 Answer
The problem is in the last loop: for sources in source iterates over the characters of the string source, and the write then appends a '\n' after every single character, which is why each letter ends up on its own line. Change the last loop so it writes each blog name once.
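A minimal version, keeping the original file and variable names:

archivo = open("Source.txt", "w")
for source in data:
    # source is the complete blog name, so it is written once per line
    archivo.write(source + '\n')
archivo.close()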
Then the content of Source.txt will be one blog name per line instead of one character per line. Or, using with, so the file is closed automatically:
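with open("Source.txt", "w") as archivo:
    # with closes the file automatically, even if an exception is raised
    for source in data:
        archivo.write(source + '\n')

If you prefer, both files can also be written in a single pass; a sketch using two context managers (the handle names here are just for illustration):

with open("Sites.txt", "w") as sites, open("Source.txt", "w") as names:
    for source, image_urls in data.items():
        names.write(source + '\n')   # one blog name per line
        for url in image_urls:
            sites.write(url + '\n')  # one image URL per line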