Python: print the entire URL

Posted on 2025-01-30 16:04:16

I am trying to pull all the URLs that contain "https://play.google.com/store/" and print the entire string. When I run my current code, it only prints "https://play.google.com/store/", but I am looking for the entire URL. Can someone point me in the right direction? Here is my code:

import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re



URL = "https://www.pocketgamer.com/android/best-tycoon-games-android/?page=3"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")

links = []
for link in soup.findAll("a", target="_blank"):
    links.append(link.get('href'))

x = re.findall("https://play.google.com/store/", str(links))
print(x)

忆离笙 2025-02-06 16:04:16

re.findall returns only the part of the text that matches the regex, so all you get back is the https://play.google.com/store/ that is in the pattern itself. You could modify the regex, but given that what you are searching is a list of links, it's easier to just check whether each one starts with https://play.google.com/store/. For example:

x = [link for link in links if link.startswith('https://play.google.com/store/')]

Output (for your query):

[
 'https://play.google.com/store/apps/details?id=com.auxbrain.egginc',
 'https://play.google.com/store/apps/details?id=net.kairosoft.android.gamedev3en',
 'https://play.google.com/store/apps/details?id=com.pixodust.games.idle.museum.tycoon.empire.art.history',
 'https://play.google.com/store/apps/details?id=com.AdrianZarzycki.idle.incremental.car.industry.tycoon',
 'https://play.google.com/store/apps/details?id=com.veloxia.spacecolonyidle',
 'https://play.google.com/store/apps/details?id=com.uplayonline.esportslifetycoon',
 'https://play.google.com/store/apps/details?id=com.codigames.hotel.empire.tycoon.idle.game',
 'https://play.google.com/store/apps/details?id=com.mafgames.idle.cat.neko.manager.tycoon',
 'https://play.google.com/store/apps/details?id=com.atari.mobile.rctempire',
 'https://play.google.com/store/apps/details?id=com.pixodust.games.rocket.star.inc.idle.space.factory.tycoon',
 'https://play.google.com/store/apps/details?id=com.idlezoo.game',
 'https://play.google.com/store/apps/details?id=com.fluffyfairygames.idleminertycoon',
 'https://play.google.com/store/apps/details?id=com.boomdrag.devtycoon2',
 'https://play.google.com/store/apps/details?id=com.TomJarStudio.GamingShop2D',
 'https://play.google.com/store/apps/details?id=com.roasterygames.smartphonetycoon2'
]
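
If you do want the regex route mentioned above, here is a minimal sketch of both options side by side. It runs on a small hypothetical sample list rather than the live page, and the names PREFIX, by_prefix and by_regex are made up for illustration. It also assumes that some hrefs may be None (link.get('href') returns None for an <a> tag without an href), which would make a bare startswith call fail. The pattern is applied to each link separately rather than to str(links), since on the stringified list a \S* tail would also swallow the closing quote and comma.

```python
import re

# Hypothetical sample standing in for the scraped `links` list.
# None models an <a> tag that has no href attribute.
links = [
    "https://play.google.com/store/apps/details?id=com.auxbrain.egginc",
    "https://www.pocketgamer.com/android/",
    None,
    "https://play.google.com/store/apps/details?id=com.idlezoo.game",
]

PREFIX = "https://play.google.com/store/"

# Option 1: prefix check that skips missing hrefs before calling startswith.
by_prefix = [link for link in links if link and link.startswith(PREFIX)]

# Option 2: regex that matches the prefix plus the rest of the URL.
# re.escape keeps the dots literal; \S* consumes the remaining non-space characters.
pattern = re.compile(re.escape(PREFIX) + r"\S*")
by_regex = []
for link in links:
    if not link:
        continue
    match = pattern.match(link)  # match() anchors at the start of the string
    if match:
        by_regex.append(match.group(0))

print(by_prefix)
print(by_regex)
```

Both lists should contain the same two Play Store URLs here. The prefix check is the simpler choice when the URL must begin with that string; the regex only earns its keep if you later need to capture part of the URL, such as the id parameter.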