Why can't XPath get the target element?



I am quite new to scraping with XPath. I am trying to scrape product information from Target. Using Selenium and XPath I can successfully get the price and name, but XPath does not return any value when scraping for product size and sales location.
For example, for the URL "https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab", the XPath for size is:

//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()

The XPath for sales location is:

//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[1]/div[2]/span

I also tried to get these two elements with requests, but that did not work either. Does anyone know why this happens? Any help is appreciated. Thanks.
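
The requests attempt is not included in the question; below is a minimal sketch of what it would look like with the same two XPaths (the User-Agent header is just an example). Both queries come back empty on the raw HTML, which suggests this part of the page is rendered client-side:

    # Hypothetical reconstruction of the requests attempt (not the original code).
    import requests
    from lxml import html

    url = "https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab"
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    tree = html.fromstring(resp.text)

    # Both lists come back empty: the size / sales-location markup does not
    # appear to be present in the server-rendered HTML.
    size = tree.xpath('//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()')
    sales_location = tree.xpath('//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[1]/div[2]/span/text()')
    print(size, sales_location)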

Following is my code:

    import concurrent.futures
    import csv
    from datetime import datetime
    from time import sleep

    import pytz
    from apscheduler.schedulers.blocking import BlockingScheduler
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    def job_function():
        urlList = ['https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab',
        'https://www.target.com/p/kleenex-ultra-soft-facial-tissue/-/A-84780536?preselect=12964744#lnk=sametab',
        'https://www.target.com/p/claritin-24-hour-non-drowsy-allergy-relief-tablets-loratadine/-/A-80354268?preselect=14351285#lnk=sametab',
        'https://www.target.com/p/opti-free-pure-moist-rewetting-drops-0-4-fl-oz/-/A-14358641#lnk=sametab'     
        ]
    
        def ScrapingTarget(url):
            AArray = []
            wait_imp = 10
            CO = webdriver.ChromeOptions()
            CO.add_experimental_option('useAutomationExtension', False)
            CO.add_argument('--ignore-certificate-errors')
            CO.add_argument('--start-maximized')
            wd = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe',
                                  options=CO)
    
            wd.get(url)
            wd.implicitly_wait(wait_imp)
            sleep(1)
    
            #start scraping
            name = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[1]/h1/span").text
            sleep(0.5)
            price = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[1]/span").text
            sleep(0.5)
            try:
                size = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()").text
            except:
                size = "none"
            sleep(0.5)
            try:
                sales_location = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[2]/span").text
            except:
                sales_location = "none"
            tz = pytz.timezone('Etc/GMT-0')
            GMT = datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")

            AArray.append([name, price, size, sales_location, GMT])

            with open(
                    r'C:\Users\12987\PycharmProjects\python\Network\priceingAlgoriCoding\export_Target_dataframe.csv',
                    'a', newline="", encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerows(AArray)
    
        with concurrent.futures.ThreadPoolExecutor(4) as executor:
            executor.map(ScrapingTarget, urlList)
    
    sched = BlockingScheduler()
    sched.add_job(job_function,'interval',seconds=60)
    sched.start()
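
One likely culprit for the size value: that XPath ends in /text(), which selects a text node, and Selenium's find_element can only return elements (ChromeDriver typically rejects such a locator with an invalid-selector error, which the bare except above silently turns into "none"). The sales-location span also seems to be rendered late by JavaScript, so it may simply not exist yet when find_element runs. Below is an untested sketch of what the size / sales-location block could look like instead: the trailing /text() step is dropped and .text is read from the element, with an explicit wait for the late-rendered nodes. It reuses wd and By from the code above and keeps the same container XPaths, which are assumed to be otherwise correct:

    # Sketch (untested), intended as a drop-in replacement for the
    # size / sales_location block inside ScrapingTarget().
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    wait = WebDriverWait(wd, 10)
    try:
        # find_element cannot return a text node, so target the <div> itself
        # (no trailing /text()) and read its .text property.
        size = wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[3]/div/div[1]"))).text
    except TimeoutException:
        size = "none"

    try:
        # The sales-location span is rendered client-side (and may depend on a
        # store being selected), so wait for it explicitly instead of sleeping.
        sales_location = wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[2]/span"))).text
    except TimeoutException:
        sales_location = "none"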
