Why can't XPath get the target element?



I am quite new to scraping with XPath. I am trying to scrape product information from Target. Using Selenium and XPath I can successfully get the price and name, but XPath does not return any value when scraping for product size and sales location.
For example, for the URL "https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab", the XPath for size is:

//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()

The XPath for sales location is:

//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[1]/div[2]/span

I also tried to get these two elements with requests, but that did not work either. Does anyone know why this happens? Any help is appreciated. Thanks.
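
The requests attempt is not included in the question; below is a minimal sketch of what it would look like with the same two XPaths (the User-Agent header is just an example). Both queries come back empty on the raw HTML, which suggests this part of the page is rendered client-side:

    # Hypothetical reconstruction of the requests attempt (not the original code).
    import requests
    from lxml import html

    url = "https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab"
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    tree = html.fromstring(resp.text)

    # Both lists come back empty: the size / sales-location markup does not
    # appear to be present in the server-rendered HTML.
    size = tree.xpath('//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()')
    sales_location = tree.xpath('//*[@id="pageBodyContainer"]/div[1]/div[2]/div[2]/div/div[1]/div[2]/span/text()')
    print(size, sales_location)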

Following is my code:

    import concurrent.futures
    import csv
    from datetime import datetime
    from time import sleep

    import pytz
    from apscheduler.schedulers.blocking import BlockingScheduler
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    def job_function():
        urlList = ['https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab',
        'https://www.target.com/p/kleenex-ultra-soft-facial-tissue/-/A-84780536?preselect=12964744#lnk=sametab',
        'https://www.target.com/p/claritin-24-hour-non-drowsy-allergy-relief-tablets-loratadine/-/A-80354268?preselect=14351285#lnk=sametab',
        'https://www.target.com/p/opti-free-pure-moist-rewetting-drops-0-4-fl-oz/-/A-14358641#lnk=sametab'     
        ]
    
        def ScrapingTarget(url):
            AArray = []
            wait_imp = 10
            CO = webdriver.ChromeOptions()
            CO.add_experimental_option('useAutomationExtension', False)
            CO.add_argument('--ignore-certificate-errors')
            CO.add_argument('--start-maximized')
            wd = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe',
                                  options=CO)
    
            wd.get(url)
            wd.implicitly_wait(wait_imp)
            sleep(1)
    
            #start scraping
            name = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[1]/h1/span").text
            sleep(0.5)
            price = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[1]/span").text
            sleep(0.5)
            try:
                size = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[3]/div/div[1]/text()").text
            except:
                size = "none"
            sleep(0.5)
            try:
                sales_location = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[2]/span").text
            except:
                sales_location = "none"
            tz = pytz.timezone('Etc/GMT-0')
            GMT = datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")

            AArray.append([name, price, size, sales_location, GMT])

            with open(
                    r'C:\Users\12987\PycharmProjects\python\Network\priceingAlgoriCoding\export_Target_dataframe.csv',
                    'a', newline="", encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerows(AArray)
    
        with concurrent.futures.ThreadPoolExecutor(4) as executor:
            executor.map(ScrapingTarget, urlList)
    
    sched = BlockingScheduler()
    sched.add_job(job_function,'interval',seconds=60)
    sched.start()
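
One likely culprit for the size value: that XPath ends in /text(), which selects a text node, and Selenium's find_element can only return elements (ChromeDriver typically rejects such a locator with an invalid-selector error, which the bare except above silently turns into "none"). The sales-location span also seems to be rendered late by JavaScript, so it may simply not exist yet when find_element runs. Below is an untested sketch of what the size / sales-location block could look like instead: the trailing /text() step is dropped and .text is read from the element, with an explicit wait for the late-rendered nodes. It reuses wd and By from the code above and keeps the same container XPaths, which are assumed to be otherwise correct:

    # Sketch (untested), intended as a drop-in replacement for the
    # size / sales_location block inside ScrapingTarget().
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    wait = WebDriverWait(wd, 10)
    try:
        # find_element cannot return a text node, so target the <div> itself
        # (no trailing /text()) and read its .text property.
        size = wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[3]/div/div[1]"))).text
    except TimeoutException:
        size = "none"

    try:
        # The sales-location span is rendered client-side (and may depend on a
        # store being selected), so wait for it explicitly instead of sleeping.
        sales_location = wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[2]/span"))).text
    except TimeoutException:
        sales_location = "none"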
