Using the embedded scrollbars in Tableau with Python Selenium

Posted 2025-02-05 01:48:47 · 2673 characters · 3 views · 0 comments

I'm scraping a private Tableau dashboard from a vendor and cannot select or drive the embedded scrollbars that Tableau renders. I've tried scrolling, scrolling into view, and grabbing the scrollbar directly with JavaScript.

An example of the kind of scrollbar I mean can be found at:

https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y

The XPath I am using is:

/html/body/div[2]/div[3]/div[1]/div[1]/div/div[2]/div[4]/div/div/div/div/div[2]/div/div/div/div[1]/div[20]

I've attempted the options found here, here, and here.

I cannot seem to grab the scrollbar itself. The best I've been able to do is click the bar as a whole.

How can I advance this scrollbar to bring IDs into view as I iterate over them?

import time

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y'

PATH = "/Users/171644/python_tools/chromedriver"  # change this
options = Options()
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(PATH), options=options)
wait = WebDriverWait(driver, 120)

driver.get(url)
time.sleep(5)
driver.fullscreen_window()
time.sleep(10)

# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
element = driver.find_element(By.ID, '10671917940_0')
actions = ActionChains(driver)
actions.move_to_element(element).perform()

Comments (2)

就是爱搞怪 2025-02-12 01:48:47

This is unlikely to work as written, because the element you are trying to access sits inside an iframe served from a different domain. Under the Same-Origin Policy, JavaScript running in the parent page cannot reach into that frame, and Selenium cannot locate the element until it has switched into the frame.

Beyond that, this approach will be slow and flaky: embedded Tableau workbooks are rendered inside an iframe (you would have to locate each individual iframe), and rendering happens asynchronously through AJAX calls, so you will need explicit waits in many places.
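
If you stay with Selenium, the first hurdle is getting into the frame once it has loaded. The following is a minimal sketch of that wait-then-switch step, assuming the viz really is inside an iframe on the page you are scraping. The helper name switch_into_first_iframe and its timeout and poll parameters are made up for illustration; only find_elements and switch_to.frame are Selenium calls.

```python
import time


def switch_into_first_iframe(driver, timeout=30.0, poll=0.5):
    """Wait until an <iframe> is present, then move the driver's context into it.

    `driver` is any Selenium-style WebDriver. The function name, `timeout`
    and `poll` are illustrative and not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the string value behind Selenium's By.TAG_NAME.
        frames = driver.find_elements("tag name", "iframe")
        if frames:
            driver.switch_to.frame(frames[0])
            return frames[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no iframe appeared within {timeout} seconds")
        time.sleep(poll)
```

After the switch, locators resolve inside the frame, and driver.switch_to.default_content() returns to the top-level page. Selenium's own WebDriverWait combined with expected_conditions.frame_to_be_available_and_switch_to_it covers the same step; the hand-rolled loop here only makes the polling explicit.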

I would advise using a scraping tool such as TableauScraper instead.

Here is a small code snippet in case you want to follow up on the latter.

from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet
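
Since each worksheet's data attribute is a pandas DataFrame, the loop above can be extended to persist the scraped tables. This is a minimal sketch, assuming you want one CSV file per worksheet. safe_filename and export_worksheets are hypothetical helpers, not part of TableauScraper; the only library call is DataFrame.to_csv.

```python
import re
from pathlib import Path


def safe_filename(name):
    """Turn a worksheet name into a conservative file name (hypothetical helper)."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return cleaned or "worksheet"


def export_worksheets(workbook, out_dir):
    """Write each worksheet's DataFrame to <out_dir>/<name>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet in workbook.worksheets:
        path = out / f"{safe_filename(sheet.name)}.csv"
        sheet.data.to_csv(path, index=False)  # pandas DataFrame.to_csv
        written.append(path)
    return written
```

It would be called as export_worksheets(workbook, 'tableau_export') right after ts.getWorkbook(). Two worksheets whose names sanitize to the same string would overwrite each other, so add a counter to the file name if that can happen in your workbook.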

惯饮孤独 2025-02-12 01:48:47

To use this code, install pyautogui first (pip install pyautogui). With pyautogui you can move the mouse cursor over the table and then simulate scrolling down with the mouse wheel, so that all the rows are loaded.

Important: in the last line we need row.get_attribute('innerText') instead of row.text, because .text returns only the text of elements that are currently visible.

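import time

import pyautogui
from selenium.webdriver.common.by import By

# `driver` and `url` are the ones created in the question's code
driver.get(url)
time.sleep(5)
table = driver.find_element(By.CSS_SELECTOR, 'div.tabZone-viz')
c = table.rect

# move mouse to the center of the table
# (rect is in page coordinates, pyautogui works in screen coordinates, so the
# two only line up when the page's top-left corner is near the screen's)
pyautogui.moveTo(int(c['x'] + c['width'] / 2), int(c['y'] + c['height'] / 2))

# scroll to the bottom of the table
pyautogui.scroll(-9999)

# find the first cell of each row
rows = driver.find_elements(By.CSS_SELECTOR, 'div.tab-vizLeftSceneMargin div.tab-vizHeaderWrapper')

# print the content of the cells
print([row.get_attribute('innerText') for row in rows])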

Output

['Singapore',
 'Hong Kong SAR, C..',
 'New Zealand',
 'United States',
 ...
 'Congo, Rep.',
 'Central African Rep..',
 'Libya',
 'Chad']
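
One fragile point in the snippet above is the mouse position: WebElement.rect is expressed in page coordinates, while pyautogui moves the cursor in screen coordinates. The sketch below makes that conversion explicit. It is only an illustration: rect_center_on_screen, viewport_left and viewport_top are hypothetical names, and the offsets have to be measured for your own window position and browser toolbars.

```python
def rect_center_on_screen(rect, viewport_left=0, viewport_top=0):
    """Return the (x, y) screen point at the middle of a Selenium element rect.

    `rect` is the dict from WebElement.rect (keys x, y, width, height).
    `viewport_left` / `viewport_top` are hypothetical parameters: the screen
    position of the page's top-left corner, which depends on where the
    window sits and how tall the browser's toolbars are.
    """
    x = viewport_left + rect["x"] + rect["width"] / 2
    y = viewport_top + rect["y"] + rect["height"] / 2
    return round(x), round(y)
```

With the table element from above it would be used as pyautogui.moveTo(*rect_center_on_screen(table.rect, viewport_top=85)), where 85 is a made-up toolbar height used purely as an example.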