How can I avoid or minimise TimeoutException on Instagram using Selenium and Python?
I have developed a successful Instagram (IG) bot, using Python and Selenium, that allows me to post, like, and follow.
Every few days, however, the bot runs into a problem and raises a TimeoutException. I suspect this happens because IG updates its website and changes the page source.
I want to share (1) the function I developed for finding web elements and (2) some of the elements themselves, to see whether there is a better way to do this and avoid future issues with my bot. Or do I need to accept that I will have to update my list of XPaths regularly whenever IG changes its site?
The code below shows one of the main functions I use. It takes a list of possible paths and a time delay as inputs. It then tries each path in turn, typically as an XPath, and returns the first element that becomes clickable. If a path raises a TimeoutException, it moves on to the next possible path in the list.
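    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # driver, ignored_exceptions and very_short_sleep() are defined elsewhere in the bot.
    def try_selenium_timeout_clickable(input_possible_paths_list, input_time_delay):
        for loop_possible_path in input_possible_paths_list:
            very_short_sleep()
            try:
                print('Trying a path')
                element_we_want = WebDriverWait(driver, input_time_delay, ignored_exceptions=ignored_exceptions).until(
                    EC.element_to_be_clickable((By.XPATH, loop_possible_path)))
                return element_we_want
            except TimeoutException:
                print('TimeoutException - trying different path')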
Two examples of elements that have changed and caused me issues in the past are (1) the little cross at the top right of a post that closes the post and (2) the little heart icon on a post for liking the post. Examples of the lists that I pass into the 'try_selenium_timeout_clickable' function are below.
    close_post_possible_paths_list = [
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div/div/svg/path",
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div/div",
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div",
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]",
    ]
    like_button_possible_paths_list = [
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[1]/span[1]/button/div[1]/svg",
        "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[1]/span[1]/button",
    ]
My questions
Does anyone know why I regularly get TimeoutExceptions? Is it because IG changes or updates its source code?
Is there a better way for me to reference the elements in the lists above, or a better way to do this overall, so as to avoid or minimise TimeoutExceptions and changes to the XPaths in the future?
Comments (1)
In general I try to avoid XPaths as much as possible, unless I'm trying to target a WebElement using only its inner text (and it doesn't have a title property). When I do use one, I try to keep it relatively flexible, e.g. div[@class='whatever']//div[contains(@class, 'another-class')]
I would hazard a guess that you're running into difficulties because the XPaths you're using are too brittle: "this div must have X div children, which must have Y div children, which must have Z children".
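To make the brittleness concrete, here is a minimal sketch that uses only Python's standard-library xml.etree.ElementTree rather than Selenium. The two markup snippets are hypothetical and heavily simplified, not Instagram's real DOM; they differ only by one extra wrapper div. ElementTree supports just a subset of XPath, so both paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only (not Instagram's real DOM).
OLD_LAYOUT = """
<html><body>
  <div><div>
    <div role="button"><svg aria-label="Close"/></div>
  </div></div>
</body></html>
"""

# The same page after a redesign wraps the content in one more div.
NEW_LAYOUT = """
<html><body>
  <div><div><div>
    <div role="button"><svg aria-label="Close"/></div>
  </div></div></div>
</body></html>
"""

# Positional path: encodes every level of nesting between body and the icon.
POSITIONAL = "./body/div/div/div/svg"
# Attribute-based path: jumps straight to the element by what it is.
BY_ATTRIBUTE = ".//div[@role='button']//*[@aria-label='Close']"


def count_matches(markup, path):
    """Parse the markup and return how many elements the path selects."""
    root = ET.fromstring(markup.strip())
    return len(root.findall(path))


if __name__ == "__main__":
    for name, markup in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
        print(name, "positional:", count_matches(markup, POSITIONAL))
        print(name, "by attribute:", count_matches(markup, BY_ATTRIBUTE))
```

The positional path matches only the old layout, while the attribute-based path still finds the close icon after the extra wrapper is added.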
I had a quick look at the IG DOM, and it seems you could target the close button fairly easily with a CSS selector:
    close_button_path = "div[role='button'] [aria-label='Close']"
The like button is a little trickier, as the CSS selector that I would use (button[type='button'] [aria-label='Like']) returns two WebElements. Instead, we can use an XPath to find these two elements and then get the ancestor button element (of which there is only one).
    like_button_xpath = "//button[@type='button']//*[@aria-label='Like']//ancestor::button"
tl;dr
Your XPaths are very brittle. Try to avoid stepping through one element after another when you can jump straight down to the element in question:
    close_button_css = "div[role='button'] [aria-label='Close']"
    like_button_xpath = "//button[@type='button']//*[@aria-label='Like']//ancestor::button"