How can I avoid or minimise TimeoutException on Instagram using Selenium and Python?
I have developed a successful Instagram (IG) bot, using Python and Selenium, that allows me to post, like, and follow.
I have noticed that every few days my bot runs into a problem where it gets a TimeoutException. I think this is because IG updates their website/changes the source code.
I wanted to share (1) the method/functions I have developed for finding the web elements and (2) some of the elements, to see if there is a better way for me to do this and avoid future issues with my bot. Or do I need to accept that I will have to update the list of XPaths etc. on a regular basis whenever IG updates/changes their site?
The code below shows one of the main functions I use. It takes a list of possible paths and a time delay as inputs. It then tries to find each element, typically by XPath, and returns it; if there is a TimeoutException, it tries the next possible path in my list.
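from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver, ignored_exceptions and very_short_sleep() are assumed to be defined elsewhere in the bot.
def try_selenium_timeout_clickable(input_possible_paths_list, input_time_delay):
    for loop_possible_path in input_possible_paths_list:
        very_short_sleep()
        try:
            print('Trying a path')
            element_we_want = WebDriverWait(driver, input_time_delay, ignored_exceptions=ignored_exceptions).until(EC.element_to_be_clickable((By.XPATH, loop_possible_path)))
            return element_we_want
        except TimeoutException:
            print('TimeoutException - trying different path')
    # Falls through and returns None if every path times out.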
Two examples of elements that have changed and caused me issues in the past are (1) the little cross at the top right of a post that closes the post and (2) the little heart icon on a post for liking the post. Examples of the lists that I pass into the 'try_selenium_timeout_clickable' function are below.
close_post_possible_paths_list = ["/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div/div/svg/path", "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div/div", "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]/div", "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[2]"]
like_button_possible_paths_list = ["/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[1]/span[1]/button/div[1]/svg", "/html/body/div[1]/div/div[1]/div/div[2]/div/div/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[1]/span[1]/button"]
My questions
Does anyone know why I regularly get TimeoutExceptions? Is this because IG changes/updates their source code?
Is there a better way for me to reference the elements in the lists above, or indeed a better way to do this overall, so as to avoid or minimise TimeoutExceptions and changes to the XPaths in the future?
Comments (1)
In general I try to avoid XPaths as much as possible, unless I'm trying to target a WebElement using only its inner text (and it doesn't have a title attribute). When I do use them, I try to ensure that they're relatively flexible, e.g. //div[@class='whatever']//div[contains(@class, 'another-class')]
I would hazard a guess that you're running into difficulties because the XPaths you're using are too brittle: "this div must have an X div child, which must have a Y div child, which must have a Z child".
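To make the brittleness point concrete, here is a minimal sketch using only Python's standard library. The two page snippets are made-up, heavily simplified markup (not Instagram's real DOM), and `matches`, `OLD_PAGE`, `NEW_PAGE`, `ABSOLUTE_PATH` and `RELATIVE_PATH` are names invented for this illustration. `xml.etree.ElementTree` only supports a limited XPath subset, so paths start with `.` here; in Selenium the relative one would start with `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: a close button nested inside a few divs.
OLD_PAGE = """
<html><body>
  <div>
    <div>
      <div role="button"><svg aria-label="Close"/></div>
    </div>
  </div>
</body></html>
"""

# The same page after an imagined redesign adds one extra wrapper <div>.
NEW_PAGE = """
<html><body>
  <div>
    <div>
      <div>
        <div role="button"><svg aria-label="Close"/></div>
      </div>
    </div>
  </div>
</body></html>
"""

# Brittle: spells out every step from the root down to the icon.
ABSOLUTE_PATH = "./body/div/div/div/svg"
# Flexible: jumps straight to the element by its attributes.
RELATIVE_PATH = ".//div[@role='button']//*[@aria-label='Close']"


def matches(page_source, path):
    """Return True if the path finds at least one element in the page."""
    root = ET.fromstring(page_source)
    return root.find(path) is not None


for name, page in (("old", OLD_PAGE), ("new", NEW_PAGE)):
    print(name, "absolute:", matches(page, ABSOLUTE_PATH),
          "relative:", matches(page, RELATIVE_PATH))
```

The absolute path stops matching as soon as one wrapper div is added, while the attribute-based path matches both versions of the page.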
I had a little look at the IG DOM and it seems like you could just use a CssSelector to target the close button relatively easily:
close_button_path = "div[role='button'] [aria-label='Close']"
The like button is a little trickier, as the CssSelector that I would use (button[type='button'] [aria-label='Like']) returns two WebElements, so we can use an XPath to find these two elements and then get the ancestor button element (of which there is only one).
like_button_xpath = "//button[@type='button']//*[@aria-label='Like']//ancestor::button"
tl;dr
Your xpaths are very brittle, try to avoid targeting one element after another if you can just jump straight down to the element in question:
close_button_css = "div[role='button'] [aria-label='Close']"
like_button_xpath = "//button[@type='button']//*[@aria-label='Like']//ancestor::button"