I'm using Selenium to extract the href attributes of a elements nested under div elements in the comments on Facebook posts.
To do that, I'm using https://m. rather than https://www.
The weird thing is that the code runs, but it produces the wrong href link.
This is my test link:
https://m.facebook.com/permalink.php?story_fbid=10158970984237798&id=267767252797
Here's the relevant sample from the code:
links = browser.find_elements_by_xpath("//div[@class='_2b1h async_elem']/a")
for link in links:
    print(link.get_attribute('href'))
This is the first link printed (wrong):
https://m.facebook.com/comment/replies/?ctoken=10158970984237798_10158971018182798&count=7&curr&pc=1&isinline&initcomp&ft_ent_identifier=10158970984237798&eav=AfYL7kFupIufaUdT64Uj85QVZhOZxYUkTTY1wrjRnMqxFG85Nmev-Au_bPm0a4Z0HzM&av=100078743166486&gfid=AQC0Hgp2a-I6Q7Fpj_Y&tn=R
But the href link should be (right):
https://m.facebook.com/comment/replies/?ctoken=10158970984237798_10158971018182798&count=7&curr&pc=1&isinline&initcomp&ft_ent_identifier=10158970984237798&eav=AfaIC6mT5kvBUEIfgoLYj9G5KYF_lv4sncnOMaJjJKk1dEk-aXbNnYwwNnoFmt9kIOQ&av=100000431416784&gfid=AQDiD_GY8uckLbNf0bQ&tn=R
Why does this piece of code get the wrong href link?
Comments (1)
For anyone looking for a solution to the same problem I had: I'll save you the trouble and say that, after a lot of digging, it appears that FB simply changes the values of several elements (such as href) after a short, variable period of time (anywhere from minutes to hours).
In my case, the code worked only with a 30-minute gap between runs.
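To see exactly which parts of the href change, the two links from the question can be compared parameter by parameter. The following is a minimal sketch using only the Python standard library. The helper names `differing_params` and `stable_form` and the `VOLATILE_PARAMS` set are hypothetical, introduced here for illustration. Treating `eav`, `av`, and `gfid` as the volatile parameters is an assumption based solely on the two URLs above; what those parameters mean to Facebook is not documented here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The part of the URL that is identical in both links from the question.
COMMON = (
    "https://m.facebook.com/comment/replies/?ctoken=10158970984237798_10158971018182798"
    "&count=7&curr&pc=1&isinline&initcomp&ft_ent_identifier=10158970984237798"
)
WRONG = COMMON + (
    "&eav=AfYL7kFupIufaUdT64Uj85QVZhOZxYUkTTY1wrjRnMqxFG85Nmev-Au_bPm0a4Z0HzM"
    "&av=100078743166486&gfid=AQC0Hgp2a-I6Q7Fpj_Y&tn=R"
)
RIGHT = COMMON + (
    "&eav=AfaIC6mT5kvBUEIfgoLYj9G5KYF_lv4sncnOMaJjJKk1dEk-aXbNnYwwNnoFmt9kIOQ"
    "&av=100000431416784&gfid=AQDiD_GY8uckLbNf0bQ&tn=R"
)

# Assumption: these are the parameters that change between runs,
# inferred only from the two example links.
VOLATILE_PARAMS = frozenset({"eav", "av", "gfid"})


def differing_params(url_a, url_b):
    """Return the sorted names of query parameters whose values differ."""
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def stable_form(url, volatile=VOLATILE_PARAMS):
    """Drop the volatile parameters so links from different runs can be compared."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in volatile
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(differing_params(WRONG, RIGHT))
print(stable_form(WRONG) == stable_form(RIGHT))
```

If links need to be matched across runs, comparing `stable_form(link.get_attribute('href'))` rather than the raw href may avoid false mismatches, provided the assumption about which parameters are volatile holds for your pages.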
FYI,
Peace.