Unable to access the text of elements found by class name using Selenium in Python
I am trying to scrape https://2gis.kz, and I get an error when using .text or any other method to extract text from an element located by class name.
I type a search query such as "fitness".
My window variable is set like this:
all_cards = driver.find_elements(By.CLASS_NAME, "_1hf7139")
for card_ in all_cards:
    card_.click()
    window = driver.find_element(By.CLASS_NAME, "_18lzknl")
This is a simplified version of how I open the mini-window that contains all of the essential information. Below is the piece of code where I try to extract the text from the element that holds the phone number.
texts = window.find_elements(By.CLASS_NAME, '_b0ke8')
print(texts)  # this prints something, from which I conclude that the element is accessible
try:
    print(texts.text)
except:
    print(".text")
try:
    print(texts.text())
except:
    print(".text()")
try:
    print(texts.get_attribute("innerHTML"))
except:
    print('getAttribute("innerHTML")')
try:
    print(texts.get_attribute("textContent"))
except:
    print('getAttribute("textContent")')
try:
    print(texts.get_attribute("outerHTML"))
except:
    print('getAttribute("outerHTML")')
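The bare except clauses above hide what actually goes wrong in each attempt. A minimal sketch of printing the exception instead, using a plain Python list as a stand-in for the value returned by find_elements (the list contents here are made up for illustration):

```python
# Stand-in for the result of find_elements, which is also a plain list.
elements = ["stand-in element"]

try:
    print(elements.text)
except AttributeError as exc:
    # Show the exception type and message instead of swallowing them.
    message = f"{type(exc).__name__}: {exc}"
    print(message)
```

The message names the list itself as the object that lacks the attribute, which points at the list rather than at the page.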
Hi everyone, I solved the issue. .text was not working for some reason; my guess is that the developers somehow protect the information from being read this way. I used
get_attribute("innerHTML") # afaik this returns the HTML markup inside the matched element
and now it works like a charm.
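import re

texts = window.find_elements(By.TAG_NAME, "bdo")
# Append the digits of each phone number to t.txt, one number per line.
# The with block closes the file, so no explicit f.close() is needed.
with open("t.txt", "a", encoding="utf-8") as f:
    for text in texts:
        # Keep only the digits from the element's inner HTML.
        nums = re.sub("[^0-9]", "", text.get_attribute("innerHTML"))
        f.write(nums + "\n")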
So the problems were:
- I was trying to print the whole list of elements with print(texts) instead of the text of each element.
- Even when I printed each element of the texts variable in a for loop, I got an error related to the text being decoded as UTF-8.
I hope someone finds this useful and does not spend a lot of time trying to fix such a simple bug.
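One caveat with stripping non-digits from innerHTML: if the markup inside the element contains digits in tags or attributes, they end up in the result too. A minimal sketch that removes tags before keeping the digits; the helper name digits_from_html and the sample markup are made up for illustration and are not taken from 2gis.kz:

```python
import re


def digits_from_html(inner_html):
    """Return only the digits found in the text part of an innerHTML string."""
    # Drop anything that looks like a tag, so digits inside attributes
    # (for example class="x1") are not mixed into the number.
    text_only = re.sub(r"<[^>]*>", "", inner_html)
    return re.sub(r"[^0-9]", "", text_only)


# Made-up markup; the real page may be structured differently.
sample = '<span class="x1">+7 (727) 123-45-67</span>'
print(digits_from_html(sample))
```

Reading get_attribute("textContent") instead of innerHTML would avoid the markup altogether, assuming the number is present in the element's text.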
Comments (1)
The find_elements method returns a list of web elements, so texts is a list of web elements. You cannot apply .text directly to a list. To get the text of each element, you have to iterate over the elements in the list and extract the text from each one, like this:
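A minimal sketch of that iteration. The helper name element_texts is hypothetical, and it only assumes that each item exposes a .text attribute, as Selenium's WebElement does; the commented usage reuses the window variable and class name from the question.

```python
def element_texts(elements):
    """Return the text of each element in a list of web elements."""
    # find_elements returns a plain Python list, so .text has to be read
    # from each element individually, not from the list itself.
    return [element.text for element in elements]


# Hypothetical usage with the question's variables (not run here):
# texts = window.find_elements(By.CLASS_NAME, "_b0ke8")
# for line in element_texts(texts):
#     print(line)
```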
Also, I'm not sure about the locators you are using. The class names _1hf7139, _18lzknl and _b0ke8 look like dynamic class names, i.e. they may change with each browsing session.