Unable to access the text of an element found by class name using Selenium in Python

Posted on 2025-01-09 13:28:04

I want to scrape https://2gis.kz, and I get an error when I use .text or any other method of extracting text from elements found by class name.

I type a search query such as "fitness".

My window variable is set like this:

all_cards = driver.find_elements(By.CLASS_NAME,"_1hf7139")
for card_ in all_cards:
    card_.click()
    window = driver.find_element(By.CLASS_NAME, "_18lzknl")

This is a simplified version of how I open the mini-window that holds all of the essential information. Below is the piece of code where I try to extract the text from the phone number holder.

    texts = window.find_elements(By.CLASS_NAME,'_b0ke8')

    print(texts) # this prints something, so I conclude the elements were found
    try:
        print(texts.text)
    except:
        print(".text")
    try:
        print(texts.text())
    except:
        print(".text()")
    try:
        print(texts.get_attribute("innerHTML"))
    except:
       print('getAttribute("innerHTML")')
    try:
        print(texts.get_attribute("textContent"))
    except:
        print('getAttribute("textContent")')
    try:
        print(texts.get_attribute("outerHTML"))
    except:
        print('getAttribute("outerHTML")')

Hi all, I solved the issue. For some reason .text was not working. My guess is that the site's developers somehow keep the information from being read this way. I used

get_attribute("innerHTML") # afaik this returns the HTML markup inside the matched element

and now it works like a charm.

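    # Needs "import re" at the top of the script.
    texts = window.find_elements(By.TAG_NAME, "bdo")

    # Append only the digits of each phone number to t.txt, one per line.
    # The with block closes the file, so no explicit f.close() is needed.
    with open("t.txt", "a", encoding="utf-8") as f:
        for text in texts:
            nums = re.sub("[^0-9]", "", text.get_attribute("innerHTML"))
            f.write(nums + "\n")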
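
The digit-stripping step above can be tried without a browser. This is a minimal sketch, assuming the innerHTML of each bdo element is a formatted phone number; the sample string and the helper name digits_only are made up for illustration.

```python
import re


def digits_only(raw: str) -> str:
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)


# Hypothetical sample; real values come from text.get_attribute("innerHTML").
sample = "+7 (727) 123-45-67"
print(digits_only(sample))  # 77271234567
```

If the innerHTML ever contains nested tags, digits inside attribute values would be kept too, so reading textContent instead may be the safer input.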

So the problems were:

I was trying to print the whole list of elements with a plain print(texts).
Even when I printed each element of the texts variable in a for loop, I still got an error, which was due to the text being decoded as UTF-8.

I hope someone finds this useful and does not spend a lot of time trying to fix such a simple bug.
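
The post does not show the exact error. One common cause, assumed here, is a console whose encoding cannot represent the page's Cyrillic text, which raises UnicodeEncodeError on print, while writing to a file opened with encoding="utf-8" succeeds. A small sketch of that difference, using a made-up sample string and cp1252 as a stand-in for such a console encoding:

```python
sample = "Фитнес клуб"  # hypothetical Cyrillic text such as a card title

# A legacy code page such as cp1252 has no Cyrillic letters.
try:
    sample.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

# UTF-8 can encode any Unicode string.
utf8_bytes = sample.encode("utf-8")

print(legacy_ok)                             # False
print(utf8_bytes.decode("utf-8") == sample)  # True
```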


無心 2025-01-16 13:28:04

The find_elements method returns a list of web elements. So this

texts = window.find_elements(By.CLASS_NAME,'_b0ke8')

gives you texts, a list of web elements.
You cannot apply .text directly to a list.
To get the text of each element, iterate over the elements in the list and read .text from each one, like this:

text_elements = window.find_elements(By.CLASS_NAME,'_b0ke8')
for element in text_elements:
    print(element.text)
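
To illustrate the point about lists, together with the asker's observation that .text came back empty while innerHTML did not, here is a hedged sketch. FakeElement is a stand-in written for this example so that it runs without a browser; it only mimics the text attribute and the get_attribute method of a Selenium WebElement. The helper name element_text is also made up.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


def element_text(element):
    """Return element.text, or textContent when .text is empty."""
    visible = element.text
    if visible:
        return visible.strip()
    return (element.get_attribute("textContent") or "").strip()


elements = [FakeElement("Fitness club", "Fitness club"),
            FakeElement("", " +7 727 123 45 67 ")]

# A list has no .text attribute; only its items do.
print(hasattr(elements, "text"))            # False
print([element_text(e) for e in elements])  # ['Fitness club', '+7 727 123 45 67']
```

With real Selenium elements the same helper could be called on each item returned by find_elements. Selenium's .text returns only text that is rendered as visible, so an empty result does not necessarily mean the element has no content.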

Also, I'm not sure about the locators you are using.
The class names _1hf7139, _18lzknl and _b0ke8 look like dynamically generated class names, i.e. they may change from one browsing session to the next.
