How to extract the text inside HTML tags (in Selenium IDE)?

Posted on 2024-11-08 04:12:05

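The HTML looks something like this:

<p>
    sometext1
    <br>
    sometext2
    <br>
    sometext3
</p>

I would like to extract everything between the paragraph tags, including the <br> tags.

I tried the storeText command, but it stores only the text, without the tags. I could store the entire HTML source and then extract what I need in Perl, but I was wondering whether there is a way to store a block of HTML identified by a specific XPath in a variable (e.g. the HTML code of the third table on the page).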
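
For reference, the fallback mentioned in the question (store the whole page source and post-process it outside Selenium) might look roughly like the sketch below, written in Python rather than Perl and using only the standard library's html.parser. The names InnerHTMLGrabber and extract_inner_html are hypothetical, and the sketch assumes the target element has an explicit closing tag; HTML comments inside the element are dropped.

```python
from html.parser import HTMLParser


class InnerHTMLGrabber(HTMLParser):
    """Collect the inner HTML of the index-th (zero-based) occurrence of a tag.

    Hypothetical helper: assumes the target element is explicitly closed.
    """

    def __init__(self, tag, index=0):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()
        self.index = index
        self.seen = -1    # index of the latest matching start tag
        self.depth = 0    # nesting depth of same-named tags inside the target
        self.done = False
        self.parts = []   # raw pieces of the inner HTML

    def _capturing(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            # Inside the target: keep the start tag exactly as it was written.
            self.parts.append(self.get_starttag_text())
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            self.seen += 1
            if self.seen == self.index:
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self._capturing():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._capturing():
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._capturing():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._capturing():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._capturing():
            self.parts.append(f"&#{name};")


def extract_inner_html(source, tag, index=0):
    """Return the inner HTML of the index-th <tag>, or None if not found."""
    grabber = InnerHTMLGrabber(tag, index)
    grabber.feed(source)
    grabber.close()
    return "".join(grabber.parts) if grabber.done else None
```

With the page source saved in a string, extract_inner_html(source, "table", 2) would target the third table. A real HTML library is likely to be more robust against sloppy markup than this sketch.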

明月松间行 2024-11-15 04:12:05

innerHTML

I would try document.getElementById('id').innerHTML

ζ澈沫 2024-11-15 04:12:05

You can use getEval() with a JavaScript expression that returns the element's innerHTML. You will have to locate the element in JavaScript, though.

满身野味 2024-11-15 04:12:05

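@Tarun: I would if I could.

@Grooveek: Thanks, that worked.
I used:

 storeEval | window.document.getElementsByTagName("p").item(9).innerHTML | p

This saved the contents of the paragraph at index 9 in the variable p. Note that item() is zero-based, so this is the tenth <p> element on the page.
I had to use getElementsByTagName because the tags had no ids.

For more precision, one could use getElementById instead:

 storeEval | window.document.getElementById("id of element").innerHTML | p

Hope this helps other people too.
Thanks again.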
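
For readers driving the browser from Selenium WebDriver instead of Selenium IDE, the same lookup can be sketched in Python with execute_script, which runs JavaScript in the page and returns the result. The function name inner_html_by_tag is hypothetical, and driver is assumed to be an already started WebDriver instance.

```python
def inner_html_by_tag(driver, tag_name, index):
    """Return the innerHTML of the index-th (zero-based) <tag_name> element.

    `driver` is assumed to be a Selenium WebDriver; execute_script passes the
    extra arguments to the script as arguments[0], arguments[1], ...
    """
    script = (
        "var el = document.getElementsByTagName(arguments[0])[arguments[1]];"
        # Return null instead of raising when the index is out of range.
        "return el ? el.innerHTML : null;"
    )
    return driver.execute_script(script, tag_name, index)
```

Calling inner_html_by_tag(driver, "p", 9) mirrors the storeEval command above; a null result from the script is expected to arrive in Python as None.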


↙温凉少女 2024-11-15 04:12:05

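I suggest this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("url")  # placeholder; replace with the real page URL
# find_element_by_tag_name was deprecated and later removed in Selenium 4
element = driver.find_element(By.TAG_NAME, "p")
text = element.text  # visible text only, without the tags

Keep in mind, however, that .text does not give you the contents of a text box; it comes back empty rather than holding the typed value. In that case use .get_attribute("value"). Whenever .text does not capture what you want, for example when you need the tags as well, use .get_attribute("innerHTML").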
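
Applied to the original question (a block of HTML picked out by XPath, such as the third table), the get_attribute approach might be sketched as follows. The helper name html_by_xpath and the XPath expression are illustrative assumptions, and driver is assumed to be a running WebDriver.

```python
# XPath for the third <table> in document order (illustrative assumption).
THIRD_TABLE_XPATH = "(//table)[3]"


def html_by_xpath(driver, xpath, include_own_tag=False):
    """Return the HTML of the first element matching `xpath`.

    innerHTML gives the markup between the element's tags; outerHTML also
    includes the element's own opening and closing tags.
    """
    # "xpath" is the value of By.XPATH in selenium.webdriver.common.by; the
    # literal is used here so the sketch has no import-time dependency.
    element = driver.find_element("xpath", xpath)
    attribute = "outerHTML" if include_own_tag else "innerHTML"
    return element.get_attribute(attribute)
```

html_by_xpath(driver, THIRD_TABLE_XPATH, include_own_tag=True) would then return the whole table including its <table> tag, while html_by_xpath(driver, "(//p)[1]") would return the first paragraph's contents with the <br> tags preserved.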


你的笑 2024-11-15 04:12:05

getAttribute("innerHTML"); worked for me.

渔村楼浪 2024-11-15 04:12:05

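I suggest locating the element by class name, since not every element has an id.

storeEval | window.document.getElementsByClassName('*classname*')[0].innerHTML; | HTMLContent

The index 0 selects the first occurrence. If several elements match, pick the appropriate index, or get the number of matching elements with .length:

storeEval | window.document.getElementsByClassName('*classname*').length; | ClassCount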
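
A rough WebDriver counterpart in Python, assuming driver is a running WebDriver and using the hypothetical helper name inner_html_by_class, collects the innerHTML of every match so that both the count and any individual index are available from one call.

```python
def inner_html_by_class(driver, class_name):
    """Return a list with the innerHTML of every element having `class_name`.

    The list length plays the role of .length in the command above, and
    result[0] corresponds to the first occurrence.
    """
    # "class name" is the value of By.CLASS_NAME in Selenium's By class.
    elements = driver.find_elements("class name", class_name)
    return [element.get_attribute("innerHTML") for element in elements]
```

find_elements returns an empty list when nothing matches, so the helper then returns an empty list rather than raising an error.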

