How to extract multiple texts from span elements using Python Selenium?
I am trying to use Selenium WebDriver to extract the text of every inner span in the HTML below into a list. The expected result is:
['1a', '1b', '1c', '2a', ' ', ' ', '3a', '3b', '3c', '4a', ' ', ' ']
Does anyone know how to do this?
HTML:
<tr style="background-color:#999">
<td><b style="white-space: nowrap;">table_num</b></td>
<td style="text-align:center;">
<span style="flex: 1;display: flex;flex-direction: column;">
<span>1a</span>
<span>1b</span>
<span>1c</span>
</span>
</td>
<td style="text-align:center;">
<span style="flex: 1;display: flex;flex-direction: column;">
<span>2a</span>
<span> </span>
<span> </span>
</span>
</td>
<td style="text-align:center;">
<span style="flex: 1;display: flex;flex-direction: column;">
<span>3a</span>
<span>3b</span>
<span>3c</span>
</span>
</td>
<td style="text-align:center;">
<span style="flex: 1;display: flex;flex-direction: column;">
<span>4a</span>
<span> </span>
<span> </span>
</span>
</td>
</tr>
Comments (3)
Here is one way: use an XPath that selects all the required spans. Once you have the spans, extract the text from each one. If a span's text is empty, skip it; otherwise add it to the list.
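The steps in this answer could be sketched roughly as follows. The XPath `//td/span/span` is an assumption based on the HTML in the question, and `collect_span_texts` and `keep_blank` are hypothetical names. The locator strategy is passed as the plain string "xpath", which is the value behind Selenium's `By.XPATH` constant, so the helper itself needs no import. `driver` is assumed to be a Selenium WebDriver that has already loaded the page.

```python
# "xpath" is the string value behind selenium's By.XPATH constant.
BY_XPATH = "xpath"
# Assumed locator: every inner <span> inside a <td>'s wrapper <span>.
INNER_SPANS = "//td/span/span"


def collect_span_texts(driver, keep_blank=True):
    """Return the text of each inner span, optionally skipping blank ones."""
    texts = []
    for span in driver.find_elements(BY_XPATH, INNER_SPANS):
        text = span.text
        # Keep the entry if it has visible characters, or if blanks are wanted.
        if text.strip() or keep_blank:
            texts.append(text)
    return texts
```

Calling `collect_span_texts(driver)` keeps the blank entries the question asks for, while `keep_blank=False` follows the answer's suggestion to skip them. Selenium's `.text` returns rendered text, so a whitespace-only span may come back as an empty string rather than ' '.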
As per the HTML, to extract the text of all the <span> elements into a list, you have to induce WebDriverWait (https://stackoverflow.com/a/59130336/7429447) for visibility_of_all_elements_located() and use a list comprehension. You can use either of the following locator strategies:
Using CSS_SELECTOR and the text attribute:
Using XPATH and get_attribute("innerHTML"):
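A minimal sketch of both strategies might look like the following. The selectors `td > span > span` and `//td/span/span` are assumptions derived from the HTML in the question, and the function names and the 20-second timeout are hypothetical. Selenium is imported inside the helper only so that the sketch can be loaded on a machine without Selenium. In a real script these imports would normally sit at the top of the file.

```python
def _visible_inner_spans(driver, use_xpath, timeout):
    """Wait until every matching span is visible and return the elements."""
    # Imported lazily so the sketch can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    if use_xpath:
        locator = (By.XPATH, "//td/span/span")           # assumed XPath
    else:
        locator = (By.CSS_SELECTOR, "td > span > span")  # assumed CSS selector
    return WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )


def span_texts_via_css(driver, timeout=20):
    """CSS_SELECTOR strategy, reading the rendered text of each span."""
    return [span.text for span in _visible_inner_spans(driver, False, timeout)]


def span_inner_html_via_xpath(driver, timeout=20):
    """XPATH strategy, reading the raw innerHTML of each span."""
    return [
        span.get_attribute("innerHTML")
        for span in _visible_inner_spans(driver, True, timeout)
    ]
```

The two variants can differ on the blank cells: `get_attribute("innerHTML")` returns the markup as written, so a span containing a single space should yield ' ', whereas `.text` may yield an empty string. If the whitespace-only spans are not reported as displayed, the visibility wait would time out. In that case `presence_of_all_elements_located` is an alternative condition worth trying.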
Just remove the predicate [1] from the XPath, so it becomes:
And to be more precise, you could use:
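The effect of the `[1]` predicate can be checked offline. The sketch below uses Python's standard `xml.etree.ElementTree` rather than Selenium, run against a trimmed copy of the question's row with the style attributes dropped. The path `.//td/span/span` is an assumption about what the XPath looks like.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's table row (style attributes dropped).
ROW = """
<tr>
  <td><b>table_num</b></td>
  <td><span><span>1a</span><span>1b</span><span>1c</span></span></td>
  <td><span><span>2a</span><span> </span><span> </span></span></td>
  <td><span><span>3a</span><span>3b</span><span>3c</span></span></td>
  <td><span><span>4a</span><span> </span><span> </span></span></td>
</tr>
"""

row = ET.fromstring(ROW)

# With [1], only the first inner span of each wrapper span is selected.
with_predicate = [s.text for s in row.findall(".//td/span/span[1]")]
# Without the predicate, every inner span is selected.
without_predicate = [s.text for s in row.findall(".//td/span/span")]

print(with_predicate)     # ['1a', '2a', '3a', '4a']
print(without_predicate)  # ['1a', '1b', '1c', '2a', ' ', ' ', '3a', '3b', '3c', '4a', ' ', ' ']
```

ElementTree keeps the whitespace inside the blank spans, which is why the second list matches the output requested in the question.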