python | Web scraping: problems scraping a page whose HTML mostly uses the same class and has no id or name attributes
The page I'm trying to scrape is private. It uses two-way authentication, which will not let me open the link through Selenium. When I open the page manually, I'm not asked for extra authentication.
The page itself uses the same classes for all the tables on the page, and the classes in the td tags are mostly the same as well.
Here is the table with the data I want to extract:
Here is another table on the same page, which I don't need, but which mostly has the same classes and tags:
It really kills me that no other attributes or anything else were added to make this a bit simpler. Since that is not the case, I'm really clueless about how to continue to get the data.
I'm really open to any ideas.
Thanks in advance.
Comments (1)
First, if the data are always in the same order, you can try a CSS selector such as
driver.find_element(By.CSS_SELECTOR, "tr > td:nth-child(3)")
to get, for example, the third td in the first tr. If that doesn't work, and your goal is to get the information associated with a key in the table, you can loop over the table to collect all of its data into a dictionary and then look up the key you want.
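The dictionary idea could be sketched roughly as follows. This is only a sketch under assumptions that are not in the question: each row is assumed to hold the key in its first td and the value in its second, and `table_to_dict` is a made-up helper name. The locator string "css selector" is the value of Selenium's `By.CSS_SELECTOR`; it is written out here only so the snippet has no import of its own.

```python
# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def table_to_dict(table):
    """Collect a two-column table into a dict.

    `table` is expected to be a Selenium WebElement for one <table>.
    Assumption: the first <td> of each row is the key and the
    second <td> is the value. Rows with fewer than two cells
    (for example header rows built from <th>) are skipped.
    """
    data = {}
    for row in table.find_elements(CSS, "tr"):
        cells = row.find_elements(CSS, "td")
        if len(cells) < 2:
            continue
        key = cells[0].text.strip()
        value = cells[1].text.strip()
        data[key] = value
    return data
```

With a real driver this might be called as `table_to_dict(driver.find_elements(By.CSS_SELECTOR, "table")[0])["Some key"]`, where the index 0 and the key name are placeholders. Since the tables share their classes, picking the table by its position on the page is one option, provided the order is stable.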