How to getElementById() from a table that has no id
Well, I can't think of an easier way to word that question, guys, but it's not as complex as it seems. Basically, I have a little project going to help me move up in my workplace (I'm a tech support agent at the moment, looking to go part time in web dev: I'm hungry for code right now and tech support isn't satisfying).
So I said I'd make a small program that would update tech agents on problems or site issues when they arise. It takes the information from a small web page called outage (which is disastrous in my opinion: 177 errors on the W3C validator).
The web dev guys won't just give the table an id; some sort of security hole? I don't know how, but I'm not going to question the guys above me. I'm trying to work with them, not against them.
The table itself doesn't have an id, but the spans inside its cells do (span id), e.g.
<table width="100%" border="0">
<tbody>
<tr id="title">
<td width="9%">Date/Time</td>
<td width="24%">program/site</td>
<td width="5%">Ticket</td>
<td width="*">Issue</td>
<td width="2%">More</td>
</tr>
<tr>
<td><span id="date">2011-01-27 17:32</span></td>
<td><span id="site"><a id="fus_00001"></a>sample area or program affected</span></td>
<td><span id="site"><a href="https://sample php file i cant give you" target="_blank">12345671</a></span></td>
<td><span id="issue">problem identified/ investiating</span></td>
<td><span id="ticket"></span></td>
</tr><tr>
I'm using Java for this, and for all intents and purposes it draws and does everything I need it to. To parse the information I'm using HtmlUnit 2.8.
Here's the code that I'm using at the moment. I just don't know how to get at a table that has no id.
String update = "blank";
final WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(false); // JavaScript causes some serious problems.
webClient.setCssEnabled(false);
HtmlPage page;
try
{
    URL outageURL = new URL("file:\\C:\\Users\\MYDRIVE\\Desktop\\version control\\OUTAGE\\Outages.htm"); // local drive at home
    page = webClient.getPage(outageURL);
    // final HtmlTable table = page.getHtmlElementById("outages"); // if the table had the id "outages", this would be perfect! But alas, it doesn't.
    final HtmlTable table = page.get // ...the cells in the table by some other means
    update = (table.getCellAt(1,0).asText() + " " + table.getCellAt(1,1).asText() + " " + table.getCellAt(1,2).asText() + " " + table.getCellAt(1,3).asText());
    // the line above takes the cells and combines them
} catch and everything else
return update;
So, bottom line: has anyone got any ideas on how to access these tables some other way, without the id? Maybe via the span id? P.S. I've looked through the API for HtmlUnit, and I'm not really sure I can find anything useful.
final String stringHtmlTable = page.getPage().asXml();
If I were to do this, how would I use XPath to take me to the desired cell, as per Mark's response?
P.S. I'm not familiar with XML at all.
Comments (2)
Finding a good example of XPath was absolutely ridiculously hard.
In the end, looking each one up by its span id got me the details.
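For anyone landing here, a minimal sketch of what a lookup by span id can look like with XPath. It is not HtmlUnit code: it uses the JDK's own javax.xml.xpath on a cut-down, well-formed copy of the row from the question, so it runs without extra jars. The class name SpanLookup, the helper textOfSpan and the SAMPLE string are made up for illustration. With HtmlUnit the same expression should work through page.getFirstByXPath("//span[@id='date']"), but treat that as a pointer to check against the API docs for your version rather than tested code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SpanLookup {
    // Hypothetical, well-formed stand-in for one data row of the outage page.
    static final String SAMPLE =
        "<table><tbody><tr>"
        + "<td><span id=\"date\">2011-01-27 17:32</span></td>"
        + "<td><span id=\"site\">sample area or program affected</span></td>"
        + "<td><span id=\"issue\">problem identified</span></td>"
        + "</tr></tbody></table>";

    // Returns the text of the first <span> whose id attribute equals spanId.
    static String textOfSpan(String xml, String spanId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // spanId is a trusted constant here; don't build XPath from user input this way.
        String expr = "//span[@id='" + spanId + "']";
        // The two-argument evaluate() returns the string value of the first match.
        return xpath.evaluate(expr, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Combine the cells into one update line, as in the question.
        String update = textOfSpan(SAMPLE, "date") + " "
            + textOfSpan(SAMPLE, "site") + " "
            + textOfSpan(SAMPLE, "issue");
        System.out.println(update);
    }
}
```

Note that the real page reuses id="site" on two different spans, so an expression like //span[@id='site'] matches both; taking only the first match, as above, would skip the ticket number.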
If you can't get at the table tag itself directly (e.g. by ID), then you can dig deeper inside for something that would be unique to just that table. For instance, if this is the only table on the page that has <td width="24%">program/site</td>, you can have XPath look for that cell, then use getParent() to dig back upwards to the parent <table> element.
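A rough sketch of that idea, under some assumptions: it uses the JDK's built-in DOM and XPath classes on a small, well-formed stand-in for the page rather than HtmlUnit itself, and the names TableByUniqueCell, findTable, cellText and SAMPLE are invented for the example. It finds the cell whose text is program/site, walks up with getParentNode() until it reaches the table, then reads a cell by row and column. In HtmlUnit 2.x the climbing method on DomNode is, as far as I know, also called getParentNode(), but check that against your version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TableByUniqueCell {
    // Hypothetical page with two id-less tables; only the second has the header cell.
    static final String SAMPLE =
        "<html><body>"
        + "<table><tr><td>some other table</td></tr></table>"
        + "<table width=\"100%\" border=\"0\"><tbody>"
        + "<tr id=\"title\"><td>Date/Time</td><td>program/site</td></tr>"
        + "<tr><td><span id=\"date\">2011-01-27 17:32</span></td>"
        + "<td><span id=\"site\">sample area or program affected</span></td></tr>"
        + "</tbody></table>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Finds the <td> with the given text, then climbs to its enclosing <table>.
    // Returns null if no such cell exists.
    static Node findTable(Document doc, String uniqueCellText) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(
            "//td[normalize-space(.)='" + uniqueCellText + "']",
            doc, XPathConstants.NODE);
        while (node != null && !"table".equals(node.getNodeName())) {
            node = node.getParentNode();
        }
        return node;
    }

    // Reads a cell by zero-based row/column; XPath positions are 1-based, hence the +1.
    static String cellText(Node table, int row, int col) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = ".//tr[" + (row + 1) + "]/td[" + (col + 1) + "]";
        return xpath.evaluate(expr, table).trim();
    }

    public static void main(String[] args) throws Exception {
        Node table = findTable(parse(SAMPLE), "program/site");
        System.out.println(cellText(table, 1, 0) + " " + cellText(table, 1, 1));
    }
}
```

The climb can also be folded into the expression itself, e.g. //td[normalize-space(.)='program/site']/ancestor::table[1], which selects the nearest enclosing table in one step.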