Why is HtmlUnit not working on this https webpage?
I'm trying to learn more about HtmlUnit and am running some tests at the moment. I am trying to get basic information, such as the page title and text, from this site:
https://....com (full URL removed; the important part is that it is https)
This is the code I use, which works fine on other websites:
final WebClient webClient = new WebClient();
final HtmlPage page;
page = (HtmlPage)webClient.getPage("https://medeczane.sgk.gov.tr/eczane/login.jsp");
System.out.println(page.getTitleText());
System.out.println(page.asText());
Why can't I get this basic information? If it is because of security measures, what are the specifics, and can I bypass them? Thanks.
Edit: Hmm, the code stops working after webClient.getPage(); "test2" is never printed, so I cannot check whether page is null or not.
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2);
final HtmlPage page;
System.out.println("test1");
try {
    page = (HtmlPage)webClient.getPage("https://medeczane.sgk.gov.tr/eczane/login.jsp");
    System.out.println("test2");
} catch (Exception e) {
    // print whatever getPage() threw instead of failing silently
    e.printStackTrace();
}
Comments (2)
I solved this by adding this line of code:
which is a deprecated way of disabling secure SSL. In the current HtmlUnit version you have to do:
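For context, here is a rough sketch of what "disabling secure SSL" amounts to, written with plain JDK classes rather than HtmlUnit so that it runs without extra dependencies. The class and helper names (InsecureSslSketch, trustAllManager, trustAllContext) are made up for illustration. As far as I know, recent HtmlUnit versions expose this switch as webClient.getOptions().setUseInsecureSSL(true), but check the API docs for your version. Accepting every certificate removes protection against man-in-the-middle attacks, so this is only reasonable for testing.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class InsecureSslSketch {

    // Hypothetical helper: a trust manager that accepts every certificate chain.
    static X509TrustManager trustAllManager() {
        return new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // intentionally empty: no validation of client certificates
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // intentionally empty: no validation of server certificates
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
    }

    // Hypothetical helper: a TLS context that uses only the trust-all manager.
    static SSLContext trustAllContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] { trustAllManager() }, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = trustAllContext();
        System.out.println(context.getProtocol());
    }
}
```

A client configured with such a context will connect to servers whose certificates are self-signed, expired, or issued by an authority that the JVM does not trust, which is a common reason for getPage() failing on one https site while working on others.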
I think that this is an authentication problem. If I go to that page in Firefox, I get a login box.
Try
webClient.setAuthentication(realm,username,password);
before the call to getPage().
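If the login box is HTTP Basic authentication (an assumption; it could also be a form or a certificate prompt), the credentials end up in an Authorization header. The sketch below builds that header with plain JDK classes; BasicAuthSketch and basicAuthHeader are hypothetical names. In HtmlUnit itself, I believe credentials are usually supplied through a DefaultCredentialsProvider set on the WebClient, so verify against the API docs that the method named above exists in your version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    // Hypothetical helper: builds the value of an HTTP Basic Authorization header.
    static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // "user" and "pass" are placeholder credentials
        System.out.println(basicAuthHeader("user", "pass")); // prints "Basic dXNlcjpwYXNz"
    }
}
```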