Groovy htmlunit

Posted on 2024-09-29 09:58:44

I'm having issues importing htmlunit (htmlunit.sf.net) into a groovy script.

I'm currently just using an example script from the web, and it fails with "unable to resolve class com.gargoylesoftware.htmlunit.WebClient".

The script is:

import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient()
page = client.getPage('http://www.msnbc.msn.com/')
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')

I downloaded the source from the website and placed the com folder (and all its contents) in the same directory as my script.

Does anyone know what issue I'm encountering? I'm not sure why the import fails.

清引 2024-10-06 09:58:44

You could use Grape to fetch the dependency for you when the script runs. The easiest way to do that is to add a @Grab annotation to your import statement.

Like this:

@Grab('net.sourceforge.htmlunit:htmlunit:2.7')
import com.gargoylesoftware.htmlunit.WebClient

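client = new WebClient()

// Added as HtmlUnit had problems with the JavaScript
client.javaScriptEnabled = false
// Keep the result in 'page', the name the next line reads from
page = client.getPage('http://www.msnbc.msn.com/')
// Print every anchor's href, sorted and de-duplicated, one per line
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')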

There's only one problem: the page seems to be a bit too much for HtmlUnit to chew. When I ran the code I got an OutOfMemoryError every time. I'd suggest downloading the HTML the normal way instead and then using something like NekoHTML or TagSoup to parse it into XML and work with it that way.

This example uses TagSoup to work with html as xml in Groovy: http://blog.foosion.org/2008/06/09/parse-html-the-groovy-way/
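
As a rough illustration of the TagSoup route, here is a minimal sketch. It rests on a few assumptions that are not from the post above: the Maven coordinates and version `org.ccil.cowan.tagsoup:tagsoup:1.2.1`, a hypothetical helper named `extractLinks`, an invented sample HTML string, and Groovy 3 or later for the `groovy.xml.XmlSlurper` import (on Groovy 2.x the import can probably just be dropped, since `groovy.util.XmlSlurper` is imported by default there).

```groovy
// Assumption: TagSoup 1.2.1 from Maven Central; another release may work just as well.
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
// Assumption: Groovy 3+; on Groovy 2.x remove this import.
import groovy.xml.XmlSlurper

// Hypothetical helper: returns the sorted, de-duplicated href values
// of all <a> elements found in the given HTML text.
List<String> extractLinks(String html) {
    // TagSoup's Parser is a SAX XMLReader that tolerates malformed HTML,
    // so XmlSlurper can treat the page as if it were XML.
    def slurper = new XmlSlurper(new Parser())
    def doc = slurper.parseText(html)

    // Walk the whole tree depth-first and keep only the anchor elements.
    def anchors = doc.'**'.findAll { it.name() == 'a' }

    // Read each href attribute, skipping anchors that have none.
    def hrefs = anchors.collect { it.@href.text() }.findAll { it }
    return hrefs.sort().unique()
}

// Invented sample input with an unclosed <p>, to show that sloppy markup is accepted.
def sample = '<html><body><p><a href="b.html">B</a> <a href="a.html">A</a> <a href="b.html">B again</a></body></html>'
println extractLinks(sample).join('\n')

// For a live page, the HTML could first be fetched as plain text, for example:
// println extractLinks(new URL('http://www.msnbc.msn.com/').text).join('\n')
```

Since nothing here executes the page's JavaScript or builds a browser-like model of it, this should need far less memory than HtmlUnit, at the cost of only seeing links present in the raw HTML.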
