What regular expression can I use to extract URLs from a Google search?
I'm using Delphi with JCLRegEx and want to capture all the result URLs from a Google search. I looked at HackingSearch.com and they have an example regex that looks right, but I cannot get any results when I try it.
I'm using it like this:
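var
  re: TJclRegEx;  // declared type must match the class created below (was JVCLRegEx)
  I: Integer;
begin
  re := TJclRegEx.Create;
  try
    // Note: [li|\/ol] at the end is a character class, not an alternation;
    // (?:li|\/ol) was probably intended. The pattern is left as posted.
    re.Compile('class="?r"?>.+?href="(.+?)".*?>(.+?)<\/a>.+?class="?s"?>(.+?)<cite>.+?class="?gl"?><a href="(.+?)"><\/div><[li|\/ol]', False, False);
    if re.Match(Memo1.Lines.Text) then
      for I := 0 to re.CaptureCount - 1 do
        Memo2.Lines.Add(re.Captures[I]);  // was Captures[1], which added the same group every time
  finally
    re.Free;  // free exactly once; the extra FreeAndNil(re) after Free was a double free
  end;
end;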
The regex is available at hackingsearch.com.
I'm using the Delphi Jedi version, since every time I install TPerlRegEx I get a conflict between the two...
Comments (4)
Off topic: you can try the Google AJAX Search API: http://code.google.com/apis/ajaxsearch/documentation/
Below is a relevant section from Google search results for the term
python tuple
. (I modified it to fit the screen by adding line breaks here and there, but I tested your regex on the raw string obtained from Google's source as shown by Firebug.) Your regex gave no matches for this string. FWIW, I guess one of the many reasons is that there is no
<Va>
in this result at all. I copied the full HTML source from Firebug and tried to match it with your regex, and didn't get any match at all.
Google might change the way they display the results from time to time, and at any given time the markup can vary depending on factors like your logged-in status, web history, etc. The particular regex you came up with might work for you for now, but in the long run it will become difficult to maintain. People suggest using an HTML parser instead of giving a regex because they know that a regex solution won't be stable.
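To make the "use an HTML parser" suggestion concrete, here is a minimal sketch. It is written in Python rather than Delphi purely for illustration, and the `SAMPLE` markup, the `LinkCollector` class, and the `extract_links` helper are all made up for this example; none of it is real Google output or part of any Google API. It only shows the general idea of walking tags and reading `href` attributes instead of matching the surrounding markup with a regex.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quoting style in the
        # source markup does not matter, the parser normalises it.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup for illustration only (not real Google output).
SAMPLE = (
    '<ol><li><h3 class="r"><a href="http://example.com/one">One</a></h3></li>'
    '<li><h3 class=r><a href="http://example.com/two" class="l">Two</a></h3></li></ol>'
)

if __name__ == "__main__":
    for url in extract_links(SAMPLE):
        print(url)
```

Because the parser only looks at tag names and attributes, changes to attribute order, quoting, or surrounding elements are less likely to break it than a long pattern tied to one exact layout. Picking out only the result links (as opposed to every link on the page) would still need some page-specific filtering.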
If you need to debug regular expressions in any language, take a look at RegexBuddy. It's not free, but it will pay for itself in a day.
works for now.