BeautifulSoup and ASP.NET/C#
Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)?
Is there a BeautifulSoup alternative, or a port, that works nicely with ASP.NET/C#?
I plan to use the library to extract readable text from arbitrary URLs.
Thanks
Comments (3)
Html Agility Pack is a similar project, but for C# and .NET
EDIT:
To extract all readable text:
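The snippet that originally followed this line did not survive in this copy of the thread. As a minimal sketch of the idea, assuming the HtmlAgilityPack NuGet package is installed, the text of a parsed document can be read from the root node's InnerText property. The sample markup and the class name are hypothetical, chosen only for illustration:

```csharp
using System;
using HtmlAgilityPack;

class ExtractText
{
    static void Main()
    {
        // Hypothetical sample markup; in practice this would be the downloaded page.
        const string html =
            "<html><body><h1>Title</h1><p>Hello &amp; welcome</p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text of every node under the root;
        // DeEntitize turns entities such as &amp; back into plain characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        Console.WriteLine(text);
    }
}
```

To work from a URL rather than a string, `new HtmlWeb().Load(url)` returns an HtmlDocument that can be used the same way.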
Note that this will return the text content of <script> tags. To fix that, you can remove all of the <script> tags, like this (credit: SLaks):
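The credited snippet is missing from this copy, so what follows is a reconstruction of the technique rather than the original code. It assumes the HtmlAgilityPack NuGet package. Removing <style> elements as well is an addition not mentioned in the answer, and the helper name GetReadableText is hypothetical:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class ExtractVisibleText
{
    // Hypothetical helper: strips non-visible elements, then returns the remaining text.
    static string GetReadableText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Take a snapshot with ToList() so that removing nodes does not
        // modify the sequence while it is being enumerated.
        var unwanted = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "script" || n.Name == "style") // style: an assumption
            .ToList();

        foreach (var node in unwanted)
            node.Remove();

        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    static void Main()
    {
        // Hypothetical sample markup containing a script block.
        const string html =
            "<html><head><script>var x = 1;</script></head>" +
            "<body><p>Visible text</p></body></html>";

        Console.WriteLine(GetReadableText(html));
    }
}
```

The result may still contain runs of whitespace from the original markup, so collapsing it afterwards is usually worthwhile.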
I know this is quite old, but I decided to post this for future reference.
I came across this searching for a similar solution.
I found a library built on top of Html Agility Pack called ScrapySharp.
I've used it in much the same way as I would BeautifulSoup.
https://bitbucket.org/rflechner/scrapysharp/wiki/Home (EDIT: broken link, project moved to https://github.com/rflechner/ScrapySharp)
EDIT: https://www.nuget.org/packages/ScrapySharp/ has the package
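As a rough illustration of the BeautifulSoup-like style, the sketch below uses the CssSelect extension method that ScrapySharp adds to Html Agility Pack nodes. It assumes both the HtmlAgilityPack and ScrapySharp NuGet packages are installed. The markup, the selector, and the class name are hypothetical:

```csharp
using System;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // provides the CssSelect extension method

class CssSelectExample
{
    static void Main()
    {
        // Hypothetical sample markup.
        const string html =
            "<div class='post'><p class='body'>First</p><p class='body'>Second</p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select nodes with a CSS selector, similar in spirit to BeautifulSoup's select().
        foreach (var paragraph in doc.DocumentNode.CssSelect("div.post p.body"))
            Console.WriteLine(paragraph.InnerText);
    }
}
```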
You could try this although it currently has a few bugs: