How to get the <p> element value in XML?

Posted 2025-01-02 13:30:11 · 726 characters · 1 view · 0 comments

using System.IO;
using System.Xml.Linq;

XDocument coordinates = XDocument.Load("http://feeds.feedburner.com/TechCrunch");
XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
// DestFile: output file path, defined elsewhere in the asker's code
using (var streamWriter1 = new StreamWriter(DestFile))
{
    foreach (var item in coordinates.Descendants("item"))
    {
        string link = item.Element("guid").Value;
        string content = item.Element(nsContent + "encoded").Value; // returns the whole HTML: links, images, etc.
    }
}

Using this, I can get the guid element values as well as the content:encoded values, but the value of the content:encoded element also contains all the links, tags, etc.

But I want the text only. That is, I need just the plain text data, without any image links, links, etc.

How can I parse the <p>..</p> tag data in the XML?
Please suggest.
Thanks

Comments (3)

羁绊已千年 2025-01-09 13:30:12

Once you have the content string, you've got a few different options:

  1. Use regular expressions to find and remove anything within tags. This strips all tags from the text and would (in theory) leave you with only the text, if that's what you're interested in.
  2. Parse the text itself and build the text output. I've done something similar in this PowerShell script (https://github.com/joelmartinez/PowerShell-Bits/blob/master/GetTextFromHtml.ps1): I load some HTML into a DOM using the HtmlAgilityPack, then walk the DOM tree to pull out the text nodes.
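A minimal sketch of the first (regex) option, using only the .NET base library. The one-item feed string and the `StripTags` helper are hypothetical stand-ins for the TechCrunch feed and your own code. Tags are replaced with a space rather than an empty string so adjacent paragraphs don't run together, and entities are decoded afterwards. As the "(in theory)" above suggests, this is a heuristic, not an HTML parser: comments, `<script>` bodies, or a `>` inside an attribute value can fool it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: replaces anything that looks like a tag with a space,
// decodes entities such as &amp;, then collapses the leftover whitespace.
static string StripTags(string html)
{
    string noTags = Regex.Replace(html, "<[^>]+>", " ");
    string decoded = WebUtility.HtmlDecode(noTags);
    return Regex.Replace(decoded, @"\s+", " ").Trim();
}

// Hypothetical one-item feed standing in for the TechCrunch RSS
const string rss = @"<rss xmlns:content=""http://purl.org/rss/1.0/modules/content/"">
  <channel>
    <item>
      <guid>http://example.com/post-1</guid>
      <content:encoded><![CDATA[<p>Hello <a href=""http://example.com"">world</a> &amp; friends</p><img src=""x.png"" />]]></content:encoded>
    </item>
  </channel>
</rss>";

XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
foreach (var item in XDocument.Parse(rss).Descendants("item"))
{
    // Value of content:encoded is the raw HTML string (the CDATA text)
    string content = (string)item.Element(nsContent + "encoded") ?? "";
    Console.WriteLine(StripTags(content));
}
```

Run as a top-level program, this prints `Hello world & friends`.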
甜尕妞 2025-01-09 13:30:11

Well, you have HTML embedded in that XML document. The safest approach is to take that HTML, parse it with an HTML parser such as the HTML Agility Pack, and go from there. It shouldn't be that much different. Do note that the HTML is still partly encoded, so you'll have to decode it first.

using System.Net;
using System.Xml.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

const string url = "http://feeds.feedburner.com/TechCrunch";
var doc = XDocument.Load(url);
var items = doc.Descendants("item");
XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
foreach (var item in items)
{
    var encodedContent = (string)item.Element(nsContent + "encoded");
    if (encodedContent == null) continue; // item has no content:encoded element
    var decodedContent = WebUtility.HtmlDecode(encodedContent);
    var html = new HtmlDocument();
    html.LoadHtml(decodedContent);
    // Take the text of every <p>, ignoring image and link markup
    var ps = html.DocumentNode.Descendants("p");
    foreach (var p in ps)
    {
        var textContent = p.InnerText;
        // do something with textContent
    }
}

Unfortunately, the HTML doesn't seem to be well-formed XML, so you won't be able to use LINQ to XML on that part.
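To see what "not well-formed" means in practice, here is a small sketch that tries to load fragments with `XElement.Parse` after wrapping them in a dummy root. The fragments and the `IsWellFormedXml` helper are hypothetical examples, not taken from the TechCrunch feed: XHTML-style markup loads, while unclosed void tags such as `<br>` and HTML-only named entities such as `&eacute;` make the XML parser throw `XmlException`.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true if the fragment loads as XML inside a dummy <root>
static bool IsWellFormedXml(string fragment)
{
    try
    {
        XElement.Parse("<root>" + fragment + "</root>");
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// Hypothetical fragments of the kind found in content:encoded
string xhtml = "<p>Line one<br /><img src=\"a.png\" /></p>";
string html = "<p>Line one<br><img src=\"a.png\"></p>";  // void tags left unclosed, as HTML allows
string entity = "<p>Caf&eacute; &nbsp;news</p>";         // HTML named entities are not predefined in XML

Console.WriteLine(IsWellFormedXml(xhtml));   // True
Console.WriteLine(IsWellFormedXml(html));    // False
Console.WriteLine(IsWellFormedXml(entity));  // False
```

This is why an HTML parser, which tolerates such markup, is the safer choice for the `content:encoded` payload.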

暗地喜欢 2025-01-09 13:30:11

Use an XPath expression, something like:

//p

That should do it for the XPath query. Here's a link for the library you're using.
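A sketch of the XPath approach with `XPathSelectElements` from `System.Xml.XPath`, assuming a hypothetical feed item whose `content:encoded` holds the HTML as CDATA. Note that `//p` run against the feed itself matches nothing, because the `<p>` tags are character data, not XML elements; the query only finds them after that text is parsed into a tree, and `XElement.Parse` succeeds only when the fragment happens to be well-formed XHTML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical item whose content:encoded holds the HTML as CDATA text
const string rss = @"<rss xmlns:content=""http://purl.org/rss/1.0/modules/content/""><channel><item>
<content:encoded><![CDATA[<p>First paragraph.</p><p>Second <b>bold</b> one.</p>]]></content:encoded>
</item></channel></rss>";

var feed = XDocument.Parse(rss);
// Against the feed itself, //p matches nothing: the <p> tags are text, not elements
Console.WriteLine(feed.XPathSelectElements("//p").Count());   // 0

XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
foreach (var item in feed.Descendants("item"))
{
    string html = (string)item.Element(nsContent + "encoded") ?? "";
    // Assumes well-formed XHTML; real feeds may need an HTML parser instead
    var fragment = XElement.Parse("<root>" + html + "</root>");
    foreach (var p in fragment.XPathSelectElements("//p"))
        Console.WriteLine(p.Value);   // text of the <p> and all its descendants
}
```

Run as a top-level program, this prints `0`, then `First paragraph.` and `Second bold one.`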
