How do I get the value of an element in XML?

Posted 2025-01-02 13:30:11 · 726 characters · 3 views · 0 comments

XDocument coordinates = XDocument.Load("http://feeds.feedburner.com/TechCrunch");
System.IO.StreamWriter StreamWriter1 = new System.IO.StreamWriter(DestFile);
XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
foreach (var item in coordinates.Descendants("item"))
{
   string link = item.Element("guid").Value;
   string content = item.Element(nsContent + "encoded").Value; //It gets all links, images etc 
}

StreamWriter1.Close();

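With this I can get the guid element values as well as the content:encoded values, but the value of the content:encoded element includes all the links, tags, etc.

But I want only the text. That is, I need just the plain text data, without any img links, anchors, etc.

How can I parse the <p>..</p> tag data in the XML? Please advise. Thanks.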



羁绊已千年 2025-01-09 13:30:12

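Once you have identified the content string, you have a few different options:

Use a regular expression to isolate and remove anything inside tags. This strips all the tags from the text and (in theory) leaves only the text you are interested in.
Parse the text itself and build up the text output. I did something similar in this PowerShell script (https://github.com/joelmartinez/PowerShell-Bits/blob/master/GetTextFromHtml.ps1). I used HtmlAgilityPack to load some HTML into a DOM and then walked the DOM tree to pull out the text nodes.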
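
The first option can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the class name HtmlText and the method StripTags are hypothetical, and the pattern assumes no ">" appears inside an attribute value, comment, or script block. It replaces each tag with a space so that adjacent paragraphs do not run together, decodes HTML entities, and collapses whitespace.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlText
{
    // Anything that looks like a tag: "<", a run of non-">" characters, then ">".
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Runs of whitespace left behind once the tags are gone.
    private static readonly Regex WhitespacePattern = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so "<p>a</p><p>b</p>" does not become "ab".
        string withoutTags = TagPattern.Replace(html, " ");

        // Turn entities such as &amp; back into the characters they stand for.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse the extra spaces and trim the ends.
        return WhitespacePattern.Replace(decoded, " ").Trim();
    }
}
```

Calling HtmlText.StripTags on the content string from the question's loop would drop the img and anchor tags along with everything else. The text inside script or style elements would survive, which is one reason a real HTML parser tends to be the safer choice.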


甜尕妞 2025-01-09 13:30:11

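Well, you have HTML embedded in that XML document. The safest approach is to take that HTML and parse it with an HTML parser such as the HTML Agility Pack, then go from there. It shouldn't be much different. Note that the HTML is still partly encoded, so you'll have to decode it first.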

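using System.Xml.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

const string url = "http://feeds.feedburner.com/TechCrunch";
var doc = XDocument.Load(url);
var items = doc.Descendants("item");
XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
foreach (var item in items)
{
    // The cast yields null (rather than throwing) when content:encoded is missing.
    var encodedContent = (string)item.Element(nsContent + "encoded");
    if (encodedContent == null) continue;

    // Decode any remaining HTML entities before handing the markup to the parser.
    var decodedContent = System.Net.WebUtility.HtmlDecode(encodedContent);
    var html = new HtmlDocument();
    html.LoadHtml(decodedContent);

    // InnerText returns the text of each <p> with nested tags removed.
    var ps = html.DocumentNode.Descendants("p");
    foreach (var p in ps)
    {
        var textContent = p.InnerText;
        // do something with textContent
    }
}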

Unfortunately, the HTML doesn't appear to be well-formed XML, so you can't use LINQ to XML on that part.
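
To see why LINQ to XML rejects typical feed HTML, here is a small sketch that simply tries to parse a fragment as XML and reports whether that worked. The class WellFormedCheck and the method IsWellFormedXml are hypothetical names for illustration, and the sample fragments are made up rather than taken from the TechCrunch feed.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class WellFormedCheck
{
    // Returns true only if the fragment parses as XML once wrapped in a single root element.
    public static bool IsWellFormedXml(string fragment)
    {
        try
        {
            XElement.Parse("<root>" + fragment + "</root>");
            return true;
        }
        catch (XmlException)
        {
            // Unclosed tags such as <br> and HTML-only entities such as &nbsp; end up here.
            return false;
        }
    }
}
```

A fragment like "<p>Hello</p>" passes, while "<p>Line one<br>Line two</p>" and "<p>a &nbsp; b</p>" both fail. Markup of that kind is common in content:encoded, which is where an HTML parser such as the HTML Agility Pack is more forgiving.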

