How do I retrieve XML entity values in C#?

Posted 2024-11-29 09:05:59

I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.

I am able to retrieve the entity names easily enough using XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?

I noticed that I can retrieve the value for text only entities using InnerText, but this doesn't work for entities that contain XML tags.

Is the best way to resort to a regex?

Let's say that I have a document like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

I want to present a list to the user containing the three entity names (test, wwwc, and copy), along with their values (the text in quotes following the name). I had not thought through the question of entities nested within other entities, so I would be interested in a solution that either completely expands the entity values or shows the text just as it is in the quotes.
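For the "text just as it is in the quotes" variant, a regular expression over the raw internal subset is one possible route. The following is only a rough sketch: `EntityLiterals` and `Parse` are hypothetical names, not part of System.Xml, and the pattern assumes simple internal general entity declarations. It skips parameter entities and external (SYSTEM/PUBLIC) entities, and it would also match declarations that appear inside DTD comments.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of System.Xml.
public static class EntityLiterals
{
    // Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
    // A leading '%' (parameter entity) or a SYSTEM/PUBLIC keyword does not match.
    private static readonly Regex Declaration = new Regex(
        @"<!ENTITY\s+(?<name>[^\s%]\S*)\s+(?:""(?<value>[^""]*)""|'(?<value>[^']*)')\s*>");

    // Returns entity name -> literal text between the quotes, unexpanded.
    public static Dictionary<string, string> Parse(string internalSubset)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Declaration.Matches(internalSubset ?? string.Empty))
        {
            string name = m.Groups["name"].Value;
            // Keep the first declaration of a name; XML ignores later redeclarations.
            if (!result.ContainsKey(name))
            {
                result.Add(name, m.Groups["value"].Value);
            }
        }
        return result;
    }
}
```

The input string could come from `XmlDocument.DocumentType.InternalSubset`, for example `EntityLiterals.Parse(doc.DocumentType.InternalSubset)`. Nested references such as `&wwwc;` inside another entity's value are left as written rather than expanded.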

Comments (5)

冷血 2024-12-06 09:05:59

Although it’s not likely the most elegant solution possible, I came up with something that seems to work well enough for my purposes. First, I parsed the original document and retrieved the entity nodes from that document. Then I created a small in-memory XML document, to which I added all the entity nodes. Next, I added entity references to all of the entities within the temporary XML. Finally, I retrieved the InnerXml from all of the references.

Here's some sample code:

        // requires: using System.Collections.Generic; using System.Net; using System.Xml;
        // parse the original document and retrieve its entities
        XmlDocument parsedXmlDocument = new XmlDocument();
        XmlUrlResolver resolver = new XmlUrlResolver();
        resolver.Credentials = CredentialCache.DefaultCredentials;
        parsedXmlDocument.XmlResolver = resolver;
        parsedXmlDocument.Load(path);

        // create a temporary xml document with all the entities and add references to them
        // the references can then be used to retrieve the value for each entity
        XmlDocument entitiesXmlDocument = new XmlDocument();
        XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
        entitiesXmlDocument.AppendChild(dec);
        XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(parsedXmlDocument.DocumentType.Name, parsedXmlDocument.DocumentType.PublicId, parsedXmlDocument.DocumentType.SystemId, parsedXmlDocument.DocumentType.InternalSubset);
        entitiesXmlDocument.AppendChild(newDocType);
        XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
        entitiesXmlDocument.AppendChild(root);
        XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

        // build a dictionary of entity names and values
        Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
        for (int i = 0; i < entitiesMap.Count; i++)
        {
            XmlElement entityElement = entitiesXmlDocument.CreateElement(entitiesMap.Item(i).Name);
            XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entitiesMap.Item(i).Name);
            entityElement.AppendChild(entityRefElement);
            root.AppendChild(entityElement);
            if (!string.IsNullOrEmpty(entityElement.ChildNodes[0].InnerXml))
            {
                // do not add parameter entities or invalid entities
                // this can be determined by checking for an empty string
                entitiesDictionary.Add(entitiesMap.Item(i).Name, entityElement.ChildNodes[0].InnerXml);
            }
        }

翻身的咸鱼 2024-12-06 09:05:59

Here is one way (untested). It uses XmlReader and its ResolveEntity() method:

private Dictionary<string, string> GetEntities(XmlReader xr)
{
    Dictionary<string, string> entityList = new Dictionary<string, string>();

    while (xr.Read())
    {
        HandleNode(xr, entityList);
    }
    return entityList;
}

StringBuilder sbEntityResolver;
int extElementIndex = 0;
int resolveEntityNestLevel = -1;
string dtdCurrentTopEntity = "";

private void HandleNode(XmlReader inReader, Dictionary<string, string> entityList)
{
    if (inReader.NodeType == XmlNodeType.Element)
    {
        if (resolveEntityNestLevel < 0)
        {
            while (inReader.MoveToNextAttribute())
            {
                HandleNode(inReader, entityList); // for namespaces
                while (inReader.ReadAttributeValue())
                {
                    HandleNode(inReader, entityList); // recursive for resolving entity refs in attributes
                }
            }
        }
        else
        {
            extElementIndex++;
            sbEntityResolver.Append(inReader.ReadOuterXml());
            resolveEntityNestLevel--;
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.EntityReference)
    {
        if (inReader.Name[0] != '#' && !entityList.ContainsKey(inReader.Name))
        {
            if (resolveEntityNestLevel < 0)
            {
                sbEntityResolver = new StringBuilder(); // start building entity
                dtdCurrentTopEntity = inReader.Name;
            }
            // entityReference can have contents that contains other
            // entityReferences, so keep track of nest level
            resolveEntityNestLevel++;
            inReader.ResolveEntity();
        }
    }
    else if (inReader.NodeType == XmlNodeType.EndEntity)
    {
        resolveEntityNestLevel--;
        if (resolveEntityNestLevel < 0)
        {
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.Text)
    {
        if (resolveEntityNestLevel > -1)
        {
            sbEntityResolver.Append(inReader.Value);
        }
    }
}
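
For anyone trying the ResolveEntity() route, the reader has to be set up so that general entity references are reported as EntityReference nodes rather than expanded silently. Below is a smaller self-contained sketch of the same idea, not the code above. `ReferencedEntities` and `Expand` are hypothetical names, and the sketch makes simplifying assumptions: it only looks at references in element content (not in attribute values) and only collects text, so markup inside an entity is dropped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of System.Xml.
public static class ReferencedEntities
{
    // Returns entity name -> expanded text for entities referenced in element content.
    public static Dictionary<string, string> Expand(string xml)
    {
        var result = new Dictionary<string, string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.DtdProcessing = DtdProcessing.Parse;
            // Report general entity references as nodes instead of expanding them.
            reader.EntityHandling = EntityHandling.ExpandCharEntities;

            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.EntityReference) continue;

                string name = reader.Name;
                var text = new StringBuilder();
                int nested = 0; // open nested entity references

                reader.ResolveEntity();
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.EntityReference)
                    {
                        nested++;
                        reader.ResolveEntity();
                    }
                    else if (reader.NodeType == XmlNodeType.EndEntity)
                    {
                        if (nested == 0) break; // end of the top-level entity
                        nested--;
                    }
                    else if (reader.NodeType == XmlNodeType.Text)
                    {
                        text.Append(reader.Value);
                    }
                }

                if (!result.ContainsKey(name)) result.Add(name, text.ToString());
            }
        }
        return result;
    }
}
```

Note that any reader-based approach like this only sees entities that are actually referenced in the document. In the sample from the question, `test` and `copy` are declared but never referenced, so they would not show up.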

丑丑阿 2024-12-06 09:05:59

If you have an XmlDocument object, perhaps it would be easier to recursively step through each XmlNode object (from XmlDocument.ChildNodes), and for each node you can use the Name property to get the name of the node. Then "getting the value" depends on what you want (InnerXml for a string representation, ChildNodes for programmatic access to the XmlNode objects which can be cast to XmlEntity/XmlAttribute/XmlText).
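
As a rough illustration of the casting idea, the sketch below walks `DocumentType.Entities`, casts each node to XmlEntity, and records Name and InnerText. `EntityLister` and `ListTextValues` are hypothetical names chosen for this example. As the question already observed, InnerText only gives the text content, so for an entity like `test` the `<para>` markup is not included.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not part of System.Xml.
public static class EntityLister
{
    // Returns entity name -> text-only value for entities declared in the DTD.
    public static Dictionary<string, string> ListTextValues(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new Dictionary<string, string>();
        if (doc.DocumentType == null) return result; // no DOCTYPE, nothing to list

        foreach (XmlNode node in doc.DocumentType.Entities)
        {
            var entity = node as XmlEntity;
            if (entity == null) continue;

            // InnerText concatenates the text of the entity's child nodes.
            result[entity.Name] = entity.InnerText;
        }
        return result;
    }
}
```

Unlike a reader-based walk of the document body, this lists every entity declared in the DTD, whether or not the document references it.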

无声静候 2024-12-06 09:05:59

You can easily display a representation of an XML document simply by walking the tree recursively.

This small class happens to use a Console, but you could easily modify it to your needs.

public static class XmlPrinter {
   private const Int32 SpacesPerIndent = 3;

   public static void Print(XDocument xDocument) {
      if (xDocument == null) {
         Console.WriteLine("No XML Document Provided");
         return;
      }

      PrintElementRecursive(xDocument.Root);
   }

   private static void PrintElementRecursive(XElement element, Int32 indentationLevel = 0) {
      if(element == null) return;

      PrintIndentation(indentationLevel);
      PrintElement(element);
      PrintNewline();

      foreach (var xAttribute in element.Attributes()) {
         PrintIndentation(indentationLevel + 1);
         PrintAttribute(xAttribute);
         PrintNewline();
      }

      foreach (var xElement in element.Elements()) {
         PrintElementRecursive(xElement, indentationLevel+1);
      }
   }

   private static void PrintAttribute(XAttribute xAttribute) {
      if (xAttribute == null) return;

      Console.Write("[{0}] = \"{1}\"", xAttribute.Name, xAttribute.Value);
   }

   private static void PrintElement(XElement element) {
      if (element == null) return;

      Console.Write("{0}", element.Name);

      if(!String.IsNullOrWhiteSpace(element.Value))
         Console.Write(" : {0}", element.Value);
   }

   private static void PrintIndentation(Int32 level) {
      Console.Write(new String(' ', level * SpacesPerIndent));
   }

   private static void PrintNewline() {
      Console.Write(Environment.NewLine);
   }
}

Using the class is trivial. Here is an example that prints out your current config file:

static void Main(string[] args) {
   XmlPrinter.Print(XDocument.Load(
      ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None).FilePath
                        ));

   Console.ReadKey();
}

Try it for yourself; you should be able to modify it quickly to get what you want.

眼睛会笑 2024-12-06 09:05:59

I ran into problems using the accepted solution. In particular:

  • In my document, the entity references needed a custom resolver to load them from external sources. Therefore, creating elements from the original document (and just not appending them) was an easier approach than trying to replicate the DTD and resolver in a new XmlDocument.
  • In addition, the InnerXml property kept returning the entity reference instead of its expansion. To work around this, I copied the XML into an XElement, which resolves the entity automatically.

private IEnumerable<KeyValuePair<string, string>> AllEntityExpansions(XmlDocument doc)
{
  var entities = doc.DocumentType.Entities;
  foreach (var entity in entities.OfType<XmlEntity>()
    .OrderBy(e => e.Name, StringComparer.OrdinalIgnoreCase))
  {
    var xmlString = default(string);
    try
    {
      var element = doc.CreateElement("e");
      element.AppendChild(doc.CreateEntityReference(entity.Name));
      using (var r = new XmlNodeReader(element))
      {
        var elem = XElement.Load(r);
        xmlString = elem.ToString();
      }
    }
    catch (XmlException) { }

    if (xmlString?.Length > 7)
      yield return new KeyValuePair<string, string>(entity.Name, xmlString.Substring(3, xmlString.Length - 7));
  }
}