XML and the & problem

Posted on 2024-12-13 14:04:52

I am new to XML and am trying to read an XML file.
I googled and tried reading it the way shown below, but I get this error:

Reference to undeclared entity 'Ccaron'. Line 2902, position 9.

When I go to line 2902, I find this:

<H0742>&Ccaron;opova 14, POB 1725,
SI-1000 Ljubljana</H0742>

This is what I tried:

XmlDocument xDoc = new XmlDocument();
xDoc.Load(file);
XmlNodeList nodes = xDoc.SelectNodes("nodeName");
foreach (XmlNode n in nodes)
{
    if (n.SelectSingleNode("H0742") != null)
    {
        row.IrNbr = n.SelectSingleNode("H0742").InnerText;
    }
    .
    .
    .
}

According to w3schools, & is illegal in XML.

EDIT:
This is the encoding. I wonder whether it is related to the problem somehow.

encoding='iso-8859-1'

Thanks in advance.

EDIT:

They gave me an .ENT file, which I can reference online at ftp.MyPartnerCompany.com/name.ent.
In this .ENT file I see entities like this:

<!ENTITY Cacute "&#262;"> <!-- latin capital letter C with acute,
                                  U+0106 Latin Extended-A -->

How can I reference it when parsing my XML?
I would prefer to reference it online, since they may add new entities at any time.
Thanks in advance!


Comments (5)

再可℃爱ぅ一点好了 2024-12-20 14:04:52

The first thing to be aware of is that the problem isn't in your software.

As you are new to XML, I'm going to guess that defining entities isn't something you've come across before. Character entities are shortcuts for arbitrary pieces of text (one or more characters). The most common place you are going to see them is in the situation you are in now. At some point, your XML was created by someone who wanted to type the character 'Č' or 'č' (that's upper- and lower-case C with caron, in case your font can't display it).

However, XML has only a few predeclared entities (ampersand, less than, greater than, double quote and apostrophe). Any other character entity needs to be declared. To parse your file correctly you will need to do one of two things: either replace the entity reference with something the parser can handle, or declare the entity.

To declare the entity, you can use something called an "internal subset" - a specialised form of the DTD statement you might see at the top of your XML file. Something like this:

<!DOCTYPE root-element
   [ <!ENTITY Ccaron "&#x10C;">
     <!ENTITY ccaron "&#x10D;">]
>

Placing that statement at the beginning of the XML file (change the 'root-element' to match yours) will allow the parser to resolve the entity.
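
As a rough illustration of the internal subset in action, here is a minimal C# sketch. The sample document, the class name EntityDemo and the method ReadAddress are made up for this example; only the element name H0742 and the entity name Ccaron come from the question. It assumes the DOCTYPE is already present in the XML text and that DTD processing is switched on explicitly through XmlReaderSettings.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityDemo
{
    // Hypothetical sample: a tiny document whose internal subset declares Ccaron.
    public const string Sample =
        "<!DOCTYPE root [\n" +
        "  <!ENTITY Ccaron \"&#x10C;\">\n" +
        "]>\n" +
        "<root><H0742>&Ccaron;opova 14</H0742></root>";

    // Parses the XML text with DTD processing enabled and returns the text of H0742.
    public static string ReadAddress(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // allow the DOCTYPE to be read
            XmlResolver = null                   // internal subset only, nothing is fetched
        };

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc.SelectSingleNode("/root/H0742").InnerText;
        }
    }

    public static void Main()
    {
        // The entity reference is expanded to the character U+010C.
        Console.WriteLine(ReadAddress(Sample));
    }
}
```

With the declaration in place, InnerText contains the real character rather than the entity reference, so the rest of the parsing code does not need to change.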

Alternatively, simply change the &Ccaron; to &#x10C; and your problem will also be resolved.

The &# notation is a numeric character reference: it gives the Unicode code point of the character (the 'x' indicates that the number is in hex).

You could always just type the character too but that requires knowledge of the ins and outs of your keyboard and region.
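
For the partner-supplied .ENT file mentioned in the question, one possible approach is to pull the whole file into the internal subset through an external parameter entity, so new entities are picked up without editing the declarations by hand. The sketch below is untested against any real server and rests on several assumptions: the names ExternalEntityDemo, AddDoctype, Load and partnerEntities are invented for illustration, the entity file URL is passed in by the caller, and the resolver in your .NET version must be able to fetch that URL scheme (FTP support in XmlUrlResolver varies between runtimes).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ExternalEntityDemo
{
    // Inserts a DOCTYPE whose internal subset includes an external entity file.
    public static string AddDoctype(string xml, string rootName, string entUrl)
    {
        string doctype =
            "<!DOCTYPE " + rootName + " [\n" +
            "  <!ENTITY % partnerEntities SYSTEM \"" + entUrl + "\">\n" +
            "  %partnerEntities;\n" +
            "]>\n";

        // An XML declaration, if present, has to stay at the very start.
        int insertAt = 0;
        if (xml.StartsWith("<?xml", StringComparison.Ordinal))
        {
            int end = xml.IndexOf("?>", StringComparison.Ordinal);
            if (end >= 0) insertAt = end + 2;
        }

        string separator = insertAt > 0 ? "\n" : "";
        return xml.Substring(0, insertAt) + separator + doctype + xml.Substring(insertAt);
    }

    // Reads the file as iso-8859-1 text (the encoding from the question),
    // adds the DOCTYPE and parses the result with external resolution enabled.
    public static XmlDocument Load(string path, string rootName, string entUrl)
    {
        string xml = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new XmlUrlResolver() // required to fetch the external file
        };

        using (var reader = XmlReader.Create(
            new StringReader(AddDoctype(xml, rootName, entUrl)), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}
```

Enabling an external resolver means the parser will download whatever the DOCTYPE points at, so this is only reasonable for a source you trust. If the download is unreliable, saving a local copy of the .ENT file and passing a file path as entUrl is a safer variation of the same idea.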

娜些时光,永不杰束 2024-12-20 14:04:52

&Ccaron; isn't XML; it's not even defined in the HTML 4 entity reference (which, by the way, isn't XML either). XML doesn't support all those entities; in fact, it predefines very few of them. But if you look up the entity and find it, you can use its Unicode equivalent as a numeric character reference instead. E.g. &Scaron; is invalid XML but &#352; isn't. (Scaron was the closest I could find to Ccaron.)
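
The lookup idea can be sketched in C# as a pre-processing step over the raw text. Everything named here (EntityRewriter, ToNumericReferences, the contents of the table) is an assumption for illustration; the table holds only a handful of entries and would need to be filled from whatever entity list your files actually use.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntityRewriter
{
    // Illustrative subset only: entity name -> Unicode code point.
    private static readonly Dictionary<string, int> CodePoints =
        new Dictionary<string, int>
        {
            { "Ccaron", 0x10C },
            { "ccaron", 0x10D },
            { "Scaron", 0x160 },
            { "Cacute", 0x106 }
        };

    // Rewrites known named entities as hexadecimal character references.
    public static string ToNumericReferences(string xml)
    {
        return Regex.Replace(xml, @"&([A-Za-z][A-Za-z0-9]*);", m =>
        {
            int codePoint;
            return CodePoints.TryGetValue(m.Groups[1].Value, out codePoint)
                ? "&#x" + codePoint.ToString("X") + ";"
                : m.Value; // predefined or unknown names are left untouched
        });
    }

    public static void Main()
    {
        // Prints: <H0742>&#x10C;opova 14 &amp; more</H0742>
        Console.WriteLine(
            ToNumericReferences("<H0742>&Ccaron;opova 14 &amp; more</H0742>"));
    }
}
```

Names that are not in the table, including the five predefined ones such as &amp;, pass through unchanged, so an unknown entity will still fail at parse time instead of being silently dropped.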

遗弃M 2024-12-20 14:04:52

Your XML file isn't well-formed and so can't be loaded as an XmlDocument. Period.

You have two options:

  • Open the file as a regular text file and fix the symptom.
  • Fix your XML generator, which is your real problem. That generator isn't producing the file with System.Xml; it is probably concatenating strings, on the theory that "XML is just a text file". You should repair it, or opening a generated XML file will always be a surprise.

EDIT: As you can't fix your XML generator, I recommend opening the file with File.ReadAllText and running a regular expression to re-encode that & or to strip the entire entity (as we can't translate it):

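using System;
using System.Text.RegularExpressions;

Console.WriteLine(
    Regex.Replace("<H0742>&Ccaron;opova 14, &#123; POB &amp; SI-1000 &</H0742>",
    // Match '&' not followed by '#', so numeric references such as &#123; are left alone
    @"&(?!#)(\S*?;)?", match =>
    {
        switch (match.Value)
        {
            case "&lt;":
            case "&gt;":
            case "&amp;":
            case "&quot;":
            case "&apos;":
                return match.Value; // already correctly encoded

            case "&": // bare ampersand
                return "&amp;";

            default: // an undeclared named entity; here you can choose:
                // to remove the entire entity:
                return "";
                // or to keep its text and just encode the & character:
                // return "&amp;" + match.Value.Substring(1);
        }
    }));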
囍笑 2024-12-20 14:04:52

&Ccaron; is an entity reference. It is most likely intended to stand for the character Č, in order to produce: Čopova.

However, that entity must be declared, or the XML parser will not know what should be substituted for the entity reference as it parses the XML.

平定天下 2024-12-20 14:04:52

Solution:

byte[] encodedString = Encoding.UTF8.GetBytes(xml);
// Put the byte array into a stream and rewind it to the beginning
MemoryStream ms = new MemoryStream(encodedString);
ms.Flush();
ms.Position = 0;
// Build the XmlDocument from the MemoryStream of UTF-8 encoded bytes
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(ms);