XML and & problem
I am new to XML and am trying to read an XML file.
I googled and tried the approach below, but I get this error:
Reference to undeclared entity 'Ccaron'. Line 2902, position 9.
Line 2902 of the file contains this:
<H0742>&Ccaron;opova 14, POB 1725,
SI-1000 Ljubljana</H0742>
This is what I tried:
XmlDocument xDoc = new XmlDocument();
xDoc.Load(file);
XmlNodeList nodes = xDoc.SelectNodes("nodeName");
foreach (XmlNode n in nodes)
{
    if (n.SelectSingleNode("H0742") != null)
    {
        row.IrNbr = n.SelectSingleNode("H0742").InnerText;
    }
    // ...
}
According to w3schools, & is illegal in XML.
EDIT:
This is the file's encoding. I wonder whether it is related to the problem somehow.
encoding='iso-8859-1'
Thanks in advance.
EDIT:
They gave me an .ENT file, which I can reference online at ftp.MyPartnerCompany.com/name.ent.
In this .ENT file I see entity declarations like this:
<!ENTITY Cacute "Ć"> <!-- latin capital letter C with acute,
U+0106 Latin Extended-A -->
How can I reference it when parsing my XML?
I would prefer to reference it online, since they may add new entities at any time.
Thanks in advance!
Comments (5)
The first thing to be aware of is that the problem isn't in your software.
As you are new to XML, I'm going to guess that defining entities isn't something you've come across before. Character entities are shortcuts for arbitrary pieces of text (one or more characters). The most common place you are going to see them is in the situation you are in now. At some point, your XML was created by someone who wanted to type the character 'Č' or 'č' (that's upper- and lower-case C with caron, if your font can't display it).
However, in XML we only have a few predeclared entities (ampersand, less than, greater than, double quote and apostrophe). Any other character entities need to be declared. To parse your file correctly you will need to do one of two things: either replace the entity reference with something that doesn't cause the parser issues, or declare the entity.
To declare the entity, you can use something called an "internal subset", a specialised form of the DOCTYPE declaration you might see at the top of an XML file. Something like this:
<!DOCTYPE root-element [
  <!ENTITY Ccaron "&#x010C;">
  <!ENTITY ccaron "&#x010D;">
]>
Placing that declaration at the beginning of the XML file (after the XML declaration, if there is one, and with 'root-element' changed to match your root element) will allow the parser to resolve the entity.
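As a rough illustration of the internal-subset approach, the C# sketch below parses a small in-memory document that declares Ccaron itself. The class name InternalSubsetDemo, the sample string, and the reader settings are assumptions made for this example, not part of the original answer or of the asker's file.

```csharp
using System.IO;
using System.Xml;

public static class InternalSubsetDemo
{
    // Hypothetical sample: a DOCTYPE with an internal subset that declares
    // the Ccaron entity, followed by an element that references it.
    public const string Sample =
        "<!DOCTYPE root-element [\n" +
        "  <!ENTITY Ccaron \"&#x010C;\">\n" +
        "]>\n" +
        "<root-element><H0742>&Ccaron;opova 14</H0742></root-element>";

    // Parses the XML text and returns the text content of the H0742 element.
    public static string ReadAddress(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // XmlReaderSettings prohibits DTDs by default, so opt in explicitly.
            DtdProcessing = DtdProcessing.Parse,
            // Nothing external has to be fetched for an internal subset.
            XmlResolver = null
        };

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc.SelectSingleNode("/root-element/H0742").InnerText;
        }
    }
}
```

Calling InternalSubsetDemo.ReadAddress(InternalSubsetDemo.Sample) should yield the address with the entity already expanded to Č, because the reader substitutes declared entities while parsing.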
Alternatively, simply change &Ccaron; to &#x010C; and your problem will also be resolved. The &# notation is a numeric character reference, giving the Unicode code point of the character (the 'x' indicates that it's in hex). You could always just type the character too, but that requires knowledge of the ins and outs of your keyboard and region (and a file encoding that can represent it, which iso-8859-1 cannot for Č).
&Ccaron; isn't XML; it's not even defined in the HTML 4 entity reference, which, by the way, isn't XML either. XML doesn't support all those entities; in fact, it supports very few of them. But if you look up the entity and find it, you can use its Unicode equivalent instead. For example, &Scaron; is invalid XML (unless it is declared) but &#352; isn't. (Scaron was the closest I could find to Ccaron.)
Your XML file isn't well-formed and so can't be loaded as an XmlDocument. Period.
You have two options:
1. Open the file as plain text and fix the offending entity references before parsing it.
2. Fix your XML generator, which is the real problem. It evidently isn't producing the file with System.Xml, but probably by concatenating several strings, as if "XML is just a text file". You should repair it, or opening a generated XML file will always be a surprise.
EDIT: As you can't fix your XML generator, I recommend opening the file with File.ReadAllText and running a regular expression to re-encode that & (as &amp;) or to strip the entire entity reference (as we can't translate it).
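A minimal sketch of that text-level repair, with one variation that should be stated plainly: instead of only escaping or stripping, it first tries to translate entity names it knows into numeric character references, and falls back to re-encoding the & for names it does not know. The class name EntityRewriter and the small lookup table are hypothetical; a real table would have to be filled from the partner's entity list.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntityRewriter
{
    // Hypothetical lookup table: entity name -> Unicode code point.
    private static readonly Dictionary<string, int> KnownEntities =
        new Dictionary<string, int>
        {
            { "Ccaron", 0x010C },
            { "ccaron", 0x010D },
            { "Cacute", 0x0106 },
            { "Scaron", 0x0160 }
        };

    // The five entities every XML parser already understands.
    private static readonly HashSet<string> Predefined =
        new HashSet<string> { "amp", "lt", "gt", "quot", "apos" };

    // Matches named references such as &Ccaron; but not numeric ones (&#...;).
    private static readonly Regex NamedEntity =
        new Regex(@"&([A-Za-z_][A-Za-z0-9._-]*);");

    public static string Rewrite(string xmlText)
    {
        return NamedEntity.Replace(xmlText, m =>
        {
            string name = m.Groups[1].Value;
            if (Predefined.Contains(name))
                return m.Value;                          // leave &amp; etc. alone
            if (KnownEntities.TryGetValue(name, out int codePoint))
                return "&#x" + codePoint.ToString("X4") + ";";
            return "&amp;" + name + ";";                 // unknown: re-encode the &
        });
    }
}
```

The rewritten text could then be passed to XmlDocument.LoadXml. Reading the file with File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1")) would match the encoding mentioned in the question. Note that this simple regular expression also touches references inside CDATA sections and comments, which may or may not matter for the file at hand.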
&Ccaron; is an entity reference. It is most likely intended to stand for the character Č, in order to produce: Čopova.
However, that entity must be declared, or the XML parser will not know what to substitute for the entity reference as it parses the XML.
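Since the question mentions a partner-supplied .ENT file of declarations, one possible way to declare all of its entities at once is to pull that file into the internal subset through a parameter entity. The sketch below is only an outline under several assumptions: the class name ExternalEntityDemo and the parameter entity name partnerEntities are made up, the XML body is assumed to start at the root element (no XML declaration or DOCTYPE of its own), and whether XmlUrlResolver can fetch a given URI scheme (an ftp:// address in particular) depends on the .NET runtime in use.

```csharp
using System.IO;
using System.Xml;

public static class ExternalEntityDemo
{
    // Builds a DOCTYPE whose internal subset includes an external file of
    // <!ENTITY ...> declarations by way of a parameter entity.
    public static string BuildDoctype(string rootName, string entityFileUri)
    {
        return "<!DOCTYPE " + rootName + " [\n" +
               "  <!ENTITY % partnerEntities SYSTEM \"" + entityFileUri + "\">\n" +
               "  %partnerEntities;\n" +
               "]>\n";
    }

    // Assumption: xmlBody begins with the root element itself.
    public static XmlDocument Load(string xmlBody, string rootName, string entityFileUri)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // DTDs are prohibited by default
            XmlResolver = new XmlUrlResolver()    // fetches the external entity file
        };

        string text = BuildDoctype(rootName, entityFileUri) + xmlBody;
        using (var reader = XmlReader.Create(new StringReader(text), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}
```

Because the declarations are fetched each time the document is parsed, entities the partner adds later would be picked up without code changes. Enabling DTD processing with a resolver that follows external references is only reasonable for input you trust.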
solution :-