ColdFusion XMLFormat() does not convert all characters

Posted on 2024-08-10 05:47:39 · 118 characters · 4 views · 0 comments

I am using XMLFormat() to encode some text for an XML document. However, when I go to read the XML file I created, I get an invalid character error. Why does XMLFormat() not properly encode all characters?

I'm running CF8.

Comments (6)

在巴黎塔顶看东京樱花 2024-08-17 05:47:39

Are you sure you are writing the file in the right encoding? You can't just do

<cffile action="write" file="foo.xml" output="#xml#" />

because the encoding of the resulting file will very likely differ from the character set your XML is in. Unless an encoding declaration says otherwise, XML files are treated as UTF-8, so you should do:

<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
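
A minimal sketch of the full round trip, assuming the XML body is already held in a string. `xmlBody`, `raw` and `doc` are hypothetical variable names, and `ExpandPath()` is used only to get an absolute path for `foo.xml`. The point is that the encoding named in the XML declaration and the `charset` attribute used by `<cffile>` should agree:

```cfml
<!--- xmlBody is a hypothetical variable holding the root element as a string --->
<cfset xml = '<?xml version="1.0" encoding="UTF-8"?>' & Chr(10) & xmlBody>

<!--- write the file as UTF-8 so the bytes match the declaration above --->
<cffile action="write" file="#ExpandPath('foo.xml')#" output="#xml#" charset="utf-8">

<!--- read it back with the same charset, then parse it --->
<cffile action="read" file="#ExpandPath('foo.xml')#" variable="raw" charset="utf-8">
<cfset doc = XmlParse(raw)>
```

If the declaration names a different encoding, the `charset` values would need to change to match it.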

—━☆沉默づ 2024-08-17 05:47:39

I feel that this is a bug in XMLFormat. I am not sure who the original author of the snippet below is, but here is an approach that catches the extra characters via regex:

  <cfset myText = xmlFormat(myText)>

  <cfscript>
      i = 1; // string positions in CFML are 1-based, so start scanning at 1.
      tmp = '';
      while(ReFind('[^\x00-\x7F]',myText,i,false))
      {
        i = ReFind('[^\x00-\x7F]',myText,i,false); // find the next non-ASCII character and save its position.
        tmp = '&##x#FormatBaseN(Asc(Mid(myText,i,1)),16)#;'; // build the hex numeric character reference for it.
        myText = Insert(tmp,myText,i); // insert the reference right after the character.
        myText = RemoveChars(myText,i,1); // delete the original character.
        i = i+Len(tmp); // resume scanning just past the inserted reference.
      }
      // myText now holds the escaped text; a return statement is only valid if this is wrapped in a function.
  </cfscript>

伴梦长久 2024-08-17 05:47:39

Also, do not forget to put <cfprocessingdirective pageencoding="utf-8"> at the top of your template.

天荒地未老 2024-08-17 05:47:39

If you're trying to return your XML directly to the browser, you might want to try something like this to have the user download it:

<cfheader name="Content-Disposition" charset="utf-8" value="attachment; filename=export.xml">
<cfcontent variable="#someXMLPacket#" type="text/xml"  reset="true">

Or, if you want it returned as a web page (à la REST), then this should do the trick:

<!--- cfheader needs a name or statusCode, so the charset goes on the content type instead --->
<cfcontent variable="#someXMLPacket#" type="text/xml; charset=utf-8" reset="true">

Hope that helps.

眼睛会笑 2024-08-17 05:47:39

Unfortunately, XMLFormat is just not an all-inclusive solution. It has a very limited list of characters that it will replace [documentation].

You'll need to do custom encoding of characters that are invalid for XML but not covered by XMLFormat.

It's definitely not very efficient, but a potential solution would be to loop over the content of typically suspect fields (anything user-generated, for starters) character by character, checking each character code, and if it's above 255, either omit the character or properly encode it.
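
The character-by-character loop described above could be sketched roughly as follows. `xmlSafe` is a hypothetical helper name, not a built-in function. It assumes that characters above 255 should become hex numeric character references, and it also drops control characters that XML 1.0 does not allow (everything below 32 except tab, line feed and carriage return). Characters outside the Basic Multilingual Plane are not handled specially here.

```cfml
<cfscript>
// Hypothetical helper: runs XMLFormat() first, then handles what it leaves alone.
function xmlSafe(str) {
    var result = "";
    var escaped = XMLFormat(str);
    var i = 0;
    var ch = "";
    var code = 0;
    for (i = 1; i LTE Len(escaped); i = i + 1) {
        ch = Mid(escaped, i, 1);
        code = Asc(ch);
        // keep tab (9), LF (10), CR (13) and everything from 32 up; drop other control characters
        if (code GTE 32 OR code EQ 9 OR code EQ 10 OR code EQ 13) {
            if (code GT 255) {
                // encode as a hex numeric character reference
                result = result & "&##x" & FormatBaseN(code, 16) & ";";
            } else {
                result = result & ch;
            }
        }
    }
    return result;
}
</cfscript>
```

It would be called in place of a bare XMLFormat() call, for example `<cfset safeText = xmlSafe(userText)>`, where `userText` stands for whatever user-generated field is being written. Building the string by concatenation is slow for large inputs, which matches the efficiency caveat above.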

病毒体 2024-08-17 05:47:39

This was a huge issue for me as well, and it turns out the charset is the main factor: you need to explicitly specify the correct charset.

In my case, I had foreign-language text inside the XML, and it wouldn't parse correctly until I put in the correct charset...
