KRL RSS parser: dealing with encoding problems?
I'm importing an RSS feed from Tumblr into a Kynetx app. It appears that the RSS feed has some encoding issues, as apostrophes appear like this:
The feed (which you can find here) claims to be encoded in UTF-8.
Is there a way to specify the encoding or else replace those characters with regular apostrophes?
Comments (1)
While not optimal, you could try to catch these encodings and replace them with the UTF-8 standard:
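The snippet that originally followed is not preserved here. As a rough illustration of the idea only, written in Python rather than KRL, the sketch below replaces typographic quotes and a few common mis-decoded sequences with plain ASCII quotes. The function name `normalize_quotes` and the replacement table are hypothetical, and the table assumes the mangled apostrophe is the usual result of UTF-8 bytes being read as Windows-1252, which the feed excerpt does not confirm.

```python
# Hypothetical replacement table: not taken from the feed or from KRL.
# The first three keys are what UTF-8 encoded curly quotes look like
# after being wrongly decoded as Windows-1252 (an assumption about the feed).
REPLACEMENTS = {
    "\u00e2\u20ac\u2122": "'",   # "â€™", mis-decoded U+2019
    "\u00e2\u20ac\u02dc": "'",   # "â€˜", mis-decoded U+2018
    "\u00e2\u20ac\u0153": '"',   # "â€œ", mis-decoded U+201C
    "\u2018": "'",               # left single quotation mark
    "\u2019": "'",               # right single quotation mark
    "\u201c": '"',               # left double quotation mark
    "\u201d": '"',               # right double quotation mark
}


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and known mis-decoded sequences with ASCII quotes."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


if __name__ == "__main__":
    print(normalize_quotes("It\u00e2\u20ac\u2122s here"))
    print(normalize_quotes("It\u2019s here"))
```

A lookup table like this only covers the sequences it lists, so any other mangled character passes through unchanged.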
This appears to be a case of a service that declares UTF-8 but doesn't explicitly enforce it. I uploaded an image of the RSS feed that you provided. For comparison, I cut and pasted the text into a Notepad document and then typed in the same text from my keyboard.
I don't know if you can tell from the image, but the apostrophe that is mangled is different from the apostrophe that is generated by my UTF-8 browser.
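Since the difference is hard to see by eye, one way to check it is to print the code point and Unicode name of each character. This is a small Python sketch; the helper name `describe` is made up for illustration, and the two sample strings stand in for the typed and pasted apostrophes, which are not reproduced in this post.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
    ]


if __name__ == "__main__":
    typed = "'"             # apostrophe typed on a keyboard
    typographic = "\u2019"  # curly apostrophe, as many editors insert it
    print(describe(typed))        # ['U+0027 APOSTROPHE']
    print(describe(typographic))  # ['U+2019 RIGHT SINGLE QUOTATION MARK']
```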
I suspect that this post was submitted via a Windows client. If you look at your encoding options, you will see an option for Western (Windows-1252).
Windows-1252 is a legacy Windows encoding that resembles ISO 8859-1, but it substitutes printable characters of its own (typographic quotes among them) for the control characters that ISO 8859-1 places in the 0x80-0x9F range.
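To make the difference concrete, here is a short Python sketch using the standard `cp1252`, `latin-1`, and `utf-8` codecs. It shows that byte 0x92 is a curly apostrophe in Windows-1252 but a control character in ISO 8859-1, and how the UTF-8 bytes of that apostrophe turn into three unrelated characters when read as Windows-1252. Whether this is exactly what happened to the Tumblr feed is an assumption.

```python
# Byte 0x92 means different things in the two legacy encodings.
byte = b"\x92"
as_cp1252 = byte.decode("cp1252")    # U+2019, right single quotation mark
as_latin1 = byte.decode("latin-1")   # U+0092, a C1 control character

# The same apostrophe encoded as UTF-8 takes three bytes.
utf8_bytes = "\u2019".encode("utf-8")    # b'\xe2\x80\x99'

# Reading those UTF-8 bytes as Windows-1252 yields three characters.
mangled = utf8_bytes.decode("cp1252")    # 'â€™'

# Reversing the wrong decode recovers the original character.
repaired = mangled.encode("cp1252").decode("utf-8")

if __name__ == "__main__":
    print(repr(as_cp1252), repr(as_latin1))
    print(utf8_bytes, repr(mangled), repr(repaired))
```

The round trip in the last step only works when every mangled character maps back to a Windows-1252 byte; text damaged in some other way raises an encoding error instead.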
A couple of quotes from the Wikipedia page that I cite above:
KRL supports all of the language charsets supported by UTF-8, so it handles multi-byte international characters natively; however, that comes at the expense of the encoding fudging that is possible when you only have ISO-8859-1 or Windows-1252 to choose from.