How can I check that the trademark (™) character is stored correctly in an Oracle database?

Posted on 2024-12-03 08:47:23

How can I check that the trademark(™) character is set correctly in my Oracle database?

I expect it to be stored using UTF-8 encoding.

I have a value stored in a Salesforce.com field that looks like this from the GUI (notice the trademark character):

Chuck Norris's Roundhouse Kick™

I'm using Informatica to replicate it to an Oracle database. My database is set to use the AL32UTF8 encoding.

How it shows up in SQL Developer

When I query my table using SQL Developer, the trademark symbol shows up as a rectangle (black border, white fill).

How it shows up in HTML

When I export it from SQL Developer using the UTF-8 encoding into an HTML document and open it in Chrome, the trademark symbol does not appear at all. When I open it in IE, it appears as a rectangle again. In Firefox, it's a rectangle with 00 in the top half and 99 in the bottom half. All three browsers interpret the HTML doc using UTF-8.

How it shows up in text editors

Opening the same HTML doc in Notepad and Notepad++, the trademark symbol shows up as a rectangle. If I use the Hex Viewer plugin for Notepad++, I see that the bytes are C2 99. That seems to be the correct encoding for the trademark symbol in UTF-8.

When I open the document in MS Write, the trademark character looks like this: ™.

When I get the value programmatically

Using Python, when I get the value from the database, the trademark character is replaced with '\xbf' (the inverted question mark). As far as I can tell, that character isn't even properly encoded, because it's missing at least one leading byte (depending on the specific encoding).

>>> import cx_Oracle
>>> con = cx_Oracle.connect('username', 'password', 'db')
>>> cur = con.cursor()
>>> cur.execute('select * from trademark')
<__builtin__.OracleCursor on <cx_Oracle.Connection to username@db>>
>>> records = cur.fetchall()
>>> records[0][0]
"Chuck Norris's Roundhouse Kick\xbf"

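The repr above is a Python 2 byte string, so '\xbf' is one raw byte. A quick check, assuming the client decoded the column with a single-byte Western code page such as Latin-1 or Windows-1252 (the post doesn't say which client character set was in use), shows that 0xBF is a complete character in those encodings. Only under UTF-8 would it be a continuation byte missing its lead byte:

```python
import unicodedata

# The value as cx_Oracle returned it (Python 2 shows it as a byte string).
fetched = b"Chuck Norris's Roundhouse Kick\xbf"
last_byte = fetched[-1:]  # b'\xbf'

# In single-byte code pages every byte is a whole character on its own.
for codec in ("latin-1", "cp1252"):
    ch = last_byte.decode(codec)
    print(codec, repr(ch), unicodedata.name(ch))  # INVERTED QUESTION MARK

# Only under UTF-8 is 0xBF incomplete: it is a continuation byte with no lead byte.
try:
    last_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)  # invalid start byte
```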
Ideally, I'd like to be able to validate the data stored in my Oracle database using all of the above methods. I'd settle for someone just validating that what I saw in the Hex Viewer was enough of a "test" ;)
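A repeatable version of the hex-viewer test can look at the bytes Oracle stores instead of at what a client renders. Oracle's built-in DUMP function with format 1016 returns the stored bytes in hex together with the character set name. The sketch below only parses that output, assuming the "Typ=... Len=... CharacterSet=...: hex,hex,..." layout that format 1016 produces. The column name `name` is hypothetical, because the post only shows `select * from trademark`. The sample string is illustrative: a shortened value built from the C2 99 bytes reported above, not output captured from the asker's database.

```python
# Hypothetical query: "name" stands in for the real column. Run it in SQL Developer
# or through cx_Oracle, then pass each returned string to check_dump().
QUERY = "SELECT DUMP(name, 1016) FROM trademark"

TM_UTF8 = "\u2122".encode("utf-8")  # b'\xe2\x84\xa2', the correct bytes for the trademark sign
C1_0099 = "\u0099".encode("utf-8")  # b'\xc2\x99', the control character U+0099


def parse_dump(dump):
    """Split 'Typ=1 Len=N CharacterSet=X: h1,h2,...' into (charset, raw bytes)."""
    header, _, hex_part = dump.partition(": ")
    charset = header.split("CharacterSet=")[1]
    data = bytes(int(h, 16) for h in hex_part.split(","))
    return charset, data


def check_dump(dump):
    """Report whether the stored bytes contain the correct UTF-8 trademark sign."""
    charset, data = parse_dump(dump)
    if TM_UTF8 in data:
        return "ok: U+2122 stored as E2 84 A2"
    if C1_0099 in data:
        return "wrong: U+0099 control character stored as C2 99"
    return f"no trademark bytes found (charset {charset})"


# Illustrative input only: "Kick" followed by C2 99.
sample = "Typ=1 Len=6 CharacterSet=AL32UTF8: 4b,69,63,6b,c2,99"
print(check_dump(sample))
```

Checking the database side this way separates a storage problem from a display problem: if DUMP shows C2 99, the wrong character was written, whatever any client shows.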

Comments (3)

吝吻 2024-12-10 08:47:23

The character literal ™ you posted is not U+0099 (a control character), but U+2122 (TRADE MARK SIGN).

The Unicode spec defines U+0099 as follows:

0099;<control>;Cc;0;BN;;;;;N;;;;;

So it doesn't even have a name, and I haven't dug through the spec to find out what this character is for.
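Python's standard-library unicodedata module confirms the difference between the two code points. U+0099 is a control character (general category Cc) with no character name, while U+2122 is named TRADE MARK SIGN:

```python
import unicodedata

for cp in (0x0099, 0x2122):
    ch = chr(cp)
    # name() raises ValueError for unnamed code points unless a default is supplied
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{cp:04X}", unicodedata.category(ch), name)
```

This prints `U+0099 Cc <no name>` and `U+2122 So TRADE MARK SIGN`.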

In Windows, decoding the byte 0x99 with the Windows-1252 code page does produce the trademark sign, because Windows-1252 maps 0x99 to U+2122. I guess this mix-up is a bug.

The correct byte sequence for the TRADE MARK SIGN (U+2122) in UTF-8 is E2 84 A2.
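These byte sequences can be checked directly in Python. The last two lines reproduce the C2 99 from the question's hex viewer through one plausible chain, assumed here for illustration: ™ encoded as Windows-1252 (a single 0x99 byte), that byte then read as ISO-8859-1 (giving U+0099), and the result re-encoded as UTF-8. The post does not establish where such a mix-up happened in the Salesforce, Informatica and Oracle pipeline.

```python
tm = "\u2122"  # TRADE MARK SIGN

# Correct UTF-8 encoding of U+2122: three bytes.
print(tm.encode("utf-8").hex(" "))  # e2 84 a2

# What C2 99 actually decodes to: the C1 control character U+0099, not the trademark sign.
print(repr(bytes.fromhex("c2 99").decode("utf-8")))  # '\x99'

# Windows-1252 maps the single byte 0x99 to the trademark sign.
print(bytes([0x99]).decode("cp1252") == tm)  # True

# Hypothetical chain: Windows-1252 bytes misread as Latin-1, then stored as UTF-8.
mangled = tm.encode("cp1252").decode("latin-1").encode("utf-8")
print(mangled.hex(" "))  # c2 99
```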

慕烟庭风 2024-12-10 08:47:23

For future reference, since the author didn't post the fix: this is indeed an Informatica problem, and two changes are needed:

  1. Change the connection properties in odbc.ini on the Informatica box: add "IANAAppCodePage=106" to each connection that needs UTF-8 (106 is the IANA MIBenum number for UTF-8).
  2. Change the connection properties in Informatica itself: add "Codepage=Utf-8" to the connection in "Connection Manager -> Connections -> Relational -> -> Edit".

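Step 1 is a plain INI edit. The following sketch uses Python's configparser to add the key, assuming a DSN section named `Oracle_UTF8_DSN` (a hypothetical name) and assuming the file parses as standard INI. configparser drops comments when it rewrites a file, so for a real odbc.ini, editing by hand or keeping a backup may be safer.

```python
import configparser
import io


def add_utf8_codepage(ini_text, dsn):
    """Return odbc.ini text with IANAAppCodePage=106 (UTF-8) set for one DSN section."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve key case such as "IANAAppCodePage"
    cfg.read_string(ini_text)
    if not cfg.has_section(dsn):
        raise KeyError(f"DSN section [{dsn}] not found")
    cfg[dsn]["IANAAppCodePage"] = "106"  # 106 = IANA MIBenum for UTF-8
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)  # writes Key=Value without spaces
    return out.getvalue()


# Hypothetical DSN entry, for illustration only.
example = """[Oracle_UTF8_DSN]
Description=Informatica Oracle target
"""
print(add_utf8_codepage(example, "Oracle_UTF8_DSN"))
```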
软糯酥胸 2024-12-10 08:47:23

If you are saving this string for output in an HTML document, use &trade;, the HTML entity for the trademark symbol.

If you are using this string for non-HTML purposes, decode it at runtime:

# Python 2 (HTMLParser module); on Python 3 use html.unescape('&trade;') instead
import HTMLParser
h = HTMLParser.HTMLParser()
s = h.unescape('&trade;')  # u'\u2122'

See:
http://www.w3schools.com/html/html_entities.asp
http://fredericiana.com/2010/10/08/decoding-html-entities-to-text-in-python/
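The snippet above relies on Python 2's HTMLParser module. On Python 3, the standard-library html module handles both directions. The following is a sketch: emitting a numeric character reference (&#8482;) for output is a choice made here, and the named entity &trade; works just as well in HTML.

```python
import html

# Decoding: named and numeric references both become U+2122.
print(html.unescape("Kick&trade;"))  # Kick™
print(html.unescape("Kick&#8482;"))  # Kick™

# Encoding for HTML output: html.escape handles &, <, > and quotes, then
# xmlcharrefreplace turns every non-ASCII character into a numeric reference,
# so the resulting markup is pure ASCII.
s = "Chuck Norris's Roundhouse Kick\u2122"
safe = html.escape(s).encode("ascii", "xmlcharrefreplace").decode("ascii")
print(safe)  # Chuck Norris&#x27;s Roundhouse Kick&#8482;
```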
