Oracle:将 varchar2 字段中的 xml 实体转换为 utf-8 字符
我在表中有一个字段,它保存特殊字符的 XML 实体,因为该表采用 latin-1 格式。 例如“Hallöle slovenčina
”(“ö”在 latin-1 中,但“slovenčina”中的“č”必须通过某个存储该实体的应用程序转换为实体值到数据库中)
现在我需要通过将 XML 实体转换为其原始字符来将表导出到 utf-8 编码文件中。
Oracle 中是否有一个函数可以为我处理这个问题,或者我真的需要为此创建一个巨大的键/值映射?
非常感谢任何帮助。
编辑:我找到了函数 DBMS_XMLGEN.convert
,但它仅适用于 <
、>
和 &
>。不在 &#NNN;
上:-(
I have a field in a table which holds XML entities for special characters, since the table is in latin-1.
E.g. "Hallöle slovenčina
" (the "ö" is in latin-1, but the "č" in "slovenčina" had to be converted to an entity by some application that stores the values into the database)
Now I need to export the table into a utf-8 encoded file by converting the XML entities to their original characters.
Is there a function in Oracle that might handle this for me, or do I really need to create a huge key/value map for that?
Any help is greatly appreciated.
EDIT: I found the function DBMS_XMLGEN.convert
, but it only works on <
,>
and &
. Not on NNN;
:-(
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我认为 dbms_xmlgen 的问题在于技术上只有五个 XML 实体。您的示例有一个数字 HTML 实体,它与 Unicode 相对应:
http: //theorem.ca/~mvcorks/cgi-bin/unicode.pl.cgi?start=0100&end=017F
Oracle 有一个函数 UNISTR 在这里很有帮助:
在上面的示例中,我已将 269 转换为其十六进制等效值
010d
(在 Unicode 中为U+010D
)。但是,您可以传递一个十进制数并进行如下转换:编辑:PL/SQL 解决方案:
这是我为您准备的一个示例。这应该循环并替换您从表中选择的每一行的所有出现情况。
请注意,我已对 UTL_FILE 函数的调用进行了存根,以将 NVARCHAR 行(Oracle 的扩展字符集)写入数据库服务器上的文件。 dbms_output 虽然非常适合调试,但似乎不支持扩展字符,但如果您使用 UTL_FILE 写入文件,这应该不是问题。这是 DBMS_OUTPUT:
I believe the problem with dbms_xmlgen is that there are technically only five XML entities. Your example has a numeric HTML entity, which corresponds with Unicode:
http://theorem.ca/~mvcorks/cgi-bin/unicode.pl.cgi?start=0100&end=017F
Oracle has a function UNISTR which is helpful here:
I've converted 269 to its hex equivalent
010d
in the example above (in Unicode it isU+010D
). However, you could pass a decimal number and do a conversion like this:EDIT: The PL/SQL solution:
Here's an example I've rigged up for you. This should loop over and replace any occurrences for each row you select out of your table(s).
Notice that I've stubbed in calls to the UTL_FILE function to write NVARCHAR lines (Oracle's extended character set) to a file on the database server. The dbms_output, while great for debugging, doesn't seem to support extended characters, but this shouldn't be a problem if you use UTL_FILE to write to a file. Here's the DBMS_OUTPUT:
您也可以只使用国际化包:
非常适合将这些 html 实体更改为普通字符(例如将数据库从 iso 8859P1 移动到 UTF-8 后的清理)
You can also just use the internationalization package :
Works great in changing those html entities to normal characters (such as cleanup after moving a database from iso 8859P1 to UTF-8)
这可能应该在 PL/SQL 中完成,我不知道,但我想看看用纯 SQL 能达到什么程度。这只会替换第一次出现的代码,因此您必须以某种方式运行它多次。
或者可读性较差但更可用:
此(更新的)版本正确替换了第一次出现的版本。您需要应用它,直到所有这些都被替换为止。
This should probably be done in PL/SQL which I do not know, but I wanted to see how far I could get it with pure SQL. This only replaces the first occurence of the code, so you would have to somehow run it multiple times.
Or less readable but more usable:
This (updated) version correctly replaces the first occurrence. You need to apply it until all of them are replaced.