当前位置：文江博客话题详情

Oracle Text 不适用于 NVARCHAR2。还有什么可能不可用？

发布于 2024-10-06 12:53:45 字数 817 浏览 5 评论 0原文

我们将迁移一个应用程序以使其支持 Unicode，并且必须在整个数据库的 unicode 字符集或存储在 N[VAR]CHAR2 中的 unicode 列之间进行选择。

我们知道，如果选择 NVARCHAR2，我们将不再能够使用 Oracle Text 对列内容进行索引，因为 Oracle Text 只能基于 CHAR 类型对列进行索引。

除此之外，在从 Oracle 中获取可能性时是否可能会出现其他主要差异？

另外，较新版本的 Oracle 中是否可能添加了一些新功能，但仅支持 CHAR 列或 NCHAR 列，但不能同时支持两者？

谢谢您的回答。

请注意贾斯汀的回答：

谢谢您的回答。我将讨论您的观点，并将其应用于我们的案例：

我们的应用程序通常单独在 Oracle 数据库上，并负责数据本身。其他连接数据库的软件仅限于Toad， Tora 或 SQL 开发人员。

我们还使用 SQL*Loader 和 SQL*Plus 与数据库进行基本通信声明或在产品版本之间升级。我们有没有听说所有有关 NVARCHAR2 的软件有任何具体问题。

我们也不知道我们的客户中的数据库管理员会喜欢在无法支持数据的数据库上使用其他工具 NVARCHAR2 和我们并不真正担心他们的工具是否会造成破坏，毕竟他们对自己的工作很熟练，如果需要的话可能会找到其他工具。

您的最后两点对我们的案例更有洞察力。我们用的不多 Oracle 的内置软件包，但这种情况仍然发生。我们将探讨这一点问题。

如果我们的应用程序（在 Visual C++ 下编译）使用 wchar_t 存储UTF-16，必须对所有处理的数据进行编码转换？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

生来就爱笑 2024-10-13 12:53:45

如果您有任何接近的选择，请对整个数据库使用 Unicode 字符集。这样一来，生活总体上就轻松多了。

有许多第三方实用程序和库根本不支持 NCHAR/ NVARCHAR2 列，或者无法使 NCHAR/ NVARCHAR2 列的使用变得愉快。例如，当您闪亮的新报告工具无法报告您的 NVARCHAR2 数据时，这非常烦人。
对于自定义应用程序，使用 NCHAR/NVARCHAR2 列需要跳过一些使用 CHAR/VARCHAR2 Unicode 编码列则不需要的环节。例如，在 JDBC 代码中，您将不断调用 Statement.setFormOfUse 方法。其他语言和框架也会有其他问题；有些将有相对详细的记录，而其他一些则相对晦涩难懂。
许多内置包仅接受（或返回）VARCHAR2 而不是 NVARCHAR2。由于隐式转换，您仍然可以调用它们，但最终可能会遇到字符集转换问题。
一般来说，能够避免数据库内的字符集转换问题并将这些问题转移到数据库实际从客户端发送或接收数据的边缘，使得开发应用程序的工作变得更加容易。调试由网络传输引起的字符集转换问题已经足够了——在存储过程连接 VARCHAR2 和 NVARCHAR2 中的数据并将结果存储在 VARCHAR2 中然后通过网络发送之前，找出某些数据被损坏的情况，可以令人难以忍受。

Oracle 设计了 NCHAR/NVARCHAR2 数据类型，用于以下情况：您尝试在同一数据库中支持不支持 Unicode 的旧应用程序与使用 Unicode 的新应用程序，以及使用不同的数据库存储某些 Unicode 数据是有益的。编码（即您有大量日语数据，您希望使用 NVARCHAR2 中的 UTF-16 编码而不是 UTF-8 编码来存储这些数据）。如果您不属于这两种情况之一，而且听起来也不像，那么我会不惜一切代价避免使用 NCHAR/NVARCHAR2。

回复您的后续行动

我们的应用程序通常单独在
Oracle 数据库并负责
数据本身。其他软件
连接到数据库仅限于
Toad、Tora 或 SQL 开发人员。

“处理数据本身”是什么意思？我希望您不是说您已将应用程序配置为绕过 Oracle 的字符集转换例程并且您自己完成了所有字符集转换。

我还假设您正在使用某种 API/库来访问数据库，即使这是 OCI。您是否研究过需要对应用程序进行哪些更改才能支持 NCHAR/ NVARCHAR2 以及您使用的 API 是否支持 NCHAR/ NVARCHAR2？您在 C++ 中获取 Unicode 数据这一事实实际上并不表明您不需要进行（可能很重要的）更改来支持 NCHAR/NVARCHAR2 列。

我们还使用 SQL*Loader 和 SQL*Plus 来
与数据库进行通信
基本报表或升级之间
产品的版本。我们还没有
听说有任何具体问题
那些关于 NVARCHAR2 的软件。

这些应用程序均适用于 NCHAR/NVARCHAR2。 NCHAR/ NVARCHAR2 会给脚本带来一些额外的复杂性，特别是当您尝试对数据库字符集中无法表示的字符串常量进行编码时。不过，您当然可以解决这些问题。

我们也不知道数据库
我们的客户中的管理员
想使用其他工具
数据库无法支持数据
在 NVARCHAR2 上，我们并不是真的
担心他们的工具是否会
破坏，毕竟他们擅长
他们的工作，并且可能会找到其他工具，如果
必要的。

虽然我确信您的客户可以找到处理数据的替代方法，但如果您的应用程序不能与他们的企业报告工具、企业 ETL 工具或他们碰巧使用过的任何桌面工具很好地配合，那么很可能客户会责怪你的应用程序而不是他们的工具。它可能不会成为一个阻碍，但给顾客带来不必要的悲伤也没有任何好处。这可能不会促使他们使用竞争对手的产品，但也不会让他们渴望接受你的产品。

我们是否也可以期待性能
如果我们的应用程序（即
在 Visual C++ 下编译），使用
wchar_t 存储 UTF-16，必须
对所有执行编码转换
处理过的数据？

我不确定你所说的“转换”是什么。这可能又回到了我最初的问题，即您是否声明您正在绕过 Oracle 的 NLS 层自行进行字符集转换。

不过，我的底线是，鉴于您所描述的内容，我认为使用 NCHAR/ NVARCHAR2 没有任何优势。使用它们有很多潜在的缺点。然而，即使您可以消除 99% 与您的特定需求无关的缺点，您仍然面临着一种情况，即最好的情况是两种方法之间的交替。鉴于此，我更愿意采用最大限度提高灵活性的方法，即将整个数据库转换为 Unicode（大概是 AL32UTF8）并使用它。

If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.

There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.
For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.
Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.
In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.

Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.

Responding to your followups

Our application is usually alone on
the Oracle database and takes care of
the data itself. Other software that
connect to the database are limited to
Toad, Tora or SQL developer.

What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.

I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.

We also use SQL*Loader and SQL*Plus to
communicate with the database for
basic statements or to upgrade between
versions of the product. We have not
heard of any specific problem with all
those software regarding NVARCHAR2.

Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.

We are also not aware that database
administrators among our customers
would like to use other tools on the
database that could not support data
on NVARCHAR2 and we are not really
concerned whether their tools might
disrupt, after all they are skilled in
their job and may find other tools if
necessary.

While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.

Could we also expect performance
breakage if our application (that is
compiled under Visual C++), that uses
wchar_t to store UTF-16, has to
perform encoding conversions on all
processed data?

I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.

My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.

回复收藏 0 原文

~没有更多了~