Oracle Text does not work with NVARCHAR2. What else might be unavailable?
We are going to migrate an application to make it support Unicode, and we have to choose between a Unicode character set for the whole database and Unicode columns stored in N[VAR]CHAR2.
We know that we will no longer be able to index column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.
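For reference, this is the kind of index we would lose, sketched against a hypothetical table DOCS with a VARCHAR2 column BODY (names invented):

    -- A standard Oracle Text index on a VARCHAR2 column:
    CREATE INDEX docs_body_ctx ON docs (body)
      INDEXTYPE IS CTXSYS.CONTEXT;

    -- The same statement against an NVARCHAR2 column is what we expect
    -- to be rejected, since Oracle Text does not index N[VAR]CHAR2.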
Apart from that, are other major differences likely to arise when we try to take advantage of what Oracle offers?
Also, is it likely that newer versions of Oracle will add features that support either CHAR columns or NCHAR columns, but not both?
Thank you for your answers.
Note following Justin's answer:
Thank you for your answer. I will discuss your points, applied to our case:
Our application is usually alone on the Oracle database and takes care of the data itself. Other software that connects to the database is limited to Toad, Tora or SQL Developer.
We also use SQL*Loader and SQL*Plus to communicate with the database for basic statements or to upgrade between versions of the product. We have not heard of any specific problems with any of this software regarding NVARCHAR2.
We are also not aware of database administrators among our customers who would want to use other tools on the database that could not handle data in NVARCHAR2, and we are not really concerned about whether their tools might break; after all, they are skilled at their jobs and can find other tools if necessary.
Your last two points are more insightful for our case. We do not use many of Oracle's built-in packages, but it does happen. We will explore that issue.
Could we also expect a performance penalty if our application (compiled under Visual C++), which uses wchar_t to store UTF-16, has to perform encoding conversions on all processed data?
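For context, whether such conversions are needed depends on how the client-side encoding lines up with the two server-side settings below (a sketch; the values in the comment are the common defaults, not necessarily what we will end up with):

    -- N[VAR]CHAR2 columns use the national character set (typically AL16UTF16,
    -- i.e. the same UTF-16 encoding we keep in wchar_t), while [VAR]CHAR2 columns
    -- use the database character set (AL32UTF8 if the whole database is converted).
    SELECT parameter, value
      FROM nls_database_parameters
     WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');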
Justin's answer:
If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.
Oracle designed the NCHAR/NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode, and for cases where it is beneficial to store some Unicode data with a different encoding (e.g. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in an NVARCHAR2 column rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/NVARCHAR2 at all costs.
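A minimal sketch of that second scenario, with invented table and column names: the two column types can coexist in one table, each stored in its own character set, and for Japanese text the default national character set (AL16UTF16) generally needs 2 bytes per character where AL32UTF8 needs 3:

    CREATE TABLE customer_notes (
      id       NUMBER PRIMARY KEY,
      note_en  VARCHAR2(4000 CHAR),  -- database character set (AL32UTF8: most Japanese characters take 3 bytes)
      note_ja  NVARCHAR2(2000)       -- national character set (AL16UTF16: 2 bytes per BMP character)
    );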
Responding to your followups
What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.
I'm also assuming that you are using some sort of API/library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/NVARCHAR2 and whether the API you're using supports NCHAR/NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/NVARCHAR2 columns.
Those applications all work with NCHAR/NVARCHAR2. NCHAR/NVARCHAR2 introduce some additional complexities into scripts, particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.
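To sketch what I mean about string constants (the table and data are invented): a plain literal is converted through the database character set on its way to an NVARCHAR2 column, so characters that character set cannot represent are lost unless the literal is tagged as national character set data, and in tools like SQL*Plus you may additionally need the client-side literal replacement setting:

    -- With a non-Unicode database character set (say WE8ISO8859P1), this literal
    -- passes through that character set first, so the Japanese text is mangled:
    INSERT INTO customer_notes (id, note_ja) VALUES (1, '日本語');

    -- The N'' syntax tags the literal as NCHAR data; combined with the client
    -- environment setting ORA_NCHAR_LITERAL_REPLACE=TRUE it can arrive intact:
    INSERT INTO customer_notes (id, note_ja) VALUES (2, N'日本語');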
While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.
I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.
My bottom line, though, is that I don't see any advantages to using NCHAR/NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.