Performance of case-insensitive searches in an Oracle database

Posted 2024-12-05 05:04:07


My DB background is on the MS SQL Server side, where text comparison in indexes and constraints is not case-sensitive (at least by default). So once you have the value "abc" assigned to a unique column, you cannot store a second value "ABC", and if you search for "ABC", SQL Server will find "abc".

With Oracle things are different: even with a unique index on a text column you can store both "abc" and "ABC" there, and if you search for "AbC" you won't get any result.

AFAIK prior to Oracle 10gR2 there was no way around it; now it's possible to set case-insensitive comparison per session, which IMHO is not a good solution because everything depends on the programmers' discipline.
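For reference, a minimal sketch of that per-session setting (available since Oracle 10gR2); the customers table and name column are invented for illustration:

    -- Make string comparisons in this session linguistic and case-insensitive.
    ALTER SESSION SET NLS_COMP = LINGUISTIC;
    ALTER SESSION SET NLS_SORT = BINARY_CI;

    -- With these settings '=' ignores case, so this matches 'abc', 'ABC' and 'AbC'.
    SELECT *
      FROM customers
     WHERE name = 'AbC';

Note that under NLS_COMP = LINGUISTIC an ordinary index on the column is generally not usable either, unless a matching linguistic index (for example on NLSSORT(name, 'NLS_SORT=BINARY_CI')) is created, so the setting by itself does not address the performance concern raised below.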

But what's worse with case-sensitive lookup is that those who rewrite all searches as UPPER(some_column) = UPPER(some_text) (and this is what many discussion threads recommend) end up with a table scan even when there is an index on some_column. The performance implication is disastrous: I just tested a simple search on a table with half a million rows, and the search with the UPPER function call took 20 times longer than the search with just the column identifier, confirming that the index is not used for the function-based search.
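As an aside, the table-scan claim can be verified from the execution plan rather than by timing; a sketch using the same hypothetical customers table:

    -- Ask the optimizer for its plan without executing the query.
    EXPLAIN PLAN FOR
    SELECT * FROM customers WHERE UPPER(name) = UPPER('AbC');

    -- Display the plan; with only a plain index on "name",
    -- expect a TABLE ACCESS FULL step for this query.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);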

Is it really the case that the most standard technique for case-insensitive search in an Oracle database is to apply UPPER/LOWER functions to the search elements, despite the bad performance? Or are there more elegant ways of addressing this issue?


Comments (2)

淡莣 2024-12-12 05:04:07


Yes, use of UPPER(some_column) = UPPER(some_text) really is the best way, but you can create an index on UPPER(some_column). That should alleviate the problem.
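A minimal sketch of that function-based index, reusing the hypothetical customers/name names from the question:

    -- Index the uppercased value so the optimizer can use it
    -- for UPPER(...) = UPPER(...) predicates.
    CREATE INDEX customers_name_upper_ix ON customers (UPPER(name));

    -- This lookup can now be answered with an index scan
    -- instead of a full table scan.
    SELECT *
      FROM customers
     WHERE UPPER(name) = UPPER('AbC');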

停顿的约定 2024-12-12 05:04:07


I would say create "clean" fields based on your company's business logic for cleaning these fields (a company name or address, for example, will have a surprising amount of cleaning logic around furniture words, USPS rules, etc., not to mention third-party cleaning routines if used).

So, for important search fields, keep BOTH the raw (unclean) and clean versions. If your cleaning logic changes significantly over time, you can go back and re-clean based on the raw values. Your searches (assuming you're not using a fuzzy-logic engine like Oracle Text or Lucene) would hit the clean values.

For all other fields (that do not merit separate clean versions), I usually perform a minimum level of scrubbing. Capitalization, trimming, stripping control characters, reducing multiple spaces to one space, etc. are all part of a set of basic cleansing routines. These are usually done before the data is loaded (in the data build programs).

So, come up with a set of data standards and stick to them, but I wouldn't just throw any old junk in the db. Try not to think of the db as a dumping ground for data, and you'll get better results for sure.
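A rough illustration of the raw-plus-clean pattern described in this answer; only the minimal scrubbing mentioned above (uppercase, trim, collapse whitespace) is shown, and all object names are invented:

    -- Keep the raw value in "name" and a scrubbed copy in "name_clean".
    ALTER TABLE customers ADD (name_clean VARCHAR2(200));

    -- Minimal scrubbing: uppercase, trim, collapse runs of whitespace into one space.
    UPDATE customers
       SET name_clean = REGEXP_REPLACE(TRIM(UPPER(name)), '[[:space:]]+', ' ');

    -- Index the clean column so lookups avoid a full table scan.
    CREATE INDEX customers_name_clean_ix ON customers (name_clean);

    -- Searches scrub the input the same way and hit the clean column.
    SELECT *
      FROM customers
     WHERE name_clean = REGEXP_REPLACE(TRIM(UPPER('  AbC  ')), '[[:space:]]+', ' ');

On 11g and later the clean column could also be defined as a virtual column over the raw value, so the two can never drift apart.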
