SQL Server 2008 - Full-text search rounding money values?
Let's assume we have a full-text indexed table with these records:
blabla bla bla 101010,65 blabla bla bla
blabla bla bla 1012344,34 blabla bla bla
(The decimal separator in Portuguese is ",", not "." as in English.)
When we execute a query like:
where contains(field, '101011')
or
where contains(field, '1012344')
The full-text engine returns those records, apparently because it rounds the numbers like this:
101010,65 becomes 101011
1012344,34 becomes 1012344
Is there any way of avoiding that?
EDIT
Sorry, I forgot to say that the column is a varchar(max) column, not a money column. This happens whenever the field contains a decimal value, even though it is a varchar column.
EDIT2
This is not the only data I have in my column. Numbers like those appear frequently in my indexed texts. They are not concatenated. As I said, they are part of the original text, and I have done nothing to the original text. I guess this is the behavior of the word breaker, but who knows for sure?
Comments (2)
EDIT:
< Ignore >
The reason you are seeing this behaviour is that the default word breakers for SQL Server full-text search are those for English (locale 1033). In English, a comma is a valid word breaker, so your number is split into two separate numbers. However, if you use the Portuguese word breaker, FTS quite cleverly keeps the number together. Try running the following query on your SQL Server to see how the full-text engine parses the same input differently depending on the locale specified:
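The query referred to here did not survive in this copy of the answer. As a hedged sketch of that kind of comparison (not the author's original query), the `sys.dm_fts_parser` dynamic management function lists the tokens a word breaker produces for a given locale. The LCIDs 1033 (English) and 2070 (Portuguese, Portugal) and the sample string are assumptions chosen for illustration, and running it requires sysadmin membership.

```sql
-- Sketch only: compare how two word breakers tokenize the same input.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT 1033 AS lcid, special_term, display_term, source_term
FROM sys.dm_fts_parser(N'"101010,65"', 1033, 0, 0)
UNION ALL
SELECT 2070 AS lcid, special_term, display_term, source_term
FROM sys.dm_fts_parser(N'"101010,65"', 2070, 0, 0);
```

The `display_term` column shows each token as the engine would index it, which makes it possible to check whether the number is split at the comma or kept whole for each locale.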
< /Ignore >
UPDATE:
Alright, I have managed to replicate your scenario, and yes, it does seem to be the default behaviour of SQL Server FTS. However, it only seems to round to the nearest tenth (the nearest 10 centavos in your case), and NOT to the nearest whole number.
For example, 12345,88 would be returned by searches for both 12345,88 and 12345,9, while 56789,98 would be returned by searches for both 56789,98 and 56790. However, a number such as 45678,60 stays intact, with no rounding up or down, so it's not as bad as you think.
I'm not sure there is anything you can do to change this behaviour, though. A quick Google search turned up nothing.
My suggestion would be to not use the MONEY data type in the first place. All it buys you is a little formatting ease (which you should be handling at the presentation layer anyway), and it brings other complications and inflexibility. I'm not sure DECIMAL/NUMERIC would solve this particular issue, as I'm not a full-text guy, but I try to steer people away from problematic data types like MONEY whenever I can. For lots of discussion about this, see the previous question "Should you choose the MONEY or DECIMAL(x,y) datatypes in SQL Server?"