Normalizing strings with String.ToUpperInvariant()
I am currently storing normalized versions of strings in my SQL Server database in lowercase. For example, in my Users table, I have a UserName and a LoweredUserName field. Depending on the context, I use either T-SQL's LOWER() function or C#'s String.ToLower() method to generate the lowercase version of the user name that fills the LoweredUserName field. According to Microsoft's guidelines and Visual Studio's code analysis rule CA1308, I should be using C#'s String.ToUpperInvariant() instead of ToLower(). According to Microsoft, this is both a performance and a globalization issue: converting to uppercase is safe, while converting to lowercase can cause a loss of information (for example, the Turkish 'I' problem).
If I move to using ToUpperInvariant for string normalization, I will have to change my database schema as well, since my schema is based on Microsoft's ASP.NET Membership framework (see this related question), which normalizes strings to lowercase.
Isn't Microsoft contradicting itself by telling us to use uppercase normalization in C# while its own code in the Membership tables and procedures uses lowercase normalization? Should I switch everything to uppercase normalization, or just continue using lowercase normalization?
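The difference between culture-sensitive and invariant casing that CA1308 is concerned with can be sketched in Java. Java's String.toLowerCase(Locale) and toUpperCase(Locale) are only rough analogues of C#'s ToLower() under a given culture and, with Locale.ROOT, of ToLowerInvariant()/ToUpperInvariant(). The two platforms can differ on edge cases. The class and method names are hypothetical. The sketch shows that the same user name lowercased under a Turkish culture no longer matches its invariant form:

```java
import java.util.Locale;

// Hypothetical helpers; Java stands in for C# here.
class UserNames {
    // Culture-invariant lowercasing, roughly what C#'s ToLowerInvariant() does.
    static String lowerInvariant(String userName) {
        return userName.toLowerCase(Locale.ROOT);
    }

    // Culture-invariant uppercasing, roughly what C#'s ToUpperInvariant() does (the CA1308 suggestion).
    static String upperInvariant(String userName) {
        return userName.toUpperCase(Locale.ROOT);
    }

    // Culture-sensitive lowercasing, roughly what C#'s ToLower() does under the given culture.
    static String lowerForCulture(String userName, Locale culture) {
        return userName.toLowerCase(culture);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(lowerInvariant("Admin")); // admin
        System.out.println(upperInvariant("Admin")); // ADMIN
        // Under a Turkish culture the capital I becomes dotless ı (U+0131), not i.
        System.out.println(lowerForCulture("ADMIN", turkish).equals("admin")); // false
    }
}
```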
According to CA1308, the reason to do this is that some characters cannot be round-tripped when converted from uppercase to lowercase. The important thing is that you always convert in one direction, so if your standard is to always convert to lowercase, there is no reason to change it.
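The "always one direction" rule amounts to routing every stored key and every lookup through the same normalization function. A minimal Java sketch, assuming a hypothetical in-memory UserDirectory in place of the Users table, with invariant lowercase as the chosen direction:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a Users table keyed by a normalized name.
class UserDirectory {
    private final Map<String, String> byNormalizedName = new HashMap<>();

    // The one normalization rule, used both when storing and when looking up.
    // Invariant lowercase here; invariant uppercase would work just as well if used everywhere.
    static String normalize(String userName) {
        return userName.toLowerCase(Locale.ROOT);
    }

    void add(String userName) {
        byNormalizedName.put(normalize(userName), userName);
    }

    // Returns the stored user name, or null when nothing matches case-insensitively.
    String find(String query) {
        return byNormalizedName.get(normalize(query));
    }

    public static void main(String[] args) {
        UserDirectory users = new UserDirectory();
        users.add("JoeSmith");
        System.out.println(users.find("JOESMITH")); // JoeSmith
        // Mixing directions breaks lookups: an uppercased query never matches lowercased keys.
        String mixed = users.byNormalizedName.get("JoeSmith".toUpperCase(Locale.ROOT));
        System.out.println(mixed); // null
    }
}
```

Which direction is chosen matters less than never mixing them. Switching later means re-normalizing every stored key.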
To answer your first question: yes, Microsoft is a bit inconsistent. To answer your second question: no, do not switch anything until you have confirmed that this is causing a bottleneck in your application.
Think how much forward progress you could make on your project instead of spending time switching everything. Your development time is much more valuable than the savings you would get from such a change.
Remember:
For anyone else who lands here wondering why "uppercase" is recommended, here is a demonstration of the so-called "Turkish I problem":
The above code produces the following output:
This shows that when using the Turkish culture (i.e. tr-TR), if you normalize strings by converting them all to uppercase, you will get the original string back if you later convert those uppercase strings to lowercase. If instead you normalize to lowercase, you won't be able to get back to the original string (i.e. you can't round-trip). I am not sure how all this plays out in other languages (Unicode is messy business), but at least for Turkish one can see why uppercase is recommended over lowercase.
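The answer's original C# demonstration was not preserved. As a stand-in, here is a minimal Java sketch. Java's toUpperCase(Locale) and toLowerCase(Locale) apply the Turkish dotted/dotless i rules, much as .NET does under tr-TR. The class and helper names are hypothetical. The sketch prints the four Turkish i mappings as code points, checks one upper-then-lower round trip, and shows that Turkish and invariant uppercasing disagree:

```java
import java.util.Locale;

// Hypothetical demo class; Java stands in for the answer's C# code.
class TurkishI {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    static String upperTr(String s) { return s.toUpperCase(TURKISH); }
    static String lowerTr(String s) { return s.toLowerCase(TURKISH); }

    public static void main(String[] args) {
        // Turkish has four i letters: i and İ form a case pair, and so do ı and I.
        String[] letters = {"i", "\u0130", "\u0131", "I"}; // i, İ, ı, I
        for (String s : letters) {
            // Print code points to avoid console-encoding problems.
            System.out.printf("U+%04X  upper=U+%04X  lower=U+%04X%n",
                    (int) s.charAt(0), (int) upperTr(s).charAt(0), (int) lowerTr(s).charAt(0));
        }
        // Under tr-TR, uppercasing then lowercasing this lowercase name gives it back.
        System.out.println(lowerTr(upperTr("linux")).equals("linux")); // true
        // Mixing cultures breaks matching: Turkish gives LİNUX, invariant gives LINUX.
        System.out.println(upperTr("linux").equals("linux".toUpperCase(Locale.ROOT))); // false
    }
}
```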
Continue using lower case normalization. Only change to conform to Microsoft standards if a large issue develops.
This is unfortunate, but worthwhile. Sadly, Microsoft "standards" tend to be poorly considered and somewhat less than consistent; experience with them has shown that unless there is a compelling reason, it's best to simply stick with what works while it works. Note that this is generally NOT true of non-Microsoft technologies; but the arbitrariness of the Microsoft "standards" makes them worth avoiding.
Edit: I should clarify here; my opinion of Microsoft is very low, based on long experience with their standards. As was pointed out in the comments, I don't have particular references to point to about "everybody else other than Microsoft"; this just comes from my personal experience. Your mileage may vary widely. This answer should be considered just my opinion. Sorry for not making that clearer earlier.