String normalisation
I'm writing some code which needs to do string normalisation. I want to turn a given string into a camel-case representation (well, to the best guess at least). Examples:
"the quick brown fox" => "TheQuickBrownFox"
"the_quick_brown_fox" => "TheQuickBrownFox"
"123The_quIck bROWN FOX" => "TheQuickBrownFox"
"the_quick brown fox 123" => "TheQuickBrownFox123"
"thequickbrownfox" => "Thequickbrownfox"
I think you should be able to get the idea from those examples. I want to strip out all special characters (', ", !, @, ., etc.), capitalise every word (words are separated by a space, _ or -) and drop any leading numbers (trailing or internal numbers are OK, and this requirement isn't vital, depending on how difficult it is).
I'm trying to work out what would be the best way to achieve this. My first guess would be with a regular expression, but my regex skills are bad at best so I wouldn't really know where to start.
My other idea would be to loop and parse the data: break it down into words, parse each one, and rebuild the string that way.
Or is there another way in which I could go about it?
Comments (5)
How about a simple solution using Strings.StrConv in the Microsoft.VisualBasic namespace?
(Don't forget to add a Project Reference to Microsoft.VisualBasic):
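The code for this answer did not survive on this page, so here is only a rough sketch of how Strings.StrConv with VbStrConv.ProperCase might be applied. The class name StrConvCamelCase, the method name Normalise and the pre/post-processing steps are my own assumptions, not the answerer's code. StrConv may also be unavailable on non-Windows runtimes, so treat this as untested outside Windows.

```csharp
using System;
using System.Linq;
using Microsoft.VisualBasic;

// Hypothetical helper, not taken from the original answer.
public static class StrConvCamelCase
{
    public static string Normalise(string input)
    {
        // Turn '_' and '-' into spaces, keep letters, digits and spaces, drop everything else.
        char[] kept = input
            .Select(c => c == '_' || c == '-' ? ' ' : c)
            .Where(c => char.IsLetterOrDigit(c) || c == ' ')
            .ToArray();

        // Drop leading digits (and spaces) before casing, so the first word starts with a letter.
        string words = new string(kept).TrimStart("0123456789 ".ToCharArray());

        // ProperCase capitalises the first letter of every word.
        string proper = Strings.StrConv(words, VbStrConv.ProperCase) ?? "";

        // Remove the spaces to join the words together.
        return proper.Replace(" ", "");
    }
}
```

The special characters are removed before calling StrConv so that the result does not depend on how ProperCase treats punctuation inside a word.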
This regex matches all words. Then we Aggregate them with a method that capitalizes the first character and ToLowers the rest of the string.
This method ignores numbers, by the way. To add them, you could change the regex to @"[a-zA-Z]*|[0-9]*", I suppose, but I haven't tested it.
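The regex answer's own code is missing here, so the following is a minimal sketch of the match-then-Aggregate idea it describes, not the original snippet. The names RegexCamelCase, Token and Normalise are made up for illustration. I have assumed + instead of * in the pattern so that empty matches never occur, and added a SkipWhile step to drop leading numbers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the original answer.
public static class RegexCamelCase
{
    // Runs of ASCII letters or runs of ASCII digits; '+' avoids empty matches.
    private static readonly Regex Token = new Regex(@"[a-zA-Z]+|[0-9]+");

    public static string Normalise(string input)
    {
        return Token.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            // Drop digit runs that appear before the first word.
            .SkipWhile(t => char.IsDigit(t[0]))
            // Upper-case the first character, lower-case the rest, and concatenate.
            .Aggregate("", (acc, t) =>
                acc + char.ToUpperInvariant(t[0]) + t.Substring(1).ToLowerInvariant());
    }
}
```

With the inputs from the question, "123The_quIck bROWN FOX" should come out as "TheQuickBrownFox" and "the_quick brown fox 123" as "TheQuickBrownFox123". Anything that is not an ASCII letter or digit acts as a word break and is discarded.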
Any solution that involves matching particular characters may not work well with some character encodings, particularly if Unicode representation is being used, which has dozens of space characters, thousands of 'symbols', thousands of punctuation characters, thousands of 'letters', etc. It would be better where-ever possible to use built-in Unicode-aware functions. In terms of what is a 'special character', well you could decide based on Unicode categories. For instance, it would include 'Punctuation' but would it include 'Symbols'?
ToLower(), IsLetter(), etc should be fine, and take into account all possible letters in Unicode. Matching against dashes and slashes should probably take into account some of the dozens of space and dash characters in Unicode.
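To make that concrete, here is one way the Unicode-aware approach might look, as a sketch only. The names UnicodeCamelCase, IsSeparator and Normalise are hypothetical, and the choice to treat white space, dash punctuation and connector punctuation (which includes '_') as word separators is my assumption about what "special" should mean. It works on UTF-16 code units, so letters outside the Basic Multilingual Plane are dropped.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper illustrating category-based matching.
public static class UnicodeCamelCase
{
    // White space, dashes (Pd) and connectors such as '_' (Pc) separate words.
    private static bool IsSeparator(char c)
    {
        if (char.IsWhiteSpace(c)) return true;
        UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
        return cat == UnicodeCategory.DashPunctuation
            || cat == UnicodeCategory.ConnectorPunctuation;
    }

    public static string Normalise(string input)
    {
        var result = new StringBuilder();
        bool startOfWord = true;
        foreach (char c in input)
        {
            if (IsSeparator(c))
            {
                startOfWord = true;
            }
            else if (char.IsLetter(c))
            {
                // First letter of a word is upper-cased, the rest lower-cased.
                result.Append(startOfWord ? char.ToUpperInvariant(c) : char.ToLowerInvariant(c));
                startOfWord = false;
            }
            else if (char.IsDigit(c) && result.Length > 0)
            {
                // Digits are kept only once something has been emitted, which drops leading numbers.
                result.Append(c);
            }
            // Everything else (other punctuation, symbols, leading digits) is dropped.
        }
        return result.ToString();
    }
}
```

This is also essentially the loop-and-parse idea from the question: a single pass that tracks whether the next letter starts a word, with no regex involved.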
You could wear ruby slippers to work :)
I thought it'd be fun to try it; here's what I came up with: