sql字符串提取
我在 sql server 2008 中有一个列,它是 varchar(255)。 该栏充满了“测量”,大约有 15,000 条记录。 由于该列是字符串类型并且是由人手动输入的,因此变得混乱 例如
- 100x200cm
- 200x400mm
- 3" x 9"
- 30 x 20
- 12cm 长
基本上我需要将所有测量值(我可以)提取到可用尺寸中,并将每个单位存储到一个新列中(高度/宽度/深度)。
谁能建议最好的方法来做到这一点?
我正在研究正则表达式,但不确定最好的前进方向
I have a column in sql server 2008 which is a varchar(255).
The column is full of 'measurements' and there are around 15,000 records.
As the column is of type string and has been manually inputted by human it has become messy
e.g.
- 100x200cm
- 200x400mm
- 3" by 9"
- 30 x 20
- 12cm long
Basically I need to extract all measurements (that I can) into usable dimensions and store each unit into a new column (height/width/depth).
Can anyone suggest the best way to go about doing this?
I was looking at regular expression but not sure on the best way forward
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我推荐一个非常严格的解析器,每当遇到无法显式处理的内容时,它就会抛出异常。这样您就可以避免意外。
首先,您应该为此使用脚本语言。
获取记录并迭代它们,对于每条记录,您尝试解析它:
^([1-9][0-9]*)x([1-9][0-9]* )([az]+)$
- 如果匹配,则将$3
与有效单位列表相匹配,例如“cm”、“mm”、“m”等。^([1-9][0-9]*) ([^]+) 由([1-9][0-9]*)(.+)$
- 如果匹配,则比较匹配$2
和$4
处的单位。再次将它们与有效单位列表进行匹配,如果成功,但它们不相同,请将其中一个转换为另一个。您不断对引发异常的每个项目添加严格检查。这样,您的表达式列表就会随着您处理的案例而扩展。
I recommend a very strict parser which throws an exception whenever it encounters something it cannot handle explicitly. This way you are save from surprises.
First of, you should use a scripting language for that.
Fetch the records and iterate over them and for each record you try to parse it:
^([1-9][0-9]*)x([1-9][0-9]*)([a-z]+)$
- if it matches, you match$3
against a list of valid units, for example "cm", "mm", "m", etc.^([1-9][0-9]*)([^ ]+) by ([1-9][0-9]*)(.+)$
- if it matches, compare the units at match$2
and$4
. Match them against a list of valid units again, and if that's successful, but they are not the same, convert one into the other.You keep adding strict checks for each item which throws an exception. This way, your list of expressions expands with the cases you handle.