如何过滤掉 Teradata 文本字段中的非数字值?
我有一个 Teradata 表,其中包含大约 1000 万条记录,该表将数字 id 字段存储为 varchar。我需要将该字段中的值传输到另一个表中的 bigint 列,但我不能简单地说cast(id_field as bigint),因为我收到无效字符错误。查看这些值,我发现字符串中的任何位置都可能有一个字符,所以假设字符串是 varchar(18) 我可以像这样过滤掉无效行:
where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)
etc, etc...
然后强制转换就可以了,但这是不可行的从长远来看。它很慢,并且如果字符串有 18 个可能的字符,则会导致查询不可读。如何过滤掉在此字段中具有值且不会转换为 bigint 的行,而不单独检查非数字字符数组中的每个字符?
示例值是
123abc464
a2.3v65
a_356087
........
000000000
BOB KNIGHT
1235468099
不遵循特定模式的值,我只需要过滤掉包含任何非数字数据的值。 123456789 可以,但 123.abc_c3865 不行……
oI have a teradata table with about 10 million records in it, that stores a numeric id field as a varchar. i need to transfer the values in this field to a bigint column in another table, but i can't simply say cast(id_field as bigint) because i get an invalid character error. looking through the values, i find that there could be a character at any position in the string, so let's say the string is varchar(18) i could filter out invalid rows like so :
where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)
etc, etc...
then the cast would work, but this is not feasible in the long run. it's slow and if the string has 18 possible characters, it makes the query unreadable. how can i filter out rows that have a value in this field that will not cast as a bigint without checking each character individually for an array of non-numeric characters?
example values would be
123abc464
a2.3v65
a_356087
........
000000000
BOB KNIGHT
1235468099
the values follow no specific patterns, I simply need to filter out the ones that contain ANY non-numeric data.
123456789 is okay but 123.abc_c3865 is not...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
从TD14 Teradata开始添加了一些功能,现在有多种方法,例如:
但最简单的方法是TO_NUMBER,对于坏数据返回NULL:
Starting with TD14 Teradata added some functions, now there are multiple ways, e.g.:
But the easiest way is TO_NUMBER, which returns NULL for bad data:
我曾经管理过的最好的方法是:
由于大写字符为小写字符提供了不同的十六进制值,这将确保您没有字母字符,但仍然会留下下划线、冒号等。如果这不能满足您的要求,您可能需要编写 UDF。
The best that I've ever managed is this:
Since upper case characters give a different hex value to lower case ones, this will ensure that you have no alphabetical characters, but will still leave you with underscores, colons and so forth. If this doesn't meet your requirements, you may need to write an UDF.
我们是否还可以尝试将字段中的值除以某个整数“如果除以则必须是一个数字,如果不是并抛出一些错误,则必须有一些字符......”猜测这会很快,就像数学一样涉及...
could we also try to divide the values in the field by some integer "if divided then must be a number and if not and throws some error,then must have some character...." guess this would be lot fast as has just mathematics involved...
我在尝试从街道地址门牌号中排除字母字符时遇到了同样的问题。如果您不介意将所有数字连接在一起,则以下内容将有效......
它检查字符串的上部是否等于字符串的下部,如果是,则为数字,否则为空。
I've faced the same issue to try to exclude alpha characters from street address house numbers. The following will work if you don't mind concatanating all the numeric numbers together......
It checks if the upper of a string equals the lower of the string, if so it's a number, if not it becomes null.
尝试使用此代码段
Try using this code segment
我发现 lins314159 答案对于解决类似问题非常有帮助。它可能是一个旧线程,但就其价值而言,我使用了:
char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9 ')
成功将剩余的 VARCHAR 结果转换为 INT
I found lins314159 answer to be very helpful with a similar issue. It may be an old thread but for what it's worth, I used:
char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9')
to successfully cast the remaining VARCHAR results to INT
这对于检查数字字段中的值是否为非数字非常有效。
This works perfectly fine to check whether the values in a numeric field is non-numeric.
这对我来说效果很好!它显示任何包含非数字值的 id_field
This works well for me! It reveals any id_field containing a non-numeric value