如何过滤掉 Teradata 文本字段中的非数字值?

发布于 2024-09-15 19:29:03 字数 724 浏览 10 评论 0原文

我有一个 Teradata 表,其中包含大约 1000 万条记录,该表将数字 id 字段存储为 varchar。我需要将该字段中的值传输到另一个表中的 bigint 列,但我不能简单地说cast(id_field as bigint),因为我收到无效字符错误。查看这些值,我发现字符串中的任何位置都可能有一个字符,所以假设字符串是 varchar(18) 我可以像这样过滤掉无效行:

     where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
     and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)

etc, etc... 

然后强制转换就可以了,但这是不可行的从长远来看。它很慢,并且如果字符串有 18 个可能的字符,则会导致查询不可读。如何过滤掉在此字段中具有值且不会转换为 bigint 的行,而不单独检查非数字字符数组中的每个字符?

示例值是

   123abc464
   a2.3v65
   a_356087
   ........
   000000000
   BOB KNIGHT
   1235468099

不遵循特定模式的值,我只需要过滤掉包含任何非数字数据的值。 123456789 可以,但 123.abc_c3865 不行……

oI have a teradata table with about 10 million records in it, that stores a numeric id field as a varchar. i need to transfer the values in this field to a bigint column in another table, but i can't simply say cast(id_field as bigint) because i get an invalid character error. looking through the values, i find that there could be a character at any position in the string, so let's say the string is varchar(18) i could filter out invalid rows like so :

     where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
     and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)

etc, etc... 

then the cast would work, but this is not feasible in the long run. it's slow and if the string has 18 possible characters, it makes the query unreadable. how can i filter out rows that have a value in this field that will not cast as a bigint without checking each character individually for an array of non-numeric characters?

example values would be

   123abc464
   a2.3v65
   a_356087
   ........
   000000000
   BOB KNIGHT
   1235468099

the values follow no specific patterns, I simply need to filter out the ones that contain ANY non-numeric data.
123456789 is okay but 123.abc_c3865 is not...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

橪书 2024-09-22 19:29:03

从TD14 Teradata开始添加了一些功能,现在有多种方法,例如:

WHERE RTRIM(col, '0123456789') = ''

但最简单的方法是TO_NUMBER,对于坏数据返回NULL:

TO_NUMBER(col)

Starting with TD14 Teradata added some functions, now there are multiple ways, e.g.:

WHERE RTRIM(col, '0123456789') = ''

But the easiest way is TO_NUMBER, which returns NULL for bad data:

TO_NUMBER(col)
少女情怀诗 2024-09-22 19:29:03

我曾经管理过的最好的方法是:

where char2hexint(upper(id_field)) = char2hexint(lower(id_field))

由于大写字符为小写字符提供了不同的十六进制值,这将确保您没有字母字符,但仍然会留下下划线、冒号等。如果这不能满足您的要求,您可能需要编写 UDF。

The best that I've ever managed is this:

where char2hexint(upper(id_field)) = char2hexint(lower(id_field))

Since upper case characters give a different hex value to lower case ones, this will ensure that you have no alphabetical characters, but will still leave you with underscores, colons and so forth. If this doesn't meet your requirements, you may need to write an UDF.

风苍溪 2024-09-22 19:29:03

我们是否还可以尝试将字段中的值除以某个整数“如果除以则必须是一个数字,如果不是并抛出一些错误,则必须有一些字符......”猜测这会很快,就像数学一样涉及...

could we also try to divide the values in the field by some integer "if divided then must be a number and if not and throws some error,then must have some character...." guess this would be lot fast as has just mathematics involved...

神回复 2024-09-22 19:29:03

我在尝试从街道地址门牌号中排除字母字符时遇到了同样的问题。如果您不介意将所有数字连接在一起,则以下内容将有效......
它检查字符串的上部是否等于字符串的下部,如果是,则为数字,否则为空。

select cast(case when upper(substring('12E'from 1 for 1)) = lower(substring('12E'from 1 for 1)) then substring('12E'from 1 for 1) else null end ||
             case when upper(substring('12E'from 2 for 1)) = lower(substring('12E'from 2 for 1)) then substring('12E'from 2 for 1) else null end ||
             case when upper(substring('12E'from 3 for 1)) = lower(substring('12E'from 3 for 1)) then substring('12E'from 3 for 1) else null end ||
             case when upper(substring('12E'from 4 for 1)) = lower(substring('12E'from 4 for 1)) then substring('12E'from 4 for 1) else null end ||
             case when upper(substring('12E'from 5 for 1)) = lower(substring('12E'from 5 for 1)) then substring('12E'from 5 for 1) else null end ||
             case when upper(substring('12E'from 2 for 1)) = lower(substring('12E'from 2 for 1)) then substring('12E'from 2 for 1) else null end
             as integer) 

I've faced the same issue to try to exclude alpha characters from street address house numbers. The following will work if you don't mind concatanating all the numeric numbers together......
It checks if the upper of a string equals the lower of the string, if so it's a number, if not it becomes null.

select cast(case when upper(substring('12E'from 1 for 1)) = lower(substring('12E'from 1 for 1)) then substring('12E'from 1 for 1) else null end ||
             case when upper(substring('12E'from 2 for 1)) = lower(substring('12E'from 2 for 1)) then substring('12E'from 2 for 1) else null end ||
             case when upper(substring('12E'from 3 for 1)) = lower(substring('12E'from 3 for 1)) then substring('12E'from 3 for 1) else null end ||
             case when upper(substring('12E'from 4 for 1)) = lower(substring('12E'from 4 for 1)) then substring('12E'from 4 for 1) else null end ||
             case when upper(substring('12E'from 5 for 1)) = lower(substring('12E'from 5 for 1)) then substring('12E'from 5 for 1) else null end ||
             case when upper(substring('12E'from 2 for 1)) = lower(substring('12E'from 2 for 1)) then substring('12E'from 2 for 1) else null end
             as integer) 
阳光①夏 2024-09-22 19:29:03

尝试使用此代码段

WHERE id_Field NOT LIKE '%[^0-9]%'

Try using this code segment

WHERE id_Field NOT LIKE '%[^0-9]%'
や莫失莫忘 2024-09-22 19:29:03

我发现 lins314159 答案对于解决类似问题非常有帮助。它可能是一个旧线程,但就其价值而言,我使用了:

char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9 ')

成功将剩余的 VARCHAR 结果转换为 INT

I found lins314159 answer to be very helpful with a similar issue. It may be an old thread but for what it's worth, I used:

char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9')

to successfully cast the remaining VARCHAR results to INT

流殇 2024-09-22 19:29:03
SELECT customer_id
FROM t
WHERE UPPER(customer_id)(CASESPECIFIC) <>
      LOWER(customer_id)(CASESPECIFIC);

这对于检查数字字段中的值是否为非数字非常有效。

SELECT customer_id
FROM t
WHERE UPPER(customer_id)(CASESPECIFIC) <>
      LOWER(customer_id)(CASESPECIFIC);

This works perfectly fine to check whether the values in a numeric field is non-numeric.

笔落惊风雨 2024-09-22 19:29:03
SELECT id_field
WHERE oTranslate(id_field, '0123456789','')<>'';

这对我来说效果很好!它显示任何包含非数字值的 id_field

SELECT id_field
WHERE oTranslate(id_field, '0123456789','')<>'';

This works well for me! It reveals any id_field containing a non-numeric value

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文