如何获得雪花中包含行的数字

发布于 2025-02-11 15:38:39 字数 699 浏览 2 评论 0原文

我尝试过REGEXP,REGEXP_LIKE和LIKE,但不起作用

从B 中select *

  • regexp_like(col1,'\ d'),
  • 其中regexp_like(col1,'[0-9])
  • ... .etc

我们有此表

col1
avr100000
adfdsgwr
20170910020359.761
企业
adf56ds76gwr
+093000
080000
adfdsgwr

  col1
  avr100000
  20170910020359.761
  adf56ds76gwr
  0+093000
  1080000

0 谢谢

I have tried regexp, regexp_like and like but didn't work

select * from b

  • where regexp_like(col1, '\d')
  • where regexp_like(col1, '[0-9]')
  • ....etc

we have this table

Col1
avr100000
adfdsgwr
20170910020359.761
Enterprise
adf56ds76gwr
0+093000
080000
adfdsgwr

output should be these 5 rows

  col1
  avr100000
  20170910020359.761
  adf56ds76gwr
  0+093000
  1080000

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

千秋岁 2025-02-18 15:38:39

您可以在WHERE子句中使用REGEXP_INSTR,以查看是否在字符串中的任何位置找到一个数字:

create temp table b(col1 string);

insert into b (col1) values ('avr100000'), ('adfdsgwr'), 
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('080000'),
('adfdsgwr')
;

select col1 from b where regexp_instr(col1, '\\d') > 0;

我正在将答案更新到请注意,请注意Regexp_instr将执行约3.88比使用REGEXP_COUNT的时间快。

原因是REGEXP_INSTR将停止并报告其遇到的第一个数字的位置。相比之下,Regexp_count将继续检查字符串,直到达到其末端为止。如果我们只想知道字符串中是否存在一个数字,我们就可以在遇到第一个数字后立即停止。

如果这是一个小数据集,则无关紧要。对于大型数据集,快3.8倍的速度有很大的不同。这是一个小型测试安全带,显示了性能差异:

create or replace transient table RANDOM_STRINGS as
select RANDSTR(50, random()) as RANDSTR from table(generator (rowcount => 10000000));

alter session set use_cached_result = false;

-- Run these statements multiple times on an X-Small warehouse to test performance
-- Run both to warm the cache, then note the times after the initial runs to warm the cache

-- Average over 10 times with warm cache: 3.315s
select count(*) as ROWS_WITH_NUMBERS 
from RANDOM_STRINGS where regexp_count(randstr, '\\d') > 0;


-- Average over 10 times with warm cache: 0.8686s
select count(*) as ROWS_WITH_NUMBERS 
from RANDOM_STRINGS where regexp_instr(randstr, '\\d') > 0;

You can use regexp_instr in the where clause to see if it finds a digit anywhere in the string:

create temp table b(col1 string);

insert into b (col1) values ('avr100000'), ('adfdsgwr'), 
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('080000'),
('adfdsgwr')
;

select col1 from b where regexp_instr(col1, '\\d') > 0;

I'm updating my answer to note that regexp_instr is going to perform about 3.8 times faster than using regexp_count for this requirement.

The reason is that regexp_instr will stop and report the location of the first digit it encounters. In contrast, regexp_count will continue examining the string until it reaches its end. If we only want to know if a digit exists in a string, we can stop as soon as we encounter the first one.

If it is a small data set, this won't matter much. For large data sets, that 3.8 times faster makes a big difference. Here is a mini test harness that shows the performance difference:

create or replace transient table RANDOM_STRINGS as
select RANDSTR(50, random()) as RANDSTR from table(generator (rowcount => 10000000));

alter session set use_cached_result = false;

-- Run these statements multiple times on an X-Small warehouse to test performance
-- Run both to warm the cache, then note the times after the initial runs to warm the cache

-- Average over 10 times with warm cache: 3.315s
select count(*) as ROWS_WITH_NUMBERS 
from RANDOM_STRINGS where regexp_count(randstr, '\\d') > 0;


-- Average over 10 times with warm cache: 0.8686s
select count(*) as ROWS_WITH_NUMBERS 
from RANDOM_STRINGS where regexp_instr(randstr, '\\d') > 0;
转瞬即逝 2025-02-18 15:38:39

一种方法是计算有多少个alpha令牌:

select column1 as input
    ,regexp_count(column1, '[A-Za-z]') as alpha_count
from values
    ('0-100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adfdsgwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
输入alpha_count
0-1000000
ADFDSGWR8
20170910020359.7610
ENTERPRISE10
ADFDSGWR8
0+093000 0 1-0800000
1-0800000
ADFDSGWR8

,从而排除那些不在的地方:

select column1 as input
from values
    ('0-100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adfdsgwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
where regexp_count(column1, '[A-Za-z]') = 0

:INPUT 0:INPUT 0 :

IT IT IT IT INPUT
0 INPUT 0 -100000
20170910020359.761
0+093000
1-080000

One method is to count how many alpha tokens there are:

select column1 as input
    ,regexp_count(column1, '[A-Za-z]') as alpha_count
from values
    ('0-100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adfdsgwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
INPUTALPHA_COUNT
0-1000000
adfdsgwr8
20170910020359.7610
Enterprise10
adfdsgwr8
0+0930000
1-0800000
adfdsgwr8

and thus exclude those where it is not zero:

select column1 as input
from values
    ('0-100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adfdsgwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
where regexp_count(column1, '[A-Za-z]') = 0

gives:

INPUT
0-100000
20170910020359.761
0+093000
1-080000
红玫瑰 2025-02-18 15:38:39

您需要做的就是找到一个数值的1个或更多实例:

select
    column1 as input
from
values
    ('avr100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adf56ds76gwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
where
    regexp_count (column1, '\\d') > 0;

结果:

avr100000
20170910020359.800
adf56ds76gwr
0+093000
1-080000

All you need to do is find 1 or more instances of a numeric value:

select
    column1 as input
from
values
    ('avr100000'),
    ('adfdsgwr'),
    ('20170910020359.761'),
    ('Enterprise'),
    ('adf56ds76gwr'),
    ('0+093000'),
    ('1-080000'),
    ('adfdsgwr')
where
    regexp_count (column1, '\\d') > 0;

Results:

avr100000
20170910020359.800
adf56ds76gwr
0+093000
1-080000
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文