Extract host + one folder from the path

Posted on 2024-12-21 04:00:35

Could you help me figure out a regular expression that extracts the following from a URL?

The host name, when no folder is specified in the path that follows it, e.g.

'http://jj.com/' -> 'jj.com'
'http://jj.com/index.php' -> 'jj.com'
'http://jj.com/query?q=http://kk.uk' -> 'jj.com'

The host name plus one folder from the path, when the path specifies at least one folder, e.g.

'http://jj.com/site/index.php' -> 'jj.com/site'
'http://jj.com/site/second/aldldls.html' -> 'jj.com/site'

Is it possible to do that with just one regular expression?

BTW, I will be using the regexp_extract function from Hive, but any regex flavor (e.g. Perl regex) that can do this would be extremely useful.

淡莣 2024-12-28 04:00:35

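use 5.010;
use strict;
use warnings;
use URI;

for my $url (
    'http://jj.com/',
    'http://jj.com/index.php',
    'http://jj.com/query?q=http://kk.uk',
    'http://jj.com/site/index.php',
    'http://jj.com/site/second/aldldls.html',
) {
    my $u = URI->new($url);
    # For an absolute path, path_segments returns a leading empty string,
    # so index 1 is the first segment. A third element exists only when
    # that first segment is followed by "/", i.e. when it is a folder.
    my @segments = $u->path_segments;
    say @segments > 2
        ? join('/', $u->host, $segments[1])
        : $u->host;
}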

Output

jj.com
jj.com
jj.com
jj.com/site
jj.com/site
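
For readers who do not use Perl, the same parse-then-inspect idea can be sketched in Python with the standard library. This is only an illustrative sketch: the function name host_plus_folder_parsed is hypothetical, and the rule "a first segment counts as a folder only when a slash follows it" is an assumption read off the examples in the question.

```python
from urllib.parse import urlsplit


def host_plus_folder_parsed(url):
    """Return the host, plus the first path segment when it is a folder."""
    parts = urlsplit(url)
    # "/site/index.php".split("/") gives ["", "site", "index.php"]:
    # a third element exists only when a "/" follows the first segment.
    segments = parts.path.split("/")
    if len(segments) > 2 and segments[1]:
        return f"{parts.hostname}/{segments[1]}"
    return parts.hostname


if __name__ == "__main__":
    for url in (
        "http://jj.com/",
        "http://jj.com/index.php",
        "http://jj.com/query?q=http://kk.uk",
        "http://jj.com/site/index.php",
        "http://jj.com/site/second/aldldls.html",
    ):
        print(host_plus_folder_parsed(url))
```

Because the query string is split off before the path is inspected, the slashes inside ?q=http://kk.uk are never mistaken for folder separators.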

壹場煙雨 2024-12-28 04:00:35

#!/usr/bin/perl

use strict;
use warnings;

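while (<DATA>) {
    # Keep "host/" plus an optional first segment that ends in "/" (a folder).
    # [^?/]+ cannot cross "?" or "/", so file names and query strings are dropped.
    s!^http://([^/]+/([^?/]+/)?).*!$1!;
    # Strip the trailing slash together with the line's newline.
    s!/\s*$!!;
    print "$_\n";
}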

__DATA__
http://jj.com/
http://jj.com/index.php
http://jj.com/query?q=http://kk.uk
http://jj.com/site/index.php
http://jj.com/site/second/aldldls.html

Output:

jj.com
jj.com
jj.com
jj.com/site
jj.com/site
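
The answer above uses two substitutions, while the question asks for one expression with a capture group. A possible single-pattern variant is sketched below in Python; it is not taken from either answer. The pattern uses a lookahead so that the first path segment is captured only when another "/" follows it. The names HOST_AND_FOLDER and host_plus_folder_regex are hypothetical, and accepting https as well as http is an added assumption.

```python
import re

# Group 1: the host, optionally followed by "/segment" when that segment
# is itself followed by "/" (checked by the lookahead, not consumed).
HOST_AND_FOLDER = re.compile(r"^https?://([^/?#]+(?:/[^/?#]+(?=/))?)")


def host_plus_folder_regex(url):
    """Return capture group 1, or None when the URL does not match."""
    match = HOST_AND_FOLDER.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in (
        "http://jj.com/",
        "http://jj.com/index.php",
        "http://jj.com/query?q=http://kk.uk",
        "http://jj.com/site/index.php",
        "http://jj.com/site/second/aldldls.html",
    ):
        print(host_plus_folder_regex(url))
```

The pattern contains no backslashes and relies only on character classes, a non-capturing group and a lookahead, all of which Java regular expressions also support, so it should carry over to Hive as something like regexp_extract(url, '^https?://([^/?#]+(?:/[^/?#]+(?=/))?)', 1). That Hive call has not been run here and is offered as an assumption. Note that a port or user info in the authority part would be captured along with the host.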
