Parsing mysql:/// and sqlite:/// URLs

Posted on 2024-11-08 19:49:17

We've got this little regexp in a module to parse URLs like the following:

if( my ($conn, $driver, $user, $pass, $host, $port, $dbname, $table_name, $tparam_name, $tparam_value, $conn_param_string) =
    $url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$} ) {

mysql://[email protected]:1234/dbname

and now we want to add parsing of sqlite URLs which can be like this:

sqlite:///dbname_which_is_a_file

But it won't work with absolute paths like: sqlite:///tmp/dbname_which_is_a_file

What is the proper way of doing this?

北斗星光 2024-11-15 19:49:17

The CPAN module URI::Split will work out a lot better in the long run than a fragile regexp. Here's the synopsis from its POD:

use URI::Split qw(uri_split uri_join);
($scheme, $auth, $path, $query, $frag) = uri_split($uri);
$uri = uri_join($scheme, $auth, $path, $query, $frag);

A more general module (more flexible, and more complex) would be URI, but for simple uses its additional complexity may not be necessary.

By the way, a URI is a Uniform Resource Identifier, which is a superset of, or parent to, a URL. A URL is a specific kind of URI.
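
As a rough illustration of how this could apply to the URLs in the question, the sketch below runs uri_split and then breaks the authority part down into user, password, host and port. The helper name parse_db_url, the authority regex, and the sample host db.example.com are assumptions made for this example; they are not part of URI::Split.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# Hypothetical helper: split a database URL with uri_split, then pull
# user, password, host and port out of the authority component.
sub parse_db_url {
    my ($url) = @_;
    my ($scheme, $auth, $path, $query, $frag) = uri_split($url);

    my %parts = (driver => $scheme, path => $path);

    # The authority is '' for URLs such as sqlite:///file, so only
    # parse it when there is something to parse.
    if (defined $auth && length $auth) {
        @parts{qw(user pass host port)} = $auth =~ m{
            ^
            (?: ([^:\@]*) (?: : ([^\@]*) )? \@ )?   # optional user and password
            ([^:]*)                                 # host
            (?: : (\d+) )?                          # optional port
            $
        }x;
    }
    return \%parts;
}

my $mysql  = parse_db_url('mysql://user:secret@db.example.com:1234/dbname');
my $sqlite = parse_db_url('sqlite:///tmp/dbname_which_is_a_file');

printf "%s -> host=%s port=%s path=%s\n",
    $mysql->{driver}, $mysql->{host}, $mysql->{port}, $mysql->{path};
printf "%s -> path=%s\n", $sqlite->{driver}, $sqlite->{path};
```

The point of the split is that the sqlite path comes back whole, however many slashes it contains, so nothing has to be guessed about where the database name ends.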


太阳男子 2024-11-15 19:49:17

The problem with the regular expression is that it does not work with paths longer than two elements: it splits them into db_name and table_name (if any). It also does not work with SQLite special filenames like ':memory' (which are very useful for tests).

To keep the RE approach maintainable, the best option is a dispatch table keyed on the main protocols that need different parsing, with a subroutine for each one. It also helps to write the RE with the /x modifier, so that it can carry comments:

 sub test_re{
     my $url =shift;
     my $x={};
     @$x{qw(conn driver user pass host port dbname table_name tparam_name tparam_value conn_param_string)} =
         $url =~ m{
                ^(
                  (\w*)
                  ://
                  (?:
                    (\w+) # user
                    (?:
                      \:
                      ([^/\@]*) # password 
                    )?
                    \@
                  )? # user and password are optional
                  (?:
                    ([\w\-\.]+) #host
                    (?:
                      \:
                      (\d+) # port
                    )? # port optional
                  )? # host and port optional
                  / # this is the third '/' when user, pass, host and port are all absent
                  (\w*) # database name, only up to the next '/' (if any); will not work with full paths for sqlite
                )
                (?:
                  /  # if tables
                  (\w+) # get table
                  (?:
                    \? # parameters
                    (\w+)
                    =
                   (\w+)
                  )? # the parameter is optional, but it always follows a table name
                )? # table and parameter are optional
                (
                  (?:
                    ;
                    (\w+)
                    =
                    (\w+)
                  )* # rest of parameters if any
                )
                $
             }x;
     return $x;
 }
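
The dispatch table mentioned above is not spelled out, so here is a minimal sketch of what it might look like. Everything in it is an assumption made for illustration: the names %parser_for, parse_server_url, parse_file_url and parse_url, the simplified server regex (it leaves out the table and connection parameters), and the choice to treat everything after the third slash of a sqlite URL as the file name.

```perl
use strict;
use warnings;

# Dispatch table: one parser per driver. Each parser receives
# everything that follows "driver://".
my %parser_for = (
    mysql  => \&parse_server_url,
    sqlite => \&parse_file_url,
);

# Server-style URLs: user:pass@host:port/dbname/table_name
sub parse_server_url {
    my ($rest) = @_;
    my %p;
    @p{qw(user pass host port dbname table_name)} = $rest =~ m{
        ^
        (?: (\w+) (?: : ([^/\@]*) )? \@ )?   # optional user and password
        (?: ([\w\-.]+) (?: : (\d+) )? )?     # optional host and port
        / (\w*)                              # database name
        (?: / (\w+) )?                       # optional table name
        $
    }x or return;
    return \%p;
}

# File-style URLs: the remainder after the slash is the file name,
# slashes and colons included.
sub parse_file_url {
    my ($rest) = @_;
    my ($file) = $rest =~ m{^/(.+)$} or return;
    return { dbname => $file };
}

# Pick the parser from the driver name; return nothing for unknown
# drivers or for URLs the chosen parser rejects.
sub parse_url {
    my ($url) = @_;
    my ($driver, $rest) = $url =~ m{^(\w+)://(.*)$} or return;
    my $parser = $parser_for{$driver} or return;
    my $parts  = $parser->($rest) or return;
    return { driver => $driver, %$parts };
}

for my $url ('sqlite:///tmp/db/dbname_which_is_a_file', 'sqlite:///:memory') {
    my $parts = parse_url($url);
    print "$url => ", ($parts ? $parts->{dbname} : 'no match'), "\n";
}
```

Supporting another driver then means adding one entry to the table and one small subroutine, rather than growing a single regex.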

But I would recommend using URI::Split (less verbose than URI) and then splitting the path as needed.
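
"Split the path as needed" could look something like the sketch below. The helper name db_and_table and both per-driver rules are assumptions for this example: for sqlite the whole path (minus the leading slash) is taken as the database file, and for anything else the first path segment is taken as the database and the second as the table.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# Hypothetical helper: decide what the path means for each driver.
sub db_and_table {
    my ($url) = @_;
    my ($scheme, $auth, $path) = uri_split($url);
    $scheme //= '';

    # Drop the slash that separates the (possibly empty) authority
    # from the path.
    (my $rest = $path) =~ s{^/}{};

    # Assumption: for sqlite the whole remainder names the database file.
    return { driver => $scheme, dbname => $rest } if $scheme eq 'sqlite';

    # Assumption: otherwise the first segment is the database and the
    # second, if present, is the table.
    my ($dbname, $table_name) = split m{/}, $rest;
    return { driver => $scheme, dbname => $dbname, table_name => $table_name };
}

for my $url ('sqlite:///tmp/db/dbname_which_is_a_file', 'sqlite:///:memory') {
    my $parts = db_and_table($url);
    print "$url => $parts->{dbname}\n";
}
```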

You can see the difference between using the RE and using URI::Split here:

#!/usr/bin/env perl

use feature ':5.10';
use strict;
use URI::Split qw(uri_join uri_split);
use Data::Dumper;

my $urls = [qw(
             mysql://[email protected]:1234/dbname
             mysql://[email protected]:1234/dbname/tablename
             mysql://[email protected]:1234/dbname/pathextra/tablename
             sqlite:///dbname_which_is_a_file
             sqlite:///tmp/dbname_which_is_a_file
             sqlite:///tmp/db/dbname_which_is_a_file
             sqlite:///:dbname_which_is_a_file
             sqlite:///:memory
             )];



foreach my $url (@$urls) {
    say "== testing $url ==";
    say "- RE";
    print Dumper(test_re($url));    # test_re() is the sub shown above
    say "- URI::Split";
    print Dumper(uri_split($url));
}

Results:

 [...]
 == testing sqlite:///dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'dbname_which_is_a_file',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///dbname_which_is_a_file',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => undef,
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

 == testing sqlite:///tmp/dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'tmp',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///tmp',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => 'dbname_which_is_a_file',
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/tmp/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

== testing sqlite:///tmp/db/dbname_which_is_a_file ==
- RE
$VAR1 = {
          'pass' => undef,
          'port' => undef,
          'dbname' => undef,
          'host' => undef,
          'conn_param_string' => undef,
          'conn' => undef,
          'tparam_name' => undef,
          'tparam_value' => undef,
          'user' => undef,
          'table_name' => undef,
          'driver' => undef
        };

- URI::Split
$VAR1 = 'sqlite';
$VAR2 = '';
$VAR3 = '/tmp/db/dbname_which_is_a_file';
$VAR4 = undef;
$VAR5 = undef;

== testing sqlite:///:memory ==
- RE
$VAR1 = {
          'pass' => undef,
          'port' => undef,
          'dbname' => undef,
          'host' => undef,
          'conn_param_string' => undef,
          'conn' => undef,
          'tparam_name' => undef,
          'tparam_value' => undef,
          'user' => undef,
          'table_name' => undef,
          'driver' => undef
        };

- URI::Split
$VAR1 = 'sqlite';
$VAR2 = '';
$VAR3 = '/:memory';
$VAR4 = undef;
$VAR5 = undef;
