Parsing mysql:/// and sqlite:/// URLs
We've got this little regexp in a module to parse URLs like the following:
mysql://[email protected]:1234/dbname
if( my ($conn, $driver, $user, $pass, $host, $port, $dbname, $table_name, $tparam_name, $tparam_value, $conn_param_string) =
$url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$} ) {
Now we want to add parsing of SQLite URLs, which can look like this:
sqlite:///dbname_which_is_a_file
But it won't work with absolute paths like: sqlite:///tmp/dbname_which_is_a_file
What is the proper way of doing this?
Comments (2)
The CPAN module URI::Split will work out a lot better in the long run than a fragile regexp. Here's the synopsis from its POD:
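The module exports two functions, uri_split and uri_join. As a minimal sketch of how they apply to the kind of URLs in the question (the sample URLs below are hypothetical and chosen only for illustration):

```perl
use strict;
use warnings;
use URI::Split qw(uri_split uri_join);

# Hypothetical sample URLs, chosen for illustration only.
my @urls = (
    'mysql://user:secret@dbhost:1234/dbname',
    'sqlite:///tmp/dbname_which_is_a_file',
);

for my $url (@urls) {
    # uri_split returns the five generic URI components; query and
    # fragment are undef when the URL has none.
    my ($scheme, $auth, $path, $query, $frag) = uri_split($url);
    print "scheme=$scheme auth='$auth' path=$path\n";

    # uri_join puts the components back together.
    my $rebuilt = uri_join($scheme, $auth, $path, $query, $frag);
    print "round trip ok\n" if $rebuilt eq $url;
}
```

Note that uri_split does not break the authority part down any further: user, password, host and port still have to be separated by the caller, and the path comes back whole, however many segments it has.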
A more general module (more flexible, and more complex) would be URI, but for simple uses its additional complexity may not be necessary.
By the way, URI stands for Uniform Resource Identifier. URIs are a superset of URLs: a URL is one specific kind of URI.
The problem with the regular expression is that it does not work with paths longer than two elements: it splits them into $dbname and $table_name (if any). It also does not handle SQLite special filenames such as ':memory:' (which are very useful for tests).
To keep the RE approach maintainable, the best option is a dispatch table keyed on the main protocols that need different parsing, with a subroutine for each one. It also helps to write the RE with the /x modifier, so that it can carry comments, which makes it easier to maintain.
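A minimal sketch of that dispatch-table idea follows. The sub names (parse_url, parse_mysql, parse_sqlite), the hash keys and the sample URLs are hypothetical. The SQLite parser returns the raw path after the empty host part and leaves it to the caller to decide whether a leading slash means an absolute file name, which is an assumption of this sketch rather than anything the question specifies.

```perl
use strict;
use warnings;

# Dispatch table: one parser per scheme that needs its own rules.
my %parser_for = (
    mysql  => \&parse_mysql,
    sqlite => \&parse_sqlite,
);

sub parse_url {
    my ($url) = @_;
    my ($scheme) = $url =~ m{^(\w+)://} or return;
    my $parser = $parser_for{$scheme}   or return;
    return $parser->($url);
}

sub parse_mysql {
    my ($url) = @_;
    my ($user, $pass, $host, $port, $dbname) = $url =~ m{
        ^ mysql ://
        (?: (\w+)                 # user
            (?: : ([^/\@]*) )?    # optional password
        \@ )?
        (?: ([\w\-.]+)            # host
            (?: : (\d+) )?        # optional port
        )?
        / (\w*)                   # database name
        $
    }x or return;
    return { driver => 'mysql', user => $user, pass => $pass,
             host   => $host,   port => $port, dbname => $dbname };
}

sub parse_sqlite {
    my ($url) = @_;
    my ($path) = $url =~ m{
        ^ sqlite ://    # scheme with an empty host part
        ( / .* )        # raw path, any number of segments
        $
    }x or return;
    return { driver => 'sqlite', path => $path };
}

for my $url ('mysql://user:secret@dbhost:1234/dbname',
             'sqlite:///tmp/dbname_which_is_a_file') {
    my $parsed = parse_url($url) or next;
    print join(', ', map { "$_=" . ($parsed->{$_} // '') } sort keys %$parsed), "\n";
}
```

Adding another scheme then means adding one entry to the table and one small, commented regexp, instead of growing a single pattern that has to cover every case.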
But I would recommend using URI::Split (less verbose than URI) and then splitting the path as needed.
You can see the difference between using the RE and URI::Split here:
Results:
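A comparison along those lines can be sketched as follows. This is not the answerer's original script: the helper names parse_with_regexp and path_with_uri_split and the second sample URL are hypothetical, and the regexp is the one from the question, copied unchanged.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# The regexp from the question; only $dbname and $table_name are returned.
sub parse_with_regexp {
    my ($url) = @_;
    my @f = $url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$};
    return unless @f;       # no match at all
    return @f[6, 7];        # $dbname, $table_name
}

# URI::Split keeps the path in one piece.
sub path_with_uri_split {
    my ($url) = @_;
    my (undef, undef, $path) = uri_split($url);
    return $path;
}

for my $url ('sqlite:///tmp/dbname_which_is_a_file',
             'sqlite:///var/tmp/dbname_which_is_a_file') {
    my ($dbname, $table) = parse_with_regexp($url);
    printf "%s\n  regexp:    %s\n  uri_split: path=%s\n",
        $url,
        defined $dbname ? "dbname=$dbname table_name=" . ($table // '') : 'no match',
        path_with_uri_split($url);
}
```

With the two-element path the regexp matches but reports tmp as the database and dbname_which_is_a_file as the table; with the three-element path it does not match at all. uri_split returns the full path in both cases.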