Optional matching of multiple regex sub-patterns

Posted on 2024-12-27 17:51:55

I have a regex problem that bugs me, and I have no clue how to solve it.

I have an input field containing text, and I would like to extract certain values from it: a title, a description, a price and a special price.

Examples for the input:

  • Everything that is plain text is considered the title.
  • Everything within hashes (#description goes here#) is considered the description.
  • $23.49 is considered the price, and %$19.99 would match the special price.

The CoffeeScript pattern I'm using:

pattern = ///
  ([^$]+)
  (#(.+?)#+)
  ([\$]\d+\. \d+)
  ([\%\$]\d+\. \d+)
///
params = [title,description,oldPrice,newPrice]=input_txt.match(pattern)[1..4]

It does not work. It should work if I enter all values in the given order, but then I also have to provide every one of the substrings.

What I would like is the ability to extract the segments if they are provided (so each one is optional), regardless of their order.
How can I extract optional sequences from a string?

EDIT: I have added some examples.

Example 1:

Kindle #Amazon's ebook reader# $79.00

This should be extracted as:

title: Kindle
description: Amazon's ebook reader
oldPrice: $79.00

Example 2:

Nike Sneaker's $109.00 %$89.00

This should be extracted as:

title: Nike Sneaker's
oldPrice: $109.00
newPrice: $89.00

Example 3:

$100.00 Just dance 3 #for XBox# 

This should be extracted as:

title: Just dance 3
description: for XBox
oldPrice: $100.00

Any help would be great ...

Comments (2)

一腔孤↑勇 2025-01-03 17:51:55

The nature of regular grammars makes this hard to solve with a single pattern. As a workaround, the simplest solution is to handle the four pieces one at a time:

  1. Match /#(.+?)#+/ and remove the matched string (string replace) from the original.
  2. Match /%\$\d+\.\d+/ and remove the matched string from the original.
  3. Match /\$\d+\.\d+/ and remove it in the same way.
  4. What now remains of the original is the title.
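
The steps above can be sketched as a small table-driven loop. This is only an illustration under stated assumptions: the function name extractFields, the field names, and the capture groups added to the patterns are hypothetical and were chosen to mirror the question.

```javascript
// Hypothetical helper: the name and the field names are illustrative.
function extractFields(input) {
  // Order matters: the special price (%$19.99) is removed before the plain
  // price pattern runs, so the "$19.99" inside it cannot be mistaken for
  // the regular price.
  const steps = [
    ["description", /#(.+?)#+/],
    ["newPrice", /%(\$\d+\.\d+)/],
    ["oldPrice", /(\$\d+\.\d+)/],
  ];
  const fields = {};
  let rest = input;
  for (const [name, re] of steps) {
    const m = rest.match(re);
    if (m) {
      fields[name] = m[1];          // keep the captured value
      rest = rest.replace(re, " "); // remove the matched piece from the text
    }
  }
  // Whatever is left, with whitespace tidied up, is the title.
  fields.title = rest.replace(/\s+/g, " ").trim();
  return fields;
}
```

Pieces that are missing from the input are simply left unset, which is what makes every part optional.
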
oО清风挽发oО 2025-01-03 17:51:55

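You can use this code, which looks for and removes each piece separately:

function extractParts(str) {
    var parts = {};

    // Match re against str; if it matches, return the first capture group
    // and remove the whole match from str.
    function removePiece(re) {
        var result;
        var matches = str.match(re);
        if (matches) {
            result = matches[1];
            str = str.replace(re, "");
        }
        return result;
    }

    // find and remove each piece we're looking for
    parts.description = removePiece(/#([^#]+)#/);     // #text#
    // the special price goes first, so that the plain price pattern
    // cannot pick up the "$3.78" inside "%$3.78"
    parts.newPrice = removePiece(/%(\$\d+\.\d+)/);    // %$3.78
    parts.oldPrice = removePiece(/(\$\d+\.\d+)/);     // $4.56
    // whatever is left is the title; fix up whitespace
    parts.title = str.replace(/\s+/g, " ").replace(/^\s+/, "").replace(/\s+$/, "");
    return parts;
}

var pieces = extractParts("Kindle #Amazon's ebook reader# $79.00");
// pieces.title is "Kindle", pieces.description is "Amazon's ebook reader",
// pieces.oldPrice is "$79.00", and pieces.newPrice is undefined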

You can see a demo in action here: http://jsfiddle.net/jfriend00/d8NNr/
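
As a possible alternative to removing the pieces one by one, the same extraction can be sketched as a single pass over the string, using one global pattern with three alternatives and a replacement callback. This is not the approach from either answer. The name parseListing is hypothetical, and the sketch assumes each kind of piece appears at most once (if one repeats, the last occurrence wins).

```javascript
// Hypothetical single-pass variant; the name parseListing is illustrative.
function parseListing(input) {
  const fields = {};
  // One global pattern with three alternatives. Only one capture group is
  // set per match, so the callback can tell which piece was found.
  const piece = /#([^#]+)#|%(\$\d+\.\d+)|(\$\d+\.\d+)/g;
  const rest = input.replace(piece, (match, description, newPrice, oldPrice) => {
    if (description !== undefined) fields.description = description;
    else if (newPrice !== undefined) fields.newPrice = newPrice;
    else if (oldPrice !== undefined) fields.oldPrice = oldPrice;
    return " "; // blank out the piece so only the title text remains
  });
  fields.title = rest.replace(/\s+/g, " ").trim();
  return fields;
}

const listing = parseListing("Nike Sneaker's $109.00 %$89.00");
// listing.title is "Nike Sneaker's", listing.oldPrice is "$109.00",
// listing.newPrice is "$89.00"
```

Because the regex engine scans from left to right and reaches the "%" first, "%$89.00" is consumed as a special price before the plain price alternative can see its "$89.00" part.
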
