Regular expression to match MediaWiki templates and their parameters

Posted on 2024-11-18 02:56:05

I'm writing a simple JavaScript snippet that adds a specific parameter to a specific template in the article that is currently being edited.

Wikipedia Templates are structured in the following format:

 {{Template name|unnamed parameter|named parameter=some value|another parameter=[[target article|article name]]|parameter={{another template|another template's parameter}}}}

A template can also span multiple lines, for example:

{{Template 
|name=John
|surname=Smith
|pob=[[London|London, UK]]
}}

For further reference, please have a look at http://en.wikipedia.org/wiki/Help:Template

So first I'd like to match the entire template. I came up with a partial solution, which is:

document.editform.wpTextbox1.value.match(/\{\{template name((.|\n)*?)\}\}$/gmis)

However, the problem is that it only matches the text from the opening braces up to the closing braces of the first nested template (see the first example).

In addition, I'd like to fetch its parameters as an array. As the result, I'd like to get an array with the parameters in a specific order:
Array( value of parameter pob, value of parameter name, value of parameter surname, value of parameter pod (in this case empty, because it was not set) )

I'd use that to clean up the non-standardised formatting in some articles and to add some new parameters.

Thank you!

Comments (1)

ヅ她的身影、若隐若现 2024-11-25 02:56:05

Write a simple parser.

Solving this kind of problem with a regexp is not the right approach. It is the same problem as matching brackets, which is difficult to do with a regexp. In general, regexps are not well suited to nested expressions.

Try something like this:

var parts = src.split(/(\{\{|\}\})/);
var depth = 0; // how many templates are currently open
for (var i = 0; i < parts.length; i++) {
  if (parts[i] === '{{') {
    depth++; // starting new (sub) template
  } else if (parts[i] === '}}') {
    depth--; // ending (sub) template
  } else {
    // content (or text outside any template when depth === 0)
  }
}

This is only a rough sketch, as I'm in a rush right now; I will update it with working code...
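The split-based idea above can also be reduced to a plain depth counter when the goal is only to grab one whole template, nested templates included. The following is a minimal sketch under stated assumptions: `findTemplate` is a hypothetical helper name, the template name is matched case-insensitively as in the question's regexp, and triple braces such as `{{{param}}}` are not treated specially.

```javascript
// Hypothetical helper: returns the full text of the first template called
// `name` in `src`, including any nested templates, or null if none is found.
function findTemplate(src, name) {
  // Escape regexp metacharacters so the name is matched literally.
  var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // The name must be followed by "|" or "}}" so that "Foo" does not match "Foobar".
  var opener = new RegExp('\\{\\{\\s*' + escaped + '\\s*(?=\\||\\}\\})', 'i');
  var start = src.search(opener);
  if (start === -1) return null;

  var depth = 0;
  for (var i = start; i < src.length - 1; i++) {
    var pair = src.slice(i, i + 2);
    if (pair === '{{') {
      depth++;
      i++; // skip the second brace of the pair
    } else if (pair === '}}') {
      depth--;
      i++;
      if (depth === 0) return src.slice(start, i + 1);
    }
  }
  return null; // braces were not balanced
}

var sample = 'text {{Template name|unnamed|parameter={{another template|x}}}} more text';
var whole = findTemplate(sample, 'template name');
// whole: '{{Template name|unnamed|parameter={{another template|x}}}}'
```

In the question's setting, `src` would be `document.editform.wpTextbox1.value`.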

UPDATE (9th August 2011)

var NO_TPL = 0, // outside any tpl - ignoring...
    IN_TPL = 1, // inside tpl
    IN_LIST = 3; // inside list of arguments

function parseWiki(src) {
  var tokens = src.split(/(\{\{|\}\}|\||=|\[\[|\]\])/),
      i = -1, end = tokens.length - 1,
      token, next, state = NO_TPL,
      work = [], workChain = [], stateChain = [];

  function trim(value) {
    return value.replace(/^\s*/, '').replace(/\s*$/, '');
  }

  // get next non empty token
  function getNext(next) {
    while (!next && i < end) next = trim(tokens[++i]);
    return next;
  }

  // go into tpl / list of arguments
  function goDown(newState, newWork, newWorkKey) {
    stateChain.push(state);
    workChain.push(work);

    if (newWorkKey) {
      work[newWorkKey] = newWork;
    } else {
      work.push(newWork);
    }

    work = newWork;
    state = newState;
  }

  // jump up from tpl / list of arguments
  function goUp() {
    work = workChain.pop();
    state = stateChain.pop();
  }

  // state machine
  while ((token = getNext())) {
    switch(state) {

      case IN_TPL:
        switch(token) {
          case '}}': goUp(); break;
          case '|': break;
          default:
            next = getNext();
            if (next != '=') throw new Error('invalid: expected "=" after "' + token + '"');
            next = getNext();
            if (next == '[[') {
              goDown(IN_LIST, [], token);
            } else if (next == '{{') {
              goDown(IN_TPL, {id: getNext()}, token);
            } else {
              work[token] = next;
            }
        }
        break;

      case IN_LIST:
        switch(token) {
          case ']]': goUp(); break;
          case '|': break;
          default: work.push(token);
        }
        break;

      case NO_TPL:
        if (token == '{{') {
          next = getNext();
          goDown(IN_TPL, {id: next});
        }
        break;
    }
  }

  return work;
}
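To get the values in a fixed order, as the question asks, the parsed object can be read key by key. This is a minimal sketch, not part of the parser itself: `orderedParams` is a hypothetical helper, and the sample object is written by hand in the shape `parseWiki` is expected to return for the multi-line example from the question.

```javascript
// Hypothetical helper: picks parameter values from a parsed template object
// in a fixed order; parameters that were not set become empty strings.
function orderedParams(tpl, order) {
  var values = [];
  for (var i = 0; i < order.length; i++) {
    var key = order[i];
    var isSet = Object.prototype.hasOwnProperty.call(tpl, key);
    values.push(isSet ? tpl[key] : '');
  }
  return values;
}

// Hand-written stand-in for parseWiki(...)[0] on the multi-line example.
var tpl = {
  id: 'Template',
  name: 'John',
  surname: 'Smith',
  pob: ['London', 'London, UK']
};

var result = orderedParams(tpl, ['pob', 'name', 'surname', 'pod']);
// result: [['London', 'London, UK'], 'John', 'Smith', '']
```

Links come back as arrays because the parser stores `[[a|b]]` values as lists, so `pob` would need a further step if a plain string is wanted.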

UNIT TESTS

describe('wikiTpl', function() {
  it('should do empty tpl', function() {
    expect(parseWiki('{{name}}'))
      .toEqual([{id: 'name'}]);
  });

  it('should ignore text outside from tpl', function() {
    expect(parseWiki(' abc {{name}} x y'))
      .toEqual([{id: 'name'}]);
  });

  it('should do simple param', function() {
    expect(parseWiki('{{tpl | p1= 2}}'))
      .toEqual([{id: 'tpl', p1: '2'}]);
  });

  it('should do list of arguments', function() {
    expect(parseWiki('{{name | a= [[1|two]]}}'))
      .toEqual([{id: 'name', a: ['1', 'two']}]);
  });

  it('should do param after list', function() {
    expect(parseWiki('{{name | a= [[1|two|3]] | p2= true}}'))
      .toEqual([{id: 'name', a: ['1', 'two', '3'], p2: 'true'}]);
  });

  it('should do more tpls', function() {
    expect(parseWiki('{{first | a= [[1|two|3]] }} odd test {{second | b= 2}}'))
      .toEqual([{id: 'first', a: ['1', 'two', '3']}, {id: 'second', b: '2'}]);
  });

  it('should allow nested tpl', function() {
    expect(parseWiki('{{name | a= {{nested | p1= 1}} }}'))
      .toEqual([{id: 'name', a: {id: 'nested', p1: '1'}}]);
  });
});

Note: I'm using Jasmine's syntax for these unit tests. You can easily run them with AngularJS, which comes with a whole testing environment; check it out at http://angularjs.org
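Since the goal in the question is to add parameters and write the template back, the parsed object also has to be turned into wikitext again. The following is a rough sketch under assumptions: `serializeTemplate` and `serializeValue` are hypothetical helpers, the input has the shape shown in the unit tests, and the original whitespace and line breaks are not preserved.

```javascript
// Hypothetical helper: turns a parsed template object back into wikitext.
function serializeTemplate(tpl) {
  var out = '{{' + tpl.id;
  for (var key in tpl) {
    // Skip the template name itself and anything inherited.
    if (key === 'id' || !Object.prototype.hasOwnProperty.call(tpl, key)) continue;
    out += '|' + key + '=' + serializeValue(tpl[key]);
  }
  return out + '}}';
}

// Strings are written as-is, arrays as [[a|b]] links, objects as nested templates.
function serializeValue(value) {
  if (Array.isArray(value)) return '[[' + value.join('|') + ']]';
  if (value !== null && typeof value === 'object') return serializeTemplate(value);
  return String(value);
}

var tpl = { id: 'Template', name: 'John', pob: ['London', 'London, UK'] };
tpl.pod = ''; // add a new parameter that has no value yet
var text = serializeTemplate(tpl);
// text: '{{Template|name=John|pob=[[London|London, UK]]|pod=}}'
```

Parameters come out in the order they were added to the object, which holds for ordinary names; purely numeric keys would be moved to the front by the JavaScript engine.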
