Regular expression transformation of wiki-like markup

Posted on 2024-08-03 11:07:26

Consider the following mark-up input:

* Line 1
* Line 2
:* Line 2.1
:* Line 2.2
* Line 3

This is typically coded as:

  <ul>
    <li>Line 1</li>
    <li>Line 2</li>
    <ul>
      <li>Line 2.1</li>
      <li>Line 2.2</li>
    </ul>
    <li>Line 3</li>
  </ul>

My questions:

What would be a good representation for the same input using a single line?
What is the regular expression to generate the corresponding XHTML?

For example, the single line input format could be:

> Line 1 > Line 2 >> Line 2.1 >> Line 2.2 > Line 3

Here > is the unordered list item delimiter. I chose > because the text might include typical punctuation marks. Using » (or another character that is not on a standard 104-key keyboard) would be fun, but not as easy to type.

The line input format could also be:

[Line 1][Line 2 [Line 2.1][Line 2.2]][Line 3]

Update #1 - The problem is a little simpler: the nesting depth can be limited to three levels. A general solution for n levels deep would still be cool.

Update #2 - XHTML, not HTML.

Update #3 - Another possible input format.

Update #4 - Java solutions (or pure regex) are most welcome.

Update #5

Revised code:

String in = " * Line 1 * Line 2 > * Line 2.1 * Line 2.2 < * Line 3";
String sub = "<ul>" + in.replace( " > ", "<ul>" ) + "</ul>";
sub = sub.replace( " < ", "</ul>" );
sub = sub.replaceAll( "( | >)\\* ([^*<>]*)", "<li>$2</li>" );
System.out.println( "Result: " + sub );

Prints the following:

Result: <ul><li>Line 1 </li>* Line 2<ul>* Line 2.1<li>Line 2.2</li></ul>* Line 3</ul>
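
A likely reason for the skipped items, as far as I can tell: each match consumes the space in front of the asterisk, and `[^*<>]*` also swallows the space at the end of the item text, so the next `* ` has no leading space left to match. Items that directly follow `<ul>` or `</ul>` have no leading space either. The following is only a sketch of one possible workaround, which makes the leading space optional. The class name `ListRegexSketch` and the method `convert` are made up for illustration.

```java
public class ListRegexSketch {
    // Hypothetical helper: the same three steps as the revised code above,
    // except that the space before the asterisk is optional.
    static String convert(String in) {
        String sub = "<ul>" + in.replace(" > ", "<ul>") + "</ul>";
        sub = sub.replace(" < ", "</ul>");
        // " ?" instead of "( | >)": the previous match, or the " > " and " < "
        // replacements, may already have consumed the separating space.
        return sub.replaceAll(" ?\\* ([^*<>]*)", "<li>$1</li>");
    }

    public static void main(String[] args) {
        String in = " * Line 1 * Line 2 > * Line 2.1 * Line 2.2 < * Line 3";
        System.out.println("Result: " + convert(in));
    }
}
```

With the sample input this should print `Result: <ul><li>Line 1 </li><li>Line 2</li><ul><li>Line 2.1 </li><li>Line 2.2</li></ul><li>Line 3</li></ul>`. The trade-off is that any asterisk followed by a space inside item text is now treated as a new item.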


老街孤人 2024-08-10 11:07:26

Your example seems fine to me.

 > Line 1 > Line 2 >> Line 2.1 >> Line 2.2 > Line 3

Unfortunately, pure RegEx can't keep track of which nesting level you are on, so it won't know where to put the /UL close tags.
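
For the `>` / `>>` format, the nesting level can be tracked outside the regular expression. The sketch below is one way to do that, not a definitive implementation: a regex only splits the line into items, and a plain counter decides where `<ul>` and `</ul>` go. The names `DepthTracker` and `toXhtml` are hypothetical. It assumes item text never contains a space followed by `>`, and it reproduces the list shape shown in the question, with the nested `<ul>` as a sibling of the `<li>` elements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DepthTracker {
    // One item: a run of '>' (its depth), a space, then text up to the
    // next " >" or the end of the line.
    private static final Pattern ITEM = Pattern.compile("(>+) (.*?)(?= >|$)");

    static String toXhtml(String in) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of <ul> elements currently open
        Matcher m = ITEM.matcher(in);
        while (m.find()) {
            int level = m.group(1).length();
            // Open or close lists until the depth matches the item's level.
            for (; depth < level; depth++) out.append("<ul>");
            for (; depth > level; depth--) out.append("</ul>");
            out.append("<li>").append(m.group(2).trim()).append("</li>");
        }
        // Close whatever is still open at the end of the input.
        for (; depth > 0; depth--) out.append("</ul>");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXhtml("> Line 1 > Line 2 >> Line 2.1 >> Line 2.2 > Line 3"));
    }
}
```

Because the depth is just an integer, this handles any number of levels rather than only three.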

Something like this might work:

 * Line 1 * Line 2 > * Line 2.1 * Line 2.2 < * Line 3

Here, the greater-than and less-than move up and down the hierarchy, and the asterisks are the delimiters for the bullets. The spaces before and after each are used as a sort of escape sequence, so you can still use those characters literally or for other purposes like italics and bold when they aren't surrounded by spaces.

A stab at the RegEx:

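 // Requires: using System.Text.RegularExpressions; t holds the single-line input.
 // Note: .NET replacement strings reference captured groups as $2, not \2.
 string ol = "<ul>" + Regex.Replace(t, " > ", "<ul>") + "</ul>";
 ol = Regex.Replace(ol, " < ", "</ul>");
 ol = Regex.Replace(ol, "( |>)\\* ([^*<>]*)", "<li>$2</li>");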

Edit: Adjusted to produce XHTML, closing the LI tags, based on comment below. Also fixed my C# syntax.

Final edit: I think the \* and \2 in the last Replace need to be escaped for C#, fixing. Also, note that the first two Replace() calls can use String.Replace() rather than RegEx, which will likely be faster.


℡Ms空城旧梦 2024-08-10 11:07:26

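I would not recommend using regular expressions as a parsing and transformation tool. Regular expressions tend to have high overhead and are not the most efficient way to parse a language, which is really what you are asking them to do. You have created a language, simple as it may be, and you should treat it as one. I suggest writing an actual, dedicated parser for your wiki-style formatting code. Because you can tailor the parser to your language, it should be more efficient. You also won't have to create some horrific regular expression to parse your language and handle all of its nuances. In the long run you get cleaner code, better maintainability, and so on.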
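
To illustrate what a small dedicated parser might look like, here is a minimal recursive-descent sketch for the bracket format from the question, `[Line 1][Line 2 [Line 2.1][Line 2.2]][Line 3]`. It is an assumption-laden example rather than a reference implementation: the class `BracketParser`, the method `toXhtml`, and the grammar in the comments are all made up for illustration. It does not escape `&` or `<` in item text, and it emits the same list shape as the question.

```java
public class BracketParser {
    private final String src;
    private int pos = 0;

    private BracketParser(String src) {
        this.src = src;
    }

    static String toXhtml(String src) {
        BracketParser p = new BracketParser(src);
        String html = p.list();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("unexpected character at " + p.pos);
        }
        return html;
    }

    // list := item+
    private String list() {
        StringBuilder out = new StringBuilder("<ul>");
        while (pos < src.length() && src.charAt(pos) == '[') {
            out.append(item());
        }
        return out.append("</ul>").toString();
    }

    // item := '[' text list? ']'
    private String item() {
        pos++; // consume '['
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != '[' && src.charAt(pos) != ']') {
            pos++;
        }
        StringBuilder out = new StringBuilder("<li>")
                .append(src.substring(start, pos).trim())
                .append("</li>");
        if (pos < src.length() && src.charAt(pos) == '[') {
            out.append(list()); // recursion handles any nesting depth
        }
        if (pos >= src.length() || src.charAt(pos) != ']') {
            throw new IllegalArgumentException("missing ']' at " + pos);
        }
        pos++; // consume ']'
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXhtml("[Line 1][Line 2 [Line 2.1][Line 2.2]][Line 3]"));
    }
}
```

Unlike the replace-based approaches, unbalanced input such as `[Line 1` is rejected with an exception instead of silently producing malformed markup.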

I would suggest the following resources:


忆梦 2024-08-10 11:07:26

Solution

A working solution follows:

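public class Test {
  public static void main( String[] args ) {
    String in = "= Line 1 = Line 2 > = Line 2.1 = Line 2.2 < = Line 3";

    // Wrap every "= text" run in <li>...</li>; the text stops at the next =, < or >.
    in = in.replaceAll( "= ([^=<>]*)", "<li>$1</li>" );
    // A "> " marker now sits right after "</li>", so ">> " marks a step down.
    in = in.replace( ">> ", "><ul>" );
    // Likewise, ">< " marks a step back up.
    in = in.replace( ">< ", "></ul>" );
    in = "<ul>" + in + "</ul>";
    System.out.println( in );
  }
}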

This creates the desired XHTML fragment:

<ul><li>Line 1 </li><li>Line 2 </li><ul><li>Line 2.1 </li><li>Line 2.2 </li></ul><li>Line 3</li></ul>

About the author

在梵高的星空下
