BASH/CSH/ZSH 风格大括号扩展的算法

发布于 2024-11-13 10:21:19 字数 339 浏览 1 评论 0原文

如果我有一个像这样的字符串

a/{b,c,d}/e

,我希望能够产生这样的输出:

a/b/e
a/c/e
a/d/e

你明白了。我需要在 C 中实现这一点。我编写了一种强力代码,我能够解析一对大括号(例如: /a/{b,c,d}/e/ 但如果有多对大括号,例如 /a/{b,c}/{d,e}/f 在这种情况下,我的方法将会中断,我想采取更好的方法。 我不是直接要求代码,

只是一个有效算法的提示就足够了,我认为解析大括号的任务是重复的,我们可以遵循递归算法吗?

If I have a string like

a/{b,c,d}/e

then I want to be able to produce this output:

a/b/e
a/c/e
a/d/e

You get the idea. I need to implement this in C. I have written a brute force kind of code which i capable of parsing a single pair of braces (for example: /a/{b,c,d}/e/ but if there are multiple pair of braces, like /a/{b,c}/{d,e}/f in that case my method will break. I would like to take a better approach.

I am not asking directly for the code, just a hint towards an efficient algorithm would be sufficient. I think the task of parsing the braces is repetitive and we could follow a recursive algorithm?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

不美如何 2024-11-20 10:21:19

如果您使用任何类型的 Unix、Linux 或 OS X 系统,都有一个内置的库函数可以执行此操作。 man 3 glob 将告诉您如何从 C 调用它。或者您可以访问 http://linux.die.net/man/3/glob 查找在线文档。

如果您想自己动手,一个简单的方法是首先扫描字符串并构建中间数据结构,然后递归地遍历该数据结构,打印字符串。该数据结构可以由具有以下字段的结构构建:

  • text:指向一段字符串的指针
  • next_node:指向打印时此文本后面的内容的指针
  • sibling_node:指向可以代替此选择的下一个选择的指针

If you're on any kind of Unix, Linux or OS X system, there is a built in library function to do this. man 3 glob will tell you about how to call it from C. Or you can visit http://linux.die.net/man/3/glob to find online documentation.

If you want to roll your own, a simple way to go is to first scan the string and build an intermediate data structure, and then recursively walk that data structure, printing strings. That data structure could be built out of structs with the following fields:

  • text: pointer to a piece of string
  • next_node: pointer to what comes after this text when printed
  • sibling_node: pointer to the next choice that could be made instead of this one
梅窗月明清似水 2024-11-20 10:21:19

你在这里展示的并不是真正的递归。如果你可以嵌套括号,那么这就是递归的。

基本上你所拥有的是一个简单的语法:

thing ::= element { "/" element }*
element ::= symbol || list
list ::= "{" symbol { "," symbol }* "}"
symbol ::= [a-z]+

这是一种即兴的语法语言。 * 表示“零个或多个”,+ 表示“1 个或多个”。相当常见。

因此,您需要一个简单的分词器,它可以对符号进行分组并主要分离出标点符号。

然后是一个简单的解析器。

parseThing() {
    Element e = parseElement();
    while (nextToken != null) {
        Slash s = parseSlash();
        e = parseElement():
    }
}

Slash parseSlash() {
    Token t = peekNextToken();
    if (t.getText().equals("/")) {
        return new Slash();
    }
    throw "expected a '/' but got a " + t;
}

Element parseElement() {
    Token t = peekNextToken();
    if (t.isSymbol()) {
        return parseSymbol();
    }
    if (t.isOpenCurly()) {
        return parseList());
    }
    thrown "Syntax error, wanted a symbol or { and got " + t;
}

List parseList() {
    List l = new List();
    Token t = peekNextToken();
    if (t.isOpenCurly()) {
        consumeNextToken();
        Symbol s = parseSymbol();
        l.add(s);
        t = peekNextToken();
        while (t.isComma()) {
            consumeNextToken();
            s = parseSymbol();
            l.add(s);
            t = peekNextToken();
        }
        if (!t.closeCurly()) {
            throw "expected close of list, but got " + t;
        }
        consumeNextToken();
     } else {
         throw "expected start of list but got " + t;
     }
     return l;
}

Symbol parseSymbol() {
    Token t = peekNextToken();

    if(!t.isSymbol()) {
        throw "expected symbol, got " + t;
    }
    consumeNextToken();
    return new Symbol(t);
}

这是不完整的,而且是高级的,但可以让您了解如何处理它。

What you're showing here isn't really recursive. If you could nest the brackets, then that would be recursive.

basically what you have is a simple grammar:

thing ::= element { "/" element }*
element ::= symbol || list
list ::= "{" symbol { "," symbol }* "}"
symbol ::= [a-z]+

That's a off the cuff grammar language. * means "zero or more", + means "1 or more". Fairly common.

So, you need a simple tokenizer, something that groups up your symbols and separates out the punctuation mostly.

Then a simple parser

parseThing() {
    Element e = parseElement();
    while (nextToken != null) {
        Slash s = parseSlash();
        e = parseElement():
    }
}

Slash parseSlash() {
    Token t = peekNextToken();
    if (t.getText().equals("/")) {
        return new Slash();
    }
    throw "expected a '/' but got a " + t;
}

Element parseElement() {
    Token t = peekNextToken();
    if (t.isSymbol()) {
        return parseSymbol();
    }
    if (t.isOpenCurly()) {
        return parseList());
    }
    thrown "Syntax error, wanted a symbol or { and got " + t;
}

List parseList() {
    List l = new List();
    Token t = peekNextToken();
    if (t.isOpenCurly()) {
        consumeNextToken();
        Symbol s = parseSymbol();
        l.add(s);
        t = peekNextToken();
        while (t.isComma()) {
            consumeNextToken();
            s = parseSymbol();
            l.add(s);
            t = peekNextToken();
        }
        if (!t.closeCurly()) {
            throw "expected close of list, but got " + t;
        }
        consumeNextToken();
     } else {
         throw "expected start of list but got " + t;
     }
     return l;
}

Symbol parseSymbol() {
    Token t = peekNextToken();

    if(!t.isSymbol()) {
        throw "expected symbol, got " + t;
    }
    consumeNextToken();
    return new Symbol(t);
}

This is incomplete, and high level, but gives you an idea of how you could go about it.

少女情怀诗 2024-11-20 10:21:19

我最近一直在做这样的事情,花了我很多时间来解决这个问题,所以我是这样做的。不过,可能有一个更简单的算法。

您可以编写一个递归下降解析器将文本转换为树。使保存该字符串和匹配的大括号对的字符串叶节点成为内部节点。每个叶节点可以包含多个字符串。

例如,这样:

/a/{b,c}/{d,e{f,g,h}}/i

可以变成:

(
   ["/a/"]
   {
      ( ["b"] )
      ( ["c"] )
   }
   ["/"]
   {
      ( ["d"] )
      (
         ["e"]
         {
            ( ["f"] )
            ( ["g"] )
            ( ["h"] )
         }
      )
   }
   ["i"]
)

尝试将其视为一棵树,其中 ["stringA", "stringB"] 表示叶节点,匹配的一对大括号表示内部节点。有 2 种类型的内部节点,一种可以在其中一种替代方案中进行选择(我在本例中使用 {}),另一种则结合了所有组合(我使用 ()此处)。

所以,上面的树会像这样:

(
   ["/a/"]
   {
      ["b"]
      ["c"]
   }
   ["/"]
   {
      ["d"]
      (
         ["e"]
         {
            ["f"]
            ["g"]
            ["h"]
         }
      )
   }
   ["i"]
)

然后

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      (
         ["e"]
         ["f", "g", "h"]
      )
   }
   ["i"]
)

然后

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      ["ef", "eg", "eh"]
   }
   ["i"]
)

然后

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   ["d", "ef", "eg", "eh"]
   ["i"]
)

最后,你最终得到一个叶节点,这是所有的组合:

["/a/b/di", "/a/b/efi", "/a/b/egi", "/a/b/ehi",
 "/a/c/di", "/a/c/efi", "/a/c/egi", "/a/c/ehi"]

然后你可以漂亮地打印它。

I have been doing something like this recently, and it took me a lot of time to solve this, so here's how I do it. There may be a simpler algorithm for this though.

You can write a recursive descent parser to transform the text into the tree. Make the strings leaf nodes that holds that string and the matched pair of braces an internal node. Each leaf node can contain more than one string.

For example, this:

/a/{b,c}/{d,e{f,g,h}}/i

can become:

(
   ["/a/"]
   {
      ( ["b"] )
      ( ["c"] )
   }
   ["/"]
   {
      ( ["d"] )
      (
         ["e"]
         {
            ( ["f"] )
            ( ["g"] )
            ( ["h"] )
         }
      )
   }
   ["i"]
)

Try to look at it as a tree, where ["stringA", "stringB"] denotes a leaf node, and matched pair of braces represents an internal node. There are 2 types of internal node, one that can choose between one of the alternatives (I use {} in this example) and one that combines all the combination (I use () here).

So, the above tree would go like this:

(
   ["/a/"]
   {
      ["b"]
      ["c"]
   }
   ["/"]
   {
      ["d"]
      (
         ["e"]
         {
            ["f"]
            ["g"]
            ["h"]
         }
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      (
         ["e"]
         ["f", "g", "h"]
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      ["ef", "eg", "eh"]
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   ["d", "ef", "eg", "eh"]
   ["i"]
)

and finally, you end up with a single leaf node, which are all the combinations:

["/a/b/di", "/a/b/efi", "/a/b/egi", "/a/b/ehi",
 "/a/c/di", "/a/c/efi", "/a/c/egi", "/a/c/ehi"]

Then you can pretty print it.

梦里°也失望 2024-11-20 10:21:19

不知道高效,但直观的方法是使用某种形式的递归。该函数应该能够找到第一个大括号。假设第一个大括号包含 N 个替代项。因此该函数产生 N 次扩展,并在每次扩展时递归地调用自身。每个“分叉”都会继续分叉,直到耗尽所有支撑。

这有帮助吗?

Dunno about efficient, but an intuitive way would be to use some form of recursion. The function should be able to find the first brace. Say the first brace contains N alternatives. So the function produces N expansions, and recursively calls itself upon each expansion. Each "fork" keeps on forking till it exhausts every brace.

Does that help?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文