Splitting a string in Python

Posted on 2024-07-07 11:22:22

I have a string which is like this:

this is [bracket test] "and quotes test "

I'm trying to write something in Python that splits it on spaces while ignoring spaces inside square brackets and quotes. The result I'm looking for is:

['this','is','bracket test','and quotes test ']

Comments (6)

夏日落 2024-07-14 11:22:22

Here's a simplistic solution that works with your test input:

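import re
s = 'this is [bracket test] "and quotes test "'
# Raw string so the backslashes reach the regex engine unchanged.
re.findall(r'\[[^\]]*\]|"[^"]*"|\S+', s)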

This returns every substring that matches one of the following:

  • an open bracket followed by zero or more non-close-bracket characters followed by a close bracket,
  • a double quote followed by zero or more non-quote characters followed by a double quote,
  • any group of non-whitespace characters.

This works with your example, but it may fail on many real-world strings. For example, you didn't say what you expect with unbalanced brackets or quotes, or how you want single quotes or escape characters to work. For simple cases, though, the above might be good enough.
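
As a rough illustration of the open questions above, here is one possible way to extend the pattern so that it also accepts single-quoted tokens and backslash escapes inside quotes. The names `TOKEN_RE` and `tokenize` are made up for this sketch, and the escape rules are an assumption rather than anything the question specifies. An unbalanced quote or bracket simply falls through to the `\S+` branch.

```python
import re

# Hypothetical extension of the pattern above (not part of the original answer).
TOKEN_RE = re.compile(
    r"""
      \[[^\]]*\]              # [bracketed text], no nesting
    | "(?:\\.|[^"\\])*"       # "double quoted", backslash escapes allowed
    | '(?:\\.|[^'\\])*'       # 'single quoted', backslash escapes allowed
    | \S+                     # any other run of non-whitespace
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the raw tokens, with their delimiters still attached."""
    return TOKEN_RE.findall(text)


print(tokenize('this is [bracket test] "and quotes test "'))
# ['this', 'is', '[bracket test]', '"and quotes test "']
```

The delimiters are left on the tokens here, exactly as in the one-line version, so stripping them is still a separate step.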

仙女山的月亮 2024-07-14 11:22:22

To complete Bryan's post and match the expected result exactly:

>>> import re
>>> txt = 'this is [bracket test] "and quotes test "'
>>> [x[1:-1] if x[0] in '["' else x for x in re.findall(r'\[[^\]]*\]|"[^"]*"|\S+', txt)]
['this', 'is', 'bracket test', 'and quotes test ']

Don't be misled by the syntax: this is not several statements on a single line but a single functional-style expression (a list comprehension), which is more bugproof.
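
A possible variation, sketched here under the assumption that delimiters should only be stripped from tokens that really were bracketed or quoted: put a capture group inside each alternative and read back whichever group matched. The names `PATTERN` and `split_stripped` are hypothetical. Unlike slicing with `x[1:-1]`, this leaves an unbalanced token such as `"abc` untouched instead of trimming it to `ab`.

```python
import re

# One capture group per alternative: bracketed, quoted, bare word.
PATTERN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def split_stripped(text):
    """Split text, returning bracket/quote contents without delimiters."""
    tokens = []
    for match in PATTERN.finditer(text):
        # Exactly one group takes part in each match; lastindex names it.
        tokens.append(match.group(match.lastindex))
    return tokens


print(split_stripped('this is [bracket test] "and quotes test "'))
# ['this', 'is', 'bracket test', 'and quotes test ']
```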

乞讨 2024-07-14 11:22:22

Here's a simplistic parser (tested against your example input) that introduces the State design pattern.

In the real world, you would probably want to build a real parser using something like PLY.

class SimpleParser(object):

    def __init__(self):
        self.mode = None
        self.result = None

    def parse(self, text):
        self.initial_mode()
        self.result = []
        for word in text.split(' '):
            self.mode.handle_word(word)
        return self.result

    def initial_mode(self):
        self.mode = InitialMode(self)

    def bracket_mode(self):
        self.mode = BracketMode(self)

    def quote_mode(self):
        self.mode = QuoteMode(self)


class InitialMode(object):

    def __init__(self, parser):
        self.parser = parser

    def handle_word(self, word):
        if word.startswith('['):
            self.parser.bracket_mode()
            self.parser.mode.handle_word(word[1:])
        elif word.startswith('"'):
            self.parser.quote_mode()
            self.parser.mode.handle_word(word[1:])
        else:
            self.parser.result.append(word)


class BlockMode(object):

    end_marker = None

    def __init__(self, parser):
        self.parser = parser
        self.result = []

    def handle_word(self, word):
        if word.endswith(self.end_marker):
            self.result.append(word[:-1])
            self.parser.result.append(' '.join(self.result))
            self.parser.initial_mode()
        else:
            self.result.append(word)

class BracketMode(BlockMode):
    end_marker = ']'

class QuoteMode(BlockMode):
    end_marker = '"'

花辞树 2024-07-14 11:22:22

Here's a more procedural approach:

#!/usr/bin/env python

a = 'this is [bracket test] "and quotes test "'

words = a.split()
wordlist = []

while True:
    try:
        word = words.pop(0)
    except IndexError:
        break
    if word[0] in '"[':
        buildlist = [word[1:]]
        while True:
            try:
                word = words.pop(0)
            except IndexError:
                break
            if word[-1] in '"]':
                buildlist.append(word[:-1])
                break
            buildlist.append(word)
        wordlist.append(' '.join(buildlist))
    else:
        wordlist.append(word)

print(wordlist)

笑咖 2024-07-14 11:22:22

Well, I've encountered this problem quite a few times, which led me to write my own system for parsing any kind of syntax.

The result of this can be found here; note that this may be overkill, and it will provide you with something that lets you parse statements with both brackets and parentheses, single and double quotes, as nested as you want. For example, you could parse something like this (example written in Common Lisp):

(defun hello_world (&optional (text "Hello, World!"))
    (format t text))

You can use nesting, brackets (square) and parentheses (round), single- and double-quoted strings, and it's very extensible.

The idea is basically a configurable implementation of a finite state machine that builds up an abstract syntax tree character by character. I recommend you look at the source code (see link above) to get an idea of how to do it. It can be done with regular expressions, but try writing a system using REs and then extending it (or even understanding it) later.
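
To give a feel for the character-by-character idea, here is a minimal sketch. It is not the code from the linked project; the function name `parse_nested` and its exact rules are assumptions made for illustration. Brackets and parentheses open nested lists, a double-quoted run becomes one string, and whitespace separates words. Note that bracketed text comes back as a sub-list, which differs from the flat result the question asks for.

```python
def parse_nested(text):
    """Parse text into nested lists, one character at a time."""
    closers = {'[': ']', '(': ')'}
    root = []
    stack = [(root, None)]  # (list being filled, closing char it expects)
    word = []               # characters of the token being built
    in_quote = False

    def flush():
        # Emit the pending word, if any, into the innermost list.
        if word:
            stack[-1][0].append(''.join(word))
            del word[:]

    for ch in text:
        if in_quote:
            if ch == '"':
                # A closing quote always emits a token, even an empty one.
                stack[-1][0].append(''.join(word))
                del word[:]
                in_quote = False
            else:
                word.append(ch)
        elif ch == '"':
            flush()
            in_quote = True
        elif ch in closers:
            flush()
            child = []
            stack[-1][0].append(child)
            stack.append((child, closers[ch]))
        elif ch in ')]':
            flush()
            if stack[-1][1] != ch:
                raise ValueError('unexpected %r' % ch)
            stack.pop()
        elif ch.isspace():
            flush()
        else:
            word.append(ch)

    if in_quote or len(stack) > 1:
        raise ValueError('unterminated quote or bracket')
    flush()
    return root


print(parse_nested('this is [bracket test] "and quotes test "'))
# ['this', 'is', ['bracket', 'test'], 'and quotes test ']
```

Because the open delimiters live on a stack, nesting depth is unlimited and a mismatched or missing closer raises `ValueError` instead of silently producing odd tokens.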

好久不见√ 2024-07-14 11:22:22

Works for quotes only.

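s = 'this is [bracket test] "and quotes test "'
rrr = []
qqq = s.split('"')
for i, chunk in enumerate(qqq):
    # Odd-numbered chunks were inside quotes: keep them whole.
    # Even-numbered chunks are outside quotes: split on whitespace.
    rrr.extend([chunk] if i % 2 else chunk.split())
print(rrr)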
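
For the quotes-only case, the standard library's `shlex.split` may already be enough. The sketch below wraps it in a hypothetical helper, `split_quoted`. It follows shell-style rules, so single quotes and backslash escapes are also interpreted, a lone apostrophe raises `ValueError`, and square brackets get no special treatment.

```python
import shlex


def split_quoted(text):
    """Split on whitespace, keeping quoted runs together (shell rules)."""
    return shlex.split(text)


print(split_quoted('this is "and quotes test " done'))
# ['this', 'is', 'and quotes test ', 'done']
```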