How do I split a string into a list of words?
How do I split a sentence and store each word in a list? e.g.
"these are words" ⟶ ["these", "are", "words"]
To split on other delimiters, see Split a string by a delimiter in python.
To split into individual characters, see How do I split a string into a list of characters?.
Comments (10)
Given a string `sentence`, calling `sentence.split()` stores each word in a list, here called `words`.
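As a minimal sketch of the answer above, using the example string from the question (the variable names follow the answer's wording):

```python
# The sentence from the question.
sentence = "these are words"

# With no arguments, str.split() splits on whitespace.
words = sentence.split()

print(words)  # ['these', 'are', 'words']
```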
To split the string `text` on any consecutive runs of whitespace, call `text.split()` with no arguments.
To split the string `text` on a custom delimiter such as `","`, pass the delimiter as the argument: `text.split(",")`.
In both cases the result (called `words` in this answer) will be a `list` and contain the words from `text` split on the delimiter.
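The difference between the two calls is easiest to see side by side. This is a small illustration with made-up input strings; note that splitting on an explicit delimiter keeps empty strings between adjacent delimiters, while the no-argument form collapses runs of whitespace.

```python
# Runs of spaces, tabs and newlines count as a single separator.
text = "these   are\twords\n"
words = text.split()

# An explicit delimiter is matched literally, so ",," yields an empty string.
csv_text = "these,are,,words"
fields = csv_text.split(",")

print(words)   # ['these', 'are', 'words']
print(fields)  # ['these', 'are', '', 'words']
```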
Use `str.split()`.
Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem, with the added benefit of splitting out punctuation into separate tokens.
This allows you to filter out any punctuation you don't want and use only words.
Please note that the other solutions using `str.split()` are better if you don't plan on doing any complex manipulation of the sentence.
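The idea of tokenizing first and filtering punctuation afterwards can be sketched without installing anything. The block below is a standard-library stand-in, not NLTK's tokenizer: the regular expression and the helper names `tokenize` and `words_only` are assumptions made for illustration, and a real tokenizer will split some inputs differently (contractions in particular).

```python
import re

# A word (optionally with internal apostrophes) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(sentence):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN_PATTERN.findall(sentence)


def words_only(tokens):
    """Keep only tokens that contain at least one letter or digit."""
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


sentence = "The fox's foot grazed the sleeping dog, waking it."
tokens = tokenize(sentence)
words = words_only(tokens)

print(tokens)
print(words)
```

Here `tokens` contains `','` and `'.'` as their own entries, and `words` is the same list with those entries dropped.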
How about this algorithm? Split the text on whitespace, then trim punctuation. This carefully removes punctuation from the edges of words without harming apostrophes inside words such as `we're`.
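One way this algorithm might look, assuming that "punctuation" means the ASCII characters in `string.punctuation`; the function name `split_and_trim` is made up for this sketch:

```python
import string


def split_and_trim(text):
    """Split on whitespace, then strip punctuation from both ends of each token."""
    trimmed = (token.strip(string.punctuation) for token in text.split())
    # Tokens made only of punctuation (e.g. "--") become empty and are dropped.
    return [word for word in trimmed if word]


result = split_and_trim("Hi, we're here -- aren't we?")
print(result)  # ['Hi', "we're", 'here', "aren't", 'we']
```

Because `str.strip()` only touches the ends of a string, apostrophes in the middle of a word are left alone.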
The `str.split()` method does this: it takes a string and splits it into a list.
If you want all the chars of a word/sentence in a list, do this:
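For the character-level case, the built-in `list()` constructor is one straightforward option, since a string is iterable character by character. A short sketch with made-up inputs:

```python
word = "word"
chars = list(word)

# Spaces are characters too, so they show up in the result.
sentence_chars = list("a b")

print(chars)           # ['w', 'o', 'r', 'd']
print(sentence_chars)  # ['a', ' ', 'b']
```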
`shlex` has a `.split()` function. It differs from `str.split()` in that it does not preserve quotes and treats a quoted phrase as a single word.
NB: it works well for Unix-like command-line strings. It doesn't work for natural-language processing.
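A short comparison may make both points concrete. The command string below is made up for illustration; the second half shows why ordinary prose is a poor fit, because an apostrophe is read as an unclosed quote.

```python
import shlex

command = 'grep "two words" notes.txt'

shlex_tokens = shlex.split(command)  # quotes removed, phrase kept together
plain_tokens = command.split()       # quotes kept, phrase broken apart

print(shlex_tokens)  # ['grep', 'two words', 'notes.txt']
print(plain_tokens)  # ['grep', '"two', 'words"', 'notes.txt']

# Natural-language text with an apostrophe looks like an unterminated quote.
try:
    shlex.split("we're here")
    apostrophe_ok = True
except ValueError:
    apostrophe_ok = False

print(apostrophe_ok)  # False
```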
If you want to split a string into a list of words and the string contains punctuation, it's probably advisable to remove it. For example, `str.split()` on a punctuated string leaves tokens such as `Hi,`, `words;` and `also,` with punctuation attached to them. Python has a built-in `string` module that has a string of punctuation characters as an attribute (`string.punctuation`). One way to get rid of the punctuation is to simply strip it from each word. Another is to build a comprehensive mapping of the characters to remove.
Removing punctuation wholesale doesn't handle words like `these're`, so to handle that case `nltk.word_tokenize` could be used, as tgray suggested. Just filter out the tokens that consist entirely of punctuation.
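The two removal strategies can be compared on one input. This is a sketch under assumptions: the sample string is invented to contain the tokens mentioned above, and `str.maketrans` with `str.translate` is one possible way to build and apply a mapping of characters to delete, not necessarily the one the answer originally used.

```python
import string

text = "Hi, these are words; these're, also, words."

# Plain split: punctuation stays attached to the words.
attached = text.split()

# Strategy 1: strip punctuation from the edges of each word.
stripped = [word.strip(string.punctuation) for word in text.split()]

# Strategy 2: delete every punctuation character before splitting.
delete_table = str.maketrans("", "", string.punctuation)
deleted = text.translate(delete_table).split()

print(attached)  # ['Hi,', 'these', 'are', 'words;', "these're,", 'also,', 'words.']
print(stripped)  # ['Hi', 'these', 'are', 'words', "these're", 'also', 'words']
print(deleted)   # ['Hi', 'these', 'are', 'words', 'thesere', 'also', 'words']
```

The last line shows the cost of deleting everything: the apostrophe inside `these're` is lost along with the commas.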
Split the words without harming apostrophes inside words.
The example inputs are `input_1` and `input_2` (Moore's law).
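One way to do this is with a regular expression that allows apostrophes only in the interior of a word. The pattern, the function name `split_into_words`, and the two strings assigned to `input_1` and `input_2` are all assumptions for this sketch; they illustrate the idea and are not taken from the answer.

```python
import re

# A word starts and ends with a word character; apostrophes may appear inside.
# The second alternative covers single-character words such as "a".
WORD_PATTERN = re.compile(r"\w[\w']*\w|\w")


def split_into_words(line):
    """Return the words in line, keeping apostrophes that sit inside a word."""
    return WORD_PATTERN.findall(line)


# Hypothetical inputs on the Moore's law theme.
input_1 = "computational power (see Moore's law) and"
input_2 = "'Moore's law' isn't a law of physics."

output_1 = split_into_words(input_1)
output_2 = split_into_words(input_2)

print(output_1)  # ['computational', 'power', 'see', "Moore's", 'law', 'and']
print(output_2)  # ["Moore's", 'law', "isn't", 'a', 'law', 'of', 'physics']
```

Quotation marks at the edges of a word are dropped because a match cannot begin or end with an apostrophe.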