Bigrams of adjacent sentences in Python
Let's say I have three sentences:
hello world
hello python
today is tuesday
If I generate the bigrams of each string, I get something like this:
[('hello', 'world')]
[('hello', 'python')]
[('today', 'is'), ('is', 'tuesday')]
Is there a difference between the bigrams of a single sentence and the bigrams of two consecutive sentences? For example, "hello world. hello python" is two consecutive sentences. Will the bigrams for these two consecutive sentences look like my output?
The code that produced it:
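    from itertools import tee

    def bigrams(iterable):
        # Pair each item with its successor: (s0, s1), (s1, s2), ...
        a, b = tee(iterable)
        next(b, None)  # advance the second iterator by one item
        return zip(a, b)  # Python 3; the original Python 2 code used itertools.izip

    with open("hello.txt", 'r') as f:
        for line in f:
            # One sentence per line, split on whitespace
            words = line.strip().split()
            bi = bigrams(words)
            print(list(bi))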
Comments (1)
It depends on what you want. If you define the items of your bigrams to be whole sentences, it would look like this:
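The example output for this case does not appear on the page, so here is a minimal sketch of what sentence-level bigrams could look like. It assumes the three sentences from the question are "hello world", "hello python" and "today is tuesday"; the helper `bigrams` and the name `sentence_bigrams` are illustrative, not taken from the original answer.

```python
def bigrams(items):
    """Pair each item with the one that follows it."""
    items = list(items)
    return list(zip(items, items[1:]))

# Assumed input: the three sentences from the question, one item each.
sentences = ["hello world", "hello python", "today is tuesday"]

# Each bigram item is a whole sentence.
sentence_bigrams = bigrams(sentences)
print(sentence_bigrams)
# [('hello world', 'hello python'), ('hello python', 'today is tuesday')]
```

Here three sentences give two bigrams, one for each pair of adjacent sentences.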
If you want bigrams whose items are words, computed across all sentences, it would look like this:
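The example output for this case is also missing from the page. The sketch below shows one way to get word-level bigrams over all sentences, again assuming the three sentences from the question; `bigrams`, `words` and `word_bigrams` are hypothetical names chosen for illustration.

```python
def bigrams(items):
    """Pair each item with the one that follows it."""
    items = list(items)
    return list(zip(items, items[1:]))

# Assumed input: the three sentences from the question.
sentences = ["hello world", "hello python", "today is tuesday"]

# Flatten all sentences into a single word sequence before pairing,
# so that pairs can span a sentence boundary.
words = [word for sentence in sentences for word in sentence.split()]
word_bigrams = bigrams(words)
print(word_bigrams)
# [('hello', 'world'), ('world', 'hello'), ('hello', 'python'),
#  ('python', 'today'), ('today', 'is'), ('is', 'tuesday')]
```

Unlike the per-line output in the question, this version contains pairs such as ('world', 'hello') that cross a sentence boundary. Processing each line separately, as the question's code does, never produces those pairs.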