Bigrams of adjacent sentences in Python

Posted on 2024-12-20 01:14:47

Let's say I have three sentences:

  1. hello world

  2. this is python

  3. today is tuesday

If I generate the bigrams of each string separately, I get something like this:

[('hello', 'world')]
[('this', 'is'), ('is', 'python')]
[('today', 'is'), ('is', 'tuesday')]

Is there a difference between the bigrams of a single sentence and the bigrams of two consecutive sentences? For example, "hello world. this is python" is two consecutive sentences. Would the bigrams for these two consecutive sentences look like my output?

The code that produced it:

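from itertools import tee

def bigrams(iterable):
    # Pair each item with its successor: (w0, w1), (w1, w2), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    # zip is lazy in Python 3 and replaces Python 2's itertools.izip
    return zip(a, b)

with open("hello.txt", "r") as f:
    for line in f:
        words = line.strip().split()
        # One list per line, so no bigram spans two sentences
        print(list(bigrams(words)))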

Comments (1)

拥醉 2024-12-27 01:14:47

But if I want to generate bigrams for the adjacent sentences, will it give the same result as the above output? If not, what would the output look like?

It depends on what you want. If you define the items of your bigrams to be whole sentences, it would look like this:

[('hello world', 'this is python'),('this is python', 'today is tuesday')]
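
A minimal sketch of this sentence-level case, assuming the sentences are already available as a list of strings. The `sentences` list is illustrative, and the `bigrams` helper mirrors the one in the question, ported to Python 3:

```python
from itertools import tee

def bigrams(iterable):
    """Yield consecutive pairs (x0, x1), (x1, x2), ... from any iterable."""
    a, b = tee(iterable)
    next(b, None)  # shift the second iterator forward by one item
    return zip(a, b)

# Assumed input: one string per sentence, as in the question's example.
sentences = ["hello world", "this is python", "today is tuesday"]

# Each item is a whole sentence, so each bigram is a pair of sentences.
sentence_bigrams = list(bigrams(sentences))
print(sentence_bigrams)
```

Because the helper works on any iterable, the only thing that changes here is what counts as an item: whole sentences rather than words.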

If you want bigrams whose items are words, taken across all sentences, it would look like this:

[('hello', 'world'), ('world', 'this'), ('this', 'is'),...]
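
One way to get this word-level result is to flatten all sentences into a single word stream before pairing. The sketch below assumes the same three example sentences held in a hypothetical `sentences` list; `chain.from_iterable` joins the per-sentence word lists, so pairs such as ('world', 'this') cross the sentence boundary:

```python
from itertools import chain, tee

def bigrams(iterable):
    """Yield consecutive pairs (x0, x1), (x1, x2), ... from any iterable."""
    a, b = tee(iterable)
    next(b, None)  # shift the second iterator forward by one item
    return zip(a, b)

# Assumed input: one string per sentence, as in the question's example.
sentences = ["hello world", "this is python", "today is tuesday"]

# Split each sentence into words, then chain them into one stream
# so that bigrams are formed across sentence boundaries too.
all_words = chain.from_iterable(s.split() for s in sentences)
word_bigrams = list(bigrams(all_words))
print(word_bigrams)
```

When reading from a file as in the question, the same idea should apply by chaining `line.split()` over the file's lines instead of building one list per line.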