Operating on nested tuples/strings in Python

Posted on 2024-10-04 13:25:15

Say I have a tagged text of (word, tag) pairs in tuple format. I want to convert it to a string in order to make some changes to the tags. My function below only sees the last sentence in the text. I guess there is some obvious mistake that I can't spot, so please help me make it work on the entire text.

>>> import nltk
>>> tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')], [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]

def translate(tuple2string):
    for sent in tpl:
        t = ' '.join([nltk.tag.tuple2str(item) for item in sent])

>>> print t
    'And/CNJ This/V is/V another/DET one/NUM'

P.S. For those who are interested, the tuple2str function is described here

EDIT: Now I need to convert it back to tuples in the same format. How do I do it?

>>> [nltk.tag.str2tuple(item) for item in t.split()]

The line above converts it into one flat list of tuples, but I need a nested one (the same as the input, tpl).

EDIT2: Well, it's probably worth posting the entire code:

def translate(tpl):
    t0 = [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl] 
    for t in t0: 
        t = re.sub(r'/NUM', '/N', t) 
        t = [nltk.tag.str2tuple(item) for item in t.split()] 
    print t

就是爱搞怪 2024-10-11 13:25:15

>>> ' '.join(' '.join(nltk.tag.tuple2str(item) for item in sent) for sent in tpl)
'This/V is/V one/NUM sentence/NN ./. And/CNJ This/V is/V another/DET one/NUM'
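For what it's worth, the loop in the question only shows the last sentence because `t` is rebound on every pass, so each sentence overwrites the one before it. A minimal sketch of the same one-liner written as a loop that keeps every sentence. The names `tuple2str_sketch` and `join_all` are hypothetical, and the helper is a simplified plain-Python stand-in for `nltk.tag.tuple2str` so the snippet runs without NLTK installed:

```python
def tuple2str_sketch(tagged_token, sep='/'):
    # Hypothetical, simplified stand-in for nltk.tag.tuple2str:
    # ('one', 'NUM') -> 'one/NUM'
    word, tag = tagged_token
    return word if tag is None else word + sep + tag


def join_all(tagged_sents):
    # Append each sentence's string to a list instead of
    # overwriting a single variable on every iteration.
    pieces = []
    for sent in tagged_sents:
        pieces.append(' '.join(tuple2str_sketch(item) for item in sent))
    return ' '.join(pieces)


tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')],
       [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]
print(join_all(tpl))
```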

EDIT:

If you want this to be reversible then just don't do the outer join.

>>> [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl]
['This/V is/V one/NUM sentence/NN ./.', 'And/CNJ This/V is/V another/DET one/NUM']
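To illustrate why skipping the outer join matters, here is a rough round-trip check. Both helpers are hypothetical, simplified stand-ins for `nltk.tag.tuple2str` and `nltk.tag.str2tuple` (assumed here to join and split on the last `/`), written in plain Python so the snippet runs on its own:

```python
def tuple2str_sketch(tagged_token, sep='/'):
    # Simplified stand-in for nltk.tag.tuple2str
    word, tag = tagged_token
    return word if tag is None else word + sep + tag


def str2tuple_sketch(s, sep='/'):
    # Simplified stand-in for nltk.tag.str2tuple:
    # split on the LAST separator so './.' becomes ('.', '.')
    loc = s.rfind(sep)
    if loc >= 0:
        return (s[:loc], s[loc + len(sep):])
    return (s, None)


tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')],
       [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]

# One string per sentence keeps the sentence boundaries.
per_sentence = [' '.join(tuple2str_sketch(item) for item in sent) for sent in tpl]
nested = [[str2tuple_sketch(tok) for tok in s.split()] for s in per_sentence]

# A single joined string loses them: splitting gives one flat list.
flat = [str2tuple_sketch(tok) for tok in ' '.join(per_sentence).split()]

print(nested == tpl)  # the nested structure survives the round trip
print(len(flat))      # all tokens end up in a single list
```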

EDIT 2:

I thought we went over this already...

>>> [[nltk.tag.str2tuple(re.sub('/NUM', '/N', w)) for w in s.split()] for s in t0]
[[('This', 'V'), ('is', 'V'), ('one', 'N'), ('sentence', 'NN'), ('.', '.')],
  [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'N')]]
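One caveat with the substitution above: `'/NUM'` is not anchored, so it also matches inside a longer tag. A small sketch of an anchored variant, assuming each token is a single `word/TAG` string; `retag` is a hypothetical helper name, not part of NLTK:

```python
import re


def retag(token, old='NUM', new='N', sep='/'):
    # Only rewrite the tag when it is exactly `old` at the end of the token.
    pattern = re.escape(sep + old) + r'$'
    return re.sub(pattern, sep + new, token)


print(retag('one/NUM'))                    # tag is exactly NUM, so it is rewritten
print(retag('x/NUMBER'))                   # longer tag is left alone
print(re.sub('/NUM', '/N', 'x/NUMBER'))    # unanchored pattern mangles it
```

With the tag set in the question this makes no difference, so treat it as a precaution for larger tag sets.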

Splitting it out into the non-list-comprehension form:

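import re
import nltk

def translate(tpl):
    result = []
    # One 'word/TAG word/TAG ...' string per sentence
    t0 = [' '.join([nltk.tag.tuple2str(item) for item in sent]) for sent in tpl]
    for t in t0:
        t = re.sub(r'/NUM', '/N', t)  # rewrite the tag
        t = [nltk.tag.str2tuple(item) for item in t.split()]  # back to (word, tag) tuples
        result.append(t)  # keep every sentence, not just the last one
    return result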
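If the only goal is to rename a tag, the string round trip may not be needed at all. A possible alternative, sketched under the assumption that every token is a plain `(word, tag)` pair; `retag_sentences` is a hypothetical name:

```python
def retag_sentences(tagged_sents, old='NUM', new='N'):
    # Rebuild each (word, tag) pair, swapping the tag only on an exact match.
    return [[(word, new if tag == old else tag) for word, tag in sent]
            for sent in tagged_sents]


tpl = [[('This', 'V'), ('is', 'V'), ('one', 'NUM'), ('sentence', 'NN'), ('.', '.')],
       [('And', 'CNJ'), ('This', 'V'), ('is', 'V'), ('another', 'DET'), ('one', 'NUM')]]
print(retag_sentences(tpl))
```

This keeps the nested list structure as it is and leaves the input untouched, since the tuples are rebuilt rather than modified.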
