Creating a list of lists of tuples by reading a txt file

Posted on 2025-02-06 01:33:20

I have a txt file that looks like this:

   EU NNP B-NP B-ORG
    rejects VBZ B-VP O
    German JJ B-NP B-MISC
    call NN I-NP O
    to TO B-VP O
    boycott VB I-VP O
    British JJ B-NP B-MISC
    lamb NN I-NP O
    . . O O
    
    Peter NNP B-NP B-PER
    Blackburn NNP I-NP I-PER

    BRUSSELS NNP B-NP B-LOC
    1996-08-22 CD I-NP O

I'm trying to build tuples from this file, which I will later convert from words to features. I want a list of lists that looks like this:

[(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....
 (Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER),
 (BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)]

Each blank line indicates that a sentence is over and should be added to the list at the current index. After a blank line we should move on to the next index of the list, until all sentences have been added.

# Read the data and return a list of tuples; each tuple represents a token
# and contains the word, POS tag, chunk tag, and NER tag.
import csv

def read_data(filename) -> list:
    data = []
    sentences = []
    with open(filename) as load_file:
        reader = csv.reader(load_file, delimiter=" ")  # split each line on spaces

        for row in reader:
            if len(row) != 0:
                data.append(tuple(row))

    sentences.append(data)
    return sentences

I have a function like this; however, it returns this:

('EU', 'NNP', 'B-NP', 'B-ORG'),
  ('rejects', 'VBZ', 'B-VP', 'O'),
  ('German', 'JJ', 'B-NP', 'B-MISC'),
  ('call', 'NN', 'I-NP', 'O'),
  ('to', 'TO', 'B-VP', 'O'),
  ('boycott', 'VB', 'I-VP', 'O'),
  ('British', 'JJ', 'B-NP', 'B-MISC'),
  ('lamb', 'NN', 'I-NP', 'O'),
  ('.', '.', 'O', 'O'),
  ('Peter', 'NNP', 'B-NP', 'B-PER'),
  ('Blackburn', 'NNP', 'I-NP', 'I-PER'),
  ('BRUSSELS', 'NNP', 'B-NP', 'B-LOC'),
  ('1996-08-22', 'CD', 'I-NP', 'O'),

How can I solve this problem? I tried using two different lists to add them together, but I could not find a way.

Comments (1)

冷…雨湿花 2025-02-13 01:33:20

I think the whole problem is that you show this expected result:

[(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....
 (Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER),
 (BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)]

but I think you actually expect a separate inner list for each sentence:

[
 [(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....],
 [(Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER)],
 [(BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)],
]

For this, the loop needs to start a new list whenever it reads an empty row:

    for row in reader:
        if row:
            data.append(tuple(row))
        else:
            sentences.append(data)
            data = []

After the loop, you may also need to add the last data, because there is no empty line after it:

    if data:
        sentences.append(data)
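
As an alternative to the trailing check above, the grouping can be sketched with itertools.groupby, which splits the lines into runs of token lines and runs of blank lines. This is only a sketch: the helper name group_sentences is hypothetical, the sample text is shortened, and it assumes fields are separated by whitespace and never contain spaces themselves.

```python
import io
from itertools import groupby

# Shortened sample in the same format as the question's file.
text = """EU NNP B-NP B-ORG
rejects VBZ B-VP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER

BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O"""


def group_sentences(lines):
    """Group token lines into sentences, using blank lines as separators.

    Hypothetical helper: `lines` is any iterable of strings, e.g. an open file.
    """
    sentences = []
    # The key is True for lines that contain tokens and False for blank lines,
    # so groupby yields alternating runs of token lines and blank lines.
    for has_tokens, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_tokens:
            sentences.append([tuple(line.split()) for line in group])
    return sentences


sentences = group_sentences(io.StringIO(text))
print(sentences)
```

Because blank runs are simply skipped, several consecutive blank lines do not produce empty sentences, and no extra step is needed for the last sentence.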

Full working example.

I use io only to simulate a file in memory so that everyone can copy and run it. With a real file you should use open(filename) instead and drop the text variable.

text = '''EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER

BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O'''

import csv
import io

data = []
sentences = []

# with open(filename) as load_file:
with io.StringIO(text) as load_file:
    reader = csv.reader(load_file, delimiter=" ")  # split each line on spaces

    for row in reader:
        if row:
            data.append(tuple(row))
        else:
            sentences.append(data)
            data = []

    # add the last data because there is no empty line after it
    if data:
        sentences.append(data)

print(sentences)
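
For a real file, the same loop can be wrapped back into the question's read_data function. The sketch below makes one substitution that is my assumption, not part of the answer: it uses str.split() instead of csv.reader. With delimiter=" ", csv.reader turns leading spaces into empty fields and treats the double-quote character as a quote, so indented lines, blank lines that contain only spaces, or a literal " token may not parse as intended. The temporary file and its contents exist only to make the example runnable.

```python
import os
import tempfile


def read_data(filename) -> list:
    """Return a list of sentences; each sentence is a list of token tuples."""
    sentences = []
    data = []
    with open(filename, encoding="utf-8") as load_file:
        for line in load_file:
            fields = line.split()  # split on any whitespace, ignoring indentation
            if fields:
                data.append(tuple(fields))
            elif data:  # blank line: close the current sentence if it has tokens
                sentences.append(data)
                data = []
    if data:  # last sentence when the file does not end with a blank line
        sentences.append(data)
    return sentences


# Demo data written to a temporary file (made up for this example).
sample = 'EU NNP B-NP B-ORG\nrejects VBZ B-VP O\n\n" " O O\n'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

result = read_data(path)
os.remove(path)
print(result)
```

The elif data check is a small design choice: it keeps repeated blank lines from appending empty sentences, which the plain else branch would do.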