Creating a list of lists of tuples by reading a txt file
I have a txt file that looks like this:
EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O
Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O
I am trying to make tuples from this txt file, which I will later convert from words to features. I want to have a list of lists that looks like this:
[(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....
(Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER),
(BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)
Each blank line indicates that a sentence is over and should be added to the list at the current index. After the blank line, we should move on to the next index of the list, until all sentences have been added.
# function to read data; returns a list of tuples, each tuple represents a token and contains word, POS tag, chunk tag, and NER tag
import csv
def read_data(filename) -> list:
    data = []
    sentences = []
    with open(filename) as load_file:
        reader = csv.reader(load_file, delimiter=" ")  # read
        for row in reader:
            if(len(tuple(row)) != 0):
                data.append(tuple(row))
        sentences.append(data)
    return sentences
I have a function like this; however, it returns this:
('EU', 'NNP', 'B-NP', 'B-ORG'),
('rejects', 'VBZ', 'B-VP', 'O'),
('German', 'JJ', 'B-NP', 'B-MISC'),
('call', 'NN', 'I-NP', 'O'),
('to', 'TO', 'B-VP', 'O'),
('boycott', 'VB', 'I-VP', 'O'),
('British', 'JJ', 'B-NP', 'B-MISC'),
('lamb', 'NN', 'I-NP', 'O'),
('.', '.', 'O', 'O'),
('Peter', 'NNP', 'B-NP', 'B-PER'),
('Blackburn', 'NNP', 'I-NP', 'I-PER'),
('BRUSSELS', 'NNP', 'B-NP', 'B-LOC'),
('1996-08-22', 'CD', 'I-NP', 'O'),
How can I solve this problem? I tried using two different lists and adding them together, but I could not find a way.
Comments (1)
I think the whole problem is that the expected result you show is not quite what you actually expect. I think you expect one list of tuples per sentence, and this needs data to be added to sentences, and a new data list to be started, every time an empty line is found.
At the end you may also need to add the last data, because there is no empty line after these data.
In a full working example I use io only to simulate a file in memory, so everyone can copy and run it. In your own code you should use open() with the file name instead, without text.
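The steps described in this answer can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the sample in text is a shortened version of the data from the question with blank lines restored between sentences, the function name read_sentences is hypothetical, and quoting=csv.QUOTE_NONE is my own addition so that a quote character appearing as a token is read as ordinary text.

```python
import csv
import io

# Shortened sample standing in for the real file (assumption: sentences
# are separated by blank lines, as described in the question).
text = """EU NNP B-NP B-ORG
rejects VBZ B-VP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER

BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O
"""

def read_sentences(lines) -> list:
    """Group space-separated rows into one list of tuples per sentence.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    sentences = []
    data = []
    reader = csv.reader(lines, delimiter=" ", quoting=csv.QUOTE_NONE)
    for row in reader:
        if any(row):
            # Non-empty row: one token of the current sentence.
            data.append(tuple(row))
        elif data:
            # Blank line: the current sentence is finished, start a new one.
            sentences.append(data)
            data = []
    if data:
        # The last sentence has no blank line after it, so add it here.
        sentences.append(data)
    return sentences

# io.StringIO only simulates a file in memory.
sentences = read_sentences(io.StringIO(text))
for sentence in sentences:
    print(sentence)
```

With a real file, the same function could be called as read_sentences(load_file) inside with open(filename, newline="") as load_file:, and the text variable would no longer be needed.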