Parsing data from a non-CSV file into multiple lists in Python

Posted on 2025-01-26 19:57:17

I have a text file containing a really huge amount of data: the ratings that users gave to specific movies. The structure of my file (.txt) is as follows:

1:
1711859 ,4 ,2005 −05 −08
1245640 ,3 ,2005 −12 −19
2:
808731,4,2005−10−31
337541,5,2005−03−23

1 and 2 are movie IDs, each followed by a colon. Each line under a movie ID holds a user ID, the rating that user gave the movie, and the date of the rating.
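The layout described above can be handled one line at a time. This is only an illustrative sketch, assuming that header lines always end with a colon and that data lines always have exactly three comma-separated fields; `parse_line` is a hypothetical name. It also assumes the odd dashes in the sample dates are U+2212 minus signs or U+2013 en dashes (or plain hyphens) and normalizes them before building a `datetime.date`.

```python
from datetime import date


def parse_line(line):
    """Classify one line of the ratings file (hypothetical helper; format inferred from the sample).

    Returns ("movie", movie_id) for header lines like "1:",
    ("rating", (user_id, rating, date)) for data lines, or None for blank lines.
    """
    text = line.strip()
    if not text:
        return None
    if text.endswith(":"):
        return ("movie", int(text[:-1]))
    # Fields are comma-separated; the sample shows stray spaces around them.
    user_id, rating, raw_date = (field.strip() for field in text.split(","))
    # Assumption: dates may use U+2212 MINUS SIGN or U+2013 EN DASH plus spaces instead of "-".
    cleaned = raw_date.replace("\u2212", "-").replace("\u2013", "-").replace(" ", "")
    year, month, day = (int(part) for part in cleaned.split("-"))
    return ("rating", (int(user_id), int(rating), date(year, month, day)))


print(parse_line("1:"))  # ('movie', 1)
print(parse_line("1711859 ,4 ,2005 \u221205 \u221208"))  # ('rating', (1711859, 4, datetime.date(2005, 5, 8)))
```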

Since this is clearly not a CSV file, can someone please guide me on how to write a parser that reads this file and creates two lists: one for the movie IDs and the other containing the ratings?
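One possible approach skips any intermediate file and builds the two lists in a single pass. This is a minimal sketch, assuming "two lists" means parallel lists with one entry per rating line (so `movie_ids[i]` is the movie that `ratings[i]` belongs to) and that ratings are integers as in the sample; `parse_ratings` is a hypothetical name.

```python
import os
import tempfile


def parse_ratings(path):
    """Read the ratings file into two parallel lists (hypothetical helper, a sketch only)."""
    movie_ids = []
    ratings = []
    current_movie = None
    with open(path, encoding="utf-8") as f:
        # Iterate line by line instead of reading the whole file text at once.
        for line in f:
            text = line.strip()
            if not text:
                continue  # ignore blank lines
            if text.endswith(":"):
                # A header such as "1:" starts a new movie block.
                current_movie = int(text[:-1])
                continue
            fields = [field.strip() for field in text.split(",")]
            movie_ids.append(current_movie)
            ratings.append(int(fields[1]))  # second field is the rating
    return movie_ids, ratings


# Usage with the sample data written to a temporary file.
sample = (
    "1:\n"
    "1711859 ,4 ,2005 \u221205 \u221208\n"
    "1245640 ,3 ,2005 \u221212 \u221219\n"
    "2:\n"
    "808731,4,2005\u221210\u221231\n"
    "337541,5,2005\u221203\u221223\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample)
movie_ids, ratings = parse_ratings(tmp.name)
os.remove(tmp.name)
print(movie_ids)  # [1, 1, 2, 2]
print(ratings)    # [4, 3, 4, 5]
```

Because the file is consumed line by line, the raw text is never held in memory all at once; only the two result lists grow. If user IDs or dates are also needed, `fields[0]` and `fields[2]` can be appended to further lists in the same loop.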

镜花水月 2025-02-02 19:57:17

Since this is clearly not a csv file

Right, but it can be converted into a CSV file. Let the content of file.txt be

1:
1711859 ,4 ,2005 −05 −08
1245640 ,3 ,2005 −12 −19
2:
808731,4,2005−10−31
337541,5,2005−03−23

then

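with open("file.txt", "r", encoding="utf-8") as infile, open("file.csv", "w", encoding="utf-8") as outfile:
    movieid = None  # set by each "N:" header line
    for line in infile:
        if not line.strip():
            continue  # skip blank lines so they are not mistaken for a header
        if "," not in line:
            # header line such as "1:": keep what is before the colon
            movieid = line.split(":")[0].strip()
        else:
            # data line: prefix it with the current movie ID (line keeps its own newline)
            print(movieid, line, sep=",", end="", file=outfile)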

will create file.csv

1,1711859 ,4 ,2005 −05 −08
1,1245640 ,3 ,2005 −12 −19
2,808731,4,2005−10−31
2,337541,5,2005−03−23

This file can then be fed into a CSV parser. Explanation: if the current line does not contain a comma, take what is before the colon as movieid; otherwise print movieid followed by the line, separated by a comma. Note that I set end to an empty string because line already has its own newline. Disclaimer: I assume your file is UTF-8 encoded.
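Once file.csv exists, Python's standard `csv` module can load it into the two lists the question asks for. A minimal sketch, assuming the column order shown above (movie ID, user ID, rating, date) and integer ratings; `read_converted_csv` is a hypothetical name. `int()` already ignores leading and trailing whitespace, so the stray spaces in the sample need no extra handling for the numeric columns.

```python
import csv
import os
import tempfile


def read_converted_csv(path):
    """Load the converted file.csv into two lists (hypothetical helper, a sketch only)."""
    movie_ids = []
    ratings = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip empty rows
            # Columns: movie ID, user ID, rating, date; int() tolerates "4 " style padding.
            movie_ids.append(int(row[0]))
            ratings.append(int(row[2]))
    return movie_ids, ratings


# Usage: same content as the file.csv above, with U+2212 assumed for the date dashes.
converted = (
    "1,1711859 ,4 ,2005 \u221205 \u221208\n"
    "1,1245640 ,3 ,2005 \u221212 \u221219\n"
    "2,808731,4,2005\u221210\u221231\n"
    "2,337541,5,2005\u221203\u221223\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="utf-8", delete=False) as tmp:
    tmp.write(converted)
movie_ids, ratings = read_converted_csv(tmp.name)
os.remove(tmp.name)
print(movie_ids)  # [1, 1, 2, 2]
print(ratings)    # [4, 3, 4, 5]
```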
