Parsing data from a non-CSV file into multiple lists in Python

Posted on 2025-01-26 19:57:17

I have a text file containing a very large amount of data: the ratings that users gave to specific movies. The structure of my file (.txt) is as follows:

```text
1:
1711859,4,2005-05-08
1245640,3,2005-12-19
2:
808731,4,2005-10-31
337541,5,2005-03-23
```

1 and 2 are movie IDs, each followed by a colon. Each line after a movie ID holds a user ID, the rating that user gave the movie, and then the date of the rating.
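To make the layout concrete, each rating line splits on commas into three fields. The sketch below parses one such line into typed values. The function name `parse_rating_line` is hypothetical, and it assumes the date is written as YYYY-MM-DD (as in the sample) and that stray spaces around fields should be ignored.

```python
from datetime import date


def parse_rating_line(line: str) -> tuple[int, int, date]:
    """Split one 'user_id,rating,date' line into typed fields.

    Assumption: the date uses the YYYY-MM-DD form; whitespace around
    each field (including the trailing newline) is stripped.
    """
    user_id, rating, day = (field.strip() for field in line.split(","))
    return int(user_id), int(rating), date.fromisoformat(day)


print(parse_rating_line("1711859,4,2005-05-08"))
# (1711859, 4, datetime.date(2005, 5, 8))
```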

Since this is clearly not a CSV file, can someone please guide me on how to write a parser that reads this file and creates two lists: one for the movie IDs and the other containing the ratings?
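One way to build the two lists directly, without an intermediate CSV file, is a single pass that remembers the most recent movie ID header. This is a minimal sketch, not a definitive implementation: `read_movie_ratings` is a hypothetical name, and it assumes header lines look like `1:` and rating lines look like `user_id,rating,date`. It accepts any iterable of lines, so it works on an open file object as well as on a list of strings.

```python
from collections.abc import Iterable


def read_movie_ratings(lines: Iterable[str]) -> tuple[list[int], list[int]]:
    """Return parallel lists: movie_ids[i] is the movie that received ratings[i]."""
    movie_ids: list[int] = []
    ratings: list[int] = []
    current_movie = None  # ID taken from the most recent "N:" header line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith(":"):
            # Header line such as "1:": remember the movie ID
            current_movie = int(line[:-1])
        else:
            if current_movie is None:
                raise ValueError(f"rating line before any movie ID: {line!r}")
            # Rating line "user_id,rating,date": the rating is the 2nd field
            rating = int(line.split(",")[1])
            movie_ids.append(current_movie)
            ratings.append(rating)
    return movie_ids, ratings


sample = [
    "1:\n",
    "1711859,4,2005-05-08\n",
    "1245640,3,2005-12-19\n",
    "2:\n",
    "808731,4,2005-10-31\n",
    "337541,5,2005-03-23\n",
]
print(read_movie_ratings(sample))  # ([1, 1, 2, 2], [4, 3, 4, 5])
```

For the real file, pass the open file object instead of the sample list, e.g. `with open("file.txt", encoding="utf-8") as f: movie_ids, ratings = read_movie_ratings(f)`. Iterating the file line by line avoids loading the whole text into memory at once, although the two resulting lists still hold one entry per rating.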

Comments (1)

镜花水月 2025-02-02 19:57:17

> Since this is clearly not a csv file

Right, but it can be converted into a CSV file. Suppose file.txt contains:

```text
1:
1711859,4,2005-05-08
1245640,3,2005-12-19
2:
808731,4,2005-10-31
337541,5,2005-03-23
```

Running the following script

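```python
with open("file.txt", "r", encoding="utf-8") as infile, \
        open("file.csv", "w", encoding="utf-8") as outfile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines so they are not mistaken for headers
        if "," not in line:
            # Header line such as "1:": keep the part before the colon
            movieid = line.split(":")[0]
        else:
            # Rating line: prefix it with the current movie ID
            print(movieid, line, sep=",", end="", file=outfile)
```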

creates file.csv:

```text
1,1711859,4,2005-05-08
1,1245640,3,2005-12-19
2,808731,4,2005-10-31
2,337541,5,2005-03-23
```

This can then be fed into a CSV parser. Explanation: if the current line does not contain a comma, take what is before the colon as movieid; otherwise, print movieid followed by the line, separated by a comma. Note that I set end to an empty string because line already has its own newline. Disclaimer: I assume your file is UTF-8 encoded.
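For the final step, the standard library's `csv` module can read the converted rows back into the two lists the question asks for. A minimal sketch, assuming file.csv has exactly four columns (movie ID, user ID, rating, date) and no header row; `lists_from_csv` is a hypothetical name. Like `csv.reader` itself, it accepts any iterable of lines.

```python
import csv
from collections.abc import Iterable


def lists_from_csv(lines: Iterable[str]) -> tuple[list[int], list[int]]:
    """Read 'movie_id,user_id,rating,date' rows into two parallel lists."""
    movie_ids: list[int] = []
    ratings: list[int] = []
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        movie_id, _user_id, rating, _date = row
        movie_ids.append(int(movie_id))
        ratings.append(int(rating))
    return movie_ids, ratings


rows = [
    "1,1711859,4,2005-05-08\n",
    "1,1245640,3,2005-12-19\n",
    "2,808731,4,2005-10-31\n",
    "2,337541,5,2005-03-23\n",
]
print(lists_from_csv(rows))  # ([1, 1, 2, 2], [4, 3, 4, 5])
```

With the real file, open it as the `csv` documentation recommends, with `newline=""`: `with open("file.csv", newline="", encoding="utf-8") as f: movie_ids, ratings = lists_from_csv(f)`.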
