Fast way to add line/row numbers to a text file

Posted 2024-08-02 00:47:05


I have a file with about 12 million lines, and each line looks like this:

0701648016480002020000002030300000200907242058CRLF

What I'm trying to accomplish is adding a row number before the data, and the numbers should have a fixed length.
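For the fixed-length requirement, note that `'%d'` produces numbers of varying width. A minimal sketch of zero-padding, assuming a width of 8 digits (enough for 12 million rows); the helper name `number_line` is hypothetical:

```python
# Assumed width: 8 digits covers indices up to 99,999,999,
# which is enough for a file of ~12 million lines.
WIDTH = 8

def number_line(i, line, width=WIDTH):
    """Hypothetical helper: prefix `line` with `i` zero-padded to `width` digits."""
    return f"{i:0{width}d} {line}"

print(number_line(42, 'data\n'), end='')  # prints "00000042 data"
```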

The idea behind this is to be able to bulk insert this file into a SQL Server table and then perform certain operations on it that require each line to have a unique identifier. I've tried doing this on the database side, but I haven't been able to get good performance (under 4 minutes at least; under 1 minute would be ideal).

Right now I'm trying a solution in Python that looks something like this:

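file = open('file.cas', 'r')
lines = file.readlines()          # loads all ~12 million lines into memory
file.close()
# prefix each line with its 0-based index
text = ['%d %s' % (i, line) for i, line in enumerate(lines)]
output = open('output.cas', 'w')
output.write(''.join(text))       # join into one big string and write it out
output.close()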

I don't know if this will work, but it will give me an idea of how it performs and what side effects it has before I keep trying new things. I also thought about doing it in C so that I have better control over memory.
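One rough way to get that idea of performance is to time the approach on a smaller synthetic file first. The sketch below assumes a hypothetical sample of 100,000 copies of the example record in a temporary directory; the function names are illustrative, and extrapolating linearly to 12 million lines is only a rough guide:

```python
import os
import tempfile
import time

def make_sample(path, n_lines):
    # Hypothetical test data: n_lines copies of the example record
    record = '0701648016480002020000002030300000200907242058\n'
    with open(path, 'w') as f:
        f.writelines(record for _ in range(n_lines))

def add_numbers_in_memory(src, dst):
    # Same approach as the question: read everything, then write it all out
    with open(src) as f:
        lines = f.readlines()
    with open(dst, 'w') as out:
        out.write(''.join('%d %s' % (i, line) for i, line in enumerate(lines)))

def time_it(func, *args):
    # Wall-clock time of a single call, in seconds
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'sample.cas')
dst = os.path.join(tmp, 'output.cas')
make_sample(src, 100_000)
elapsed = time_it(add_numbers_in_memory, src, dst)
print(f'{elapsed:.3f} s for 100,000 lines')
```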

Would doing it in a lower-level language help? Does anyone know a better way to do this? I'm pretty sure it has been done before, but I haven't been able to find anything.

Thanks


Answer by 花落人断肠, 2024-08-09 00:47:05


Oh god, no, don't read all 12 million lines in at once! If you're going to use Python, at least do it this way:

# Both files are closed automatically, even if an error occurs
with open('file.cas', 'r') as file, open('output.cas', 'w') as output:
    # enumerate(file) yields (index, line) pairs lazily, one line at a time
    output.writelines('%d %s' % tpl for tpl in enumerate(file))

That uses a generator expression, which processes the file one line at a time.
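The same streaming idea can be extended to cover the fixed-length requirement from the question. A sketch, assuming Python 3, an 8-digit width, and numbering from 0 as in the code above; `add_row_numbers` is a hypothetical helper. Opening both files with `newline=''` keeps the input's CRLF line endings unchanged instead of translating them to the platform default:

```python
import os
import tempfile

def add_row_numbers(src_path, dst_path, width=8, start=0):
    """Hypothetical helper: stream src_path to dst_path, prefixing each
    line with a zero-padded row number of `width` digits."""
    # newline='' returns and writes line endings untranslated,
    # so CRLF in the input stays CRLF in the output.
    with open(src_path, 'r', newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        # The generator formats one line at a time, so memory use
        # does not grow with the size of the file.
        dst.writelines(f'{i:0{width}d} {line}'
                       for i, line in enumerate(src, start))

# Usage on a small hypothetical sample with CRLF endings
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'file.cas')
dst = os.path.join(tmp, 'output.cas')
with open(src, 'w', newline='') as f:
    f.write('0701648016480002020000002030300000200907242058\r\n' * 3)
add_row_numbers(src, dst)
with open(dst, newline='') as f:
    print(f.read().splitlines()[0])  # prints "00000000 0701648016480002020000002030300000200907242058"
```

Passing `start=1` would give 1-based numbers instead, if that suits the target table better.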
