Adding TREC-format tags to thousands of files

Posted 2024-10-22 04:28:14 · 515 characters · 3 views · 0 comments

I need to add tags around the text of each of several thousand files in a directory. I tried doing it with cat, redirecting the output to a temporary file:

for file in *
do
  cat ../gau > temp        # gau contains the format I need to add to each file
  echo "$file" >> temp
  cat ../gau_ >> temp      # gau_ contains </DOCID>
  cat "$file" >> temp
  cat ../gau1 >> temp      # gau1 contains the last line, </DOC>
  cat temp > "$file"
done

But this is very slow. Can you please suggest a better, more efficient way to do it? Is it possible to do this using C? How can we open files in batches, process them, and write them back? That might speed things up, since I suppose opening and writing the files is the bottleneck.

Is there any ready-made program (efficient and fast) that does this job? We are short on time.

Comments (2)

逆夏时光 2024-10-29 04:28:14

This is a quick Python script; try it, it should run faster than your shell script:

import os

for dirname, dirnames, filenames in os.walk('/MY_DIRECTORY/'):
    for filename in filenames:
        with open(os.path.join(dirname, filename), "r+") as f:
            content = f.read()  # read everything in the file
            f.seek(0)  # rewind to the start of the file
            f.write("Prepended text tags" + content)  # write the tags, then the original text
            # no f.close() needed: the with block closes the file

I haven't tried it though.
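To cover the closing tag as well, the same in-place idea can be extended as in the sketch below. It is a minimal, untested-on-your-data sketch: the function names `wrap_file` and `wrap_directory` are made up for illustration, and the `head`, `mid`, and `tail` arguments stand in for the contents of `gau`, `gau_`, and `gau1`, which you would read once up front rather than once per file. Files are handled as bytes so no text encoding has to be guessed.

```python
import os


def wrap_file(path, head, mid, tail):
    """Rewrite one file as: head + file name + newline + mid + original body + tail."""
    name = os.fsencode(os.path.basename(path))
    with open(path, "r+b") as f:
        body = f.read()      # read the whole file into memory
        f.seek(0)            # rewind to the start
        f.write(head + name + b"\n" + mid + body + tail)
        f.truncate()         # harmless here; guards against leftover bytes


def wrap_directory(directory, head, mid, tail):
    """Wrap every regular file directly inside `directory`; return how many were processed."""
    # Collect the paths first so the directory listing is not changing under us.
    with os.scandir(directory) as entries:
        paths = [entry.path for entry in entries if entry.is_file()]
    for path in paths:
        wrap_file(path, head, mid, tail)
    return len(paths)
```

Something like `wrap_directory("/MY_DIRECTORY/", open("../gau", "rb").read(), open("../gau_", "rb").read(), open("../gau1", "rb").read())` would then mirror the original loop. Note that an interruption in the middle of a write can leave that one file half-written, so keep a backup of the directory.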

白首有我共你 2024-10-29 04:28:14

Don't cat temp > $file, just mv temp $file: you don't need to rewrite the file, just rename it. That's certainly one of the causes of the bad performance.

for file in *; do
  { cat ../gau; echo "$file"; cat ../gau_ "$file" ../gau1; } > temp
  mv temp "$file"
done

You might want to choose more descriptive filenames than "gau", "gau_" and "gau1".
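If you would rather do the write-then-rename step from Python, a rough equivalent is sketched below. The function name `wrap_via_rename` is hypothetical, and `head`, `mid`, and `tail` are assumed to hold the bytes of `gau`, `gau_`, and `gau1`. The temporary file is created in the same directory as the target so that `os.replace` is a plain rename rather than a copy across filesystems.

```python
import os
import shutil
import tempfile


def wrap_via_rename(path, head, mid, tail):
    """Write the wrapped content to a temp file, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    name = os.fsencode(os.path.basename(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same directory as the target
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(head + name + b"\n" + mid)
            shutil.copyfileobj(src, out)  # stream the body instead of loading it all
            out.write(tail)
        os.replace(tmp_path, path)        # rename over the original file
    except BaseException:
        os.unlink(tmp_path)               # clean up the temp file on failure
        raise
```

One design note: `mkstemp` creates the file readable and writable only by its owner, so the renamed file may end up with different permissions than the original; adjust with `os.chmod` if that matters for your setup.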
