How to write a tag remover script in Python

Posted on 2024-09-26 11:53:43

I want to implement a script that reads files (in folders and subfolders), detects certain tags, and deletes those tags from the files.

The files are .cpp, .h, .txt, and .xml, and there are hundreds of them under the same folder.

I have no experience with Python, but people told me that I can do this easily.

EXAMPLE:

My main folder is A: C:\A

Inside A, I have folders (B, C, D) and some files: A.cpp, A.h, A.txt, and A.xml. In B I have folders B1, B2, and B3, some of which have more subfolders, plus .cpp, .xml, and .h files.

XML files contain some tags like 
.h and .cpp files contain another kind of tag, like //$TAG some text$
.txt files use a different tag format: #$This is my tag$

A tag always starts and ends with a $ symbol, and it always has a comment character (//,

The idea is to run one script and delete all tags from all files, so the script must:

Read folders and subfolders
Open files and find tags
If tags are present, delete them and save the changed files

WHAT I HAVE:

import  os

for root, dirs, files in os.walk(os.curdir):

 if files.endswith('.cpp'):
  %Find //$ and delete until next $
 if files.endswith('.h'):
  %Find //$ and delete until next $
 if files.endswith('.txt'):
  %Find #$ and delete until next $
 if files.endswith('.xml'):
  %Find <!-- $ and delete until next $ and -->

桜花祭 2024-10-03 11:53:43

The general solution would be to:

use the os.walk() function to traverse the directory tree.
Iterate over the filenames and use fn_name.endswith('.cpp') with if/elif to determine which kind of file you're working with
Use the re module to create a regular expression you can use to determine if a line contains your tag
Open the target file and a temporary file (use the tempfile module). Iterate over the source file line by line and output the filtered lines to your tempfile.
If any lines were replaced, use os.unlink() plus os.rename() to replace your original file
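
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's own code: the names TAG_RE, strip_tags, and strip_tree are hypothetical, the pattern only covers the // form of tag, and os.replace() (Python 3.3+) stands in for the os.unlink() plus os.rename() pair mentioned above.

```python
import os
import re
import tempfile

# Hypothetical pattern for illustration: a '//' marker followed by a $...$ tag.
TAG_RE = re.compile(r'//\s*\$[^$]*\$')


def strip_tags(path, pattern=TAG_RE):
    """Rewrite `path` without text matching `pattern`; return True if it changed."""
    changed = False
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # replacement does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        # newline='' keeps the file's existing line endings untouched.
        with os.fdopen(fd, 'w', newline='') as out, \
                open(path, 'r', newline='') as src:
            for line in src:
                new_line = pattern.sub('', line)
                changed = changed or new_line != line
                out.write(new_line)
        if changed:
            os.replace(tmp_path, path)  # overwrites the original file
    finally:
        # The temporary file still exists if nothing changed or an error occurred.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return changed


def strip_tree(top, extensions=('.cpp', '.h')):
    """Walk `top` and strip tags from files with the given extensions."""
    modified = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith(extensions):  # endswith accepts a tuple
                full_name = os.path.join(root, name)
                if strip_tags(full_name):
                    modified.append(full_name)
    return modified
```

Calling strip_tree('.') would process the current directory and return the paths it rewrote. Note that this sketch removes the comment marker along with the tag and keeps no backup, so it is worth trying on a copy of the tree first.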

It's a trivial exercise for an experienced Python programmer, but for someone new to the language it'll probably take a few hours to get working. You probably couldn't ask for a better task to get introduced to the language, though. Good luck!

----- Update -----

The files value yielded by os.walk is a list, so you'll need to iterate over it as well. Also, files contains only the base name of each file. You'll need to combine it with the root value using os.path.join() to get a full path name. Try doing just this:

import os

for root, d, files in os.walk('.'):
    for base_filename in files:
        full_name = os.path.join(root, base_filename)
        if full_name.endswith('.h'):
            print(full_name, 'is a header!')
        elif full_name.endswith('.cpp'):
            print(full_name, 'is a C++ source file!')

The print calls above are written as Python 3 function calls; in Python 2 they would be print statements, but the general idea remains the same.

给我一枪 2024-10-03 11:53:43

Try something like this:

import os
import re

# Python's re module only accepts fixed-width look-behind, so the comment
# marker is captured in group 1 and written back by the substitution.
CPP_TAG_RE = re.compile(r'(// *)\$[^$]+\$')

tag_REs = {
    '.h': CPP_TAG_RE,
    '.cpp': CPP_TAG_RE,
    '.xml': re.compile(r'(<!-- *)\$[^$]+\$(?= *-->)'),
    '.txt': re.compile(r'(# *)\$[^$]+\$'),
}

def process_file(filename, regex):
    # Set up.
    tempfilename = filename + '.tmp'

    # Filter the file: keep the comment marker, drop the $...$ tag.
    with open(filename, 'r') as infile, open(tempfilename, 'w') as outfile:
        for line in infile:
            outfile.write(regex.sub(r'\1', line))

    # Enable only one of the two following lines.
    os.rename(filename, filename + '.orig')
    #os.remove(filename)

    os.rename(tempfilename, filename)

def process_tree(starting_point=os.curdir):
    for root, d, files in os.walk(starting_point):
        for filename in files:
            # Get rid of `.lower()` in the following if case matters.
            ext = os.path.splitext(filename)[1].lower()
            if ext in tag_REs:
                process_file(os.path.join(root, filename), tag_REs[ext])

A nice thing about os.path.splitext is that it does the right thing for filenames that start with a dot.
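
To make that last point concrete, here is a small sketch of how os.path.splitext treats a few file names. The helper name extension_of and the sample names are hypothetical, chosen only to show the edge cases.

```python
import os


def extension_of(filename):
    """Return the lower-cased extension, or '' when the name has none."""
    return os.path.splitext(filename)[1].lower()


# Hypothetical sample names covering the interesting cases.
for name in ['A.cpp', 'notes.TXT', '.gitignore', '.xml', 'archive.tar.gz']:
    print(repr(name), '->', repr(extension_of(name)))
```

A name that consists only of a leading dot and letters, such as '.gitignore' or '.xml', is reported as having no extension, whereas a plain name.endswith('.xml') check would treat a file literally named '.xml' as an XML file.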
