Python file operations

Posted 2024-08-09 23:53:28


Assume I have folders like this:

  rootfolder/
      01/
          13_itemname.xml
      02/
      03/
      ....

Each directory under rootfolder represents a month (01, 02, 03, ...). Inside each month directory are XML files named by the hour they were created and the item name, such as 16_item1.xml and 24_item1.xml. As you may guess, there are several items, and a new XML file is created for each item every hour.

Now I want to do two things:

1. Generate a list of the item names for a month; e.g. folder 01 contains item1, item2 and item3.
2. Filter the files for each item; e.g. for item1, I want to read every file from 01_item1.xml to 24_item1.xml.

How can I achieve these in Python in an easy way?


无名指的心愿 2024-08-16 23:53:28

Here are two functions that do what you ask (if I understood it correctly): one uses a regex, one does not. Choose whichever you prefer ;)

One bit that may seem like magic is the "setdefault" line. For an explanation, see the documentation for dict.setdefault. I leave understanding how it works as "an exercise for the reader" ;)

from os import listdir
from os.path import join

DATA_ROOT = "testdata"

def folder_items_no_regex(month_name):

   # dict mapping item name -> set of filenames (ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in the month folder
   for filename in listdir( join( DATA_ROOT, month_name ) ):
      # partition() never raises; sep is "" when there is no "_"
      # (split("_", 1) would raise ValueError on such names)
      hour, sep, name = filename.partition( "_" )

      # skip files that could not be split on "_"
      if not sep or not hour or not name:
         continue

      # ignore non-.xml files
      if not name.endswith(".xml"):
         continue

      # cut off the ".xml" extension
      name = name[:-len(".xml")]

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( filename )

   return items

def folder_items_regex(month_name):

   import re

   # The pattern:
   # 1. match the beginning of the name "^"
   # 2. capture 1 or more digits ( \d+ ), the hour
   # 3. match the "_"
   # 4. capture any characters, as few as possible ( .*? ), the item name
   # 5. match ".xml"
   # 6. match the end of the name "$"
   pattern = re.compile( r"^(\d+)_(.*?)\.xml$" )

   # dict mapping item name -> set of filenames (ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in the month folder
   for filename in listdir( join( DATA_ROOT, month_name ) ):

      match = pattern.match( filename )
      if not match:
         continue

      hour, name = match.groups()

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( filename )

   return items

if __name__ == "__main__":
   from pprint import pprint

   # run both variants on the same month folder; they should agree
   for finder in ( folder_items_no_regex, folder_items_regex ):
      data = finder( "02" )

      print( "--- The dict ---------------" )
      pprint( data )

      print( "--- The items --------------" )
      pprint( sorted( data.keys() ) )

      print( "--- The files for item1 ----" )
      # .get() avoids a KeyError if this month has no item1 files
      pprint( sorted( data.get( "item1", () ) ) )
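For readers who would rather skip the exercise: items.setdefault(name, set()) returns the set already stored under name, or first stores a new empty set there and returns it, so .add(filename) always has a set to add to. A minimal sketch comparing it with the equivalent collections.defaultdict form; the filenames are made-up examples following the question's HH_itemname.xml scheme.

```python
from collections import defaultdict

# Hypothetical filenames following the question's HH_itemname.xml scheme
filenames = ["01_item1.xml", "02_item1.xml", "01_item2.xml"]

# setdefault: returns the existing set for the key, or inserts set() and returns it
with_setdefault = {}
for filename in filenames:
    hour, _, rest = filename.partition("_")
    item = rest[:-len(".xml")]
    with_setdefault.setdefault(item, set()).add(filename)

# defaultdict: creates the empty set automatically on first access to a missing key
with_defaultdict = defaultdict(set)
for filename in filenames:
    hour, _, rest = filename.partition("_")
    item = rest[:-len(".xml")]
    with_defaultdict[item].add(filename)

print(with_setdefault == dict(with_defaultdict))  # True
print(sorted(with_setdefault))  # ['item1', 'item2']
```

The defaultdict version reads a little cleaner, but it silently creates entries on any lookup of a missing key, which setdefault does not.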

老旧海报 2024-08-16 23:53:28

Assuming that item names have a fixed-length prefix and suffix (i.e. a 3-character prefix such as '01_' and the 4-character suffix '.xml'), you could solve the first part of the problem like this:

import os
names = set(name[3:-4] for name in os.listdir('01') if name.endswith('.xml'))

That will get you unique item names.
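The slice name[3:-4] only works if every hour prefix is exactly two digits plus "_". If that assumption might not hold (for example, a file named 1_item1.xml), here is a sketch that splits on the first "_" instead. item_names is a hypothetical helper name, and the demo builds a throwaway folder so it runs anywhere.

```python
import os
import tempfile

def item_names(month_dir):
    """Return the sorted unique item names in one month folder (hypothetical helper)."""
    names = set()
    for filename in os.listdir(month_dir):
        hour, sep, rest = filename.partition("_")
        # keep only HH_name.xml files whose prefix is all digits
        if sep and hour.isdigit() and rest.endswith(".xml"):
            names.add(rest[:-len(".xml")])
    return sorted(names)

# Demo on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as month_dir:
    for filename in ["1_item1.xml", "16_item1.xml", "24_item2.xml", "notes.txt"]:
        open(os.path.join(month_dir, filename), "w").close()
    result = item_names(month_dir)
    print(result)  # ['item1', 'item2']
```

Splitting on the first "_" also keeps item names that themselves contain underscores intact.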

To filter each item, simply look for the files whose names end with "_", that item's name and ".xml", and sort them if required:

item_suffix = '_item2.xml'
filtered = sorted(name for name in os.listdir('01') if name.endswith(item_suffix))
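The question also asks to read each file from 01_item1.xml to 24_item1.xml. Plain sorted() gives the right order only because the hours are zero-padded; below is a sketch that sorts on the numeric hour instead and reads each file's text. read_item_files is a hypothetical name, and parsing the XML itself (e.g. with xml.etree.ElementTree) is left out because the file format isn't described.

```python
import os
import tempfile

def read_item_files(month_dir, item):
    """Yield (hour, text) for every HH_<item>.xml in month_dir, in hour order (hypothetical helper)."""
    suffix = "_" + item + ".xml"
    matches = []
    for filename in os.listdir(month_dir):
        if filename.endswith(suffix):
            prefix = filename[:-len(suffix)]
            # skip names like "01_a_item1.xml" whose prefix is not a plain hour
            if prefix.isdigit():
                matches.append((int(prefix), filename))
    # tuples sort by the integer hour first, so 2 comes before 10
    for hour, filename in sorted(matches):
        with open(os.path.join(month_dir, filename), encoding="utf-8") as f:
            yield hour, f.read()

# Demo with unpadded hours, where plain string sorting would put 10 before 1 and 2
with tempfile.TemporaryDirectory() as month_dir:
    for hour in (2, 10, 1):
        with open(os.path.join(month_dir, "%d_item1.xml" % hour), "w", encoding="utf-8") as f:
            f.write("<data hour='%d'/>" % hour)
    open(os.path.join(month_dir, "03_item2.xml"), "w").close()
    hours = [hour for hour, text in read_item_files(month_dir, "item1")]
    print(hours)  # [1, 2, 10]
```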

GRAY°灰色天空 2024-08-16 23:53:28

Not sure exactly what you want to do, but here are some pointers that might be useful:

creating zero-padded names ("%02d" pads the number on the left with zeros to two digits)

foldernames = ["%02d"%i for i in range(1,13)]

hours = ["%02d"%i for i in range(1,25)]  # hours 01..24; range() stops before 25

use os.path.join for building up complex paths instead of string concatenation

os.path.join(foldername,filename)

os.path.exists for checking whether a file exists first

if os.path.exists(newname):
    print("file already exists")
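Putting these pointers together: a sketch that builds the expected path for every hour of one item in one month and reports which hours have no file. It assumes the question's layout (rootfolder/MM/HH_itemname.xml, hours 01 to 24); missing_hours is a hypothetical name.

```python
import os
import tempfile

def missing_hours(root, month, item):
    """Return the zero-padded hours (01..24) with no <root>/<month>/HH_<item>.xml file."""
    missing = []
    for hour in ["%02d" % i for i in range(1, 25)]:
        path = os.path.join(root, month, "%s_%s.xml" % (hour, item))
        if not os.path.exists(path):
            missing.append(hour)
    return missing

# Demo: month 01 has every hour of item1 except 13
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "01"))
    for i in range(1, 25):
        if i != 13:
            open(os.path.join(root, "01", "%02d_item1.xml" % i), "w").close()
    gaps = missing_hours(root, "01", "item1")
    print(gaps)  # ['13']
```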

for listing directory contents, use glob

from glob import glob
xmlfiles = glob("*.xml")
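glob("*.xml") only looks in the current working directory. Wildcards also work in directory components, so here is a sketch that collects one item's files across all month folders at once. The rootfolder layout and the item1 name come from the question; the temporary folder and its files are made up for the demo.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # made-up files in two month folders
    for month, filename in [("01", "01_item1.xml"), ("01", "01_item2.xml"), ("02", "05_item1.xml")]:
        os.makedirs(os.path.join(root, month), exist_ok=True)
        open(os.path.join(root, month, filename), "w").close()

    # "*" matches any month folder; "*_item1.xml" matches any hour prefix for item1
    pattern = os.path.join(root, "*", "*_item1.xml")
    found = sorted(os.path.relpath(p, root) for p in glob(pattern))
    print(found)
```

Note that a "*_item1.xml" pattern would also match a hypothetical "01_x_item1.xml", so a stricter check like the ones in the other answers may still be needed.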

use shutil for higher-level operations such as moving/renaming files or copying and deleting whole directory trees (to create folders, use os.makedirs)

shutil.move(oldname,newname)

os.path.basename to get a file name from a full path

filename = os.path.basename(fullpath)
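Combining os.path.basename with os.path.splitext, here is a sketch that recovers the month, hour and item name from a full path in the question's layout; parse_item_path is a hypothetical name and the example path is made up.

```python
import os

def parse_item_path(fullpath):
    """Split .../MM/HH_itemname.xml into (month, hour, itemname) (hypothetical helper)."""
    month = os.path.basename(os.path.dirname(fullpath))
    stem, ext = os.path.splitext(os.path.basename(fullpath))
    if ext != ".xml":
        raise ValueError("not an .xml file: %r" % fullpath)
    hour, _, item = stem.partition("_")
    # int() raises ValueError if the prefix is not a number
    return month, int(hour), item

print(parse_item_path(os.path.join("rootfolder", "01", "13_itemname.xml")))
# ('01', 13, 'itemname')
```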

About the author: 夢归不見