os.walk and some tests
I'm not sure I properly understand how os.walk stores its results.
I'm trying to do the following:
I'm checking a root folder for subfolders. There are several hundred of them, and they are nested in a somewhat uniform way.
I'm trying to check each subfolder and, if its name ends with a four-digit number, store it in a list.
I wrote some highly procedural code that works, but it uses os.listdir, meaning that I need to call the function for each folder I want to check.
Is there a better way?
import os
import re

def ListadorPastas(pasta):
    resultado = []
    regex = "^[0-9]{4}"
    padrao = re.compile(regex)
    for p in os.listdir(pasta):
        # Only the last four characters of each name are tested
        regexObject = re.match(padrao, p[-4:])
        if regexObject is not None:
            resultado.append(regexObject.string)
    return resultado
Also, I have a regex problem: this regex matches the last four characters sliced from the name. Sometimes I have folders with five digits at the end, which will ALSO match. I tried using "$[0-9]{4}", but it returns nothing. Any ideas why?
Thanks in advance.
George
Comments (3)
$ means end-of-(line or string) in a regex pattern, so I wonder how you expected "end of string, then four digits" to ever possibly match anything. By definition of "end", it won't be followed by four digits!
r'(^|\D)\d{4}$' should work better if I understand what you want: to match strings that are just four digits, or that end with exactly four digits, not five or more (\D means non-digit, just like \d means digit, so there is no reason to use [0-9] or [^0-9]!).
os.walk does not need to store much, just a couple of pointers into the implicit tree it's walking, but why do you care how it's implemented internally? Just use it. I'm also taking the opportunity to show a non-regex way to do the checks you want on the subdirectory's name.
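The code from this answer is not reproduced above, so what follows is only a minimal sketch of the idea it describes, not the answerer's original code. It walks the tree once with os.walk and checks each subdirectory name without a regex. The function names `list_folders` and `ends_with_exactly_four_digits` and the sample folder names are hypothetical, chosen for illustration.

```python
import os
import tempfile

DIGITS = "0123456789"


def ends_with_exactly_four_digits(name):
    """Return True if name ends in four digits not preceded by a fifth digit."""
    tail = name[-4:]
    if len(tail) < 4 or not all(c in DIGITS for c in tail):
        return False
    # Either the name is exactly the four digits, or the character
    # just before them is not a digit.
    return len(name) == 4 or name[-5] not in DIGITS


def list_folders(root):
    """Collect the names of all subdirectories under root that pass the check."""
    results = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # so a single loop covers every nesting level.
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            if ends_with_exactly_four_digits(d):
                results.append(d)
    return results


if __name__ == "__main__":
    # Hypothetical folder layout, built in a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        for sub in ("proj2010", os.path.join("a", "12345"),
                    os.path.join("a", "b", "0042"), "notes"):
            os.makedirs(os.path.join(root, sub))
        print(sorted(list_folders(root)))  # ['0042', 'proj2010']
```

The sketch stores only the folder name; appending `os.path.join(dirpath, d)` instead would keep the full path, which may be more useful when the same name appears at several levels.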
About the regex: if you use p[-4:], you'll always look at only the last four characters of p, so you never get a chance to see whether there really are five. So instead, use re.search on the whole name; unlike re.match, re.search will also match parts of the string.
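To make the difference concrete, here is a small sketch comparing the sliced check from the question with re.search applied to the whole name. The pattern used is the r'(^|\D)\d{4}$' quoted in the first answer, which is an assumption about what fits here since this answer's own snippet is not shown; the sample names and the helper `ends_with_four_digits` are hypothetical.

```python
import re

# Four digits at the end of the name, preceded by either the start of the
# string or a non-digit character.
FOUR_DIGIT_TAIL = re.compile(r"(^|\D)\d{4}$")


def ends_with_four_digits(name):
    """Return True if name ends with exactly four digits."""
    return FOUR_DIGIT_TAIL.search(name) is not None


names = ["2010", "proj2010", "proj12010", "notes"]

# The question's approach: slicing hides whatever precedes the last four
# characters, so a five-digit ending passes too.
sliced = [n for n in names if re.match(r"^[0-9]{4}", n[-4:])]

# Searching the whole name lets the pattern see the preceding character.
searched = [n for n in names if ends_with_four_digits(n)]

# "$" only matches at the end, so nothing can follow it in a match.
never = re.search(r"$[0-9]{4}", "proj2010")

print(sliced)    # ['2010', 'proj2010', 'proj12010']
print(searched)  # ['2010', 'proj2010']
print(never)     # None
```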
The regex you should be using is:
As for os.walk, your explanation is not entirely clear.