Non-alphanumeric list order from os.listdir()
I often use Python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a directory containing the subdirectories run01, run02, ... run19, run20, and I generate a list with the following command:
dir = os.listdir(os.getcwd())
then I usually get a list in this order:
dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]
and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.
What is determining the (displayed) order of these lists?
You can use the built-in sorted function to sort the strings however you want. Based on what you describe, sorted(os.listdir(os.getcwd())) should do the trick. Alternatively, you can call the .sort method on the list returned by os.listdir.
Note that the order in which os.listdir gets the filenames is probably completely dependent on your filesystem.
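Here is a minimal sketch of both options. The helper names list_sorted and list_sorted_in_place are hypothetical. The question's names are zero-padded (run01 ... run20), so plain string sorting already puts them in the expected order.

```python
import os


# Hypothetical helper: returns a new sorted list.
def list_sorted(path="."):
    """Return the entries of `path` in ascending string order, as a new list."""
    return sorted(os.listdir(path))


# Hypothetical helper: sorts the list from os.listdir() in place.
def list_sorted_in_place(path="."):
    """Same result, but sorts the list returned by os.listdir() in place."""
    names = os.listdir(path)
    names.sort()  # list.sort() sorts in place and returns None
    return names


if __name__ == "__main__":
    print(list_sorted(os.getcwd()))
```

Use sorted() when you want to keep the original list. Use list.sort() when you don't need it and would rather avoid a copy.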
I think the order has to do with the way the files are indexed by your filesystem.
If you really want it to follow a particular order, you can always sort the list after getting the files.
Per the documentation:
Order cannot be relied upon and is an artifact of the filesystem.
To sort the result, use sorted(os.listdir(path)).
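For whatever reason, Python does not come with a built-in way to do natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself, for example as a function sorted_alphanumeric() that compares runs of digits as integers.
You can then use that function to sort a list, such as the result of os.listdir().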
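Below is a minimal natural-sort sketch. The name sorted_alphanumeric() comes from the text below. The body is an assumption based on a common regex approach, not necessarily the answerer's exact code. It splits each string into text runs and digit runs, compares the digit runs as integers, and ignores case in the text runs.

```python
import re

# Capturing group so that re.split() keeps the digit runs in its output.
_DIGITS = re.compile(r"([0-9]+)")


# Name taken from the answer text; this body is an assumed implementation.
def sorted_alphanumeric(data):
    """Return `data` in natural order: 'run2' before 'run10', case-insensitive."""

    def alphanum_key(key):
        # 'run10.txt' -> ['run', '10', '.txt']
        parts = _DIGITS.split(key)
        # Odd indices are digit runs (compared as ints); even indices are the
        # text between them (compared case-insensitively).
        return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]

    return sorted(data, key=alphanum_key)
```

For example, sorted_alphanumeric(os.listdir(path)) lists run1, run2, ..., run10 in numeric order even without zero padding.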
PROBLEMS:
If you use sorted_alphanumeric() to sort strings (for example folder names) and want them sorted the way Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows if your folder names contain certain 'special' characters. For example, it will sort 1, !1, !a, a, whereas Windows Explorer would sort !1, 1, !a, a.
So if you want to sort exactly like Windows Explorer does, you have to call the Windows built-in function StrCmpLogicalW from Python via ctypes (this of course won't work on Unix).
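Here is a sketch of the ctypes approach, using a helper named winsort (the name used in the text; the body is an assumption). StrCmpLogicalW is exported by shlwapi.dll and returns a negative, zero, or positive value, like a classic comparison function. That means functools.cmp_to_key can turn it into a sort key. On other platforms this sketch raises OSError rather than guessing.

```python
import ctypes
import functools
import sys


# Name taken from the answer text; this body is an assumed implementation.
def winsort(data):
    """Sort strings the way Windows Explorer does (Windows only)."""
    if sys.platform != "win32":
        raise OSError("winsort() needs StrCmpLogicalW from shlwapi.dll (Windows only)")
    cmp = ctypes.windll.shlwapi.StrCmpLogicalW
    cmp.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p]  # two wide (UTF-16) strings
    cmp.restype = ctypes.c_int  # <0, 0 or >0, like an old-style cmp() function
    return sorted(data, key=functools.cmp_to_key(cmp))
```

Every comparison is a foreign-function call, which is one plausible reason this is slower than a pure-Python key function.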
The ctypes-based function (winsort) is slightly slower than sorted_alphanumeric().
Bonus: winsort can also sort full paths on Windows.
Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort full paths correctly, meaning that subfolders end up at the correct position. For full paths, call natsorted() with the ns.PATH algorithm flag.
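Here is a sketch using natsort's natsorted() function with the ns.PATH and ns.IGNORECASE flags. natsort is a third-party package, so this assumes pip install natsort has been run. ns.PATH splits each path into its components before comparing, which keeps a folder's contents together. The helper name sort_full_paths is hypothetical.

```python
# Hypothetical helper name; natsorted and ns are natsort's public API.
def sort_full_paths(paths):
    """Naturally sort full paths component by component, ignoring case."""
    # Third-party dependency: pip install natsort
    from natsort import natsorted, ns

    return natsorted(paths, alg=ns.PATH | ns.IGNORECASE)
```

For example, "a/b2" and "a/b10" stay together ahead of "a1/x", because the first components "a" and "a1" are compared before anything else.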
Starting with version 7.1.0, natsort supports os_sorted, which internally uses either the aforementioned Windows API or Linux sorting and should be used instead of natsorted().
I think by default the order is determined by ASCII value. The solution to this problem is to sort the list yourself.
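This example illustrates what ordering by ASCII value means for Python's sorted(). code_points is a hypothetical helper. Python compares str objects by Unicode code point, which is the same as ASCII order for ASCII-only names. As a result, uppercase letters come before lowercase ones, and digits are compared one character at a time rather than as numbers.

```python
# Hypothetical helper: shows the numbers that sorted() actually compares.
def code_points(name):
    """Return the code point of each character in `name`."""
    return [ord(ch) for ch in name]


names = ["run2", "Run3", "run10", "run01"]

# sorted() compares strings character by character by code point, which is
# ASCII order for ASCII names: 'R' (82) < 'r' (114) and '0' < '1' < '2'.
ordered = sorted(names)
print(ordered)                             # ['Run3', 'run01', 'run10', 'run2']
print(code_points("R"), code_points("r"))  # [82] [114]
```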
Use the natsort library.
Install the library with the following command on Ubuntu and other Debian-based distributions:
Python 2: pip install natsort
Python 3: pip3 install natsort
Details of how to use this library can be found in the natsort documentation.
(natsort was added to this answer as an edit on 2020-01-27.)
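Here is a minimal usage sketch, assuming natsort is installed. The helper name natural_listdir is hypothetical. natsorted() is natsort's main function, and it returns a new list in natural order.

```python
import os


# Hypothetical helper name; natsorted() is natsort's public API.
def natural_listdir(path="."):
    """List `path` in natural order ('run2' before 'run10')."""
    # Third-party dependency: pip install natsort
    from natsort import natsorted

    return natsorted(os.listdir(path))
```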
In my case I have names like row_163.pkl. Here os.path.splitext('row_163.pkl') breaks it into ('row_163', '.pkl'), so the name also needs to be split on '_'. For your requirement you can do something similar, and for directory retrieval you can use sorted(os.listdir(path)).
For names like 'run01.txt' or 'run01.csv', strip the extension the same way before extracting the number.
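Here is a sketch of such sort keys. The helper names row_number and run_number are hypothetical. They assume every name follows the described pattern (row_<n>.<ext> or run<n>.<ext>); any other name makes int() raise ValueError.

```python
import os


# Hypothetical key for names like 'row_163.pkl'.
def row_number(filename):
    stem, _ext = os.path.splitext(filename)  # 'row_163.pkl' -> ('row_163', '.pkl')
    return int(stem.split("_")[-1])          # 'row_163' -> 163


# Hypothetical key for names like 'run01.txt' or 'run01.csv'.
def run_number(filename):
    stem, _ext = os.path.splitext(filename)  # 'run01.csv' -> ('run01', '.csv')
    return int(stem[len("run"):])            # 'run01' -> 1
```

Usage: sorted(os.listdir(path), key=run_number).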
It's probably just the order that C's readdir() returns. Try listing the directory with a small C program that calls readdir(); the build command should be something like gcc -o foo foo.c.
P.S. I just ran this and your Python code, and both gave me sorted output, so I can't reproduce what you're seeing.
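On POSIX systems, CPython's os.listdir() is built on opendir()/readdir(), so it returns entries in whatever order the filesystem hands them back. As a Python-only alternative to the C test, this sketch creates run01 ... run20 in a temporary directory and prints the raw order next to the sorted one. The helper name compare_listing_orders is hypothetical. Depending on the filesystem, the raw order may or may not already be sorted.

```python
import os
import tempfile


# Hypothetical helper for checking the raw order on your own filesystem.
def compare_listing_orders(names):
    """Create `names` as subdirectories of a fresh temporary directory and
    return (raw os.listdir() order, sorted order)."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            os.mkdir(os.path.join(tmp, name))
        raw = os.listdir(tmp)  # whatever order readdir() produces
        return raw, sorted(raw)


raw, ordered = compare_listing_orders([f"run{i:02d}" for i in range(1, 21)])
print("raw:   ", raw)
print("sorted:", ordered)
```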
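From the documentation of os.listdir(): "The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory."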
This means that the order is probably OS/filesystem dependent, has no particularly meaningful order, and is therefore not guaranteed to be anything in particular. As many answers mentioned: if preferred, the retrieved list can be sorted.
Cheers :)
The proposed combination of os.listdir and sorted produces the same order as the ls -l command under Linux; I verified this assumption with an example. So, for someone who wants to reproduce the result of the well-known ls -l command in their Python code, sorted(os.listdir(DIR)) works pretty well. (Strictly speaking, the two agree in the C locale: ls orders names by the current locale's collation rules, and it hides dotfiles unless -a or -A is given.)
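Here is one way to check this claim on your own machine. The helper name listdir_matches_ls is hypothetical, and the sketch assumes a Unix-like system with ls on the PATH. It compares against ls -1A rather than ls -l so that only names are printed. It also forces the C locale, so that ls compares bytes the same way sorted() compares code points for ASCII names.

```python
import os
import subprocess


# Hypothetical helper; assumes a Unix-like system with `ls` available.
def listdir_matches_ls(path="."):
    """True if sorted(os.listdir(path)) gives the same names, in the same
    order, as `ls -1A` run in the C locale."""
    env = dict(os.environ, LC_ALL="C")  # C locale: ls compares names byte by byte
    result = subprocess.run(
        ["ls", "-1A", path],  # -1: one name per line; -A: show dotfiles, omit . and ..
        capture_output=True, text=True, env=env, check=True,
    )
    return sorted(os.listdir(path)) == result.stdout.splitlines()
```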
I found that sorting does not always do what I expected. For example, with the files in one of my directories, sorting gives me a very strange result.
It seems to compare the first character first; if that character is the biggest, the name ends up last.
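The observation is correct: strings are compared lexicographically, so the first differing character decides. The file names below are hypothetical. The sketch contrasts the default order with a numeric key that strips the extension and converts the rest to an int, assuming every name is <number>.<ext>.

```python
import os

names = ["9.txt", "10.txt", "100.txt", "2.txt"]  # hypothetical file names

# Default string order: compare left to right and stop at the first differing
# character, so every name starting with '1' precedes '2...' and '9...'.
lexicographic = sorted(names)

# Numeric order: strip the extension and compare the remaining number as an int.
numeric = sorted(names, key=lambda n: int(os.path.splitext(n)[0]))

print(lexicographic)  # ['10.txt', '100.txt', '2.txt', '9.txt']
print(numeric)        # ['2.txt', '9.txt', '10.txt', '100.txt']
```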
To answer the question directly, you can sort the list with a key function.
This approach uses the Python built-in function sorted and, through the key argument, specifies the sorting criterion: the list item without 'run', cast to an integer.
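Here is a sketch of that approach applied to the names shown in the question (its list continues with "...", so only the visible items are used). It assumes every name is 'run' followed by digits.

```python
# The names shown in the question.
dirs = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']

# key: remove the 'run' prefix and cast the rest to int, e.g. 'run08' -> 8
sorted_dirs = sorted(dirs, key=lambda name: int(name.replace('run', '')))
print(sorted_dirs)
# prints ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']
```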
ls by default lists the files sorted by name (ls options can be used to sort by date, size, and so on).
Using ls would have much better performance when the directory contains a very large number of files.
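If you stay in Python, a rough counterpart of the date and size options of ls is to sort on stat fields. Here is a sketch with hypothetical helper names. It uses os.scandir(), whose DirEntry objects cache the result of stat() per entry. Ties are broken by name, which only approximates what ls does.

```python
import os


# Hypothetical helper: roughly what `ls -S` shows (largest first).
def sorted_by_size(path="."):
    with os.scandir(path) as it:
        entries = [(e.stat().st_size, e.name) for e in it]
    return [name for _size, name in sorted(entries, key=lambda t: (-t[0], t[1]))]


# Hypothetical helper: roughly what `ls -t` shows (most recently modified first).
def sorted_by_mtime(path="."):
    with os.scandir(path) as it:
        entries = [(e.stat().st_mtime, e.name) for e in it]
    return [name for _mtime, name in sorted(entries, key=lambda t: (-t[0], t[1]))]
```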