Non-alphanumeric list order from os.listdir()

Posted on 2024-10-14 15:45:12

I often use Python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a current directory containing the following subdirectories: run01, run02, ... run19, run20, and then I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

What is determining the (displayed) order of these lists?

Comments (14)

眼睛会笑 2024-10-21 15:45:12

You can use the builtin sorted function to sort the strings however you want. Based on what you describe,

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

Either approach should, I think, do the trick.

Note that the order in which os.listdir returns the filenames is probably completely dependent on your filesystem.
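
To make the difference between the two calls concrete, here is a minimal sketch. It assumes a throwaway directory created with tempfile, and the helper names list_sorted and list_sorted_in_place are illustrative only, not part of the answer. sorted() returns a new list, while .sort() reorders the existing list in place and returns None.

```python
import os
import tempfile


def list_sorted(path):
    """Return the directory entries as a new, lexicographically sorted list."""
    return sorted(os.listdir(path))


def list_sorted_in_place(path):
    """Same result, obtained by sorting the list from os.listdir in place."""
    names = os.listdir(path)
    names.sort()  # sorts in place and returns None
    return names


# Hypothetical demo directory; the run* names mirror the question.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run03", "run01", "run02"):
        os.mkdir(os.path.join(tmp, name))
    print(list_sorted(tmp))
    print(list_sorted_in_place(tmp))
```

Both helpers give the same ordering; which one to use is mostly a matter of whether the unsorted list is needed afterwards.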

似狗非友 2024-10-21 15:45:12

I think the order has to do with the way the files are indexed on your filesystem.
If you really want it to adhere to some order, you can always sort the list after getting the files.

淡莣 2024-10-21 15:45:12

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
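
Since the question is about subdirectories specifically, the same idea can be sketched with pathlib. This is only an illustration: the helper name sorted_subdirs and the demo directory are assumptions, not something from the documentation quoted above. Path.iterdir(), like os.listdir, yields entries in arbitrary order, so the sort is still needed.

```python
import tempfile
from pathlib import Path


def sorted_subdirs(path):
    """Names of the immediate subdirectories of `path`, sorted lexicographically."""
    return sorted(p.name for p in Path(path).iterdir() if p.is_dir())


# Hypothetical demo directory with three run folders and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("run02", "run10", "run01"):
        (root / name).mkdir()
    (root / "notes.txt").write_text("not a directory")
    print(sorted_subdirs(root))
```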

难如初 2024-10-21 15:45:12

Python for whatever reason does not come with a built-in way to have natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))
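
As a quick check of what natural sorting changes, here is a small self-contained sketch. The key function natural_key is a hypothetical variant of the idea above, not the answer's exact code: it relies on re.split with a capturing group returning alternating text and digit chunks, so every odd-indexed chunk is a digit run.

```python
import re


def natural_key(name):
    """Build a sort key in which digit runs compare as integers."""
    chunks = re.split(r"([0-9]+)", name)
    # With a capturing group, odd indices hold the digit runs.
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]


names = ["run10", "run2", "run1", "run20"]
print(sorted(names))                   # ['run1', 'run10', 'run2', 'run20']
print(sorted(names, key=natural_key))  # ['run1', 'run2', 'run10', 'run20']
```

Zero-padded names such as run01 ... run20 sort identically under both orderings, which is why plain sorted() is enough for the directories in the question.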

PROBLEMS:
In case you use the above function to sort strings (for example folder names) and want them sorted like Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows, if you have folder names with certain 'special' characters in them. For example this function will sort 1, !1, !a, a, whereas Windows Explorer would sort !1, 1, !a, a.

So if you want to sort exactly like Windows Explorer does in Python you have to use the Windows built-in function StrCmpLogicalW via ctypes (this of course won't work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key

def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort by full paths in a correct way (meaning subfolders at the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Starting with version 7.1.0, natsort supports os_sorted, which internally uses either the aforementioned Windows API or Linux sorting, and should be used instead of natsorted().

氛圍 2024-10-21 15:45:12

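I think that by default strings are ordered by their ASCII (code point) values. Sorting with key=len alone only groups names by length and leaves names of equal length in whatever order os.listdir returned them, so one solution is to sort by length first and then by name:

dir = sorted(os.listdir(os.getcwd()), key=lambda name: (len(name), name))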
清旖 2024-10-21 15:45:12

Use natsort library:

Install the library with the following command on Ubuntu and other Debian-based distributions

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details on how to use this library can be found in its documentation.

from natsort import natsorted

files = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
natsorted(files)

[out]:
['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']
  • This is not a duplicate of another answer; natsort was added there as an edit on 2020-01-27.
夏了南城 2024-10-21 15:45:12
import os
import re

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl. os.path.splitext('row_163.pkl') breaks this into ('row_163', '.pkl'), so the name also has to be split on '_' to isolate the number.

For your requirement, you can do something like

sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

For retrieving the directory contents you can also use sorted(os.listdir(path)),

and for names like 'run01.txt' or 'run01.csv' you can drop the extension and the non-digit characters before converting to an integer:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))
若水般的淡然安静女子 2024-10-21 15:45:12

It's probably just the order that C's readdir() returns. Try running this C program:

#include <dirent.h>
#include <stdio.h>

int main(void){
   DIR *dirp;
   struct dirent* de;
   dirp = opendir(".");
   while(de = readdir(dirp)) // Yes, one '='.
        printf("%s\n", de->d_name);
   closedir(dirp);
   return 0;
}

The build line should be something like gcc -o foo foo.c.

P.S. Just ran this and your Python code, and they both gave me sorted output, so I can't reproduce what you're seeing.
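
The same experiment can be run without leaving Python. The sketch below is an illustration under stated assumptions: the helper name raw_and_sorted and the demo directory are made up for this example. os.scandir yields entries in arbitrary order, and on POSIX systems it is, as far as I know, built on the same opendir()/readdir() calls as the C program.

```python
import os
import tempfile


def raw_and_sorted(path):
    """Return (names in the order the OS yields them, the same names sorted)."""
    with os.scandir(path) as entries:
        raw = [entry.name for entry in entries]
    return raw, sorted(raw)


# Hypothetical demo directory; the raw order depends on the filesystem.
with tempfile.TemporaryDirectory() as tmp:
    for i in (3, 1, 20, 11):
        os.mkdir(os.path.join(tmp, f"run{i:02d}"))
    raw, ordered = raw_and_sorted(tmp)
    print("as returned:", raw)
    print("sorted:     ", ordered)
```

Whether the first line happens to come out sorted varies between filesystems, which would explain why the behaviour cannot always be reproduced on another machine.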

小草泠泠 2024-10-21 15:45:12

From the documentation:

The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

This means that the order is probably OS/filesystem dependent, has no particularly meaningful order, and is therefore not guaranteed to be anything in particular. As many answers mentioned: if preferred, the retrieved list can be sorted.

Cheers :)

删除会话 2024-10-21 15:45:12

The proposed combination of os.listdir and sorted produces the same ordering as the ls -l command under Linux. The following example verifies this assumption:

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$ 

So, for someone who wants to reproduce the ordering of the well-known ls -l command in their Python code, sorted( os.listdir( DIR ) ) works pretty well.
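
One caveat worth hedging: GNU ls orders names using the current locale's collation rules, while sorted() compares code points, so the two can disagree for mixed-case or non-ASCII names even though they agree for the names above. The following is a minimal sketch of a locale-aware sort; the helper name sorted_by_locale is an assumption for illustration, and its result depends on the locale configured on the machine.

```python
import locale


def sorted_by_locale(names):
    """Sort names with the collation rules of the user's locale, if available."""
    try:
        locale.setlocale(locale.LC_COLLATE, "")  # adopt the environment's locale
    except locale.Error:
        pass  # unsupported locale: keep whatever collation is already active
    return sorted(names, key=locale.strxfrm)


names = ["b", "A", "a", "B"]
print(sorted(names))            # code-point order: ['A', 'B', 'a', 'b']
print(sorted_by_locale(names))  # depends on the active locale
```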

橘味果▽酱 2024-10-21 15:45:12

I found that sorted() does not always do what I expected. For example, I have a directory as shown below, and sorting it gives me a very strange result:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

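It seems that the strings are compared character by character, starting with the first one: '5' ends up last because its first character is the largest, even though 5 is smaller than 472 as a number.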
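
When every name is purely numeric, as in this directory, the strings can be compared by value instead. A minimal sketch follows; the helper name sort_numeric is hypothetical, and key=int raises ValueError as soon as a name contains anything other than digits.

```python
def sort_numeric(names):
    """Sort purely numeric names by their integer value."""
    return sorted(names, key=int)


names = ['2', '3', '4', '5', '403', '404', '472']
print(sorted(names))        # ['2', '3', '4', '403', '404', '472', '5']
print(sort_numeric(names))  # ['2', '3', '4', '5', '403', '404', '472']
```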

美男兮 2024-10-21 15:45:12
In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.
時窥 2024-10-21 15:45:12

To answer the question directly, you can use the following code.

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
for file in sorted(dir, key=lambda x:int(x.replace('run', ''))):
    print(file)

It will print:

run01
run08
run11
run12
run13
run14
run18

This approach uses the Python built-in function sorted and, through the key argument, specifies the sorting criterion: the list item with 'run' removed, converted to an integer.

鸠书 2024-10-21 15:45:12

By default, ls lists the files sorted by name. (ls options can be used to sort by date, size, ...)

import os

files = list(os.popen("ls"))
files = [file.strip("\n") for file in files]

Using ls may perform better when the directory contains a very large number of files.
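
For comparison, sorting by size or date can also be done without starting a subprocess. The sketch below is an alternative technique, not the method of the answer above: it uses os.scandir and the stat information of each entry, and the helper names list_by_size and list_by_mtime are assumptions for illustration.

```python
import os
import tempfile


def list_by_size(path):
    """File names in `path`, smallest first (roughly like `ls -Sr`)."""
    with os.scandir(path) as entries:
        files = [(e.stat().st_size, e.name) for e in entries if e.is_file()]
    return [name for _, name in sorted(files)]


def list_by_mtime(path):
    """File names in `path`, oldest modification first (roughly like `ls -tr`)."""
    with os.scandir(path) as entries:
        files = [(e.stat().st_mtime, e.name) for e in entries if e.is_file()]
    return [name for _, name in sorted(files)]


# Hypothetical demo: three files whose sizes differ by one byte each.
with tempfile.TemporaryDirectory() as tmp:
    for name, size in (("big", 3), ("small", 1), ("medium", 2)):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"x" * size)
    print(list_by_size(tmp))
```

Entries with equal size or timestamp fall back to name order here, because the tuples are compared element by element.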
