Python: sorting strings that end with digits

Posted 2024-10-05 15:18:35

What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?

>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print(list)
['asdf111', 'asdf123', 'asdf1234', 'asdf124']

The sort should put the 1234 one at the end. Is there an easy way to do this?

Comments (8)

泡沫很甜 2024-10-12 15:18:35

Is there an easy way to do this?

Yes.

You can use the third-party natsort module.

>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

Full disclosure, I am the package's author.

拥抱没勇气 2024-10-12 15:18:35

Is there an easy way to do this?

No.

It's entirely unclear what the real rules are. "Some have 3 digits and some have 4" isn't a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?

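import re
# One or more non-digits followed by one or more digits, anchored at both ends.
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    # Assumes every item matches; otherwise m is None and this raises AttributeError.
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))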

That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.

>>> data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort(key=key)
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
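
If some items might not end in digits, the key function above would fail on them, because match() returns None. A minimal sketch of a more forgiving variant follows. The name safe_key and the choice to sort digit-less strings ahead of numbered ones (by pairing them with -1) are assumptions made for illustration, not part of the original answer.

```python
import re

# Optional non-digit prefix followed by a run of digits at the very end.
KEY_PAT = re.compile(r"^(\D*)(\d+)$")

def safe_key(item):
    """Return (prefix, number); fall back to (item, -1) when there is no match."""
    m = KEY_PAT.match(item)
    if m is None:
        # Assumption: strings that do not fit the pattern sort by their full text,
        # ahead of any numbered string that shares the same prefix.
        return (item, -1)
    return (m.group(1), int(m.group(2)))

data = ['asdf123', 'asdf1234', 'asdf', 'asdf111', 'asdf124']
print(sorted(data, key=safe_key))
# ['asdf', 'asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Whether a silent fallback or an exception is preferable depends on how strict the real rules are, which is exactly the open question raised above.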
谢绝鈎搭 2024-10-12 15:18:35

What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.

The algorithm for a natural sort is approximately as follows:

  • Split each value into alphabetical "chunks" and numerical "chunks"
  • Sort by the first chunk of each value
    • If the chunk is alphabetical, sort it as usual
    • If the chunk is numerical, sort by the numerical value represented
  • Take the values that have the same first chunk and sort them by the second chunk
  • And so on
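
The steps above can be sketched in a few lines. This is a minimal illustration, not Ned's implementation; natural_key is a hypothetical name, and the sketch assumes chunks are compared case-sensitively.

```python
import re

def natural_key(text):
    """Split text into alternating text and digit chunks; digit chunks become ints."""
    # re.split with a capturing group keeps the digit runs in the result, so the
    # pieces alternate: text at even indices, digits at odd indices.
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
print(sorted(names, key=natural_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Because text and number chunks always sit at the same positions in every key, two keys never compare an int against a str, which matters on Python 3.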
妞丶爷亲个 2024-10-12 15:18:35
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
# Sort by the numeric suffix; assumes a fixed 4-character prefix.
# The Python 2 form was l.sort(cmp=lambda x, y: cmp(int(x[4:]), int(y[4:]))),
# but the cmp argument no longer exists in Python 3.
l.sort(key=lambda s: int(s[4:]))
汐鸠 2024-10-12 15:18:35

You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.

sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:]))) 

Without the lambda and conditional expression, that's:

def key(s):
    # If the 4th character from the end is a digit, treat the last 4 as the number.
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))

sorted(list_, key=key)

This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
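
The same tuple idea can be sketched without hard-coding 3 or 4 digits, assuming only that every string ends in at least one digit. The name suffix_key is hypothetical, and the sketch considers ASCII digits only.

```python
def suffix_key(s):
    """Return (prefix, number) for a string that ends in ASCII digits."""
    prefix = s.rstrip('0123456789')   # everything before the trailing digits
    digits = s[len(prefix):]          # the trailing digits themselves
    return (prefix, int(digits))      # int('') raises ValueError if there are none

print(suffix_key('asdfbad123'))                  # ('asdfbad', 123)
print(('asd', 7890) < ('asdfbad', 123))          # True: 'asd' < 'asdfbad' decides it
print(sorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'], key=suffix_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

As with the original, input that does not fit the expected shape still fails with a ValueError rather than being sorted silently.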

奶气 2024-10-12 15:18:35

The issue is that the sort is alphabetical here, since the items are strings. Strings are compared character by character, and the first position where they differ decides the order.

>>> 'a1234' < 'a124'  # at the first differing position, '3' is less than '4'
True
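
To make that comparison visible, here is a small sketch that reports where two strings first differ. The helper first_difference is hypothetical and exists only for illustration.

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) at the first position where a and b differ.

    Returns None when one string is a prefix of the other or they are equal.
    """
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    return None

print(first_difference('asdf1234', 'asdf124'))  # (6, '3', '4')
print('asdf1234' < 'asdf124')                   # True, because '3' < '4'
```

The trailing '4' of 'asdf1234' is never examined, which is why the longer number sorts ahead of 'asdf124'.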

You will need to do a numeric sort to get the desired output.

>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf' + str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
另类 2024-10-12 15:18:35
# Keep only the digits among the last 4 characters and sort by their integer value.
L.sort(key=lambda s: int(''.join(filter(str.isdigit, s[-4:]))))
孤云独去闲 2024-10-12 15:18:35

Rather than splitting each line myself, I ask Python to do it for me with re.findall():

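import re
import sys

def SortKey(line):
  # Break the line into runs of non-digits and runs of digits.
  result = []
  for part in re.findall(r'\D+|\d+', line):
    try:
      # Digit runs are compared as integers.
      result.append(int(part, 10))
    except (TypeError, ValueError):
      # Everything else is compared as text.
      result.append(part)
  return result

# Python 3 form of the original Python 2 print statement; the lines keep their own newlines.
print(''.join(sorted(sys.stdin.readlines(), key=SortKey)), end='')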
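
One caveat when running this on Python 3: keys that mix int and str chunks compare element by element, so a line that starts with digits and a line that starts with letters raise a TypeError. A minimal sketch that avoids this by tagging each chunk with its kind follows. The name tagged_sort_key and the decision to order numbers before text are assumptions for illustration.

```python
import re

def tagged_sort_key(line):
    """Like SortKey, but each chunk is a (kind, value) pair so kinds never clash."""
    key = []
    for part in re.findall(r'\D+|\d+', line):
        if part.isdecimal():
            key.append((0, int(part)))   # numeric chunk; kind 0 sorts first
        else:
            key.append((1, part))        # text chunk
    return key

lines = ['abc10', '1abc', 'abc2', 'abc1']
print(sorted(lines, key=tagged_sort_key))
# ['1abc', 'abc1', 'abc2', 'abc10']
```

Two chunks of different kinds are settled by the tag alone, so an int is never compared with a str.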