string.title() treats an apostrophe as the start of a new word. Why?

Posted on 2024-12-09 21:16:00

>>> myStr="madam. i'm adam! i also tried c,o,m,m,a"
>>> myStr.title()
"Madam. I'M Adam! I Also Tried C,O,M,M,A"

This is certainly incorrect. Why would an apostrophe be considered the start of a new word? Is this a gotcha, or am I assuming something wrong about the concept of a title?

Comments (4)

生寂 2024-12-16 21:16:00

Because the implementation works by looking at the previous character: if it is a letter, the current character is lowercased; otherwise it is uppercased. That is to say, it's relatively simple. Here's roughly what a pure-Python version of it looks like:

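def title(string):
    """Approximate str.title() by looking only at the previous character."""
    result = []
    prev_letter = ' '  # pretend the string is preceded by a space

    for ch in string:
        if not prev_letter.isalpha():
            # Previous character is not a letter: treat ch as a word start.
            result.append(ch.upper())
        else:
            # Still inside a word: force lowercase.
            result.append(ch.lower())

        prev_letter = ch

    return "".join(result)

倾城花音 2024-12-16 21:16:00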

You could use:

string.capwords()

# Capitalize the words in a string, e.g. " aBc  dEf " -> "Abc Def".
def capwords(s, sep=None):
    """capwords(s, [sep]) -> string

    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join. Note that this replaces runs of whitespace characters by
    a single space.

    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

And, since title() is locale-dependent for 8-bit strings (as the documentation quoted below says), check your locale to see if this is intentional:

locale.localeconv()
Returns the database of the local conventions as a dictionary.

title()
Return a titlecased version of the string: words start with uppercase characters, all remaining cased characters are lowercase. For 8-bit strings, this method is locale-dependent.
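
As a quick check of the capwords() suggestion, here is a minimal sketch that runs it on the string from the question. The variable names `text`, `capped` and `squeezed` are only illustrative. Note that capwords() splits on whitespace alone, so "c,o,m,m,a" counts as a single word and only its first letter is capitalized.

```python
import string

text = "madam. i'm adam! i also tried c,o,m,m,a"

# capwords() splits on whitespace only, so the apostrophe stays inside its word.
capped = string.capwords(text)
print(capped)  # Madam. I'm Adam! I Also Tried C,o,m,m,a

# Runs of whitespace collapse to a single space, as the docstring warns.
squeezed = string.capwords("  aBc   dEf ")
print(squeezed)  # Abc Def
```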

深爱不及久伴 2024-12-16 21:16:00

The title method capitalizes the first letter of each word in the string (and makes the rest lower case). Words are identified as substrings of alphabetic characters that are separated by non-alphabetic characters, such as digits, or whitespace. This can lead to some unexpected behavior. For example, the string "x1x" will be converted to "X1X" instead of "X1x".

http://en.wikibooks.org/wiki/Python_Programming/Strings#title.2C_upper.2C_lower.2C_swapcase.2C_capitalize
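
The rule quoted above can be seen directly with a few inputs. This is just an illustrative sketch; the sample strings other than "x1x" and the name `titled` are my own choices, not from the quoted text.

```python
# Any non-letter (apostrophe, digit, comma) ends the current "word",
# so the next letter is capitalized again.
samples = ["i'm", "x1x", "c,o,m,m,a", "they're bill's"]
titled = {s: s.title() for s in samples}

for original, result in titled.items():
    print(repr(original), "->", repr(result))
# 'i'm' -> 'I'M'
# 'x1x' -> 'X1X'
# 'c,o,m,m,a' -> 'C,O,M,M,A'
# "they're bill's" -> "They'Re Bill'S"
```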

Basically, it is working as intended. Since an apostrophe is indeed non-alphabetic, you get the "unexpected behavior" outlined above.

A bit of googling shows that other people feel this is not exactly the best thing and alternate implementations have been written. See: http://muffinresearch.co.uk/archives/2008/05/27/titlecasepy-titlecase-in-python/

時窥 2024-12-16 21:16:00

The problem here is that "title case" is a very culturally dependent concept. Even in English, there are too many corner cases to handle them all. (See also http://bugs.python.org/issue7008)

If you want something better, you need to decide what kinds of text you want to handle (which means handling other kinds incorrectly), and write your own function.
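
As one example of such a trade-off, here is a minimal sketch of a hand-written function that only aims to keep English contractions and possessives intact. It follows the regular-expression workaround given in the Python documentation for str.title(); the function name `titlecase_keep_apostrophes` is hypothetical, and the pattern assumes ASCII letters only, so accented or non-Latin text is deliberately left unhandled.

```python
import re

# A "word" here is a run of ASCII letters, optionally followed by an
# apostrophe and more letters (e.g. "i'm", "bill's"). This is an assumption
# that suits plain English text and ignores everything else.
_WORD = re.compile(r"[A-Za-z]+('[A-Za-z]+)?")


def titlecase_keep_apostrophes(s):
    """Capitalize each word without restarting after an apostrophe."""
    return _WORD.sub(lambda mo: mo.group(0).capitalize(), s)


fixed = titlecase_keep_apostrophes("madam. i'm adam! i also tried c,o,m,m,a")
print(fixed)  # Madam. I'm Adam! I Also Tried C,O,M,M,A
```

Commas still split words in this version, so "c,o,m,m,a" becomes "C,O,M,M,A" just as with title(); whether that is desirable depends on the text being handled.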
