Titlecasing a string with exceptions

Posted on 2024-09-24 05:04:48

Is there a standard way in Python to titlecase a string (i.e. each word starts with an uppercase character and all remaining cased characters are lowercase) while leaving small words such as and, in, and of in lowercase?

Comments (9)

我们的影子 2024-10-01 05:04:48

There are a few problems with this. If you use split and join, some white space characters will be ignored. The built-in capitalize and title methods do not ignore white space.

>>> 'There     is a way'.title()
'There     Is A Way'

If a sentence starts with an article, you do not want the first word of a title in lowercase.

Keeping these in mind:

import re 
def title_except(s, exceptions):
    word_list = re.split(' ', s)       # re.split behaves as expected
    final = [word_list[0].capitalize()]
    for word in word_list[1:]:
        final.append(word if word in exceptions else word.capitalize())
    return " ".join(final)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except('there is a    way', articles))
# There is a    Way
print(title_except('a whim   of an elephant', articles))
# A Whim   of an Elephant
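
To illustrate the whitespace point, here is a minimal sketch (not part of the original answer). It first shows how `split()` with no argument followed by `join` collapses runs of whitespace, then uses `re.sub` over runs of non-whitespace so that any whitespace, including tabs, is left untouched. The name `title_except_ws` is hypothetical, and the sketch assumes exceptions are given in lowercase.

```python
import re

def title_except_ws(s, exceptions):
    """Capitalize every word except those in `exceptions`; whitespace is kept verbatim."""
    first = re.search(r'\S+', s)  # locate the first word so it is always capitalized

    def fix(match):
        word = match.group(0)
        if first is not None and match.start() == first.start():
            return word.capitalize()
        return word if word in exceptions else word.capitalize()

    # Only the non-whitespace runs are rewritten; everything between them is untouched.
    return re.sub(r'\S+', fix, s)

articles = ['a', 'an', 'of', 'the', 'is']

# split() with no argument plus join() loses the original spacing
collapsed = ' '.join('there is a    way'.split())
print(collapsed)
# there is a way

print(title_except_ws('a whim   of an elephant', articles))
# A Whim   of an Elephant
```

Matching on `\S+` rather than splitting on a single space means tabs and other whitespace characters survive as well.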
败给现实 2024-10-01 05:04:48

Use the titlecase.py module! Works only for English.

>>> from titlecase import titlecase
>>> titlecase('i am a foobar bazbar')
'I Am a Foobar Bazbar'

GitHub: https://github.com/ppannuto/python-titlecase

皓月长歌 2024-10-01 05:04:48

There are these methods:

>>> mytext = u'i am a foobar bazbar'
>>> print(mytext.capitalize())
I am a foobar bazbar
>>> print(mytext.title())
I Am A Foobar Bazbar

There's no option to keep articles lowercase. You'd have to code that yourself, probably by using a list of the articles you want to keep in lowercase.

葬心 2024-10-01 05:04:48

Stuart Colville has made a Python port of a Perl script written by John Gruber that converts strings to title case. It avoids capitalizing small words, based on rules from the New York Times Manual of Style, and caters for several special cases.

Some of the cleverness of these scripts:

  • They do not capitalize small words like if, in, of, on, etc., and will un-capitalize them if they’re erroneously capitalized in the input.

  • The scripts assume that words with capitalized letters other than the first character are already correctly capitalized. This means they will leave a word like “iTunes” alone, rather than mangling it into “ITunes” or, worse, “Itunes”.

  • They skip over any words with inline dots; “example.com” and “del.icio.us” will remain lowercase.

  • They have hard-coded hacks specifically to deal with odd cases, like “AT&T” and “Q&A”, both of which contain small words (at and a) which normally should be lowercase.

  • The first and last word of the title are always capitalized, so input such as “Nothing to be afraid of” will be turned into “Nothing to Be Afraid Of”.

  • A small word after a colon will be capitalized.

You can download it here.
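
To make the rules above concrete, here is a rough sketch of a few of them. It is not the code of the port or of the original Perl script; the function name `simple_titlecase` and the `SMALL_WORDS` set are assumptions made for illustration, and the real scripts cover many more cases.

```python
import re

# Hypothetical small-word list; the real scripts use their own.
SMALL_WORDS = {'a', 'an', 'and', 'as', 'at', 'but', 'by', 'for', 'if',
               'in', 'of', 'on', 'or', 'the', 'to'}

def simple_titlecase(text):
    words = text.split(' ')
    result = []
    for i, word in enumerate(words):
        is_edge = i == 0 or i == len(words) - 1
        after_colon = i > 0 and words[i - 1].endswith(':')
        if word.lower() in SMALL_WORDS:
            # Small words are lowercased unless first, last, or after a colon.
            result.append(word.capitalize() if is_edge or after_colon else word.lower())
        elif re.search(r'\w\.\w', word):
            # Words with inline dots (example.com) are left untouched.
            result.append(word)
        elif any(c.isupper() for c in word[1:]):
            # Capitals after the first character (iTunes, AT&T): assume already correct.
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return ' '.join(result)

print(simple_titlecase('Nothing to be afraid of'))
# Nothing to Be Afraid Of
print(simple_titlecase('the iTunes store at example.com: a review'))
# The iTunes Store at example.com: A Review
```

Because words with interior capitals are skipped, input such as “AT&T” survives without a hard-coded special case, provided it is already capitalized correctly in the input.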

做个ˇ局外人 2024-10-01 05:04:48
capitalize (word)

This should do. I get it differently.

>>> mytext = u'i am a foobar bazbar'
>>> mytext.capitalize()
u'I am a foobar bazbar'
>>>

OK, as said in the reply above, you have to write a custom capitalize:

mytext = u'i am a foobar bazbar'

def xcaptilize(word):
    skipList = ['a', 'an', 'the', 'am']
    if word not in skipList:
        return word.capitalize()
    return word

k = mytext.split(" ")
l = map(xcaptilize, k)
print(" ".join(l))

This outputs

I am a Foobar Bazbar
↙温凉少女 2024-10-01 05:04:48
not_these = ['a', 'the', 'of']
thestring = 'the secret of a disappointed programmer'
print(' '.join(word
               if word in not_these
               else word.title()
               for word in thestring.capitalize().split(' ')))
"""Output:
The Secret of a Disappointed Programmer
"""

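Because the string is capitalized first, the title starts with a capitalized word ("The"), which does not match the lowercase article in the list and therefore stays capitalized.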

甲如呢乙后呢 2024-10-01 05:04:48

Python 2.7's title method has a flaw in it.

value.title()

will return Carpenter'S Assistant when value is Carpenter's Assistant

The best solution is probably the one from @BioGeek, using titlecase from Stuart Colville, which is the same solution proposed by @Etienne.
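
If pulling in a library is not an option, the Python documentation for `str.title` describes this apostrophe behavior and suggests a regular-expression workaround. A sketch along those lines follows; the function name `apostrophe_safe_title` is only illustrative, and the pattern handles ASCII letters only.

```python
import re

def apostrophe_safe_title(s):
    # Treat "letters, optionally followed by an apostrophe and more letters"
    # as one word, so the part after the apostrophe is not capitalized.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)

print("carpenter's assistant".title())
# Carpenter'S Assistant
print(apostrophe_safe_title("carpenter's assistant"))
# Carpenter's Assistant
```

Note that this only addresses the apostrophe problem; it still capitalizes small words such as of and the.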

牵你的手,一向走下去 2024-10-01 05:04:48

A one-liner using a list comprehension and the ternary operator:

reslt = " ".join([word.title() if word not in "the a on in of an".split() else word for word in "Wow, a python one liner for titles".split(" ")])
print(reslt)

Breakdown:

for word in "Wow, a python one liner for titles".split(" ") splits the string into a list and starts a for loop (in the list comprehension)

word.title() if word not in "the a on in of an".split() else word uses the native method title() to title-case the word if it is not an article; calling split() on the article string makes this a whole-word test rather than a substring test (otherwise a word like "he" would match inside "the")

" ".join joins the list elements with a separator of " " (a space)

萤火眠眠 2024-10-01 05:04:48

One important case that is not being considered is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid down-casing. With this approach, acronyms that are already upper case remain in upper case. The following code is a modification of that originally provided by dheerosaur.

# This is an attempt to provide an alternative to ''.title() that works with 
# acronyms.
# There are several tricky cases to worry about in typical order of importance:
# 0. Upper case first letter of each word that is not a 'minor' word.
# 1. Always upper case first word.
# 2. Do not down case acronyms
# 3. Quotes
# 4. Hyphenated words: drive-in
# 5. Titles within titles: 2001 A Space Odyssey
# 6. Maintain leading spacing
# 7. Maintain given spacing: This is a test.  This is only a test.

# The following code addresses 0-3 & 7.  It was felt that addressing the others 
# would add considerable complexity.


def titlecase(
    s,
    exceptions = (
        'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
        'for', 'in', 'of', 'on', 'per', 'to'
    )
):
    words = s.strip().split(' ')
        # split on single space to maintain word spacing
        # remove leading and trailing spaces -- needed for first word casing

    def upper(s):
        if s:
            if s[0] in '‘“"‛‟' + "'":
                return s[0] + upper(s[1:])
            return s[0].upper() + s[1:]
        return ''

    # always capitalize the first word
    first = upper(words[0])

    return ' '.join([first] + [
        word if word.lower() in exceptions else upper(word)
        for word in words[1:]
    ])


cases = '''
    CDC warns about "aggressive" rats as coronavirus shuts down restaurants
    L.A. County opens churches, stores, pools, drive-in theaters
    UConn senior accused of killing two men was looking for young woman
    Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,’ study reveals
    Maintain given spacing: This is a test.  This is only a test.
'''.strip().splitlines()

for case in cases:
    print(titlecase(case))

When run, it produces the following:

CDC Warns About "Aggressive" Rats as Coronavirus Shuts Down Restaurants
L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
UConn Senior Accused of Killing Two Men Was Looking for Young Woman
Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,’ Study Reveals
Maintain Given Spacing: This Is a Test.  This Is Only a Test.