Split strings that contain multiple substrings
I have a list of strings, names:
names = ['Acquaintance Muller', 'Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose']
I want to split the strings that contain more than one of the following substrings:
substrings = ['Vice president', 'Affiliate', 'Acquaintance']
More precisely, I want to split after the last character of the word that follows the substring:
desired_output = ['Acquaintance Muller', 'Vice president Johnson', 'Affiliate Peterson', 'Acquaintance Dr. Rose']
I don't know how to implement the 'more than one' condition in my code:
import re

names = ['Acquaintance Muller', 'Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose']
substrings = re.compile(r'Vice\spresident|Affiliate|Acquaintance')
splitted = []
for i in names:
    # a compiled pattern cannot be tested with `in`; search() checks for a match
    if substrings.search(i):
        splitted.append([])
        splitted[-1].append(i)
Exception: when that last character is a period (e.g. Prof.), split after the second word following the substring.
Update: names is more complex than I thought and follows
- the title-like pattern already answered correctly ('Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose')
- until a second pattern of strings follows ('Mister Kelly, AWS')
- until a third pattern of strings follows until the end ('Dr. Birker, Secretary Dr. Dews, Member Miss Berg, Secretary for Relations Dr. Jakob, Secretary')
names = ['Acquaintance Muller', 'Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose', 'Vice president Dr. John Mister Schmid, PRT Miss Robertson, FDU', 'Mister Kelly, AWS', 'Dr. Birker, Secretary Dr. Dews, Member Miss Berg, Secretary for Relations Dr. Jakob, Secretary']
Sometimes Secretary is followed by varying specifications. I don't care about the characters that sometimes follow Secretary until the next name occurs; they can be dropped. Of course, 'Secretary' itself should be stored, as in updated_output.
I created a (hopefully exhaustive) list, specifications, of the stuff that follows Secretary. Here is a representation of the list:
specifications = ['', ' of State', ' for Relations', ' for the Interior', ' for the Environment']
Updated question: how can I account for the third pattern using the specifications list?
updated_output = ['Acquaintance Muller', 'Vice president Johnson', 'Affiliate Peterson', 'Acquaintance Dr. Rose', 'Vice president Dr. John', 'Mister Schmid, PRT', 'Miss Robertson, FDU', 'Mister Kelly, AWS', 'Dr. Birker, Secretary of State', 'Dr. Dews, Member', 'Miss Berg, Secretary for Relations', 'Dr. Jakob, Secretary']
Comments (2)
You want to split at the word boundary just before one of those three titles, so you can look for a word boundary \b followed by a positive lookahead (?=...) for one of those titles:
Then, you can trim and discard the empty results:
With a list of input strings, simply apply this treatment to all of them:
Which gives:
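A minimal sketch of the approach described above, written for illustration and not taken from the answerer's own code. It assumes Python 3.7 or later, where re.split accepts a pattern that can match an empty string. The names TITLE_BOUNDARY and split_names are hypothetical.

```python
import re

# Zero-width split point: a word boundary immediately before one of the
# three titles from the question. Nothing is consumed by the match.
TITLE_BOUNDARY = re.compile(r'\b(?=Vice\spresident|Affiliate|Acquaintance)')


def split_names(names):
    """Split every string before each title, then trim and drop empty parts."""
    result = []
    for name in names:
        for part in TITLE_BOUNDARY.split(name):
            part = part.strip()
            if part:  # the split before the leading title yields an empty string
                result.append(part)
    return result


names = ['Acquaintance Muller',
         'Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose']
print(split_names(names))
```

Because the split happens before a title and not after the following word, 'Acquaintance Dr. Rose' stays in one piece without special handling for the period.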
Try:
Prints:
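For the updated question, one possible sketch (not the answerer's code) matches entries instead of splitting. It assumes every entry is either a title followed by a name, or an honorific, a name, a comma and a role, optionally followed by one of the specifications. The HONORIFICS list is an assumption read off the sample data, and extract_entries is a hypothetical name.

```python
import re

TITLES = ['Vice president', 'Affiliate', 'Acquaintance']
HONORIFICS = ['Mister', 'Miss', 'Dr.']  # assumption: inferred from the sample names
SPECIFICATIONS = [' of State', ' for Relations', ' for the Interior', ' for the Environment']

title_alt = '|'.join(map(re.escape, TITLES))
honorific_alt = '|'.join(map(re.escape, HONORIFICS))
spec_alt = '|'.join(map(re.escape, SPECIFICATIONS))

ENTRY = re.compile(
    # title, optional abbreviated prefix such as "Dr. ", then one name word
    rf'(?:{title_alt})\s(?:\w+\.\s)?\w+'
    # or: honorific, name, comma, role, optional specification
    rf'|(?:{honorific_alt})\s\w+,\s\w+(?:{spec_alt})?'
)


def extract_entries(names):
    """Collect every entry matched by ENTRY, in order of appearance."""
    return [match for name in names for match in ENTRY.findall(name)]


names = ['Acquaintance Muller',
         'Vice president Johnson Affiliate Peterson Acquaintance Dr. Rose',
         'Vice president Dr. John Mister Schmid, PRT Miss Robertson, FDU',
         'Mister Kelly, AWS',
         'Dr. Birker, Secretary Dr. Dews, Member Miss Berg, Secretary for Relations Dr. Jakob, Secretary']
print(extract_entries(names))
```

This keeps a specification when one is present in the input. To drop it instead, as the question allows, the optional specification group could be moved outside the matched text with a capturing group around the part to keep.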