How to remove duplicate words from a string in Python?

Posted 2024-12-10 11:44:39

For example:

string1 = "calvin klein design dress calvin klein"

How can I remove the second occurrences of "calvin" and "klein"?

The result should be:

string2 = "calvin klein design dress"

Only the second (later) occurrences should be removed, and the order of the words must not change.

清浅ˋ旧时光 2024-12-17 11:44:40

string = 'calvin klein design dress calvin klein'

def uniquify(string):
    output = []
    seen = set()  # words already written to output
    for word in string.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

print(uniquify(string))  # calvin klein design dress

杯别 2024-12-17 11:44:40

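You can use a set to keep track of words that have already been processed.

words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(result.rstrip())  # drop the trailing space left by the loop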

无声静候 2024-12-17 11:44:40

Several answers are close to this, but none ended up quite where I did:

def uniques(your_string):
    seen = set()
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)
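The one-liner relies on set.add always returning None: the expression "None or i" evaluates to i, so each new word is recorded in seen and yielded in a single step, while the "if i not in seen" filter runs first for every word. A sketch that spells the trick out as an explicit loop (uniques_explained is a hypothetical name, not from the answer):

```python
# set.add always returns None, and "None or x" evaluates to x
assert set().add("calvin") is None
assert (None or "calvin") == "calvin"

def uniques_explained(your_string):
    seen = set()
    out = []
    for i in your_string.split():
        if i not in seen:                 # the generator's filter clause
            out.append(seen.add(i) or i)  # records i in seen, then evaluates to i
    return ' '.join(out)

print(uniques_explained("calvin klein design dress calvin klein"))
# calvin klein design dress
```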

Of course, if you want it a little cleaner or faster, we can refactor it a bit:

def uniques(your_string):
    words = your_string.split()

    seen = set()
    seen_add = seen.add  # cache the bound method to avoid an attribute lookup per word

    def add(x):
        seen_add(x)
        return x

    return ' '.join(add(i) for i in words if i not in seen)

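I think the second version is about as fast as you can get with a small amount of code. (More code could do all the work in a single scan over the input string, but for most workloads this should be sufficient.)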

锦爱 2024-12-17 11:44:40

Question: remove the duplicates in a string.

from collections import OrderedDict

a = "Gina Gini Gini Protijayi"

aa = OrderedDict.fromkeys(a.split())
print(' '.join(aa))
# output => Gina Gini Protijayi

舂唻埖巳落 2024-12-17 11:44:40

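Use a NumPy function. It is best to import NumPy under an alias (np):

import numpy as np

To remove duplicates from an array, you can use:

no_duplicates_array = np.unique(your_array)

In your case, if you want the result as a string, you can use:

no_duplicates_string = ' '.join(np.unique(your_string.split()))

Note that np.unique returns the unique values in sorted order, not in their original order, so for the question's string this gives "calvin design dress klein" rather than "calvin klein design dress".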
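A sketch of an order-preserving NumPy variant (unique_words_np is a hypothetical name, not part of the original answer): np.unique(..., return_index=True) also returns the index of each unique value's first occurrence, and sorting those indices restores the original word order.

```python
import numpy as np

def unique_words_np(text):
    words = np.array(text.split())
    # return_index=True also returns the index of each value's first occurrence
    _, first_idx = np.unique(words, return_index=True)
    # sorting those indices puts the unique words back in first-seen order
    return ' '.join(words[np.sort(first_idx)])

print(unique_words_np("calvin klein design dress calvin klein"))
# calvin klein design dress
```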

许你一世情深 2024-12-17 11:44:40

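To remove duplicate words from a sentence while preserving the order of the words, you can use the dict.fromkeys method (dictionaries keep insertion order as of Python 3.7).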

string1 = "calvin klein design dress calvin klein"

words = string1.split()

result = " ".join(list(dict.fromkeys(words)))

print(result)
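dict.fromkeys compares words exactly, so "Calvin" and "calvin" count as different words. If case should be ignored, one possible sketch (dedupe_ignore_case is a hypothetical name) keys a dict on the lowercased word and keeps the first spelling it sees:

```python
def dedupe_ignore_case(text):
    first_seen = {}
    for word in text.split():
        # setdefault stores the word only the first time its lowercase form appears
        first_seen.setdefault(word.lower(), word)
    # dicts keep insertion order (Python 3.7+), so values come out in first-seen order
    return ' '.join(first_seen.values())

print(dedupe_ignore_case("Calvin Klein design dress calvin klein"))
# Calvin Klein design dress
```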

岁月染过的梦 2024-12-17 11:44:40

This works perfectly:

s = "the sky is blue very blue"
s = s.lower()  # compare words case-insensitively
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))  # the sky is blue very

卷耳 2024-12-17 11:44:40

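You can remove duplicate or repeated words from a text file or string with the following code:

import nltk
from collections import Counter
from nltk.tokenize import word_tokenize

# all_words (the input lines) and lemmatize_sentence() are the author's own
# objects; they are not defined in this answer
new_data = []
for lines in all_words:

    line = ''.join(lines.lower())
    new_data1 = ' '.join(lemmatize_sentence(line))
    new_data2 = word_tokenize(new_data1)
    new_data3 = nltk.pos_tag(new_data2)

    # below code is for removal of repeated words

    for i in range(0, len(new_data3)):
        # glues each (word, tag) pair into one string, e.g. ('dress', 'NN') -> 'dressNN'
        new_data3[i] = "".join(new_data3[i])
    # Counter keys keep first-insertion order (Python 3.7+), so repeats collapse in order
    UniqW = Counter(new_data3)
    new_data5 = " ".join(UniqW.keys())
    print(new_data5)

    new_data.append(new_data5)

print(new_data)

P.S. Adjust the indentation as required. Hope this helps!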

栀梦 2024-12-17 11:44:40

Without using the split function (this can help in interviews):

def unique_words2(a):
    words = []
    spaces = ' '
    length = len(a)
    i = 0
    while i < length:
        if a[i] not in spaces:
            word_start = i
            while i < length and a[i] not in spaces:
                i += 1
            words.append(a[word_start:i])
        i += 1
    words_stack = []
    for val in words:  #
        if val not in words_stack:  # We can replace these three lines with this one -> [words_stack.append(val) for val in words if val not in words_stack]
            words_stack.append(val)  #
    print(' '.join(words_stack))  # or return, your choice


unique_words2('calvin klein design dress calvin klein')
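The loop above only treats the space character as a separator, and its words_stack membership test scans a list, which becomes quadratic for long inputs. A possible variant under the same no-split constraint (unique_words3 is a hypothetical name) uses str.isspace() so tabs and newlines also separate words, and a set for constant-time lookups:

```python
def unique_words3(text):
    seen = set()
    result = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1  # skip any whitespace: spaces, tabs, newlines
            continue
        start = i
        # advance to the end of the current word
        while i < n and not text[i].isspace():
            i += 1
        word = text[start:i]
        if word not in seen:  # set lookup is O(1) on average
            seen.add(word)
            result.append(word)
    return ' '.join(result)

print(unique_words3('calvin klein design dress calvin klein'))
# calvin klein design dress
```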

不寐倦长更 2024-12-17 11:44:40

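Initializing the list:

listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']

print("Given list : ", listA)

Using set() and split():

res = [set(sub.split('-')) for sub in listA]

Result (each element of res is a set of the hyphen-separated parts, so the parts are no longer joined into a string and their order is not guaranteed):

print("List after duplicate removal :", res)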
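If each element should stay a string with its parts in their original order, one sketch (swapping set() for dict.fromkeys(), which is not what the answer used) joins the de-duplicated parts back together:

```python
listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']

# dict.fromkeys drops repeated parts but keeps their first-seen order,
# and '-'.join turns each element back into a string
res = ['-'.join(dict.fromkeys(sub.split('-'))) for sub in listA]

print("List after duplicate removal :", res)
# List after duplicate removal : ['xy', 'pq-qr', 'xp', 'dd-ee']
```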

梦年海沫深 2024-12-17 11:44:40

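import re

# Path to your file
file_path = r"g:\Pyton+ChatGPT\dictionar_no_duplicates.txt"

# Read the file contents
with open(file_path, "r", encoding="utf-8") as file:
    text = file.read()

# Remove duplicate words
result = re.sub(r'\b(\w+)\b(?=.*\b\1\b)', '', text)

# Collapse runs of whitespace (including newlines) and remove spaces before commas
result = re.sub(r'\s+', ' ', result).strip().replace(" ,", ",")

# Rewrite the file with the de-duplicated content
with open(file_path, "w", encoding="utf-8") as file:
    file.write(result)

Note: the lookahead removes a word whenever the same word appears again later, so this keeps the last occurrence of each word rather than the first. Also, because . does not match newlines by default, duplicates are only detected within the same line.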

Or this:

def remove_duplicates(words):
    words_stack = []
    for val in words:
        if val not in words_stack:
            words_stack.append(val)
    return words_stack

input_file = r'g:\Pyton+ChatGPT\dictionar.txt'
output_file = r'g:\Pyton+ChatGPT\dictionar_no_duplicates.txt'

with open(input_file, 'r', encoding='utf-8') as f:
    words = f.read().splitlines()

unique_words = remove_duplicates(words)

with open(output_file, 'w', encoding='utf-8') as f:
    for word in unique_words:
        f.write(word + '\n')

print("Duplicate removal completed.")

Or this:

import re

# Path to your file
file_path = r"g:\Pyton+ChatGPT\dictionar_no_duplicates.txt"

# Read the file contents
with open(file_path, "r", encoding="utf-8") as file:
    text = file.read()

# List of the words that were removed
removed_words = []

# Callback that records each removed duplicate word (once) in the list
def replace_and_collect(match):
    word = match.group(1)
    if word not in removed_words:
        removed_words.append(word)
    return ''

# Remove duplicate words and their trailing comma using the callback
result = re.sub(r'\b(\w+)\b,?(?=.*\b\1\b)', replace_and_collect, text)

# Collapse whitespace, remove spaces before commas, trim stray commas/spaces at the ends
result = re.sub(r'\s+', ' ', result).strip().replace(" ,", ",").strip(", ")

# Rewrite the file with the de-duplicated content
with open(file_path, "w", encoding="utf-8") as file:
    file.write(result)

# Show information about the removed words
print(f"Number of distinct duplicate words removed: {len(removed_words)}")
print(f"Removed words: {', '.join(removed_words)}")
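The lookahead pattern r'\b(\w+)\b(?=.*\b\1\b)' removes a word whenever it appears again later, so it keeps the last occurrence of each word. If the first occurrence should win, as the question asks, one alternative sketch (not the author's code; dedupe_keep_first is a hypothetical name) matches every word and uses a set inside the replacement callback to blank out repeats:

```python
import re

def dedupe_keep_first(text):
    seen = set()

    def repl(match):
        word = match.group(0)
        if word in seen:
            return ''   # drop every repeat after the first occurrence
        seen.add(word)
        return word

    result = re.sub(r'\b\w+\b', repl, text)
    # collapse the gaps left by removed words (this also joins lines with spaces)
    return re.sub(r'\s+', ' ', result).strip()

print(dedupe_keep_first("calvin klein design dress calvin klein"))
# calvin klein design dress
```

Because no lookahead is involved, repeats on different lines are also caught.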

冷…雨湿花 2024-12-17 11:44:40

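You can take the set of the words, which by definition contains no repeated elements, then sort it by each word's first position in the original list so the order is preserved, and join the words back into a string:

def remove_duplicate_words(string):
    x = string.split()
    x = sorted(set(x), key=x.index)
    return ' '.join(x)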

格子衫的從容 2024-12-17 11:44:39

string1 = "calvin klein design dress calvin klein"
words = string1.split()
print (" ".join(sorted(set(words), key=words.index)))

This sorts the set of all the (unique) words in your string by each word's index in the original list of words.
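Because words.index scans the list from the start on every call, this sort costs roughly O(n*k) for n words and k unique words. A sketch of the same sort-by-first-index idea in linear time (unique_in_order is a hypothetical name) records each word's first position in one pass:

```python
def unique_in_order(text):
    words = text.split()
    first_index = {}
    for i, w in enumerate(words):
        first_index.setdefault(w, i)  # remember only the first position of each word
    # same key idea as words.index, but each lookup is a dict access
    return ' '.join(sorted(first_index, key=first_index.get))

print(unique_in_order("calvin klein design dress calvin klein"))
# calvin klein design dress
```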

阳光下慵懒的猫 2024-12-17 11:44:39

def unique_list(l):
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist

a="calvin klein design dress calvin klein"
a=' '.join(unique_list(a.split()))

停顿的约定 2024-12-17 11:44:39

In Python 2.7+, you can use collections.OrderedDict for this:

from collections import OrderedDict
s = "calvin klein design dress calvin klein"
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))

那片花海 2024-12-17 11:44:39

Cut and paste from the itertools recipes:

from itertools import filterfalse  # Python 3 name of Python 2's ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

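I really wish they would go ahead and turn those recipes into a module soon. I'd very much like to be able to write from itertools_recipes import unique_everseen instead of cutting and pasting every time I need one. (The third-party more-itertools package does ship unique_everseen.)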

Use it like this:

def unique_words(string, ignore_case=False):
    key = None
    if ignore_case:
        key = str.lower
    return " ".join(unique_everseen(string.split(), key=key))

string2 = unique_words(string1)

飘过的浮云 2024-12-17 11:44:39

string2 = ' '.join(set(string1.split()))

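Explanation:

.split() - splits the string into a list (with no arguments it splits on runs of whitespace)
set() - an unordered collection type that excludes duplicates
'separator'.join(list) - joins the list passed as the argument into a string, with 'separator' between the elements

Note that because a set is unordered, the words can come out in a different order than in string1, so on its own this does not meet the question's requirement that the word order stay unchanged.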
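If the order must be kept, one minimal change to this one-liner (a swap, not the original answer) is to replace set() with dict.fromkeys(), which also drops duplicates but, since Python 3.7, keeps its keys in insertion order:

```python
string1 = "calvin klein design dress calvin klein"

# dict.fromkeys keeps the first occurrence of each word, in order
string2 = ' '.join(dict.fromkeys(string1.split()))

print(string2)
# calvin klein design dress
```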
