Performance, Python: get the union of two lists of tuples based on 2 of each tuple's 3 elements

Posted on 2024-10-03 01:41:09

My program is not performing well. In a loop, data from each processor (a list of tuples) is gathered on the master processor, which needs to clean it by removing similar elements.

I found a lot of interesting hints on the internet, and especially on this site, about taking the union of lists. However, I have not managed to apply them to my problem.
My aim is to get rid of any tuple whose last two elements match those of another tuple in the list. For example:

list1=[[a,b,c],[d,e,f],[g,h,i]]
list2=[[b,b,c],[d,e,a],[k,h,i]]
the result should be:
final=[[a,b,c],[d,e,f],[g,h,i],[d,e,a]]

Right now I'm using loops and break, but I'm hoping to make this process faster.

Here is what my code looks like (result and temp are the lists I want to take the union of), on Python 2.6.

for k in xrange(len(temp)):
    u=0
    #index= next((j for j in xrange(lenres) if temp[k][1:3] == result[j][1:3]),None)
    for j in xrange(len(result)):
        if temp[k][1:3] == result[j][1:3]:
            u=1
            break
    if u==0:
    #if index is None:
        result.append([temp[k][0],temp[k][1],temp[k][2]])

Thanks for your help

Herve

极致的悲 2024-10-10 01:41:09

Below is our uniques function. It takes arguments l (a list) and f (a function) and returns the list with duplicates removed (order preserved). Duplicates are defined as follows: b is a duplicate of a iff f(b) == f(a).

def uniques(l, f = lambda x: x):
    return [x for i, x in enumerate(l) if f(x) not in [f(y) for y in l[:i]]]

We define lastTwo as follows:

lastTwo = lambda x: x[-2:]

For your problem we use it as follows:

>>> list1
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')]
>>> list2
[('b', 'b', 'c'), ('d', 'e', 'a'), ('k', 'h', 'i')]
>>> uniques(list1+list2, lastTwo)
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('d', 'e', 'a')]

If the use case you describe comes up a lot, you may want to define

def hervesMerge(l1, l2):
    return uniques(l1+l2, lambda x: x[-2:])

Identity is our default f, but it can be anything (so long as it is defined for all elements of the list, since they can be of any type).

f can be the sum of a list, the odd elements of a list, the prime factors of an integer, anything. (Just remember that if it's injective there's no point! Adding a constant, applying a linear function, etc. will work no differently than the identity, because it is f(x) == f(y) with x != y that makes the difference.)

>>> list1
[(1, 2, 3, 4), (2, 5), (6, 2, 2), (3, 4), (8, 3), (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)]
>>> uniques(list1, sum)
[(1, 2, 3, 4), (2, 5), (8, 3)]
>>> import operator  # reduce is a builtin on Python 2; on Python 3 it lives in functools
>>> uniques(list1, lambda x: reduce(operator.mul, x))  # product
[(1, 2, 3, 4), (2, 5), (3, 4), (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)]
>>> uniques([1,2,3,4,1,2]) #defaults to identity
[1, 2, 3, 4]
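
The remark about injective functions can be checked with a small sketch. It is written in Python 3 compatible syntax, uniques is repeated so the block runs on its own, and the sample data and the x % 3 key are made up for illustration.

```python
def uniques(l, f=lambda x: x):
    # Keep x only if no earlier element has the same key f(x).
    return [x for i, x in enumerate(l) if f(x) not in [f(y) for y in l[:i]]]

data = [1, 2, 3, 4, 1, 2]  # illustrative sample data

by_identity = uniques(data)
# Injective key: distinct inputs always get distinct keys,
# so the result is the same as with the identity.
by_injective = uniques(data, lambda x: x + 100)
# Non-injective key: 1 and 4 share the key 1, so 4 is dropped as a duplicate.
by_mod3 = uniques(data, lambda x: x % 3)

print(by_identity)
print(by_injective)
print(by_mod3)
```

Only the last call removes anything beyond the exact repeats, because it is the only key that maps two different values to the same result.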

You seemed concerned about speed, but my answer really focused on brevity/flexibility without significant (or any?) speed improvement. For bigger lists where speed is a concern, you want to take advantage of hash-based membership checks and the fact that list1 and list2 are known to have no duplicates:

>>> s = frozenset(i[-2:] for i in list1)
>>> ans = list(list1) #copy list1
>>> for i in list2:
...     if i[-2:] not in s: ans.append(i)
...
>>> ans
[('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i'), ('d', 'e', 'a')]
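
If the no-duplicates assumption does not hold (for example when several processors send overlapping data), one possible variant grows the set while merging, so duplicates inside the second list are filtered too. This is only a sketch: merge_unique and its key parameter are hypothetical names, not part of the code above, and the extra ('z', 'e', 'a') tuple is added to the sample data purely for illustration.

```python
def merge_unique(result, temp, key=lambda t: tuple(t[-2:])):
    """Return result plus the items of temp whose key has not been seen yet."""
    seen = set(key(t) for t in result)
    merged = list(result)  # copy, so the caller's list is left untouched
    for t in temp:
        k = key(t)
        if k not in seen:
            seen.add(k)  # remember the key so later duplicates in temp are skipped
            merged.append(t)
    return merged

list1 = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')]
# ('z', 'e', 'a') shares its last two elements with ('d', 'e', 'a')
list2 = [('b', 'b', 'c'), ('d', 'e', 'a'), ('k', 'h', 'i'), ('z', 'e', 'a')]

final = merge_unique(list1, list2)
print(final)
```

Each membership test is a hash lookup, so the merge does work proportional to len(result) + len(temp) instead of their product. The tuple() call in the default key also lets the same function accept lists of lists.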

Or, if the order of the result does not matter:

>>> d = dict()
>>> for i in list2 + list1:
...     d[i[-2:]] = i
...
>>> d.values()
[('d', 'e', 'f'), ('a', 'b', 'c'), ('g', 'h', 'i'), ('d', 'e', 'a')]
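
On a newer interpreter than the 2.6 used in the question, the dict approach can also keep the order: from Python 3.7 on, dicts preserve insertion order. A minimal sketch under that assumption, using setdefault so the first tuple seen for each key wins:

```python
list1 = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')]
list2 = [('b', 'b', 'c'), ('d', 'e', 'a'), ('k', 'h', 'i')]

d = {}
for t in list1 + list2:
    # setdefault stores t only if the key is new, so list1 entries take priority
    d.setdefault(t[-2:], t)

final = list(d.values())  # insertion order is guaranteed on Python 3.7+
print(final)
```

Iterating list1 first with setdefault replaces the list2 + list1 trick above, where later assignments overwrite earlier ones.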

--Edit--

You should always be able to avoid un-pythonic looping like the code posted in your question. Here is your exact code with the loops changed:

for k in temp:
    u = 0
    for j in result:
        if k[1:3] == j[1:3]:
            u = 1
            break
    if u == 0:
    #if index is None:
        result.append([k[0], k[1], k[2]])  # or simply result.append(k) if no copy is needed

result and temp are iterable, and anything iterable can be put directly in the for loop without ranges. If for some reason you explicitly need the index (this is not such a case, but I have one above), you can use enumerate.
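
As a further sketch of the same idea, the flag variable u can be replaced by the builtin any(), and enumerate covers the cases where the index is really needed. The sample data reuses the lists from the question with string values filled in as an assumption; positions is a hypothetical name used only to show enumerate.

```python
result = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
temp = [['b', 'b', 'c'], ['d', 'e', 'a'], ['k', 'h', 'i']]

for k in temp:
    # any() stops at the first match, like the break in the explicit loop
    if not any(k[1:3] == j[1:3] for j in result):
        result.append(list(k))  # list(k) makes a copy of the row

# enumerate yields (index, item) pairs when the position is needed
positions = [(i, row[0]) for i, row in enumerate(result)]

print(result)
print(positions)
```

This is still a nested scan, so it is tidier rather than faster; the set-based versions above are the ones that change the running time.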


枯寂 2024-10-10 01:41:09

Here's a simple solution using a set:

list1=[('a','b','c'),('d','e','f'),('g','h','i')]
list2=[('b','b','c'),('d','e','a'),('k','h','i')]

set1 = set([A[1:3] for A in list1])
final = list1 + [A for A in list2 if A[1:3] not in set1]

However, if your list1 and list2 aren't actually made of tuples, then you will have to put tuple() around A[1:3].
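
For instance, with lists of lists as in the question, a sketch of the same solution with the tuple() conversion looks like this (the string values are assumed sample data):

```python
list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
list2 = [['b', 'b', 'c'], ['d', 'e', 'a'], ['k', 'h', 'i']]

# Slices of lists are lists, which are unhashable, so convert them to tuples
# before putting them in a set or testing membership.
set1 = set(tuple(A[1:3]) for A in list1)
final = list1 + [A for A in list2 if tuple(A[1:3]) not in set1]

print(final)
```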
