Intermediate variable in a list comprehension for simultaneous filtering and transformation

Posted on 2024-09-30 18:43:50

I have a list of vectors (in Python) that I want to normalize, while at the same time removing the vectors that originally had small norms.

The input list is, e.g.

a = [(1,1),(1,2),(2,2),(3,4)]

And I need the output to be (x*n, y*n) with n = (x**2+y**2)**-0.5

If I just needed the norms, for example, that would be easy with a list comprehension:

an = [ (x**2+y**2)**0.5 for x,y in a ]

It would also be easy to store just a normalized x, for example, but what I want is a temporary variable "n" that is used in two calculations and then thrown away.

I can't just use a lambda function either, because I also need n to filter the list. So what is the best way?

Right now I am using this nested comprehension (a list comprehension over an inner generator expression):

a = [(1,1),(1,2),(2,2),(3,4)]

[(x*n,y*n) for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 0.4]

# Out[14]: 
# [(0.70710678118654757, 0.70710678118654757),
#  (0.60000000000000009, 0.80000000000000004)]

The inner generator expression produces tuples with an extra value (n), and I then use these values for the calculation and the filtering. Is this really the best way? Are there any terrible inefficiencies I should be aware of?

开始看清了 2024-10-07 18:43:50

Is this really the best way?

Well, it does work efficiently, and if you really, really want to write one-liners then it's the best you can do.

On the other hand, a simple generator function of a few lines does the same thing much more clearly:

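def normfilter(vecs, max_inv_norm):
    # Yield each kept vector scaled to unit length.
    for x, y in vecs:
        n = (x**2. + y**2.)**-0.5   # n is the inverse norm, 1/|v|
        if n < max_inv_norm:        # same test as the one-liner: n < 0.4
            yield (x*n, y*n)

vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
normalized = list(normfilter(vectors, 0.4))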

Btw, mind the direction of the comparison: n is the inverse norm, so n < 0.4 keeps the vectors whose norm is above 2.5 and drops the short ones, which is easy to misread as the opposite.
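One way to make the direction of the filter harder to get wrong is to compare the norm itself against the threshold and only then divide. The following is a minimal sketch under that assumption; the name normalize_long and the min_norm value of 2.5 (the norm equivalent of the 0.4 inverse-norm cutoff in the question) are illustrative choices, not part of the original code.

```python
import math

def normalize_long(vecs, min_norm):
    """Yield unit-length copies of the vectors whose norm exceeds min_norm."""
    for x, y in vecs:
        norm = math.hypot(x, y)      # Euclidean length of (x, y)
        if norm > min_norm:          # short vectors are skipped
            yield (x / norm, y / norm)

vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
normalized = list(normalize_long(vectors, 2.5))
# keeps (2, 2) and (3, 4); (1, 1) and (1, 2) have norms below 2.5
```

Dividing by the norm instead of multiplying by its inverse costs nothing measurable here and keeps the threshold in the same units as the vectors.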

傲娇萝莉攻 2024-10-07 18:43:50

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), it's possible to bind a local variable within a list comprehension in order to avoid evaluating the same expression multiple times.

In our case, we can name the result of (x**2.+y**2.)**-.5 as a variable n inside the filter condition, keeping a vector only if n is less than 0.4, and then reuse n to produce the mapped value:

# vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]
[(x*n, y*n) for x, y in vectors if (n := (x**2.+y**2.)**-.5) < .4]
# [(0.7071067811865476, 0.7071067811865476), (0.6000000000000001, 0.8)]
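On interpreters older than 3.8, a comparable effect can be sketched with a one-element inner for clause, which binds n once per vector. This is an alternative idiom rather than something taken from the answer above, and the vectors list is simply the sample input from the question.

```python
vectors = [(1, 1), (1, 2), (2, 2), (3, 4)]

# "for n in [expr]" iterates over a single-item list, so it acts as
# an assignment of expr to n for the current (x, y).
result = [
    (x * n, y * n)
    for x, y in vectors
    for n in [(x**2. + y**2.)**-.5]
    if n < .4
]
```

The extra for clause builds a throwaway one-item list per vector, so it reads less directly than := and is mainly useful when the newer syntax is unavailable.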

空城仅有旧梦在 2024-10-07 18:43:50

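The timings below suggest that a plain for loop might be the fastest way. Be sure to check the timeit results on your own machine, as they can vary with a number of factors (hardware, OS, Python version, length of a, etc.). Note that two_lcs and using_forloop as written compute the norm (**0.5) while using_lc computes the inverse norm (**-0.5), so for this input the first two return an empty list and the three timings are not measuring identical work.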

a = [(1,1),(1,2),(2,2),(3,4)]

def two_lcs(a):
    # n is the norm here (**0.5), unlike the inverse norm in using_lc
    an = [ ((x**2+y**2)**0.5, x,y) for x,y in a ]
    an = [ (x*n,y*n) for n,x,y in an if n < 0.4 ]
    return an

def using_forloop(a):
    result=[]
    for x,y in a:
        n=(x**2+y**2)**0.5  # norm, as in two_lcs
        if n<0.4:
            result.append((x*n,y*n))
    return result

def using_lc(a):    
    return [(x*n,y*n)
            for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 0.4]

yields these timeit results:

% python -mtimeit -s'import test' 'test.using_forloop(test.a)'
100000 loops, best of 3: 3.29 usec per loop
% python -mtimeit -s'import test' 'test.two_lcs(test.a)'
100000 loops, best of 3: 4.52 usec per loop
% python -mtimeit -s'import test' 'test.using_lc(test.a)'
100000 loops, best of 3: 6.97 usec per loop
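Before comparing timings it may be worth checking that the candidates return the same list, since a variant that filters everything out will look fast for the wrong reason. A rough sketch using the standard timeit module follows; the function names loop_version and comprehension_version and the repeat counts are illustrative assumptions, and both variants here use the inverse norm.

```python
import timeit

a = [(1, 1), (1, 2), (2, 2), (3, 4)]

def loop_version(vecs):
    result = []
    for x, y in vecs:
        n = (x**2. + y**2.)**-0.5      # inverse norm
        if n < 0.4:
            result.append((x*n, y*n))
    return result

def comprehension_version(vecs):
    return [(x*n, y*n)
            for (n, x, y) in (((x**2. + y**2.)**-0.5, x, y) for x, y in vecs)
            if n < 0.4]

# Only time functions that agree on the output.
assert loop_version(a) == comprehension_version(a)

for func in (loop_version, comprehension_version):
    # best of 3 runs of 10000 calls each, reported per call
    best = min(timeit.repeat(lambda: func(a), number=10000, repeat=3))
    print(f"{func.__name__}: {best / 10000 * 1e6:.2f} usec per loop")
```

No expected timings are given here, since they depend on the machine and the Python version.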

慢慢从新开始 2024-10-07 18:43:50

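Stealing the code from unutbu, here is a larger test including a numpy version and the iterator version. Note that converting the list to a numpy array costs some time, which is not part of the using_numpy timing below because npa is built when the test module is imported.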

import numpy

# a = [(1,1),(1,2),(2,2),(3,4)]
a=[]
for k in range(1,10):
    for j in range(1,10):
        a.append( (float(k),float(j)) )

npa = numpy.array(a)

def two_lcs(a):
    an = [ ((x**2+y**2)**-0.5, x,y) for x,y in a ]
    an = [ (x*n,y*n) for n,x,y in an if n < 5.0 ]
    return an

def using_iterator(a):
    def normfilter(vecs, min_norm):
        for x,y in vecs:
            n = (x**2.+y**2.)**-0.5
            if n < min_norm:
                yield (x*n,y*n)

    return list(normfilter(a, 5.0))

def using_forloop(a):
    result=[]
    for x,y in a:
        n=(x**2+y**2)**-0.5
        if n<5.0:
            result.append((x*n,y*n))
    return result

def using_lc(a):    
    return [(x*n,y*n)
            for (n,x,y) in (( (x**2.+y**2.)**-0.5 ,x,y) for x,y in a) if n < 5.0]


def using_numpy(npa):
    n = (npa[:,0]**2+npa[:,1]**2)**-0.5
    where = n<5.0
    npa = npa[where]
    n = n[where]
    npa[:,0]=npa[:,0]*n
    npa[:,1]=npa[:,1]*n
    return( npa )

and the result...

nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.two_lcs(test.a)'
10000 loops, best of 3: 65.8 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_lc(test.a)'
10000 loops, best of 3: 65.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_forloop(test.a)'
10000 loops, best of 3: 64.1 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_iterator(test.a)'
10000 loops, best of 3: 59.6 usec per loop
nlw@pathfinder:~$ python -mtimeit -s'import test' 'test.using_numpy(test.npa)'
10000 loops, best of 3: 48.7 usec per loop
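The NumPy variant can also be written without assigning column by column, by letting broadcasting scale each row. The sketch below is an assumption-laden alternative, not the code that was timed above: normalize_filtered and its max_inv_norm parameter are hypothetical names, and the conversion from a list happens inside the function, so it would be included in any timing of it.

```python
import numpy as np

def normalize_filtered(vecs, max_inv_norm):
    """Return unit vectors for rows whose inverse norm is below max_inv_norm."""
    arr = np.asarray(vecs, dtype=float)          # shape (N, 2)
    inv = (arr ** 2).sum(axis=1) ** -0.5         # inverse norm per row
    keep = inv < max_inv_norm                    # boolean mask, shape (N,)
    # inv[keep][:, None] has shape (M, 1) and broadcasts across both columns
    return arr[keep] * inv[keep][:, None]

out = normalize_filtered([(1, 1), (1, 2), (2, 2), (3, 4)], 0.4)
```

With the sample input from the question, out holds the normalized (2, 2) and (3, 4) rows; a zero vector in the input would produce a division warning and is not handled here.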
