使用python的lambda、map的高效方法

发布于 2024-08-03 02:56:07 字数 343 浏览 11 评论 0原文

我需要在 Bigtable(db) 中存储一个大的整数列表。为了提高效率,我将它们存储为两个连续项目之间的差异。

例如:

 original_list = [1005, 1004, 1003, 1004, 1006] 

将上面的列表(实际上包含超过 1000k 个项目)存储为

start = 1005
diff = [-1, -1, 1, 2]

我可以管理的最接近的是,

ltp = [start]
map(lambda x: ltp.append(ltp[-1] + x), tick)

我正在寻找一种有效的方法将其转换回原始列表。

I need to store a big list of integers in Bigtable(db). For efficiency I am storing them as diff between 2 consecutive items.

for eg:

 original_list = [1005, 1004, 1003, 1004, 1006] 

Storing the above list(which actually contains more than 1000k items) as

start = 1005
diff = [-1, -1, 1, 2]

The closest I could manage is,

ltp = [start]
map(lambda x: ltp.append(ltp[-1] + x), tick)

I am looking for an efficient way to convert it back into original list.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

策马西风 2024-08-10 02:56:07

对于如此大的数据结构,numpy 可以很好地工作。对于这个例子,它的速度超过 200 倍(见下文),并且更容易编码,基本上只是

add.accumulate(diff)

numpy 和直接列表操作之间的比较:

import numpy as nx
import timeit

N = 10000

diff_nx = nx.zeros(N, dtype=nx.int)
diff_py = list(diff_nx)

start = 1005

def f0():
    orig = [start]
    for x in diff_py: 
        orig.append(orig[-1] + x)

def f1():
    diff_nx[0] = start
    nx.add.accumulate(diff_nx)

t = timeit.Timer("f0()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print t.timeit(number=1000)
t = timeit.Timer("f1()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print t.timeit(number=1000)

给出

13.4044158459     # for list looping
0.0474112033844   # for numpy accumulate

实际上,不过,重用已建立的压缩似乎更好算法,就像可以使用 PyTables 轻松完成,而不是像您正在做的那样滚动自己这里。

另外,在这里,我建议您在读取数据时为前置的开始术语留出空间,而不是使用前置的术语重建列表,当然,这样您就不必进行复制。

For such large data structures numpy will work well. For this example, it's over 200x faster (see below), and a bit easier to code, basically just

add.accumulate(diff)

Comparison between numpy and direct list manipulation:

import numpy as nx
import timeit

N = 10000

diff_nx = nx.zeros(N, dtype=nx.int)
diff_py = list(diff_nx)

start = 1005

def f0():
    orig = [start]
    for x in diff_py: 
        orig.append(orig[-1] + x)

def f1():
    diff_nx[0] = start
    nx.add.accumulate(diff_nx)

t = timeit.Timer("f0()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print t.timeit(number=1000)
t = timeit.Timer("f1()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print t.timeit(number=1000)

gives

13.4044158459     # for list looping
0.0474112033844   # for numpy accumulate

Really, though, it seems better to reuse an established compression algorithm, like can easily be done with PyTables, rather than rolling your own like it seems that you're doing here.

Also, here, I'm suggesting that you read in the data with room for the prepended start term, rather than rebuild the list with the prepended term, of course, so you don't have to do the copy.

夏末染殇 2024-08-10 02:56:07

以下对我有用:

orig = [start]
for x in diff:
    orig.append(orig[-1] + x)

使用 map 将创建一个相同大小的新数组,并填充 None。我还发现一个简单的 for 循环更具可读性,在这种情况下,速度尽可能快。

The following works for me:

orig = [start]
for x in diff:
    orig.append(orig[-1] + x)

Using map will create an new array of the same size, filled with None. I also find a simple for loop more readable, and in this case as fast as you can get.

不念旧人 2024-08-10 02:56:07

非常适合发电机:

def diff2abs( diffs, start ):
    yield start
    for diff in diffs:
        start += diff
        yield start

start = 1005
diffs = [-1, -1, 1, 2]
original_list = list( diff2abs( diffs, start ))

Perfect for generators:

def diff2abs( diffs, start ):
    yield start
    for diff in diffs:
        start += diff
        yield start

start = 1005
diffs = [-1, -1, 1, 2]
original_list = list( diff2abs( diffs, start ))
花海 2024-08-10 02:56:07

其他几个受访者对您要求的算法有合理的实现,但我不清楚您真正想要解决的问题到底是什么。

除非存储的数字非常大(即溢出整数并需要 bignum),否则差异列表不会为您带来任何效率 - 整数是来自 Python 运行时 POV 的整数,因此您的示例“diff”列表[-1, -1, 1, 2] 将消耗与原始列表 [1005, 1004, 1003, 1004, 1006] 一样多的内存。

Several of the other respondents have reasonable implementations of the algorithm you asked for, but I'm unclear on exactly what problem it is you're really trying to solve.

Unless the numbers being stored are very large (i.e., overflow an integer and require bignums), your list of diffs won't gain you any efficiency -- an integer is an integer from the Python runtime POV, so your example "diff" list of [-1, -1, 1, 2] will consume just as much memory as the original list [1005, 1004, 1003, 1004, 1006].

剩一世无双 2024-08-10 02:56:07
class runningtotal:
    def __init__(self, start = 0):
        self.total = start
    def __call__(self, value):
        self.total += value
        return self.total

现在尝试:

>>> map(runningtotal(start), [0,]+diff)
[1005, 1004, 1003, 1004, 1006]
class runningtotal:
    def __init__(self, start = 0):
        self.total = start
    def __call__(self, value):
        self.total += value
        return self.total

Now try:

>>> map(runningtotal(start), [0,]+diff)
[1005, 1004, 1003, 1004, 1006]
墨小沫ゞ 2024-08-10 02:56:07

正如 mshsayem 建议的那样,使用列表推导式 - 它们通常比 for 循环或 map/lambda 更快(根据 Mark Lutz 的《学习 Python》一书)。

如果你真的想使用更 FP 式的解决方案,正确的函数是“scan”,[我相信]它没有在 Python 中实现,所以你必须自己实现它(这不是一项艰巨的任务)。

“扫描”基本上是一个减少,但它不是将列表减少为单个值,而是将每次“迭代”的结果存储在新列表中。

如果你实现了它,你可以这样做:

scan(lambda x,y: x+y, [start]++diff)

As mshsayem suggested, use list comprehensions - they are generally faster than for loops or map/lambdas (according do Mark Lutz's book Learning Python).

If you really want to use an more FP-ish solution, the proper function would be "scan", wich [I believe] isn't implemented in Python so you would have to implement it yourself (which is not a hard task).

"scan" is basically a reduce, but instead of reducing the list to a single value, it stores the result of each "iteration" in a new list.

If you implemented it, you could do something like:

scan(lambda x,y: x+y, [start]++diff)
甜扑 2024-08-10 02:56:07

虽然我不明白为什么这应该更有效,但我很确定 for 循环将提供最佳性能:

l = [start]
for i in diff:
    l.append(l[-1] + i)

Although I don't get why this should be more efficient, I am pretty sure a for loop will give the best performance:

l = [start]
for i in diff:
    l.append(l[-1] + i)
安人多梦 2024-08-10 02:56:07

我不知道您将整数存储为差异的原因 - rcoder 给出了一个很好的答案,说明为什么这通常并不比存储整数本身更有效 - 但如果您不需要访问整个列表立刻,使用生成器在内存方面会更有效。既然你说这是一个“大列表”,那么你可以通过这种方式节省大量内存,而不是一次分配整个列表。下面是一个生成器理解来获取您的列表:

start = 1005
def mod_start(x):
    global start
    start += x
    return start
int_generator = (mod_start(i) for i in diffs)

然后您可以像任何列表一样迭代 int_generator ,而无需立即将整个列表存储在内存中。但请注意,您不能为生成器添加下标或切片,但可以在许多有用的情况下使用它。

您可以清理该示例,以便起始变量不需要是全局的。它只是不能位于 mod_start 函数的本地。

编辑:您不必使用生成器理解来获取生成器。您还可以将生成器函数与yield 表达式一起使用,就像 THC4k 所做的那样。这避免了起始变量作用域问题,并且可能更干净一些。您还可以随时从生成器获取列表,方法是将其传递给 list() 内置函数。

I don't know about your reasoning for storing the integers as diffs -- rcoder gave a good answer about why this generally is not more efficient than storing the integers themselves -- but if you don't need to have access to the entire list at once, it's more efficient memory-wise for you to use a generator. Since you say this is a "big list," you can save a lot of memory this way, instead of allocating the entire list at once. Here's a generator comprehension to get your list back:

start = 1005
def mod_start(x):
    global start
    start += x
    return start
int_generator = (mod_start(i) for i in diffs)

You can then iterate over int_generator like you would any list, without having the entire list in memory at once. Note, however, that you cannot subscript or slice a generator, but you can use it in many useful situations.

You can clean up the example so that the start variable does not need to be global. It just can't be local to the mod_start function.

Edit: You don't have to use the generator comprehension to get a generator. You can also use a generator function with the yield expression, like THC4k did. That avoids the start variable scope issue and is probably a little cleaner. You can also get a list from a generator at any time by passing it to the list() built-in function.

野鹿林 2024-08-10 02:56:07

不评论这个的性能,但是你可以在这里使用reduce。

start = 1005
diffs = [-1,-1,1,2]
reduce(lambda undiffed_list, diff: undiffed_list + [undiffed_list[-1] + diff],diffs,[start])

给你你想要的。

No comment on the performance of this, but you can use reduce here.

start = 1005
diffs = [-1,-1,1,2]
reduce(lambda undiffed_list, diff: undiffed_list + [undiffed_list[-1] + diff],diffs,[start])

gets you what you want.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文