Haskell 中动态规划的高效表
我已经用Haskell编写了0-1背包问题。我对迄今为止所取得的懒惰和普遍性水平感到相当自豪。
我首先提供用于创建和处理惰性二维矩阵的函数。
mkList f = map f [0..]
mkTable f = mkList (\i -> mkList (\j -> f i j))
tableIndex table i j = table !! i !! j
然后,我为给定的背包问题制作一个特定的表格
knapsackTable = mkTable f
where f 0 _ = 0
f _ 0 = 0
f i j | ws!!i > j = leaveI
| otherwise = max takeI leaveI
where takeI = tableIndex knapsackTable (i-1) (j-(ws!!i)) + vs!!i
leaveI = tableIndex knapsackTable (i-1) j
-- weight value pairs; item i has weight ws!!i and value vs!!i
ws = [0,1,2, 5, 6, 7] -- weights
vs = [0,1,7,11,21,31] -- values
,并使用几个辅助函数来查看该表格,
viewTable table maxI maxJ = take (maxI+1) . map (take (maxJ+1)) $ table
printTable table maxI maxJ = mapM_ print $ viewTable table maxI maxJ
这非常简单。但我想更进一步。
我想要一个更好的表数据结构。理想情况下,它应该是
未装箱(不可变)[编辑]别介意这个- 惰性
- 无界
O(1) 构造
- 查找给定条目的
O(1)
时间复杂度,
(更现实地说,最坏的情况是O(log n)
,其中 n 是i*j
,用于查找第 i 行、j 列的条目)
如果您能解释一下,即可加分您的解决方案为何/如何满足这些理想。
如果您可以进一步概括 knapsackTable
,并证明它是有效的,也会加分。
在改进数据结构时,您应该尝试满足以下目标:
- 如果我要求最大权重为 10 的解决方案(在我当前的代码中,这将是
indexTable knapsackTable 5 10
,5 意味着包括第 1-5 项),仅应执行最少量的必要工作。理想情况下,这意味着不需要O(i*j)
来强制表格每行的主干达到必要的列长度。如果您认为 DP 意味着评估整个表,那么您可以说这不是“真正的”DP。 - 如果我要求打印整个表格(例如
printTable knapsackTable 5 10
),则每个条目的值应该计算一次且仅一次。给定单元格的值应取决于其他单元格的值(DP 风格:想法是,永远不要重新计算同一子问题两次)
想法:
- Data.Array 有界 :(
- UArray strict :(
- 记忆技术(关于Haskell中的DP的问题)这可能有效
对我所说的做出一些妥协的答案只要它们提供了丰富的信息,就会被投票赞成。妥协最少的答案可能是“被接受”的答案。
I've coded up the 0-1 Knapsack problem in Haskell. I'm fairly proud about the laziness and level of generality achieved so far.
I start by providing functions for creating and dealing with a lazy 2d matrix.
mkList f = map f [0..]
mkTable f = mkList (\i -> mkList (\j -> f i j))
tableIndex table i j = table !! i !! j
I then make a specific table for a given knapsack problem
knapsackTable = mkTable f
where f 0 _ = 0
f _ 0 = 0
f i j | ws!!i > j = leaveI
| otherwise = max takeI leaveI
where takeI = tableIndex knapsackTable (i-1) (j-(ws!!i)) + vs!!i
leaveI = tableIndex knapsackTable (i-1) j
-- weight value pairs; item i has weight ws!!i and value vs!!i
ws = [0,1,2, 5, 6, 7] -- weights
vs = [0,1,7,11,21,31] -- values
And finish off with a couple helper functions for looking at the table
viewTable table maxI maxJ = take (maxI+1) . map (take (maxJ+1)) $ table
printTable table maxI maxJ = mapM_ print $ viewTable table maxI maxJ
This much was pretty easy. But I want to take it a step further.
I want a better data structure for the table. Ideally, it should be
Unboxed (immutable)[edit] never mind this- Lazy
- Unbounded
O(1)
time to constructO(1)
time complexity for looking up a given entry,
(more realistically, at worstO(log n)
, where n isi*j
for looking up the entry at row i, column j)
Bonus points if you can explain why/how your solution satisfies these ideals.
Also bonus points if you can further generalize knapsackTable
, and prove that it is efficient.
In improving the data structure you should try to satisfy the following goals:
- If I ask for the solution where the maximum weight is 10 (in my current code, that would be
indexTable knapsackTable 5 10
, the 5 means include items 1-5) only the minimal amount of work necessary should be performed. Ideally this means noO(i*j)
work for forcing the spine of each row of the table to necessary column length. You could say this isn't "true" DP, if you believe DP means evaluating the entirety of the table. - If I ask for the entire table to be printed (something like
printTable knapsackTable 5 10
), the values of each entry should be computed once and only once. The values of a given cell should depend on the values of other cells (DP style: the idea being, never recompute the same subproblem twice)
Ideas:
- Data.Array bounded :(
- UArray strict :(
- Memoization techniques (SO question about DP in Haskell) this might work
Answers that make some compromises to my stated ideals will be upvoted (by me, anyways) as long as they are informative. The answer with the least compromises will probably be the "accepted" one.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
首先,您对未装箱数据结构的标准可能有点误导。未装箱的值必须是严格的,并且它们与不变性无关。我要提出的解决方案是不可变的、惰性的和盒装的。另外,我不确定您希望以何种方式构建和查询为 O(1)。我提议的结构是惰性构建的,但由于它可能是无界的,因此其完整构建将需要无限的时间。对于任何大小为 k 的特定键,查询结构将花费 O(k) 时间,但当然您正在查找的值可能需要更多时间来计算。
数据结构是惰性字典树。我在代码中使用 Conal Elliott 的 MemoTrie 库。为了通用性,它使用函数而不是列表来表示权重和值。
基本上,它是作为一个具有惰性脊柱和惰性值的特里树来实现的。它仅受密钥类型的限制。因为整个事情是惰性的,所以在使用查询强制执行之前的构造是 O(1)。每个查询都会强制沿着 trie 及其值使用一条路径,因此对于有界键大小,其时间复杂度为
O(1)O(log n)。正如我已经说过的,它是不可变的,但不是拆箱的。它将分担递归调用中的所有工作。它实际上不允许您直接打印 trie,但是这样的事情不应该做任何多余的工作:
First, your criterion for an unboxed data structure is probably a bit mislead. Unboxed values must be strict, and they have nothing to do with immutability. The solution I'm going to propose is immutable, lazy, and boxed. Also, I'm not sure in what way you are wanting construction and querying to be O(1). The structure I'm proposing is lazily constructed, but because it's potentially unbounded, its full construction would take infinite time. Querying the structure will take O(k) time for any particular key of size k, but of course the value you're looking up may take further time to compute.
The data structure is a lazy trie. I'm using Conal Elliott's MemoTrie library in my code. For genericity, it takes functions instead of lists for the weights and values.
Basically, it's implemented as a trie with a lazy spine and lazy values. It's bounded only by the key type. Because the entire thing is lazy, its construction before forcing it with queries is O(1). Each query forces a single path down the trie and its value, so it's
O(1) for a bounded key sizeO(log n). As I already said, it's immutable, but not unboxed.It will share all work in the recursive calls. It doesn't actually allow you to print the trie directly, but something like this should not do any redundant work:
Unboxed 意味着严格和有界。任何 100% 未装箱的东西都不能是惰性的或无限制的。通常的妥协体现在将 [Word8] 转换为 Data.ByteString.Lazy,其中存在未装箱的块(严格的 ByteString),它们以无限制的方式延迟链接在一起。
可以使用“scanl”、“zipWith”和我的“takeOnto”制作更高效的表生成器(增强以跟踪单个项目)。这有效地避免了在创建表时使用 (!!):
Unboxed implies strict and bounded. Anything 100% Unboxed cannot be Lazy or Unbounded. The usual compromise is embodied in converting [Word8] to Data.ByteString.Lazy where there are unboxed chunks (strict ByteString) which are linked lazily together in an unbounded way.
A much more efficient table generator (enhanced to track individual items) could be made using "scanl", "zipWith", and my "takeOnto". This effectively avoid using (!!) while creating the table:
惰性可存储向量:http://hackage.haskell.org/package/storablevector
无界,惰性, O(chunksize) 构建时间,O(n/chunksize) 索引,其中 chunksize 对于任何给定目的都可以足够大。基本上是一个具有一些显着的常数因子优点的惰性列表。
Lazy storable vectors: http://hackage.haskell.org/package/storablevector
Unbounded, lazy, O(chunksize) time to construct, O(n/chunksize) indexing, where chunksize can be sufficiently large for any given purpose. Basically a lazy list with some significant constant factor benifits.
为了记忆函数,我推荐像 Luke Palmer 的 memo Combinators 这样的库。该库使用 try,它是无界的并且具有 O(key size) 查找。 (一般来说,你不能比 O(key size) 查找更好,因为你总是必须触及密钥的每一位。)
在内部,
integral
组合器可能会构建一个无限数据结构Lookup 的工作方式如下:
还有其他方法可以构建无限尝试,但这种方法很流行。
To memoize functions, I recommend a library like Luke Palmer's memo combinators. The library uses tries, which are unbounded and have O(key size) lookup. (In general, you can't do better than O(key size) lookup because you always have to touch every bit of the key.)
Internally, the
integral
combinator probably builds an infinite data structureLookup works like this:
There are other ways to build infinite tries, but this one is popular.
为什么不使用 Data.Map 将另一个 Data.Map 放入其中?据我所知,速度相当快。
但也不会偷懒。
不仅如此,您还可以为数据实现 Ord 类型类
,并直接将二维索引作为键。您可以通过将此地图生成为列表来实现惰性,然后只需使用
Why won't you use Data.Map putting the other Data.Map into it? As far as I know it's quite fast.
It wouldn't be lazy though.
More than that, you can implement Ord typeclass for you data
and put a two dimensional index directly as a key. You can achieve laziness by generating this map as a list and then just use