当前位置：文江博客话题详情

OCaml 中的拓扑排序

发布于 2024-10-11 16:13:16 字数 1013 浏览 9 评论 0原文

我正在尝试在 ocaml 中编写拓扑排序，但我是初学者（在 OCaml 和图形算法中），我自己无法做到这一点。

对我来说，用 C++ 等语言来思考拓扑排序更容易（网上有很多用 C++ 进行拓扑排序的例子），但我想学习一些新的东西。此外，我发现了一些用 OCaml 编写的拓扑排序的例子，但坦白说我不明白它们。

假设我有一个列表 (int * int list) list，例如：

myList = [(1, [2]); (5, [6; 7]); (3, [2]); (6, [3; 7]); （8，[7]）； (4, [3; 1])];;

这意味着我需要在任务 2、任务 之前“执行”任务 1 >4 在任务 3 和 1 等之前。

我认为，该列表的输出应该是：

[8; 5； 6； 7； 4； 3； 1; 2]

（但是我不确定，因为我只是做了这个例子，所以如果我错了，请纠正我）

另外，我读过，拓扑排序不适用于图中的循环，所以循环必须有某种条件 - 当给定的图有循环时，我们会引发异常（我认为这是一个好主意）。

AFAIK，我需要在拓扑排序算法中使用 DFS，我不知道如何在 OCaml 中实现它（DFS）（我理解主要思想，但我不知道它是如何工作的）在 OCaml/函数式编程中）。

我非常感谢您帮助我理解这个概念（我的意思是拓扑排序，OCaml/函数式编程中的 DFS）。如果可以的话，如果您向我展示示例代码，那就太好了，因为阅读代码（对我来说）是理解算法概念的最佳方法。

我知道，对于你们大多数人来说，这是一个简单的问题，但我希望，这不会阻止你们帮助我。

PS：我正在自学OCaml（我在高中），所以我没有扎实的理论背景（无论是OCaml还是算法）。我开始学习 OCaml，因为我想了解递归概念，现在这种语言似乎很有趣，因为它与 C++ 非常不同，所以我仍在尝试学习 OCaml 中的新东西。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

沫雨熙 2024-10-18 16:13:16

首先，请注意，Objective Caml 确实支持一种编程风格，尽管语法存在差异，但它与 C++ 非常相似，通过可变数据结构（引用、数组、哈希表）和命令式构造（for 和 while 循环、变量赋值）。我假设您实际上正在尝试以惯用的纯函数风格编写拓扑排序。

纯函数式编程大多是声明性的：应用于该值的函数是另一个值。这就是为什么 let x = 的右侧是一个表达式（求值）而不是使用 return 的操作序列。当然，当采用通常被描述为一系列步骤的算法时，就会出现麻烦。

幸运的是，有一种模式（实际上是一系列模式）可以让您通过将“更改 X 的值”转变为“返回一个与旧对象相同的新对象（除了X 的值”。

传统的 DFS 算法涉及逐步浏览图形，同时记住哪些元素已经被访问过（这样您就不会多次访问它们）并到达当前位置（以便您可以检测循环）。因此，DFS算法的命令状态由当前节点、已访问节点的集合和当前路径中的节点集合组成。所有这些数据都必须提供给递归调用，并且任何永久更改都必须由这些相同的递归调用返回。

使用上面的图形结构，结合两个集合的列表表示（这很难说是最佳的 - Set 在这里会是更好的选择），算法看起来有点像这样：

let dfs graph start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] [] start_node

上面的三个重要细节：首先，DFS 表示，当您完成对给定节点的所有子节点的探索后，您应该从“当前路径”列表中删除该节点并将其放入“已访问”列表中。由于我们使用的是不可变的数据结构，所以第一步是不必要的：它的唯一目的是在探索开始时撤消节点的插入，并且在我们的版本中，列表 new_path 不会被递归调用explore。这是函数式数据结构比命令式结构更适合使用的示例。

另一个重要的细节是List.fold_left的使用。当我们开始使状态显式化时，我们替换了 -> 类型的隐式命令函数。具有 -> 类型显式函数的单元状态-> ..-> state（接受状态作为参数，返回新状态）。所以，命令式列表探索，看起来像这样：

f edge_1 ; f edge_2 ; f edge_3

现在看起来像这样：

let state = f (f (f state edge_1) edge_2) edge_3)

这正是 List.fold_left f state [edge_1 ;边缘_2； edge_3] 确实如此。自吹自擂，但我有一篇关于此的好文章。

第三点，“向集合添加元素”操作，在使用列表表示集合时，简单写成element::set，因为这是一个返回新集合（列表）的操作它包含原始集合的所有元素以及新元素。这使得原始集合保持不变（这对于第一步中描述的原因来说是很好的），同时使用恒定量的内存（它创建了一个 cons 单元 - 一个简单的头尾对，包含对元素的引用和对集合的引用) ：您不仅可以获得撤消功能，而且无需额外费用。

上述算法从 DFS 探索的叶子开始将节点“插入”到 visited 中，在您的情况下，这些节点代表应该最后完成的节点。简而言之，返回的列表是按拓扑排序的 - 但可能不包含所有元素，因为起点可能不是唯一的根元素（甚至根本不是根元素）。因此，这里涉及一个额外的处理步骤，其中包括从图中取出另一个节点，直到探索完所有图为止。

或者，换句话说，从图中的每个节点开始新的 DFS 探索，但忽略之前探索的任何节点 - 这相当于保留访问过的元素列表从一个 DFS 探索到下一个。

对我们之前的算法进行一个小调整，这只需要两行：

let dfs graph visited start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] visited start_node

let toposort graph = 
  List.fold_left (fun visited (node,_) -> dfs graph visited node) [] graph

调整包括允许 dfs 的调用者指定已访问节点的列表。在从每个节点启动 DFS 时，继承已访问节点的列表是使用 List.fold_left 完成的，与之前完全相同。

编辑：除了异常的类型之外，这里没有任何东西限制图中元素的类型。但是，异常不能是多态的，因此您有两种可能的解决方案。第一种是放弃实际返回任何数据以及异常：

exception CycleFound

... raise CycleFound ...

这会将 toposort 的类型恢复为更通用的 ('a * ('a list)) 列表 - > '一个列表。

另一个解决方案是相当高级的 OCaml ：您需要使包含异常和拓扑排序的模块在该特定类型中具有多态性，如下所示：

module type NODE = sig
  type t 
end

module type Topo = functor (Node:NODE) -> struct 
  exception CycleFound of Node.t list      
  let dfs ...
  let sort ...  
end

这将使 Topo(Node ).sort 为 (Node.t * (Node.t list)) list -> Node.t 列表，这意味着您可以通过定义具有该类型的节点模块来对您想要的任何类型进行排序：

type recipe = Eggs | Milk | Wheat | Mix | Cook | Serve

module Node = struct 
  type t = recipe 
end

let graph = [ Wheat, [Eggs; Milk; Mix];
              Milk,  [Mix];
              Eggs,  [Mix];
              Mix,   [Cook];
              Cook,  [Serve];
              Serve, [] ]

module RecipeTopo = Topo(Node)

let sorted = RecipeTopo.sort graph

let str_recipe = function
| Eggs -> "Eggs"
| Milk -> "Milk"
| Wheat -> "Wheat"
| Mix -> "Mix"
| Cook -> "Cook!"
| Serve -> "Serve"

let () = List.iter (fun i -> Printf.printf "%s " (str_recipe i)) sorted
//  [Wheat; Milk; Eggs; Mix; Cook; Serve]

尝试一下

First, be aware that Objective Caml does support a programming style which, despite syntactic differences, is fairly similar to C++, by means of mutable data structures (references, arrays, hash tables) and imperative constructs (for and while loops, variable assignment). I'm assuming below that you're actually trying to write your topological sort in idiomatic pure functional style.

Pure functional programming is mostly declarative : this function applied to that value is this other value. This is why the right-hand side of let x = is an expression (evaluates to a value) instead of a sequence of actions that uses return. Of course, trouble appears when adapting an algorithm that is commonly described as a series of steps instead.

Fortunately, there's a pattern (actually, a family of patterns) that lets you represent imperative-ish algorithms in functional style by turning "change the value of X" into "return a new object which is identical to the old one, except for the value of X".

A traditional DFS algorithm involves stepping through the graph while remembering which elements have already been visited - in general (so that you don't visit them more than once) and to get to your current position (so that you can detect cycles). So, the imperative state of a DFS algorithm is comprised of the current node, the set of visited nodes and the set of nodes in the current path. All this data will have to be provided to the recursive calls, and any permanent changes will have to be returned by those same recursive calls.

Using your graph structure from above, combined with a list representation for the two sets (it's hardly optimal - Set would be a better choice here), the algorithm would look somewhat like this:

let dfs graph start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] [] start_node

Three important details above: first, DFS says that one you are done exploring all the children of a given node, you should remove that node from the "current path" list and put it into the "visited" list. Since we're using immutable data structures, the first step is unnecessary: its only purpose was to undo the insertion of the node when the exploration started, and in our version the list new_path is not modified by the recursive call to explore. This is an example of case where functional data structures are more comfortable to work with than imperative structures.

Another important detail is the use of List.fold_left. When we started making the state explicit, we replaced implicit imperative functions of type -> unit with explicit functions of type -> state -> .. -> state (accept the state as parameter, return new state). So, the imperative list exploration, which looked like this:

f edge_1 ; f edge_2 ; f edge_3

Now looks like this:

let state = f (f (f state edge_1) edge_2) edge_3)

Which is exactly what List.fold_left f state [edge_1 ; edge_2 ; edge_3] does. Tooting my own horn, but I have a nice article about this here.

The third point is that the "add element to set" operation, when using lists to represent sets, is written simply as element :: set, because this is an operation that returns a new set (list) which contains all the elements of the original set along with the new element. This leaves the original set untouched (which is good for the reasons described in step one) while using a constant amount of memory (it creates a cons cell - a simple head-tail pair containing a reference to the element and a reference to the set) : not only do you get undo capabilities, but you do so at no additional cost.

The above algorithm "inserts" nodes into visited starting with the leaves of the DFS exploration, which in your case represent those nodes that should be done last. In short, the returned list is topologically sorted - but might not contain all elements because the starting point might not be the only root element (or even be a root element at all). So, there's an additional processing step involved here which consists in taking another node from the graph until all the graph has been explored.

Or, in other words, starting a new DFS exploration from every node in the graph, but ignoring any nodes previously explored - which is equivalent to keeping the list of visited elements from one DFS exploration to the next.

Using a small tweak to our previous algorithm, this takes only two lines:

let dfs graph visited start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] visited start_node

let toposort graph = 
  List.fold_left (fun visited (node,_) -> dfs graph visited node) [] graph

The tweak consists in allowing the caller of dfs to specify the list of already visited nodes. Carrying over the list of visited nodes while starting a DFS from every node is done using List.fold_left exactly as before.

EDIT: aside from the type of the exception, there's nothing here that constrains the type of the elements in the graph. However, an exception cannot be polymorphic, so you have two possible solutions. The first is to give up on actually returning any data along with the exception:

exception CycleFound

... raise CycleFound ...

This will revert the type of toposort back to a more generic ('a * ('a list)) list -> 'a list.

The other solution is rather advanced OCaml : you need to make the module that contains the exception and the topological sort polymorphic in that specific type, as follows:

module type NODE = sig
  type t 
end

module type Topo = functor (Node:NODE) -> struct 
  exception CycleFound of Node.t list      
  let dfs ...
  let sort ...  
end

This would make the type of Topo(Node).sort be (Node.t * (Node.t list)) list -> Node.t list, which means you can sort any type you wish by defining a node module with that type:

type recipe = Eggs | Milk | Wheat | Mix | Cook | Serve

module Node = struct 
  type t = recipe 
end

let graph = [ Wheat, [Eggs; Milk; Mix];
              Milk,  [Mix];
              Eggs,  [Mix];
              Mix,   [Cook];
              Cook,  [Serve];
              Serve, [] ]

module RecipeTopo = Topo(Node)

let sorted = RecipeTopo.sort graph

let str_recipe = function
| Eggs -> "Eggs"
| Milk -> "Milk"
| Wheat -> "Wheat"
| Mix -> "Mix"
| Cook -> "Cook!"
| Serve -> "Serve"

let () = List.iter (fun i -> Printf.printf "%s " (str_recipe i)) sorted
//  [Wheat; Milk; Eggs; Mix; Cook; Serve]

Try it out here.

回复收藏 0 原文

等你爱我 2024-10-18 16:13:16

[如果你不知道这个术语，我在下面写 DAG 的意思是“有向无环图”，或者连接顶点的从/到边的集合，这样就没有循环。]

一种方法是扩展您的偏序（您的 DAG 结构）转换为全序（因此，对于每对不同的顶点 u 和 v，u 是 v 的后继，反之亦然）。然后，您可以按顺序对顶点进行排序：如果 v 是 u 的后继，则 u 位于 v 之前。

您可以通过从空图开始并从原始 DAG 中一次添加一条边来构建总订单。也就是说，原始 DAG 中的一项 (u, [v1; v2; ...; vn]) 对应于边 (u, v1), (u, v2), ..., (u, vn)。对于每个这样的边 (u, v)，从总顺序中找到 u 的前驱 P 和 v 的后继 S。然后将 (p, s) 添加到 PU {u} 中所有 p 和 SU {v} 中所有 s 的总订单中。

现在！更快的方法是在原始 DAG 中找到根（即没有前辈的顶点），并从该根进行深度优先搜索，确保您永远不会两次访问同一个顶点。每次回溯遍历时，都会“输出”要离开的顶点。通过这种方式，您可以构建 DAG 的拓扑排序。如果还有剩余的顶点，请用泡沫冲洗，然后重复，直到全部完成。

回复收藏 0 原文

无敌元气妹 2024-10-18 16:13:16

您应该首先尝试 DFS，它更容易且有益。

回复收藏 0 原文

折戟 2024-10-18 16:13:16

我不知道 OCaml，但是维基百科中有一个简单的算法，我已经使用过 Kahn 认可的算法成功（转录为Python）。它相当简单，因此也许您可以将其转换为 OCaml。这里是：

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    output error message (graph has at least one cycle)
else 
    output message (proposed topologically sorted order: L)

I don't know OCaml,but there's a simple algorithm in Wikipedia accredited to Kahn which I have used successfully (transcribing to Python). It's fairly simple so perhaps you could translate it into OCaml. Here it is:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    output error message (graph has at least one cycle)
else 
    output message (proposed topologically sorted order: L)

回复收藏 0 原文

~没有更多了~