当前位置：文江博客话题详情

此操作的最佳数据结构

发布于 2024-09-30 18:13:36 字数 1721 浏览 2 评论 0原文

我正在尝试找到一种比当前方法更好的方法来管理连续马尔可夫链的当前状态向量。使用的状态向量存储（状态，概率）对，其中概率是浮点数。

需要优化的算法部分执行以下操作：

每次迭代都从当前状态向量开始，
计算向量中每个当前状态的可达状态，并将所有这些状态与前往那里的概率一起存储在临时列表
中这个新列表中的每个元素都会通过迭代可能的转换来计算新的状态向量（请注意，可能有许多转换导致相同的状态，但从不同的源状态找到），

这实际上是通过使用hashtables< /em> 以状态为键，以概率为值。

因此，基本上，为了构建新向量，对于每个转换，计算归一化值，然后使用 get 检索向量中的状态，添加当前转换的概率，然后将结果存储回来。

到目前为止，这种方法似乎有效，但我正在尝试处理导致非常大的空间向量（500k-1mil 状态）的系统，因此，尽管哈希表提供了恒定的存储和检索复杂性，但迭代开始减慢很多。

这是因为，例如，我们从具有 100k 个状态的向量开始，对于每个状态，我们计算可达状态（通常 > 1），以便我们获得 100k*（平均可达状态）的转换列表。然后经历每个转变来计算新的概率向量。

不幸的是，我需要遍历整个可达列表来找到归一化值，而不实际计算下一个 vecto，但无论如何，正如我通过分析看到的那样，这不是算法的瓶颈。当计算每个转换时，就会出现瓶颈。

这是用于从当前向量 (pi) 计算转换列表的函数：

HTable.fold (fun s p l ->
  if check s f2 then (0., s, p, [s, 1.0]) :: l
  else if not (check s f1) then (0., s, p, [s, 1.0]) :: l
  else
    let ts = P.rnext s in                         
    if List.length ts = 0 then (0., s, p, [s, 1.0]) :: l
    else
      let lm = List.fold_left (fun a (s,f) -> f +. a) 0. ts in
      (lm, s, p, ts) :: l) pi []

这是通过遍历转换列表并计算新的 pi 的函数将它们全部标准化：

let update_pi s v = 
  try
    let t = HTable.find pi s in
    HTable.replace pi s (v +. t)
  with Not_found -> HTable.add pi s v
in
  HTable.clear pi;
  List.iter (fun (llm, s, p, ts) ->
    if llm = 0. then
      update_pi s p
    else begin
      List.iter (fun (ss, pp) -> 
        update_pi ss (p *. (pp /. lm))
      ) ts;
      if llm < lm then update_pi s (p *. (1. -. (llm /. lm)))
    end 
  ) u;

我应该找到一个更适合我正在执行的操作的数据结构，问题是，使用当前的方法，我必须为每个转换执行 get 和 set 操作，但是对哈希表执行如此多的操作会导致死亡性能，因为恒定成本相当昂贵。

原文

I'm trying to find a better approach that my current one to manage the current state vector of a continuous Markov chain. The state vector used stores pairs of (state, proability) where probability is a float.

The piece of the algorithm that needs to be optimized does the following operations:

every iteration starts with a current state vector
computes the reachable states for every current one in vector and store all of them in a temporary list together with the probability of going there
for every element in this new list it calculates the new state vector by iterating over the possible transitions (mind that there can be many transitions which lead to the same state but found from different source states)

this is actually done by using hashtables that have as keys the states and as values the probabilities.

So basically to build up the new vector, for every transition, the normalized value is calculated, then the state in the vector is retrieved with a get, the probability of current transition is added and then the result is stored back.

This approach seemed to work so far but I'm trying to cope with systems that lead in very large space vectors (500k-1mil states) so, although hashtable give a constant complexity to store and retrieve, iterations starts slowing down really a lot.

This because for example, we start from a vector that has 100k states, for each one we computes the reachable states (which usually are > 1) so that we obtain a transition list of 100k*(avg reachable states). Then every transition is gone through to compute the new probability vector.

Unfortunately I need to go through the whole reachable list to find the normalizing value without actually computing the next vecto but in any case, as I saw through profiling, this is not the bottleneck of the algorithm. The bottleneck is present when every transition is computed.

This is the function used to compute the transition list from the current vector (pi):

HTable.fold (fun s p l ->
  if check s f2 then (0., s, p, [s, 1.0]) :: l
  else if not (check s f1) then (0., s, p, [s, 1.0]) :: l
  else
    let ts = P.rnext s in                         
    if List.length ts = 0 then (0., s, p, [s, 1.0]) :: l
    else
      let lm = List.fold_left (fun a (s,f) -> f +. a) 0. ts in
      (lm, s, p, ts) :: l) pi []

And this is the function that computes the new pi by going through the transition list and by normalizing them all:

let update_pi s v = 
  try
    let t = HTable.find pi s in
    HTable.replace pi s (v +. t)
  with Not_found -> HTable.add pi s v
in
  HTable.clear pi;
  List.iter (fun (llm, s, p, ts) ->
    if llm = 0. then
      update_pi s p
    else begin
      List.iter (fun (ss, pp) -> 
        update_pi ss (p *. (pp /. lm))
      ) ts;
      if llm < lm then update_pi s (p *. (1. -. (llm /. lm)))
    end 
  ) u;

I should find a data structures which is better suited for the operations I'm doing, the problem is that with current approach I have to do a get and a set for every single transition but doing so many operations over hashtables kills performance since the constant cost is quite expensive.

分享到QQ

分享到微博