An efficient technique for replacing occurrences in a sequence, with mutable or immutable state

Posted on 2024-08-17 05:58:45

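I am looking for an efficient technique to find occurrences of a sequence of Op in a Seq[Op]. Once an occurrence is found, I want to replace it with a defined replacement and run the same search again until the list stops changing.

Scenario:

I have three Op case classes: Pop() extends Op, Push() extends Op, and Nop() extends Op. I want to replace each occurrence of Push(), Pop() with Nop(). Basically, the code could look like seq.replace(Push() ~ Pop() ~> Nop()).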

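Problem:

When I call seq.replace(...), I have to search the sequence for an occurrence of Push(), Pop(). So far so good: I find the occurrence. But then I have to splice the occurrence out of the list and insert the replacement.

There are two options. My list can be mutable or immutable. If I use an immutable list, I am worried about performance, because these sequences are usually 500+ elements in size. If I replace a lot of occurrences with rules like A ~ B ~ C ~> D ~ E, I will create a lot of new objects, if I am not mistaken. However, I could also use a mutable sequence such as ListBuffer[Op].

Coming from a linked-list background, I would just do some pointer bending: after a total of four operations the replacement is done, without creating any new objects. That is why I am concerned about performance, especially since this is a performance-critical operation for me.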

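Question:

How would you implement the replace() method in a Scala fashion, and which data structure would you use, keeping in mind that this is a performance-critical operation?

I am happy with answers that point me in the right direction, or with pseudocode. There is no need to write a full replace method.

Thank you.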


孤云独去闲 2024-08-24 05:58:45

OK, a few considerations first. Recall that, on lists, tail does not create objects, and prepending (::) creates only one object per prepended element. Generally speaking, that is about as good as it gets.

One way of doing this would be:

def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]): List[Op] = {
  require(pattern.nonEmpty, "pattern must not be empty")

  // Checks whether list starts with pattern.
  // This function should be part of a KMP algorithm instead, for performance
  def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match {
    case (x :: xs, y :: ys) if x == y => compare(xs, ys)
    case (Nil, _)                     => true  // whole pattern consumed: match
    case _                            => false
  }

  var processed: List[Op] = Nil          // already scanned, in reverse order
  var unprocessed: List[Op] = input
  val patternLength = pattern.length
  val reversedReplacement = replacement.reverse

  // Do this until we finish processing the whole sequence
  while (unprocessed.nonEmpty) {

    // This inner algorithm would be better if replaced by KMP

    // Quickly process non-matching sequences
    while (unprocessed.nonEmpty && unprocessed.head != pattern.head) {
      processed ::= unprocessed.head
      unprocessed = unprocessed.tail
    }

    if (unprocessed.nonEmpty) {
      if (compare(pattern, unprocessed)) {
        processed :::= reversedReplacement
        unprocessed = unprocessed drop patternLength
      } else {
        processed ::= unprocessed.head
        unprocessed = unprocessed.tail
      }
    }
  }

  processed.reverse
}

You may gain speed by using KMP, particularly if the pattern searched for is long.
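To make the KMP remark a little more concrete, here is a minimal sketch of the failure table and a first-occurrence search built on it. The names `failureTable` and `kmpIndexOf` are made up for this illustration and are not part of the code above; the pattern is taken as an `IndexedSeq` on the assumption that cheap indexing into it is wanted.

```scala
// Hypothetical helper: fail(i) is the length of the longest proper prefix
// of pattern(0..i) that is also a suffix of it.
def failureTable[T](pattern: IndexedSeq[T]): Array[Int] = {
  val fail = new Array[Int](pattern.length)
  var k = 0
  for (i <- 1 until pattern.length) {
    while (k > 0 && pattern(i) != pattern(k)) k = fail(k - 1)
    if (pattern(i) == pattern(k)) k += 1
    fail(i) = k
  }
  fail
}

// Index of the first occurrence of pattern in input, or -1 if there is none.
def kmpIndexOf[T](input: Iterable[T], pattern: IndexedSeq[T]): Int =
  if (pattern.isEmpty) 0
  else {
    val fail = failureTable(pattern)
    val it = input.iterator
    var k = 0      // number of pattern elements currently matched
    var seen = 0   // number of input elements consumed so far
    var found = -1
    while (found < 0 && it.hasNext) {
      val x = it.next()
      seen += 1
      // On a mismatch, fall back in the pattern instead of rewinding the input.
      while (k > 0 && x != pattern(k)) k = fail(k - 1)
      if (x == pattern(k)) k += 1
      if (k == pattern.length) found = seen - k
    }
    found
  }
```

The point of the table is that the input is only ever read forwards, which suits a linked `List`. For short patterns such as Push(), Pop() the gain over the naive comparison is likely to be small.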

Now, what is the problem with this algorithm? The problem is that it won't check whether a replacement creates a new match that starts before the current position. For instance, if I replace ACB with C, and I have the input AACBB, then the result of this algorithm will be ACB instead of C.
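The AACBB case can be reproduced with a small generic sketch. It is not the function above but a compact stand-in for it: `replaceOnce` does one left-to-right pass without rescanning, and `replaceUntilStable` simply repeats that pass until nothing changes, which is what the question asks for. Both names are hypothetical.

```scala
// One left-to-right pass; elements produced by a replacement are never rescanned.
def replaceOnce[T](input: List[T], pattern: List[T], replacement: List[T]): List[T] = {
  require(pattern.nonEmpty, "pattern must not be empty")
  @annotation.tailrec
  def loop(rest: List[T], acc: List[T]): List[T] =
    if (rest.isEmpty) acc.reverse
    else if (rest.startsWith(pattern))
      loop(rest.drop(pattern.length), replacement.reverse ::: acc)
    else loop(rest.tail, rest.head :: acc)
  loop(input, Nil)
}

// Repeat the pass until the list stops changing.
// Assumes the rewriting terminates, e.g. the replacement is shorter than the pattern.
def replaceUntilStable[T](input: List[T], pattern: List[T], replacement: List[T]): List[T] = {
  var current = input
  var next = replaceOnce(current, pattern, replacement)
  while (next != current) {
    current = next
    next = replaceOnce(current, pattern, replacement)
  }
  current
}
```

With the input AACBB, pattern ACB and replacement C, a single pass yields ACB, and the repeated version reaches C. Repeating whole passes is simple but rescans the entire list each time, which is what backtracking tries to avoid.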

To avoid this problem, you need to backtrack. First, you check at which position in your pattern the replacement may occur:

val positionOfReplacement = pattern.indexOfSlice(replacement)

Then, you modify the replacement part of the algorithm like this:

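      if (compare(pattern, unprocessed)) {
        if (positionOfReplacement > 0) {
          // Remove the match and push the replacement back so it is rescanned
          unprocessed = replacement ::: (unprocessed drop patternLength)
          // processed is stored in reverse, so restore the original order
          // while moving its last positionOfReplacement elements back
          unprocessed = (processed take positionOfReplacement).reverse ::: unprocessed
          processed = processed drop positionOfReplacement
        } else {
          processed :::= reversedReplacement
          unprocessed = unprocessed drop patternLength
        }
      } else {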

This will backtrack enough to solve the problem.
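A more conservative variant, which is a different technique from the `positionOfReplacement` heuristic and is only a sketch, always steps back by `pattern.length - 1` elements after a replacement. Any new match that overlaps the replacement must start within that distance, so this covers cases the heuristic may skip. The function name `replaceWithBacktrack` is hypothetical, and the sketch assumes the rewriting terminates (for example, the replacement is shorter than the pattern).

```scala
def replaceWithBacktrack[T](input: List[T], pattern: List[T], replacement: List[T]): List[T] = {
  require(pattern.nonEmpty, "pattern must not be empty")
  val backtrack = pattern.length - 1
  var processed: List[T] = Nil      // already scanned, in reverse order
  var unprocessed: List[T] = input
  while (unprocessed.nonEmpty) {
    if (unprocessed.startsWith(pattern)) {
      // Splice in the replacement, then move the last few scanned elements
      // back so that matches overlapping the replacement are found.
      val moved = processed.take(backtrack)
      unprocessed = moved.reverse ::: replacement ::: unprocessed.drop(pattern.length)
      processed = processed.drop(backtrack)
    } else {
      processed ::= unprocessed.head
      unprocessed = unprocessed.tail
    }
  }
  processed.reverse
}
```

For AACBB with ACB replaced by C this returns C in one pass. The cost is a few extra comparisons after each replacement, bounded by the pattern length.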

This algorithm won't deal efficiently, however, with multiple patterns at the same time, which I guess is where you are going. For that, you'll probably need some adaptation of KMP to do it efficiently, or otherwise use a DFA to keep track of possible matches. It gets even worse if you want to match both AB and ABC.
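To show why AB versus ABC is awkward, here is a naive multi-rule pass. It is neither KMP nor a DFA, so it is not the efficient solution hinted at above; it only makes the policy question visible. The `Rule` class and `replaceAllRules` are hypothetical names, and the choice of "longest pattern wins" is an assumption of this sketch.

```scala
// A rule rewrites an occurrence of `pattern` into `replacement`.
final case class Rule[T](pattern: List[T], replacement: List[T])

def replaceAllRules[T](input: List[T], rules: List[Rule[T]]): List[T] = {
  require(rules.forall(_.pattern.nonEmpty), "patterns must not be empty")
  // Try longer patterns first so that ABC wins over AB at the same position.
  val ordered = rules.sortBy(r => -r.pattern.length)
  @annotation.tailrec
  def loop(rest: List[T], acc: List[T]): List[T] =
    if (rest.isEmpty) acc.reverse
    else ordered.find(r => rest.startsWith(r.pattern)) match {
      case Some(r) => loop(rest.drop(r.pattern.length), r.replacement.reverse ::: acc)
      case None    => loop(rest.tail, rest.head :: acc)
    }
  loop(input, Nil)
}
```

Each position is compared against every rule, so the cost grows with the total length of all patterns. A DFA built from the patterns would avoid that repeated work.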

In practice, the full-blown problem is equivalent to regex match and replace, where the replacement is a function of the match. This means, of course, that you may want to start looking into regex algorithms.
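The regex analogy can be taken literally as a quick experiment: encode each Op as one character, run `Regex.replaceAllIn` with a replacer function, and decode. The character encoding (`U`, `O`, `N`) and the helper names are assumptions made up for this sketch, and the Op classes are defined here only to keep it self-contained.

```scala
import scala.util.matching.Regex

sealed trait Op
case class Push() extends Op
case class Pop() extends Op
case class Nop() extends Op

// Assumed encoding: one character per Op.
def encode(ops: Seq[Op]): String = ops.map {
  case Push() => 'U'
  case Pop()  => 'O'
  case Nop()  => 'N'
}.mkString

def decode(s: String): List[Op] = s.toList.map {
  case 'U'   => Push()
  case 'O'   => Pop()
  case 'N'   => Nop()
  case other => sys.error(s"unexpected character: $other")
}

val pushPop: Regex = "UO".r

// Apply the rule repeatedly until the encoded string stops changing.
def rewrite(ops: Seq[Op]): List[Op] = {
  var current = encode(ops)
  var next = pushPop.replaceAllIn(current, (m: Regex.Match) => "N")
  while (next != current) {
    current = next
    next = pushPop.replaceAllIn(current, (m: Regex.Match) => "N")
  }
  decode(current)
}
```

For Push(), Push(), Pop(), Pop() this yields Push(), Nop(), Pop(). The encode and decode steps copy the whole sequence, so this is more useful for prototyping rules than as the performance-critical path.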

EDIT

I forgot to complete my reasoning. If that technique doesn't work for some reason, then my advice is to go with an immutable tree-based vector. Tree-based vectors allow replacing part of a sequence with little copying.
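With the standard library's `Vector`, the operation would look roughly like the sketch below, using `indexOfSlice` to find a match and `patch` to splice. This only shows the shape of the approach: how much copying `patch` actually does depends on the `Vector` implementation and may be more than an ideal tree-based splice, so measure before relying on it. The name `replaceAllInVector` is hypothetical.

```scala
def replaceAllInVector[T](input: Vector[T], pattern: Vector[T], replacement: Vector[T]): Vector[T] = {
  require(pattern.nonEmpty, "pattern must not be empty")
  @annotation.tailrec
  def loop(current: Vector[T], from: Int): Vector[T] = {
    val at = current.indexOfSlice(pattern, from)
    if (at < 0) current
    else {
      // Replace pattern.length elements starting at `at` with the replacement.
      val patched = current.patch(at, replacement, pattern.length)
      // Resume slightly before the splice so overlapping matches are found.
      loop(patched, math.max(0, at - (pattern.length - 1)))
    }
  }
  loop(input, 0)
}
```

As with the list version, this assumes the rewriting terminates. For AACBB with ACB replaced by C it returns a one-element vector containing C.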

And if that isn't enough, then the solution is a doubly linked list. Pick one from a library with slice replacement -- otherwise you may end up spending far too much time debugging a known but tricky algorithm.
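As one possible starting point on the JVM, `java.util.LinkedList` is a doubly linked list, and its `ListIterator` can remove and insert at the cursor. It has no slice replacement method, so the sketch below splices element by element; `replaceInPlace` is a hypothetical name, and the code assumes the rewriting terminates.

```scala
import java.util.LinkedList

def replaceInPlace[T](list: LinkedList[T], pattern: List[T], replacement: List[T]): Unit = {
  require(pattern.nonEmpty, "pattern must not be empty")
  val it = list.listIterator()
  while (it.hasNext) {
    // Read ahead, comparing against the pattern element by element.
    var consumed = 0
    var rest = pattern
    var ok = true
    while (ok && rest.nonEmpty && it.hasNext) {
      val x = it.next()
      consumed += 1
      if (x == rest.head) rest = rest.tail else ok = false
    }
    if (ok && rest.isEmpty) {
      // Full match: unlink the matched nodes, walking backwards.
      for (_ <- 0 until consumed) { it.previous(); it.remove() }
      // Link the replacement in at the cursor.
      replacement.foreach(x => it.add(x))
      // Step back so matches overlapping the replacement are rescanned.
      var back = replacement.length + pattern.length - 1
      while (back > 0 && it.hasPrevious) { it.previous(); back -= 1 }
    } else {
      // No match here: rewind to just after the element we started at.
      for (_ <- 0 until consumed - 1) it.previous()
    }
  }
}
```

The list is modified in place and no intermediate lists are built, which is close to the pointer bending described in the question. Even this small version needs care with the cursor position, which is the debugging cost the advice above warns about.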
