How to ensure that the dynamic type of a custom Scala collection is preserved during a map()?

Posted on 2024-11-01 15:28:04

I read the very interesting article on the architecture of the Scala 2.8 collections and I've been experimenting with it a little bit. For a start, I simply copied the final code for the nice RNA example. Here it is for reference:

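// Imports needed by the code below (Scala 2.8 to 2.12 collections)
import collection.IndexedSeqLike
import collection.mutable.{Builder, ArrayBuffer}
import collection.generic.CanBuildFrom

abstract class Base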
case object A extends Base
case object T extends Base
case object G extends Base
case object U extends Base

object Base {
  val fromInt: Int => Base = Array(A, T, G, U)
  val toInt: Base => Int = Map(A -> 0, T -> 1, G -> 2, U -> 3)
}

final class RNA private (val groups: Array[Int], val length: Int)
    extends IndexedSeq[Base] with IndexedSeqLike[Base, RNA] {

  import RNA._

  // Mandatory re-implementation of `newBuilder` in `IndexedSeq`
  override protected[this] def newBuilder: Builder[Base, RNA] =
    RNA.newBuilder

  // Mandatory implementation of `apply` in `IndexedSeq`
  def apply(idx: Int): Base = {
    if (idx < 0 || length <= idx)
      throw new IndexOutOfBoundsException
    Base.fromInt(groups(idx / N) >> (idx % N * S) & M)
  }

  // Optional re-implementation of foreach, 
  // to make it more efficient.
  override def foreach[U](f: Base => U): Unit = {
    var i = 0
    var b = 0
    while (i < length) {
      b = if (i % N == 0) groups(i / N) else b >>> S
      f(Base.fromInt(b & M))
      i += 1
    }
  }
}

object RNA {

  private val S = 2 // number of bits in group
  private val M = (1 << S) - 1 // bitmask to isolate a group
  private val N = 32 / S // number of groups in an Int

  def fromSeq(buf: Seq[Base]): RNA = {
    val groups = new Array[Int]((buf.length + N - 1) / N)
    for (i <- 0 until buf.length)
      groups(i / N) |= Base.toInt(buf(i)) << (i % N * S)
    new RNA(groups, buf.length)
  }

  def apply(bases: Base*) = fromSeq(bases)

  def newBuilder: Builder[Base, RNA] =
    new ArrayBuffer mapResult fromSeq

  implicit def canBuildFrom: CanBuildFrom[RNA, Base, RNA] =
    new CanBuildFrom[RNA, Base, RNA] {
      def apply(): Builder[Base, RNA] = newBuilder
      def apply(from: RNA): Builder[Base, RNA] = newBuilder
    }
}

Now, here's my problem. If I run this, everything's fine:

val rna = RNA(A, G, T, U)
println(rna.map(e => e)) // prints RNA(A, G, T, U)

but this code transforms the RNA to a Vector!

val rna: IndexedSeq[Base] = RNA(A, G, T, U)
println(rna.map(e => e)) // prints Vector(A, G, T, U)

This is a problem, because client code that is unaware of the RNA class may turn the collection into a Vector even when it only maps from Base to Base. Why is that so, and what are the ways to fix it?

P.S.: I've found a tentative answer (see below); please correct me if I'm wrong.


Comments (2)

心不设防 2024-11-08 15:28:04

If the static type of the rna variable is IndexedSeq[Base], the automatically inserted CanBuildFrom cannot be the one defined in the RNA companion object, as the compiler is not supposed to know that rna is an instance of RNA.

So where does it come from? The compiler falls back on an instance of GenericCanBuildFrom, the one defined in the IndexedSeq companion object. A GenericCanBuildFrom produces its builder by calling genericBuilder[B] on the originating collection, and that generic builder must be able to produce collections holding any element type B, because the return type of the function passed to map() is not constrained.

In this case, RNA is only an IndexedSeq[Base] and not a generic IndexedSeq, so it's not possible to override genericBuilder[B] in RNA to return an RNA-specific builder: we would have to check at runtime whether B is Base or something else, and we cannot do that.

I think this explains why, in the question, we get a Vector back. As to how we can fix it, it's an open question…

Edit: Fixing this requires map() to know whether it's mapping to a subtype of A or not. A significant change in the collections library would be needed for this to happen. See the related question Should Scala's map() behave differently when mapping to the same type?.
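
The static-type argument above can be illustrated with a small toy model. This is only a sketch: Factory, Generic, Special and describe are hypothetical names that do not exist in the collections library, and Factory is deliberately invariant to keep implicit resolution simple (the real CanBuildFrom has variance annotations). It shows that an implicit instance is chosen from the static type at the call site, not from the runtime class of the value:

```scala
// Toy model, NOT the real collections library: a factory type class whose
// instance the compiler picks from the *static* type of the argument.
trait Factory[From] { def name: String }

class Generic
object Generic {
  implicit val genericFactory: Factory[Generic] =
    new Factory[Generic] { def name = "generic" }
}

class Special extends Generic
object Special {
  implicit val specialFactory: Factory[Special] =
    new Factory[Special] { def name = "special" }
}

object StaticDispatchDemo {
  // The implicit argument is resolved at compile time from the type From.
  def describe[From](x: From)(implicit f: Factory[From]): String = f.name

  def main(args: Array[String]): Unit = {
    val a = new Special
    val b: Generic = new Special // same runtime class, weaker static type
    println(describe(a)) // prints special
    println(describe(b)) // prints generic
  }
}
```

In this model, b plays the role of the rna value declared as IndexedSeq[Base]: the value is still a Special at runtime, but the compiler only sees Generic and therefore selects the generic instance.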

雪落纷纷 2024-11-08 15:28:04

This explains why I think it's not a good idea to statically type the value to a weaker type than RNA. It should really be a comment, since it's more of an opinion, but as a comment it would be harder to read. From your comment on my comment:

Why not? As a subclass of IndexedSeq[Base], RNA is able to do everything IndexedSeq[Base] does, as per the Liskov substitution principle. Sometimes, all you know is that it's an IndexedSeq, and you still expect filter, map and friends to keep the same specific implementation. Actually, filter does it — but not map

filter does it because the compiler can statically guarantee it: if you keep elements from a particular collection, you end up with a collection of the same type. map cannot guarantee that; the result depends on the function that is passed.
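
The difference between filter and map can be sketched with another toy model. The names Items, GenericItems, IntItems and rebuild are hypothetical and only stand in for the idea; in the Scala 2.8 to 2.12 library, filter obtains its builder from the collection's own newBuilder (which the RNA class in the question overrides), while map obtains it from the implicit CanBuildFrom. Here rebuild plays the role of newBuilder:

```scala
// Toy model, NOT the real library: a container with a virtual "rebuild".
abstract class Items[A](val elems: List[A]) {
  // Each subclass rebuilds its own kind of container.
  protected def rebuild(kept: List[A]): Items[A]

  // filter keeps the element type, so it can rely on the virtual rebuild
  // and the dynamic type of the receiver is preserved.
  def filter(p: A => Boolean): Items[A] = rebuild(elems.filter(p))

  // map may produce any element type B, so the only container that is
  // always valid is the generic one.
  def map[B](f: A => B): Items[B] = new GenericItems(elems.map(f))
}

class GenericItems[A](es: List[A]) extends Items[A](es) {
  protected def rebuild(kept: List[A]): Items[A] = new GenericItems(kept)
}

// A specialised container that only holds Int, as RNA only holds Base.
class IntItems(es: List[Int]) extends Items[Int](es) {
  protected def rebuild(kept: List[Int]): Items[Int] = new IntItems(kept)
}

object FilterVsMapDemo {
  def main(args: Array[String]): Unit = {
    val xs: Items[Int] = new IntItems(List(1, 2, 3)) // weaker static type
    println(xs.filter(_ > 1).isInstanceOf[IntItems]) // prints true
    println(xs.map(_ + 1).isInstanceOf[IntItems])    // prints false
  }
}
```

Even though the function passed to map returns Int here, this simplified map cannot know that in general, which mirrors the situation described for RNA and Vector.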

My point is more on the act of specifying explicitly a type and expecting more than what it can deliver. As a user of the RNA collection, I may write code that depends on certain properties of this collection such as efficient memory representation.

So let's assume I state in val rna: IndexedSeq[Base] that rna is just an IndexedSeq. A few lines later I call a method doSomething(rna) where I expect the efficient memory representation. What would be the best signature for that: def doSomething[T](rna: IndexedSeq[Base]): T or def doSomething[T](rna: RNA): T?

I think it should be the latter. But if that's the case, then the code won't compile because rna is not statically an RNA object. If the method signature should be the former, then in essence I'm saying that I don't care about the memory representation efficiency. So I think the act of specifying a weaker type explicitly but expecting a stronger behavior is a contradiction. Which is what you do in your example.

Now I do see that even if I did:

val rna = RNA(A, G, T, U)
val rna2 = doSomething(rna)

where somebody else wrote:

def doSomething[U](seq: IndexedSeq[U]) = seq.map(identity)

I would like rna2 to be an RNA object, but that won't happen. It means that this somebody else should write a method that takes a CanBuildFrom if they want callers to get more specific types:

def doSomething[U, To](seq: IndexedSeq[U])
   (implicit cbf: CanBuildFrom[IndexedSeq[U], U, To]) = seq.map(identity)(cbf)

Then I could call: val rna2: RNA = doSomething(rna)(collection.breakOut)
