Mastering immutable data structures
I am learning Scala, and as a good student I try to obey all the rules I have found.
One rule is: IMMUTABILITY!!!
So I have tried to code everything with immutable data structures and vals, and sometimes this is really hard.
But today I thought to myself: the only important thing is that the object/class should have no mutable state. I am not forced to code all methods in an immutable style, because these methods don't affect each other.
My question: am I correct, or are there any problems/disadvantages I don't see?
EDIT:
Code example for aishwarya:
    def logLikelihood(seq: Iterator[T]): Double = {
      val sequence = seq.toList
      // State index per position: 0, 1, ..., order, then `order` for the rest.
      val stateSequence = (0 to order).toList.padTo(sequence.length, order)
      val seqPos = sequence.zipWithIndex

      // Log-probability of `symb` at `pos`, given up to `order` preceding symbols.
      def probOfSymbAtPos(symb: T, pos: Int): Double = {
        val state = states(stateSequence(pos))
        M.log(state( seqPos.map( _._1 ).slice(0, pos).takeRight(order), symb))
      }

      val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
      probs.sum
    }
Explanation: It is a method to calculate the log-likelihood of a homogeneous Markov model of variable order. The apply method of state takes all previous symbols and the coming symbol, and returns the probability of that symbol following them.
As you may see: the whole method is just multiplying some probabilities, which would be much easier using vars.
Comments (3)
The rule is not really immutability, but referential transparency. It's perfectly OK to use locally declared mutable variables and arrays, because none of the effects are observable to any other parts of the overall program.
The principle of referential transparency (RT) is this:
An expression `e` is referentially transparent if, for all programs `p`, every occurrence of `e` in `p` can be replaced with the result of evaluating `e`, without affecting the observable result of `p`.
Note that if `e` creates and mutates some local state, it doesn't violate RT, since nobody can observe this happening.
That said, I very much doubt that your implementation is any more straightforward with vars.
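To make the point about local state concrete, here is a minimal sketch. The names `LocalMutation`, `sumLogsWithVar` and `sumLogsImmutable` are made up for illustration and are not from the question. Both methods compute the same sum of log-probabilities; the first uses a local `var`, but since that `var` never leaves the method, a caller cannot tell the two apart.

```scala
object LocalMutation {
  // Sums log-probabilities using a local var. The var is created and
  // mutated inside the method only, so no caller can observe the mutation.
  def sumLogsWithVar(probs: List[Double]): Double = {
    var total = 0.0
    for (p <- probs) total += math.log(p)
    total
  }

  // The same computation written without any mutable state.
  def sumLogsImmutable(probs: List[Double]): Double =
    probs.map(p => math.log(p)).sum
}
```

Any call to `sumLogsWithVar(ps)` can be replaced by its result without changing the program, which is exactly what RT asks for.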
The case for functional programming is one of being concise in your code and bringing in a more mathematical approach. It can reduce the possibility of bugs and make your code smaller and more readable. As for being easier or not, it does require that you think about your problems differently. But once you get used to thinking with functional patterns, it's likely that the functional style will become easier than the more imperative one.
It is really hard to be perfectly functional and have zero mutable state, but it is very beneficial to have minimal mutable state. The thing to remember is that everything needs to be done in balance and not taken to the extreme. By reducing the amount of mutable state, you make it harder to write code with unintended consequences. A common pattern is to have a mutable variable whose value is immutable. This way identity (the named variable) and value (an immutable object the variable can be assigned) are separate.
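A minimal sketch of this pattern, assuming a local list called `acc` (the wrapper names `AccDemo` and `run` are made up for illustration):

```scala
object AccDemo {
  def run(): List[Int] = {
    // A mutable variable (the identity) whose value is always an immutable List.
    var acc: List[Int] = Nil
    acc ::= 1
    acc ::= 2
    acc ::= 3
    // acc now refers to List(3, 2, 1). foreach is called on that list value;
    // reassigning acc inside the loop does not change what is being iterated.
    acc.foreach { i => acc ::= i * 10 }
    acc
  }
}
```

Here `run()` returns `List(10, 20, 30, 3, 2, 1)`: the loop visited only the three original elements.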
The foreach is looping over the value of acc at the time we started the foreach. Any mutations to acc do not affect the loop. This is much safer than the typical iterators in Java, where the list can change mid-iteration.
There is also a concurrency concern. Immutable objects are useful because of the JSR-133 memory model specification, which guarantees that an object's final fields are initialized before any other thread can see them, provided the reference to the object does not escape during construction. If the fields are not final, then they are "mutable" and there is no such guarantee of proper initialization.
Actors are the perfect place to put mutable state. Objects that represent data should be immutable. Take the following example.
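As a rough illustration only: the sketch below does not use a real actor library. `Accumulator` is a hypothetical stand-in that confines its mutable `acc` to one dedicated thread (via a single-threaded executor) and treats method calls as messages. The names `Accumulator`, `add`, `snapshot` and `AccumulatorDemo` are assumptions made for this example.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical actor-like object: every access to `acc` runs on one
// dedicated thread, so the var itself needs no locking.
final class Accumulator {
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

  private var acc: List[Int] = Nil // mutable reference, immutable value

  // "Message": prepend a number to the accumulated list.
  def add(n: Int): Future[Unit] = Future { acc ::= n }

  // "Message": reply with the current value. The List is immutable, so the
  // receiver may use it from any thread without synchronization.
  def snapshot(): Future[List[Int]] = Future { acc }

  def shutdown(): Unit = executor.shutdown()
}

object AccumulatorDemo {
  def run(): List[Int] = {
    val a = new Accumulator
    try {
      a.add(1); a.add(2); a.add(3)
      val sent = Await.result(a.snapshot(), 5.seconds)
      a.add(4)                               // a later change to the actor's state
      Await.result(a.snapshot(), 5.seconds)  // wait until add(4) has been processed
      sent                                   // the list sent earlier is unaffected
    } finally a.shutdown()
  }
}
```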
In this case we can send the value of acc (which is a List) and not worry about synchronization, because List is immutable, i.e. all of the members of the List object are final. Also, because of the immutability, we know that no other actor can change the underlying data structure that was sent, and thus no other actor can change the mutable state of this actor.
Since Apocalisp has already mentioned the stuff I was going to quote him on, I'll discuss the code. You say it is just multiplying stuff, but I don't see that: it makes reference to at least three important methods defined outside, namely `order`, `states` and `M.log`. I can infer that `order` is an `Int`, and that `states` returns a function that takes a `List[T]` and a `T` and returns `Double`.
There's also some weird stuff going on...
`sequence` is never used except to define `seqPos`, so why do that?
Actually, you could use `sequence` in `probOfSymbAtPos` instead of `seqPos.map( _._1 )`, since all that does is undo the `zipWithIndex`. Also, `slice(0, pos)` is just `take(pos)`.
Now, given the missing methods, it is difficult to assert how this should really be written in functional style. Keeping the mystery methods would yield:
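The original code for this step is missing from this page. What follows is a reconstruction sketch, not the answerer's code: one way to fold over the iterator while keeping `states` as an outside dependency. The class `MarkovModel`, the signature of `states`, the fields of `State`, and the use of `math.log` in place of `M.log` are all assumptions.

```scala
// Hypothetical container: `order` and `states` are passed in because the
// question does not show where they come from. math.log stands in for M.log.
class MarkovModel[T](order: Int, states: Int => (List[T], T) => Double) {

  // Accumulator threaded through the fold: the last `order` symbols seen,
  // the number of symbols consumed so far, and the running log-likelihood.
  private case class State(history: List[T], pos: Int, result: Double)

  def logLikelihood(seq: Iterator[T]): Double = {
    val end = seq.foldLeft(State(Nil, 0, 0.0)) { (st, symb) =>
      // Same index as stateSequence(pos) in the question: 0, 1, ..., order, order, ...
      val state = states(math.min(st.pos, order))
      val partial = math.log(state(st.history, symb))
      State((st.history :+ symb).takeRight(order), st.pos + 1, st.result + partial)
    }
    end.result
  }
}
```

No `var` is needed: everything that changes from one symbol to the next is carried in `State`.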
Only, I suspect the `state`/`M.log` stuff could be made part of `State` as well. I notice other optimizations now that I have written it like this. The sliding window you are using reminds me, of course, of `sliding`. That will only start at the `order`-th element, so some adaptation would be in order. Not too difficult, though. So let's rewrite it again:
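The rewritten code is also missing from this page, so here is a hedged sketch of what a `sliding`-based version could look like. It is a reconstruction, not the answerer's code. As before, `SlidingMarkovModel`, the signature of `states`, and `math.log` standing in for `M.log` are assumptions. The adaptation is the separate warm-up step for the first `order` symbols, which have fewer than `order` predecessors.

```scala
// Hypothetical container, with the same assumptions as the question's
// undisplayed `order` and `states`; math.log stands in for M.log.
class SlidingMarkovModel[T](order: Int, states: Int => (List[T], T) => Double) {

  def logLikelihood(seq: Iterator[T]): Double = {
    val sequence = seq.toList

    // Warm-up: the symbol at position pos < order has only pos predecessors.
    val warmUp = sequence.take(order).zipWithIndex.map { case (symb, pos) =>
      math.log(states(pos)(sequence.take(pos), symb))
    }

    // Steady state: each window of order + 1 symbols is `order` predecessors
    // followed by the symbol to score. The length check is needed because
    // sliding yields one shorter window when the list is shorter than the window.
    val steady =
      if (sequence.length > order)
        sequence.sliding(order + 1).map { window =>
          math.log(states(order)(window.init, window.last))
        }.toList
      else Nil

    (warmUp ++ steady).sum
  }
}
```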
I wish I could see `M.log` and `states`... I bet I could turn that `map` into a `foldLeft` and do away with these two methods. And I suspect the method returned by `states` could take the whole slice instead of two parameters.
Still... not bad, is it?