Reproducibility, controlling randomness, operator-level randomness in TFF
I have TFF code that takes a slightly different optimization path during training across different runs, despite having set all the operator-level seeds, the NumPy seed for sampling clients in each round, and so on. The FAQ section on the TFF website does talk about randomness and expectations in TFF, but I found the answer slightly confusing. Is it the case that some aspects of the randomness cannot be directly controlled even after setting all the operator-level seeds one could, because one cannot control the way sub-sessions are started and ended?
To be more specific, these are all the operator-level seeds my code already sets: dataset.shuffle, create_tf_dataset_from_all_clients, keras.initializers, and np.random.seed for per-round client sampling (which uses NumPy). I have verified that the initial model state is the same across runs, but as soon as training starts, the model states begin to diverge across runs. The divergence is gradual/slow in most cases, but not always.
The code is quite complex, so I am not adding it here.
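For reference, here is a minimal sketch (not the asker's actual code) of where the seeds listed above are typically set in a TFF simulation. The EMNIST dataset, the buffer/batch sizes, the sample size of 10, and the tiny Keras model are illustrative placeholders only:

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

SEED = 42

# Seed the NumPy RNG used for per-round client sampling.
np.random.seed(SEED)

# Illustrative federated dataset; EMNIST is only a stand-in here.
emnist_train, _ = tff.simulation.datasets.emnist.load_data()

# Op-level seed when flattening all clients into a single dataset.
centralized = emnist_train.create_tf_dataset_from_all_clients(seed=SEED)

def preprocess(ds):
    # Op-level seed for per-client shuffling.
    return ds.shuffle(buffer_size=100, seed=SEED).batch(20)

def build_keras_model():
    # Seeded initializers make the initial weights identical across runs.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(
            10,
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=SEED)),
    ])

# Per-round client sampling driven by the seeded NumPy RNG.
round_clients = np.random.choice(emnist_train.client_ids, size=10, replace=False)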
Comments (1)
There is one more source of non-determinism that would be very hard to control: summation of float32 numbers is not associative, so the result depends on the order in which the numbers are added. When you simulate a number of clients in a round, the TFF executor does not have a way to control the order in which the model updates are added together. As a result, there can be small differences at the bottom of the float32 range. While this may sound negligible, it can add up over a number of rounds (I have seen it take hundreds, but it could also be fewer) and eventually cause different loss/accuracy/model-weight trajectories, because the gradients will start to be computed at slightly different points. BTW, this tutorial has more info on best practices for controlling randomness in TFF.