Reproducibility, controlling randomness, op-level randomness in TFF

Posted 2025-01-23 12:43:27


I have TFF code that takes a slightly different optimization path while training across different runs, despite having set all the op-level seeds, NumPy seeds for sampling clients in each round, etc. The FAQ section on the TFF website does talk about randomness and expectations in TFF, but I found the answer slightly confusing. Is it the case that some aspects of the randomness cannot be directly controlled even after setting all the op-level seeds that one could, because one cannot control the way sub-sessions are started and ended?

To be more specific, these are all the op-level seeds my code already sets: dataset.shuffle, create_tf_dataset_from_all_clients, keras.initializers, and np.random.seed for per-round client sampling (which uses NumPy). I have verified that the initial model state is the same across runs, but as soon as training starts, the model states start diverging across different runs. The divergence is gradual/slow in most cases, but not always.

The code is quite complex, so I am not including it here.
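For reference, here is a minimal sketch (not the original code) of the kind of seeding described above; SEED, NUM_CLIENTS_PER_ROUND and the helper functions are illustrative placeholders:

    import numpy as np
    import tensorflow as tf

    SEED = 42
    NUM_CLIENTS_PER_ROUND = 10

    # NumPy seed, used for per-round client sampling.
    np.random.seed(SEED)

    def preprocess(ds):
        # Op-level seed on dataset shuffling.
        return ds.shuffle(buffer_size=100, seed=SEED).batch(20)

    def create_keras_model():
        # Seeded Keras initializer so initial weights match across runs.
        init = tf.keras.initializers.GlorotUniform(seed=SEED)
        return tf.keras.Sequential(
            [tf.keras.layers.Dense(10, kernel_initializer=init, input_shape=(784,))]
        )

    def sample_clients(client_ids):
        # Per-round client sampling driven by the seeded NumPy RNG.
        return np.random.choice(client_ids, size=NUM_CLIENTS_PER_ROUND, replace=False)

    # Recent TFF ClientData objects also accept a seed here, e.g.:
    # train_data.create_tf_dataset_from_all_clients(seed=SEED)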


1 Answer

凝望流年 2025-01-30 12:43:27


There is one more source of non-determinism that would be very hard to control -- summation of float32 numbers is not associative, so the result depends on the order in which they are added.

When you simulate a number of clients in a round, the TFF executor does not have a way to control the order in which the model updates are added together. As a result, there can be tiny differences at the bottom of the float32 range. While this may sound negligible, it can add up over a number of rounds (I have seen it take hundreds, but it could also be fewer), and eventually cause different loss/accuracy/model-weight trajectories, as the gradients will start to be computed at slightly different points.
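To see the effect in isolation, here is a small illustrative snippet (not from the answer) showing that summing the same float32 values in a different order can give slightly different results:

    import numpy as np

    # Stand-in for per-client model updates; values are illustrative.
    rng = np.random.default_rng(0)
    updates = rng.normal(size=100_000).astype(np.float32)

    forward = np.float32(0.0)
    for x in updates:          # sum in one order
        forward += x

    backward = np.float32(0.0)
    for x in updates[::-1]:    # same values, reversed order
        backward += x

    print(forward, backward, forward == backward)
    # Typically prints two values that differ in the last bits; that tiny gap
    # can compound over many rounds once gradients are computed at slightly
    # different points.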


BTW, this tutorial has more info on best practices in controlling randomness in TFF.
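As a general TF-level pattern (an illustration, not a quote from that tutorial), stateless random ops take an explicit seed and return identical values for identical seeds, independent of global RNG state:

    import tensorflow as tf

    # Stateless random ops are pure functions of their explicit seed, so the
    # same seed always yields the same values.
    noise_a = tf.random.stateless_normal(shape=[3], seed=[2024, 1])
    noise_b = tf.random.stateless_normal(shape=[3], seed=[2024, 1])
    print(tf.reduce_all(noise_a == noise_b).numpy())  # True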
