Epochs vs. Rounds in Federated Learning



I am applying federated averaging to my federated learning model. After running it for thousands of rounds, the model still has not converged.
How can I increase the number of epochs in training, and how does it differ from the number of rounds?
And how can I reach convergence? I tried increasing the number of rounds, but it takes a long time to train (I am using Google Colab, where execution time cannot exceed 24 hours; I also subscribed to Google Colab Pro to use the GPU, but it did not work well).

The code and the training results are provided below:

import tensorflow as tf
import tensorflow_federated as tff

# .repeat(2) gives each client 2 local epochs (passes over its data) per round.
train_data = [train.create_tf_dataset_for_client(c).repeat(2).map(reshape_data)
              .batch(batch_size=50, num_parallel_calls=50)
              for c in train_client_ids]

# Build the FedAvg process: each client trains locally with SGD, and the
# server applies the averaged model update with its own SGD optimizer.
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.0001),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.9))

NUM_ROUNDS = 50000
state = iterative_process.initialize()

logdir = "/tmp/logs/scalars/training/"
summary_writer = tf.summary.create_file_writer(logdir)

with summary_writer.as_default():
  for round_num in range(NUM_ROUNDS):
    # One federated round: broadcast the model, train locally, average updates.
    state, metrics = iterative_process.next(state, train_data)
    if round_num % 1000 == 0:
      print('round {:2d}, metrics={}'.format(round_num, metrics))
      for name, value in metrics['train'].items():
        tf.summary.scalar(name, value, step=round_num)

And the output is shown in this image.


Comments (1)

烦人精 2025-01-27 08:28:25

See this tutorial for how to increase epochs (basically, increase the number passed to .repeat()). The number of epochs is the number of passes each client makes over its local dataset within a single round; the number of rounds is the total number of federated averaging iterations, i.e. server-side aggregations.
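As a rough sketch of that trade-off, reusing train, reshape_data, model_fn, and train_client_ids from the question (the NUM_EPOCHS and NUM_ROUNDS values here are illustrative, not tuned): raising the count in .repeat() does more local work per round, so you typically need far fewer of the expensive communication rounds.

import tensorflow as tf
import tensorflow_federated as tff

NUM_EPOCHS = 10    # local passes over each client's data per round (was 2)
NUM_ROUNDS = 2000  # far fewer server rounds than the original 50000

# Each client now runs 10 local epochs before its update is averaged.
train_data = [train.create_tf_dataset_for_client(c)
              .repeat(NUM_EPOCHS)
              .map(reshape_data)
              .batch(50)
              for c in train_client_ids]

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.0001),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.9))

state = iterative_process.initialize()
for round_num in range(NUM_ROUNDS):
  state, metrics = iterative_process.next(state, train_data)

Note the trade-off: more local epochs cut communication, but with non-IID client data too many local steps can make client models drift apart and hurt convergence. It is also worth checking whether the very small client learning rate (0.0001) is itself the bottleneck.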
