AttributeError: 'MapDataset' object has no attribute 'preprocess' in TensorFlow Federated (TFF)


I'm testing this tutorial with a non-IID distribution for federated learning:
https://www.tensorflow.org/federated/tutorials/tff_for_federated_learning_research_compression

In this posted question, TensorFlow Federated: How to tune non-IIDness in federated dataset?, it was suggested to use tff.simulation.datasets.build_single_label_dataset() as a way to produce a non-IID distribution for the dataset.

I tried to apply that first (see the code below) and got an error!

emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(
    only_digits=False)
emnist_train1 = tff.simulation.datasets.build_single_label_dataset(
  emnist_train.create_tf_dataset_from_all_clients(),
  label_key='label', desired_label=1)

print(emnist_train1.element_spec)

OrderedDict([('label', TensorSpec(shape=(), dtype=tf.int32, name=None)), ('pixels', TensorSpec(shape=(28, 28), dtype=tf.float32, name=None))])

print(next(iter(emnist_train1))['label'])

tf.Tensor(1, shape=(), dtype=int32)

MAX_CLIENT_DATASET_SIZE = 418

CLIENT_EPOCHS_PER_ROUND = 1
CLIENT_BATCH_SIZE = 20
TEST_BATCH_SIZE = 500

def reshape_emnist_element(element):
  return (tf.expand_dims(element['pixels'], axis=-1), element['label'])

def preprocess_train_dataset(dataset):
  return (dataset
          .shuffle(buffer_size=MAX_CLIENT_DATASET_SIZE)
          .repeat(CLIENT_EPOCHS_PER_ROUND)
          .batch(CLIENT_BATCH_SIZE, drop_remainder=False)
          .map(reshape_emnist_element))

emnist_train1 = emnist_train1.preprocess(preprocess_train_dataset)

>> ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-cda96c33a0f6> in <module>()
     15           .map(reshape_emnist_element))
     16 
---> 17 emnist_train1 = emnist_train1.preprocess(preprocess_train_dataset)

AttributeError: 'MapDataset' object has no attribute 'preprocess'

Since the dataset is filtered, it cannot be preprocessed!
So, in this case, which label is it filtered on?

... label_key='label', desired_label=1)

Which EMNIST label does desired_label=1 correspond to?

My Question is:

How can I apply the function tff.simulation.datasets.build_single_label_dataset()
to get a non-IID dataset (a different number of samples for each client) in this specific tutorial, https://www.tensorflow.org/federated/tutorials/tff_for_federated_learning_research_compression, without the error regarding the filtered dataset?

Appreciate any help!

Thanks a lot!


1 Answer

感情洁癖 2025-02-01 19:58:08

Possibly there is some confusion between the tff.simulation.datasets.ClientData and tf.data.Dataset APIs that would be useful to cover.

tf.data.Dataset does not have a preprocess method, while tff.simulation.datasets.ClientData.preprocess does exist.

However, tff.simulation.datasets.build_single_label_dataset works with tf.data.Dataset instances: both its input argument and its return value are tf.data.Dataset instances. In this case, emnist_train1 is a tf.data.Dataset, which does not have a preprocess method.
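
To make the distinction concrete, here is a minimal sketch, reusing only calls that already appear in the question and assuming the question's preprocess_train_dataset function is in scope:

import tensorflow_federated as tff

emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(
    only_digits=False)

# ClientData does have .preprocess: it applies the given function to every
# per-client tf.data.Dataset and returns a new ClientData.
preprocessed_client_data = emnist_train.preprocess(preprocess_train_dataset)

# build_single_label_dataset takes and returns a plain tf.data.Dataset,
# which has no .preprocess method, so the preprocessing function has to be
# called on it directly (as shown below).
emnist_train1 = tff.simulation.datasets.build_single_label_dataset(
    emnist_train.create_tf_dataset_from_all_clients(),
    label_key='label', desired_label=1)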

However, all is not lost! The preprocess_train_dataset function takes a tf.data.Dataset argument, and returns a tf.data.Dataset result. This should mean that replacing:

emnist_train1 = emnist_train1.preprocess(preprocess_train_dataset)

with

emnist_train1 = preprocess_train_dataset(emnist_train1)

will create a tf.data.Dataset with only a single label ("label non-IID") that is shuffled, repeated, batched, and reshaped. Note that a single tf.data.Dataset is generally used to represent one user in the federated algorithm. To create more, with a random number of batches, something like the following could work:

client_datasets = [
   emnist_train1.take(random.randint(1, MAX_BATCHES))
   for _ in range(NUM_CLIENTS)
]
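
For completeness, a self-contained version of that sketch, where NUM_CLIENTS and MAX_BATCHES are assumed illustrative values and iterative_process is a stand-in name for whatever federated process the tutorial builds:

import random

NUM_CLIENTS = 10   # assumed number of simulated clients per round
MAX_BATCHES = 20   # assumed upper bound on batches held by any one client

# Each entry is a tf.data.Dataset standing in for one user, holding a random
# number of batches of the single-label (non-IID) data prepared above.
client_datasets = [
    emnist_train1.take(random.randint(1, MAX_BATCHES))
    for _ in range(NUM_CLIENTS)
]

# A list like this is what one round of the tutorial's federated computation
# consumes, e.g. (name assumed):
#   state, metrics = iterative_process.next(state, client_datasets)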