TFF：train_test_client_split 对每个客户端数据进行分区

发布于 2025-01-10 10:20:16 字数 935 浏览 6 评论 0原文

我正在构建一个联邦学习模型。我已经编写了下面的代码，但我不断收到错误，这也是不正确的请告诉我如何正确使用train_test_client_split函数？


@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
   return tf.data.experimental.CsvDataset(
     dataset_path, record_defaults=record_defaults, header=True )

source = tff.simulation.datasets.FilePerUserClientData(
  dataset_paths, create_tf_dataset_for_client_fn)
print(source.client_ids)
>> ['client_0', 'client_1', 'client_2']

@classmethod
def from_clients_and_fn():
    client_ids: Iterable[str]
    create_tf_dataset_for_client_fn: Callable[[str], tf.data.Dataset]

Splitting=source.from_clients_and_tf_fn(['client_0', 'client_1', 'client_2'],create_tf_dataset_for_client_fn)

source.train_test_client_split(client_data=Splitting,
                               num_test_clients=1)

NotFoundError：client_1；没有这样的文件或目录 [Op:IteratorGetNext]

该文件在那里并且路径是正确的，但我不知道这里的问题是什么？

原文

I am building a federated learning model.
I have written the code below, but I keep getting the error, which is also not true
please let me know how to use the function train_test_client_split properly?


@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
   return tf.data.experimental.CsvDataset(
     dataset_path, record_defaults=record_defaults, header=True )

source = tff.simulation.datasets.FilePerUserClientData(
  dataset_paths, create_tf_dataset_for_client_fn)
print(source.client_ids)
>> ['client_0', 'client_1', 'client_2']

@classmethod
def from_clients_and_fn():
    client_ids: Iterable[str]
    create_tf_dataset_for_client_fn: Callable[[str], tf.data.Dataset]

Splitting=source.from_clients_and_tf_fn(['client_0', 'client_1', 'client_2'],create_tf_dataset_for_client_fn)

source.train_test_client_split(client_data=Splitting,
                               num_test_clients=1)

NotFoundError: client_1; No such file or directory [Op:IteratorGetNext]

The file is there and the path is correct, but I don't know what it the problem here?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

娇纵 2025-01-17 10:20:16

您只需要正确的数据结构。尝试如下操作。

创建虚拟数据

import tensorflow as tf
import tensorflow_federated as tff
import pandas as pd
from collections import OrderedDict
# Dummy data
samples = 5
data = [[tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist()]]
df = pd.DataFrame(data)
df = df.explode(list(df.columns))
df.to_csv('client1.csv', index= False)
df.to_csv('client2.csv', index= False)

加载、处理和分割数据：

record_defaults = [int(), int(), int(), int(), float(),float(),float(),float(),float(),float(), int(), int()]

@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
   return tf.data.experimental.CsvDataset(dataset_path, record_defaults=record_defaults, header=True)
   
@tf.function
def add_parsing(dataset):
  def parse_dataset(*x):
    return OrderedDict([('label', x[:-1]), ('features', x[1:-1])])
  return dataset.map(parse_dataset, num_parallel_calls=tf.data.AUTOTUNE)

dataset_paths = {'client1': '/content/client1.csv', 'client2': '/content/client2.csv'}
source = tff.simulation.datasets.FilePerUserClientData(
  dataset_paths, create_tf_dataset_for_client_fn) 

# Make sure the client ids are tensor strings when splitting data.
source._client_ids = [tf.cast(c, tf.string) for c in source.client_ids] 
source = source.preprocess(add_parsing)

train, test = source.train_test_client_split(source, 1)

You just need the correct data structure. Try something like the following.

Create dummy data

import tensorflow as tf
import tensorflow_federated as tff
import pandas as pd
from collections import OrderedDict
# Dummy data
samples = 5
data = [[tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.normal((samples,)).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist(),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32).numpy().tolist()]]
df = pd.DataFrame(data)
df = df.explode(list(df.columns))
df.to_csv('client1.csv', index= False)
df.to_csv('client2.csv', index= False)

Load, process, and split data:

record_defaults = [int(), int(), int(), int(), float(),float(),float(),float(),float(),float(), int(), int()]

@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
   return tf.data.experimental.CsvDataset(dataset_path, record_defaults=record_defaults, header=True)
   
@tf.function
def add_parsing(dataset):
  def parse_dataset(*x):
    return OrderedDict([('label', x[:-1]), ('features', x[1:-1])])
  return dataset.map(parse_dataset, num_parallel_calls=tf.data.AUTOTUNE)

dataset_paths = {'client1': '/content/client1.csv', 'client2': '/content/client2.csv'}
source = tff.simulation.datasets.FilePerUserClientData(
  dataset_paths, create_tf_dataset_for_client_fn) 

# Make sure the client ids are tensor strings when splitting data.
source._client_ids = [tf.cast(c, tf.string) for c in source.client_ids] 
source = source.preprocess(add_parsing)

train, test = source.train_test_client_split(source, 1)

回复收藏 0 原文

~没有更多了~