Defining the training examples in a dataset with create_tf_dataset_for_client()
I am preparing a dataset for a federated setting. In the code below I have multiple CSV files, and each file is treated as a single client.
dataset_paths = {
'client_0': '/content/drive/ds1.csv',
'client_1': '/content/drive/ds2.csv',
'client_2': '/content/drive/ds3.csv',
'client_3': '/content/drive/ds4.csv',
'client_4': '/content/drive/ds5.csv',
}
## Defining the dtypes for each column in the datasets
record_defaults = [int(), int(), int(), int(), float(), float(), float(), float(), float(), float(), int(), int()]
@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
    return tf.data.experimental.CsvDataset(
        dataset_path, record_defaults=record_defaults, header=True)
source = tff.simulation.datasets.FilePerUserClientData(
dataset_paths, create_tf_dataset_for_client_fn)
I wanted to access the data so I could determine the feature and label columns, so I typed:
for x in source.create_tf_dataset_for_client('client_1'):
    print(x)
>>> (<tf.Tensor: shape=(), dtype=int32, numpy=-2145209674>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=14>, <tf.Tensor: shape=(), dtype=float32, numpy=64.17>, <tf.Tensor: shape=(), dtype=float32, numpy=18.0>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>, <tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=270.14>, <tf.Tensor: shape=(), dtype=int32, numpy=7>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2143677297>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=9>, <tf.Tensor: shape=(), dtype=float32, numpy=60.83>, <tf.Tensor: shape=(), dtype=float32, numpy=14.89>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=75.0>, <tf.Tensor: shape=(), dtype=float32, numpy=42.5>, <tf.Tensor: shape=(), dtype=float32, numpy=184.72>, <tf.Tensor: shape=(), dtype=int32, numpy=8>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2138537298>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=65.83>, <tf.Tensor: shape=(), dtype=float32, numpy=18.82>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=85.0>, <tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=295.14>, <tf.Tensor: shape=(), dtype=int32, numpy=7>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2103817421>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=9>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=8.8>, <tf.Tensor: shape=(), dtype=float32, numpy=75.0>, <tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=64.58>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=1>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2081702335>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=10>, <tf.Tensor: shape=(), dtype=float32, numpy=75.83>, <tf.Tensor: shape=(), dtype=float32, numpy=9.7>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=78.47>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=1>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2067936920>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>, <tf.Tensor: shape=(), dtype=float32, numpy=10.95>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=95.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=100.0>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2065922700>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=65.83>, <tf.Tensor: shape=(), dtype=float32, numpy=3.76>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=11.81>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=3>)
There are more rows, since I have a large amount of data. I can access these data as tensor objects.
Question 1: How can I declare the features and label, along the lines of DataFrame.iloc[1:-1] # Features
and DataFrame.iloc[:-1] # Label?
Question 2: How can I split each file into training and testing sets to start the training process?

You can try something like this:
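For Question 1, one possible approach is to map each row tuple into a (features, label) pair before training. This is only a sketch on a toy stand-in dataset; it assumes the first CSV column is an id you want to drop and the last column is the label, which you should adjust to your actual schema.

```python
import tensorflow as tf

# Toy stand-in for one client's CsvDataset: each element is a tuple of
# scalar tensors, like the rows printed above (id, feat_a, feat_b, label).
raw = tf.data.Dataset.from_tensor_slices(
    ([1, 2], [10.0, 20.0], [0.5, 0.6], [3, 1]))

def to_features_and_label(*row):
    # Assumption: drop the first column (an id), keep the last as the label,
    # and stack the middle columns into a single float feature vector.
    features = tf.stack([tf.cast(c, tf.float32) for c in row[1:-1]])
    label = row[-1]
    return features, label

pairs = raw.map(to_features_and_label)
for features, label in pairs:
    print(features.numpy(), label.numpy())
```

You can apply the same `map` inside `create_tf_dataset_for_client_fn` so every client dataset already yields (features, label) pairs.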
To create test and train subsets, just use take and skip. If you want to split each CSV file into a test and a training dataset, you should do this before creating the tf.data dataset.
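A minimal sketch of the take/skip split, using a stand-in dataset and an assumed 80/20 ratio (if the row count is unknown, count the rows yourself, and shuffle before splitting if the rows are ordered):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)   # stand-in for one client's dataset

n = int(dataset.cardinality().numpy())  # number of rows, if known to tf.data
n_train = int(0.8 * n)

train_ds = dataset.take(n_train)  # first 80% of the rows
test_ds = dataset.skip(n_train)   # remaining 20%

print([int(x) for x in train_ds.as_numpy_iterator()])
print([int(x) for x in test_ds.as_numpy_iterator()])
```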