TensorFlow Federated model stuck at 0.1 accuracy


I'm trying to train a federated model on the MNIST dataset. I am using the code available at https://www.tensorflow.org/federated/tutorials/simulations for the setup.
The dataset version being used is the one from Keras (not the federated version from LEAF that is used in TFF). I'm partitioning it, saving it in a dictionary, and implementing my ClientData instance with tff.simulation.datasets.TestClientData.
Applying this change works just fine. However, if I change the model used in the simulation, every round gives me ~0.1 accuracy.
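
In miniature, the setup just hands TestClientData an OrderedDict of per-client NumPy arrays and lets it build the per-client tf.data pipelines. Here is a toy sketch with made-up values (the real partitioning code is in the full listing at the end of the post):

import collections
import numpy as np
import tensorflow_federated as tff

# Two toy clients, each holding a couple of fake 28x28x1 "images" and labels.
toy_clients = collections.OrderedDict(
    client_0=collections.OrderedDict(
        label=np.array([0, 1], dtype=np.int32),
        pixels=np.zeros((2, 28, 28, 1), dtype=np.float32)),
    client_1=collections.OrderedDict(
        label=np.array([2, 3], dtype=np.int32),
        pixels=np.ones((2, 28, 28, 1), dtype=np.float32)),
)
toy_data = tff.simulation.datasets.TestClientData(toy_clients)
ds = toy_data.create_tf_dataset_for_client(toy_data.client_ids[0])
print(ds.element_spec)  # OrderedDict with 'label' and 'pixels' TensorSpecs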

The model in the tutorial is as simple as it can get: an input layer of 28*28 = 784 neurons stacked over an output layer of dimension 10 with softmax activation:

model = tf.keras.models.Sequential([
  tf.keras.layers.InputLayer(input_shape=(784,)), 
  tf.keras.layers.Dense(units=10, kernel_initializer='zeros'),
  tf.keras.layers.Softmax(),
])

And the new model is a CNN:

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(
            16,
            8,
            strides=2,
            padding="same",
            activation="relu",
            input_shape=(28, 28, 1),
        ),
        tf.keras.layers.MaxPool2D(2, 1),
        tf.keras.layers.Conv2D(
            32, 4, strides=2, padding="valid", activation="relu"
        ),
        tf.keras.layers.MaxPool2D(2, 1),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(10),
    ]
)

In the first case the accuracy changed from round to round, increasing and reaching 0.94 quite fast.
In the second case I ran it for about 240 rounds with 3 fixed clients, 20k elements each, 10 epochs, and batch size 32, and still couldn't get out of the ~0.1 accuracy and ~2.3 loss.

The model works fine for this dataset. I already tested it in a centralized version and in a federated version using the Flower framework, reaching 0.99 accuracy. But for some reason I can't make it work in TFF.
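
For reference, the centralized check was essentially just compiling the same CNN on the pooled Keras data with the same loss and metric, roughly like this (a sketch using the names from the full listing below, not the exact script I ran):

# Rough sketch of the centralized sanity check (same CNN, pooled data, same loss/metric).
model = create_cnn_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(0.02),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
          validation_data=(X_test, y_test))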

Environment:
macOS Big Sur
tensorflow==2.8.0
tensorflow-federated==0.22.0

I expect the metrics and loss to change more than this. Could there be a problem with using other models? (A quick check on whether the server weights move at all between rounds is sketched after the full code below.)

Full code:

import collections
import time

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff
from tensorflow.keras.datasets import cifar10, mnist

EPOCHS = 10
BATCH_SIZE = 32

# ROUND_CLIENTS <= NUM_CLIENTS
ROUND_CLIENTS = 3
NUM_CLIENTS = 3

NUM_ROUNDS = 400

    
def make_client(num_clients, X, y):
    """Evenly split (X, y) across num_clients clients and wrap them in a TestClientData."""
    total_image_count = len(X)
    image_per_set = int(np.floor(total_image_count/num_clients))

    client_train_dataset = collections.OrderedDict()
    for i in range(1, num_clients+1):
        client_name = i-1
        start = image_per_set * (i-1)
        end = image_per_set * i

        print(f"Adding data from {start} to {end} for client : {client_name}")
        data = collections.OrderedDict((('label', y[start:end]), ('pixels', X[start:end])))
        client_train_dataset[client_name] = data
    
    train_dataset = tff.simulation.datasets.TestClientData(client_train_dataset)
    
    return train_dataset


def preprocess(X: np.ndarray, y: np.ndarray):
    """Basic preprocessing for MNIST dataset."""
    X = np.array(X, dtype=np.float32) / 255
    X = X.reshape((X.shape[0], 28, 28, 1))

    y = np.array(y, dtype=np.int32)
    y = tf.keras.utils.to_categorical(y, num_classes=10)

    return X, y


(X_train, y_train), (X_test, y_test) = mnist.load_data()
(X_train, y_train) = preprocess(X_train, y_train)
(X_test, y_test) = preprocess(X_test, y_test)

mnistFedTrain = make_client(NUM_CLIENTS, X_train, y_train)

def map_fn(example):
    return collections.OrderedDict(
        x=example['pixels'],
        y=example['label'])


def client_data(client_id):
    ds = mnistFedTrain.create_tf_dataset_for_client(mnistFedTrain.client_ids[client_id])
    return ds.repeat(EPOCHS).shuffle(500).batch(BATCH_SIZE).map(map_fn)


train_data = [client_data(n) for n in range(ROUND_CLIENTS)]
element_spec = train_data[0].element_spec

def create_cnn_model() -> tf.keras.Model:
    """Returns a sequential keras CNN Model."""
    return tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(
                16,
                8,
                strides=2,
                padding="same",
                activation="relu",
                input_shape=(28, 28, 1),
            ),
            tf.keras.layers.MaxPool2D(2, 1),
            tf.keras.layers.Conv2D(
                32, 4, strides=2, padding="valid", activation="relu"
            ),
            tf.keras.layers.MaxPool2D(2, 1),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )

def model_fn():
    model = create_cnn_model()
    return tff.learning.from_keras_model(
      model,
      input_spec=element_spec,
      loss=tf.keras.losses.CategoricalCrossentropy(
                from_logits=True, reduction=tf.losses.Reduction.NONE
            ),
      metrics=[tf.keras.metrics.CategoricalAccuracy()]
    )


trainer = tff.learning.build_federated_averaging_process(
    model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02))


def evaluate(num_rounds=NUM_ROUNDS):
    state = trainer.initialize()
    for i in range(num_rounds):
        t1 = time.time()
        state, metrics = trainer.next(state, train_data)
        t2 = time.time()
        print('\n Round {r}: metrics {m}, round time {t:.2f} seconds'.format(
            m=metrics['train'], r=i, t=t2 - t1))

t1 = time.time()
evaluate(NUM_ROUNDS)
t2 = time.time()

print('Seconds:',t2 - t1,' = Minutes:', (t2 - t1)/60)
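
One thing I still want to verify is whether the global model weights move at all between rounds. A rough sketch of how I'd check that (I'm assuming the iterative process state exposes the global model as state.model, a tff.learning.ModelWeights that can be assigned back into a Keras model, as in TFF 0.22):

# Debugging sketch: log how much the server model moves per round.
keras_model = create_cnn_model()
state = trainer.initialize()
prev = None
for round_num in range(5):
    state, metrics = trainer.next(state, train_data)
    state.model.assign_weights_to(keras_model)  # assumed state structure, see above
    flat = np.concatenate([w.ravel() for w in keras_model.get_weights()])
    if prev is not None:
        print(f"Round {round_num}: weight delta L2 = {np.linalg.norm(flat - prev):.6f}")
    prev = flat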

I've had a similar problem with other models as well, e.g. MobileNetV2 implemented in TF for CIFAR-10:

model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None)
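
The TFF wrapping there was analogous to model_fn above, roughly (a sketch; cifar_element_spec is a hypothetical stand-in for an element spec built from the CIFAR-10 client datasets the same way as element_spec above):

def cifar_model_fn():
    # Sketch only: `cifar_element_spec` is a placeholder, built like `element_spec` above.
    model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None)
    return tff.learning.from_keras_model(
        model,
        input_spec=cifar_element_spec,
        # MobileNetV2's default top ends in softmax, hence from_logits=False.
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])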
