Data size problem when fitting a Keras multi-input multi-output autoencoder

Posted 2025-02-13 17:43:55


I am trying to build a multi-input multi-output autoencoder for a dense data representation. (The rationale for the architecture is that I want one common network that optimizes for both the numerical and the categorical data reconstruction at once.)

From my original data I produced a 28-column-wide one-hot encoded array for the categorical data and a 17-column-wide normalized array for the numerical data:

m_cat_train.shape # (59768, 28)
m_cat_test.shape # (3146, 28)
m_num_train.shape # (59768, 17)
m_num_test.shape # (3146, 17)
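
For context, a minimal sketch of how such arrays could be produced (an assumption, not from the original post; df, cat_cols, and num_cols are hypothetical names, and the split ratio is only chosen to roughly match the shapes above):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# df, cat_cols and num_cols are hypothetical placeholders for the original data
ohe = OneHotEncoder(sparse_output=False)            # use sparse=False on scikit-learn < 1.2
m_cat = ohe.fit_transform(df[cat_cols])             # -> (n_samples, 28) if the categories total 28
m_num = MinMaxScaler().fit_transform(df[num_cols])  # -> (n_samples, 17)

# Splitting both arrays in one call keeps the sample indices aligned
m_cat_train, m_cat_test, m_num_train, m_num_test = train_test_split(
    m_cat, m_num, test_size=0.05, random_state=42)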

My architecture:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# m_cat and m_num are the full arrays before the train/test split
cat_input = Input(shape=(m_cat.shape[1],))
num_input = Input(shape=(m_num.shape[1],))

cat_enc = Dense(16, activation='relu')(cat_input)
cat_enc = Dense(8, activation='relu')(cat_enc)

num_enc = Dense(16, activation='relu')(num_input)
num_enc = Dense(8, activation='relu')(num_enc)

bottleneck = concatenate([cat_enc, num_enc])

cat_dec = Dense(8, activation='relu')(bottleneck)
cat_dec = Dense(16, activation='relu')(cat_dec)
cat_output = Dense(m_cat.shape[1], activation='sigmoid', name="cat_output")(cat_dec)

num_dec = Dense(8, activation='relu')(bottleneck)
num_dec = Dense(16, activation='relu')(num_dec)
num_output = Dense(m_num.shape[1], activation='linear', name="num_output")(num_dec)

model = Model(inputs=[cat_input, num_input], 
              outputs=[cat_output, num_output],
              name="autoencoder")

However, at training time:

model.compile(optimizer="adam",
              loss={"cat_output" : "categorical_crossentropy", "num_output":"mse"},
              loss_weights={"cat_output": 1.0, "num_output": 1.0},
              metrics={"cat_output": 'accuracy', "num_output": 'accuracy'})

hist = model.fit([m_cat_train, m_num_train], [m_cat_test, m_num_test],
    batch_size=16,
    epochs=16,
    verbose=1)

I get the following error message:

ValueError: Data cardinality is ambiguous:
  x sizes: 59756, 59756
  y sizes: 3146, 3146
Make sure all arrays contain the same number of samples.

What is the problem here? As far as I can see, the dimensions of the inputs and outputs are OK. How should I interpret the error message, and what does 'cardinality' mean here?
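
For comparison, a sketch of a fit call whose x and y sizes do match (this rests on the assumption that, as an autoencoder, the model should reconstruct its own inputs, so the training arrays serve as both x and y while the held-out arrays belong in validation_data):

# For an autoencoder the targets are the inputs themselves;
# the test arrays go to validation_data, not to y.
hist = model.fit([m_cat_train, m_num_train],
                 [m_cat_train, m_num_train],
                 validation_data=([m_cat_test, m_num_test],
                                  [m_cat_test, m_num_test]),
                 batch_size=16,
                 epochs=16,
                 verbose=1)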

A related bonus question: how do I define an EarlyStopping callback for fit() that somehow combines the two metrics into a single stopping condition?

from keras.callbacks import EarlyStopping

earlyStopping = EarlyStopping(monitor='val_accuracy', 
    restore_best_weights=True,
    mode='max')

Which val_accuracy is monitored in the code above?
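
One way to fold both objectives into a single stopping condition (a sketch, assuming the compile() call above and that validation data is passed to fit()): monitor val_loss, which Keras computes as the loss_weights-weighted sum of the two output losses. Note that with named outputs the per-output metrics are logged under prefixed keys such as val_cat_output_accuracy and val_num_output_accuracy, so a plain val_accuracy key may not exist at all.

from keras.callbacks import EarlyStopping

# val_loss = 1.0 * cat_output loss + 1.0 * num_output loss (the
# loss_weights above), so it already combines the two objectives.
early_stopping = EarlyStopping(monitor='val_loss',
                               mode='min',
                               patience=5,  # hypothetical patience value
                               restore_best_weights=True)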
