Different Ways to Apply an SVM in Keras

Posted 2025-02-06 14:16:51


I want to build a multi-class classification model using Keras.
My data contains 7 features and 4 labels.
I have seen two ways to apply the Support Vector Machine (SVM) algorithm in Keras.

First: a quasi-SVM in Keras, using the RandomFourierFeatures layer presented here.
I have built the following model:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import RandomFourierFeatures

def create_keras_model():
    # Random Fourier features approximate a kernel feature map; a linear
    # classifier on top of them behaves like an approximate kernel SVM.
    initializer = tf.keras.initializers.GlorotNormal()
    return tf.keras.models.Sequential([
        layers.Input(shape=(7,)),
        RandomFourierFeatures(output_dim=4822, kernel_initializer=initializer),
        layers.Dense(units=4, activation='softmax'),
    ])
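For reference, the official Keras quasi-SVM example pairs this layer with a hinge loss on a linear output rather than softmax. A minimal sketch of compiling and fitting such a model (the optimizer, epoch count, and the X_train/y_train names are assumptions, not from the original post):

model = create_keras_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='hinge',                      # SVM-style surrogate loss; Keras maps 0/1 targets to -1/1
    metrics=['categorical_accuracy'],
)
# X_train: float array of shape (n_samples, 7); y_train: one-hot labels of shape (n_samples, 4)
model.fit(X_train, y_train, epochs=20, batch_size=32)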

Second: using the last layer of the network as described here:

import tensorflow as tf
from tensorflow.keras.regularizers import l2

def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(7,)),
        tf.keras.layers.Dense(64),
        # An L2-regularized linear output trained with a hinge loss acts as a linear SVM head
        tf.keras.layers.Dense(4, kernel_regularizer=l2(0.01)),
        tf.keras.layers.Softmax(),
    ])

Note: tf.keras.losses.CategoricalHinge() was used as the loss function.
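A minimal sketch of how that compilation might look (the optimizer and metric choices here are assumptions, not stated in the original):

model = create_keras_model()
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.CategoricalHinge(),  # multi-class hinge, the SVM-style surrogate
    metrics=['accuracy'],
)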
My question is: are these approaches appropriate, and can they be described as applying an SVM model, or are they just approximations of the architecture? In short, can I say this is an application of an SVM model?


Comments (2)

冷弦 2025-02-13 14:16:51


You can compare the two models on your data as shown below.

I checked them on the MNIST dataset and got the following results:

  1. Less overfitting with the second approach
  2. Faster training time with the first approach
  3. Fewer trainable params with the first approach
  4. Accuracy of the two approaches roughly the same
from keras.utils.layer_utils import count_params  
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
import pandas as pd
import time


def create_model(approach):

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))

    if approach == 'Quasi_SVM':
        # Fixed random Fourier features approximating an RBF kernel map,
        # followed by a linear classifier: an approximate kernel SVM.
        model.add(tf.keras.layers.experimental.RandomFourierFeatures(
            output_dim=4096, scale=10.0,
            kernel_initializer="gaussian"))
        model.add(tf.keras.layers.Dense(10))

    if approach == 'kernel_regularizer':
        # Plain MLP whose L2-regularized last layer, trained with a hinge
        # loss, mimics a linear SVM on the learned features.
        model.add(tf.keras.layers.Dense(128, activation='relu'))
        model.add(tf.keras.layers.Dense(64, activation='relu'))
        model.add(tf.keras.layers.Dense(32, activation='relu'))
        model.add(tf.keras.layers.Dense(16, activation='relu'))
        model.add(tf.keras.layers.Dense(10,
                                        kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                        activation='softmax'))

    model.compile(
        optimizer='adam',
        loss='hinge',  # SVM-style surrogate loss for both variants
        metrics=['accuracy'],
    )

    return model


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to 784-dim vectors and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255
x_test = x_test.reshape(-1, 784).astype("float32") / 255

# One-hot encode the 10 digit classes
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

for approach in ['Quasi_SVM', 'kernel_regularizer']:

    model = create_model(approach)
    start = time.time()
    history = model.fit(x_train, y_train, epochs=30, batch_size=128, validation_split=0.2)
    print(f'Training time {approach} : {time.time() - start} sec')
    print(f'Trainable params {approach} : {count_params(model.trainable_weights)}')
    print(f'Accuracy on x_test {approach} : {model.evaluate(x_test, y_test, verbose=0)[1]}')

    
    df = pd.DataFrame(history.history).rename_axis('epoch').reset_index().melt(id_vars=['epoch'])
    fig, axes = plt.subplots(1,2, figsize=(18,6))
    for ax, mtr in zip(axes.flat, ['loss', 'accuracy']):
        ax.set_title(f'{approach} {mtr.title()} Plot')
        dfTmp = df[df['variable'].str.contains(mtr)]
        sns.lineplot(data=dfTmp, x='epoch', y='value', hue='variable', ax=ax)

    fig.tight_layout()
    plt.show()

Output (benchmarked on Colab):

Training time Quasi_SVM : 43.78484082221985 sec
Trainable params Quasi_SVM : 40970
Accuracy on x_test Quasi_SVM : 0.9729999899864197
Training time kernel_regularizer : 45.47012114524841 sec
Trainable params kernel_regularizer : 111514
Accuracy on x_test kernel_regularizer : 0.972100019454956
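The trainable-parameter gap follows from the RandomFourierFeatures projection being fixed (non-trainable by default), so only the final Dense(10) layer learns. A quick sanity check of the arithmetic (not part of the original answer):

# Quasi_SVM: only the Dense(10) on top of the 4096 random features is trained
quasi_svm = 4096 * 10 + 10                                   # 40,970
# kernel_regularizer: all five Dense layers are trained
mlp = (784*128 + 128) + (128*64 + 64) + (64*32 + 32) + (32*16 + 16) + (16*10 + 10)
print(quasi_svm, mlp)                                        # 40970 111514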

[Image: loss and accuracy plots for the Quasi_SVM approach]

[Image: loss and accuracy plots for the kernel_regularizer approach]

橘和柠 2025-02-13 14:16:51


I think it's just an approximation of the SVM model. The pure definition of an SVM rests on computing the support vectors via primal-dual optimization and using those support vectors to draw the maximum-margin hyperplane. Neural networks, and frameworks like Keras (TensorFlow in general), instead mostly use gradient-descent optimization to find the optimal parameters. Besides, the number of parameters to optimize in a pure SVM differs from that of a neural network, like the ones you wrote in the question.
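To make the contrast concrete, a classical SVM is fit with a dedicated QP-style dual solver that materializes explicit support vectors. A minimal sketch using scikit-learn's SVC (the library choice and the X/y names are assumptions, not from the original answer):

from sklearn.svm import SVC

# Kernel SVM solved by an SMO-style dual optimizer, not gradient descent.
# X: float array of shape (n_samples, 7); y: integer labels in {0, 1, 2, 3}
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X, y)
print(clf.support_vectors_.shape)   # explicit support vectors, which a Keras model never produces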
