Different Ways to Apply an SVM in Keras

Posted 2025-02-06 14:16:51


I want to build a multi-class classification model using Keras.
My data contains 7 features and 4 labels.
I have seen two ways to apply the Support Vector Machine (SVM) algorithm in Keras.

First: a quasi-SVM in Keras, using the RandomFourierFeatures layer presented here.
I have built the following model:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import RandomFourierFeatures

def create_keras_model():
    # Random Fourier features approximate a kernel feature map; a linear
    # classifier on top of them behaves like an approximate kernel SVM.
    initializer = tf.keras.initializers.GlorotNormal()
    return tf.keras.models.Sequential([
        layers.Input(shape=(7,)),
        RandomFourierFeatures(output_dim=4822, kernel_initializer=initializer),
        layers.Dense(units=4, activation='softmax'),
    ])
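For reference, the official Keras quasi-SVM example pairs this layer with a hinge loss on a linear output rather than softmax. A minimal sketch of compiling and fitting such a model (the optimizer, epoch count, and the X_train/y_train names are assumptions, not from the original post):

model = create_keras_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='hinge',                      # SVM-style surrogate loss; Keras maps 0/1 targets to -1/1
    metrics=['categorical_accuracy'],
)
# X_train: float array of shape (n_samples, 7); y_train: one-hot labels of shape (n_samples, 4)
model.fit(X_train, y_train, epochs=20, batch_size=32)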

Second: using the last layer of the network as described here:

import tensorflow as tf
from tensorflow.keras.regularizers import l2

def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(7,)),
        tf.keras.layers.Dense(64),
        # An L2-regularized linear output trained with a hinge loss acts as a linear SVM head
        tf.keras.layers.Dense(4, kernel_regularizer=l2(0.01)),
        tf.keras.layers.Softmax(),
    ])

Note: tf.keras.losses.CategoricalHinge() was used as the loss function.
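A minimal sketch of how that compilation might look (the optimizer and metric choices here are assumptions, not stated in the original):

model = create_keras_model()
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.CategoricalHinge(),  # multi-class hinge, the SVM-style surrogate
    metrics=['accuracy'],
)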
My question is: are these approaches appropriate, and can they be described as applying an SVM model, or are they just approximations of the architecture? In short, can I say this is an application of an SVM model?


Comments (2)

冷弦 2025-02-13 14:16:51


You can compare the two models on your data as shown below.

I checked them on the MNIST dataset and got the following results:

  1. Less overfitting with the second approach
  2. Faster training time with the first approach
  3. Fewer trainable params with the first approach
  4. Accuracy of the two approaches roughly the same
from keras.utils.layer_utils import count_params  
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
import pandas as pd
import time


def create_model(approach):

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))

    if approach == 'Quasi_SVM':
        # Fixed random Fourier features approximating an RBF kernel map,
        # followed by a linear classifier: an approximate kernel SVM.
        model.add(tf.keras.layers.experimental.RandomFourierFeatures(
            output_dim=4096, scale=10.0,
            kernel_initializer="gaussian"))
        model.add(tf.keras.layers.Dense(10))

    if approach == 'kernel_regularizer':
        # Plain MLP whose L2-regularized last layer, trained with a hinge
        # loss, mimics a linear SVM on the learned features.
        model.add(tf.keras.layers.Dense(128, activation='relu'))
        model.add(tf.keras.layers.Dense(64, activation='relu'))
        model.add(tf.keras.layers.Dense(32, activation='relu'))
        model.add(tf.keras.layers.Dense(16, activation='relu'))
        model.add(tf.keras.layers.Dense(10,
                                        kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                        activation='softmax'))

    model.compile(
        optimizer='adam',
        loss='hinge',  # SVM-style surrogate loss for both variants
        metrics=['accuracy'],
    )

    return model


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to 784-dim vectors and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255
x_test = x_test.reshape(-1, 784).astype("float32") / 255

# One-hot encode the 10 digit classes
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

for approach in ['Quasi_SVM', 'kernel_regularizer']:

    model = create_model(approach)
    start = time.time()
    history = model.fit(x_train, y_train, epochs=30, batch_size=128, validation_split=0.2)
    print(f'Training time {approach} : {time.time() - start} sec')
    print(f'Trainable params {approach} : {count_params(model.trainable_weights)}')
    print(f'Accuracy on x_test {approach} : {model.evaluate(x_test, y_test, verbose=0)[1]}')

    
    df = pd.DataFrame(history.history).rename_axis('epoch').reset_index().melt(id_vars=['epoch'])
    fig, axes = plt.subplots(1,2, figsize=(18,6))
    for ax, mtr in zip(axes.flat, ['loss', 'accuracy']):
        ax.set_title(f'{approach} {mtr.title()} Plot')
        dfTmp = df[df['variable'].str.contains(mtr)]
        sns.lineplot(data=dfTmp, x='epoch', y='value', hue='variable', ax=ax)

    fig.tight_layout()
    plt.show()

Output (benchmarked on Colab):

Training time Quasi_SVM : 43.78484082221985 sec
Trainable params Quasi_SVM : 40970
Accuracy on x_test Quasi_SVM : 0.9729999899864197
Training time kernel_regularizer : 45.47012114524841 sec
Trainable params kernel_regularizer : 111514
Accuracy on x_test kernel_regularizer : 0.972100019454956
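The trainable-parameter gap follows from the RandomFourierFeatures projection being fixed (non-trainable by default), so only the final Dense(10) layer learns. A quick sanity check of the arithmetic (not part of the original answer):

# Quasi_SVM: only the Dense(10) on top of the 4096 random features is trained
quasi_svm = 4096 * 10 + 10                                   # 40,970
# kernel_regularizer: all five Dense layers are trained
mlp = (784*128 + 128) + (128*64 + 64) + (64*32 + 32) + (32*16 + 16) + (16*10 + 10)
print(quasi_svm, mlp)                                        # 40970 111514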

[Image: loss and accuracy plots for the Quasi_SVM approach]

[Image: loss and accuracy plots for the kernel_regularizer approach]

橘和柠 2025-02-13 14:16:51


I think it's just an approximation of the SVM model. The pure definition of an SVM rests on computing the support vectors via primal-dual optimization and using those support vectors to draw the maximum-margin hyperplane. Neural networks, and frameworks like Keras (TensorFlow in general), instead mostly use gradient-descent optimization to find the optimal parameters. Besides, the number of parameters to optimize in a pure SVM differs from that of a neural network, like the ones you wrote in the question.
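To make the contrast concrete, a classical SVM is fit with a dedicated QP-style dual solver that materializes explicit support vectors. A minimal sketch using scikit-learn's SVC (the library choice and the X/y names are assumptions, not from the original answer):

from sklearn.svm import SVC

# Kernel SVM solved by an SMO-style dual optimizer, not gradient descent.
# X: float array of shape (n_samples, 7); y: integer labels in {0, 1, 2, 3}
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X, y)
print(clf.support_vectors_.shape)   # explicit support vectors, which a Keras model never produces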
