How does one efficiently implement flattening and unflattening of neural network parameters in Axon (Elixir)?

To train a neural network with other optimization algorithms, e.g. genetic algorithms, one needs to flatten the weights of the neural network into a vector, compute the updates on that vector, and then transform the weights back into a format the neural network can use. In Axon the weights/params are implemented as a map of Nx tensors. Ideally one would traverse the map recursively, depth-first, flattening and combining the weights into a single vector. To implement the reverse, one needs a way to keep track of the weight shapes, types, and names, and then use those to rebuild the parameter map. I have a naive implementation of this that achieves the purpose. However, I am wondering if there is a better way to achieve the same results without having to recreate tensors each time they are flattened or unflattened.
Here is my implementation:

defmodule Params do
  # Flattens a (possibly nested) map of tensors into a single rank-1 tensor.
  def flatten(params) do
    params
    |> to_flat_map()
    |> Map.values()
    |> Enum.map(&Nx.flatten/1)
    |> Nx.concatenate()
  end

  # Rebuilds the nested parameter map from a flat vector, using a template
  # of {flat_key, [shape: ..., type: ..., names: ...]} entries.
  def unflatten(flat, template) do
    template
    |> Enum.reduce({%{}, 0}, fn {key, tensor_opts}, {params_map, offset} ->
      size = Tuple.product(tensor_opts[:shape])

      param =
        flat[offset..(offset + size - 1)]
        |> Nx.reshape(tensor_opts[:shape], names: tensor_opts[:names])
        |> Nx.as_type(tensor_opts[:type])

      {Map.put(params_map, key, param), offset + size}
    end)
    |> elem(0)
    |> Enum.reduce(%{}, fn {k, v}, params_map ->
      # "dense_0.kernel" becomes the nested path ["dense_0", "kernel"].
      put_in(params_map, Enum.map(String.split(k, "."), &Access.key(&1, %{})), v)
    end)
  end

  # Records each leaf tensor's shape, type and dimension names so that the
  # flat vector can be unflattened later.
  def extract_template(params) do
    params
    |> to_flat_map()
    |> Map.new(fn {k, v} -> {k, [shape: Nx.shape(v), type: Nx.type(v), names: Nx.names(v)]} end)
  end

  # Recursively flattens nested maps into a single map with dot-joined keys,
  # e.g. %{"dense_0" => %{"kernel" => t}} becomes %{"dense_0.kernel" => t}.
  defp to_flat_map(params) when is_map(params) and not is_struct(params) do
    for {k, v} <- params, sub_key = to_string(k), sub <- to_flat_map(v), into: %{} do
      case sub do
        {key, val} -> {sub_key <> "." <> key, val}
        val -> {sub_key, val}
      end
    end
  end

  # A leaf (a tensor): wrapped in a list so the comprehension above can iterate it.
  defp to_flat_map(params), do: [params]
end

For example, consider this simple model:

model = 
  Axon.input({nil, 2}) 
  |> Axon.dense(2, activation: :relu) 
  |> Axon.dense(1, activation: :sigmoid)   
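
The parameter map below was obtained by initializing the model. Axon.init is the call referenced in this post; newer Axon releases instead derive an init function via Axon.build, so treat this line as matching the API version used here:

params = Axon.init(model)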

This yields the following parameters:

%{
  "dense_0" => %{
    "bias" => #Nx.Tensor<
      f32[2]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84100>
      [0.0, 0.0]
    >,
    "kernel" => #Nx.Tensor<
      f32[2][2]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84101>
      [
        [0.2250952422618866, -0.2300528585910797],
        [0.8318504691123962, 1.00990629196167]
      ]
    >
  },
  "dense_1" => %{
    "bias" => #Nx.Tensor<
      f32[1]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84102>
      [0.0]
    >,
    "kernel" => #Nx.Tensor<
      f32[2][1]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84103>
      [
        [0.5544151663780212],
        [1.0918326377868652]
      ]
    >
  }
}

Calling Params.flatten on the parameters yields the following:

#Nx.Tensor<
  f32[9]
  EXLA.Backend<host:0, 0.2002877735.2882928648.84108>
  [0.0, 0.0, 0.2250952422618866, -0.2300528585910797, 0.8318504691123962, 1.00990629196167, 0.0, 0.5544151663780212, 1.0918326377868652]
>
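
As an aside, here is a minimal sketch of how mutation and uniform crossover might act directly on such flat vectors with Nx. It assumes a recent Nx that ships the Nx.Random module; params_a, params_b, the 5% mutation rate, and the noise scale are all illustrative assumptions, not part of the implementation above:

# Two flattened parents (e.g. two candidate solutions in the population).
flat_a = Params.flatten(params_a)
flat_b = Params.flatten(params_b)

key = Nx.Random.key(42)

# Mutation: add Gaussian noise to roughly 5% of the genes of flat_a.
{noise, key} = Nx.Random.normal(key, 0.0, 0.1, shape: Nx.shape(flat_a))
{probs, key} = Nx.Random.uniform(key, shape: Nx.shape(flat_a))
mutated = Nx.select(Nx.less(probs, 0.05), Nx.add(flat_a, noise), flat_a)

# Uniform crossover: each gene of the child comes from either parent.
{coin, _key} = Nx.Random.uniform(key, shape: Nx.shape(flat_a))
child = Nx.select(Nx.less(coin, 0.5), flat_a, flat_b)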

A flattened vector like this can then be passed through the various genetic-algorithm operators, such as the crossover and mutation sketched above. After that, to get predictions from the model, we need to convert the vector back into a map of weights for each layer. First we need a template/blueprint of the model's weights with their shapes, types, and names, which we can get by calling Params.extract_template:

%{
  "dense_0.bias" => [shape: {2}, type: {:f, 32}, names: [nil]],
  "dense_0.kernel" => [shape: {2, 2}, type: {:f, 32}, names: [nil, nil]],
  "dense_1.bias" => [shape: {1}, type: {:f, 32}, names: [nil]],
  "dense_1.kernel" => [shape: {2, 1}, type: {:f, 32}, names: [nil, nil]]
}

Then we can pass the vector and the template to Params.unflatten to obtain a map of weights that we can use with Axon.predict.
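
For completeness, the whole round trip might look like this; the variable names are just for illustration, with model and params coming from the example above and input being a hypothetical batch:

template = Params.extract_template(params)
flat = Params.flatten(params)

# ... apply crossover / mutation to flat here ...

restored = Params.unflatten(flat, template)

# restored has the same nested structure as params, so it can be used as
# Axon.predict(model, restored, input)

The restored map: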

%{
  "dense_0" => %{
    "bias" => #Nx.Tensor<
      f32[2]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84109>
      [0.0, 0.0]
    >,
    "kernel" => #Nx.Tensor<
      f32[2][2]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84111>
      [
        [0.2250952422618866, -0.2300528585910797],
        [0.8318504691123962, 1.00990629196167]
      ]
    >
  },
  "dense_1" => %{
    "bias" => #Nx.Tensor<
      f32[1]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84112>
      [0.0]
    >,
    "kernel" => #Nx.Tensor<
      f32[2][1]
      EXLA.Backend<host:0, 0.2002877735.2882928648.84114>
      [
        [0.5544151663780212],
        [1.0918326377868652]
      ]
    >
  }
}

The issue here is that at each step we create multiple tensors and have to perform reshaping operations. For genetic algorithms this can be quite cumbersome, since there are multiple solutions at every step. I would like to know how one can do this flattening and unflattening efficiently, or whether there is an approach to genetic algorithms for neural networks where the weights do not need to be flattened at all?
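
For scale, a population-level version of these operators might look like the following sketch, which keeps all solutions in a single {pop_size, n} tensor so the per-generation work runs as one vectorized Nx operation, and unflattens only the individual being evaluated; pop_size, the rates, and all variable names are assumptions:

n = Nx.size(Params.flatten(params))
pop_size = 100

key = Nx.Random.key(0)
{population, key} = Nx.Random.normal(key, 0.0, 0.5, shape: {pop_size, n})

# Mutate every solution in one vectorized operation.
{noise, key} = Nx.Random.normal(key, 0.0, 0.1, shape: {pop_size, n})
{probs, _key} = Nx.Random.uniform(key, shape: {pop_size, n})
population = Nx.select(Nx.less(probs, 0.05), Nx.add(population, noise), population)

# Unflatten only when a single individual needs to be evaluated.
candidate = Params.unflatten(population[0], template)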
