Solver Prototxt - Parameter Notes

Solver Prototxt

https://github.com/BVLC/caffe/wiki/Solver-Prototxt
caffe.proto: BVLC/caffe/src/caffe/proto/caffe.proto

The solver.prototxt is a configuration file used to tell caffe how you want the network trained.


A configuration file.


Berkeley Vision and Learning Center (BVLC)
Berkeley Artificial Intelligence Research (BAIR)
Convolutional Architecture for Fast Feature Embedding (Caffe)

1. caffe/examples/mnist/lenet_solver.prototxt

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
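
As in the official MNIST example, this file is passed to the caffe binary when launching training, e.g. ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt run from the Caffe root directory.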

2. Parameters

base_lr

Base learning rate

This parameter indicates the base (beginning) learning rate of the network. The value is a real number (floating point).

The base learning rate.


The base learning rate. During gradient-based optimization the learning rate is adjusted over time; the adjustment policy is set with the lr_policy parameter.


lr_policy

Learning rate schedule

This parameter indicates how the learning rate should change over time. This value is a quoted string.

Options include:

  1. “step” - drop the learning rate by a factor of gamma every stepsize iterations: base_lr * gamma^(floor(iter / stepsize)).
  2. “multistep” - like “step”, but drop the learning rate at each iteration listed in stepvalue rather than at uniform intervals.
  3. “fixed” - the learning rate does not change (stays at base_lr).
  4. “exp” - base_lr * gamma^iter, where iter is the current iteration.
  5. “inv” - base_lr * (1 + gamma * iter)^(-power); this is the policy used in lenet_solver.prototxt above.
  6. “poly” - the effective learning rate follows a polynomial decay, reaching zero at max_iter: base_lr * (1 - iter/max_iter)^power.
  7. “sigmoid” - the effective learning rate follows a sigmoid decay: base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).

where base_lr, max_iter, gamma, stepsize, stepvalue and power are defined in the solver parameter protocol buffer, and iter is the current iteration.

  // The learning rate decay policy. The currently implemented learning rate
  // policies are as follows:
  //    - fixed: always return base_lr.
  //    - step: return base_lr * gamma ^ (floor(iter / step))
  //    - exp: return base_lr * gamma ^ iter
  //    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
  //    - multistep: similar to step but it allows non uniform steps defined by
  //      stepvalue
  //    - poly: the effective learning rate follows a polynomial decay, to be
  //      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
  //    - sigmoid: the effective learning rate follows a sigmod decay
  //      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
  //
  // where base_lr, max_iter, gamma, step, stepvalue and power are defined
  // in the solver parameter protocol buffer, and iter is the current iteration.

Here the policy is set so that the learning rate slowly decreases as the number of iterations grows.
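
To make the schedules concrete, here is a minimal Python sketch (not Caffe code; the function name and the stepsize/stepvalue defaults are illustrative, while gamma and power mirror lenet_solver.prototxt) that evaluates each policy from the formulas quoted above.

import math

def learning_rate(policy, base_lr, iter, gamma=0.0001, power=0.75,
                  stepsize=5000, stepvalues=(3000, 6000, 9000), max_iter=10000):
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iter // stepsize)
    if policy == "multistep":
        # count how many of the (sorted) stepvalues have already been passed
        return base_lr * gamma ** sum(iter >= sv for sv in stepvalues)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)

# The "inv" policy from lenet_solver.prototxt decays smoothly from base_lr.
for it in (0, 2500, 5000, 10000):
    print(it, learning_rate("inv", 0.01, it))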



gamma

Learning rate decay factor

This parameter indicates how much the learning rate should change every time we reach the next “step”. The value is a real number; the new learning rate is obtained by multiplying the current learning rate by gamma.

stepsize

Interval (in iterations) between learning rate changes

This parameter indicates after how many iterations we should move onto the next “step” of training. This value is a positive integer.


If stepsize is too small, the learning rate becomes smaller and smaller too quickly, and training may stop making progress before the network has fully converged.


stepvalue

This parameter indicates one of potentially many iteration counts at which we should move onto the next “step” of training. This value is a positive integer. There is often more than one of these parameters present, each one indicating the next step iteration.
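
For example, a multistep solver might contain lr_policy: "multistep" together with repeated entries such as stepvalue: 3000, stepvalue: 6000, stepvalue: 9000 (illustrative values; stepvalue is a repeated field in caffe.proto).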


max_iter

Maximum number of iterations

This parameter indicates when the network should stop training. The value is an integer indicating which iteration will be the last.

The maximum number of iterations.


One iteration (iter) corresponds to one weight update: a forward pass and a backward pass, after which the weights are updated.


momentum

Momentum

This parameter indicates how much of the previous weight update is retained in the new update. This value is a real fraction.


The fraction of the previous weight update that is retained.

Momentum adds inertia to the optimization: when the error surface contains flat regions, SGD with momentum can move across them faster.

Newton's first law of motion (the law of inertia): a body stays at rest or in uniform straight-line motion until an external force changes its state of motion.
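
A minimal sketch of the idea, assuming the standard momentum formulation (Caffe's SGDSolver implements the equivalent update in C++; the function name here is mine):

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    v = momentum * v - lr * grad  # retain a fraction of the previous update
    return w + v, v               # apply the smoothed update

# Repeated identical gradients build up "speed", as on a flat error surface.
w, v = 1.0, 0.0
for g in (0.5, 0.5, 0.5):
    w, v = sgd_momentum_step(w, v, g)
    print(w, v)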


weight_decay

Weight decay

This parameter indicates the factor of (regularization) penalization of large weights. This value is often a small real fraction.


Weight decay penalizes large weights and helps prevent overfitting.
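
A sketch of where weight_decay enters the update, assuming plain L2 regularization (the function name is mine, not Caffe's):

def decayed_gradient(grad, w, weight_decay=0.0005):
    # The L2 penalty (weight_decay / 2) * w^2 contributes weight_decay * w
    # to the gradient, steadily pulling large weights toward zero.
    return grad + weight_decay * w

print(decayed_gradient(0.0, 10.0))  # a large weight is penalized even with zero data gradient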


random_seed

A random seed used by the solver and the network (for example, in a dropout layer).

solver_mode

CPU / GPU mode

This parameter indicates which mode will be used in solving the network.

Options include:

  1. CPU
  2. GPU

snapshot

Snapshot interval (in iterations)

This parameter indicates how often caffe should output a model and solverstate. This value is a positive integer.

snapshot_prefix

Prefix / path for saved models

This parameter indicates the prefix for the snapshotted model and solverstate file names. This value is a double quoted string. With the example settings above, Caffe writes files such as examples/mnist/lenet_iter_5000.caffemodel and examples/mnist/lenet_iter_5000.solverstate.


net

Training or testing network definition (train_net and test_net can be used to specify them separately)

This parameter indicates the location of the network to be trained (path to prototxt). This value is a double quoted string.

iter_size

Accumulate gradients across batches through the iter_size solver field; the effective batch size is iter_size x batch_size. With this setting, batch_size: 16 with iter_size: 1 and batch_size: 4 with iter_size: 4 are equivalent.

Accumulate gradients over iter_size x batch_size instances.
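
A toy Python sketch of the equivalence, using a made-up "gradient" that is just the batch mean (the real solver accumulates parameter gradients the same way):

def mean_grad(batch):
    return sum(batch) / len(batch)  # stand-in for a per-batch averaged gradient

big = list(range(16))                                        # batch_size 16, iter_size 1
small = [list(range(i * 4, (i + 1) * 4)) for i in range(4)]  # batch_size 4, iter_size 4

g_small = sum(mean_grad(b) for b in small) / len(small)  # accumulate, then normalize
assert mean_grad(big) == g_small                         # both settings yield the same update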


test_iter

Number of test iterations

This parameter indicates how many test iterations should occur per test_interval. This value is a positive integer.

The number of iterations for each test net.


test_iter: 100
test_iter specifies how many forward passes the test should carry out.
In the case of MNIST, we have test batch size 100 and 100 test iterations, covering the full 10,000 testing images.


test_interval

Test interval

This parameter indicates how often the test phase of the network will be executed.

The number of iterations between two testing phases.


test_interval: 500
Carry out testing every 500 training iterations.

test_interval means the network is tested once every test_interval training iterations. A common choice is to test once after each training epoch; if one epoch is 5000 iterations, set test_interval: 5000.
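
The arithmetic behind these settings, as a small sketch (the MNIST numbers come from the example above; the 5000-iteration epoch is the illustrative figure from the note):

num_test_images = 10000
test_batch_size = 100
test_iter = num_test_images // test_batch_size  # 100 forward passes cover the test set

iters_per_epoch = 5000           # illustrative: iterations in one training epoch
test_interval = iters_per_epoch  # test once per epoch
print(test_iter, test_interval)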


display

Display interval (in iterations)

This parameter indicates how often caffe should output results to the screen. This value is a positive integer and specifies an iteration count.

The number of iterations between displaying info. If display = 0, no info will be displayed.

type

This parameter indicates the optimization algorithm (solver type) used to train the network. This value is a quoted string.

Options include:

  1. Stochastic Gradient Descent “SGD”
  2. AdaDelta “AdaDelta”
  3. Adaptive Gradient “AdaGrad”
  4. Adam “Adam”
  5. Nesterov’s Accelerated Gradient “Nesterov”
  6. RMSprop “RMSProp”

The loss function is generally non-convex and has no analytical solution, so it must be minimized numerically. Caffe provides six optimization algorithms, selected via the type field in the solver configuration.
The solver's main job is to alternate between the forward pass (to compute the loss) and the backward pass (to compute gradients), updating the parameters iteratively so as to minimize the loss.
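
In the solver file the algorithm is selected with a quoted string, for example type: "Adam". (Very old Caffe versions used an enum field instead, e.g. solver_type: ADAM, later deprecated in favor of the string form.)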


batch_size is the number of images the network trains on per iteration. With batch_size = 256, each iteration trains on 256 images; if there are 2,560,000 images in total, one pass over all of them takes 2560000 / 256 = 10000 iterations.

An epoch means passing every training image through the network once. For example, if one epoch is 5000 iterations and you want to train for 100 epochs, the total iteration count is max_iter = 5000 * 100 = 500000.
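
The same arithmetic as a sketch, using the numbers from the two paragraphs above:

iters_per_epoch = 2560000 // 256  # 10000 iterations for one pass over 2,560,000 images
max_iter = 5000 * 100             # 500000 iterations for 100 epochs of 5000 iterations each
print(iters_per_epoch, max_iter)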
