# Solver Prototxt

https://github.com/BVLC/caffe/wiki/Solver-Prototxt
caffe.proto: BVLC/caffe/src/caffe/proto/caffe.proto

The solver.prototxt is a configuration file used to tell caffe how you want the network trained.

solver ['sɒlvə]: n. one that solves; a solving program
proto [prəʊtə]: n. prototype; model; archetype
Berkeley Vision and Learning Center，BVLC
Berkeley Artificial Intelligence Research，BAIR
Convolutional Architecture for Fast Feature Embedding，Caffe


## 1. caffe/examples/mnist/lenet_solver.prototxt

```proto
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
```


## 2. Parameters

#### base_lr

This parameter indicates the base (beginning) learning rate of the network. The value is a real number (floating point).

The base learning rate.

#### lr_policy

This parameter indicates how the learning rate should change over time. This value is a quoted string.

Options include:

1. “step” - drop the learning rate by a factor of gamma every stepsize iterations.
2. “multistep” - like “step”, but drop the learning rate by gamma at each explicitly listed stepvalue (the steps need not be uniform).
3. “fixed” - the learning rate does not change (base_lr is kept constant).
4. “exp” - base_lr * gamma^iter, where iter is the current iteration.
5. “inv” - base_lr * (1 + gamma * iter)^(-power). (This is the policy used in lenet_solver.prototxt above.)
6. “poly” - the effective learning rate follows a polynomial decay, reaching zero at max_iter:
base_lr * (1 - iter/max_iter)^power
7. “sigmoid” - the effective learning rate follows a sigmoid decay:
base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))

where base_lr, max_iter, gamma, step, stepvalue and power are defined in the solver parameter protocol buffer, and iter is the current iteration.

```proto
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmoid decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
```
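The policies listed in the caffe.proto comment can be sketched as a plain Python function (for illustration only; this is not Caffe's actual implementation):

```python
# A plain-Python sketch of the learning-rate policies described above.
import math

def effective_lr(policy, it, base_lr=0.01, gamma=0.0001, power=0.75,
                 stepsize=5000, stepvalues=(), max_iter=10000):
    """Learning rate at iteration `it` under the given lr_policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        # like "step", but the drop points are listed explicitly
        return base_lr * gamma ** sum(it >= sv for sv in stepvalues)
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)

# With the lenet_solver.prototxt settings (inv, gamma=0.0001, power=0.75),
# the rate starts at base_lr and decays smoothly:
print(effective_lr("inv", 0))       # 0.01
print(effective_lr("inv", 10000))   # 0.01 * 2**-0.75 ≈ 0.0059
```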


quote [kwəʊt]: v. to quote, cite; n. quotation
sigmoid ['sɪgmɒɪd]: adj. S-shaped; n. sigmoid (curve)
polynomial [,pɒlɪ'nəʊmɪəl]: n. polynomial; adj. polynomial
decay [dɪ'keɪ]: v./n. decay, decline
protocol ['prəʊtəkɒl]: n. protocol, draft, etiquette


#### gamma

This parameter indicates how much the learning rate should change every time we reach the next “step.” The value is a real number, and can be thought of as multiplying the current learning rate by said number to gain a new learning rate.

#### stepsize

This parameter indicates how often (at some iteration count) we should move onto the next “step” of training. This value is a positive integer.

A stepsize that is too small makes the learning rate shrink ever smaller, preventing the network from converging sufficiently.

#### stepvalue

This parameter indicates one of potentially many iteration counts at which we should move onto the next “step” of training. This value is a positive integer. There is often more than one of these parameters present, each one indicating the next step iteration.

potentially [pə'tɛnʃəli]: adv. possibly, potentially
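As an illustration, a hypothetical multistep schedule (the stepvalue numbers here are invented for the example) might look like:

```proto
# Drop the learning rate by gamma at each listed stepvalue
lr_policy: "multistep"
base_lr: 0.01
gamma: 0.1
stepvalue: 5000
stepvalue: 7000
stepvalue: 9000
```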


#### max_iter

This parameter indicates when the network should stop training. The value is an integer indicating which iteration should be the last.

The maximum number of iterations.

One iteration (iter) corresponds to one weight update: one forward pass and one backward pass, followed by a weight update.

#### momentum

This parameter indicates how much of the previous weight will be retained in the new calculation. This value is a real fraction.

calculation [kælkjʊ'leɪʃ(ə)n]: n. calculation, estimate, deliberation
retain [rɪ'teɪn]: vt. to retain, keep, remember
momentum [mə'mentəm]: n. momentum, impetus, driving force
Newton's laws of motion: first law, second law, third law
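A minimal sketch of an SGD update with momentum, following the form used by Caffe's SGD solver (V(t+1) = momentum * V(t) - lr * gradient; W(t+1) = W(t) + V(t+1)), shown here on a single scalar weight rather than a parameter blob:

```python
def momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar weight."""
    v = momentum * v - lr * grad   # retain a fraction of the previous update
    w = w + v
    return w, v

w, v = 1.0, 0.0
w, v = momentum_step(w, v, grad=2.0)   # v = -0.02,  w = 0.98
w, v = momentum_step(w, v, grad=2.0)   # v = 0.9*(-0.02) - 0.02 = -0.038
```

With momentum = 0.9, repeated gradients in the same direction build up speed, while a lone noisy gradient is damped.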


#### weight_decay

This parameter indicates the factor of (regularization) penalization of large weights. This value is often a real fraction.

regularization [,rɛɡjʊlərɪ'zeʃən]: n. regularization, adjustment
penalization ['penəlai,zeiʃn]: n. penalization, punishment
fraction ['frækʃ(ə)n]: n. fraction, part, small amount
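A sketch of what L2 weight decay does: the term weight_decay * w is added to each parameter's gradient before the update, so large weights are penalized proportionally:

```python
def decayed_gradient(grad, w, weight_decay=0.0005):
    """Effective gradient after adding the L2 weight-decay term."""
    return grad + weight_decay * w

# A large weight gets a proportionally larger effective gradient:
print(decayed_gradient(0.1, w=10.0))   # 0.1 + 0.005 = 0.105
print(decayed_gradient(0.1, w=0.0))    # unchanged: 0.1
```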


#### random_seed

A random seed used by the solver and the network (for example, in dropout layer).

#### solver_mode

CPU / GPU mode

This parameter indicates which mode will be used in solving the network.

Options include:

1. CPU
2. GPU

#### snapshot

This parameter indicates how often caffe should output a model and solverstate. This value is a positive integer.

#### snapshot_prefix

This parameter indicates how the snapshot's output model and solverstate filenames should be prefixed. This value is a double-quoted string.

prefix ['priːfɪks]: n. prefix; vt. to prefix, to place before


#### net

This parameter indicates the location of the network to be trained (path to prototxt). This value is a double quoted string.

#### iter_size

Accumulate gradients across batches through the iter_size solver field. With this setting batch_size: 16 with iter_size: 1 and batch_size: 4 with iter_size: 4 are equivalent.

Accumulate gradients over iter_size x batch_size instances.

accumulate [ə'kjuːmjʊleɪt]: v. to accumulate, amass
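The claimed equivalence can be checked with a small sketch, assuming the loss (and hence the gradient) is averaged within each batch: accumulating over iter_size = 4 batches of size 4 and averaging gives the same gradient as one batch of size 16.

```python
def batch_grad(samples):
    # stand-in per-batch gradient: mean of per-sample gradients
    return sum(samples) / len(samples)

samples = list(range(16))                  # stand-in per-sample gradients

big = batch_grad(samples)                  # batch_size: 16, iter_size: 1

# batch_size: 4, iter_size: 4 -- accumulate, then average the accumulations
chunks = [samples[i:i + 4] for i in range(0, 16, 4)]
small = sum(batch_grad(c) for c in chunks) / len(chunks)

print(big == small)   # True: the two settings are equivalent
```

This is useful when a large batch does not fit in GPU memory: a small batch_size with a larger iter_size reproduces the large-batch gradient.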


#### test_iter

This parameter indicates how many test iterations should occur per test_interval. This value is a positive integer.

The number of iterations for each test net.

occur [ə'kɜː]: vi. to occur, appear, exist


For example, `test_iter: 100` specifies how many forward passes the test should carry out. In the case of MNIST, we have test batch size 100 and 100 test iterations, covering the full 10,000 testing images.
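The arithmetic behind that setting: the test net must see every test image, so test_iter times the test-net batch size should equal the test-set size.

```python
# MNIST: 10,000 test images, test-net batch size 100
test_images = 10000
test_batch_size = 100
test_iter = test_images // test_batch_size
print(test_iter)   # 100
```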

#### test_interval

This parameter indicates how often the test phase of the network will be executed.

The number of iterations between two testing phases.

For example, `test_interval: 500` carries out testing every 500 training iterations.

test_interval means the network runs one test pass every test_interval training iterations. A common choice is to test once per training epoch; if one epoch is 5000 iterations, set test_interval: 5000.

#### display

This parameter indicates how often caffe should output results to the screen. This value is a positive integer and specifies an iteration count.

The number of iterations between displaying info. If display = 0, no info will be displayed.

#### type

This parameter indicates the optimization algorithm used to train the network, i.e. how the weights are updated from the back-propagated gradients. This value is a quoted string.

Options include:

1. Stochastic Gradient Descent “SGD” (the default)
2. AdaDelta “AdaDelta”
3. Adaptive Gradient “AdaGrad”
4. Adam “Adam”
5. Nesterov's Accelerated Gradient “Nesterov”
6. RMSprop “RMSProp”

The loss function may be non-convex and have no analytical solution, so it must be minimized by an iterative optimization method. Caffe provides the six optimization algorithms above; you choose one by setting type in the solver configuration file.
The solver's main job is to alternately invoke the forward and backward passes to update the parameters and thereby minimize the loss; it is an iterative optimization algorithm.
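As a sketch, switching the solver to Adam only requires changing a few fields; the values below are illustrative, and the parameter names (momentum, momentum2, delta for Adam's beta1, beta2, epsilon) come from Caffe's SolverParameter definition:

```proto
net: "examples/mnist/lenet_train_test.prototxt"
type: "Adam"
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
delta: 1e-8
lr_policy: "fixed"
```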

batch_size is the number of images the network trains on per iteration. If batch_size = 256, the network trains on 256 images each iteration. With 2,560,000 images in total, passing every image through the network once takes 2,560,000 / 256 = 10,000 iterations.

An epoch means passing all training images through the network once. For example, if one epoch is 5000 iterations and you want to train the network for 100 epochs, the total iteration count is max_iter = 5000 * 100 = 500,000.
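The bookkeeping in the two paragraphs above is just integer arithmetic:

```python
# Iterations per epoch follow from dataset size and batch_size:
total_images = 2560000
batch_size = 256
iters_per_epoch = total_images // batch_size   # 10000

# If one epoch were 5000 iterations and we train for 100 epochs:
max_iter = 5000 * 100                          # 500000
```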
