LR Scheduler

from torch.optim.lr_scheduler import LambdaLR, MultiplicativeLR, StepLR, MultiStepLR
from torch.optim.lr_scheduler import ConstantLR, LinearLR, ExponentialLR, CosineAnnealingLR 
from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingWarmRestarts
import matplotlib.pyplot as plt
%matplotlib inline
  • LambdaLR
  • MultiplicativeLR
  • StepLR
  • MultiStepLR
  • ConstantLR
  • LinearLR
  • ExponentialLR
  • CosineAnnealingLR
  • ReduceLROnPlateau
  • CosineAnnealingWarmRestarts
import torch.optim as optim
from torch import nn
class Config:
    epoch=30
    
cfg = Config()

LambdaLR

Adjusts the learning rate according to a user-supplied lambda function. Adjusted LR = initial LR * value returned by the lambda

  • lr_lambda: a single lambda or a list of lambdas, one per parameter group of the optimizer. Each lambda takes the current epoch number as input.
def train(model, optimizer, scheduler, cfg):
    """Run cfg.epoch dummy epochs and record the LR after every scheduler step."""
    lrs = []
    lrs.append(optimizer.state_dict()['param_groups'][0]["lr"])
    for i in range(cfg.epoch):
        # train
        # validation
        optimizer.step()
        scheduler.step()
        lrs.append(optimizer.state_dict()['param_groups'][0]["lr"])
    return lrs

model = nn.Linear(10, 1)
optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = LambdaLR(optimizer, lr_lambda=[lambda epoch: epoch / cfg.epoch])
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
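
As a quick sanity check of the rule above, the recorded values should follow the closed form 0.1 * epoch / cfg.epoch (a hand-written sketch, assuming the scheduler applies the lambda starting at epoch 0, which is why the curve starts at 0):

# Closed form implied by "adjusted LR = initial LR * lambda(epoch)"
expected = [0.1 * epoch / cfg.epoch for epoch in range(cfg.epoch + 1)]
plt.plot(lrs, label="recorded")
plt.plot(expected, "--", label="0.1 * epoch / cfg.epoch")
plt.legend()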

MultiplicativeLR

Adjusts the learning rate according to the lambda's return value: adjusted LR = current LR * value returned by the lambda. Unlike LambdaLR, the multiplication here is cumulative.

  • lr_lambda: a single lambda or a list of lambdas, one per parameter group of the optimizer. Each lambda takes the current epoch number as input.
optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = MultiplicativeLR(optimizer, lr_lambda=[lambda epoch: 0.9])
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
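
To make the contrast with LambdaLR concrete, here the factor 0.9 multiplies the current learning rate each epoch, so the schedule should be the geometric sequence 0.1 * 0.9**epoch (a rough check, assuming the first recorded value is still the untouched initial LR):

# Cumulative multiplication: lr(epoch) = 0.1 * 0.9 ** epoch
expected = [0.1 * 0.9 ** epoch for epoch in range(cfg.epoch + 1)]
print(lrs[:3], expected[:3])   # should agree: 0.1, 0.09, 0.081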

StepLR

Decays the learning rate by a factor of gamma every step_size epochs. Adjusted LR = current LR * gamma

  • step_size: decay the learning rate every step_size epochs

  • gamma: decay factor

optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.8)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
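
Under the parameters above the schedule has a simple closed form, 0.1 * 0.8 ** (epoch // 10), i.e. the LR drops at epochs 10, 20 and 30 (a quick sketch):

# Step decay: lr(epoch) = 0.1 * 0.8 ** (epoch // 10)
expected = [0.1 * 0.8 ** (epoch // 10) for epoch in range(cfg.epoch + 1)]
print(expected[9:12])   # 0.1 right before epoch 10, 0.08 from epoch 10 on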

MultiStepLR

Decays the learning rate by a factor of gamma at specified epochs. Adjusted LR = current LR * gamma

  • milestones: list of epochs at which the decay happens

  • gamma: decay factor

optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = MultiStepLR(optimizer, milestones=[3, 6, 9, 15, 20], gamma=0.5)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
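
Equivalently, the learning rate after a given epoch is the initial LR times gamma raised to the number of milestones already passed (a sketch using bisect to count them):

import bisect

# lr(epoch) = 0.1 * 0.5 ** (number of milestones <= epoch)
milestones = [3, 6, 9, 15, 20]
expected = [0.1 * 0.5 ** bisect.bisect_right(milestones, epoch)
            for epoch in range(cfg.epoch + 1)]
print(expected[2:5])   # 0.1, 0.05, 0.05 -> first drop at epoch 3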

ConstantLR

Holds the learning rate at a constant value, namely the initial LR times the decay factor, for the specified number of epochs, then restores it to the initial value.

  • factor: decay factor

  • total_iters: number of epochs for which the reduced learning rate is held

optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = ConstantLR(optimizer, factor=0.3, total_iters=10)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
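
The resulting shape is worth spelling out: assuming the parameters above, the LR sits at 0.1 * 0.3 = 0.03 for the first 10 epochs and then jumps back to the initial 0.1 (a quick sketch of the expected values):

# Constant 0.03 for epochs 0..9, then the initial 0.1 from epoch 10 onward
expected = [0.1 * 0.3 if epoch < 10 else 0.1 for epoch in range(cfg.epoch + 1)]
print(expected[8:12])   # 0.03, 0.03, 0.1, 0.1 (up to float rounding)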

LinearLR

Changes the learning rate linearly: the factor applied to the initial learning rate is interpolated from start_factor to end_factor over total_iters epochs, then held at end_factor.

  • start_factor: factor applied to the initial learning rate at the first epoch

  • end_factor: factor reached after total_iters epochs

  • total_iters: number of epochs over which the factor is interpolated

optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = LinearLR(optimizer, start_factor=0.3, end_factor=1.0, total_iters=10)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
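
In closed form (a sketch under the parameters above), the factor is interpolated linearly from 0.3 to 1.0 over the first 10 epochs and then held, so lr(epoch) = 0.1 * (0.3 + 0.7 * min(epoch, 10) / 10):

# Linear warm-up of the factor from start_factor to end_factor over total_iters epochs
expected = [0.1 * (0.3 + (1.0 - 0.3) * min(epoch, 10) / 10)
            for epoch in range(cfg.epoch + 1)]
plt.plot(expected)   # rises from 0.03 to 0.1 by epoch 10, flat afterwards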

ExponentialLR

Decays the learning rate by a factor of gamma every epoch. Adjusted LR = current LR * gamma

  • gamma: decay factor
optimizer = optim.Adam(model.parameters(), 0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
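
With gamma=0.9 this should trace the same curve as the MultiplicativeLR example above; the sketch below just restates the closed form 0.1 * 0.9 ** epoch:

# lr(epoch) = 0.1 * 0.9 ** epoch, the same curve as MultiplicativeLR with a constant 0.9
expected = [0.1 * 0.9 ** epoch for epoch in range(cfg.epoch + 1)]
print(expected[:3])   # 0.1, 0.09, 0.081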

CosineAnnealingLR

Cosine annealing decay.

  • T_max: the minimal positive period of the learning rate curve is \(2 \cdot T_{max}\). To guarantee that the learning rate keeps decreasing over the final few epochs, T_max should satisfy \(\text{epoch} = (2k+1) \cdot T_{max}\)

  • eta_min: minimum learning rate

optimizer = optim.Adam(model.parameters(), 1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=6, eta_min=1e-5)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
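
The underlying formula, as commonly stated for cosine annealing, is \(\eta_t = \eta_{min} + (\eta_{max} - \eta_{min}) \cdot \frac{1 + \cos(\pi t / T_{max})}{2}\); plotting it directly (a sketch, independent of the scheduler) makes the period of \(2 \cdot T_{max} = 12\) epochs visible:

import math

# Cosine annealing: eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2
eta_max, eta_min, T_max = 1e-4, 1e-5, 6
expected = [eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2
            for t in range(cfg.epoch + 1)]
plt.plot(expected)   # one full oscillation every 2 * T_max = 12 epochs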

ReduceLROnPlateau

This scheduler tracks a metric and decays the learning rate once the metric stops improving.

  • mode: min or max. Use max for metrics such as accuracy, min for metrics such as loss
  • factor: decay factor
  • patience: decay the learning rate if the tracked metric does not improve within patience epochs
  • threshold: threshold that defines what counts as an improvement of the tracked metric
  • min_lr: minimum learning rate
auc = [
    0.54, 0.6256, 0.7087, 0.8219, 0.8549, 
    0.8873, 0.9109, 0.918, 0.8899, 0.9478, 
    0.9325, 0.9518, 0.9597, 0.9571, 0.9679, 
    0.9471, 0.9666, 0.9725, 0.964, 0.9725, 
    0.9704, 0.9761, 0.9772, 0.9801, 0.9743,
    0.9758, 0.9746, 0.9745, 0.9798, 0.9823, 
    0.9826, 0.9805, 0.9752, 0.9826, 0.9818, 
    0.9842
]
model = nn.Linear(10, 1)
optimizer = optim.Adam(model.parameters(), 1e-3)
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=3, threshold=0.001, min_lr=1e-5)

lrs = []
lrs.append(optimizer.state_dict()['param_groups'][0]["lr"])
for i in range(len(auc)):
    # train
    # validation
    optimizer.step()
    scheduler.step(auc[i])
    lrs.append(optimizer.state_dict()['param_groups'][0]["lr"])
plt.plot(auc)
plt.plot(lrs)
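
To spell out the decision rule (a plain-Python sketch of my reading of the default relative threshold mode, not the library's actual code): with mode="max" an epoch counts as an improvement only if the metric exceeds best * (1 + threshold); after patience consecutive epochs without improvement the LR is multiplied by factor, but never pushed below min_lr.

def plateau_sketch(metrics, lr=1e-3, factor=0.5, patience=3,
                   threshold=0.001, min_lr=1e-5):
    # mode="max", threshold_mode="rel": improvement means metric > best * (1 + threshold)
    best, bad_epochs, history = float("-inf"), 0, []
    for m in metrics:
        if m > best * (1 + threshold):
            best, bad_epochs = m, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:            # too long without improvement -> decay
            lr, bad_epochs = max(lr * factor, min_lr), 0
        history.append(lr)
    return history

plt.plot(plateau_sketch(auc))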

CosineAnnealingWarmRestarts

Cosine annealing with warm restarts. The learning rate starts annealing from the very beginning; when epoch T_0 is reached, the 0th warm restart occurs. After that, each restart happens \(T_{t+1} = T_t \cdot T_{mult}\) epochs after the previous one, where \(T_t\) denotes the length in epochs of the t-th interval between restarts. In the example below the interval sequence is [10, 20, 40, 80], so the warm restarts fall at epochs [10, 10+20, 30+40, 70+80], i.e. [10, 30, 70, 150].

  • T_0: epoch of the first warm restart
  • T_mult: factor by which the restart interval grows
  • eta_min: minimum learning rate, defaults to 0
cfg.epoch = 100
optimizer = optim.Adam(model.parameters(), 1e-3)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2, eta_min=1e-5)
lrs = train(model, optimizer, scheduler, cfg)
plt.plot(lrs)
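
A quick check of where the restarts land under T_0=10 and T_mult=2 (just the arithmetic described above):

# Interval lengths grow by T_mult each time: 10, 20, 40, 80, ...
T_0, T_mult = 10, 2
restarts, t, interval = [], 0, T_0
for _ in range(4):
    t += interval
    restarts.append(t)
    interval *= T_mult
print(restarts)   # [10, 30, 70, 150]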