
MAML: Few-Shot Learning Algorithm Explained, with a Code Implementation

2023-03-08

The Model-Agnostic Meta-Learning (MAML) method [7] is, as its name suggests, model-agnostic: it is compatible with any model trained by gradient descent and applies to a wide range of machine learning problems, including image classification, object detection, and reinforcement learning. The goal of meta-learning is to train a model on a large number of different tasks so that it can adapt to a new task, and solve a new problem, quickly, using only a small amount of training data (i.e., few samples) and a few gradient descent steps.

Method

The aim of MAML training is to obtain a set of optimal initial parameters from which the model can adapt quickly (fast adaptation) to new tasks. The authors observe that some features transfer to other tasks more readily than others; such features are general across tasks. Since a few-shot task provides only a handful of labeled samples, a model trained for many gradient steps on them would easily overfit, so the model should adapt with as few gradient updates as possible. This requires the model to already hold a set of initial parameters that are broadly applicable across tasks; these parameters encode the prior knowledge the model has learned from the training tasks.

Assume the model is represented by a parameterized function $f_\theta$ with parameters $\theta$. When adapting to a new task $\mathcal{T}_i$, the model takes one (or several) gradient descent steps and updates its parameters from $\theta$ to $\theta_i'$:

$\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$

where $\alpha$ is a hyperparameter: the learning rate of the adaptation process.

Across multiple different tasks, the model optimizes its parameters $\theta$ by evaluating the loss at the adapted parameters $\theta_i'$. Concretely, the goal of meta-learning is to find a set of parameters $\theta$ that, over the task distribution $p(\mathcal{T})$, can quickly adapt to every task while keeping the total loss as small as possible. This is expressed as:

$\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'}) = \min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\big(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\big)$

The model parameters $\theta$ are then updated by stochastic gradient descent (SGD) according to:

$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$

where $\beta$ is the meta learning rate of the outer update.

Note that the parameters we ultimately optimize are $\theta$, whereas the loss is computed at the adapted parameters $\theta_i'$. The training process is illustrated in the figure below.

Because of how this meta-learning method computes its loss and optimizes its parameters, training consists of two nested loops. The outer loop is the meta-learning process: it samples a batch of tasks from the task distribution and accumulates the loss over this batch. The inner loop is the adaptation process: for each task it takes one (or several) gradient descent steps, updating the parameters to $\theta_i'$, and then evaluates the loss at $\theta_i'$. During backpropagation, the gradients must flow through both loops back to the initial parameters $\theta$ to complete the meta update; a minimal sketch of this structure is given below.
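To make the nested-loop structure concrete before turning to the PaddlePaddle code, here is a self-contained toy sketch in plain NumPy. The task family (1-D linear regression), the helper loss_and_grad, and all hyperparameter values are illustrative assumptions of mine, not part of the original article or of the MAML paper's experiments:

import numpy as np

# Toy MAML: each task is y = w_task * x, and the "model" is a single scalar theta.
rng = np.random.default_rng(0)
alpha, beta = 0.01, 0.001              # inner (adaptation) and outer (meta) learning rates
theta = 0.0                            # meta-learned initial parameter

def loss_and_grad(theta, x, y):
    # Mean squared error of y_hat = theta * x, and its gradient w.r.t. theta.
    err = theta * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

for meta_step in range(1000):                     # outer loop: meta-learning over task batches
    meta_grad = 0.0
    tasks = rng.uniform(-2.0, 2.0, size=4)        # sample a batch of tasks (slopes w_task)
    for w_task in tasks:                          # inner loop: per-task adaptation
        x_spt = rng.uniform(-1, 1, 5);  y_spt = w_task * x_spt    # support set
        x_qry = rng.uniform(-1, 1, 15); y_qry = w_task * x_qry    # query set
        _, g_spt = loss_and_grad(theta, x_spt, y_spt)
        theta_prime = theta - alpha * g_spt       # one adaptation step: theta -> theta'
        _, g_qry = loss_and_grad(theta_prime, x_qry, y_qry)
        # Chain rule through the inner step: d(theta')/d(theta) = 1 - 2*alpha*mean(x_spt^2),
        # so the query-set gradient flows through both loops back to theta.
        meta_grad += g_qry * (1.0 - 2.0 * alpha * np.mean(x_spt ** 2))
    theta -= beta * meta_grad / len(tasks)        # outer (meta) update of the initial theta

The PaddlePaddle implementation later in this article has the same structure: fast_weights plays the role of theta', and the query-set loss is backpropagated to the initial parameters by the framework's autograd.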

The complete MAML algorithm is shown in the figure below.

Experimental Results

The experimental results reported in the paper on the Omniglot and miniImageNet datasets are shown in the figure below.

PaddlePaddle Implementation

This section presents part of the key code I completed for the PaddlePaddle Paper Reproduction Challenge (3rd edition). The complete project code has been open-sourced on GitHub and AI Studio; readers are welcome to star and fork it. The links are as follows:

GitHub:

AI Studio:

Key Code

The model itself is fairly simple; the difficulty is that the gradients must flow through both loops back to the initial parameters. If the model were built in the usual way on top of the nn.Layer class, the model parameters would be overwritten when the inner loop applies its updates, and the initial parameters would be lost. Thanks to the flexible network construction offered by PaddlePaddle's dynamic graph mode, this project separates the model parameters from the functions that consume them: the outer loop keeps a copy of the original parameters $\theta$, the inner loop updates that copy and computes the loss, the computation graph is built automatically in dynamic graph mode, and the gradients ultimately flow back to the original parameters $\theta$.

The code of the MAML class is as follows:

import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MAML(paddle.nn.Layer):
    def __init__(self, n_way):
        super(MAML, self).__init__()
        # Hold every trainable parameter of the model explicitly
        self.vars = []
        self.vars_bn = []

        # ------------------------ conv2d #1 ------------------------
        weight = paddle.static.create_parameter(shape=[64, 1, 3, 3],
                                                dtype='float32',
                                                default_initializer=nn.initializer.KaimingNormal(),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)  # initialized to zero
        self.vars.extend([weight, bias])
        # BatchNorm #1
        weight = paddle.static.create_parameter(shape=[64],
                                                dtype='float32',
                                                default_initializer=nn.initializer.Constant(value=1),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)  # initialized to zero
        self.vars.extend([weight, bias])
        running_mean = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        running_var = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        self.vars_bn.extend([running_mean, running_var])

        # ------------------------ conv2d #2 ------------------------
        weight = paddle.static.create_parameter(shape=[64, 64, 3, 3],
                                                dtype='float32',
                                                default_initializer=nn.initializer.KaimingNormal(),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)
        self.vars.extend([weight, bias])
        # BatchNorm #2
        weight = paddle.static.create_parameter(shape=[64],
                                                dtype='float32',
                                                default_initializer=nn.initializer.Constant(value=1),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)  # initialized to zero
        self.vars.extend([weight, bias])
        running_mean = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        running_var = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        self.vars_bn.extend([running_mean, running_var])

        # ------------------------ conv2d #3 ------------------------
        weight = paddle.static.create_parameter(shape=[64, 64, 3, 3],
                                                dtype='float32',
                                                default_initializer=nn.initializer.KaimingNormal(),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)
        self.vars.extend([weight, bias])
        # BatchNorm #3
        weight = paddle.static.create_parameter(shape=[64],
                                                dtype='float32',
                                                default_initializer=nn.initializer.Constant(value=1),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)  # initialized to zero
        self.vars.extend([weight, bias])
        running_mean = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        running_var = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        self.vars_bn.extend([running_mean, running_var])

        # ------------------------ conv2d #4 ------------------------
        weight = paddle.static.create_parameter(shape=[64, 64, 3, 3],
                                                dtype='float32',
                                                default_initializer=nn.initializer.KaimingNormal(),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)
        self.vars.extend([weight, bias])
        # BatchNorm #4
        weight = paddle.static.create_parameter(shape=[64],
                                                dtype='float32',
                                                default_initializer=nn.initializer.Constant(value=1),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[64],
                                              dtype='float32',
                                              is_bias=True)  # initialized to zero
        self.vars.extend([weight, bias])
        running_mean = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        running_var = paddle.to_tensor(np.zeros([64], np.float32), stop_gradient=True)
        self.vars_bn.extend([running_mean, running_var])

        # ------------------------ fully connected layer ------------------------
        weight = paddle.static.create_parameter(shape=[64, n_way],
                                                dtype='float32',
                                                default_initializer=nn.initializer.XavierNormal(),
                                                is_bias=False)
        bias = paddle.static.create_parameter(shape=[n_way],
                                              dtype='float32',
                                              is_bias=True)
        self.vars.extend([weight, bias])

    def forward(self, x, params=None, bn_training=True):
        if params is None:
            params = self.vars
        weight, bias = params[0], params[1]    # conv layer #1
        x = F.conv2d(x, weight, bias, stride=1, padding=1)
        weight, bias = params[2], params[3]    # BN layer #1
        running_mean, running_var = self.vars_bn[0], self.vars_bn[1]
        x = F.batch_norm(x, running_mean, running_var, weight=weight, bias=bias, training=bn_training)
        x = F.relu(x)                          # ReLU #1
        x = F.max_pool2d(x, kernel_size=2)     # max-pool #1
        weight, bias = params[4], params[5]    # conv layer #2
        x = F.conv2d(x, weight, bias, stride=1, padding=1)
        weight, bias = params[6], params[7]    # BN layer #2
        running_mean, running_var = self.vars_bn[2], self.vars_bn[3]
        x = F.batch_norm(x, running_mean, running_var, weight=weight, bias=bias, training=bn_training)
        x = F.relu(x)                          # ReLU #2
        x = F.max_pool2d(x, kernel_size=2)     # max-pool #2
        weight, bias = params[8], params[9]    # conv layer #3
        x = F.conv2d(x, weight, bias, stride=1, padding=1)
        weight, bias = params[10], params[11]  # BN layer #3
        running_mean, running_var = self.vars_bn[4], self.vars_bn[5]
        x = F.batch_norm(x, running_mean, running_var, weight=weight, bias=bias, training=bn_training)
        x = F.relu(x)                          # ReLU #3
        x = F.max_pool2d(x, kernel_size=2)     # max-pool #3
        weight, bias = params[12], params[13]  # conv layer #4
        x = F.conv2d(x, weight, bias, stride=1, padding=1)
        weight, bias = params[14], params[15]  # BN layer #4
        running_mean, running_var = self.vars_bn[6], self.vars_bn[7]
        x = F.batch_norm(x, running_mean, running_var, weight=weight, bias=bias, training=bn_training)
        x = F.relu(x)                          # ReLU #4
        x = F.max_pool2d(x, kernel_size=2)     # max-pool #4
        x = paddle.reshape(x, [x.shape[0], -1])   # flatten
        weight, bias = params[-2], params[-1]     # linear layer
        x = F.linear(x, weight, bias)
        output = x
        return output

    def parameters(self, include_sublayers=True):
        return self.vars
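As a quick check of the interface, the network can be run either with its internally stored parameters or with an externally supplied parameter list. This short snippet is my own illustration (the 28x28 input size matches Omniglot) and is not part of the original project code:

net = MAML(n_way=5)
dummy = paddle.randn([5, 1, 28, 28])                    # a fake batch of 5 single-channel 28x28 images
logits = net(dummy)                                     # uses net.vars internally
logits_explicit = net(dummy, params=net.parameters())   # same result: parameters passed explicitly
print(logits.shape)                                     # [5, 5]

Passing params explicitly is exactly what the inner loop relies on: it can evaluate the network under the fast weights without ever overwriting the stored initial parameters.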

The code of the meta-learner class, which implements the inner and outer loops, is as follows:

from copy import deepcopy


class MetaLearner(nn.Layer):
    def __init__(self, n_way, glob_update_step, glob_update_step_test, glob_meta_lr, glob_base_lr):
        super(MetaLearner, self).__init__()
        self.update_step = glob_update_step            # task-level inner update steps
        self.update_step_test = glob_update_step_test
        self.net = MAML(n_way=n_way)
        self.meta_lr = glob_meta_lr                    # outer-loop (meta) learning rate
        self.base_lr = glob_base_lr                    # inner-loop learning rate
        self.meta_optim = paddle.optimizer.Adam(learning_rate=self.meta_lr, parameters=self.net.parameters())

    def forward(self, x_spt, y_spt, x_qry, y_qry):
        task_num = x_spt.shape[0]
        query_size = x_qry.shape[1]  # 75 = 15 * 5
        loss_list_qry = [0 for _ in range(self.update_step + 1)]
        correct_list = [0 for _ in range(self.update_step + 1)]

        # Inner-loop gradients are applied manually; the outer loop uses the optimizer defined above
        for i in range(task_num):
            # Update step 0
            y_hat = self.net(x_spt[i], params=None, bn_training=True)  # (setsz, ways)
            loss = F.cross_entropy(y_hat, y_spt[i])
            grad = paddle.grad(loss, self.net.parameters())  # gradients of the loss w.r.t. all parameters
            tuples = zip(grad, self.net.parameters())        # pair each gradient with its parameter
            # fast_weights corresponds to theta' = theta - alpha * grad(L)
            fast_weights = list(map(lambda p: p[1] - self.base_lr * p[0], tuples))
            # Evaluate on the query set and compute the accuracy.
            # Weights before the update: loss goes into loss_list_qry[0], correct count into correct_list[0]
            with paddle.no_grad():
                y_hat = self.net(x_qry[i], self.net.parameters(), bn_training=True)
                loss_qry = F.cross_entropy(y_hat, y_qry[i])
                loss_list_qry[0] += loss_qry
                pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)  # size = (75); axis=-1 also works
                correct = paddle.equal(pred_qry, y_qry[i]).numpy().sum().item()
                correct_list[0] += correct
            # Evaluate the updated weights on the query set:
            # loss goes into loss_list_qry[1], correct count into correct_list[1]
            with paddle.no_grad():
                y_hat = self.net(x_qry[i], fast_weights, bn_training=True)
                loss_qry = F.cross_entropy(y_hat, y_qry[i])
                loss_list_qry[1] += loss_qry
                pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)  # size = (75)
                correct = paddle.equal(pred_qry, y_qry[i]).numpy().sum().item()
                correct_list[1] += correct

            # Remaining update steps
            for k in range(1, self.update_step):
                y_hat = self.net(x_spt[i], params=fast_weights, bn_training=True)
                loss = F.cross_entropy(y_hat, y_spt[i])
                grad = paddle.grad(loss, fast_weights)
                tuples = zip(grad, fast_weights)
                fast_weights = list(map(lambda p: p[1] - self.base_lr * p[0], tuples))

                if k < self.update_step - 1:
                    with paddle.no_grad():
                        y_hat = self.net(x_qry[i], params=fast_weights, bn_training=True)
                        loss_qry = F.cross_entropy(y_hat, y_qry[i])
                        loss_list_qry[k + 1] += loss_qry
                else:  # for the last update step, keep the loss in the graph so the outer loop can backpropagate
                    y_hat = self.net(x_qry[i], params=fast_weights, bn_training=True)
                    loss_qry = F.cross_entropy(y_hat, y_qry[i])
                    loss_list_qry[k + 1] += loss_qry

                with paddle.no_grad():
                    pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)
                    correct = paddle.equal(pred_qry, y_qry[i]).numpy().sum().item()
                    correct_list[k + 1] += correct

        loss_qry = loss_list_qry[-1] / task_num  # average loss of the last update step
        self.meta_optim.clear_grad()             # zero the gradients
        loss_qry.backward()
        self.meta_optim.step()

        accs = np.array(correct_list) / (query_size * task_num)  # accuracy at each update step
        loss = np.array(loss_list_qry) / task_num                # average loss at each update step
        return accs, loss

    def finetunning(self, x_spt, y_spt, x_qry, y_qry):
        # assert len(x_spt.shape) == 4
        query_size = x_qry.shape[0]
        correct_list = [0 for _ in range(self.update_step_test + 1)]

        new_net = deepcopy(self.net)
        y_hat = new_net(x_spt)
        loss = F.cross_entropy(y_hat, y_spt)
        grad = paddle.grad(loss, new_net.parameters())
        fast_weights = list(map(lambda p: p[1] - self.base_lr * p[0], zip(grad, new_net.parameters())))

        # Evaluate on the query set and compute the accuracy.
        # Weights before the update
        with paddle.no_grad():
            y_hat = new_net(x_qry, params=new_net.parameters(), bn_training=True)
            pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)  # size = (75)
            correct = paddle.equal(pred_qry, y_qry).numpy().sum().item()
            correct_list[0] += correct

        # Evaluate the updated weights on the query set
        with paddle.no_grad():
            y_hat = new_net(x_qry, params=fast_weights, bn_training=True)
            pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)  # size = (75)
            correct = paddle.equal(pred_qry, y_qry).numpy().sum().item()
            correct_list[1] += correct

        for k in range(1, self.update_step_test):
            y_hat = new_net(x_spt, params=fast_weights, bn_training=True)
            loss = F.cross_entropy(y_hat, y_spt)
            grad = paddle.grad(loss, fast_weights)
            fast_weights = list(map(lambda p: p[1] - self.base_lr * p[0], zip(grad, fast_weights)))

            y_hat = new_net(x_qry, fast_weights, bn_training=True)

            with paddle.no_grad():
                pred_qry = F.softmax(y_hat, axis=1).argmax(axis=1)
                correct = paddle.equal(pred_qry, y_qry).numpy().sum().item()
                correct_list[k + 1] += correct

        del new_net
        accs = np.array(correct_list) / query_size
        return accs
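For context, one meta-training step with this class would look roughly as follows. The tensor shapes (4 tasks per meta-batch, 5-way, 1-shot support, 75 query images) and the hyperparameter values are my own assumptions for illustration, matching the shapes hinted at in the comments above; the real project feeds data from its Omniglot loader rather than random tensors:

meta = MetaLearner(n_way=5, glob_update_step=5, glob_update_step_test=10,
                   glob_meta_lr=1e-3, glob_base_lr=0.1)
x_spt = paddle.randn([4, 5, 1, 28, 28])               # [task_num, support size, C, H, W]
y_spt = paddle.randint(0, 5, [4, 5], dtype='int64')   # support labels
x_qry = paddle.randn([4, 75, 1, 28, 28])              # [task_num, query size, C, H, W]
y_qry = paddle.randint(0, 5, [4, 75], dtype='int64')  # query labels
accs, loss = meta(x_spt, y_spt, x_qry, y_qry)         # one outer-loop (meta) update over the task batch
print(accs)                                           # accuracy after 0 .. update_step inner steps

During evaluation, finetunning is called per task with un-batched tensors (x_spt of shape [setsz, C, H, W]) on a deep copy of the network, so the meta-learned initial parameters are left untouched.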

Reproduction Results

This project reproduced the experiments on the Omniglot dataset; the reproduction results are shown below:

Summary

This article briefly reviewed the research background, methodology, and commonly used datasets of few-shot learning, and then focused on the MAML meta-learning model: how it works, its experimental results, and the key code. MAML is a natural entry point for studying few-shot learning and a standard baseline against which new methods are compared. Understanding and mastering this classic model lays a foundation for further theoretical research and practical applications. PaddleFSL, PaddlePaddle's official few-shot learning toolkit, already provides few-shot solutions for computer vision and natural language processing applications, such as MAML, ProtoNet, and Relation Net; it is the first few-shot learning toolkit built on PaddlePaddle, and everyone is welcome to follow it and explore it together.

References

[1] Vinyals O, Blundell C, Lillicrap T, et al. Matching Networks for One Shot Learning. 2016.

[2] Ravi S, Larochelle H. Optimization as a Model for Few-Shot Learning. 2016.

[3] Ren M, Triantafillou E, Ravi S, et al. Meta-Learning for Semi-Supervised Few-Shot Classification. arXiv preprint arXiv:1803.00676, 2018.

[4] Oreshkin B N, Rodriguez P, Lacoste A. TADAM: Task Dependent Adaptive Metric for Improved Few-Shot Learning. arXiv preprint arXiv:1805.10123, 2018.

[5] Lake B, Salakhutdinov R, Gross J, et al. One Shot Learning of Simple Visual Concepts. Proceedings of the Annual Meeting of the Cognitive Science Society, 2011.

[6] Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 Dataset. 2011.

[7] Finn C, Abbeel P, Levine S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. International Conference on Machine Learning, 2017: 1126-1135.
