I am currently reading this paper ([1], [2]).

The authors claim that their Federated Learning scheme, FFL, is similar to Model-Agnostic Meta-Learning (MAML).

They state:

Interestingly, FFL is similar to Model-Agnostic Meta-Learning (MAML) in three aspects: (i) in FFL, we have workers who possess their own datasets (with different distributions) while in MAML, there are tasks with their corresponding datasets. (ii) In FFL, we have local updates that improve performances (loss/accuracy) of individual workers whereas in MAML, there are inner-loop updates that do so for individual tasks. (iii) In FFL, in each round, we apply global gradient to weights, which improves the overall performance of all workers while in MAML, the outer-loop update does so for all of tasks.

Are they right to make such a claim?
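
To see the structural analogy they describe, here is a minimal toy sketch (my own, not taken from the FFL paper) of a FedAvg-style federated round next to the corresponding meta-update. The averaged-delta outer step shown here matches FedAvg with a server learning rate, which is a Reptile-style first-order meta-update rather than full second-order MAML:

    # Toy sketch (not the FFL algorithm): a FedAvg-style round vs. a
    # first-order meta-update on a linear-regression toy problem.
    # Structural parallel:
    #   worker local updates  <->  MAML inner-loop updates
    #   server global update  <->  MAML outer-loop update
    import numpy as np

    rng = np.random.default_rng(0)

    def make_task(true_w):
        """One worker's dataset in FL, or one task's dataset in MAML."""
        X = rng.normal(size=(32, 2))
        y = X @ true_w + 0.1 * rng.normal(size=32)
        return X, y

    def grad(w, X, y):
        """Gradient of the mean squared error loss."""
        return 2 * X.T @ (X @ w - y) / len(y)

    tasks = [make_task(rng.normal(size=2)) for _ in range(4)]  # 4 workers / 4 tasks
    w_global = np.zeros(2)
    inner_lr, outer_lr, local_steps = 0.05, 0.5, 5

    for _round in range(100):
        deltas = []
        for X, y in tasks:
            w_local = w_global.copy()
            for _ in range(local_steps):      # FL local updates <-> MAML inner loop
                w_local -= inner_lr * grad(w_local, X, y)
            deltas.append(w_local - w_global)
        # FL server aggregation <-> outer-loop update (Reptile-style first-order
        # meta-step): move the shared weights along the averaged local change.
        w_global += outer_lr * np.mean(deltas, axis=0)

    print("shared weights after training:", w_global)

Each worker/task takes a few inner gradient steps starting from the shared weights, and the server/outer loop then moves the shared weights toward the average of the resulting changes.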

  • I haven't read those papers but have some familiarity with MAML. Are they making rigorous proofs showing that their method is a special case of the MAML objective? Or is this simply intuition? The intuition itself seems believable, at least. – tchainzzz Nov 19 '21 at 03:21
  • They are comparing their scheme, FFL, with MAML at the algorithmic and implementation level. But I suspect that the mathematical background from which the MAML algorithm is derived differs from that of Federated Learning. As far as I know, MAML comes from bi-level optimization. Can we say that, in practice/implementation, Federated Learning and vanilla MAML are the same? – Complicated Nov 19 '21 at 03:31
  • Perhaps we could even say that they're the same at a mathematical level, where the workers in FL optimize the inner loops and the server optimizes the outer loop? – Complicated Nov 19 '21 at 03:35
  • So I think MAML has a slightly different motivation than what you're suggesting. MAML is motivated by multi-task learning for few-shot generalization, where each "worker" (so to speak) is learning from one task. FFL seems to be motivated by a better communication-computation tradeoff rather than few-shot multi-task generalization. – tchainzzz Nov 19 '21 at 03:41
  • Both of them share a similar paradigm of bi-level optimization, or splitting up a larger problem into separable (inner) updates. Whether "mathematically they're the same" depends on the level of abstraction -- I think the connections are good for building intuition, but I'm not convinced they amount to a rigorous statement. – tchainzzz Nov 19 '21 at 03:41
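
To make the comparison in the last two comments concrete, here is my own paraphrase of the two standard objectives (not taken from either paper). The procedures look alike (local/inner steps followed by a global/outer step), but the objectives differ: FedAvg evaluates the local losses at the shared weights, while MAML evaluates each task's loss at the weights obtained after inner-loop adaptation:

    % FedAvg-style FL: the server objective is the average of the workers'
    % local losses, evaluated at the shared weights w.
    \min_{w}\ \frac{1}{N}\sum_{i=1}^{N} F_i(w)

    % MAML: the outer objective is each task's query loss evaluated at the
    % parameters obtained after inner-loop adaptation on its support set.
    \min_{\theta}\ \sum_{i=1}^{N} \mathcal{L}^{\mathrm{query}}_{i}\!\left(\theta - \alpha\,\nabla_{\theta}\mathcal{L}^{\mathrm{support}}_{i}(\theta)\right)

One way to read the distinction drawn above: the adapted inner parameters appear inside MAML's outer loss, which is what makes it a genuine bi-level problem, whereas FedAvg's local steps are a communication-efficient way of approximately optimizing a single-level average.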

0 Answers