I want to build a speaker identification model, and I am wondering which of the following is best for the feature extraction step:
- Using unlabeled examples from the same distribution as labeled ones (we can use the labeled data after ignoring the labels).
- Using unlabeled examples not necessarily from the same distribution as the labeled ones (for example, but not restricted to, audio from nature).
- Using a mix of the first two options.
A lot of labeled data is available, but I am leaning toward the third approach. I am not asking for opinion-based answers, so my question is: are there any experiments that compare these approaches?