
This slide says that extrinsic evaluation is time-consuming and usually takes days or weeks. I have tried to understand why.

Firstly, I learned from this slide that the best way to evaluate an n-gram model is extrinsic evaluation, which means "Embed in an application and measure the total performance of the application".

Secondly, I learned from this answer that intrinsic evaluation is "test your model by a set of testing samples, and monitor how the model is working internally", whereas in extrinsic evaluation "a model's performance can be determined by testing it using some set of testing samples (that we know their true solutions) and see how the model solves them." In the latter case, the test samples must never have been seen by the model before.
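To make the contrast concrete: for n-gram models, intrinsic evaluation is commonly done by computing perplexity on held-out text. Below is a minimal sketch of that, assuming a bigram model with add-one smoothing and a toy corpus. The function names `train_bigram` and `perplexity` and the sample sentences are my own illustration, not taken from the slides.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count bigrams and how often each token occurs as a bigram context."""
    contexts = Counter(tokens[:-1])             # tokens that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams

def perplexity(tokens, contexts, bigrams, vocab_size):
    """Perplexity of `tokens` under an add-one smoothed bigram model."""
    log_prob = 0.0
    pairs = list(zip(tokens, tokens[1:]))
    for prev, word in pairs:
        # Add-one (Laplace) smoothing: unseen bigrams get a small probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))

# Toy data (hypothetical): the test text is held out from the counts.
train = "the cat sat on the mat the cat ate".split()
test = "the cat sat on the mat".split()
vocab = set(train) | set(test)

contexts, bigrams = train_bigram(train)
ppl = perplexity(test, contexts, bigrams, len(vocab))
print(f"perplexity: {ppl:.2f}")
```

Computing this takes well under a second. The sketch covers only the intrinsic side; an extrinsic evaluation would additionally need the model plugged into a complete application, which is not shown here.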

So my question is: if extrinsic evaluation only needs an unseen sample set, as in an end-to-end evaluation, how should I interpret the statement that extrinsic evaluation is so time-consuming that, even though it is the best choice, we cannot always use it? The extrinsic test dataset used by this language model is only a few megabits in size, and the test takes far less than days, let alone weeks.

Have I misunderstood the meaning of it?

Lerner Zhang
  • I think you have misunderstood: you embed the model in a problem, collect a *big enough* corpus, and then get a *person* to evaluate the correctness of each output. In other words, it's the usual problem with machine learning: creating a *labelled* data set takes a long time. – seanv507 Sep 30 '16 at 13:32
  • Got it. If I use 10 samples to test a supervised or unsupervised network, then in the end all 10 results have to be validated by a human, and that human validation takes time. – Lerner Zhang Sep 30 '16 at 14:33
  • @seanv507 I watched the video and found that it's not for the human to test but for the system. Please check. https://youtu.be/OHyVNCvnsTo?t=2m10s. For other tasks, for example NER and translation, the validation cannot be manual. – Lerner Zhang Oct 05 '16 at 03:27

0 Answers