[Since this doesn't address the two numbered questions directly (it relates more to their premises, and answers the follow-up question in the body text), it is perhaps only a partial answer. Anyone trying to answer 1 and 2 would presumably need to assume that, in spite of the wording, the desire to test did not come from observing the first set of flips. As I see it, the setup and the numbered questions focus on the details of estimating a probability while ignoring the larger issue of severe bias, which makes the apparent distinction between the two cases largely beside the point.]
In short, I focus here on "Would you combine the results from the next 20 coin flips that your cousin does or ignore the previous flips and just consider the next 20 flips?" and I discuss it in detail as requested. Indeed, it's easily the most critical part here, and the part that's very widely misunderstood, occasionally even among statisticians. All those "my third cousin's birthday is the same day as my girlfriend's and they were born in the same year in the same town, what are the chances??" questions are examples of this sort of problem, and it also affects a lot of published work (that is, work where the observed data influenced which hypothesis was tested).
If the desire to test the hypothesis was created by seeing the outcome of the first set of flips (and the wording of the question suggests it was), you shouldn't include those flips in your test. That makes it a data-generated hypothesis.
If you include them anyway, you bias your p-values downward; when there's nothing going on, you'll reject the null much more often than you should.
Let's take a slightly more extreme case to make the point more clearly.
Imagine 100,000 people all toss (fair) coins 20 times. The ones that get very high (18+) or low numbers (0-2) of heads (we expect around 20 of each) decide to test those coins and the ones that got middling numbers do not.
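The "around 20 of each" figure can be checked directly from the binomial distribution. The following is a minimal sketch of that arithmetic; the names `prob_heads_in`, `expected_high` and `expected_low` are mine, chosen for illustration.

```python
from math import comb

N_PEOPLE = 100_000
N_FLIPS = 20

def prob_heads_in(ks, n=N_FLIPS):
    """P(number of heads falls in ks) for n tosses of a fair coin."""
    return sum(comb(n, k) for k in ks) / 2**n

# Probability of a "very high" (18+) or "very low" (0-2) number of heads.
p_high = prob_heads_in(range(18, N_FLIPS + 1))
p_low = prob_heads_in(range(0, 3))

# Expected number of people in each extreme group.
expected_high = N_PEOPLE * p_high
expected_low = N_PEOPLE * p_low

print(f"expected with 18+ heads:  {expected_high:.1f}")
print(f"expected with 0-2 heads:  {expected_low:.1f}")
```

By symmetry of the fair coin the two groups have the same expected size, so roughly 40 people in total would decide to test.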
The ones that do test combine their first 20 tosses with another 20 tosses and test at the 5% level. What's the probability that they reject the null?
(NB if the test works as it should, then given they have a fair coin in this setup, it should be 5%... but it's way bigger.) That's what testing hypotheses suggested by the data does.
I just did it in simulation and got 42 people who decided to test, and 32 of those went on to reject the null after combining their data. This is when the coin is fair! (Larger simulations give similar results.)
Test on the second set only (i.e. the data collected after the first set made you want to test the hypothesis) and you're okay.
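For anyone who wants to try the thought experiment themselves, here is a rough sketch of such a simulation. It is not the code used for the numbers above; it assumes an exact two-sided binomial test (the original test isn't specified), and the function names, the seed and the cut-offs passed as parameters are all illustrative choices.

```python
import random
from math import comb

def two_sided_p(heads, n):
    """Exact two-sided binomial p-value for H0: fair coin (assumed test)."""
    lower = sum(comb(n, i) for i in range(heads + 1)) / 2**n
    upper = sum(comb(n, i) for i in range(heads, n + 1)) / 2**n
    return min(1.0, 2 * min(lower, upper))

def flip(rng, n):
    """Number of heads in n fair tosses (count the 1-bits of n random bits)."""
    return bin(rng.getrandbits(n)).count("1")

def simulate(n_people=100_000, n_flips=20, low_cut=2, high_cut=18,
             alpha=0.05, seed=1):
    rng = random.Random(seed)
    testers = reject_combined = reject_second = 0
    for _ in range(n_people):
        first = flip(rng, n_flips)
        # Only people with an extreme first set decide to test.
        if low_cut < first < high_cut:
            continue
        testers += 1
        second = flip(rng, n_flips)
        # Test on all 40 flips (first set included) ...
        if two_sided_p(first + second, 2 * n_flips) < alpha:
            reject_combined += 1
        # ... versus a test on the fresh 20 flips only.
        if two_sided_p(second, n_flips) < alpha:
            reject_second += 1
    return testers, reject_combined, reject_second

if __name__ == "__main__":
    testers, rc, rs = simulate()
    print(f"people who decide to test:        {testers}")
    print(f"reject using combined 40 flips:   {rc}")
    print(f"reject using second 20 flips only: {rs}")
```

The exact counts depend on the seed, but the combined-data rejection rate among testers should come out far above 5%, in line with the 32 out of 42 reported above, while the second-set-only rate should stay near or below the nominal level.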