Flawed human intuitions:
This is a very common and pernicious confusion. You can read about this under the Wikipedia entry for the Gambler's Fallacy. Psychologists have also studied this phenomenon. Amos Tversky and Daniel Kahneman documented it in their famous paper Belief in the law of small numbers (the title plays on the law of large numbers in statistics, discussed below). Theoretical work on cognitive mechanisms that help to produce this fallacy has been done by Ruma Falk and Clifford Konold (see, e.g., their paper, Making sense of randomness: Implicit encoding as a basis for judgment; more citations here).
Runs:
When you notice several heads in a row, you are perceiving a run. The (perfectly intuitive) belief is that runs are unlikely, thus, either the coin must not be fair, or it must revert to tails soon. Indeed, this intuition has been formalized by statisticians into a test for randomness / independence (i.e., the runs test). One thing to realize is that with lots of flips (a long series), runs of length 4 (for example) are actually quite common. Here is a quick simulation I ran to check how often I would see 4 or more of the same result in a row, given series of Bernoulli trials of lengths 20 and 50:
isRun = function(x){
  runL = 1  # length of the current run
  maxR = 1  # longest run seen so far
  # we iterate through the length of the series
  for(i in 2:length(x)){
    # this increments the run length if the result is the same,
    # but restarts the counter otherwise
    runL = if(x[i]==x[i-1]) runL+1 else 1
    # if the current run length is longer than the previous max,
    # the new value is used
    maxR = max(maxR, runL)
  }
  return(maxR)
}
r4.20 = c() # these will store the results
r4.50 = c()
set.seed(1) # this makes the code reproducible
for(i in 1:10000){
  x20 = rbinom(20, size=1, prob=.5) # we generate series of length 20 & 50
  x50 = rbinom(50, size=1, prob=.5)
  r4.20[i] = ifelse(isRun(x20)>3, 1, 0) # 1 if the maximum run length is 4 or longer
  r4.50[i] = ifelse(isRun(x50)>3, 1, 0)
}
mean(r4.20) # [1] 0.7656 # ~77% of series
mean(r4.50) # [1] 0.9796 # ~98%
But what if you've only flipped your coin 4 times (thus far)? The probability of getting heads all 4 times is $.5^4=.0625$ (and the probability of 4 identical results, whether heads or tails, is twice that). Given that people flip coins commonly, this should happen quite often (more than one time in twenty).
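A quick sanity check of that figure, comparing the analytic value with a small simulation (all base R, nothing beyond what the code above already uses):

```r
# probability that all 4 flips of a fair coin come up heads
set.seed(1)
p.analytic  = .5^4
p.simulated = mean(replicate(10000, sum(rbinom(4, size=1, prob=.5))==4))
p.analytic   # 0.0625
p.simulated  # close to 0.0625
```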
Convergence to long run probability:
What about the fact that the number of heads in your series should converge to half the length of the series? This is true; it is guaranteed by the law of large numbers. The relative proportion is likely to converge fairly quickly. For example, there is a 95% probability that the observed percentage will fall within 2 standard errors of the true probability, $\pi$, where
$$
S.E.(p) = \sqrt{\pi(1-\pi)/N}.
$$
Thus, when the true probability is .5 and $N=5$, 95% of the time the proportion of heads should fall within $.5\pm 2\times .5/\sqrt{5} = .5\pm 2\times .2236 = (.053,.947)$, and with $N=100$, within $(.4,.6)$. (Actually, the normal approximation is imperfect in the first case, because $N$ is so small.) However, the proportion will still fall outside of that interval 5% of the time. Importantly, although the series will converge to .5, there's no guarantee until you 'reach' infinity. In addition, the convergence is due as much to the growing denominator as to the numerator being close to $.5\times N$; that is, the number of heads can be very far from half in raw numbers, but close as a proportion of the total.
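The interval computation above can be sketched as a small helper (the function name `prop.interval` is just for illustration):

```r
# 95% normal-approximation interval for the observed proportion,
# given the true probability pi and the number of trials N
prop.interval = function(pi, N){
  se = sqrt(pi*(1-pi)/N)
  c(lower=pi - 2*se, upper=pi + 2*se)
}
round(prop.interval(.5, 5), 3)    # roughly (.053, .947)
round(prop.interval(.5, 100), 3)  # (.4, .6)
```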
Random variables vs. Realized values:
While it is helpful to understand something about the intuitions that lead us astray and the true mathematical properties that govern these phenomena, the key concept is understanding the distinction between random variables and realized values. When you have a coin balanced on your thumb about to be flipped 5 times in a row, those outcomes are random variables, and the laws of probability apply to how they will behave in the long run*. When the coin is lying on your forearm with one side facing up (whether you have yet seen which side or not), that outcome is a realized value. The laws of probability don't make impossible what has already happened (nor could they). Thus, $Pr(H)=.5$, and $Pr(H|HHHH)=.5$ as well, because the four H's on the right side of the vertical bar (the given 4 prior outcomes) are realized values, not random variables, and are not related to the probability that the outcome of the next flip will be a head (at least under independence; with dependent data, the prior result must be a part of, or stored somehow within, the data generating process). Likewise, $Pr(HHHHH)=.03125$, but $Pr(HHHHH|HHHH)=.5$, because once the first four heads are realized, only the fifth flip remains random.
I'll acknowledge that this still isn't necessarily very intuitive; you have millennia of evolution to overcome. Nonetheless, I have found that these considerations have helped me, and others, to think about randomness more clearly.
*Note that this discussion pertains to the Frequentist conception of probability.