You seem to assume in your question that the concept of the normal distribution was around before the distribution was identified, and people tried to figure out what it was. It's not clear to me how that would work. [Edit: there is at least one sense it which we might consider there being a "search for a distribution" but it's not "a search for a distribution that describes lots and lots of phenomena"]
This is not the case; the distribution was known about before it was called the normal distribution.
how would you prove to such a person that the probability density function of all normally distributed data has a bell shape
The normal distribution function is the thing that has what is usually called a "bell shape" -- all normal distributions have the same "shape" (in the sense that they only differ in scale and location).
Data can look more or less "bell-shaped" in distribution but that doesn't make it normal. Lots of non-normal distributions look similarly "bell-shaped".
The actual population distributions that data are drawn from are likely never actually normal, though it's sometimes quite a reasonable approximation.
This is typically true of almost all the distributions we apply to things in the real world -- they're models, not facts about the world. [As an example, if we make certain assumptions (those for a Poisson process), we can derive the Poisson distribution -- a widely used distribution. But are those assumptions ever exactly satisfied? Generally the best we can say (in the right situations) is that they're very nearly true.]
what do we actually consider normally distributed data? Data that follows the probability pattern of a normal distribution, or something else?
Yes, to actually be normally distributed, the population the sample was drawn from would have to have a distribution that has the exact functional form of a normal distribution. As a result, any finite population cannot be normal. Variables that necessarily bounded cannot be normal (for example, times taken for particular tasks, lengths of particular things cannot be negative, so they cannot actually be normally distributed).
it would perhaps be more intuitive that the probability function of normally distributed data has a shape of an isosceles triangle
I don't see why this is necessarily more intuitive. It's certainly simpler.
When first developing models for error distributions (specifically for astronomy in the early period), mathematicians considered a variety of shapes in relation to error distributions (including at one early point a triangular distribution), but in much of this work it was mathematics (rather than intuition) that was used. Laplace looked at double exponential and normal distributions (among several others), for example. Similarly Gauss used mathematics to derive it at around the same time, but in relation to a different set of considerations than Laplace did.
In the narrow sense that Laplace and Gauss were considering "distributions of errors", we could regard there as being a "search for a distribution", at least for a time. Both postulated some properties for a distribution of errors they considered important (Laplace considered a sequence of somewhat different criteria over time) led to different distributions.
Basically my question is why does the normal distribution probability density function has a bell shape and not any other?
The functional form of the thing that is called the normal density function gives it that shape. Consider the standard normal (for simplicity; every other normal has the same shape, differing only in scale and location):
$$f_Z(z) = k \cdot e^{-\frac12 z^2};\;-\infty<z<\infty$$
(where $k$ is simply a constant chosen to make the total area 1)
this defines the value of the density at every value of $x$, so it completely describes the shape of the density. That mathematical object is the thing we attach the label "normal distribution" to. There's nothing special about the name; it's just a label we attach to the distribution. It's had many names (and is still called different things by different people).
While some people have regarded the normal distribution as somehow "usual" it's really only in particular sets of situations that you even tend to see it as an approximation.
The discovery of the distribution is usually credited to de Moivre (as an approximation to the binomial). He in effect derived the functional form when trying to approximate binomial coefficients (/binomial probabilities) to approximate otherwise tedious calculations but - while he does effectively derive the form of the normal distribution - he doesn't seem to have thought about his approximation as a probability distribution, though some authors do suggest that he did. A certain amount of interpretation is required so there's scope for differences in that interpretation.
Gauss and Laplace did work on it in the early 1800s; Gauss wrote about it in 1809 (in connection with it being the distribution for which the mean is the MLE of the center) and Laplace in 1810, as an approximation to the distribution of sums of symmetric random variables. A decade later Laplace gives an early form of central limit theorem, for discrete and for continuous variables.
Early names for the distribution include the law of error, the law of frequency of errors, and it was also named after both Laplace and Gauss, sometimes jointly.
The term "normal" was used to describe the distribution independently by three different authors in the 1870s (Peirce, Lexis and Galton), the first in 1873 and the other two in 1877. This is more than sixty years after the work by Gauss and Laplace and more than twice that since de Moivre's approximation. Galton's use of it was probably most influential but he used the term "normal" in relation to it only once in that 1877 work (mostly calling it "the law of deviation").
However, in the 1880s Galton used the adjective "normal" in relation to the distribution numerous times (e.g. as the "normal curve" in 1889), and he in turn had a lot of influence on later statisticians in the UK (especially Karl Pearson). He didn't say why he used the term "normal" in this way, but presumably meant it in the sense of "typical" or "usual".
The first explicit use of the phrase "normal distribution" appears to be by Karl Pearson; he certainly uses it in 1894, though he claims to have used it long before (a claim I would view with some caution).
References:
Miller, Jeff
"Earliest Known Uses of Some of the Words of Mathematics:"
Normal distribution (Entry by John Aldrich)
http://jeff560.tripod.com/n.html
(alternate: https://mathshistory.st-andrews.ac.uk/Miller/mathword/n/)
Stahl, Saul (2006),
"The Evolution of the Normal Distribution",
Mathematics Magazine, Vol. 79, No. 2 (April), pp 96-113
https://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/stahl96.pdf
Normal distribution, (2016, August 1).
In Wikipedia, The Free Encyclopedia.
Retrieved 12:02, August 3, 2016, from
https://en.wikipedia.org/w/index.php?title=Normal_distribution&oldid=732559095#History
Hald, A (2007),
"De Moivre’s Normal Approximation to the Binomial, 1733, and Its Generalization",
In: A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713–1935; pp 17-24
[You may note substantial discrepancies between these sources in relation to their account of de Moivre]