Cosine similarity is not a measure of (the strength of) linear association the way Pearson $r$ is; it is a measure of proportional association, which is a narrower notion. The difference lies in centering: $r$ is the cosine of the centered data.
Cosine similarity is a measure of proportionality: if the points of a bivariate data cloud lie on a straight line passing through the coordinate origin, cosine similarity is maximal, $cos_{xy}=1$. If that straight line does not pass through the origin, or if the points deviate from a straight line, $cos_{xy}$ gets smaller. Because Pearson $r$ is the $cos$ of the cloud centered on both axes, after centering a straight line of points always pierces the origin, so for $r$ only deviations from lying on a straight line can lower the coefficient: correlation is the extent of linearity. When $cos$ is $1$, $r$ is also $1$ and full linearity is observed; but if $r$ is $1$, $cos$ is not necessarily $1$: full linearity is not enough for $cos$ to be maximal. $Cos$ is anchored to an "external" point, the origin; $r$ is anchored only to the data cloud itself, as represented by its mean.
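A quick illustrative sketch of the above in Python with NumPy (not from the original answer; the data are arbitrary):

```python
import numpy as np

def cosine(x, y):
    # scalar product normalized by the vectors' norms
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 50)

y = 2 * x                             # straight line through the origin
print(cosine(x, y))                   # ~1: perfect proportionality
print(np.corrcoef(x, y)[0, 1])        # ~1: perfect linearity

y_shift = 2 * x + 5                   # same line, shifted off the origin
print(cosine(x, y_shift))             # < 1: proportionality is broken
print(np.corrcoef(x, y_shift)[0, 1])  # still ~1: linearity is intact

# r is the cosine of the centered cloud:
print(cosine(x - x.mean(), y_shift - y_shift.mean()))  # ~1 again
```

Shifting the line off the origin lowers $cos$ but leaves $r$ untouched, and centering restores the equality $r = cos$.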
From a regression standpoint, both $r$ and $cos$ are $R_{regr}=\sqrt{1-SS_{resid}/SS_{tot}}$, but $cos$ corresponds to regression without an intercept, i.e. with the regression line forced through the origin, and with $SS_{tot}$ computed as squared deviations from $Y=0$, not from $Y=\bar{Y}$.
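This correspondence can be checked numerically; a minimal sketch, assuming ordinary least squares through the origin on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3, 1, 40)
y = 1.5 * x + rng.normal(0, 0.5, 40)

# No-intercept regression: slope that minimizes sum((y - b*x)^2)
b = (x @ y) / (x @ x)
ss_resid = np.sum((y - b * x) ** 2)
ss_tot = np.sum(y ** 2)        # deviations from Y = 0, not from the mean
R = np.sqrt(1 - ss_resid / ss_tot)

cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(R, cos_xy)               # identical up to sign
```

Algebraically, $1 - SS_{resid}/SS_{tot} = (\sum xy)^2 / (\sum x^2 \sum y^2)$, which is exactly $cos^2_{xy}$.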
$Cos$ and $r$ are, respectively, the scalar product and the covariance, normalized so that the coefficient's sensitivity to the variables' scale or magnitude is removed.
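A small sketch of that normalization (illustrative data, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = rng.normal(size=30)
n = len(x)

# Covariance is the (averaged) scalar product of the centered variables
xc, yc = x - x.mean(), y - y.mean()
cov = xc @ yc / (n - 1)

# Dividing out each variable's magnitude removes the scale sensitivity:
cos_xy = x @ y / np.sqrt((x @ x) * (y @ y))   # scalar product -> cosine
r = cov / (x.std(ddof=1) * y.std(ddof=1))     # covariance -> Pearson r

print(np.isclose(cov, np.cov(x, y)[0, 1]))         # True
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))      # True
# Rescaling a variable changes cov but not r (nor cos):
print(np.isclose(np.corrcoef(3 * x, y)[0, 1], r))  # True
```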
So cosine similarity and Pearson $r$ should not be conflated when asking what each measures, just as covariance and Pearson $r$ should not.
As for distance correlation, the idea behind it is different from both cosine and $r$. It captures the notion of generalized association (linear, nonlinear, curvilinear), seen from the viewpoint of stochastic independence. For a bivariate normal population, zero Pearson $r$ implies stochastic independence. Distance correlation generalizes this to any distribution, and it does not center the data to its mean (because in the "double centering" operation, Euclidean distances are taken, not squared ones).
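A sketch of the standard (biased) sample distance correlation, using Euclidean distance matrices that are double-centered, on made-up data where Pearson $r$ misses a clear dependence:

```python
import numpy as np

def dcorr(x, y):
    # Pairwise Euclidean distances (not squared), then double-center each matrix
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()        # squared sample distance covariance
    dvarx = (A * A).mean()
    dvary = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvarx * dvary))

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 200)
y_quad = x ** 2                       # nonlinear, fully dependent on x
print(np.corrcoef(x, y_quad)[0, 1])   # close to 0: Pearson r misses it
print(dcorr(x, y_quad))               # clearly positive: dependence detected
```

For an exactly linear relation such as $y = 2x + 1$ the coefficient reaches $1$, while for the quadratic cloud above, Pearson $r$ is near zero but distance correlation is not.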