I'm sorry to bring back a question from ages ago, but it came up as a reference in a newer one and it looks to me like it might cause some misunderstandings.
The calculations that Nick Cox gave are absolutely correct for computing the Gini index of the features, and they tell us something about the features and their homogeneity.
However, given that your dataset has a Target variable, that you speak of using Instance as an attribute test condition, and that you mention information gain at the end, it would be easy to think that you have a Classification Tree problem, where the goal is to find the Gini decrease (the analogue of the Information Gain) obtained when splitting (testing) on the features. I will therefore give an alternative answer to the question, which can serve as a reference for Gini computation in the case of Classification Trees.
In this sense, the computations would be different.
As a first step, you would need to compute the Gini index of the starting dataset. It has 4 positives and 5 negatives and therefore it is: $$GiniStart = 1-(\frac{4}{9})^2 - (\frac{5}{9})^2 \sim 0.4938$$
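If it is useful, here is a minimal Python sketch of that computation (`gini` is just a helper name I am introducing here, not something from the original question):

```python
from typing import Sequence

def gini(counts: Sequence[int]) -> float:
    """Gini index of a node, given the class counts inside that node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([4, 5]))  # starting dataset: 4 positives, 5 negatives -> ~0.4938
```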
If we split on $a_1$ we obtain the node $T$ that has three positive instances and a negative one, and node $F$ that has one positive instance and four negative ones.
$$Gini_T = 1-(\frac{3}{4})^2 - (\frac{1}{4})^2 = 0.375$$
$$Gini_F = 1-(\frac{1}{5})^2 - (\frac{4}{5})^2 = 0.32$$
$$\Delta Gini_{a_1} = GiniStart - \frac{4}{9}Gini_T - \frac{5}{9}Gini_F \sim 0.149$$
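Continuing the sketch above (again, `gini_decrease` is my own helper name), the decrease can be computed directly from the class counts in the child nodes:

```python
def gini_decrease(parent_counts, child_counts_list):
    """Parent Gini minus the size-weighted Gini of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * gini(child) for child in child_counts_list)
    return gini(parent_counts) - weighted

# split on a1: node T has (3+, 1-), node F has (1+, 4-)
print(gini_decrease([4, 5], [[3, 1], [1, 4]]))  # ~0.149
```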
If we split on $a_2$ we obtain the node $T$ that has 2 positive instances and 3 negative ones, and node $F$ that has 2 positive instances and 2 negative ones.
$$Gini_T = 1-(\frac{2}{5})^2 - (\frac{3}{5})^2 = 0.48$$
$$Gini_F = 1-(\frac{2}{4})^2 - (\frac{2}{4})^2 = 0.5$$
$$\Delta Gini_{a_2} = GiniStart - \frac{5}{9}Gini_T - \frac{4}{9}Gini_F \sim 0.005$$
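The same helper reproduces this figure:

```python
# split on a2: node T has (2+, 3-), node F has (2+, 2-)
print(gini_decrease([4, 5], [[2, 3], [2, 2]]))  # ~0.005
```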
$a_3$ is instead a numeric variable (even though you could treat it as categorical), and as such we would need to evaluate every possible split point in its range (which I am of course not going to do in full) and choose the best one.
As an example, imagine splitting $a_3$ at $4.5$. Then we would have the "Low Values" node, with 2 positives and a negative, and the "High Values" node, with 2 positives and 4 negatives.
$$Gini_{LV} = 1-(\frac{2}{3})^2 - (\frac{1}{3})^2 \sim 0.44$$
$$Gini_{HV} = 1-(\frac{2}{6})^2 - (\frac{4}{6})^2 \sim 0.44$$
$$\Delta Gini_{a_3} = GiniStart - \frac{3}{9}Gini_{LV} -\frac{6}{9}Gini_{HV} \sim 0.049$$
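The split at $4.5$ again follows from the helpers above, and if you did want to scan all candidate thresholds, a generic sketch could look like this (`best_threshold` is my own helper; I am not reproducing your actual $a_3$ column here):

```python
def best_threshold(values, labels, positive="+"):
    """Try every midpoint between consecutive distinct sorted values of a numeric
    attribute and return the threshold with the largest Gini decrease."""
    pairs = sorted(zip(values, labels))
    n_pos = labels.count(positive)
    parent = [n_pos, len(labels) - n_pos]
    best_thr, best_dec = None, -1.0
    for (v_prev, _), (v_next, _) in zip(pairs, pairs[1:]):
        if v_prev == v_next:
            continue  # identical values cannot be separated
        thr = (v_prev + v_next) / 2
        low = [lab for v, lab in pairs if v <= thr]
        high = [lab for v, lab in pairs if v > thr]
        dec = gini_decrease(parent,
                            [[low.count(positive), len(low) - low.count(positive)],
                             [high.count(positive), len(high) - high.count(positive)]])
        if dec > best_dec:
            best_thr, best_dec = thr, dec
    return best_thr, best_dec

# the example split at 4.5: "Low Values" node has (2+, 1-), "High Values" node has (2+, 4-)
print(gini_decrease([4, 5], [[2, 1], [2, 4]]))  # ~0.049
```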
Finally, $Instance$. If we consider it as categorical, the variable is totally sparse, and it allows us to split the data into the groups $\{1,2,4,8\}$ and $\{3,5,6,7,9\}$, which would both have a Gini of $0$ since they are pure. The Gini decrease would therefore be the Gini of the parent node.
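With the helper from above, that is simply:

```python
# Instance split into the two pure groups of 4 and 5 instances
print(gini_decrease([4, 5], [[4, 0], [0, 5]]))  # ~0.4938, i.e. the parent Gini
```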
However, attributes like $Instance$ are completely sparse and have no predictive power on new data (every new entry will have a different instance number), and for this reason they are usually excluded. Alternatively, one can use a Gini ratio for the splits; that is, weight the Gini decreases by the inverse of the Gini index of the attributes themselves (this time, the ones computed by Nick!). This way, the importance of sparse variables such as $Instance$ is reduced by the fact that their Gini index is very high.
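As a sketch of that weighting (I am assuming here that "weight by the inverse" means dividing the Gini decrease by the Gini index of the attribute's own value distribution; `gini_ratio` is my own name for it):

```python
def gini_ratio(parent_counts, child_counts_list):
    """Gini decrease divided by the Gini of the attribute's own value distribution,
    taken here as the distribution of instances over the child nodes."""
    attribute_counts = [sum(child) for child in child_counts_list]
    return gini_decrease(parent_counts, child_counts_list) / gini(attribute_counts)

# Instance treated as nine singleton children (all pure) vs. the split on a1
print(gini_ratio([4, 5], [[1, 0]] * 4 + [[0, 1]] * 5))  # ~0.56
print(gini_ratio([4, 5], [[3, 1], [1, 4]]))             # ~0.30
```

The sparse attribute still scores well on this tiny dataset, but its lead over $a_1$ shrinks compared to the raw decreases ($0.49$ vs $0.15$ becomes $0.56$ vs $0.30$), which is the intended effect of the penalty.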
Please note that my answer is not in contradiction with Nick's; it simply answers a different question, since the original one might also have been interpreted this way.
PS: just for clarification, since the term information gain was used: Information Gain is computed as the decrease in Entropy after a split, and it plays nearly the same role as the decrease in Gini index, so the two usually rank splits very similarly.
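For completeness, here is the entropy-based version on the same counts (same helper style as above):

```python
import math

def entropy(counts):
    """Entropy of a node, given the class counts inside that node."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted

# same splits as above: a1 beats a2 by a wide margin, just as with the Gini decrease
print(information_gain([4, 5], [[3, 1], [1, 4]]))  # ~0.229
print(information_gain([4, 5], [[2, 3], [2, 2]]))  # ~0.007
```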