My 1982 paper "The Influence Function and Its Application to Data Validation" in the American Journal of Mathematical and Management Sciences was judged the best theoretical paper in that journal for the year 1982 and as a consequence I was awarded the Jacob Wolfowitz Prize for 1983.
The paper deals with Hampel's influence function and the way it can be used to detect outliers; in my case, multivariate outliers. Data validation was a concern for the Department of Energy's data bases at that time, and my argument was that the outliers to emphasize and detect are those that affect estimates important to the intended users of the data base. There are many distance functions that can be used to identify multivariate outliers. I proposed using as the metric the influence function for a parameter of interest to the user of the data.
Hampel's influence function depends on the parameter being estimated and the multivariate data point being considered. I took a simple but important and illustrative case. For bivariate data (X$_1$,X$_2$) consider the correlation between the pair and the influence of a particular single point (x$_1$,x$_2$). Formally, Hampel's influence function is a directional derivative. Informally, as Mallows pointed out, it essentially represents the difference between an estimate of the parameter based on the entire sample, which includes (x$_1$,x$_2$), and the estimate based on the sample that contains every other point but leaves (x$_1$,x$_2$) out.
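The informal leave-one-out reading can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (not the Form 4 data), the function names `pearson_r` and `leave_one_out_changes` are hypothetical, and this is not the Fortran program from the paper.

```python
import math
import random

def pearson_r(x1, x2):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12 / math.sqrt(s11 * s22)

def leave_one_out_changes(x1, x2):
    """For each i: r on the full sample minus r with point i left out."""
    r_full = pearson_r(x1, x2)
    return [
        r_full - pearson_r(x1[:i] + x1[i + 1:], x2[:i] + x2[i + 1:])
        for i in range(len(x1))
    ]

# Synthetic example: 35 positively correlated points plus one planted
# discordant point (low X1, high X2) appended at index 35.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(35)]
x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]
x1.append(-4.0)
x2.append(5.0)

changes = leave_one_out_changes(x1, x2)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
print(most_influential, round(changes[most_influential], 3))
```

The point with the largest absolute change is the one whose presence moves the correlation estimate the most, which is the sense of "influential" used here.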
For the bivariate correlation you can do the formal mathematics and show that the influence function (which is closely related to the influence on the slope parameter of a simple linear regression of, say, X$_2$ on X$_1$) has contours of constant value that are hyperbolae.
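For reference, here is the form of this influence function usually quoted in the robust statistics literature. The notation is an assumption here and is not copied from the paper: $z_1$ and $z_2$ are the standardized coordinates of the point, and $\rho$ is the correlation.

$$IF(x_1,x_2;\rho) = z_1 z_2 - \frac{\rho}{2}\left(z_1^2 + z_2^2\right), \qquad z_j = \frac{x_j - \mu_j}{\sigma_j}.$$

Rotating to $u = (z_1 + z_2)/\sqrt{2}$ and $v = (z_1 - z_2)/\sqrt{2}$, so that $z_1 z_2 = (u^2 - v^2)/2$ and $z_1^2 + z_2^2 = u^2 + v^2$, gives

$$IF = \frac{(1-\rho)\,u^2 - (1+\rho)\,v^2}{2}.$$

For $|\rho| < 1$ the two squared terms have opposite signs, so each nonzero level set $IF = c$ is a hyperbola, and the value is 0 at the mean ($u = v = 0$).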
Take a scatter diagram of the data and superimpose these contours. You will see them move out from low values to high values, similar to how temperature or pressure contours might look on a map. The direction of greatest increase is the direction in which to look for the most influential observations, and the contours tell you the value of the influence at any point of interest. I illustrated this using the DOE's FPC Form 4 data, which compares energy consumption (possibly of coal) to electricity generation at power plants. There is reasonable positive correlation between the two. Estimates from the data I had were about 0.48.
I included two figures (one for each of the two plants), each showing a scatter plot with a high-influence contour superimposed. Based on this, each plant contained three outliers (based on high influence). The points of lowest influence are near the mean of the bivariate sample, with 0 influence at the sample mean vector (which always happens to be a point on the least squares regression line). The outliers tended to be in the upper right corner of the scatter plot (3 points there) or at very high values of X$_2$ with very low values of X$_1$. In a sample of 36 points, removing the one outlier with the highest estimated influence changed the estimated correlation from 0.47 to 0.77 at plant A, and removing the most influential outlier at plant B changed it from 0.48 to 0.85.
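Ranking points by estimated influence can be sketched as follows. The sketch assumes the commonly quoted plug-in form z1*z2 - (r/2)*(z1^2 + z2^2) for the influence on the correlation, with z1 and z2 the standardized coordinates. The data are synthetic, and the name `correlation_influence` is hypothetical rather than taken from the paper's program.

```python
import math
import random

def correlation_influence(x1, x2):
    """Return (r, influence values) using the plug-in form
    z1*z2 - (r/2)*(z1**2 + z2**2) at every sample point."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Population-style standard deviations (divide by n), so that
    # the mean of z1*z2 equals the Pearson correlation exactly.
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    z1 = [(a - m1) / s1 for a in x1]
    z2 = [(b - m2) / s2 for b in x2]
    r = sum(a * b for a, b in zip(z1, z2)) / n
    infl = [a * b - 0.5 * r * (a * a + b * b) for a, b in zip(z1, z2)]
    return r, infl

# Synthetic sample of 36 points; the last one is a planted point with
# a very low X1 and a very high X2.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(35)]
x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]
x1.append(-4.0)
x2.append(5.0)

r, infl = correlation_influence(x1, x2)
# Indices sorted from largest to smallest absolute influence.
ranked = sorted(range(len(infl)), key=lambda i: abs(infl[i]), reverse=True)
print("r =", round(r, 3), "top three indices:", ranked[:3])
```

With this form a point sitting exactly at the sample mean has z1 = z2 = 0 and therefore zero influence, and the influence values over the sample sum to zero.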
We found that one of the outliers had a consumption value of 93 and a generation of 330 (I did not give the units). This was a very large consumption for a generation of 330. In checking with the plant we discovered an error: the generation was 330, but the consumption should have been only one unit. This could have been a recording slip of two decimal places. There is a lot more detail in the paper, along with a computer program to generate bivariate correlation influence function estimates from a given data set.
This work was actually done in 1979. At that time computers were very slow (relative to today) and we programmed in Fortran, so the code in the paper is in Fortran. I think the paper is accessible over the internet and I will try to find a link for it.