
I have a table with known marginal sums:

|  | | | 7|
|  | | | 7|
|  | | | 6|
|  | | | 6|
|  | | | 5|
|  | | | 4|
|  | | | 3|
+--+-+-+--+
|26|6|6|38|

I want to find the integer contingency table that satisfies the given row and column sums and minimises the error relative to the expected distribution.

1) I can calculate the estimates as floats:

|4.79|1.11|1.11| 7|
|4.79|1.11|1.11| 7|
|4.11|0.95|0.95| 6|
|4.11|0.95|0.95| 6|
|3.42|0.79|0.79| 5|
|2.74|0.63|0.63| 4|
|2.05|0.47|0.47| 3|
+----+----+----+--+
|  26|   6|   6|38|
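The floating-point estimates above agree with the expected counts under independence, row sum × column sum / 38 (for example 7 × 26 / 38 ≈ 4.79). A minimal sketch that reproduces them, assuming that formula is what produced the table; `expectedCounts` is a hypothetical helper name, not a library function:

```javascript
// Expected cell counts under independence: e[i][j] = rowSums[i] * colSums[j] / N.
function expectedCounts(rowSums, colSums) {
  const total = rowSums.reduce((a, b) => a + b, 0);
  const colTotal = colSums.reduce((a, b) => a + b, 0);
  if (total !== colTotal) {
    throw new Error("row and column sums must have the same total");
  }
  return rowSums.map(r => colSums.map(c => (r * c) / total));
}

const rowSums = [7, 7, 6, 6, 5, 4, 3];
const colSums = [26, 6, 6];
const expected = expectedCounts(rowSums, colSums);

// Print with two decimals, one table row per line
console.log(expected.map(row => row.map(x => x.toFixed(2)).join(" ")).join("\n"));
```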

2) Then I round each estimate to the nearest integer:

| 5|1|1| 7|
| 5|1|1| 7|
| 4|1|1| 6|
| 4|1|1| 6|
| 3|1|1| 5|
| 3|1|1| 4|
| 2|0|0| 3|
+--+-+-+--+
|26|6|6|38|
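A short sketch of step 2, assuming the same independence estimates: round every cell to the nearest integer and recompute the margins. It makes the mismatch concrete; the column sums happen to survive in this example, but in general nearest-integer rounding can break those too.

```javascript
// Step 2 as described: round each independence estimate to the nearest
// integer, then recompute the margins of the rounded table.
const rowSums = [7, 7, 6, 6, 5, 4, 3];
const colSums = [26, 6, 6];
const N = rowSums.reduce((a, b) => a + b, 0); // 38

const rounded = rowSums.map(r => colSums.map(c => Math.round((r * c) / N)));
const roundedRowSums = rounded.map(row => row.reduce((a, b) => a + b, 0));
const roundedColSums = colSums.map((_, j) => rounded.reduce((s, row) => s + row[j], 0));

// Report rows whose rounded sum differs from the target
rowSums.forEach((target, i) => {
  if (roundedRowSums[i] !== target) {
    console.log(`row ${i + 1}: got ${roundedRowSums[i]}, want ${target}`);
  }
});
// prints "row 6: got 5, want 4" and "row 7: got 2, want 3"
```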

This works quite well, but the last two rows sum to 5 and 2 instead of their targets of 4 and 3 (the margins shown in the rounded table are the targets, not the sums of the rounded cells).

Is there a general algorithm (ideally in JavaScript) to solve this?
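One way to get an exact answer, sketched under assumptions: take the error to be the sum of squared deviations Σ(x_ij - e_ij)² (the question does not fix a metric) and solve it as a min-cost flow. Each cell is split into unit arcs whose k-th unit costs (k - e)² - (k - 1 - e)² = 2k - 1 - 2e; because these marginal costs increase with k, the convex objective becomes linear on the network, and a min-cost flow with integer capacities yields an integer table that meets the margins exactly. `roundTable` is a hypothetical name and this is a plain successive-shortest-path implementation, not a library API.

```javascript
// Hypothetical helper: integer table with the given margins that minimises
// sum (x[i][j] - e[i][j])^2, found exactly via min-cost flow.
// Network: source -> row i (capacity r_i), row i -> column j (unit arcs, the
// k-th costing 2k - 1 - 2e), column j -> sink (capacity c_j).
function roundTable(rowSums, colSums, expected) {
  const R = rowSums.length, C = colSums.length;
  const N = rowSums.reduce((a, b) => a + b, 0);
  if (N !== colSums.reduce((a, b) => a + b, 0)) {
    throw new Error("row and column sums must have the same total");
  }
  const S = R + C, T = R + C + 1, V = R + C + 2;
  const to = [], cap = [], cost = [], next = [];
  const head = new Array(V).fill(-1);
  const cellArcs = []; // [i, j, arc index] for each row -> column unit arc

  // Forward arc gets an even index; its residual partner is index ^ 1.
  function addArc(u, v, c, w) {
    to.push(v); cap.push(c); cost.push(w); next.push(head[u]); head[u] = to.length - 1;
    to.push(u); cap.push(0); cost.push(-w); next.push(head[v]); head[v] = to.length - 1;
  }

  for (let i = 0; i < R; i++) addArc(S, i, rowSums[i], 0);
  for (let j = 0; j < C; j++) addArc(R + j, T, colSums[j], 0);
  for (let i = 0; i < R; i++) {
    for (let j = 0; j < C; j++) {
      for (let k = 1; k <= Math.min(rowSums[i], colSums[j]); k++) {
        cellArcs.push([i, j, to.length]);
        addArc(i, R + j, 1, 2 * k - 1 - 2 * expected[i][j]);
      }
    }
  }

  // Successive shortest paths; Bellman-Ford copes with negative arc costs.
  // The initial network is acyclic, and augmenting along shortest paths
  // never creates a negative cycle in the residual graph.
  for (let flow = 0; flow < N; flow++) {
    const dist = new Array(V).fill(Infinity);
    const prevArc = new Array(V).fill(-1);
    dist[S] = 0;
    for (let pass = 0; pass < V - 1; pass++) {
      let changed = false;
      for (let u = 0; u < V; u++) {
        if (dist[u] === Infinity) continue;
        for (let a = head[u]; a !== -1; a = next[a]) {
          if (cap[a] > 0 && dist[u] + cost[a] < dist[to[a]] - 1e-12) {
            dist[to[a]] = dist[u] + cost[a];
            prevArc[to[a]] = a;
            changed = true;
          }
        }
      }
      if (!changed) break;
    }
    if (dist[T] === Infinity) throw new Error("no table with these margins");
    // Augment one unit along the path, walking back from sink to source.
    for (let v = T; v !== S; v = to[prevArc[v] ^ 1]) {
      cap[prevArc[v]] -= 1;
      cap[prevArc[v] ^ 1] += 1;
    }
  }

  // Cell value = number of its unit arcs that carry flow.
  const table = rowSums.map(() => new Array(C).fill(0));
  for (const [i, j, a] of cellArcs) table[i][j] += 1 - cap[a];
  return table;
}

const rowSums = [7, 7, 6, 6, 5, 4, 3];
const colSums = [26, 6, 6];
const expected = rowSums.map(r => colSums.map(c => (r * c) / 38));
const table = roundTable(rowSums, colSums, expected);
console.log(table.map(row => row.join(" ")).join("\n"));
```

Dividing each marginal cost by e_ij turns the objective into the Pearson chi-squared distance (valid only when every e_ij > 0). In this example columns 2 and 3 have identical expectations, so the optimum need not be unique and the order in which paths are found decides which optimal table is returned.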

ajo
  • How do you define the *expected distribution*? – kjetil b halvorsen Feb 19 '20 at 17:29
  • @kjetil-b-halvorsen: The expected values would be the float estimates. The expected distribution of integers should be as close as possible to those float values. – ajo Feb 19 '20 at 17:34
  • Do you mean, then, expected under independence? – kjetil b halvorsen Feb 19 '20 at 18:22
  • If you use the Chi-squared statistic to measure the error, this is an [integer quadratic program](https://www.google.com/search?q=integer+quadratic+program&oq=integer+quadratic+program). – whuber Feb 19 '20 at 19:14
  • The maximum entropy distribution, given fixed marginals, is the independence solution. So you could maximize entropy given the marginals, and under integer constraints. See https://projecteuclid.org/download/pdf_1/euclid.aoms/1177704014 – kjetil b halvorsen Feb 19 '20 at 19:44
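One reading of whuber's comment, written out as a math block under stated assumptions: the error is the Pearson chi-squared distance, e_ij are the expected counts under independence, r = (7, 7, 6, 6, 5, 4, 3) are the row sums and c = (26, 6, 6) the column sums. The objective is a sum of one-variable convex terms and the constraints are those of a transportation problem, so this particular integer quadratic program can be solved exactly (for instance as a min-cost flow with convex arc costs) rather than needing a general integer solver.

```latex
\min_{x_{ij} \in \mathbb{Z}_{\ge 0}} \; \sum_{i=1}^{7} \sum_{j=1}^{3} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}
\quad \text{s.t.} \quad
\sum_{j=1}^{3} x_{ij} = r_i \; (i = 1, \dots, 7), \qquad
\sum_{i=1}^{7} x_{ij} = c_j \; (j = 1, 2, 3), \qquad
e_{ij} = \frac{r_i c_j}{N}, \; N = 38.
```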
