Knowing how graders differ from each other is useful, but it still doesn't tell you what to compensate the grades to. For simplicity, imagine just two graders. Even if we conclude grader 1 is consistently 5 marks more generous than grader 2, that doesn't tell you what to do with two students who were each graded 70, one by grader 1 and one by grader 2. Do we say that grader 2 was a harsh marker, and uprate that 70 to 75, while keeping the 70 marked by grader 1 unchanged? Or do we assume grader 1 was unduly lenient, knock his student down to 65 marks, and keep grader 2's 70 unchanged? Do we compromise halfway between the two - or, extending to your case, adjust everyone towards an average of the 11 graders? It's the absolute grades that matter, so knowing relative generosity is not enough.
Your conclusion may depend on how "objective" you feel the final absolute mark should be. One mental model would be to propose that each student has a "correct" grade - the one that would be awarded by the Lead Assessor if they had time to mark each paper individually - to which the observed grades are approximations. In this model, observed grades need to be compensated for their grader, to bring them as close as possible to their unobserved "true" grade. Another model might be that all grading is subjective, and we seek to transform each observed grade towards the mark we predict would have been awarded if all graders had considered the same paper and reached some sort of compromise or average grade for it. I find the second model less convincing as a solution, even if its admission of subjectivity is more realistic. In an educational setting there is usually someone who bears ultimate responsibility for assessment, to ensure that students receive "the grade they deserve", but under this model that lead role has essentially abdicated responsibility to the very graders who we already know disagree markedly. From here on I assume there is one "correct" grade that we aim to estimate, but this is a contestable proposition and may not fit your circumstances.
Suppose students A, B, C and D, all in the same cohort, "should" be graded as 75, 80, 85 and 90 respectively, but their generous grader consistently marks 5 marks too high. We observe 80, 85, 90 and 95 and should subtract 5, but finding the figure to subtract is problematic. It can't be done by comparing results between cohorts, since we expect cohorts to vary in average ability. One possibility is to use the multiple-choice test results to predict the correct scores on the second assessment, then use those predictions to assess how far each grader deviates from the correct grades. But making this prediction is non-trivial - if you expect the two assessments to have different means and standard deviations, you can't just assume that the second-assessment grades should match the first.
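To make "predicting" concrete under the simplest possible assumption, here is a minimal sketch: rescale the multiple-choice scores to the location and spread you believe the written assessment should have. All numbers are invented, and `essay_mean`/`essay_sd` are placeholders for values you would have to decide on - itself a judgment call.

```python
import numpy as np

# Minimal sketch: give the multiple-choice scores the mean and spread we believe
# the written assessment "should" have, and treat the result as each student's
# predicted written grade. All numbers below are invented for illustration.
mcq = np.array([62.0, 70.0, 74.0, 81.0, 88.0])   # multiple-choice scores
essay_mean, essay_sd = 78.0, 8.0                  # assumed target location and spread

z = (mcq - mcq.mean()) / mcq.std(ddof=1)          # standardise the multiple-choice scores
predicted_written = essay_mean + essay_sd * z     # map onto the written assessment's scale

print(predicted_written.round(1))
```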
Also, students differ in their relative aptitude at multiple-choice and written assessments. You could treat that as some kind of random effect, forming a component of the student's "observed" and "true" grades but not captured by their "predicted" grade. If cohorts differ systematically and students within a cohort tend to be similar, then we shouldn't expect this effect to average out to zero within each cohort. If a cohort's observed grades average +5 versus their predicted ones, it is impossible to determine whether this is due to a generous grader, a cohort particularly well-suited to written assessment relative to multiple-choice, or some combination of the two. In an extreme case, the cohort may even have lower aptitude at the second assessment but had this more than compensated for by a very generous grader - or vice versa. You can't break this apart. It's confounded.
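A toy illustration of that confounding, with invented numbers: a +5 grader effect with no aptitude gap produces exactly the same observed grades as a +8 grader effect masking a -3 aptitude gap, so no analysis of these grades alone can tell the two stories apart.

```python
import numpy as np

rng = np.random.default_rng(0)
predicted = rng.normal(75, 8, size=20)   # predicted grades for one cohort (invented)

# Story 1: grader is 5 marks generous, cohort has no particular written-assessment aptitude.
observed_story1 = predicted + 5 + 0
# Story 2: cohort is 3 marks worse at written work, but the grader is 8 marks generous.
observed_story2 = predicted - 3 + 8

print(np.allclose(observed_story1, observed_story2))   # True: the data cannot distinguish them
```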
I also doubt the adequacy of such a simple additive model for your data. Graders may differ from the Lead Assessor not just by a shift in location but also in spread - though since cohorts likely vary in homogeneity, you can't just check the spread of observed grades in each cohort to detect this. Moreover, the bulk of the distribution consists of high scores, fairly near the theoretical maximum of 100. I'd anticipate this introducing non-linearity due to compression near the maximum - a very generous grader may give A, B, C and D marks like 85, 90, 94 and 97. This is harder to reverse than just subtracting a constant. Worse, you might see "clipping" - an extremely generous grader may grade them as 90, 95, 100 and 100. This is impossible to reverse, and information about the relative performance of C and D is irrecoverably lost.
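A small sketch of why clipping is worse than compression, using the invented "correct" grades from above (the two generous-grader mappings are illustrative assumptions, not estimates of anything):

```python
import numpy as np

true_grades = np.array([75, 80, 85, 90])        # the "correct" grades for A, B, C and D

# A compressive but still invertible generous grader: marks pile up towards 100,
# yet in principle the mapping can be undone.
compressed = 100 - 0.6 * (100 - true_grades)    # -> [85. 88. 91. 94.]

# A clipping grader: a flat +15 bonus capped at the maximum possible mark.
clipped = np.minimum(true_grades + 15, 100)     # -> [ 90  95 100 100]

print(compressed)
print(clipped)   # C and D both land on 100: their relative order is gone for good
```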
Your graders behave very differently. Are you sure they differ only in their overall generosity, rather than in their generosity on the various components of the assessment? This might be worth checking, as it could introduce further complications - e.g. the observed grade for B may be worse than that of A, despite B being 5 points "better", even if the grader's allocated marks for each component are a monotonically increasing function of the Lead Assessor's! Suppose the assessment is split between Q1 (A should score 30/50, B 45/50) and Q2 (A should score 45/50, B 35/50). If the grader is very lenient on Q1 (observed grades: A 40/50, B 50/50) but harsh on Q2 (observed: A 42/50, B 30/50), then we observe totals of 82 for A and 80 for B. If you do have to consider component scores, note that clipping may be an issue - I suspect few papers get graded a perfect 100, but rather more papers will be awarded full marks in at least one component.
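Reproducing that two-component example in code, just to confirm that monotone leniency within each component can still reverse the overall ranking (the marks are exactly the ones made up above):

```python
# Lead Assessor's component marks for students A and B, as in the example above.
true_q1 = {"A": 30, "B": 45}
true_q2 = {"A": 45, "B": 35}

# The grader's marks: lenient on Q1, harsh on Q2, but monotone within each question.
observed_q1 = {"A": 40, "B": 50}
observed_q2 = {"A": 42, "B": 30}

true_total = {s: true_q1[s] + true_q2[s] for s in ("A", "B")}
observed_total = {s: observed_q1[s] + observed_q2[s] for s in ("A", "B")}

print(true_total)       # {'A': 75, 'B': 80} -- B is 5 marks better
print(observed_total)   # {'A': 82, 'B': 80} -- yet A comes out ahead
```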
Arguably this is an extended comment rather than an answer, in the sense that it doesn't propose a particular solution within the original bounds of your problem. But if your graders are already handling about 55 papers each, is it so bad for them to look at five or ten more for calibration purposes? You already have a good idea of students' abilities, so you could pick a sample of papers from right across the range of grades. You could then assess whether you need to compensate for grader generosity across the whole test or within each component, and whether to do so just by adding/subtracting a constant or by something more sophisticated like interpolation (e.g. if you're worried about non-linearity near 100).

A word of warning on interpolation: suppose the Lead Assessor marks five sample papers as 70, 75, 80, 85 and 90, while a grader marks them as 80, 88, 84, 93 and 96, so there is some disagreement about order. You probably want to map observed grades from 96 to 100 onto the interval 90 to 100, and observed grades from 93 to 96 onto the interval 85 to 90. But some thought is required for marks below that. Perhaps observed grades from 84 to 93 should be mapped onto the interval 75 to 85? An alternative would be a (possibly polynomial) regression to obtain a formula for "predicted true grade" from "observed grade".
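As a rough sketch of both options on those illustrative calibration numbers: the sorting step and the choice of polynomial degree are my own assumptions, and neither approach handles the order disagreement or the top of the range gracefully without the kind of manual thought described above.

```python
import numpy as np

# Illustrative calibration data from the paragraph above: the Lead Assessor's
# marks and one grader's marks on the same five sample papers.
lead = np.array([70.0, 75.0, 80.0, 85.0, 90.0])
grader = np.array([80.0, 88.0, 84.0, 93.0, 96.0])

# Option 1: piecewise-linear interpolation from the grader's scale back to the
# Lead Assessor's. np.interp needs increasing x-values, so sort by the grader's
# marks; where the two disagree about order the mapping wobbles rather than
# resolving the disagreement, and grades outside 80-96 are simply clamped to
# 70 or 90 instead of being stretched onto 90-100 as suggested above.
order = np.argsort(grader)
def to_lead_interp(g):
    return np.interp(g, grader[order], lead[order])

# Option 2: a least-squares polynomial regression of "predicted true grade" on
# "observed grade" (degree 2 is an arbitrary choice here).
coeffs = np.polyfit(grader, lead, deg=2)
def to_lead_poly(g):
    return np.polyval(coeffs, g)

for g in (82, 90, 98):
    print(g, round(float(to_lead_interp(g)), 1), round(float(to_lead_poly(g)), 1))
```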