This sounds like a simulation is in order.
So I simulated your procedure as follows: $N=1000$ people are added to the trial one by one, each randomly assigned to one of the $4$ groups. The outcome of the treatment for each person is chosen randomly, with both outcomes equally likely (i.e. I am simulating the null hypothesis of all treatments having zero effect). After adding each person, I perform a chi-squared test on the $4 \times 2$ contingency table and check if $p\le \alpha$. If it is, then (and only then) I additionally perform chi-squared tests on the reduced $2 \times 2$ contingency tables to test each group against the other three groups pooled together. If one of these further four tests comes out significant (with the same $\alpha$), then I check whether this treatment performs better or worse than the other three pooled together. If worse, I kick this treatment out and continue adding people to the remaining groups. If better, I stop the trial and declare this treatment the winner. If all $N$ people are added without any winning treatment, the trial is over (note that the results of my analysis will strongly depend on $N$).
Now we can run this many times and find out in what fraction of runs one of the treatments comes out as a winner; since the null hypothesis is true by construction, these are false positives. If I run it 1000 times for nominal $\alpha=0.05$, I get 282 false positives, i.e. a type I error rate of about $0.28$.
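To get a feel for how precise this estimate is, note that the runs are independent, so the number of false positives is binomial. Writing $\hat p = 282/1000$ for the estimated rate and $R = 1000$ for the number of runs (both symbols are introduced here purely for illustration), the Monte Carlo standard error is approximately

$$\mathrm{SE}(\hat p) = \sqrt{\frac{\hat p\,(1-\hat p)}{R}} = \sqrt{\frac{0.282 \times 0.718}{1000}} \approx 0.014,$$

so the estimated rate is good to roughly $\pm 0.03$ (two standard errors).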
We can repeat this whole analysis for several nominal values of $\alpha$ and see what actual error rate we get: $$\begin{array}{cc}\text{nominal } \alpha & \text{actual error rate} \\ 0.05 & \sim 0.28 \\ 0.01 & \sim 0.06 \\ 0.001 & \sim 0.008\end{array}$$ So if you want the actual error rate to be held at, say, the $0.05$ level, you should choose a nominal $\alpha$ of around $0.008$; of course, it is better to run a longer simulation to estimate this more precisely.
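One way to arrive at a value near $0.008$ is to interpolate between the last two rows of the table. This is only a rough sketch: it assumes that the actual error rate is approximately linear in the nominal $\alpha$ on a log-log scale over this range, which the simulation does not establish, and $\alpha^*$ is a symbol introduced here for the nominal level that gives an actual rate of $0.05$:

$$\log_{10}\alpha^* \approx -3 + \frac{\log_{10} 0.05 - \log_{10} 0.008}{\log_{10} 0.06 - \log_{10} 0.008} \approx -3 + \frac{0.796}{0.875} \approx -2.09, \qquad \alpha^* \approx 10^{-2.09} \approx 0.008.$$

The tabulated rates are themselves noisy (each is based on 1000 runs), so this value should be read as a ballpark figure.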
My quick and dirty code in Matlab is below. Please note that this code is brain-dead and not optimized at all; everything runs in loops and is horribly slow. It can probably be accelerated a lot.
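    function seqAnalysis()
        % Count false positives (out of 1000 simulated trials) for each nominal alpha
        alphas = [0.001 0.01 0.05];
        falsePositives = zeros(size(alphas));
        for a = 1:length(alphas)
            falsePositives(a) = trials_run(1000, 1000, alphas(a));
        end
        % first row: nominal alphas; second row: false-positive counts
        disp(num2str([alphas; falsePositives]))
    end

    function outcome = trials_run(Nrep, N, alpha)
        % Run Nrep independent trials of up to N subjects each and
        % return the number of trials that declared a winner
        outcomes = zeros(1,Nrep);
        for rep = 1:Nrep
            if mod(rep,10) == 0
                fprintf('.')   % progress indicator
            end
            outcomes(rep) = trial(N, alpha);
        end
        fprintf('\n')
        outcome = sum(outcomes);
    end

    function result = trial(N, alpha)
        % Simulate one sequential trial under the null hypothesis;
        % result is 1 if some treatment is declared a winner, 0 otherwise
        outcomes = zeros(2,4);   % rows: the two outcomes, columns: treatment groups
        result = 0;
        winner = [];
        % adding subjects one by one
        for subject = 1:N
            group = randi(size(outcomes,2));   % random group among those still in the trial
            outcome = randi(2);                % random outcome, both equally likely
            outcomes(outcome, group) = outcomes(outcome, group) + 1;
            % if groups are significantly different
            if chisqtest(outcomes) < alpha
                % compare each treatment against the rest pooled together
                for g = 1:size(outcomes,2)
                    others = setdiff(1:size(outcomes,2), g);
                    contrast = [outcomes(:, g) sum(outcomes(:, others),2)];
                    % if significantly different
                    if chisqtest(contrast) < alpha
                        % check if better or worse (row 1 is treated as the good outcome)
                        if contrast(1,1)/contrast(2,1) < contrast(1,2)/contrast(2,2)
                            % kick out this group
                            outcomes = outcomes(:, others);
                        else
                            % winner!
                            winner = g;
                        end
                        break   % act only on the first significant contrast
                    end
                end
            end
            if ~isempty(winner)
                result = 1;
                break
            end
        end
    end

    function p = chisqtest(x)
        % Pearson chi-squared test of independence on a contingency table x
        % (chi2cdf requires the Statistics Toolbox)
        e = sum(x,2)*sum(x)/sum(x(:));   % expected counts under independence
        X2 = (x-e).^2./e;
        X2 = sum(X2(:));
        df = prod(size(x)-[1 1]);
        p = 1-chi2cdf(X2,df);            % NaN if an expected count is zero, so the test is not significant
    end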