Council Consensus Algorithms: The Mathematics of Multi-Model Agreement
Explore the mathematical foundations of consensus algorithms used in LLM councils to aggregate model outputs.
LLM council, consensus algorithms, AI mathematics, council of LLMs, AI consensus
The Mathematics of Consensus
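An LLM council needs a principled way to combine the outputs of several models into a single answer. The methods below come from voting theory, inter-rater agreement statistics, similarity measures, probability calibration, and probabilistic inference.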
Voting Theory Foundations
Plurality Voting
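The option with the most votes wins, whether or not it has a majority:
winner = argmax_o count(o)
vote share = max_o count(o) / total_votes
Problem: with three or more options, the winner can have well under half the votes. A polarizing option backed by a committed minority can therefore beat an option most voters find acceptable.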
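A minimal Python sketch of plurality voting follows. It assumes each model's response has already been normalized to a comparable answer label. The function name `plurality` is illustrative only.

```python
from collections import Counter

def plurality(votes):
    """Return the most-voted option and its share of all votes."""
    counts = Counter(votes)
    # most_common breaks ties by first occurrence; a real council may need an explicit tie rule
    winner, top = counts.most_common(1)[0]
    return winner, top / len(votes)

print(plurality(["A", "A", "A", "B", "C"]))  # ('A', 0.6)
```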
Borda Count
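Ranking-based: each voter ranks all n options, and each position earns points:
Score(option) = Σ_i (n - rank_i(option))
where rank_i(option) is the option's 1-based position in voter i's ranking, so first place earns n - 1 points and last place earns 0.
Because Borda uses the full ranking, it rewards broadly acceptable options and captures nuanced preferences better than plurality.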
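The sketch below implements Borda scoring under one assumption: every model returns a complete best-first ranking of the same options. The seven-voter profile is made up to show how Borda and plurality can disagree.

```python
def borda(rankings):
    """Borda scores; each ranking lists all n options, best first."""
    n = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for position, option in enumerate(ranking, start=1):
            # 1st place earns n - 1 points, last place earns 0
            scores[option] = scores.get(option, 0) + (n - position)
    return scores

# 3 models rank A first, but the other 4 rank it last
profile = 3 * [["A", "B", "C"]] + 2 * [["C", "B", "A"]] + 2 * [["B", "C", "A"]]
print(borda(profile))  # {'A': 6, 'B': 9, 'C': 6}
```

In this profile, plurality would pick the polarizing A (3 of 7 first-place votes). Borda picks B, which no model ranks last.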
Condorcet Method
Pairwise comparisons:
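The Condorcet winner is the option that a majority of voters prefer to every other option in head-to-head contests.
It is often regarded as the fairest criterion, but collective preferences can be cyclic (A beats B, B beats C, C beats A). In that case no winner exists; this is the Condorcet paradox.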
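Here is a small sketch of the Condorcet check. It again assumes complete best-first rankings and requires a strict majority in each pairwise contest. It returns `None` when no option beats all others, as happens with a preference cycle.

```python
def condorcet_winner(rankings):
    """Return the option that beats every other head-to-head, or None."""
    options = rankings[0]
    majority = len(rankings) / 2
    for x in options:
        # x must be ranked above y by a strict majority of voters, for every y != x
        if all(
            sum(r.index(x) < r.index(y) for r in rankings) > majority
            for y in options if y != x
        ):
            return x
    return None  # no Condorcet winner, e.g. a preference cycle

profile = 3 * [["A", "B", "C"]] + 2 * [["C", "B", "A"]] + 2 * [["B", "C", "A"]]
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_winner(profile))  # B
print(condorcet_winner(cycle))    # None
```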
Agreement Metrics
Cohen's Kappa
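Measures agreement between two raters beyond what chance alone would produce:
κ = (p_o - p_e) / (1 - p_e)
where:
p_o = observed agreement (fraction of items both raters label identically)
p_e = expected agreement by chance (computed from each rater's label frequencies)
κ = 1 means perfect agreement and κ = 0 means agreement no better than chance. For more than two raters, Fleiss' kappa generalizes the idea.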
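A minimal sketch of Cohen's kappa for two models labeling the same items follows. The example labels are invented. Returning NaN when p_e = 1 is a choice made in this sketch, because the formula divides by zero in that case.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: probability both raters independently pick label c, summed over c
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined when both raters use one identical label throughout
    return (p_o - p_e) / (1 - p_e)

model_x = ["yes", "yes", "no", "no", "yes", "no"]
model_y = ["yes", "no", "no", "no", "yes", "yes"]
print(cohens_kappa(model_x, model_y))  # ≈ 0.333
```

The two models agree on 4 of 6 items (p_o ≈ 0.667). However, both say "yes" half the time, so p_e = 0.5, and kappa is only 1/3.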
Inter-Rater Reliability
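ICC = (MSR - MSE) / (MSR + (k-1) * MSE)
where:
MSR = mean square for rows (between the items being rated)
MSE = residual mean square error from the two-way ANOVA
k = number of raters
This is the two-way, single-rater consistency form, ICC(3,1). It suits models that produce numeric scores rather than categorical labels.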
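The sketch below computes ICC(3,1) from a matrix with one row per rated item and one column per model, using the two-way ANOVA sums of squares. The ratings are invented. The function assumes at least two items and two raters.

```python
def icc_3_1(scores):
    """ICC(3,1) from an n-items x k-raters matrix (list of rows)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual after removing item and rater effects
    msr = ss_rows / (n - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

# 4 answers scored 1-10 by 3 models; model 3 is consistently one point harsher
ratings = [[9, 9, 8], [6, 6, 5], [8, 8, 7], [3, 3, 2]]
print(icc_3_1(ratings))  # ≈ 1.0: the consistency form ignores a constant rater offset
```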
Consensus Scoring
Simple Consensus
consensus = max(count(responses)) / total_responses
Example: [A, A, A, B, C]
consensus = 3/5 = 0.6
Weighted Consensus
Account for model reliability:
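consensus(a) = Σ_i w_i * [response_i = a] / Σ_i w_i
consensus = max_a consensus(a)
where w_i = reliability_weight(model_i), for example the model's accuracy on a validation set, and [response_i = a] is 1 if model i gave answer a and 0 otherwise. With all w_i = 1 this reduces to simple consensus.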
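A minimal sketch of weighted consensus follows. Because `reliability_weight` is not specified here, the weights are passed in directly as hypothetical per-model numbers.

```python
from collections import defaultdict

def weighted_consensus(responses, weights):
    """Return the answer with the largest weight share, and that share."""
    totals = defaultdict(float)
    for answer, w in zip(responses, weights):
        totals[answer] += w  # accumulates w_i * [response_i = answer]
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(weights)

# one trusted model outweighs two weaker ones
print(weighted_consensus(["A", "B", "B"], [3.0, 1.0, 1.0]))  # ('A', 0.6)
```

With equal weights, the same function reproduces the simple-consensus example: `weighted_consensus(["A", "A", "A", "B", "C"], [1] * 5)` gives `('A', 0.6)`.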
Entropy-Based Uncertainty
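H = -Σ p_i * log(p_i)
where p_i is the fraction of responses giving answer i.
Low entropy = high consensus (H = 0 when all models agree)
High entropy = disagreement (H reaches its maximum, log(m), when responses are spread evenly over m distinct answers)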
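A short sketch of entropy over the council's answer distribution follows, using the natural log (nats). It writes each term as p * log(1/p), which equals -p * log(p). This form returns a clean 0.0 for unanimity.

```python
import math
from collections import Counter

def answer_entropy(responses):
    """Shannon entropy (in nats) of the distribution of distinct answers."""
    n = len(responses)
    return sum((c / n) * math.log(n / c) for c in Counter(responses).values())

print(answer_entropy(["A"] * 5))                  # 0.0: unanimous
print(answer_entropy(["A", "A", "A", "B", "C"]))  # ≈ 0.950 nats (max for 3 answers is log 3 ≈ 1.099)
```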
Answer Similarity
Lexical Similarity (BLEU)
BLEU = BP * exp(Σ w_n * log(p_n))
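where:
BP = brevity penalty, min(1, exp(1 - r/c)) for candidate length c and reference length r
p_n = modified n-gram precision (candidate n-gram counts clipped to their counts in the reference)
w_n = weight for each n-gram order, typically 1/4 for n = 1..4
BLEU is asymmetric (one answer serves as the reference) and rewards surface overlap, so two correct answers worded differently can score low.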
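The sketch below is a sentence-level BLEU with whitespace tokenization, one reference, and uniform weights up to 4-grams. It applies no smoothing, so this is a simplified illustration rather than a standard toolkit's implementation. Any zero n-gram precision makes the score 0.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        # modified precision: clip each candidate n-gram count to its count in the reference
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: a zero precision zeroes the geometric mean
        log_sum += (1 / max_n) * math.log(clipped / total)
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_sum)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(bleu("the cat is on the mat", "the cat sat on the mat"))   # 0.0: no shared 4-gram
```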
Semantic Similarity
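similarity = cos(A, B) = (A · B) / (||A|| * ||B||)
where A and B are the embedding vectors of the two answers. Values near 1 indicate answers with nearly the same meaning.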
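A dependency-free sketch of cosine similarity follows, plus a mean over all answer pairs as one possible council-level agreement score. The toy 2-D vectors stand in for output from an unspecified embedding model. Zero vectors are not handled.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all n*(n-1)/2 pairs of answers."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# two answers point the same way, the third is orthogonal
print(mean_pairwise_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # ≈ 0.333
```

Comparing every pair is what makes this step O(n^2 * d).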
Structural Similarity
Compare answer structure, not just text:
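structure_similarity = jaccard(S_a, S_b) = |S_a ∩ S_b| / |S_a ∪ S_b|
where S_a and S_b are sets of structural features extracted from each answer, such as the productions of its parse tree. Jaccard similarity is defined on sets, so trees must first be reduced to sets.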
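One way to reduce a parse tree to a set is sketched below. The trees are represented as nested tuples `(label, child, ...)` with word strings as leaves. This representation and the `productions` helper are assumptions for illustration, not a specific parser's output format. Words are replaced by `*` so that only structure is compared.

```python
def productions(tree):
    """Set of (parent, child-labels) productions in a nested-tuple parse tree."""
    label, *children = tree
    # leaves (words) become "*" so two answers with different wording can still match
    prods = {(label, tuple(c[0] if isinstance(c, tuple) else "*" for c in children))}
    for child in children:
        if isinstance(child, tuple):
            prods |= productions(child)
    return prods

def jaccard(s_a, s_b):
    if not s_a and not s_b:
        return 1.0  # treat two empty sets as identical
    return len(s_a & s_b) / len(s_a | s_b)

tree_a = ("S", ("NP", "it"), ("VP", "runs"))
tree_b = ("S", ("NP", "it"), ("VP", ("V", "runs"), ("ADV", "fast")))
print(jaccard(productions(tree_a), productions(tree_b)))  # ≈ 0.333 (2 shared of 6 productions)
```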
Confidence Calibration
Expected Calibration Error
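ECE = Σ_m (n_m / N) * |acc(m) - conf(m)|
where predictions are grouped into bins m by confidence, n_m = number of predictions in bin m, N = total predictions, acc(m) = fraction correct in bin m, and conf(m) = mean confidence in bin m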
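The following is a minimal ECE sketch with equal-width bins over [0, 1]. The bin count of 10 is a common default, assumed here. The example data describe an invented overconfident model.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with n_bins equal-width confidence bins over [0, 1]."""
    N = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf = 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(ok for _, ok in b) / len(b)     # acc(m)
        avg_conf = sum(c for c, _ in b) / len(b)  # conf(m)
        ece += (len(b) / N) * abs(acc - avg_conf)
    return ece

# always reports 95% confidence but is right only 60% of the time
confs = [0.95] * 10
right = [True] * 6 + [False] * 4
print(expected_calibration_error(confs, right))  # ≈ 0.35
```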
Temperature Scaling
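Calibrate confidence by dividing the logits by a single scalar temperature:
calibrated_conf = softmax(logits / T)
where T > 0 minimizes negative log-likelihood (NLL) on a held-out validation set. T > 1 softens overconfident predictions and T < 1 sharpens underconfident ones. Dividing by T does not change the argmax, so accuracy is unaffected.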
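The sketch below fits T by a coarse grid search over T in [0.25, 10]. The grid search is a simplification; gradient-based optimizers are more commonly used for this one-parameter fit. The validation logits and labels are invented, and the method assumes access to the model's logits.

```python
import math

def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    return -sum(math.log(softmax(row, T)[y]) for row, y in zip(logit_rows, labels)) / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))

# validation set: a large logit gap (overconfident), but correct only half the time
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 1, 1]
T = fit_temperature(val_logits, val_labels)
print(T, softmax([4.0, 0.0], T))  # 10.0; top-class confidence drops from ≈ 0.98 to ≈ 0.60
```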
Synthesis Algorithms
Maximum Likelihood
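answer* = argmax_a P(all_responses | a)
Under a uniform prior over answers, this equals argmax_a P(a | all_responses), the maximum a posteriori answer.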
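To make the likelihood concrete, the sketch below assumes a simple error model that the text does not specify. Models answer independently. Model i is correct with known probability q_i, where 0 < q_i < 1, and otherwise picks uniformly among the m - 1 wrong options. Under this model, a highly accurate minority can outvote a weaker majority.

```python
import math

def ml_answer(responses, accuracies, options):
    """Answer a maximizing log P(responses | a) under an independent symmetric-error model."""
    m = len(options)
    def log_likelihood(a):
        total = 0.0
        for r, q in zip(responses, accuracies):
            # right with probability q; otherwise errs uniformly over the m - 1 wrong options
            total += math.log(q) if r == a else math.log((1 - q) / (m - 1))
        return total
    return max(options, key=log_likelihood)

print(ml_answer(["A", "B", "B"], [0.95, 0.6, 0.6], ["A", "B", "C"]))  # A
print(ml_answer(["A", "B", "B"], [0.7, 0.7, 0.7], ["A", "B", "C"]))   # B
```

With equal accuracies, the log-likelihood ranks answers exactly by vote count, so the result matches plurality.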
Bayesian Combination
P(final | responses) ∝ P(responses | final) * P(final)
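The prior P(final) incorporates prior knowledge, such as how often each answer is correct in general. P(responses | final) is the likelihood of the observed responses.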
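The sketch below extends the same assumed error model (independent models, known accuracy q_i, errors spread evenly over wrong options) with a prior. Every option needs a nonzero prior. It returns the full normalized posterior, computed in log space and normalized with log-sum-exp.

```python
import math

def posterior(responses, accuracies, prior):
    """P(final | responses) for every option in `prior` (dict: option -> probability)."""
    m = len(prior)
    log_post = {}
    for a, p_a in prior.items():
        lp = math.log(p_a)  # log P(final)
        for r, q in zip(responses, accuracies):
            lp += math.log(q) if r == a else math.log((1 - q) / (m - 1))  # log P(r | final)
        log_post[a] = lp
    # normalize with log-sum-exp so tiny probabilities do not underflow
    mx = max(log_post.values())
    z = sum(math.exp(v - mx) for v in log_post.values())
    return {a: math.exp(v - mx) / z for a, v in log_post.items()}

# two of three models say A, but the prior strongly favors B
print(posterior(["A", "A", "B"], [0.7, 0.7, 0.7], {"A": 0.01, "B": 0.99}))  # ≈ {'A': 0.023, 'B': 0.977}
```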
Evidence Accumulation
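belief(A) = Σ evidence_for(A) - Σ evidence_against(A)
where each piece of evidence (for example, a model's response or its assessment of another answer) carries a weight. The answer with the highest net belief is selected.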
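A minimal sketch of evidence accumulation follows. It assumes a hypothetical evidence format of `(answer, weight, supports)` triples, where the weights are illustrative numbers rather than values defined in this article.

```python
from collections import defaultdict

def accumulate_beliefs(evidence):
    """Net belief per answer from (answer, weight, supports) evidence triples."""
    belief = defaultdict(float)
    for answer, weight, supports in evidence:
        belief[answer] += weight if supports else -weight
    return dict(belief)

evidence = [
    ("A", 1.0, True),   # a model proposes A
    ("A", 0.5, False),  # another model disputes A
    ("B", 0.8, True),   # a third model proposes B
]
beliefs = accumulate_beliefs(evidence)
print(beliefs, max(beliefs, key=beliefs.get))  # {'A': 0.5, 'B': 0.8} B
```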
Implementation Considerations
Computational Complexity
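- Voting: O(n*m) for plurality and Borda, where n = models and m = options; Condorcet's pairwise comparisons take O(n*m^2)
- Similarity: O(n^2 * d) for comparing all pairs of answers, where d = embedding dimension
- Synthesis: O(n * output_length), linear in the total length of the responses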
Numerical Stability
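- Use log probabilities: sums of logs avoid the underflow caused by multiplying many small probabilities
- Normalize scores to a common scale before combining them across models
- Handle edge cases: ties, empty response sets, zero probabilities (log 0), and p_e = 1 in Cohen's kappa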
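The short demonstration below shows why log probabilities matter. The 1000 probabilities of 0.01 are an arbitrary illustration. Their direct product underflows to zero in double precision, while the log-space sum and a log-sum-exp normalization stay finite.

```python
import math

probs = [0.01] * 1000
print(math.prod(probs))  # 0.0: the true value, 1e-2000, underflows float64

log_p = sum(math.log(p) for p in probs)
print(log_p)  # ≈ -4605.17

def log_sum_exp(log_values):
    """log(sum(exp(v))) computed without overflow or underflow."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# normalizing two equal log scores still works: each gets probability 0.5
total = log_sum_exp([log_p, log_p])
print(math.exp(log_p - total))  # ≈ 0.5
```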
SPRAPP Implementation
Our consensus algorithms provide:
- Configurable voting methods
- Model reliability weighting
- Entropy-based uncertainty
- Calibrated confidence scores
SPRAPP builds on these principled mathematical methods to reach reliable consensus.