Technical Deep Dive | 2025-02-05 | 10 min read

Council Consensus Algorithms: The Mathematics of Multi-Model Agreement

Explore the mathematical foundations of consensus algorithms used in LLM councils to aggregate model outputs.

Tags: LLM council, consensus algorithms, AI mathematics, council of LLMs, AI consensus

The Mathematics of Consensus

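An LLM council needs a principled way to aggregate several model outputs into a single answer. The tools for this come from voting theory, agreement statistics, similarity measures, confidence calibration, and probabilistic synthesis.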

Voting Theory Foundations

Plurality Voting

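The option with the most first-choice votes wins, even if it falls short of a majority:

winner = argmax over options of count(votes for option)
vote_share(winner) = max(count(votes)) / total_votes

Problem: only first choices count, so a polarizing option can win even when most voters rank it last.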
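As a rough illustration, plurality voting over a list of first-choice answers fits in a few lines of Python. The function name `plurality` and the tie-breaking rule are assumptions made for this sketch, not something the article specifies.

```python
from collections import Counter

def plurality(votes):
    """Return (winner, vote_share) for a list of first-choice votes."""
    if not votes:
        raise ValueError("no votes cast")
    counts = Counter(votes)
    # most_common(1) yields the top (option, count) pair; ties go to the
    # option seen first, an arbitrary choice made for this sketch.
    winner, top = counts.most_common(1)[0]
    return winner, top / len(votes)

print(plurality(["A", "A", "B", "C", "C", "A"]))
```

A real council would likely need an explicit tie policy, for example escalating to another voting method.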

Borda Count

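Ranking-based:

Score(option) = Σ (n - rank_i) for each voter i

where:
n = number of options
rank_i = position voter i gives the option (1 = top choice)

Better for nuanced preferences, because every position in a ranking contributes to the score.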
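A minimal sketch of the Borda score, assuming every voter (model) supplies a complete ranking with the best option first, n is the number of options, and rank 1 is the top choice. The name `borda` and the input layout are illustrative.

```python
def borda(rankings):
    """rankings: list of full rankings, each listing options best-first."""
    n = len(rankings[0])  # number of options
    scores = {option: 0 for option in rankings[0]}
    for ranking in rankings:
        for rank, option in enumerate(ranking, start=1):
            scores[option] += n - rank  # top choice earns n - 1 points
    return scores

rankings = [["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"]]
print(borda(rankings))
```

Here B collects the most points because it is ranked first or second by every voter.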

Condorcet Method

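Pairwise comparisons:

The winner beats every other option in a head-to-head majority vote

Often considered the fairest criterion, but a winner may not exist: pairwise majorities can form a cycle such as A > B > C > A (the Condorcet paradox).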
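The pairwise check can be sketched as follows, again assuming complete best-first rankings. The function `condorcet_winner` is a hypothetical helper that returns None when no option beats all others, which is exactly the paradox case.

```python
def condorcet_winner(rankings):
    """Return the option that wins every pairwise majority, or None."""
    options = rankings[0]
    majority = len(rankings) / 2
    for candidate in options:
        rivals = [o for o in options if o != candidate]
        # candidate must be ranked above each rival by more than half the voters
        if all(
            sum(r.index(candidate) < r.index(rival) for r in rankings) > majority
            for rival in rivals
        ):
            return candidate
    return None

cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_winner(cycle))  # cyclic preferences: no winner
```

When the result is None, a council could fall back to another rule such as the Borda count.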

Agreement Metrics

Cohen's Kappa

Measures agreement between two raters beyond what chance alone would produce:

κ = (p_o - p_e) / (1 - p_e)

where:
p_o = observed agreement
p_e = expected agreement by chance
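A small sketch of the kappa computation for two raters (for instance, two models labelling the same items). Chance agreement p_e is estimated from each rater's label frequencies; the function name and the handling of the degenerate case p_e = 1 are assumptions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: probability both raters pick the same label independently
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

In this example p_o = 0.75 and p_e = 0.5, so kappa comes out at 0.5.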

Inter-Rater Reliability

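The intraclass correlation coefficient (ICC) measures agreement on numeric scores from k raters:

ICC = (MSR - MSE) / (MSR + (k-1) * MSE)

where:
MSR = mean square for rows (between the rated items)
MSE = mean square error (residual)
k = number of raters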
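The mean squares can be obtained from a two-way ANOVA decomposition of an items-by-raters score table. The sketch below takes MSR to be the mean square across rated items (rows), which is the usual convention for this form of the ICC; the function name and table layout are assumptions.

```python
def icc_consistency(scores):
    """scores[i][j] = numeric score that rater j gave to item i."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual after item and rater effects
    msr = ss_rows / (n - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

# Rater 2 is always one point higher: perfectly consistent ordering.
print(icc_consistency([[1, 2], [2, 3], [3, 4]]))
```

Because a constant offset between raters is absorbed by the rater effect, this example yields an ICC of 1.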

Consensus Scoring

Simple Consensus

consensus = max(count(responses)) / total_responses

Example: [A, A, A, B, C]
consensus = 3/5 = 0.6

Weighted Consensus

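Account for model reliability:

consensus(a) = Σ(w_i * 1[response_i = a]) / Σ(w_i)

where:
w_i = reliability_weight(model_i)
1[response_i = a] = 1 if model i answered a, else 0

For numeric responses, the same weights give a weighted mean: Σ(w_i * response_i) / Σ(w_i).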
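For categorical answers, one way to read the weighted formula is that each model adds its weight to the answer it gave. The sketch below follows that reading; the function name and the example weights are invented for illustration.

```python
from collections import defaultdict

def weighted_consensus(responses, weights):
    """Return (answer, weighted_share) for the answer with the most weight."""
    totals = defaultdict(float)
    for response, weight in zip(responses, weights):
        totals[response] += weight  # each model votes with its reliability weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(weights)

# Two low-weight models say A, one high-weight model says B.
print(weighted_consensus(["A", "A", "B"], [0.5, 0.5, 2.0]))
```

Note that the weighted winner (B) differs from the plain majority answer (A), which is the point of weighting by reliability.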

Entropy-Based Uncertainty

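H = -Σ p_i * log(p_i)

where p_i = fraction of responses giving answer i

Low entropy = high consensus (H = 0 when all models agree)
High entropy = disagreement (H = log(k) when k answers are equally common)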
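A short sketch of the entropy calculation, taking p_i to be the fraction of responses that gave answer i and using the natural logarithm. The function name is illustrative.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    n = len(responses)
    return -sum((c / n) * math.log(c / n) for c in Counter(responses).values())

print(response_entropy(["A", "A", "A", "A"]))       # unanimous council
print(response_entropy(["A", "A", "A", "B", "C"]))  # partial agreement
```

A unanimous council has zero entropy, and the value grows as the answers spread across more options.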

Answer Similarity

Lexical Similarity (BLEU)

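BLEU = BP * exp(Σ w_n * log(p_n))

where:
BP = brevity penalty (below 1 when the candidate is shorter than the reference)
p_n = modified n-gram precision for n-grams of length n
w_n = weight for each n-gram length (typically uniform, summing to 1)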
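The following is a simplified, single-reference version of BLEU over pre-tokenized text, with uniform weights and no smoothing. It is a sketch for intuition only; the helper names and the default of bigrams (`max_n=2`) are assumptions, and production use would normally rely on an established implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Unsmoothed BLEU with uniform weights w_n = 1 / max_n."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 here
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat".split(), "the cat sat".split()))
```

In the example every n-gram of the candidate appears in the reference, so the score is reduced only by the brevity penalty.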

Semantic Similarity

similarity = cos(embedding_a, embedding_b) = (A · B) / (||A|| * ||B||)

where A and B are the embedding vectors of the two answers
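Cosine similarity itself is straightforward once embeddings are available. The sketch below uses plain Python lists as stand-ins for embedding vectors; how the embeddings are produced is outside its scope.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Vectors pointing the same way score close to 1 regardless of length, while orthogonal vectors score 0.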

Structural Similarity

Compare answer structure, not just text:

structure_similarity = jaccard(parse_tree_a, parse_tree_b)
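Jaccard similarity operates on sets, so a parse tree first has to be reduced to a set of features. The sketch below assumes that reduction has already happened and uses hypothetical structural labels such as "heading" and "code"; the article does not say which features are extracted.

```python
def jaccard(features_a, features_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # two empty structures are treated as identical
    return len(a & b) / len(a | b)

answer_a = {"heading", "list", "code"}
answer_b = {"heading", "code", "table"}
print(jaccard(answer_a, answer_b))
```

The two example answers share two of four distinct features, giving a similarity of 0.5.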

Confidence Calibration

Expected Calibration Error

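ECE = Σ_m (n_m / N) * |acc(m) - conf(m)|

where:
m = index over confidence bins
n_m = number of predictions in bin m
N = total number of predictions
acc(m), conf(m) = accuracy and mean confidence within bin m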
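A minimal sketch of ECE with equal-width confidence bins. The bin count of 10 and the function name are assumptions; each prediction is a confidence in [0, 1] paired with a flag saying whether it was correct.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and confidence per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece

# Four answers at 90% confidence, only three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

The example model claims 90% confidence but is right 75% of the time, so its ECE is about 0.15.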

Temperature Scaling

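Calibrate confidence by dividing the logits by a single learned temperature:

calibrated_conf = softmax(logits / T)

where T minimizes the negative log-likelihood (NLL) on a held-out validation set; T > 1 softens overconfident predictions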
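Temperature scaling can be sketched with a simple grid search over T. This assumes access to per-option logits and validation labels, which not every model API exposes; the grid of candidate temperatures and all function names are illustrative, and a gradient-based optimizer is the more common choice in practice.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at this temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate T with the lowest validation NLL."""
    candidates = candidates or [0.5 + 0.1 * i for i in range(46)]  # 0.5 to 5.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# An overconfident model: strong logits for class 0, but one of four is wrong.
validation_logits = [[4.0, 0.0]] * 4
validation_labels = [0, 0, 0, 1]
print(fit_temperature(validation_logits, validation_labels))
```

Because the example model is overconfident, the fitted temperature lands above 1, which flattens the calibrated probabilities.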

Synthesis Algorithms

Maximum Likelihood

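answer* = argmax P(all_responses | answer)

Picks the answer under which the observed responses are most probable. With a uniform prior over answers, this coincides with maximizing P(answer | all_responses).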

Bayesian Combination

P(final | responses) ∝ P(responses | final) * P(final)

The prior P(final) incorporates knowledge available before any model responds.
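One way to make the Bayesian rule concrete is a simple noise model, assumed here purely for illustration: model i returns the true answer with probability equal to its accuracy and otherwise picks uniformly among the other options, independently of the other models. The function name, the accuracy values, and the independence assumption are not from the article.

```python
import math

def bayesian_combine(responses, accuracies, prior):
    """Posterior over options given each model's response and accuracy."""
    options = list(prior)
    m = len(options)  # needs at least two options
    log_post = {}
    for option in options:
        lp = math.log(prior[option])
        for response, acc in zip(responses, accuracies):
            # P(response | option is true) under the assumed noise model
            likelihood = acc if response == option else (1 - acc) / (m - 1)
            lp += math.log(likelihood)
        log_post[option] = lp
    peak = max(log_post.values())  # normalize in log space for stability
    unnorm = {o: math.exp(lp - peak) for o, lp in log_post.items()}
    z = sum(unnorm.values())
    return {o: v / z for o, v in unnorm.items()}

posterior = bayesian_combine(["A", "B", "B"], [0.9, 0.6, 0.6], {"A": 0.5, "B": 0.5})
print(posterior)
```

In this example the single 90%-accurate model outweighs two 60%-accurate models, so the posterior favors A even though B has more votes.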

Evidence Accumulation

belief(A) = Σ evidence_for(A) - Σ evidence_against(A)
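The accumulation rule can be sketched as a running signed sum per answer. How evidence strengths are derived is not specified in the article, so the sketch simply assumes each piece of evidence arrives as an (answer, strength) pair, positive when it supports the answer and negative when it opposes it.

```python
def accumulate_belief(evidence):
    """Sum signed evidence strengths per answer."""
    belief = {}
    for answer, strength in evidence:
        belief[answer] = belief.get(answer, 0.0) + strength
    return belief

evidence = [("A", 0.8), ("A", -0.3), ("B", 0.4)]
print(accumulate_belief(evidence))
```

Answer A ends with a net belief of about 0.5 after one supporting and one opposing item, slightly ahead of B.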

Implementation Considerations

Computational Complexity

Voting: O(n*m) for plurality and Borda, where n=models, m=options; Condorcet pairwise tallies take O(n*m^2)
Similarity: O(n^2 * d) for all pairwise comparisons, where d=embedding dimension
Synthesis: O(n * output_length)

Numerical Stability

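Use log probabilities so products of many small values do not underflow
Normalize scores before comparing or combining them across models
Handle edge cases such as ties, empty response sets, and zero weights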
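The log-sum-exp trick is a common way to normalize scores held as log probabilities without underflow. The helper names below are illustrative.

```python
import math

def log_sum_exp(log_values):
    """Compute log(sum(exp(v))) without underflow by factoring out the max."""
    peak = max(log_values)
    if peak == -math.inf:
        return -math.inf  # every input has probability zero
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))

def normalize_log_scores(log_scores):
    """Turn unnormalized log scores into probabilities that sum to 1."""
    total = log_sum_exp(log_scores)
    return [math.exp(s - total) for s in log_scores]

# math.exp(-1000.0) underflows to 0.0, yet normalization still works.
print(normalize_log_scores([-1000.0, -1000.0]))
```

Exponentiating the raw scores first would give 0 / 0 here, while the shifted version recovers equal probabilities for the two options.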

SPRAPP Implementation

Our consensus algorithms provide:

Configurable voting methods
Model reliability weighting
Entropy-based uncertainty
Calibrated confidence scores

SPRAPP uses principled mathematics for reliable consensus.

Written by SPRAPP Team


