name: hitl-design description: Design human-in-the-loop workflows including review queues, escalation patterns, feedback loops, and quality assurance for AI systems. allowed-tools: Read, Write, Glob, Grep, Task

Human-in-the-Loop Design

When to Use This Skill

Use this skill when:

Hitl Design tasks - Working on design human-in-the-loop workflows including review queues, escalation patterns, feedback loops, and quality assurance for ai systems
Planning or design - Need guidance on Hitl Design approaches
Best practices - Want to follow established patterns and standards

Overview

Human-in-the-Loop (HITL) design creates meaningful human oversight for AI systems. Effective HITL balances automation efficiency with human judgment, ensuring appropriate intervention points without creating bottlenecks.

HITL Pattern Taxonomy

┌─────────────────────────────────────────────────────────────────┐
│                    HITL PATTERN SPECTRUM                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  FULL AUTOMATION ◄───────────────────────────► FULL MANUAL       │
│                                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│  │ AI Only  │   │ Human    │   │ Human    │   │ Human    │      │
│  │          │   │ on Loop  │   │ in Loop  │   │ Only     │      │
│  │ No human │   │ Monitor  │   │ Review   │   │ No AI    │      │
│  │ review   │   │ & audit  │   │ & decide │   │          │      │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘      │
│       │              │              │              │             │
│       ▼              ▼              ▼              ▼             │
│  Low stakes    Medium risk    High stakes    Critical/          │
│  High volume   Scalable       Accuracy       Regulated          │
│                oversight      critical                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

HITL Patterns

Pattern 1: Human-on-the-Loop (Monitoring)

                    ┌─────────────────┐
                    │  Human Monitor  │
                    │  (Dashboard)    │
                    └────────┬────────┘
                             │ Observes
                             ▼
    Input ──► AI Decision ──► Execute ──► Outcome
                    │
                    └──► Alert if anomaly

Use When:

High volume, low individual risk
AI performance is validated
Rapid response not required
Audit trail sufficient

Pattern 2: Human-in-the-Loop (Review)

                        ┌─────────────────┐
                        │  Human Review   │
                        │  Queue          │
                        └────────┬────────┘
                                 │
    Input ──► AI Recommend ──► Review ──► Decision ──► Execute
                    │                         │
                    └─── Low confidence? ─────┘
                              route

Use When:

Decisions have significant impact
Regulatory requirement
Model confidence varies
Liability concerns

Pattern 3: Human-First with AI Assist

    Input ──► Human Decision ──► AI Validation ──► Execute
                    │                   │
                    └─── Suggest ◄──────┘
                         alternatives

Use When:

Expert domain knowledge required
AI augments rather than replaces
Training/onboarding scenarios
Building trust in AI

Decision Routing

Confidence-Based Routing

public class ConfidenceRouter
{
    private readonly HitlConfiguration _config;

    public async Task<RoutingDecision> Route(
        AiPrediction prediction,
        CancellationToken ct)
    {
        // High confidence: Auto-approve
        if (prediction.Confidence >= _config.AutoApproveThreshold)
        {
            return RoutingDecision.AutoApprove(prediction);
        }

        // Low confidence: Human review required
        if (prediction.Confidence <= _config.ManualReviewThreshold)
        {
            return RoutingDecision.RequireHumanReview(
                prediction,
                ReviewPriority.High,
                "Low model confidence");
        }

        // Medium confidence: Risk-based routing
        var riskScore = await CalculateRiskScore(prediction, ct);

        if (riskScore > _config.RiskThreshold)
        {
            return RoutingDecision.RequireHumanReview(
                prediction,
                ReviewPriority.Medium,
                $"Elevated risk score: {riskScore:F2}");
        }

        return RoutingDecision.AutoApproveWithAudit(prediction);
    }

    private async Task<double> CalculateRiskScore(
        AiPrediction prediction,
        CancellationToken ct)
    {
        var factors = new List<double>
        {
            1 - prediction.Confidence,                    // Uncertainty
            prediction.ImpactScore,                       // Potential impact
            prediction.NoveltyScore,                      // Out-of-distribution
            await GetRecentErrorRate(prediction.Category) // Historical errors
        };

        return factors.Average();
    }
}

Rule-Based Routing

public class RuleBasedRouter
{
    private readonly List<IRoutingRule> _rules;

    public async Task<RoutingDecision> Route(
        AiPrediction prediction,
        Context context,
        CancellationToken ct)
    {
        foreach (var rule in _rules.OrderByDescending(r => r.Priority))
        {
            if (await rule.Matches(prediction, context, ct))
            {
                return rule.GetDecision(prediction, context);
            }
        }

        return RoutingDecision.Default(prediction);
    }
}

// Example rules
public class HighValueRule : IRoutingRule
{
    public int Priority => 100;

    public Task<bool> Matches(AiPrediction prediction, Context context, CancellationToken ct)
    {
        return Task.FromResult(context.TransactionValue > 10000);
    }

    public RoutingDecision GetDecision(AiPrediction prediction, Context context)
    {
        return RoutingDecision.RequireHumanReview(
            prediction,
            ReviewPriority.High,
            "High-value transaction requires approval");
    }
}

public class RegulatedCategoryRule : IRoutingRule
{
    public int Priority => 90;

    public Task<bool> Matches(AiPrediction prediction, Context context, CancellationToken ct)
    {
        return Task.FromResult(
            context.Category is "medical" or "legal" or "financial");
    }

    public RoutingDecision GetDecision(AiPrediction prediction, Context context)
    {
        return RoutingDecision.RequireHumanReview(
            prediction,
            ReviewPriority.Normal,
            $"Regulated category: {context.Category}");
    }
}

Review Queue Design

Queue Architecture

public class ReviewQueueService
{
    private readonly IReviewItemRepository _repository;
    private readonly IReviewerAssignment _assignment;
    private readonly INotificationService _notifications;

    public async Task<ReviewItem> EnqueueForReview(
        AiPrediction prediction,
        ReviewPriority priority,
        string reason,
        CancellationToken ct)
    {
        var item = new ReviewItem
        {
            Id = Guid.NewGuid(),
            Prediction = prediction,
            Priority = priority,
            Reason = reason,
            CreatedAt = DateTime.UtcNow,
            SlaDeadline = CalculateSla(priority),
            Status = ReviewStatus.Pending
        };

        await _repository.Create(item, ct);

        // Assign to appropriate reviewer
        var assignee = await _assignment.FindReviewer(item, ct);
        if (assignee != null)
        {
            item.AssignedTo = assignee;
            await _repository.Update(item, ct);
            await _notifications.NotifyAssignment(assignee, item, ct);
        }

        return item;
    }

    public async Task<ReviewItem> ClaimNext(
        string reviewerId,
        ReviewerCapabilities capabilities,
        CancellationToken ct)
    {
        // Find next appropriate item for reviewer
        var item = await _repository.FindNextUnassigned(
            capabilities.Categories,
            capabilities.MaxPriority,
            ct);

        if (item == null) return null;

        item.AssignedTo = reviewerId;
        item.ClaimedAt = DateTime.UtcNow;
        item.Status = ReviewStatus.InProgress;

        await _repository.Update(item, ct);

        return item;
    }

    public async Task SubmitReview(
        Guid itemId,
        string reviewerId,
        ReviewDecision decision,
        CancellationToken ct)
    {
        var item = await _repository.GetById(itemId, ct);

        if (item.AssignedTo != reviewerId)
            throw new UnauthorizedAccessException("Item not assigned to reviewer");

        item.Decision = decision;
        item.CompletedAt = DateTime.UtcNow;
        item.Status = ReviewStatus.Completed;

        await _repository.Update(item, ct);

        // Record for model improvement
        await RecordFeedback(item, decision, ct);

        // Trigger downstream actions
        await ProcessDecision(item, decision, ct);
    }

    private DateTime CalculateSla(ReviewPriority priority)
    {
        return priority switch
        {
            ReviewPriority.Critical => DateTime.UtcNow.AddMinutes(15),
            ReviewPriority.High => DateTime.UtcNow.AddHours(1),
            ReviewPriority.Normal => DateTime.UtcNow.AddHours(4),
            ReviewPriority.Low => DateTime.UtcNow.AddDays(1),
            _ => DateTime.UtcNow.AddHours(4)
        };
    }
}

Review Interface Design

## Review Interface Requirements

### Essential Information
- Original input/request
- AI prediction/recommendation
- Confidence score with explanation
- Supporting evidence/context
- Similar historical cases
- Risk indicators

### Reviewer Actions
- Approve (accept AI recommendation)
- Reject (override with reason)
- Modify (adjust AI recommendation)
- Escalate (route to specialist)
- Defer (need more information)

### Ergonomic Considerations
- Keyboard shortcuts for common actions
- Batch review mode for similar items
- Quick filters and sorting
- Time tracking for fatigue management
- Random audits of auto-approved items

Escalation Patterns

Escalation Workflow

public class EscalationService
{
    private readonly List<EscalationLevel> _levels;

    public async Task<EscalationResult> Escalate(
        ReviewItem item,
        string reason,
        string escalatingReviewer,
        CancellationToken ct)
    {
        var currentLevel = item.EscalationLevel ?? 0;
        var nextLevel = _levels.FirstOrDefault(l => l.Level == currentLevel + 1);

        if (nextLevel == null)
        {
            return EscalationResult.MaxLevelReached();
        }

        item.EscalationLevel = nextLevel.Level;
        item.EscalationReason = reason;
        item.EscalatedBy = escalatingReviewer;
        item.EscalatedAt = DateTime.UtcNow;

        // Find appropriate escalation target
        var target = await FindEscalationTarget(nextLevel, item, ct);

        item.AssignedTo = target.ReviewerId;

        await _repository.Update(item, ct);
        await _notifications.NotifyEscalation(target, item, reason, ct);

        return EscalationResult.Escalated(nextLevel, target);
    }
}

public record EscalationLevel(
    int Level,
    string Name,
    TimeSpan SlaOverride,
    string[] RequiredCapabilities
);

Escalation Triggers

Trigger	Description	Target
Complexity	Requires specialized knowledge	Subject matter expert
Conflict	Disagreement with AI/policy	Senior reviewer
Risk	High-impact decision	Manager/compliance
Timeout	SLA approaching	Next available
Uncertainty	Reviewer unsure	Second opinion

Feedback Loops

Learning from Human Decisions

public class FeedbackCollector
{
    public async Task RecordFeedback(
        ReviewItem item,
        ReviewDecision decision,
        CancellationToken ct)
    {
        var feedback = new HumanFeedback
        {
            ItemId = item.Id,
            OriginalPrediction = item.Prediction,
            HumanDecision = decision,
            Agreement = decision.Action == DecisionAction.Approve,
            ReviewerId = item.AssignedTo,
            ReviewDurationMs = CalculateDuration(item),
            Context = ExtractContext(item)
        };

        await _feedbackStore.Store(feedback, ct);

        // Aggregate for model retraining
        if (ShouldTriggerRetraining())
        {
            await _retrainingService.QueueRetraining(ct);
        }

        // Alert on significant disagreement patterns
        await CheckForSystematicDisagreement(feedback, ct);
    }

    private async Task CheckForSystematicDisagreement(
        HumanFeedback feedback,
        CancellationToken ct)
    {
        var recentFeedback = await _feedbackStore.GetRecent(
            category: feedback.Context.Category,
            hours: 24,
            ct);

        var disagreementRate = recentFeedback
            .Count(f => !f.Agreement) / (double)recentFeedback.Count;

        if (disagreementRate > 0.3)
        {
            await _alerts.Send(new SystematicDisagreementAlert
            {
                Category = feedback.Context.Category,
                DisagreementRate = disagreementRate,
                SampleSize = recentFeedback.Count
            });
        }
    }
}

Active Learning Integration

public class ActiveLearningSelector
{
    public async Task<IEnumerable<ReviewItem>> SelectForLabeling(
        int count,
        CancellationToken ct)
    {
        // Uncertainty sampling: Select items where model is most uncertain
        var uncertainItems = await _predictions
            .Where(p => p.Status == PredictionStatus.Pending)
            .OrderBy(p => Math.Abs(p.Confidence - 0.5))
            .Take(count / 2)
            .ToListAsync(ct);

        // Diversity sampling: Select diverse examples
        var diverseItems = await SelectDiverseExamples(count / 2, ct);

        return uncertainItems.Concat(diverseItems);
    }
}

HITL Metrics

Key Performance Indicators

Metric	Description	Target
Throughput	Reviews per hour	Varies by domain
Cycle Time	Queue to decision	< SLA
Agreement Rate	Human-AI alignment	> 85%
Override Rate	Human overrides AI	< 15%
Escalation Rate	Items escalated	< 10%
Reviewer Fatigue	Accuracy over time	Stable

Dashboard Design

public class HitlDashboard
{
    public async Task<DashboardData> GetMetrics(
        DateRange range,
        CancellationToken ct)
    {
        return new DashboardData
        {
            // Volume metrics
            TotalReviews = await CountReviews(range, ct),
            PendingItems = await CountPending(ct),
            QueueDepthByPriority = await GetQueueDepth(ct),

            // Efficiency metrics
            AverageCycleTime = await CalculateAverageCycleTime(range, ct),
            SlaMet = await CalculateSlaCompliance(range, ct),
            ThroughputByReviewer = await GetThroughput(range, ct),

            // Quality metrics
            AgreementRate = await CalculateAgreementRate(range, ct),
            OverridesByReason = await GetOverrideReasons(range, ct),
            EscalationRate = await CalculateEscalationRate(range, ct),

            // Trends
            VolumeOverTime = await GetVolumeTrend(range, ct),
            AgreementOverTime = await GetAgreementTrend(range, ct)
        };
    }
}

HITL Design Template

# HITL Design: [System Name]

## 1. System Overview
- **AI Function**: [What the AI does]
- **Decision Impact**: [Low/Medium/High/Critical]
- **Volume**: [Expected decisions per day]

## 2. Routing Strategy

### Auto-Approve Criteria
- Confidence > [X]%
- Category in [list]
- Risk score < [threshold]

### Human Review Required
- Confidence < [X]%
- Category in [regulated list]
- First-time patterns
- [Other criteria]

## 3. Review Queue Design

### Prioritization
| Priority | SLA | Criteria |
|----------|-----|----------|
| Critical | 15 min | [Criteria] |
| High | 1 hour | [Criteria] |
| Normal | 4 hours | [Criteria] |

### Reviewer Assignment
- [Assignment strategy]
- Required capabilities: [List]

## 4. Review Interface
- Information displayed: [List]
- Available actions: [List]
- Keyboard shortcuts: [Enabled/Disabled]

## 5. Escalation Path
| Level | Role | Trigger |
|-------|------|---------|
| 1 | [Role] | [Trigger] |
| 2 | [Role] | [Trigger] |

## 6. Feedback Loop
- Training data collection: [Yes/No]
- Retraining trigger: [Criteria]
- Disagreement monitoring: [Threshold]

## 7. Metrics & Monitoring
- Dashboard: [Link]
- Alerting: [Thresholds]

Validation Checklist

Integration Points

Inputs from:

ai-safety-planning skill → Oversight requirements
explainability-planning skill → Review explanations
Regulatory requirements → Review mandates

Outputs to:

ml-project-lifecycle skill → Feedback for retraining
Application code → Queue implementation
Operations → Staffing requirements

Last Updated: 2025-12-27

hitl-design

Installation

Details

Usage

Skill Instructions

name: hitl-design description: Design human-in-the-loop workflows including review queues, escalation patterns, feedback loops, and quality assurance for AI systems. allowed-tools: Read, Write, Glob, Grep, Task

Human-in-the-Loop Design

When to Use This Skill

Overview

HITL Pattern Taxonomy

HITL Patterns

Pattern 1: Human-on-the-Loop (Monitoring)

Pattern 2: Human-in-the-Loop (Review)

Pattern 3: Human-First with AI Assist

Decision Routing

Confidence-Based Routing

Rule-Based Routing

Review Queue Design

Queue Architecture

Review Interface Design

Escalation Patterns

Escalation Workflow

Escalation Triggers

Feedback Loops

Learning from Human Decisions

Active Learning Integration

HITL Metrics

Key Performance Indicators

Dashboard Design

HITL Design Template

Validation Checklist

Integration Points

More by melodic-software