Biathlon Big-Break Section Explanation

Overview

The big-break section is the application layer of the breakthrough prediction system in biathlon analysis. This section takes the trained models from feat-select-break and applies them to predict 2026 breakthrough candidates, generate historical comparisons, and export results to Excel files. It represents the culmination of the breakthrough analysis pipeline.

Main Functions and Components

1. Core Function: predict_2026_breakthroughs()

This is the central prediction function that applies trained models to identify 2026 breakthrough candidates.

Input Validation and Setup

if (!is.data.frame(current_data)) stop("current_data is not a data frame")
if (nrow(current_data) == 0) stop("current_data is empty")
if (is.null(breakthrough_model)) stop("breakthrough_model is NULL")
if (is.null(top_predictors) || length(top_predictors) == 0) stop("No top_predictors provided")

Biathlon-Specific Predictor Mapping

predictor_mapping <- c(
  "Prev_Pelo" = "Pelo",
  "Prev_Individual" = "Individual_Pelo", 
  "Prev_Sprint" = "Sprint_Pelo",
  "Prev_Pursuit" = "Pursuit_Pelo",
  "Prev_MassStart" = "MassStart_Pelo",
  "Prev_Pct_of_Max_Points" = "Pct_of_Max_Points",
  "Age" = "Age"
)

2. Career History Analysis

2025 Season Data Extraction

  • Focuses on 2025 as the most recent complete season
  • Takes most recent entry for each skier to avoid duplicates
  • Validates data availability and reports skier counts

Career Maximum Calculation

career_maximums <- current_data %>%
  filter(!is.na(Pct_of_Max_Points)) %>%
  group_by(Skier) %>%
  summarise(
    Career_Max_Pct = max(Pct_of_Max_Points, na.rm = TRUE),
    .groups = "drop"
  )

Breakthrough Candidate Identification

Inclusion Criteria:

  • Age ≥ 16 (junior age minimum)
  • Career_Max_Pct < 0.4 (never achieved 40% breakthrough - biathlon specific)
  • Pct_of_Max_Points > 0.01 (has competitive results in 2025)
  • Complete data for key variables

Notable Features:

  • No upper age limit: Recognizes breakthrough can occur at any career stage
  • 40% threshold: Biathlon-specific breakthrough definition
  • Career-based filtering: Excludes athletes who already achieved breakthrough

3. Debug Functionality

Athlete-Specific Debugging

if ("Campbell Wright" %in% prediction_data$Skier) {
  wright_idx <- which(prediction_data$Skier == "Campbell Wright")
  cat("\n=== DEBUG: Campbell Wright Breakthrough Model Input ===\n")
  # Display detailed input data for verification
}

if ("Jeanne Richard" %in% prediction_data$Skier) {
  # Similar debugging for ladies' representative athlete
}

Comprehensive Data Quality Checks

  • Missing value analysis for each predictor
  • Range validation and extreme value detection
  • Factor level compatibility with training data
  • Infinite value detection and replacement

4. Model Application and Prediction

Feature Mapping Process

for (prev_feature in names(feature_mapping)) {
  current_feature <- feature_mapping[[prev_feature]]
  if (prev_feature %in% top_predictors && current_feature %in% names(prediction_data)) {
    prediction_data[[prev_feature]] <- prediction_data[[current_feature]]
    cat(sprintf("✓ Mapped %s -> %s\n", current_feature, prev_feature))
  }
}

Prediction Generation

breakthrough_probs <- tryCatch({
  predict(breakthrough_model, newdata = prediction_clean, type = "prob")
}, error = function(e) {
  # Fallback with na.action = na.pass
  predict(breakthrough_model, newdata = prediction_clean, type = "prob", na.action = na.pass)
})

5. Results Processing and Classification

Probability-Based Classification

Likelihood = case_when(
  is.na(Breakthrough_Prob) ~ "Unknown",
  Breakthrough_Prob >= 0.6 ~ "Very High",
  Breakthrough_Prob >= 0.4 ~ "High", 
  Breakthrough_Prob >= 0.2 ~ "Moderate",
  Breakthrough_Prob >= 0.1 ~ "Low",
  TRUE ~ "Very Low"
)

Performance Metrics

  • Points to Threshold: pmax(0, 0.4 - Pct_of_Max_Points, na.rm = TRUE)
  • Age-based filtering: Under-25 subset for young prospects
  • Probability distributions: Summary statistics and likelihood categories

6. Excel Export System

Breakthrough Candidates Files

# Men's breakthrough candidates
men_file <- file.path(output_dir, "mens_breakthrough_candidates_2026.xlsx")
write.xlsx(men_breakthrough_workbook, men_file, rowNames = FALSE)

# Ladies breakthrough candidates  
ladies_file <- file.path(output_dir, "ladies_breakthrough_candidates_2026.xlsx")
write.xlsx(ladies_breakthrough_workbook, ladies_file, rowNames = FALSE)

Historical Comparison Files

# Comparative analysis with historical breakthroughs
comparative_men_file <- file.path(output_dir, "mens_breakthrough_comparison_historical_vs_2026.xlsx")
comparative_ladies_file <- file.path(output_dir, "ladies_breakthrough_comparison_historical_vs_2026.xlsx")

7. Historical Breakthrough Comparison

Function: predict_historical_breakthrough()

  • Applies 2026 models to historical breakthrough cases
  • Maps current performance to “previous” variables for model compatibility
  • Handles missing values with median imputation
  • Generates “what would the model have predicted” probabilities

Comparative Analysis Structure

Excel Output Columns:

  • Name, Nation, Age
  • Pre-Breakthrough Pct (performance before breakthrough)
  • Breakthrough Result (actual breakthrough performance)
  • Predicted Prob (what model predicted)
  • Season, Type (“Historical Success” vs “2026 Prediction”)

8. Comprehensive Result Validation

Multi-Level Validation

  1. Input validation: Model existence, predictor availability
  2. Data quality: Missing values, infinite values, extreme outliers
  3. Model compatibility: Factor levels, data structure alignment
  4. Output validation: Probability ranges, result completeness
  5. Export validation: File creation, data integrity

Error Handling Strategy

tryCatch({
  # Main operation
}, error = function(e) {
  cat("Detailed error context:", e$message, "\n")
  # Fallback procedures or graceful failure
})

Key Technical Features

Sport-Specific Adaptations

Biathlon Breakthrough Definition

  • 40% threshold: Represents significant competitive achievement
  • Career-based exclusion: Athletes who already achieved breakthrough
  • No age restrictions: Recognizes breakthrough can occur at various career stages

ELO Rating Integration

  • Maps discipline-specific ELO ratings (Individual, Sprint, Pursuit)
  • Handles MassStart exclusions due to data quality issues
  • Maintains consistency with training data preparation

Advanced Data Processing

Missing Value Strategy

  • Uses quartile imputation to match training preparation
  • Preserves data distribution characteristics
  • Reports imputation statistics for transparency

Feature Engineering

  • Dynamic mapping between training and prediction features
  • Handles temporal data structure (Prev_ → current mapping)
  • Validates predictor availability and compatibility

Integration with Analysis Pipeline

Upstream Dependencies

  • Models: From feat-select-break section (logistic regression, random forest)
  • Predictors: Top-ranked features from importance analysis
  • Thresholds: Sport-specific breakthrough definitions
  • Data: Cleaned training datasets with career histories

Downstream Outputs

  • Excel files: Formatted for analysis and decision-making
  • Probability scores: Quantitative breakthrough likelihood
  • Historical validation: Model reliability assessment
  • Debug information: Comprehensive data lineage

Expected Outputs and Results

Console Reporting

  • Candidate identification statistics
  • Age distribution analysis
  • Probability distribution summaries
  • High-potential candidate identification
  • Export confirmation and file locations

Excel Files Generated

  1. mens_breakthrough_candidates_2026.xlsx
  2. ladies_breakthrough_candidates_2026.xlsx
  3. mens_breakthrough_comparison_historical_vs_2026.xlsx
  4. ladies_breakthrough_comparison_historical_vs_2026.xlsx
  5. Debug files: Comprehensive predictor and prediction data

Data Objects Created

  • breakthrough_predictions_men: Complete men’s prediction results
  • breakthrough_predictions_ladies: Complete ladies’ prediction results
  • Age-stratified subsets (under-25 prospects)
  • Summary statistics and performance metrics

Analytical Insights

  • Quantitative Rankings: Probability-based candidate prioritization
  • Historical Validation: Model performance on past breakthrough cases
  • Age Patterns: Young prospect identification and analysis
  • Performance Gaps: Points needed to reach breakthrough threshold

Quality Assurance Features

Comprehensive Debugging

  • Athlete-specific tracking: Campbell Wright, Jeanne Richard verification
  • Data lineage: Input → processing → output validation
  • Statistical summaries: Distribution analysis and outlier detection
  • Model diagnostics: Prediction quality and reliability assessment

Error Recovery

  • Graceful degradation: Continues analysis when components fail
  • Fallback procedures: Alternative approaches for edge cases
  • Detailed logging: Error context and troubleshooting information
  • Data validation: Multi-stage quality checks throughout pipeline

This section represents the practical application of the breakthrough prediction system, transforming statistical models into actionable insights for identifying future biathlon stars and validating the approach against historical breakthrough patterns.