Contributor Tools Reference¶

Complete documentation for all tools available to dataset contributors.

Overview¶

The project provides three tools for converting biomechanical datasets, validating them, and tuning the validation ranges:

Tool	Purpose	Location
conversion_generate_phase_dataset.py	Convert time-indexed to phase-indexed data	Root directory
create_dataset_validation_report.py	Generate validation reports and plots	`contributor_tools/`
interactive_validation_tuner.py	GUI for tuning validation ranges	`contributor_tools/`

Phase Conversion Tool¶

conversion_generate_phase_dataset.py¶

Purpose: Converts time-indexed biomechanical data to phase-indexed format with exactly 150 points per gait cycle.

How it works:
1. Detects gait cycles using heel strike events or periodic patterns
2. Normalizes each cycle to 150 evenly spaced points
3. Creates a phase_percent column (0-100% of the gait cycle)
4. Preserves all biomechanical variables during resampling
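The resampling step can be illustrated with a minimal sketch. It assumes heel-strike sample indices are already known and uses plain linear interpolation; the tool's actual detection and interpolation methods are not documented here, and the function names below are hypothetical. The sketch uses only the standard library so it runs anywhere; `numpy.interp` would be the usual choice in practice.

```python
N_POINTS = 150  # points per gait cycle, as stated above


def split_cycles(signal, heel_strikes):
    """Cut a signal into cycles between consecutive heel-strike indices."""
    return [signal[a:b + 1] for a, b in zip(heel_strikes, heel_strikes[1:])]


def resample_cycle(samples, n_points=N_POINTS):
    """Linearly interpolate one cycle onto n_points evenly spaced points."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    last = len(samples) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the source cycle
        lo = min(int(pos), last - 1)     # left neighbour, clamped so lo + 1 is valid
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def phase_percent(n_points=N_POINTS):
    """Phase axis running from 0 to 100 inclusive."""
    return [100 * i / (n_points - 1) for i in range(n_points)]


# Toy signal: heel strikes at samples 0, 100 and 230 give two cycles of unequal length.
signal = [float(i % 7) for i in range(231)]
cycles = [resample_cycle(c) for c in split_cycles(signal, [0, 100, 230])]
phases = phase_percent()
```

Both toy cycles end up with 150 points even though the original cycles had 101 and 131 samples, and each keeps its original first and last value.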

Usage:

# Basic usage
python conversion_generate_phase_dataset.py converted_datasets/your_dataset_time.parquet

# With custom output name
python conversion_generate_phase_dataset.py input_time.parquet --output output_phase.parquet

# Specify gait event detection method
python conversion_generate_phase_dataset.py input_time.parquet --method heel_strike

Parameters:
- input_file: Path to time-indexed parquet file
- --output: (Optional) Custom output filename (default: replaces _time with _phase)
- --method: Cycle detection method (heel_strike, periodic, auto)
- --min_cycle_points: Minimum points per cycle for detection (default: 50)
- --max_cycle_points: Maximum points per cycle for detection (default: 200)
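The default output naming rule can be sketched as below. This is an illustration of "replaces _time with _phase", not the tool's code: replacing only the last occurrence of `_time` and raising an error when it is absent are both assumptions, and `default_output_path` is a hypothetical name.

```python
from pathlib import Path


def default_output_path(input_file):
    """Derive the phase-indexed filename from a time-indexed one."""
    path = Path(input_file)
    if "_time" not in path.stem:
        # Behaviour for such names is not documented; failing loudly is an assumption.
        raise ValueError(f"{path.name} does not contain '_time'")
    # Replace only the last '_time' so earlier parts of the name are untouched.
    new_stem = "_phase".join(path.stem.rsplit("_time", 1))
    return path.with_name(new_stem + path.suffix)


out = default_output_path("converted_datasets/your_dataset_time.parquet")
```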

Output: Creates your_dataset_phase.parquet with:
- Exactly 150 rows per gait cycle
- phase_percent column: 0, 0.67, 1.34, ..., 99.33, 100
- All original biomechanical variables preserved
- cycle_id column for cycle identification

Example Output Structure:

subject_id | task          | cycle_id | phase_percent | knee_flexion_angle_ipsi_rad | ...
SUB01      | level_walking | 0        | 0.00          | 0.123                        | ...
SUB01      | level_walking | 0        | 0.67          | 0.125                        | ...
...        | ...           | ...      | ...           | ...                          | ...
SUB01      | level_walking | 0        | 100.00        | 0.122                        | ...
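A quick structural check on a converted file can be sketched as follows. It assumes the rows are available as dictionaries keyed by the column names shown above (for example via `pandas.read_parquet(path).to_dict("records")`); the helper name is hypothetical and the toy rows stand in for real data.

```python
from collections import Counter

EXPECTED_ROWS = 150  # rows per gait cycle in a phase-indexed file


def cycles_with_wrong_length(rows, expected=EXPECTED_ROWS):
    """Return {(subject_id, task, cycle_id): row_count} for malformed cycles."""
    counts = Counter((r["subject_id"], r["task"], r["cycle_id"]) for r in rows)
    return {key: n for key, n in counts.items() if n != expected}


def make_rows(cycle_id, n_rows):
    """Build toy rows for one cycle of SUB01 level walking."""
    return [
        {"subject_id": "SUB01", "task": "level_walking",
         "cycle_id": cycle_id, "phase_percent": 100 * i / 149}
        for i in range(n_rows)
    ]


rows = make_rows(0, 150) + make_rows(1, 149)  # second cycle is one row short
bad = cycles_with_wrong_length(rows)
```

An empty result means every cycle has exactly 150 rows; here only the deliberately truncated cycle is reported.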

Validation Report Generator¶

create_dataset_validation_report.py¶

Purpose: Generates comprehensive validation reports showing how well your dataset conforms to biomechanical standards.

How validation works:
- Phase-based checking: Validates at specific gait cycle phases (default: 0%, 25%, 50%, 75%)
- Box intersection: Checks if values fall within min/max ranges at each phase
- Visual feedback: Green = passing all checks, Red = failing specific variables
- No rigid pass/fail: Focuses on identifying potential issues for review

Usage:

# Basic validation with default ranges
python contributor_tools/create_dataset_validation_report.py \
    --dataset converted_datasets/your_dataset_phase.parquet

# Use custom validation ranges
python contributor_tools/create_dataset_validation_report.py \
    --dataset your_dataset_phase.parquet \
    --ranges-file contributor_tools/validation_ranges/custom_ranges.yaml

# Generate validation plots
python contributor_tools/create_dataset_validation_report.py \
    --dataset your_dataset_phase.parquet \
    --generate-plots \
    --output-dir validation_results/

Parameters:
- --dataset: Path to phase-indexed parquet file to validate
- --ranges-file: Custom validation ranges YAML (default: default_ranges.yaml)
- --generate-plots: Create visual validation plots
- --output-dir: Directory for reports and plots (default: validation_reports/)
- --tasks: Specific tasks to validate (default: all)
- --verbose: Show detailed validation progress

Output Files:
1. Markdown Report (dataset_validation_report.md):
   - Summary statistics
   - Detailed violation listings
   - Suggestions for common issues
2. Validation Plots (if --generate-plots):
   - One plot per task showing all variables
   - Green backgrounds: passing ranges
   - Red highlights: failing strides
   - Separate plots for kinematics, kinetics, segments

Understanding the Validation Process:

The validation uses a simple box-checking approach at key phases:

Phase 0%:   [min_0 ─────────────── max_0]
            └─ Check if value at phase 0 is within this range

Phase 25%:  [min_25 ────────────── max_25]
            └─ Check if value at phase 25 is within this range

Phase 50%:  [min_50 ────────────── max_50]
            └─ Check if value at phase 50 is within this range

Phase 75%:  [min_75 ────────────── max_75]
            └─ Check if value at phase 75 is within this range

Each variable at each phase has its own acceptable range based on biomechanical norms.
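The box check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the validator's implementation: the ranges are given as a dictionary of the form phase, then variable, then min/max; a stride is a list of 150 values per variable; and the value at a phase is taken from the nearest sample. `check_stride` is a hypothetical name.

```python
N_POINTS = 150  # rows per gait cycle in a phase-indexed dataset


def check_stride(stride, ranges, n_points=N_POINTS):
    """Return (phase, variable, value) for every value outside its box.

    stride: {variable_name: [n_points values]}
    ranges: {phase_as_string: {variable_name: {"min": x, "max": y}}}
    """
    violations = []
    for phase, boxes in ranges.items():
        # Nearest sample to the requested phase (assumed lookup rule).
        index = round(float(phase) / 100 * (n_points - 1))
        for name, box in boxes.items():
            value = stride[name][index]
            if not (box["min"] <= value <= box["max"]):
                violations.append((phase, name, value))
    return violations


ranges = {
    "0": {"knee_flexion_angle_ipsi_rad": {"min": -0.1, "max": 0.3}},
    "75": {"knee_flexion_angle_ipsi_rad": {"min": 0.9, "max": 1.4}},
}
# A flat 0.2 rad curve is plausible at heel strike but not at peak swing flexion.
flat_stride = {"knee_flexion_angle_ipsi_rad": [0.2] * N_POINTS}
violations = check_stride(flat_stride, ranges)
```

Returning the list of violations, rather than a single boolean, matches the report's emphasis on identifying issues for review.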

Interactive Validation Tuner¶

interactive_validation_tuner.py¶

Purpose: GUI tool for visually tuning validation ranges by comparing passing and failing strides.

Key Features:
- Side-by-side visualization: See passing vs failing strides
- Draggable validation boxes: Adjust ranges with mouse
- Real-time feedback: Instantly see effects of changes
- Multiple views: Show globally passing, locally passing, or all strides
- Unit conversion: Toggle between radians and degrees
- YAML export: Save tuned ranges for use in validation

Usage:

# Launch the GUI (files are chosen inside the interface)
python contributor_tools/interactive_validation_tuner.py

# The GUI will prompt for:
# 1. Dataset file (your_dataset_phase.parquet)
# 2. Validation ranges file (default_ranges.yaml or custom)
# 3. Task and variable to tune

Interface Controls:

File Selection:
- Browse and select dataset file
- Browse and select validation ranges
- Auto-loads on startup if files exist

Task/Variable Selection:
- Dropdown for available tasks
- Dropdown for variables in selected task
- Updates plot when selection changes

Visualization Options:
- Show Locally Passing: Yellow lines for strides passing current variable only
- Show in Degrees: Convert angular measurements for easier interpretation
- Pass/Fail columns: Drag boxes to adjust validation ranges

Saving Changes:
- Save Ranges: Exports tuned ranges to new YAML file
- Preserves all other variables unchanged
- Creates timestamped backup of original

How to Use for Special Populations:

1. Load your special population dataset
2. Start with default healthy ranges
3. For each variable showing many failures:
   - Visually inspect if failures are legitimate variations
   - Drag boxes to encompass normal variation
4. Save as population_specific_ranges.yaml

Tips for Effective Tuning:
- Start with kinematic variables (angles), which are the most intuitive
- Use "Show Locally Passing" to identify systematic shifts
- Toggle degrees view for angular variables
- Adjust one phase at a time
- Save incrementally with descriptive names
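The difference between the stride views can be sketched as below. The definitions are assumptions read off the "Show Locally Passing" description: a stride is treated as globally passing when it passes every variable, and as locally passing when it passes the selected variable but fails at least one other. The function and label names are hypothetical and do not reflect the tuner's internals.

```python
def classify_strides(results, current_variable):
    """Label each stride relative to the variable being tuned.

    results: {stride_id: {variable_name: True if the stride passed that variable}}
    """
    labels = {}
    for stride_id, passed in results.items():
        if all(passed.values()):
            labels[stride_id] = "globally_passing"
        elif passed[current_variable]:
            labels[stride_id] = "locally_passing"  # fails only elsewhere
        else:
            labels[stride_id] = "failing"
    return labels


results = {
    0: {"knee_flexion_angle_ipsi_rad": True, "hip_flexion_angle_ipsi_rad": True},
    1: {"knee_flexion_angle_ipsi_rad": True, "hip_flexion_angle_ipsi_rad": False},
    2: {"knee_flexion_angle_ipsi_rad": False, "hip_flexion_angle_ipsi_rad": True},
}
labels = classify_strides(results, "knee_flexion_angle_ipsi_rad")
```

Under these definitions, a large locally passing group suggests the selected variable's ranges are fine and the failures come from another variable.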

Creating Custom Validation Ranges¶

For Different Populations¶

Instead of forcing all data into healthy adult ranges, create population-specific validation ranges:

Option 1: Different Tasks for Different Populations

# Validation ranges file: one entry per population-specific task name
# (the task names themselves are set in your conversion script)
tasks:
  level_walking_elderly:     # Elderly-specific ranges
    phases:
      '0':
        knee_flexion_angle_ipsi_rad:
          min: 0.1
          max: 0.8  # Reduced ROM expected

  level_walking_prosthetic:  # Prosthetic-specific ranges
    phases:
      '0':
        knee_flexion_angle_ipsi_rad:
          min: 0.0  # May have limited flexion
          max: 0.6

Option 2: Separate Validation Files

# Structure your validation ranges by population
contributor_tools/validation_ranges/
├── default_ranges.yaml           # Healthy adults
├── elderly_ranges.yaml           # Elderly population
├── prosthetic_ranges.yaml        # Prosthetic users
├── pediatric_ranges.yaml         # Children
└── pathological_ranges.yaml      # Clinical populations

Option 3: Generate from Your Data

# Let the data define its own normal ranges
python contributor_tools/automated_fine_tuning.py \
    --dataset special_population_phase.parquet \
    --method percentile_95 \
    --output special_population_ranges.yaml
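The idea of letting data define its own ranges can be sketched as follows. What `percentile_95` computes is not specified here, so this sketch assumes it means the central 95% of observed values (2.5th to 97.5th percentile) at each checked phase. The function names are hypothetical, and the percentile uses linear interpolation between ranks.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1) / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def derive_ranges(values_by_phase, variable, coverage=95):
    """Build {phase: {variable: {"min": ..., "max": ...}}} from observed values.

    values_by_phase: {phase_as_string: [one value per stride]}
    """
    tail = (100 - coverage) / 2  # share trimmed from each end
    return {
        phase: {variable: {"min": percentile(vals, tail),
                           "max": percentile(vals, 100 - tail)}}
        for phase, vals in values_by_phase.items()
    }


# Toy data: 101 strides with values 0.00, 0.01, ..., 1.00 at phase 0%.
observed = {"0": [i / 100 for i in range(101)]}
derived = derive_ranges(observed, "knee_flexion_angle_ipsi_rad")
```

Trimming the tails keeps a few outlier strides from widening the box, which fits the advice to base ranges on multiple subjects rather than outliers.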

Understanding Phase-Based Validation¶

The validation system checks values at specific phases rather than the entire curve:

Why Phase-Based?
- Biomechanics vary throughout the gait cycle
- A range that fits peak knee flexion at ~75% should not be applied at 0%
- Checking per phase allows phase-specific acceptable ranges

Default Check Points:
- 0%: Heel strike (initial contact)
- 25%: Mid-stance
- 50%: Late stance, approaching toe-off (toe-off typically occurs near 60% in level walking)
- 75%: Mid-swing (near peak knee flexion)

Custom Phase Points: You can validate at any phase percentage:

tasks:
  detailed_analysis:
    phases:
      '0':    # Heel strike
        knee_flexion_angle_ipsi_rad:
          min: -0.1
          max: 0.3
      '15':   # Loading response
        knee_flexion_angle_ipsi_rad:
          min: 0.2
          max: 0.5
      '33':   # Mid-stance
        knee_flexion_angle_ipsi_rad:
          min: -0.1
          max: 0.2
      '60':   # Initial swing
        knee_flexion_angle_ipsi_rad:
          min: 0.3
          max: 0.8
      '73':   # Peak flexion
        knee_flexion_angle_ipsi_rad:
          min: 0.9
          max: 1.4
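With 150 points per cycle, samples are spaced 100/149 ≈ 0.67% apart, so a custom phase such as 73% generally falls between two samples. How the validator resolves this is not documented here. The sketch below assumes a nearest-sample lookup and shows which row each custom phase above would map to; `nearest_sample` is a hypothetical helper.

```python
N_POINTS = 150  # rows per gait cycle


def nearest_sample(phase_percent, n_points=N_POINTS):
    """Return (row index, actual phase of that row) closest to phase_percent."""
    index = round(phase_percent / 100 * (n_points - 1))
    actual_phase = 100 * index / (n_points - 1)
    return index, actual_phase


custom_phases = [0, 15, 33, 60, 73]
lookup = {p: nearest_sample(p) for p in custom_phases}

for phase, (index, actual) in lookup.items():
    print(f"requested {phase:>3}% -> row {index:>3} (actual {actual:.2f}%)")
```

Under this assumption the lookup is never more than about 0.34% away from the requested phase, which is small next to the width of typical validation boxes.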

Troubleshooting Common Issues¶

Phase Conversion Issues¶

Problem: "Cannot detect gait cycles"

# Try different detection methods
python conversion_generate_phase_dataset.py data.parquet --method periodic

# Adjust cycle detection parameters
python conversion_generate_phase_dataset.py data.parquet \
    --min_cycle_points 80 \
    --max_cycle_points 150

Problem: "Irregular cycle lengths"
- Check if data includes partial cycles at start/end
- Verify consistent sampling rate
- Consider manual cycle marking in your conversion script
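One way to locate irregular cycles before conversion is to compare each cycle's length with the median length. This sketch assumes you already have heel-strike sample indices from your own conversion script. The 25% tolerance is an arbitrary illustrative threshold, and the function name is hypothetical.

```python
from statistics import median


def flag_irregular_cycles(heel_strikes, tolerance=0.25):
    """Return (cycle_number, length) for cycles far from the median length.

    heel_strikes: increasing sample indices of detected heel strikes
    tolerance: allowed relative deviation from the median cycle length
    """
    lengths = [b - a for a, b in zip(heel_strikes, heel_strikes[1:])]
    typical = median(lengths)
    return [
        (i, n) for i, n in enumerate(lengths)
        if abs(n - typical) > tolerance * typical
    ]


# Cycle 3 is only 40 samples long, e.g. a partial step or a false detection.
heel_strikes = [0, 100, 205, 300, 340, 445]
irregular = flag_irregular_cycles(heel_strikes)
```

Flagged cycles at the very start or end of a recording are often partial cycles that can simply be dropped.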

Validation Issues¶

Problem: "All strides failing at specific phase"
- Check for systematic offset in your data
- Verify unit conversions (degrees vs radians)
- Consider population-specific ranges
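A unit mix-up is a common cause of wholesale failures and is easy to test for. The heuristic below is an assumption, not part of the validator: joint angles stored in radians should rarely exceed π in magnitude, so larger values suggest the column is actually in degrees. `looks_like_degrees` is a hypothetical helper.

```python
import math


def looks_like_degrees(values, limit_rad=math.pi):
    """Heuristic: True if any magnitude exceeds what radians plausibly allow."""
    return max(abs(v) for v in values) > limit_rad


knee_angle = [5.0, 20.0, 65.0]  # named *_rad but evidently in degrees

if looks_like_degrees(knee_angle):
    # Convert once in the conversion script and keep the *_rad column name.
    knee_angle_rad = [math.radians(v) for v in knee_angle]
else:
    knee_angle_rad = list(knee_angle)
```

The heuristic cannot detect small angles recorded in degrees, so a passing check does not prove the units are correct.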

Problem: "Variable not found in validation ranges"
- Ensure variable names match exactly
- Check if variable is in the appropriate category (kinematic/kinetic)
- Add custom ranges for non-standard variables
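Name mismatches are often near misses such as a wrong suffix. A minimal sketch for spotting them uses the standard library's `difflib.get_close_matches`; the helper name, the 0.8 similarity cutoff, and the example column names are illustrative assumptions.

```python
from difflib import get_close_matches


def suggest_variable_names(dataset_columns, range_variables, cutoff=0.8):
    """Map each dataset column missing from the ranges to its closest range name."""
    suggestions = {}
    for name in dataset_columns:
        if name not in range_variables:
            suggestions[name] = get_close_matches(
                name, range_variables, n=1, cutoff=cutoff
            )
    return suggestions


dataset_columns = ["hip_flexion_angle_ipsi_rad", "knee_flexion_angle_ipsi_deg"]
range_variables = ["hip_flexion_angle_ipsi_rad", "knee_flexion_angle_ipsi_rad"]
suggestions = suggest_variable_names(dataset_columns, range_variables)
```

An empty suggestion list for a column means nothing similar exists in the ranges file, which points to a non-standard variable that needs its own custom ranges.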

Interactive Tuner Issues¶

Problem: "GUI not responding to drags"
- Click on the box edge to start dragging
- Ensure dataset is loaded before adjusting
- Try restarting the tool

Problem: "Changes not saving"
- Use "Save Ranges" button before closing
- Check file permissions in output directory
- Verify YAML syntax if manually edited

Best Practices¶

When Converting Data¶

Start with a single subject to test your pipeline
Validate incrementally - don't wait until the end
Document any assumptions or special processing
Keep your original data unchanged

When Validating¶

Use visualization to understand failures before adjusting ranges
Don't over-fit ranges to force validation passing
Document why you need custom ranges
Consider biological plausibility of your data

When Tuning Ranges¶

Start with the most common task (usually level_walking)
Adjust ranges based on multiple subjects, not outliers
Be more conservative with kinetic variables
Save different range sets for different populations

Getting Help¶

Examples: See contributor_tools/conversion_scripts/ for working implementations
Issues: Open a GitHub issue with your validation report
Discussion: Use GitHub Discussions for general questions
Direct Support: Contact maintainers for complex cases

These tools are designed to make dataset contribution as smooth as possible while maintaining high data quality standards.