For Contributors - Dataset Conversion¶
Transform your lab's biomechanical data into standardized formats that accelerate reproducible research¶
Join the standardization movement. Convert your datasets to unlock validated analysis tools, increase research impact, and contribute to the growing ecosystem of interoperable biomechanical data.
The Standardization Challenge¶
Every lab collects data differently. This fragmentation prevents:
✓ Meta-analyses across studies
✓ Validation across labs
✓ Machine learning at scale
✓ Clinical translation
Your contribution helps solve this fundamental challenge in biomechanics research.
Join the Data Ecosystem¶
When you standardize your dataset:
✓ Join a growing validated collection
✓ Enable direct reproducibility
✓ Contribute to population norms
✓ Build multi-study evidence
✓ Future-proof your research
Your data becomes part of something bigger.
Quality Standards¶
What makes a good contribution:
✓ Clear gait cycles identified
✓ Consistent data collection
✓ Documented protocols
✓ Sufficient sample size
Validation philosophy: Quality feedback, not pass/fail.
Dataset Conversion Workflow¶
Transform your data step-by-step. Most time goes to variable mapping and validation refinement. Each step provides clear feedback to ensure success.
Follow this flowchart to convert and validate your dataset:
flowchart TD
Start([Start: Have Biomechanical Data]) --> Step1[Step 1: Study Reference Dataset]
Step1 --> Step1Details[/"
• Review existing datasets (e.g., umich_2021_phase.parquet)
• Understand required columns
• Note: 150 points per cycle for phase data
"/]
Step1Details --> Step2[Step 2: Convert to Table Format]
Step2 --> Step2Details[/"
• Create conversion script (dataset-specific)
• Map variables to standard names
• Follow examples in contributor_tools/conversion_scripts/
"/]
Step2Details --> CheckPhase{Does your data
have phase indexing?}
CheckPhase -->|No| ConvertPhase[Convert Time to Phase]
ConvertPhase --> ConvertDetails[/"
Run: python conversion_generate_phase_dataset.py dataset_time.parquet
Converts to 150 points per gait cycle
"/]
ConvertDetails --> CheckRanges
CheckPhase -->|Yes| CheckRanges{Do validation ranges
exist for your population?}
CheckRanges -->|Yes, Standard Population| Step3[Step 3: Validate Dataset]
CheckRanges -->|No, or Special Population| CreateRanges[Create Validation Ranges]
CreateRanges --> RangeOptions[/"
Choose Method:
• Generate from your data (automated_fine_tuning.py)
• Copy and modify existing ranges
• Create manually for special needs
"/]
RangeOptions --> SaveRanges[/"
Save to: contributor_tools/validation_ranges/
"/]
SaveRanges --> Step3
Step3 --> ValidateCmd[/"
Run: python contributor_tools/create_dataset_validation_report.py \
--dataset your_dataset_phase.parquet
"/]
ValidateCmd --> CheckValid{Validation
Issues to Address?}
CheckValid -->|No Issues| Success([✓ Success!
Dataset Ready])
Success --> SuccessSteps[/"
• Add to converted_datasets/
• Update documentation
• Share with community
"/]
CheckValid -->|Has Issues| Review[Review Validation Report]
Review --> FixIssues[/"
• Check error messages
• Fix variable mapping
• Adjust data processing
• Consider custom ranges
"/]
FixIssues --> Step2
style Start fill:#e1f5e1
style Success fill:#c8e6c9
style CheckPhase fill:#fff3e0
style CheckRanges fill:#fff3e0
style CheckValid fill:#fff3e0
style CreateRanges fill:#e3f2fd
style Review fill:#ffebee
Quick Start Guide¶
Study Reference Dataset¶
Understand the expected structure:
import pandas as pd
# Load a reference dataset
reference = pd.read_parquet('converted_datasets/umich_2021_phase.parquet')
# Check structure
print(f"Shape: {reference.shape}")
print(f"Columns: {reference.columns.tolist()}")
print(f"Required: subject_id, task, phase_percent")
Convert Your Data¶
Create a conversion script:
✓ Standard variable names
✓ Required columns: subject_id
, task
, phase_percent
✓ Units: radians, Newtons, meters
Working examples:
Handle Phase Indexing¶
Convert to 150 points per cycle:
# Time to phase conversion tool
# Automatically detects gait cycles and resamples to 150 points
python conversion_generate_phase_dataset.py \
your_dataset_time.parquet
✓ Creates phase-indexed output
✓ Exactly 150 points per cycle
✓ Phase values: 0-100%
Validate Your Dataset¶
Run automated validation:
# Validation report generator
# Checks biomechanical ranges at key gait phases (0%, 25%, 50%, 75%)
python contributor_tools/create_dataset_validation_report.py \
--dataset your_dataset_phase.parquet
✓ Biomechanical consistency check
✓ Outlier identification
✓ Visual validation plots
✓ Improvement suggestions
Common Issues and Solutions¶
Variable Name Mismatch¶
Map to standard names:
variable_mapping = {
'KneeAngle_L': 'knee_flexion_angle_ipsi_rad',
'HipMoment_R': 'hip_moment_contra_Nm',
# Add all your mappings
}
Wrong Points Per Cycle¶
Use phase conversion:
# Converts any time-indexed data to standard 150 points/cycle
python conversion_generate_phase_dataset.py \
your_dataset_time.parquet
Automatically detects cycles and resamples. Learn more →
Validation Issues¶
Common problems:
✓ Out-of-range values at phases
✓ Missing required variables
✓ Wrong units (degrees vs radians)
✓ Special population differences
Use the interactive validation tuner for custom ranges.
Working Examples¶
MATLAB Dataset (Umich_2021)¶
From .mat files to parquet:
✓ Input: MATLAB .mat
files
✓ Converter: convert_umich_phase_to_parquet.m
✓ Output: umich_2021_phase.parquet
Python Dataset (Gtech_2023)¶
From multiple files to parquet:
✓ Input: Multiple subject files
✓ Converter: convert_gtech_all_to_parquet.py
✓ Output: gtech_2023_phase.parquet
Resources¶
Documentation & Guides¶
Technical References - Conversion Guide - Step-by-step instructions - Validation Reference - Understanding checks - Tools Reference - Detailed tool documentation - Standard Specification - Data format spec - Variable Naming - Names and units
Tools & Scripts - Conversion Scripts - Interactive Validation Tuner - Visual range adjustment - Phase Converter - Time to phase conversion
Getting Help¶
Support Options¶
Self-Service
1. Check existing examples in contributor_tools/conversion_scripts/
2. Review validation errors - they pinpoint exact issues
3. Use interactive tuner for visual feedback
Community Support - GitHub Issues - Report problems - Discussions - Ask questions - Contact Maintainers - Dataset-specific help
Next Steps¶
After Validation Success¶
Complete your contribution:
- Document - Add README with data source details
- Test - Verify with
LocomotionData
analysis tools - Submit - Create pull request with your dataset