Quick Start
This guide will help you run CheckRef quickly with minimal setup. For detailed instructions, see the full installation guide.
Prerequisites
Ensure you have:
- Nextflow (≥21.04.0) installed
- Docker or Singularity installed
- Your VCF files and reference legend files ready
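The prerequisites above can be checked from a terminal before the first run. The following is a minimal sketch, not part of CheckRef: `version_ge` and `have_tool` are hypothetical helper names, the version comparison assumes a `sort` that supports `-V` (for example GNU coreutils), and the parsing assumes the Nextflow banner contains the word `version` followed by the number.

```shell
# Hypothetical helpers; not part of CheckRef.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints "<tool>: found" or "<tool>: missing" depending on PATH lookup.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Only one of the two container engines is required.
have_tool docker
have_tool singularity

if command -v nextflow >/dev/null 2>&1; then
    # Assumes the banner printed by -version contains "version X.Y.Z".
    nf_version=$(nextflow -version 2>/dev/null \
        | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if version_ge "$nf_version" "21.04.0"; then
        echo "nextflow $nf_version: OK"
    else
        echo "nextflow $nf_version: older than 21.04.0"
    fi
else
    echo "nextflow: missing"
fi
```

If either container engine is reported as found, pick the matching `-profile` value (`docker` or `singularity`) in the commands that follow.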
Try with Test Data
The fastest way to test CheckRef is with our sample data:
Step 1: Get CheckRef and Test Data
Option A: Clone Full Repository (includes test data, source code, docs):
# Clone the repository - test data comes with it!
git clone https://github.com/AfriGen-D/checkref.git
cd checkref
# Verify test data is present
ls -lh test_data/chr22/
ls -lh test_data/reference/
Option B: Download Test Data Only (if you don't need the full repo):
# Create directories
mkdir -p test_data/chr22 test_data/reference
# Download test data files (~30KB total)
wget https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/chr22/chr22_sample.vcf.gz \
-P test_data/chr22/
wget https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/reference/chr22_sample.legend.gz \
-P test_data/reference/
Alternative with curl:
curl -L https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/chr22/chr22_sample.vcf.gz \
--create-dirs -o test_data/chr22/chr22_sample.vcf.gz
curl -L https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/reference/chr22_sample.legend.gz \
--create-dirs -o test_data/reference/chr22_sample.legend.gz
Note: Test data (~30KB) is small and quick to download!
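A failed download can leave an empty file or an error page saved under the expected name, so it may be worth confirming that both files are valid gzip archives before running the pipeline. This is a minimal sketch; `check_gz` is a hypothetical helper, not a CheckRef command.

```shell
# Hypothetical helper; not part of CheckRef.
# Prints "OK  <file>" when the file is non-empty and passes gzip's
# integrity test, "BAD <file>" otherwise. Returns non-zero if any file is bad.
check_gz() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ] && gzip -t "$f" 2>/dev/null; then
            echo "OK  $f"
        else
            echo "BAD $f"
            status=1
        fi
    done
    return "$status"
}

check_gz test_data/chr22/chr22_sample.vcf.gz \
         test_data/reference/chr22_sample.legend.gz \
    || echo "Re-download any file marked BAD"
```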
Step 2: Run with Test Data
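# Run from the root of the cloned repository (Option A).
# If you only downloaded the test data (Option B), use AfriGen-D/checkref in place of main.nf.
nextflow run main.nf \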
--targetVcfs "test_data/chr22/*.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--fixMethod remove \
--outdir test_results \
-profile docker
Expected runtime: ~2-5 minutes
Step 3: Check Results
ls test_results/
# You should see:
# - chr22_allele_switch_results.tsv
# - chr22_allele_switch_summary.txt
# - chr22.noswitch.vcf.gz
# - all_chromosomes_summary.txt
Running with Your Data
Step 1: Prepare Your Data
Organize your files:
/path/to/data/
├── vcf_files/
│ ├── sample_chr1.vcf.gz
│ ├── sample_chr2.vcf.gz
│ └── ...
└── reference_legends/
├── ref_chr1.legend.gz
├── ref_chr2.legend.gz
└── ...
Step 2: Run the Pipeline
Remove switched sites (default):
nextflow run AfriGen-D/checkref \
--targetVcfs "/path/to/vcf_files/*.vcf.gz" \
--referenceDir "/path/to/reference_legends/" \
--fixMethod remove \
--outdir results \
-profile docker
Correct switched sites:
nextflow run AfriGen-D/checkref \
--targetVcfs "/path/to/vcf_files/*.vcf.gz" \
--referenceDir "/path/to/reference_legends/" \
--fixMethod correct \
--outdir results \
-profile docker
Step 3: Check Results
Results will be in the results/ directory:
results/
├── allele_switch_results/ # TSV files with detected switches
├── summary_files/ # Summary reports
├── fixed_vcfs/ # Corrected/cleaned VCF files
└── logs/ # Validation and verification logs
Example Commands
Single Chromosome (Test Data)
nextflow run AfriGen-D/checkref \
--targetVcfs "test_data/chr22/chr22_sample.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--outdir chr22_results \
-profile docker
Your Own Data - Single Chromosome
nextflow run AfriGen-D/checkref \
--targetVcfs "your_chr22.vcf.gz" \
--referenceDir "/path/to/reference/legends/" \
--outdir chr22_results \
-profile docker
Multiple Files (Comma-Separated)
nextflow run AfriGen-D/checkref \
--targetVcfs "chr1.vcf.gz,chr2.vcf.gz,chr3.vcf.gz" \
--referenceDir "/path/to/reference/legends/" \
--outdir results \
-profile docker
Glob Pattern (Multiple Chromosomes)
nextflow run AfriGen-D/checkref \
--targetVcfs "/path/to/vcfs/sample_chr*.vcf.gz" \
--referenceDir "/path/to/reference/legends/" \
--outdir results \
-profile docker
Using Singularity (HPC)
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/path/to/reference/" \
--outdir results \
-profile singularity
Resume Failed Run
If the pipeline fails or is interrupted, resume from where it left off:
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/reference/" \
--outdir results \
-profile docker \
-resume
Understanding Output
Allele Switch Results
The *_allele_switch_results.tsv files contain detected switches:
| Column | Description |
|---|---|
| CHR | Chromosome |
| POS | Position |
| ALLELE_INFO | VCF alleles vs Reference alleles |
Example:
CHR POS ALLELE_INFO
chr1 100000 A>G|G>A
chr1 150000 C>T|T>C
Summary Files
The all_chromosomes_summary.txt provides aggregated statistics:
- Total variants processed
- Common variants between VCF and reference
- Number of matched variants
- Number of switched alleles
- Overlap percentages
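The switched-allele count in the summary can be roughly cross-checked by counting rows in the `*_allele_switch_results.tsv` files. This is a sketch under assumptions: `count_switches` is a hypothetical helper, the files are taken to have whitespace-separated columns with a header row starting with `CHR`, and each data row is taken to represent one detected switch. The sample rows are the two shown in the Allele Switch Results example.

```shell
# Hypothetical helper; not part of CheckRef.
# Counts data rows per chromosome across one or more results files.
# Assumes whitespace-separated columns and a header row starting with "CHR".
count_switches() {
    awk '$1 != "CHR" && NF >= 3 { n[$1]++ }
         END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$@" | sort
}

# Demo on a scratch file holding the two example rows.
sample=$(mktemp)
printf 'CHR\tPOS\tALLELE_INFO\nchr1\t100000\tA>G|G>A\nchr1\t150000\tC>T|T>C\n' > "$sample"
count_switches "$sample"    # prints: chr1<TAB>2
rm -f "$sample"
```

On real output, the same function can be pointed at the files under `results/allele_switch_results/`.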
Fixed VCFs
Depending on --fixMethod:
remove mode:
- Produces *.noswitch.vcf.gz files
- Problematic sites removed
- File size smaller than original
correct mode:
- Produces *.corrected.vcf.gz files
- REF↔ALT alleles swapped
- Sites marked with SWITCHED=1 in INFO field
- Same number of variants as original
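The two modes can be sanity-checked by counting records: a corrected VCF should keep the original variant count, and the number of `SWITCHED=1` records should correspond to the corrected sites. The helpers below are a hypothetical sketch, not CheckRef tools; they assume a standard tab-separated VCF with INFO in the eighth column, and the demo file is made up for illustration.

```shell
# Hypothetical helpers; not part of CheckRef.

# Prints the number of variant records (non-header lines) in a gzipped VCF.
count_variants() {
    gzip -dc "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Prints the number of records whose INFO column (8th) carries SWITCHED=1.
count_switched() {
    gzip -dc "$1" | awk -F '\t' \
        '!/^#/ && $8 ~ /(^|;)SWITCHED=1(;|$)/ { n++ } END { print n + 0 }'
}

# Demo on a tiny made-up VCF with one flagged site out of two.
demo=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr22\t100\t.\tA\tG\t.\tPASS\tSWITCHED=1\n'
    printf 'chr22\t200\t.\tC\tT\t.\tPASS\t.\n'
} | gzip -c > "$demo/demo.corrected.vcf.gz"
count_variants "$demo/demo.corrected.vcf.gz"   # prints 2
count_switched "$demo/demo.corrected.vcf.gz"   # prints 1
rm -rf "$demo"
```

Running `count_variants` on an input VCF and on its `*.noswitch.vcf.gz` counterpart shows how many sites remove mode dropped.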
Next Steps
- Configuration - Customize pipeline settings
- Input Files - Detailed input requirements
- Output Files - Complete output description
- Parameters - All available parameters
- Examples - More advanced usage examples
Common Issues
No VCF files found
Error: No reference legend files found
Solution: Check that:
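- Paths to the VCF files (--targetVcfs) and the legend directory (--referenceDir) are correct
- Files match the glob patterns (including --legendPattern, if set)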
- Use quotes around patterns: "*.vcf.gz"
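A quick way to test a pattern is to count how many files it expands to before handing it to the pipeline. This is a minimal sketch with a hypothetical helper, `count_matches`; it assumes a POSIX-style shell (sh or bash) and paths without spaces.

```shell
# Hypothetical helper; not part of CheckRef.
# Prints how many existing files a glob pattern expands to.
# Pass the pattern quoted; it is expanded inside the function.
# Assumes paths without spaces, since the expansion is unquoted.
count_matches() {
    n=0
    for f in $1; do          # unquoted on purpose so the shell expands the glob
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_matches "test_data/chr22/*.vcf.gz"
count_matches "test_data/reference/*.legend.gz"
```

A result of 0 means the path or the pattern is wrong for the current directory.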
Chromosome mismatch
Error: No matching legend file found for chromosome
Solution: Ensure chromosome naming is consistent:
- VCF: chr1.vcf.gz → Legend: chr1.legend.gz
- Or VCF: 1.vcf.gz → Legend: 1.legend.gz
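Naming mismatches can be spotted before a run by comparing chromosome labels in the two directories. The sketch below is an illustration only and does not reproduce how CheckRef itself pairs files: `chrom_label` and `unmatched_vcfs` are hypothetical helpers, and the rule "first `_`- or `.`-separated piece of the file name that looks like a chromosome" is an assumption chosen to fit the file names used in this guide.

```shell
# Hypothetical helpers; not part of CheckRef, and not the pipeline's own matching logic.

# Prints the first "_"- or "."-separated piece of a file name that looks like
# a chromosome label (chr1, 22, chrX, MT, ...); prints nothing if none does.
chrom_label() {
    basename "$1" | tr '_.' '\n\n' \
        | grep -E -x '(chr)?([0-9]+|X|Y|MT?)' | head -n 1
}

# Usage: unmatched_vcfs <vcf_dir> <legend_dir>
# Prints every VCF whose label has no legend file with the same label.
unmatched_vcfs() {
    legend_labels=$(for l in "$2"/*.legend.gz; do chrom_label "$l"; done)
    for v in "$1"/*.vcf.gz; do
        [ -e "$v" ] || continue
        label=$(chrom_label "$v")
        if [ -z "$label" ] ||
           ! printf '%s\n' "$legend_labels" | grep -q -x -F -- "$label"; then
            echo "no legend for $v (label: ${label:-none})"
        fi
    done
}

unmatched_vcfs test_data/chr22 test_data/reference
```

No output means every VCF found a legend with the same label; a VCF labelled `chr1` next to a legend labelled `1` would be reported.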
Build mismatch detected
Message: Genome build mismatch detected
Solution: Ensure VCF and legend files use the same genome build (both hg19 or both hg38). Use liftOver to convert if needed.
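When the build of a VCF is unknown, its header sometimes records it. The following sketch prints the `##reference` and `##contig` meta-information lines defined by the VCF format, which may name the assembly or give contig lengths. `build_hints` is a hypothetical helper, not part of CheckRef, and many VCFs omit these lines, so empty output is not evidence either way.

```shell
# Hypothetical helper; not part of CheckRef.
# Prints header lines that commonly identify the reference build:
# ##reference=... and ##contig=<...> (which may carry length= or assembly=).
# Stops reading at the #CHROM line so large files are not scanned in full.
build_hints() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    gzip -dc "$1" 2>/dev/null \
        | awk '/^##(reference|contig)=/ { print } /^#CHROM/ { exit }'
}

build_hints test_data/chr22/chr22_sample.vcf.gz \
    || echo "no readable VCF at that path"
```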
For more troubleshooting, see the Troubleshooting Guide.
