Examples
Practical examples of using CheckRef in different scenarios.
Test with Sample Data
Start with our included test data to verify your installation:
bash
# Clone the repository
git clone https://github.com/AfriGen-D/checkref.git
cd checkref
# Run with test data (chr22 sample)
nextflow run main.nf \
--targetVcfs "test_data/chr22/*.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--fixMethod remove \
--outdir test_results \
-profile docker
Expected Results:
- Runtime: ~2-5 minutes
- Output: Allele switch results, summary, and cleaned VCF
- Use this to verify CheckRef works before using your own data
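One quick way to confirm the test run produced a cleaned VCF is to search the output directory for `.vcf.gz` files. The helper below is a minimal sketch and not part of CheckRef: `list_vcf_outputs` is a hypothetical name, and it assumes only that cleaned VCFs end in `.vcf.gz`, without assuming any particular subdirectory layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with CheckRef): list every .vcf.gz file
# found under an output directory, or fail if the directory is missing or
# contains none.
list_vcf_outputs() {
    local outdir="$1"
    if [ ! -d "$outdir" ]; then
        echo "missing output directory: $outdir" >&2
        return 1
    fi
    local found
    # Search recursively; the exact subdirectory layout is not assumed.
    found=$(find "$outdir" -type f -name '*.vcf.gz' | sort)
    if [ -z "$found" ]; then
        echo "no .vcf.gz files under $outdir" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

After the test run finishes, `list_vcf_outputs test_results` should print at least one path; a non-zero exit status suggests the run did not complete.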
Basic Examples
Single Chromosome (With Test Data)
Process a single chromosome using the included test data (run this from the cloned repository so the relative test_data/ paths resolve):
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "test_data/chr22/chr22_sample.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--outdir chr22_results \
-profile docker
Single Chromosome (Your Data)
Process your own chromosome data:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "your_sample_chr22.vcf.gz" \
--referenceDir "/path/to/reference/legends/" \
--outdir chr22_results \
-profile docker
Multiple Chromosomes (Glob Pattern)
Process all chromosomes at once:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "/path/to/vcfs/sample_chr*.vcf.gz" \
--referenceDir "/path/to/reference/" \
--outdir results \
-profile docker
Whole Genome
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "/path/to/vcfs/*.vcf.gz" \
--referenceDir "/path/to/reference/" \
--fixMethod correct \
--outdir whole_genome_results \
-profile docker
Fix Method Comparison
Remove Switched Sites (Default)
Test with sample data:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "test_data/chr22/*.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--fixMethod remove \
--outdir results_remove \
-profile docker
Output: chr22.noswitch.vcf.gz (smaller file, problematic sites excluded)
Correct Switched Sites
Test with sample data:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "test_data/chr22/*.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--fixMethod correct \
--outdir results_correct \
-profile docker
Output: chr22.corrected.vcf.gz (same size, alleles fixed, marked with SWITCHED=1)
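To compare the two fix methods on the same input, it can help to count the records in each output VCF. The following is a minimal sketch and not part of CheckRef: `count_records` and `count_switched` are hypothetical helper names, and the second function assumes the `SWITCHED=1` marker is written to the INFO column (column 8) of the corrected VCF.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with CheckRef).

# Print the number of data records (non-header lines) in a gzip/bgzip VCF.
count_records() {
    gzip -dc "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Print the number of records carrying SWITCHED=1 in the INFO column.
# Assumption: the marker is an INFO key, separated from other keys by ';'.
count_switched() {
    gzip -dc "$1" |
        awk -F'\t' '!/^#/ && $8 ~ /(^|;)SWITCHED=1(;|$)/ { n++ } END { print n + 0 }'
}
```

For example, run `count_records` on the `chr22.noswitch.vcf.gz` and `chr22.corrected.vcf.gz` files from the two runs, and `count_switched` on the corrected file, substituting the paths where your runs wrote them. If the marker is stored as described, the difference in record counts between the two outputs should be in line with the number of switched sites.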
HPC Examples
SLURM Cluster
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "$SCRATCH/vcfs/*.vcf.gz" \
--referenceDir "$SCRATCH/reference/" \
--outdir "$SCRATCH/results" \
-profile singularity \
-c slurm.config \
-resume
slurm.config:
groovy
process {
executor = 'slurm'
queue = 'batch'
clusterOptions = '--account=genomics'
memory = 8.GB
time = 8.h
}
PBS Cluster
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/project/reference/" \
-profile singularity \
-c pbs.config
Advanced Examples
Custom Legend Pattern
If your legend files have a different naming:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
--legendPattern "1000G_phase3_*.legend.gz" \
-profile docker
Custom Output Structure
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
--outdir "/results/project_$(date +%Y%m%d)" \
-profile docker
High-Memory Configuration
For large VCF files:
bash
nextflow run AfriGen-D/checkref \
--targetVcfs "large_*.vcf.gz" \
--referenceDir "/ref/" \
--maxMemory "32.GB" \
--maxTime "24.h" \
-profile docker
Integration Examples
Pre-Imputation QC
Use CheckRef before imputation:
bash
# Step 1: Check allele switches
nextflow run AfriGen-D/checkref \
--targetVcfs "target_chr*.vcf.gz" \
--referenceDir "/1000G/" \
--fixMethod correct \
--outdir qc_results \
-profile docker
# Step 2: Use corrected VCFs for imputation
impute2 \
-m /1000G/genetic_map.txt \
-g qc_results/fixed_vcfs/target_chr22.corrected.vcf.gz \
-int 20.0e6 20.5e6 \
-o imputed_chr22.gen
Data Harmonization
Harmonize data across multiple cohorts:
bash
# Cohort 1
nextflow run AfriGen-D/checkref \
--targetVcfs "/cohort1/*.vcf.gz" \
--referenceDir "/unified_reference/" \
--outdir cohort1_harmonized \
-profile docker
# Cohort 2
nextflow run AfriGen-D/checkref \
--targetVcfs "/cohort2/*.vcf.gz" \
--referenceDir "/unified_reference/" \
--outdir cohort2_harmonized \
-profile docker
# Now both cohorts are harmonized to the same reference
Next Steps
- Guide - Full documentation
- Parameters - All available parameters
- Troubleshooting - Common issues
