Configuration
CheckRef can be configured through command-line parameters, predefined profiles, and custom configuration files. This guide covers all configuration options.
Quick Start with Test Data
The simplest way to run CheckRef is with the included test data:
nextflow run AfriGen-D/checkref \
--targetVcfs "test_data/chr22/*.vcf.gz" \
--referenceDir "test_data/reference/" \
--legendPattern "*.legend.gz" \
--fixMethod remove \
--outdir test_results \
-profile docker
Get test data:
# Option 1: Clone repository
git clone https://github.com/AfriGen-D/checkref.git
# Option 2: Download only test data
mkdir -p test_data/{chr22,reference}
wget https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/chr22/chr22_sample.vcf.gz -P test_data/chr22/
wget https://raw.githubusercontent.com/AfriGen-D/checkref/main/test_data/reference/chr22_sample.legend.gz -P test_data/reference/
Configuration Methods
1. Command-Line Parameters
Pass parameters directly when running the pipeline:
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
--fixMethod correct \
--outdir results
2. Configuration Profiles
Use predefined profiles with -profile:
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
-profile docker,test
3. Custom Config Files
Create a custom configuration file:
Example with test data (test_data.config):
// test_data.config - Quick test configuration
params {
targetVcfs = "test_data/chr22/*.vcf.gz"
referenceDir = "test_data/reference/"
legendPattern = "*.legend.gz"
fixMethod = "remove"
outdir = "test_results"
}
process {
cpus = 2
memory = 4.GB
}
Run with test data config:
nextflow run AfriGen-D/checkref -c test_data.config -profile docker
Example with your own data (my_config.config):
// my_config.config
params {
targetVcfs = "/path/to/vcfs/*.vcf.gz"
referenceDir = "/path/to/reference/"
fixMethod = "correct"
outdir = "my_results"
}
process {
cpus = 2
memory = 8.GB
}
Run with custom config:
nextflow run AfriGen-D/checkref -c my_config.config -profile docker
Available Profiles
Container Profiles
docker - Run with Docker containers:
-profile docker
singularity - Run with Singularity containers:
-profile singularity
podman - Run with Podman containers:
-profile podman
Test Profile
test - Run with built-in test data:
-profile test,docker
This profile includes small test datasets for quick validation.
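A complete invocation then needs no input parameters on the command line, assuming the test profile supplies --targetVcfs and --referenceDir (the output directory shown is just an example):

```bash
nextflow run AfriGen-D/checkref -profile test,docker --outdir test_results
```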
HPC Profile
hpc - Optimized for SLURM clusters:
// Example HPC configuration
-profile hpc
// Customize for your cluster
process {
executor = 'slurm'
queue = 'normal'
clusterOptions = '--account=your_project'
}
Combining Profiles
Multiple profiles can be combined:
-profile singularity,hpc
Parameters Reference
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| --targetVcfs | string | Target VCF files (glob pattern, comma-separated, or single file) |
| --referenceDir | string | Directory containing reference legend files |
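For example, the three accepted forms of --targetVcfs look like this (paths are placeholders):

```bash
# Glob pattern (quoted so Nextflow expands it, not the shell)
nextflow run AfriGen-D/checkref --targetVcfs "/data/vcfs/*.vcf.gz" --referenceDir "/data/reference/" -profile docker

# Comma-separated list
nextflow run AfriGen-D/checkref --targetVcfs "chr21.vcf.gz,chr22.vcf.gz" --referenceDir "/data/reference/" -profile docker

# Single file
nextflow run AfriGen-D/checkref --targetVcfs "chr22.vcf.gz" --referenceDir "/data/reference/" -profile docker
```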
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --outdir | string | ./results | Output directory |
| --fixMethod | string | remove | Method to fix switches: 'remove' or 'correct' |
| --legendPattern | string | *.legend.gz | Pattern to match legend files |
Resource Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --maxCpus | integer | 4 | Maximum CPUs per process |
| --maxMemory | string | 8.GB | Maximum memory per process |
| --maxTime | string | 24.h | Maximum time per process |
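For example, to raise the caps on a larger machine (the values shown are illustrative, not recommendations):

```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  --maxCpus 8 \
  --maxMemory "16.GB" \
  --maxTime "48.h" \
  -profile docker
```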
Process-Specific Configuration
Customize Resource Allocation
Override resources for specific processes:
// custom_resources.config
process {
withName: CHECK_ALLELE_SWITCH {
cpus = 2
memory = 8.GB
time = 6.h
}
withName: CORRECT_SWITCHED_SITES {
cpus = 1
memory = 4.GB
time = 2.h
}
}
Usage:
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
-c custom_resources.config \
-profile docker
Process List
Available process names for customization:
- VALIDATE_VCF_FILES
- CHECK_ALLELE_SWITCH
- REMOVE_SWITCHED_SITES
- CORRECT_SWITCHED_SITES
- VERIFY_CORRECTIONS
- CREATE_SUMMARY
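If several of these processes should share the same settings, Nextflow's withName selector also accepts a quoted pattern, so one block can cover them; a minimal sketch with illustrative values:

```groovy
// shared_resources.config
process {
    // A quoted pattern matches multiple process names at once
    withName: 'REMOVE_SWITCHED_SITES|CORRECT_SWITCHED_SITES' {
        cpus   = 1
        memory = 4.GB
    }
}
```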
HPC Configuration
SLURM Example
// slurm.config
process {
executor = 'slurm'
queue = 'batch'
clusterOptions = '--account=genomics_project --partition=long'
cpus = 1
memory = 4.GB
time = 4.h
withName: CHECK_ALLELE_SWITCH {
cpus = 2
memory = 8.GB
time = 8.h
queue = 'highmem'
}
}
executor {
queueSize = 100
submitRateLimit = '10 sec'
}
Usage:
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
-c slurm.config \
-profile singularity
PBS/Torque Example
// pbs.config
process {
executor = 'pbs'
queue = 'batch'
clusterOptions = '-l walltime=24:00:00 -A genomics'
cpus = 1
memory = '4GB'
}
LSF Example
// lsf.config
process {
executor = 'lsf'
queue = 'normal'
clusterOptions = '-P genomics'
cpus = 1
memory = 4.GB
}
Container Configuration
Docker Configuration
docker {
enabled = true
runOptions = '-u $(id -u):$(id -g)'
}
Singularity Configuration
singularity {
enabled = true
autoMounts = true
cacheDir = '/path/to/singularity/cache'
}
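The cache location can also be set outside the config file via Nextflow's NXF_SINGULARITY_CACHEDIR environment variable, which is convenient on shared systems:

```bash
export NXF_SINGULARITY_CACHEDIR=/path/to/singularity/cache
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile singularity
```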
Custom Container
Override the default container:
process {
container = 'your_username/custom-vcf-tools:latest'
}
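To swap the container for a single step instead of every process, combine the container directive with a process selector; a sketch (the image tag is a placeholder, and which process to target is your choice):

```groovy
process {
    withName: CHECK_ALLELE_SWITCH {
        container = 'your_username/custom-vcf-tools:latest'
    }
}
```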
Output Directory Configuration
Custom Output Subdirectories
params {
// Main output directory
outdir = 'my_results'
// Subdirectories
allele_switch_results = "${params.outdir}/switches"
summary_files = "${params.outdir}/summaries"
fixed_vcfs = "${params.outdir}/corrected_vcfs"
logs = "${params.outdir}/pipeline_logs"
}
Advanced Configuration
Retry Strategy
Configure automatic retries on failure:
process {
errorStrategy = 'retry'
maxRetries = 3
withName: CHECK_ALLELE_SWITCH {
errorStrategy = { task.attempt < 3 ? 'retry' : 'finish' }
memory = { 4.GB * task.attempt }
}
}
Caching and Resume
Specify the work directory and enable resume by default:
// Specify work directory
workDir = '/scratch/nextflow/work'
// Enable resume by default
resume = true
Usage:
# Automatically resumes from last successful step
nextflow run AfriGen-D/checkref \
--targetVcfs "*.vcf.gz" \
--referenceDir "/ref/" \
-resume
Execution Reports
Configure detailed execution reports:
report {
enabled = true
file = "${params.outdir}/reports/execution_report.html"
}
timeline {
enabled = true
file = "${params.outdir}/reports/timeline.html"
}
dag {
enabled = true
file = "${params.outdir}/reports/dag.html"
}
trace {
enabled = true
file = "${params.outdir}/reports/trace.txt"
}
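The same reports can also be requested for a single run from the command line, without editing any config file:

```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile docker \
  -with-report -with-timeline -with-trace -with-dag dag.html
```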
Environment-Specific Configs
Local Workstation
// local.config
process {
executor = 'local'
cpus = 2
memory = 8.GB
}
docker.enabled = true
Cloud (AWS)
// aws.config
process {
executor = 'awsbatch'
queue = 'genomics-queue'
container = 'mamana/vcf-processing:latest'
}
aws {
region = 'us-east-1'
batch {
cliPath = '/home/ec2-user/miniconda/bin/aws'
}
}
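Note that the AWS Batch executor requires the pipeline work directory to be on S3, so the config above also needs a workDir entry (bucket name is a placeholder):

```groovy
// Work directory must be an S3 path when using the awsbatch executor
workDir = 's3://your-bucket/nextflow-work'
```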
Cloud (Google Cloud)
// gcp.config
process {
executor = 'google-lifesciences'
container = 'mamana/vcf-processing:latest'
}
google {
region = 'us-central1'
project = 'your-project-id'
}
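Similarly, the Google Life Sciences executor expects the work directory on a Cloud Storage bucket (bucket name is a placeholder):

```groovy
// Work directory must be a gs:// path when using the Google Life Sciences executor
workDir = 'gs://your-bucket/nextflow-work'
```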
Configuration Best Practices
- Use profiles for different environments (local, HPC, cloud)
- Version control your custom config files
- Document any custom settings
- Test configurations with small datasets first
- Monitor resource usage and adjust as needed
Example Complete Configuration
// complete_config.config
// Parameters
params {
targetVcfs = "/data/vcfs/*.vcf.gz"
referenceDir = "/data/reference/"
fixMethod = "correct"
outdir = "/results/checkref_output"
legendPattern = "*.legend.gz"
}
// Process configuration
process {
executor = 'slurm'
queue = 'batch'
cpus = 1
memory = 4.GB
time = 4.h
errorStrategy = 'retry'
maxRetries = 2
withName: CHECK_ALLELE_SWITCH {
cpus = 2
memory = 8.GB
time = 8.h
}
}
// Container
singularity {
enabled = true
autoMounts = true
cacheDir = '/scratch/singularity'
}
// Reports
report.enabled = true
timeline.enabled = true
trace.enabled = true
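A typical invocation with this file, following the same pattern as the earlier examples:

```bash
nextflow run AfriGen-D/checkref -c complete_config.config
```

Since the config already enables Singularity and sets all pipeline parameters, no additional -profile or --parameter flags are needed.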