# Running the Pipeline
This guide covers running CheckRef in various scenarios and understanding the execution process.
## Basic Execution

### Standard Run

```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "/data/vcfs/*.vcf.gz" \
  --referenceDir "/data/reference/" \
  --outdir results \
  -profile docker
```

### With Custom Parameters
```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "/data/vcfs/*.vcf.gz" \
  --referenceDir "/data/reference/" \
  --fixMethod correct \
  --legendPattern "*.legend.txt.gz" \
  --outdir results \
  -profile docker
```

## Monitoring Execution

### Real-time Progress
Nextflow shows real-time progress:
```
N E X T F L O W  ~  version 23.10.0
Launching `AfriGen-D/checkref` [silly_euler] DSL2 - revision: abc1234
executor >  local (15)
[3a/f8b234] process > VALIDATE_VCF_FILES (chr1:validation)    [100%] 22 of 22 ✔
[7b/2cd901] process > CHECK_ALLELE_SWITCH (chr1:sample)       [ 95%] 21 of 22
[4c/5ef123] process > CORRECT_SWITCHED_SITES (chr1:sample)    [ 90%] 20 of 22
[8d/9ab456] process > VERIFY_CORRECTIONS (chr1:verification)  [ 85%] 19 of 22
[1e/6cd789] process > CREATE_SUMMARY                          [  0%] 0 of 1
```

### Check Running Processes
```bash
# List all Nextflow processes
ps aux | grep nextflow

# Monitor resource usage
top -u $USER
```

## Resume Functionality

### Resume After Failure
If the pipeline stops or fails, resume from the last successful step:
```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile docker \
  -resume
```

### How Resume Works
- Nextflow caches each process execution in the `work/` directory
- On resume, completed processes are skipped
- Only failed or new processes are re-executed
- Saves time and computational resources
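The cache is easy to inspect by hand: each task execution lives in its own two-level hash directory under `work/` (for example `work/3a/f8b234...`). A minimal helper, a sketch assuming that default layout, counts how many cached task directories a `-resume` could reuse:

```bash
#!/usr/bin/env bash
# Count cached Nextflow task directories (two-level hash layout,
# e.g. work/3a/f8b234...) under a given work directory.
count_cached_tasks() {
  local workdir="$1"
  [ -d "$workdir" ] || { echo 0; return; }
  find "$workdir" -mindepth 2 -maxdepth 2 -type d | wc -l | tr -d ' '
}

# Example: count_cached_tasks work
```

If this prints `0`, a `-resume` has nothing to reuse and the run will start from scratch anyway.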
### Clean Start (No Resume)
To start fresh without using cache:
```bash
# Remove work directory
rm -rf work/

# Run without -resume
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile docker
```

## Execution Modes

### Local Execution
Run on your local machine:
```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile docker
```

### HPC Execution (SLURM)
```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile singularity,hpc \
  -c slurm.config
```

### Background Execution
Run in the background:
```bash
nextflow run AfriGen-D/checkref \
  --targetVcfs "*.vcf.gz" \
  --referenceDir "/ref/" \
  -profile docker \
  -bg > checkref.log 2>&1
```

Check progress:

```bash
tail -f checkref.log
```

## Output and Logging
### Nextflow Log
Nextflow creates a `.nextflow.log` file in the launch directory:
```bash
# View log
less .nextflow.log

# Follow log in real-time
tail -f .nextflow.log

# Search for errors
grep ERROR .nextflow.log
```

### Pipeline Output
Pipeline messages appear in two places:
- Standard output: real-time progress
- Log files: detailed process logs in `work/` directories
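Those per-task directories also record each task's exit status in a hidden `.exitcode` file, which makes failed tasks easy to find in bulk. A sketch, assuming the default `work/` layout:

```bash
#!/usr/bin/env bash
# List task work directories whose recorded exit code is non-zero.
list_failed_tasks() {
  local workdir="$1"
  find "$workdir" -name .exitcode 2>/dev/null | while read -r f; do
    code=$(cat "$f")
    if [ "$code" != "0" ]; then
      echo "failed: $(dirname "$f") (exit $code)"
    fi
  done
}

# Example: list_failed_tasks work
```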
### Process-Specific Logs
Each process has its own log files:
```bash
# Find work directory for a process
ls -la work/*/*

# View process stdout
cat work/3a/f8b234*/.command.out

# View process stderr
cat work/3a/f8b234*/.command.err

# View the command that was executed
cat work/3a/f8b234*/.command.sh
```

## Execution Reports
### Generate Reports
Reports are generated automatically (configured in `nextflow.config`):
```
results/reports/
├── execution_report.html   # Resource usage and timing
├── timeline_report.html    # Timeline visualization
└── dag_report.html         # Workflow diagram
```

### Viewing Reports
Open in web browser:
```bash
firefox results/reports/execution_report.html
```

## Troubleshooting Execution
### Pipeline Hangs
If the pipeline appears stuck:
1. Check if processes are running:

   ```bash
   ps aux | grep nextflow
   ```

2. Check system resources:

   ```bash
   htop  # or top
   ```

3. Check the cluster queue (if using HPC):

   ```bash
   squeue -u $USER
   ```
### Process Failures
When a process fails:
1. Locate the work directory from the error message
2. Check the error logs:

   ```bash
   cat work/xx/xxxxxx/.command.err
   cat work/xx/xxxxxx/.command.log
   ```

3. Check the command that was run:

   ```bash
   cat work/xx/xxxxxx/.command.sh
   ```

4. Try running the command manually for debugging
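The checks above can be bundled into one helper that summarises a task's work directory; a sketch, where the directory argument is whatever path the error message reported and the file names are the standard `.command.*` set Nextflow writes:

```bash
#!/usr/bin/env bash
# Print a quick debugging summary for one Nextflow task directory.
inspect_task() {
  local taskdir="$1"
  echo "exit code: $(cat "$taskdir/.exitcode" 2>/dev/null || echo 'not recorded')"
  echo "-- last stderr lines --"
  tail -n 5 "$taskdir/.command.err" 2>/dev/null
  echo "-- command --"
  cat "$taskdir/.command.sh" 2>/dev/null
}

# Example: inspect_task work/3a/f8b234e1c2d3
```

To actually re-run the task with its full environment, run `bash .command.run` inside the task directory; that wrapper script stages inputs and invokes `.command.sh` the same way the pipeline did.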
### Resource Errors
**Out of Memory:**

```groovy
// Increase memory for a specific process
process {
    withName: CHECK_ALLELE_SWITCH {
        memory = 16.GB
    }
}
```

**Timeout:**
```groovy
// Increase time limit
process {
    withName: CHECK_ALLELE_SWITCH {
        time = 12.h
    }
}
```
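The two overrides above can also be combined with a retry strategy so resources escalate automatically instead of being fixed. A sketch (the multipliers are illustrative; `CHECK_ALLELE_SWITCH` is the process name used above):

```groovy
// Sketch: retry failed tasks with escalating resources.
process {
    withName: CHECK_ALLELE_SWITCH {
        errorStrategy = 'retry'
        maxRetries    = 2
        // task.attempt is 1 on the first try, 2 on the first retry, ...
        memory        = { 8.GB * task.attempt }
        time          = { 6.h  * task.attempt }
    }
}
```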
## Cleanup

### Remove Work Directory
After successful completion:
```bash
# This saves disk space
rm -rf work/
```

**Warning:** Only do this if you don't need to resume!
### Clean Nextflow Cache
```bash
# Remove all cached metadata
nextflow clean -f

# Remove a specific run
nextflow clean [run_name] -f
```

## Best Practices
- Always use `-resume` when re-running after failures
- Keep the work directory until the pipeline completes successfully
- Monitor resources to optimize configuration
- Use background execution for long-running jobs
- Check logs if something goes wrong
- Generate reports to understand performance
## Next Steps
- Troubleshooting - Resolve common issues
- Configuration - Optimize settings
- Output Files - Understanding results
