Bash Scripting: Automating Your Bioinformatics Workflows
In Part 1, we learned how to navigate the Linux terminal. But imagine this: You have 50 FASTQ files from a sequencing run. Are you going to type a command 50 times?
No. In bioinformatics, if you have to do something more than twice, you should script it.
Today, we dive into Bash Scripting - the art of giving the computer a "To-Do List" that it can execute while you go grab a coffee.
1. What is a Bash Script?
A Bash script is simply a text file containing a series of commands. By convention, its first line is a "Shebang": #!/bin/bash. This tells the system, "Hey, use the Bash shell to read this!"
Creating your first script:
- Type nano myscript.sh and add the following:
#!/bin/bash
echo "Starting the BioInfoQuant Analysis Pipeline..."
mkdir -p processed_data
ls -lh
echo "Step 1 Complete."
- Save and exit (Ctrl+O, Enter, Ctrl+X).
- Make it executable: chmod +x myscript.sh
- Run it: ./myscript.sh
2. The Power of Variables
Variables allow you to store data for later use. In bioinformatics, we use them to store file paths or sample names.
SAMPLE="Sample_A1"
echo "Processing $SAMPLE now..."
3. The "For Loop": The Bioinformatician’s Superpower
This is the most important tool in your kit. A loop allows you to repeat a command for every file in a folder.
Practical Example: Compressing all FASTA files at once
for file in *.fasta
do
    echo "Compressing $file..."
    gzip "$file"    # quotes keep file names with spaces intact
done
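Loops and variables work well together. Remember the 50 FASTQ files from the introduction? The sketch below is one way to pull a sample name out of each file name inside a loop. The temporary folder and the two file names (Sample_A1.fastq, Sample_B2.fastq) are made up for illustration, so swap in your own paths.

```shell
#!/bin/bash
# Sketch: derive a sample name from each FASTQ file name inside a loop.
# The sandbox folder and file names are hypothetical, for illustration only.

workdir=$(mktemp -d)    # temporary sandbox so no real data is touched
touch "$workdir/Sample_A1.fastq" "$workdir/Sample_B2.fastq"

processed=""
for file in "$workdir"/*.fastq
do
    # If nothing matched, the pattern stays literal; skip it
    [ -e "$file" ] || continue

    filename=$(basename "$file")    # drop the folder part of the path
    SAMPLE="${filename%.fastq}"     # strip the .fastq extension

    echo "Processing $SAMPLE now..."
    processed="$processed $SAMPLE"  # keep a running list of samples seen
done

rm -r "$workdir"    # clean up the sandbox
```

The ${filename%.fastq} syntax removes the .fastq suffix from the stored value, which is handy when you want to name output files after each sample.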
Practical Example: Automating GROMACS (Energy Minimization)
If you have multiple protein-ligand complexes, you can automate the energy minimization step:
for dir in protein_complex_*
do
    cd "$dir" || continue    # skip anything we cannot enter
    gmx grompp -f em.mdp -c coord.gro -p topol.top -o em.tpr
    gmx mdrun -v -deffnm em
    cd ..
done
4. Record Notes: Common Mistakes to Avoid
- Spaces Matter: In Bash, VAR = "data" (with spaces) will fail. It must be VAR="data".
- Permissions: If you get a "Permission Denied" error, remember to use chmod +x on your script.
- The Path: Run pwd before launching your script to confirm it will look in the right folder.
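The three habits above can be sketched in a few lines. This is only an illustration: the variable names SAMPLE, script, and status are placeholders, and myscript.sh is assumed to be the script from Section 1.

```shell
#!/bin/bash
# Sketch of the habits listed above; names and messages are illustrative.

# Correct assignment: no spaces around "="
SAMPLE="Sample_A1"

# With spaces, Bash would try to run a command called SAMPLE instead:
# SAMPLE = "Sample_A1"    # fails with "SAMPLE: command not found"

# Print the current folder so the output shows where the script ran
echo "Running in: $(pwd)"

# Check whether a script is executable before trying to run it
script="myscript.sh"
if [ -x "$script" ]; then
    status="executable"
else
    status="not executable (try: chmod +x $script)"
fi
echo "$script is $status"
```

If myscript.sh is missing from the current folder, the check also reports "not executable", which is a hint to run pwd and look at where you are.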
Practical Exercise for Students
Create a script named organize.sh that:
- Creates three folders: scripts, data, and results.
- Moves all .sh files into scripts.
- Moves all .fastq or .fasta files into data.
- Prints a message saying “Workspace Organized!”
Summary Checklist
- [ ] Use #!/bin/bash at the top.
- [ ] Use chmod +x to make the script "alive."
- [ ] Use Loops for repetitive file tasks.
- [ ] Use Comments (#) to explain what your code does.


