Here are some basic tips for navigating the shell that a biologist like myself may not know when just picking things up along the way. Some can save you a lot of time, a lot of frustration, or both! For more, see: A Command Line Primer for Beginners.
1. Personalize your bash_profile/bashrc.
-LS colors is a nice option to highlight executables, compressed files, directories, etc. This can save time, for example if you spend all afternoon troubleshooting a program only to realize later that it is not executable. It's also a quick way to see which files you can compress to reduce your disk usage against your quota on a server. See here for more on LS colors.
-Add aliases to your bash profile to save time; these can include shortcuts to directories, ssh shortcuts, and common commands (especially if you are attached to certain command options). Other tips for customizing your command prompt are here.
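As a rough sketch of what such additions might look like, here are a few lines for a ~/.bashrc. The alias names, the directory, and the ssh user/host are all placeholders I made up, and the color flag assumes GNU ls (on macOS/BSD the equivalent is 'ls -G'):

```shell
# Hypothetical ~/.bashrc additions; every name, path, and host is a placeholder.
alias ls='ls --color=auto'                     # GNU ls: color entries by file type
alias ll='ls -lh'                              # long listing, human-readable sizes
alias proj='cd "$HOME/projects"'               # shortcut to a frequently used directory
alias cluster='ssh user@cluster.example.org'   # ssh shortcut (placeholder user and host)
```

After editing the file, run 'source ~/.bashrc' (or open a new terminal) for the aliases to take effect.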
2. The power of less: I prefer less over more because it doesn't clutter up my terminal. It also only loads a page at a time, so for large files it's nice to get a peek without having to open the whole thing up. You can also search within less, which may be faster than grep if you aren't sure what you are looking for. For more tips, try here or check the man page.
-'/' and '?' allow you to search forward and backward within a file
-'g' and 'G' allow you to jump to the beginning and end of a given file
-'shift+F' allows you to turn on live streaming in a file that is currently being written
note: an alternative is tail -f, but its output will stay in your terminal afterwards
-'-N+enter' adds line numbers
-'###+g' jumps to a specific line number in a file (type the line number, then g)
3. Shortcuts on the command line. I've seen so many friends mistype at the command prompt and painstakingly backspace through the entire line. Below are a few shortcuts to avoid this frustration, but for more check out: Keyboard Shortcuts for Bash.
Ctrl+a moves to the beginning of the prompt
Ctrl+e moves to the end
Ctrl+k erases everything after the cursor
Ctrl+u erases everything before the cursor
Tab to autocomplete filenames from within your path
4. Quick and useful one-liners. Lots of times I do very simple text manipulation that a one-liner can handle more efficiently than a separate script. The nice thing about performing these tasks on the command line is that it is easy to pipe the output of one command into the next, which avoids lots of intermediate files that later need deletion or archiving.
Get column sum or average quickly using awk:
$ awk '{sum+=$4} END {print sum/NR}' somefile #calculates the average of column 4 (remove '/NR' to get only the sum of column 4)
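To see the column sum and average in action, here is a minimal sketch on a made-up four-column table. The file contents and variable names are purely illustrative:

```shell
# Build a small hypothetical 4-column table to work on.
table=$(mktemp)
printf 'a 1 2 10\nb 3 4 20\nc 5 6 30\n' > "$table"

# Accumulate column 4 in 'sum' line by line, then report it once at the end.
col_sum=$(awk '{sum+=$4} END {print sum}' "$table")

# Same accumulation, divided by the number of lines read (NR) for the mean;
# the NR > 0 guard avoids a division by zero on an empty file.
col_mean=$(awk '{sum+=$4} END {if (NR > 0) print sum/NR}' "$table")

echo "sum=$col_sum mean=$col_mean"
rm -f "$table"
```

If the file has a header line, adding a condition such as 'NR > 1' in front of the first block skips it, but the divisor then needs to be NR-1.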
Add header to file using sed (-i option edits files in place):
$ sed -i '1s/^/header\n/' somefile #add header or other text before the first line of somefile
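A small sketch of the header trick on a throwaway file, assuming GNU sed (both '-i' without a suffix and '\n' in the replacement are GNU behaviors; BSD/macOS sed handles these differently). The column names are invented for the example:

```shell
# Hypothetical two-column table without a header line.
table=$(mktemp)
printf '1 100\n2 200\n' > "$table"

# Insert "id value" plus a newline at the start of line 1 (GNU sed assumed).
sed -i '1s/^/id value\n/' "$table"

first_line=$(head -n 1 "$table")
n_lines=$(wc -l < "$table")
cat "$table"
rm -f "$table"
```

Note that this does nothing on an empty file, since there is no line 1 for the substitution to match.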
Convert file with chr1 to 1 or vice versa (useful for switching from UCSC to Ensembl files):
$ sed -i 's/^chr//' somefile OR sed -i 's/^/chr/' somefile #the ^ anchor limits the edit to the chromosome name at the start of each line
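Here is a quick sketch of the conversion in both directions, assuming the chromosome name is the first column of a tab-separated file. It writes to standard output rather than editing in place, so it is safe to try first. The coordinates are made up:

```shell
# Hypothetical UCSC-style file: chromosome name in column 1, tab-separated.
ucsc=$(mktemp)
printf 'chr1\t100\nchr2\t200\nchrX\t300\n' > "$ucsc"

# UCSC -> Ensembl style: strip a leading "chr" from each line.
first_ensembl=$(sed 's/^chr//' "$ucsc" | head -n 1 | cut -f1)

# Ensembl -> UCSC style: put "chr" back, then check we recover the original.
if sed 's/^chr//' "$ucsc" | sed 's/^/chr/' | cmp -s - "$ucsc"; then
    roundtrip=same
else
    roundtrip=different
fi

echo "first chromosome: $first_ensembl, round trip: $roundtrip"
rm -f "$ucsc"
```

One caveat: the mitochondrial chromosome is chrM in UCSC files but MT in Ensembl files, so it needs its own substitution.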
Count the number of fasta entries in a fasta file (more grep tips):
$ grep ">" infile.fasta | wc -l #fasta header lines start with '>'
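As a minimal sketch on a made-up fasta file, the same count can be had from grep alone with '-c', which prints the number of matching lines. Anchoring the pattern with '^' restricts matches to lines that begin with '>':

```shell
# Hypothetical fasta file with two records.
fasta=$(mktemp)
printf '>seq1 some description\nACGT\nACGT\n>seq2\nGGCC\n' > "$fasta"

# Count header lines, i.e. lines beginning with '>'.
n_records=$(grep -c '^>' "$fasta")

echo "records: $n_records"
rm -f "$fasta"
```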
Convert line endings from MAC to unix/linux:
$ tr "\r" "\n" < macinput.txt > unixoutput.txt
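A short sketch of the conversion on a throwaway file with old-style Mac line endings (a bare carriage return at the end of each line). The file contents are invented:

```shell
# Hypothetical input that uses \r as the line terminator.
macfile=$(mktemp)
unixfile=$(mktemp)
printf 'line1\rline2\rline3\r' > "$macfile"

# Translate every carriage return into a newline.
tr '\r' '\n' < "$macfile" > "$unixfile"

# wc -l counts newlines, so it sees no lines before and 3 after.
n_before=$(wc -l < "$macfile")
n_lines=$(wc -l < "$unixfile")

echo "lines before: $n_before, after: $n_lines"
rm -f "$macfile" "$unixfile"
```

For Windows files, which end lines with \r\n, this would leave a blank line after every line; deleting the carriage returns with tr -d '\r' is the better fit there.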
Quickly check the syntax of your script without running it:
$ perl -wc script.pl
$ python -m py_compile script.py
Find a lost file (searches within current directory and all subdirectories), more find stuff:
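One way such a search might look, as a sketch using find's '-name' test with a shell wildcard. The directory layout and the filename fragment 'lost' are hypothetical:

```shell
# Build a hypothetical directory tree with a file buried a few levels down.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/results/run1"
touch "$workdir/project/results/run1/lost_samples.txt"

# Search the starting directory and all subdirectories for regular files
# whose name contains "lost"; quoting the pattern keeps the shell from expanding it.
found=$(find "$workdir" -type f -name '*lost*')

echo "$found"
rm -rf "$workdir"
```

Replacing "$workdir" with '.' starts the search from the current directory, and '-iname' makes the match case-insensitive in GNU and BSD find.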
Change something across multiple files:
$ find ./ -name "PATTERN" -print | xargs sed -i 's/old/new/g' #-name takes a shell wildcard pattern such as "*.txt", not a regular expression
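A sketch of the multi-file replacement on a scratch directory, assuming GNU sed for '-i'. The filenames and contents are made up. It uses '-print0' with 'xargs -0' so that filenames containing spaces survive the pipe:

```shell
# Hypothetical tree: two .txt files to edit and one .log file to leave alone.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'old value\n' > "$workdir/a.txt"
printf 'another old value\n' > "$workdir/sub/b.txt"
printf 'old value\n' > "$workdir/keep.log"

# Find every .txt file below workdir and replace old with new in place.
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 sed -i 's/old/new/g'

result_a=$(cat "$workdir/a.txt")
result_b=$(cat "$workdir/sub/b.txt")
result_log=$(cat "$workdir/keep.log")

echo "$result_a | $result_b | $result_log"
rm -rf "$workdir"
```

Since the edit is in place, it is worth running the find part on its own first to check which files will be touched.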
More one-liners can be found here: Useful linux one-liners for bioinformatics. Also, I use awk and sed a lot and for loops in bash are pretty useful. This list is by no means comprehensive, but it's a great start! Feel free to add any additional tips or links in the comments!