Friday, September 5, 2014

Basic shell navigation and troubleshooting

Here are some basic tips for navigating the shell that biologists like me may not know if they are just picking things up as they go. Some can save you a lot of time and a lot of frustration! For more, see: A Command Line Primer for Beginners.

1. Personalize your bash_profile/bashrc.
   -LS colors are a nice way to highlight executables, compressed files, directories, etc. This can save time if, for example, you spend all afternoon troubleshooting a program only to realize later that it is not executable. It's also a quick way to see which files you could still compress to reduce your disk usage on a server with quotas. See here for more on LS colors.
   -Add aliases to your bash profile to save time; these can include shortcuts to directories, ssh shortcuts, and common commands (especially if you are attached to certain command options). Other tips for customizing your command prompt are here.
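
As a rough sketch, the additions to ~/.bashrc might look something like the lines below. The alias names, the project directory, and the server address are all made up for illustration, and `ls --color=auto` assumes GNU ls (on a Mac, `ls -G` is the usual equivalent).

```shell
# Hypothetical additions to ~/.bashrc; every name and path here is a placeholder.

# Color the output of ls (GNU ls syntax; macOS ls uses -G instead).
alias ls='ls --color=auto'

# Long listing with human-readable file sizes.
alias ll='ls -lh'

# Shortcut to a directory you visit often (placeholder path).
alias proj='cd "$HOME/projects/my_project"'

# Shortcut for logging in to a server (placeholder user and host).
alias myserver='ssh username@server.example.org'
```

After editing the file, run `source ~/.bashrc` (or open a new terminal) for the aliases to take effect.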

2. The power of less: I prefer less over more because it doesn't clutter up my terminal. It also loads only a page at a time, so for large files it's a nice way to get a peek without having to open the whole thing. You can also search within less, which may be faster than grep if you aren't sure what you are looking for. For more tips, try here or check the man page.

    -'/' and '?' allow you to search forward and backward within a file
    -'g' and 'G' jump to the beginning and end of the file
    -'shift+F' turns on live streaming of a file that is still being written
        note: an alternative is tail -f, but its output will stay in your terminal afterwards
    -'-N+enter' adds line numbers
    -'###+g' jumps to a specific line number in the file (type the number, then g)

3. Shortcuts on the command line. I've seen so many friends mistype on the command prompt and painstakingly backspace through the entire line. Below are a few shortcuts to avoid this frustration, but for more check out: Keyboard Shortcuts for Bash.

Ctrl+a moves to the beginning of the line
Ctrl+e moves to the end of the line
Ctrl+k erases everything after the cursor
Ctrl+u erases everything before the cursor
Tab autocompletes command names (from your PATH) and file or directory names

4. Quick and useful one-liners. Lots of times I do very simple text manipulation that a one-liner can handle more efficiently than a separate script. The nice thing about performing these tasks on the command line is that you can pipe the output of one command into the next, which avoids lots of intermediate files that later need deleting or archiving.

Get column sum or average quickly using awk:
$ awk '{sum+=$4} END {print sum/NR}' somefile #calculates the average of column 4 (remove '/NR' to get only the sum of column 4)
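
Here is a small worked example of the same idea, using a hypothetical three-line file so the numbers are easy to check by hand. The key point is that the END block prints the running total kept in `sum`, divided by the number of lines read (NR).

```shell
# Make a small hypothetical four-column file to practice on.
tmpfile=$(mktemp)
printf 'geneA 1 100 10\ngeneB 1 200 20\ngeneC 2 300 30\n' > "$tmpfile"

# Add up column 4 in "sum", then divide by the number of lines read (NR).
# The NR check avoids a division by zero on an empty file.
avg=$(awk '{sum += $4} END {if (NR > 0) print sum / NR}' "$tmpfile")

# The same accumulation without the division gives the column total.
total=$(awk '{sum += $4} END {print sum}' "$tmpfile")

echo "average: $avg, total: $total"
rm -f "$tmpfile"
```

With the three lines above this prints `average: 20, total: 60`.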

Add header to file using sed (-i option edits files in place):
$ sed -i '1s/^/header\n/' somefile #add header or other text before the first line of somefile
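
One caveat: as far as I know, the `\n` inside the replacement and the bare `-i` both rely on GNU sed, so this command may misbehave with the BSD sed that ships on Macs. A more portable sketch is to write the header and the original contents to a new file and then swap it in. The file contents and header text here are hypothetical.

```shell
# Hypothetical two-line, tab-separated data file.
datafile=$(mktemp)
printf '1\t100\n2\t200\n' > "$datafile"

# Write the header first, then the original contents, into a temporary copy,
# and replace the original only if that step succeeded.
{ printf 'chrom\tpos\n'; cat "$datafile"; } > "$datafile.tmp" && mv "$datafile.tmp" "$datafile"

# Check the result: the header is now line 1 and the file has one more line.
first_line=$(head -n 1 "$datafile")
n_lines=$(awk 'END {print NR}' "$datafile")
echo "$first_line ($n_lines lines)"
rm -f "$datafile"
```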

Convert file with chr1 to 1 or vice versa (useful for switching from UCSC to Ensembl files):
$ sed -i 's/^chr//' somefile #chr1 -> 1 (assumes the chromosome name starts each line)
$ sed -i 's/^/chr/' somefile #1 -> chr1
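
A quick way to try this safely is to leave off -i, so sed prints the result instead of editing the file. The sketch below uses a couple of made-up lines with the chromosome in the first column; anchoring the pattern with ^ keeps sed from touching "chr" anywhere else on the line.

```shell
# Hypothetical UCSC-style lines with the chromosome in the first column.
ucsc_lines=$(printf 'chr1 100 200\nchrX 5 50\n')

# UCSC -> Ensembl: remove "chr" only where it starts a line.
ensembl_lines=$(printf '%s\n' "$ucsc_lines" | sed 's/^chr//')

# Ensembl -> UCSC: put "chr" back at the start of every line.
roundtrip=$(printf '%s\n' "$ensembl_lines" | sed 's/^/chr/')

printf '%s\n' "$ensembl_lines"
```

Note that the second command prefixes every line, so any header or comment lines in a real file would get a "chr" too.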

Count the number of fasta entries in a fasta file (more grep tips):
$ grep -c "^>" infile.fasta
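
FASTA header lines start with ">", so counting those lines counts the entries. Always keep the ">" inside quotes: unquoted, the shell treats it as a redirect and will overwrite your fasta file. A small sketch on a hypothetical two-entry file:

```shell
# Hypothetical FASTA file with two entries (the second sequence spans two lines).
fasta=$(mktemp)
printf '>seq1\nACGT\n>seq2 some description\nGGCC\nTTAA\n' > "$fasta"

# -c makes grep print the number of matching lines instead of the lines themselves;
# ^ restricts the match to lines that start with ">".
n_entries=$(grep -c '^>' "$fasta")

echo "$n_entries"
rm -f "$fasta"
```

This prints 2, one for each header line.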

Convert line endings from classic Mac (\r) to unix/linux (\n):
$ tr "\r" "\n" < macinput.txt > unixoutput.txt
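
To see what tr is doing, here is a minimal sketch on made-up text. Classic Mac files end each line with a carriage return (\r) alone, so each one is swapped for a newline. Files from Windows end lines with \r\n instead, and there the usual fix is to delete the \r with `tr -d`.

```shell
# Classic Mac style: lines separated by carriage returns only.
mac_converted=$(printf 'line1\rline2\rline3\r' | tr '\r' '\n')

# Windows style: carriage return plus newline, so just delete the \r.
win_converted=$(printf 'line1\r\nline2\r\n' | tr -d '\r')

printf '%s\n' "$mac_converted"
```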

Quickly check the syntax of your script without running it:
$ perl -wc script.pl
$ python -m py_compile script.py
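
Bash scripts can be checked in a similar way with `bash -n`, which reads the script and reports syntax errors without executing any of its commands. A small sketch with a deliberately broken, hypothetical script:

```shell
# Hypothetical script with a syntax error: the "if" is never closed with "fi".
script=$(mktemp)
printf 'if true; then\n  echo "this never runs"\n' > "$script"

# bash -n parses the script without running it;
# a non-zero exit status means a syntax error was found.
if bash -n "$script" 2>/dev/null; then
    check_result="syntax ok"
else
    check_result="syntax error"
fi

echo "$check_result"
rm -f "$script"
```

Drop the `2>/dev/null` to see bash's own error message, which includes the line number.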

Find a lost file (searches within current directory and all subdirectories), more find stuff:
$ find ./ -name somelostfile.txt   

Change something across multiple files:
$ find ./ -name "PATTERN" -print | xargs sed -i 's/old/new/g' #-name takes a shell glob such as "*.txt", not a regex
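
File names with spaces can trip up a plain find | xargs pipeline, because xargs splits its input on whitespace. A slightly more defensive sketch uses -print0 and xargs -0 to separate names with null characters, and gives -i a suffix so sed keeps a backup of each file it changes. The directory and files below are hypothetical and created just for the demonstration.

```shell
# Build a throwaway directory, including a subdirectory with a space in its name.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
printf 'old value\n' > "$workdir/a.txt"
printf 'another old one\n' > "$workdir/sub dir/b.txt"
printf 'old but untouched\n' > "$workdir/c.log"

# Find regular files matching the glob, pass them null-separated to sed,
# and edit in place while keeping the originals as *.bak.
find "$workdir" -name '*.txt' -type f -print0 | xargs -0 sed -i.bak 's/old/new/g'

# Only the .txt files should have changed.
a_after=$(cat "$workdir/a.txt")
b_after=$(cat "$workdir/sub dir/b.txt")
c_after=$(cat "$workdir/c.log")
printf '%s\n%s\n%s\n' "$a_after" "$b_after" "$c_after"
rm -rf "$workdir"
```

Once you have checked the results, the .bak copies can be deleted.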

More one-liners can be found here: Useful linux one-liners for bioinformatics. Also, I use awk and sed a lot, and for loops in bash are pretty useful. This list is by no means comprehensive, but it's a great start! Feel free to add any additional tips or links in the comments!