Lab3 : HMMERΒΆ
During this lab, we will acquaint ourselves with the the software package HMMER. Your objectives are:
- Familiarize yourself with the software, how to execute it, how to visualize results.
- Regarding your dataset. Characterize a few conserved domains.
The HMMER manual ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/Userguide.pdf
The HMMER webpage: http://hmmer.janelia.org/
Step 1: Launch and AMI. For this exercise, we will use a c4.xlarge instance.
ssh -i ~/Downloads/your.pem ubuntu@ec2-???-???-???-???.compute-1.amazonaws.com
Update your machine
sudo apt-get update && sudo apt-get -y upgrade
Install new software
sudo apt-get -y install tmux git curl gcc make g++ python-dev unzip default-jre
Ok, for this lab we are going to use HMMER
cd $HOME
curl -LO http://selab.janelia.org/software/hmmer3/3.1b1/hmmer-3.1b1-linux-intel-x86_64.tar.gz
tar -zxf hmmer-3.1b1-linux-intel-x86_64.tar.gz
cd hmmer-3.1b1-linux-intel-x86_64/
./configure
make && sudo make all install
make check
cd
You will download the dataset. Also, download Swissprot and Pfam-A
sudo chown ubuntu /mnt
cd /mnt
curl -LO https://s3.amazonaws.com/gen711/dataset1.pep
#download the SwissProt database
curl -LO https://s3.amazonaws.com/gen711/uniprot.pep
#download the Pfam-A database
curl -LO ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
We are going to run HMMER to identify conserved protein domains. This will take a little while, and we’ll use tmux
to allow us to do this in the background, and continue to work on other things.
gzip -d Pfam-A.hmm.gz
tmux new -s pfam
hmmpress Pfam-A.hmm #this is analogous to 'makeblastdb'
hmmscan -E 1e-3 --domtblout dataset.pfam --cpu 4 Pfam-A.hmm dataset1.pep
ctl-b d
top -c #see that hmmscan is running..
The neat thing about HMMER is that it can be used as a replacement for blastP or PSI-blast.
#blastp-like HBB-HUMAN is a Hemoglobin B protein sequence.
phmmer --noali --domtblout hbb.phmmer -E 1e-5 /home/ubuntu/hmmer-3.1b1-linux-intel-x86_64/tutorial/HBB_HUMAN uniprot.pep
look at phmmer results
more hbb.phmmer
PSI-blast-like
jackhmmer --noali --domtblout hbb.jackhammer -E 1e-5 /home/ubuntu/hmmer-3.1b1-linux-intel-x86_64/tutorial/HBB_HUMAN uniprot.pep
look at jackhammer results
more hbb.phmmer
Now let’s look at the hmmscan
results. This analyses may still be running, but we can look at it while it’s still in progress.
more dataset.pfam
#There are a bunch of columns in this table - what do they mean?
#Try to extract all the hits to a specific domain. Google a few domains (column 1) to see if any seem interesting.
#for instance, find all occurrences of ABC_tran
grep ABC_tran dataset.pfam
#use grep to count the number of matches. Copy this number down.
grep -c ABC_tran dataset.pfam
#Find all the contigs that have a ABC_tran domain.
grep ABC_tran dataset.pfam | awk '{print $4}' | sort -u
> Just for fun, check on the Pfam search to see what it is doing...
tmux attach -t pfam
ctl-b d