Follow the instructions in Logging into your new instance “in the cloud” (Mac version) or Logging into your new instance “in the cloud” (Windows version).
Two points:
Your machine name is available here.
Download a keyfile here: http://athyra.idyll.org/~t/msu-bootcamp.pem
You should now be at a ‘#’ prompt.
Type:
cd /mnt
and then type:
mkdir <NetID>
but replace <NetID> with your MSU NetID (or some distinguishing lowercase name).
Then type:
cd <NetID>
and
pwd
It should say ‘/mnt/<NetID>’. Here, you’ve created your own folder and made it your current “working directory”, which means it’s where UNIX will look for files and programs by default.
Download the E. coli MG1655 protein data set:
curl -O http://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.faa
This grabs that URL and saves the contents of ‘NC_000913.faa’ to the local disk.
Next, download a Salmonella protein data set:
curl -O http://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Salmonella_enterica_Serovar_Typhimurium_var__5__CFSAN001921_uid212972/NC_021814.faa
Likewise, this creates a local copy of NC_021814.faa.
Let’s take a quick look at these files:
head NC_000913.faa
head NC_021814.faa
These files contain a bunch of protein data from two different genomes. What can we do with it??
Format the E. coli data set for BLAST and run BLAST of the Salmonella proteins against the MG1655 protein set:
formatdb -i NC_000913.faa -o T -p T
blastall -i NC_021814.faa -d NC_000913.faa -p blastp -e 1e-12 -o salm.x.ecoli
Look at the first 50 lines of the output file:
head -50 salm.x.ecoli
good, BLAST output! But if you type ‘wc salm.x.ecoli’ you’ll see that this file has 462,000 lines in it – surely you don’t want to look at each one?
Let’s convert ‘em to a CSV file, instead, that can be opened in Excel:
python /usr/local/share/ngs-scripts/blast/blast-to-csv-with-names.py NC_021814.faa NC_000913.faa salm.x.ecoli > salm.x.ecoli.csv
Take a look at this file
head salm.x.ecoli.csv
But ... this file is on our remote computer. How do we get this file onto our local computer?? There are lots of ways of doing this; for now, I’ve set up a Web server on your Amazon computer, so you can just type:
ln -fs $PWD /var/www
and go to your computer name in your browser plus ‘/<NETID>. You should see a bunch of files, including ‘salm.x.ecoli.csv’. For an example, go to:
http://ec2-23-20-239-64.compute-1.amazonaws.com/titus/
Be sure to start in “your” directory:
cd /mnt/<NETID>
Now, let’s do the reciprocal BLAST, too:
formatdb -i NC_021814.faa -o T -p T
blastall -i NC_000913.faa -d NC_021814.faa -p blastp -e 1e-12 -o ecoli.x.salm
Extract reciprocal best hit:
python /usr/local/share/ngs-scripts/blast/blast-to-ortho-csv.py NC_021814.faa NC_000913.faa salm.x.ecoli ecoli.x.salm > ortho.csv
This generates a file ‘ortho.csv’, containing the ortholog assignments and their annotations. Now download that to your local computer and take a look at it in Excel.
Get together with those sitting around you and come up with three uses for this kind of “batch BLAST” in your collective research, whatever it may be. We’ll make a list!
Explore the NCBI bacterial genome site here: http://ftp.ncbi.nlm.nih.gov/genomes/Bacteria