Bioinformatics
Run a DELTA-BLAST search with the silkworm insulin protein (P26726). Limit the search to human proteins in the RefSeq_Protein database.
How many sequences are returned in total?
How many human homologs appear to have the insulin domain (irrespective of the e-value threshold)?
Now edit the search: remove human from the organism selection and change the DELTA-BLAST threshold to 0.005. Keep everything else unchanged. How many records are returned?
Write the name of the organism that has the best e-value.
Run a second iteration. What is the name of the newly added organism that has the best e-value? What is that e-value?
How many sequences are there now? Write the name of the only organism that has an e-value of 1e-04.
Write the approximate number of new hits at each of the subsequent iterations.
Are there still sequences being added on the fifth iteration?
Are most of those descriptions “proteases”?
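The iteration questions above ask how many hits are new in each round. If the hit accessions from each iteration are saved to a list, the count can be checked with a short script. This is only a sketch: the function name `new_hits_per_iteration` and the accession strings are hypothetical placeholders, not real search results.

```python
def new_hits_per_iteration(iterations):
    """Return, for each iteration, how many accessions were not seen in any earlier one."""
    seen = set()
    counts = []
    for accessions in iterations:
        current = set(accessions)
        # Hits that did not appear in any previous iteration
        counts.append(len(current - seen))
        seen |= current
    return counts


# Placeholder accession lists (hypothetical, NOT real DELTA-BLAST output).
example_iterations = [
    ["ACC_A", "ACC_B"],
    ["ACC_A", "ACC_B", "ACC_C", "ACC_D", "ACC_E"],
    ["ACC_A", "ACC_B", "ACC_C", "ACC_D", "ACC_E", "ACC_F"],
]

print(new_hits_per_iteration(example_iterations))  # [2, 3, 1]
```

An iteration that reports zero new accessions suggests the search has converged.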
Run HMMER with the proteasome from question 2. Search the RefSeq database.
What is the e-value of the best matching proteasome?
Based on the colors and the distribution, from which of the three domains of life are the majority of hits?
The proteasome sequence is from a hyperthermophilic archaeon. Why do you suppose there may be a (relative) lack of total hits in archaea using HMMER?