- Newsgroups: bionet.software
- Path: sparky!uunet!haven.umd.edu!darwin.sura.net!lhc!host!gish
- From: gish@host.nlm.nih.gov (Warren Gish)
- Subject: Re: problem with blast
- Message-ID: <1993Jan22.160730.24504@nlm.nih.gov>
- Keywords: blastn
- Sender: news@nlm.nih.gov
- Organization: National Library of Medicine
- References: <1993Jan21.231458.19512@medmail.stanford.edu>
- Date: Fri, 22 Jan 93 16:07:30 GMT
- Lines: 88
-
- In article <1993Jan21.231458.19512@medmail.stanford.edu> wnelson@cmgm.stanford.edu (Will Nelson) writes:
- >I have been having a problem with blastn.
- >The problem is that on successive invocations of blastn,
- >I get different results, using the same input sequence.
- >
- >My input file is this:
- >
- >
- >>DROSATA - LOCUS DROSATA 254 bp ds-DNA INV 15-MAR-1989
- >canatttgcaaatttaatgaaccccccttcaaaaaatgcgaaaattaacgcaaaaattgatttccctaaa
- >tccttcaaaaagtaaataacaactttttggcaaaatctgattccctaatttcggtcattaaataatcagt
- >ttttttgccacaactttaaaaataattgtctgaatatggaatgtcatacctcgcnnagctngtaattaaa
- >tttccaatgaaactgtgttcaacaatgaaaattacatttttcgg
- >
-
- Dear Will,
-
- What you have observed is a consequence of the ambiguous 'n' letters present
- in the query sequence. An analogous phenomenon can also arise when a database
- sequence contains ambiguity codes. BLASTN searches a compressed form of the
- database and, to parallel this, it also uses a compressed form of the query.
- In compressed form, letters other than A, C, G, and T are not permitted in the
- sequences. What BLASTN does with Ns is replace them with random selections
- from the set {A,C,G,T}. For the other IUB ambiguity codes, random selections
- are made from the appropriate subset of {A,C,G,T}. For example, any Rs would
- be replaced by random selections from the set {A,G}.
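-
- The substitution step can be sketched in Python as follows. This is an
- illustration only: the table holds the standard IUB codes, while the name
- resolve_ambiguity and the use of Python's random module are assumptions of
- the sketch, not BLASTN's actual code.

```python
import random

# Standard IUB nucleotide ambiguity codes and the bases each one stands for.
IUB_CODES = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def resolve_ambiguity(seq, rng=random):
    """Replace each ambiguity code with a random base from its subset.

    Hypothetical helper that mimics the behaviour described above.
    """
    out = []
    for ch in seq.upper():
        if ch in "ACGT":
            out.append(ch)                          # unambiguous: keep as-is
        elif ch in IUB_CODES:
            out.append(rng.choice(IUB_CODES[ch]))   # random pick from subset
        else:
            raise ValueError(f"unexpected letter: {ch!r}")
    return "".join(out)

# A fragment of the query above; two runs may print different sequences.
fragment = "cctcgcnnagctngtaat"
print(resolve_ambiguity(fragment))
print(resolve_ambiguity(fragment))
```

- Because the picks are random, the compressed query can differ from one
- invocation to the next even though the input file is unchanged.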
-
- As you may know, the alignments found by BLASTN can be scored by counting the
- number of matches and mismatches, multiplying these two numbers by the
- corresponding match reward (default value +5) and mismatch penalty (default
- value -4), and adding them together. Depending on the random replacements that
- were made at each position of ambiguity, alignments found in different
- invocations of BLASTN may have different initial scores, different start and
- end points in the query and database sequences, or both. (An alignment is not
- supposed to begin or end on a mismatch, as might be encountered where a random
- replacement was made.)
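-
- As a rough Python sketch of that scoring rule (the function name
- ungapped_score is hypothetical; only the +5/-4 defaults come from the text
- above), here is how two different random replacements of the same query
- region can lead to different initial scores:

```python
MATCH_REWARD = 5       # default match reward quoted above
MISMATCH_PENALTY = -4  # default mismatch penalty quoted above

def ungapped_score(query, subject, reward=MATCH_REWARD, penalty=MISMATCH_PENALTY):
    """Score two equal-length aligned segments position by position."""
    if len(query) != len(subject):
        raise ValueError("segments must have the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    mismatches = len(query) - matches
    return matches * reward + mismatches * penalty

subject = "CGCTGAGCTCG"
# The query region CGCNNAGCTNG after two different sets of random replacements.
print(ungapped_score("CGCTGAGCTCG", subject))  # 11 matches             -> 55
print(ungapped_score("CGCAAAGCTTG", subject))  # 8 matches, 3 mismatches -> 28
```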
-
- After the database search is finished and the one-line descriptions are
- reported, the alignments themselves are then reported. It is at this point
- that the original query and database sequences, including any ambiguity codes
- that may be present, are used by BLASTN to re-score the alignments. When a
- final score is different from the initial score, its value is flagged with an
- asterisk pointing to the WARNING footnote that will appear at the end of BLASTN
- output. (The initial alignment is not trimmed, however, should it
- subsequently be found to begin or end with one or more mismatches.)
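-
- A small Python sketch of the re-scoring and flagging step. It assumes that
- a position holding an ambiguity code is counted as a mismatch when the
- original letters are compared; the names rescore and format_score are
- hypothetical, not taken from BLASTN.

```python
def rescore(query, subject, reward=5, penalty=-4):
    """Re-score an alignment from the original, uncompressed letters.

    Assumption of this sketch: a position is a match only when both letters
    are the same unambiguous base, so any ambiguity code is a mismatch.
    """
    score = 0
    for q, s in zip(query.upper(), subject.upper()):
        score += reward if (q == s and q in "ACGT") else penalty
    return score

def format_score(initial, final):
    """Append '*' when the final score differs from the initial score."""
    return f"{final}*" if final != initial else str(final)

original_query = "CGCNNAGCTNG"
subject        = "CGCTGAGCTCG"
initial = 55   # initial score if every random replacement happened to match
final = rescore(original_query, subject)
print(format_score(initial, final))  # prints 28*
```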
-
- Pruned example:
-
-                                                                     Smallest
-                                                                     Poisson
-                                                              High   Probability
- Sequences producing High-scoring Segment Pairs:              Score  P(N)     N
-
- DROSAT353 D.melanogaster 1.688 g/ml satellite DNA sequence.    438  1.4e-27  1
-
-
- >DROSAT353 D.melanogaster 1.688 g/ml satellite DNA sequence.
- Length = 353
-
- Plus Strand HSPs:
-
- Score = 429* (118.5 bits), Expect = 8.1e-27, P = 1.4e-27
- Identities = 97/111 (87%), Positives = 97/111 (87%), Strand = Plus
-
- Query:   141 TTTTTTGCCACAACTTTAAAAATAATTGTCTGAATATGGAATGTCATACCTCGCNNAGCT 200
-              |||| ||| ||||||||||||| ||||||||||||||||||  |||||| ||||  ||||
- Sbjct:    62 TTTTCTGCTACAACTTTAAAAACAATTGTCTGAATATGGAAACTCATACGTCGCTGAGCT 121
-
- Query:   201 NGTAATTAAATTTCCAATGAAACTGTGTTCAACAATGAAAATTACATTTTT 251
-               ||||||||||||||||| ||||||||||||| |||| |||||| |||| |
- Sbjct:   122 CGTAATTAAATTTCCAATCAAACTGTGTTCAAAAATGGAAATTAAATTTCT 172
-
- <stuff deleted>
-
- WARNING: *12 alignments contained non-ACGT(U) letters.
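-
- The two scores in this example can be checked against the default reward
- and penalty with a few lines of Python. The last step is an inference from
- the reported numbers rather than something BLASTN prints: it assumes the
- initial and final scores cover the same 111 aligned positions.

```python
REWARD, PENALTY = 5, -4

length = 251 - 141 + 1   # aligned positions, from the query coordinates
identities = 97          # from "Identities = 97/111"

# Final score, computed from the original letters.
final = identities * REWARD + (length - identities) * PENALTY
print(length, final)     # 111 429

# Number of matches needed to produce the initial score of 438 over the
# same 111 positions: solve REWARD*m + PENALTY*(length - m) = 438 for m.
initial = 438
initial_matches = (initial - length * PENALTY) // (REWARD - PENALTY)
print(initial_matches)   # 98
```

- Under that assumption, 98 - 97 = 1: one of the three random replacements
- for the query's Ns happened to match the database sequence in this run,
- which accounts for the difference between 438 and 429*.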
-
-
-
- While this behavior is described in the BUGS section of the BLAST manual page,
- the emphasis in that document is on potential cases where the initial score
- satisfies the cutoff for reporting matches but the final score falls below the
- cutoff. In the worst case, it is also possible for completely different sets of
- matches to be reported in different invocations of BLASTN when ambiguity codes
- are present.
-
- Sincerely,
- --Warren
-
-