zPicture: blastz-based alignment and visualization tool

Generating genomic alignments and analyzing sequence conservation shared by homologous DNA regions is a powerful analytical strategy for inferring evolutionary information and identifying critical functional regions in raw sequence data. Alignment tools that incorporate highly detailed dynamic visualization modules facilitate this process and empower molecular biologists to customize their sequence analysis to obtain useful informatic data that can be used experimentally.

zPicture is a pairwise alignment and visualization tool that compares two sequences using the local alignment program BlastZ (Schwartz et al. 2003) and displays the gapless blocks of shared homology as unconnected-dot (PIP) or smooth-trace (VISTA) conservation plots. Unlike other available alignment visualization tools, zPicture allows customized real-time processing of alignment data. The user can actively modify all the visualization settings, including (1) the minimum size and length of evolutionary conserved regions (ECRs); (2) the sequence to be used as the reference; (3) modifications of the annotation to include detected features; (4) the bottom cut-off value of the percent identity y-axis; and (5) the picture resolution, to either compact the alignment or zoom in. zPicture also allows easy extraction of ECR sequences and analysis of regulatory elements via a transcription factor binding site analysis portal to the rVISTA tool (http://rvista.dcode.org; Loots et al. 2002).

BlastZ computes local alignments for sequences of any length, based on the assumption that the input sequences are related and share blocks of high conservation separated by regions that lack homology and vary in length between the two sequences. Regions of homology are displayed collinear to the reference sequence only; the order and orientation of the conserved elements are not necessarily the same in the second sequence. zPicture processes alignments for DNA sequences submitted in one of four ways: (1) by UCSC genome browser coordinates, which allow sequences and annotation files to be loaded from the UCSC genome database; (2) by typing or pasting a sequence in FASTA format into the provided window; (3) by uploading a sequence file in FASTA format; or (4) by providing an NCBI accession number.

For zPicture to download sequence information from the UCSC genome browser, the user is required to provide the following information: (1) organism for which the sequence is downloaded [currently human, mouse and rat]; (2) the assembly which should be used for download; (3) the type of annotation tables to be used [RefSeq, Genscan/Softberry/Twinscan predictions, Known Genes from UCSC, mRNAs or ESTs], and (4) the genomic location of the region to be aligned, which should be indicated using the following format:  chr10:1-100,000.
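The genomic location format above can be checked before submission. The following is a minimal Python sketch, not part of zPicture itself; the function name parse_region and the exact pattern accepted are assumptions made for illustration, based only on the example chr10:1-100,000.

```python
import re

# Hypothetical helper: zPicture's own parsing rules are not described here.
# The pattern is inferred from the single documented example "chr10:1-100,000".
_REGION = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Split a location such as 'chr10:1-100,000' into (chromosome, start, end)."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not in chrN:start-end format: {text!r}")
    chrom, start, end = match.groups()
    # Thousands separators are allowed in the example, so strip them first.
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {text!r}")
    return chrom, start, end
```

For instance, parse_region("chr10:1-100,000") yields the chromosome name together with integer start and end positions, so the length of the requested region can be computed as end - start + 1.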

There are two options for handling repetitive elements. If data from UCSC are used, repeats are already indicated within the sequences by lower-case letters, and the user should select the first option, (1) Repeats are identified by lower-case letters. To have the input sequences processed by RepeatMasker, the user should select the second option, (2) Mask repetitive elements, and must also indicate which organism should be used for identifying the repetitive elements (http://repeatmasker.genome.washington.edu/).
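To illustrate what "repeats identified by lower-case letters" means in practice, the Python sketch below locates the lower-case runs in a sequence string. It is only an illustration under the assumption that every lower-case character marks a repeat; the function names are hypothetical and do not correspond to anything in zPicture or RepeatMasker.

```python
import re


def lowercase_runs(sequence):
    """Return 1-based inclusive (start, end) intervals of lower-case runs."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[a-z]+", sequence)]


def fraction_masked(sequence):
    """Return the fraction of the sequence written in lower case."""
    if not sequence:
        return 0.0
    masked = sum(end - start + 1 for start, end in lowercase_runs(sequence))
    return masked / len(sequence)
```

A sequence with no lower-case letters at all gives an empty list, which is a quick hint that the first option would leave repeats undetected and that the second option may be the appropriate choice.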

Annotation files include positional information for the nucleotide location of known coding exons (CDS), untranslated regions (UTR) or any other DNA features known for the input sequence contigs. zPicture reconstitutes gene annotation from each gene description provided. The gene description for an mRNA transcript starts with a header line and ends with an empty line. The first symbol in the header line (> or <) indicates the direction of the gene transcript relative to the input DNA sequence (positive or negative, respectively). Exon description lines follow the header line. They should be indicated as <from> <to> <exon type>, where <from> is the starting position of the exon and <to> is the ending position of the exon. The <exon type> can be CDS, UTR or OTH (all other features).
For example, consider the three-exon gene STC23 on the negative strand. The gene starts at position 1000 in the submitted sequence and ends at position 500. Its exons span the regions [1000,800], [600,550], and [530,500], and the coding part of the gene runs from 900 to 550. Because the coding part begins at 900, the exon [1000,800] is split into a UTR segment and a CDS segment, while the exon [530,500] lies outside the coding part and is entirely UTR. Positions are written in ascending order even though the gene is on the negative strand. The annotation for this gene should be indicated as follows:

< 500 1000 STC23
500 530 UTR
550 600 CDS
800 900 CDS
900 1000 UTR
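
The annotation format described above can be read back with a short parser. The following Python sketch is an illustration only: the names Gene and parse_annotation are hypothetical, and the header layout (direction symbol, start, end, gene name) is an assumption inferred from the single STC23 example rather than a documented specification.

```python
from dataclasses import dataclass, field

EXON_TYPES = {"CDS", "UTR", "OTH"}


@dataclass
class Gene:
    name: str
    strand: str                                 # "+" for a ">" header, "-" for "<"
    start: int
    end: int
    exons: list = field(default_factory=list)   # (from, to, exon_type) tuples


def parse_annotation(text):
    """Parse gene descriptions: a header line, exon lines, then an empty line."""
    genes, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None                      # an empty line ends the description
            continue
        if line[0] in "<>":
            # Assumed header layout: direction, start, end, optional gene name.
            fields = line[1:].split()
            name = fields[2] if len(fields) > 2 else ""
            strand = "+" if line[0] == ">" else "-"
            current = Gene(name, strand, int(fields[0]), int(fields[1]))
            genes.append(current)
            continue
        if current is None:
            raise ValueError(f"exon line outside a gene description: {line!r}")
        start, end, kind = line.split()[:3]
        if kind not in EXON_TYPES:
            raise ValueError(f"unknown exon type: {kind!r}")
        current.exons.append((int(start), int(end), kind))
    return genes
```

Applied to the STC23 example, the parser returns one gene on the negative strand with four exon segments, two of type UTR and two of type CDS.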

When all the available information is provided, press the submit button.

To return to a previously submitted request, enter the ID number at the bottom of the request page and press the submit button.