Release notes, and instructions for installing the programs, are below.
Please note the following:

1. I ask that you not make the programs (including source code,
executables, or any part thereof) available to anyone outside your group,
without first obtaining permission from me. If you are operating a computer
facility which provides access to the programs to several independent
investigators, you should set the permissions on the executables to allow
execute but not read access, so that the programs may not be copied.
Investigators who want copies of the software for their own use must
contact me directly.

2. If you are doing any commercially restricted sequencing, you must
execute a licensing agreement with the University of Washington and pay a
fee in order to use phrap, cross_match, swat, phred, or consed.
"Commercially restricted sequencing" is defined as any sequencing for which
a company retains patenting or licensing rights regarding the sequence, or
the right to restrict or delay dissemination of the sequence; with the sole
exception that sequencing is not considered to be commercially restricted
if it is federally funded AND the investigators adopt the data release
policies endorsed at the Wellcome Trust-sponsored Bermuda meeting, i.e.
immediate release of data as it is generated. By sending you the software
now I am NOT giving you permission to use it for commercially restricted
sequencing. To obtain the licensing agreement please contact

      Gerald B Barnett
      Software Technologies Manager
      Office of Technology Transfer
      University of Washington
      Box 354810
      1107 NE 45th St, Suite 200
      Seattle, WA 98195
      206-543-3970
      Vmail: 206-685-9972, FAX: 206-685-4767
      barnett@u.washington.edu


VERSION 0.960718 RELEASE NOTES:

Version 0.960718 contains the following significant modifications:

1) There have been improvements in memory usage and speed that should
substantially facilitate phrap and cross_match analyses of large datasets.
Cross_match can now be used for database searches. These changes required a
fair amount of reorganization of the internal data structures. In the
course of making them, I have had to temporarily inactivate the phrap
option that allowed one set of reads to be assembled against a second set
of sequences (e.g. reference sequences being scanned for polymorphisms),
using two or more input files. This will be restored soon.

2) Swat has been improved in several ways. I have restored and improved its
ability to compute z-scores and E values for database searches, and these
now appear to be quite reliable. Also, it is now possible to use swat for
profile searches (with the restriction that gap penalties still need to be
position independent).

3) A graphical viewer for phrap assemblies, "phrapview", is now included.
This is intended to complement the "local" view of the assembly provided by
consed, by giving a "global" view that focusses on information pertaining
to possible incorrectness, incompleteness, or non-uniqueness of the
assembly. Phrapview displays depth of coverage, forward-reverse read pairs,
significant pairwise matches involving reads in different locations, and
chimeric reads. It requires a ".view" file produced by running phrap with
the -view option. Phrapview is written in perl-tk and to run it you will
need to have installed on your system a recent version of perl that
includes the tk library (available for free from a number of web sites).
Further documentation appears in the file "general.doc". The program was
written rather hastily (in less than a week, including the time to learn
perl and Tk) and I would appreciate any feedback on how to make it more
useful.

4) The following known bugs have been fixed:

   (i) A bug that caused occasional crashes in the "Revising contigs" phase.
   (ii) Another bug in revise_contigs that occasionally caused premature
   truncation of the contig sequence, resulting in massive pileups of reads
   at the truncated end; and that also caused a lower quality read to
   occasionally be used in place of a higher quality one in deriving the
   contig sequence.

   (iii) A bug that caused an infinite loop on data readin on SGI machines
   (N.B. I don't have access to an SGI computer and so haven't been able to
   verify that the programs now run successfully -- please let me know if
   there are still problems).

   (iv) A bug that caused premature termination of phrap when there are 0
   length reads in the dataset.

Please continue to report any bugs to me.

5) Contig base quality information is now output in the .ace file (as well
as the .contigs.qual file). You need to obtain a new version of consed from
David Gordon (which he is distributing this week) in order to avoid having
consed crash on these .ace files. (This version of consed does not yet
actually use the contig base quality information).

6) Phrap, cross_match and swat are now all case insensitive, in the sense
that all sequences are immediately converted to upper case on readin. The
-use_case option is no longer available. If you were previously using that,
you will need to create .qual files that contain the information instead.
In any case (so to speak!) I strongly recommend that you use phred's
quality values which are substantially more discriminating.


INSTALLATION INSTRUCTIONS:

The email message containing the swat/cross_match/phrap package is in the
form of a uuencoded .tar.Z file; you will need to have access to a Unix
system for the initial unpacking, but once you've uudecoded it and unpacked
the .tar file, you should be able to compile the source code on computers
running other operating systems -- it should be portable to almost anything
with a decent C compiler and adequate memory (64 Mb RAM or more is
desirable).

Here are the steps needed to unpack and install the program:

1) Save that email message as a file (say temp.mail). If possible, do this
using the Unix mail command, rather than another mail program -- some mail
programs (e.g. Pine) may remove trailing spaces on each line of incoming
messages, which will corrupt a uuencoded message. Do not attempt to modify
the saved mail message in any way using a word processor. That is
unnecessary and may corrupt the message.

2) To unpack the saved file, execute the following two commands on a Unix
workstation, in the directory containing the file created in (1):

      uudecode temp.mail
      zcat distrib.tar.Z | tar xvf -

3) To produce working versions of the programs, move (if necessary) the
files produced by the above command to an appropriate computer having a C
compiler, and execute the following command in the directory containing the
files:

      make

If your compiler does not recognize the -O2 optimization flag, you should
change the line

      CFLAGS= -O2

in the file "makefile" to

      CFLAGS= -O

and recompile (this may require removing all executables produced using the
original make).

4) The documentation (which is still somewhat incomplete) is in three .doc
files.

Contact me if you have problems. Before doing so however please record
exactly what steps you carried out, on what computer & operating system,
and what error messages you received.
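
The unpacking and build steps above, together with the permissions
requested in note 1 for shared facilities, can be collected into a small
shell sketch. The function names are hypothetical and are not part of the
distribution; the file names temp.mail, distrib.tar.Z and makefile are the
ones used in the instructions. The sketch assumes a POSIX shell with
uudecode, zcat, tar and sed available, and it assumes that mode 711 is an
acceptable way of granting execute but not read access on your system.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of the package.

# Steps 1-2: decode the saved mail message, then unpack the archive.
# Per the instructions, uudecode is expected to write distrib.tar.Z.
unpack_distribution() {
    mail_file=${1:-temp.mail}
    uudecode "$mail_file" && zcat distrib.tar.Z | tar xvf -
}

# Step 3 fallback: change -O2 to -O on the CFLAGS line of the makefile.
# Assumes the line begins with "CFLAGS=" as quoted in the instructions.
# A temporary file is used because in-place sed options are not portable.
lower_optimization() {
    mf=${1:-makefile}
    sed 's/^\(CFLAGS=.*\)-O2/\1-O/' "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
}

# Note 1: owner keeps full access; group and others may execute only.
# Suitable for compiled binaries, not for scripts, which must be readable
# by the interpreter.
restrict_copying() {
    chmod 711 "$@"
}
```

A typical session might call unpack_distribution, then make, and, only if
the compiler rejects -O2, lower_optimization followed by make again after
removing the previously built executables. On a shared facility,
restrict_copying phrap cross_match swat would then be run on the compiled
programs.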