gbwget is (C) 2001 by Sebastian Bunka
Please note: this program is beta software, so use it at your own risk!
If you get wrong data for your research I'm not responsible for this!
From the LICENSE file:
NO WARRANTY

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR
THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO
THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM
PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL
ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE
OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA
OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

ABOUT
REQUIREMENTS
gbwget is written in Perl and should run on any computer system with Perl
installed. It depends, however, on the wget program, which might not be
available on some systems (Microsoft products?). See:
http://www.gnu.org/software/wget/wget.html
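
Before installing, it can help to confirm that both requirements are
reachable. The following is only a minimal sketch in Python (not part of
gbwget); the function name missing_tools is made up for this illustration,
and it assumes the programs are installed under the names "perl" and "wget".

```python
import shutil


def missing_tools(tools=("perl", "wget")):
    """Return the names of required programs that are not found on PATH.

    The default names "perl" and "wget" are an assumption; adjust them if
    your system installs the programs under different names.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required programs: " + ", ".join(missing))
    else:
        print("perl and wget were both found on PATH.")
```

If anything is reported missing, install it first; gbwget cannot fetch
entries without wget.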
FEATURES
gbwget can be used as a command line tool to a) fetch single nucleotide or
protein database entries, b) fetch several entries at a time, and c) fetch
many entries from a list of accession numbers read from a regular text file
or from its own listfile. It writes the entries to standard output or into
separate files, in genbank or fasta format. It can also search the
respective databases for keywords and print a list of matching entries to
stdout or into a file in its own list format. Such a listfile can be used
offline to select or view entries, append them to another listfile, retrieve
the complete entries of a list, etc. Moreover, in offline or interactive
mode one can page through the lists, mark or unmark entries, save, read, and
append to lists, download selectable ranges, and perform new searches.
Searches can be restricted to specific database fields, and wildcards and
multiple terms can be used (e.g. actinobac*+pleuropneu*).
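
To make the search term syntax concrete, here is a rough sketch in Python
that splits a term into its parts. It is not how gbwget itself parses the
term; it only mirrors the documented syntax: terms joined with '+', an
optional asterisk wildcard, and an optional field tag in square brackets as
listed in the online help. The names parse_searchterm, TERM_RE and
ALLOWED_FIELDS are made up for this illustration, and treating field tags as
upper case only is an assumption.

```python
import re

# Field tags as listed in the gbwget online help.
ALLOWED_FIELDS = {
    "ACCN", "AUTH", "PDAT", "ECNO", "FKEY", "GENE", "JOUR", "KYWD",
    "MDAT", "ORGN", "PROP", "PROT", "SQID", "SLEN", "SUBS", "WORD",
}

# One term: some text (may contain '*'), optionally followed by [FIELD].
TERM_RE = re.compile(r"^(?P<text>[^\[\]+]+)(?:\[(?P<field>[A-Z]+)\])?$")


def parse_searchterm(searchterm):
    """Split e.g. 'actinobac*[KYWD]+pleuropneu*' into (text, field) pairs.

    field is None when no field tag was given. Raises ValueError for an
    empty term or an unknown field tag.
    """
    parsed = []
    for part in searchterm.split("+"):
        match = TERM_RE.match(part)
        if match is None:
            raise ValueError("cannot parse term: %r" % part)
        field = match.group("field")
        if field is not None and field not in ALLOWED_FIELDS:
            raise ValueError("unknown field tag: %r" % field)
        parsed.append((match.group("text"), field))
    return parsed


if __name__ == "__main__":
    for text, field in parse_searchterm("actinobac*[KYWD]+pleuropneu*"):
        print(text, field)
```

All terms returned for one search term are meant to be combined with a
logical AND, as described in the online help.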

From the online help:

  gbwget 0.3.5 (C) 2001 by Sebastian Bunka
  under the terms of the GNU General Public License

  usage:
  gbwget [-u accnum | -U file | -I file] [-o outfile] [-g] [-m maxnum]
         [-d opt] [-n | -p]
      >> get Genbank entries
  gbwget [-l] [-S] [-m maxnum] [-d opt] [-n | -p] [-o outfile] searchterm
      >> search for searchterm and display matching entries
  gbwget [-O | -L listfile]
      >> offline mode, always interactive
  gbwget [-h] [-H]   (-h this help, -H usage help)

  Interactive mode is entered when no -u or -l option is given.
  Options -o and -g are meaningless in interactive mode.
  Searchterm can contain the asterisk. Logical AND queries are possible
  by concatenating two strings with '+' (i.e. actinbac*+transferr*).
  Fields to search in can also be specified like "actinobac*[KYWD]".
  Allowed fields:
  ACCN,AUTH,PDAT,ECNO,FKEY,GENE,JOUR,KYWD,MDAT,ORGN,PROP,PROT,SQID,SLEN,SUBS,WORD
  For possible meaning refer to the Entrez documentation at
  http://www.ncbi.nlm.nih.gov:80/Entrez/linking.html and

FETCHING ENTRIES FROM COMMAND LINE
If option -u 'accnum' is used, the program will fetch the given database
entry and print it to standard output, to the given filename, or to a
generated filename consisting of the accnum with an extension appended
(e.g. AX024675.gn, or .fn for fasta). Multiple accnums can be retrieved by
separating them with a comma, without any spaces (e.g.
AA123456,AX34567,..).

With option -U, accession numbers will be read from a file containing one
accession number per line. Option -I works the same way, but the file is a
gbwget generated listfile (where one line holds the complete entry, with
fields separated by +++).

If option -o or -g is given, the entries will be saved in different files.
-o 'myfil' will save the entries in myfil.1.gn, myfil.2.gn, etc. The file
extension is .gn for genbank/genpept format from the nucleotide database,
.fn for fasta format, etc.

-d defines display options:
  "g"  Genbank/Genpept format
  "f"  Fasta
  "m"  Medline links
  "n"  nucleotide links
  "p"  peptide links

-n or -p selects the database to search or retrieve from: -n sets the
nucleotide database and -p the protein database.

DATABASE SELECTION
Prepend the accnum with one of the following:
  embl:|em:|gb:|genbank:|swiss:|sw:|gi:
Example:
  gbwget -u embl:XXU13858
fetches the embl entry for the pGEX5x3 expression vector.

FETCHING SEARCH LISTS
When gbwget is used with option -l and a 'searchterm', it will query genbank
and dump a human readable list to stdout or to outfile. Display options and
database selection work as above. When given option -S, it will display or
save the list in gbwget listfile format. These lists can be reprocessed with
gbwget in offline mode.

OFFLINE MODE
Option -O or -L listfile enters gbwget without prior connection to genbank.

Screenshots (not GUI!)
invoked as: "gbwget actinobacil*+pleuropneu*" (combined search)

Mainscreen:

Searchresults 1 - 22 of 100 (Dopt: g / max: 100 / Fullview / cache: 0)
[  1] - AF363363 Actinobacillus pleuropneumoniae RTX-toxin IIIA gene, complete
[  2] - AF363362 Actinobacillus pleuropneumoniae RTX toxin IIA gene, complete
[  3] - AF363361 Actinobacillus pleuropneumoniae RTX toxin IA gene, complete c
[  4] - AY017472 Actinobacillus pleuropneumoniae HS143 16S ribosomal RNA gene,
[  5] - AF013776 Salmonella typhimurium PagJ (pagJ) and SspH1 (sspH1) genes, c
[  6] - AE005174 Escherichia coli O157:H7, complete genome
[  7] - AE005215 Escherichia coli O157:H7 EDL933 genome, contig 1 of 3, sectio
[  8] - AP002551 Escherichia coli O157:H7 DNA, complete genome, section 2/20
[  9] - AF275732 Actinobacillus pleuropneumoniae KDO transferase (msbB) gene,
[ 10] - AF275731 Actinobacillus pleuropneumoniae DNA helicase (dnaB) gene, par
[ 11] - AF275730 Actinobacillus pleuropneumoniae DNA topoisomerase III (topB)
[ 12] - AF275729 Actinobacillus pleuropneumoniae hypothetical protein gene, co
[ 13] - AF275728 Actinobacillus pleuropneumoniae aminopeptidase gene, partial
[ 14] - AF275727 Actinobacillus pleuropneumoniae hypothetical protein gene, pa
[ 15] - AF275726 Actinobacillus pleuropneumoniae fatty acid CoA ligase gene, c
[ 16] - X99607 A.pleuropneumoniae omlaA gene, partial
[ 17] - AL583918 Mycobacterium leprae strain TN complete genome; segment 2/10
[ 18] - AF167577 Actinobacillus pleuropneumoniae transcriptional regulator Apu
[ 19] - AF143906 Actinobacillus pleuropneumoniae CpxD (cpxD) gene, partial cds
[ 20] - AF143905 Actinobacillus pleuropneumoniae putative LPS biosynthesis pro
[ 21] - AF143904 Actinobacillus pleuropneumoniae putative galactosyl transfera
[ 22] - AF053017 Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate uridy
e(X)it (P)rev. (D)nld. (V)iew (F)ull (G)rep (B)ack (N)ew (O)pt go(T)o f(I)le
(M)ark (U)nmark (H)elp Enter=next page

(V)iew entry 22:

Genbank AccNo: AF053017
Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate
uridylyltransferase (galU) gene, complete cds; and unknown gene.
gi|3372536|gb|AF053017.1|AF053017
Press Enter to go back

(F)ull view

LOCUS       AF053017 1850 bp DNA BCT 10-JAN-2001
DEFINITION  Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate
            uridylyltransferase (galU) gene, complete cds; and unknown gene.
ACCESSION   AF053017
VERSION     AF053017.1 GI:3372536
KEYWORDS    .
SOURCE      Actinobacillus pleuropneumoniae.
  ORGANISM  Actinobacillus pleuropneumoniae
            Bacteria; Proteobacteria; gamma subdivision; Pasteurellaceae;
            Actinobacillus.
REFERENCE   1 (bases 1 to 1850)
  AUTHORS   Rioux,S., Galarneau,C., Harel,J., Frey,J., Nicolet,J.,
            Kobisch,M., Dubreuil,J.D. and Jacques,M.
  TITLE     Isolation and characterization of mini-Tn10 lipopolysaccharide
            mutants of Actinobacillus pleuropneumoniae serotype 1
  JOURNAL   Can. J. Microbiol. 45 (12), 1017-1026 (1999)
  MEDLINE   20161471
  PUBMED    10696481
REFERENCE   2 (bases 1 to 1850)
  AUTHORS   Rioux,S., Harel,J., Frey,J., Nicolet,J., Kobisch,M.,
            Dubreuil,J.D. and Jacques,M.
  TITLE     Direct Submission
Press Q for quit or Enter for next page

(G)rep to select only entries with "putative"

Searchresults 1 - 6 of 6 (Dopt: g / max: 100 / Grepmode / cache: 1)
[  1] - AF167577 Actinobacillus pleuropneumoniae transcriptional regulator Apu
[  2] - AF143905 Actinobacillus pleuropneumoniae putative LPS biosynthesis pro
[  3] - AF143904 Actinobacillus pleuropneumoniae putative galactosyl transfera
[  4] - AF329453 Actinobacillus pleuropneumoniae strain 4074 putative glycosyl
[  5] - AF329452 Actinobacillus pleuropneumoniae putative O-antigen biosynthes
[  6] - AF030523 Sinorhizobium meliloti putative periplasmic iron-binding prot
e(X)it (P)rev. (D)nld. (V)iew (F)ull (G)rep (B)ack (N)ew (O)pt go(T)o f(I)le
(M)ark (U)nmark (H)elp Enter=next page

f(I)le

S: Save list to listfile, M: Save marked entries to listfile
A: Append list to listfile, P: Append marked entries to listfile
L: Load list from listfile
>>

INTEGRATION INTO EMBOSS AS EXTERNAL APP
You need to have a working installation of emboss. To use gbwget as an app,
just edit your .embossrc file and add entries like:

------------------------.embossrc--------------------
DB gb [
    #required parameters
    method: app
    format: genbank
    app: "/home/seb/bin/gbwget -u "
    #optional parameters
    type: N
    comment: "(gb) gbwget in genbank format"
]
DB embl [
    #required parameters
    method: app
    format: genbank
    app: "/home/seb/bin/gbwget -u "
    #optional parameters
    type: N
    comment: "(embl) gbwget genbank format"
]
DB swiss [
    #required parameters
    method: app
    format: genbank
    app: "/home/seb/bin/gbwget -u "
    #optional parameters
    type: P
    comment: "(swiss) gbwget genbank format"
]
-----------------------------------------------------

Since the emboss applications start the external app with a parameter of the
form "gb:XXXXXXX" for a USA of ::gb:XXXXXXX, you have to use the database
names in the above form, or you need to change code in gbwget.
Some programs of the emboss suite, such as textsearch, might not work with
gbwget; however, gbwget offers similar features.
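
The "database:accnum" form that emboss passes to gbwget can be illustrated
with a short sketch in Python. This is not gbwget's own code; it only shows
how an identifier such as gb:XXXXXXX splits into one of the documented
prefixes and an accession number. The name split_identifier is made up for
this illustration, and matching the prefixes in lower case only is an
assumption.

```python
# Database prefixes documented under DATABASE SELECTION.
DB_PREFIXES = ("embl", "em", "gb", "genbank", "swiss", "sw", "gi")


def split_identifier(identifier):
    """Split 'embl:XXU13858' into ('embl', 'XXU13858').

    Returns (None, identifier) when there is no known prefix, i.e. for a
    plain accession number.
    """
    prefix, sep, rest = identifier.partition(":")
    if sep and rest and prefix in DB_PREFIXES:
        return prefix, rest
    return None, identifier


if __name__ == "__main__":
    for ident in ("embl:XXU13858", "gb:AF053017", "AF053017"):
        print(ident, "->", split_identifier(ident))
```

This is why the DB names in .embossrc have to match the prefixes above: a
database called, say, "mydb" would arrive as a prefix that is not in the
documented list.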
BUGS
I hope not so many. And Perl experts: please do not flame me for writing
such spaghetti code, inefficient code or whatever. It was done partly as an
experiment with Perl.
Please send bug reports to me: Sebastian.Bunka@vu-wien.ac.at
Have fun!