
gbwget 0.3.5

gbwget is (C) 2001 by Sebastian Bunka. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Please note: this program is beta software, so use it at your own risk! If you get wrong data for your research I'm not responsible for this! From the LICENSE file:


			    NO WARRANTY

   BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

ABOUT
gbwget is a command line/screen oriented tool to search nucleotide or protein databases and to view or retrieve database entries using the Entrez server at http://www.ncbi.nlm.nih.gov. It is intended as a sequence retrieval method for EMBOSS (The European Molecular Biology Open Software Suite, see: http://www.uk.embnet.org/Software/EMBOSS/index.html), an alternative to the GCG sequence analysis suite. gbwget can also be used standalone, but web-based retrieval systems might be more convenient.

REQUIREMENTS
gbwget is written in Perl and should run on any computer system with Perl installed. However, it depends on the wget program, which might not be available on some systems (Microsoft products?); see: http://www.gnu.org/software/wget/wget.html.

FEATURES
gbwget can be used as a command line tool to a) fetch single nucleotide or protein database entries, b) fetch several entries at a time, c) fetch many entries from a list of accession numbers read from a regular text file or from its own listfile. It outputs the entries to standard out or into separate files, in genbank or fasta format. It can also search for keywords in the respective databases and print a list of entries to stdout or into a file in its own list format, which can be used offline to select/view entries, append to other listfiles, retrieve complete list entries etc. Moreover, in offline or interactive mode one can page through the lists, mark/unmark entries, save, read and append to lists, download selectable ranges and perform new searches. Searches can be restricted to specific database fields, and wildcards and multiple terms can be used (e.g. actinobac*+pleuropneu*).

From the online help:
gbwget 0.3.5
(C) 2001 by Sebastian Bunka 
under the terms of the GNU General Public License

usage: gbwget [-u accnum | -U file | -I file] [-o outfile] [-g] 
              [-m maxnum] [-d opt] [-n | -p] 
              >> get Genbank entries
          
       gbwget [-l] [-S] [-m maxnum] [-d opt] [-n | -p] [-o outfile]
              searchterm
              >> search for searchterm and display matching entries

       gbwget [-O | -L listfile]
              >> offline mode, always interactive

       gbwget [-h] [-H] (-h this help, -H usage help)


Interactive mode is entered when no -u or -l option is given. Options -o and
-g are meaningless in interactive mode.

Searchterm can contain the asterisk as a wildcard. Logical AND queries are
possible by concatenating two strings with '+' (e.g. actinobac*+transferr*).

Fields to search in can also be specified like "actinobac*[KYWD]".
Allowed fields:
ACCN,AUTH,PDAT,ECNO,FKEY,GENE,JOUR,KYWD,MDAT,ORGN,PROP,PROT,SQID,SLEN,SUBS,WORD
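
The search term syntax described above (wildcards, '+' for AND, an optional
[FIELD] tag) can be assembled programmatically. The following Python sketch
is not part of gbwget; build_searchterm is a hypothetical helper that only
builds the string. It assumes that a field tag may be combined with '+',
which the help text does not state explicitly.

```python
# Field tags copied from the "Allowed fields" list in the gbwget help text.
ALLOWED_FIELDS = frozenset(
    "ACCN AUTH PDAT ECNO FKEY GENE JOUR KYWD MDAT ORGN PROP PROT "
    "SQID SLEN SUBS WORD".split()
)


def build_searchterm(*parts):
    """Join search parts into one AND query string.

    Each part is either a plain string such as "actinobac*" or a
    (text, field) tuple such as ("actinobac*", "KYWD").
    Hypothetical helper: it only assembles the string, it does not
    contact Entrez or call gbwget.
    """
    pieces = []
    for part in parts:
        text, field = (part, None) if isinstance(part, str) else part
        # '+' is the AND separator and spaces would split the argument.
        if not text or " " in text or "+" in text:
            raise ValueError(f"unsupported term text: {text!r}")
        if field is None:
            pieces.append(text)
        elif field.upper() in ALLOWED_FIELDS:
            pieces.append(f"{text}[{field.upper()}]")
        else:
            raise ValueError(f"unknown field tag: {field!r}")
    if not pieces:
        raise ValueError("at least one term is required")
    return "+".join(pieces)


print(build_searchterm("actinobacil*", "pleuropneu*"))
print(build_searchterm(("actinobac*", "KYWD")))
```

On a shell command line the resulting term should be quoted, since '*' and
'[' are special characters for most shells.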

For possible meaning refer to the Entrez documentation at
http://www.ncbi.nlm.nih.gov:80/Entrez/linking.html

and

FETCHING ENTRIES FROM COMMAND LINE 
If option -u 'accnum' is used, the program will fetch the given db entry and
print it to stdout, to the given filename, or to a generated filename
consisting of the accnum with an extension appended (e.g. AX024675.gn, or
.fn for fasta). Multiple accnums can be retrieved by separating them with
a comma, without any spaces (e.g. AA123456,AX34567,..). With option -U,
accession numbers are read from a file containing one accession number per
line. Option -I works the same way, but the file is a gbwget-generated
listfile (where one line holds the complete entry with fields separated by
+++). If option -o or -g is given, the entries are saved in separate files:
-o 'myfil' saves the entries in myfil.1.gn, myfil.2.gn etc. The file
extension is .gn for genbank/genpept format, nucleotide database; fasta
format would be .fn etc. -d defines display options: "g" Genbank/Genpept
format, "f" Fasta, "m" Medline links, "n" nucleotide links, "p" peptide
links. -n or -p selects the database to search/retrieve from: -n sets the
nucleotide database and -p the protein database.
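
For scripted use, the options above can be put together as an argument
list. This is a minimal Python sketch, not part of gbwget: fetch_command
and expected_outfiles are hypothetical helpers, and it is an assumption
that -d takes its value as a separate argument. The extension rule
(display letter followed by database letter) is only documented for .gn and
.fn; other combinations are a guess.

```python
DISPLAY_OPTIONS = {"g", "f", "m", "n", "p"}  # values listed for -d
DATABASES = {"n", "p"}                       # -n nucleotide, -p protein


def fetch_command(accnums, outfile=None, display="g", database="n",
                  program="gbwget"):
    """Build the argv list for fetching entries with -u (hypothetical helper)."""
    accnums = list(accnums)
    if not accnums:
        raise ValueError("at least one accession number is required")
    for acc in accnums:
        # accnums are joined with ',' and must not contain spaces
        if not acc or "," in acc or any(ch.isspace() for ch in acc):
            raise ValueError(f"invalid accession number: {acc!r}")
    if display not in DISPLAY_OPTIONS:
        raise ValueError(f"unknown display option: {display!r}")
    if database not in DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    cmd = [program, "-u", ",".join(accnums), "-d", display, "-" + database]
    if outfile is not None:
        cmd += ["-o", outfile]
    return cmd


def expected_outfiles(base, count, display="g", database="n"):
    """File names suggested by the help text: base.1.gn, base.2.gn, ...

    Assumption: the extension is the display letter plus the database letter.
    """
    return [f"{base}.{i}.{display}{database}" for i in range(1, count + 1)]


print(fetch_command(["AA123456", "AX34567"], outfile="myfil"))
print(expected_outfiles("myfil", 2))
```

The list returned by fetch_command can be passed to subprocess.run on a
system where gbwget and wget are installed.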

DATABASE SELECTION
Prepend the accnum with one of the following:
embl:|em:|gb:|genbank:|swiss:|sw:|gi:
Example: gbwget -u embl:XXU13858 fetches the embl entry for the pGEX5x3
expression vector. 
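
How such a prefixed accnum breaks down can be sketched in a few lines of
Python. This is an illustration only, not gbwget's actual (Perl) code;
split_db_prefix is a hypothetical name, and exact lowercase matching of the
prefix is an assumption.

```python
# Prefixes copied from the list above, without the trailing colon.
DB_PREFIXES = ("embl", "em", "gb", "genbank", "swiss", "sw", "gi")


def split_db_prefix(accnum):
    """Split 'embl:XXU13858' into ('embl', 'XXU13858').

    Returns (None, accnum) when no known prefix is present.
    """
    prefix, sep, rest = accnum.partition(":")
    if sep and prefix in DB_PREFIXES:
        if not rest:
            raise ValueError(f"missing accession number after {prefix!r}:")
        return prefix, rest
    return None, accnum


print(split_db_prefix("embl:XXU13858"))
print(split_db_prefix("AF053017"))
```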

FETCHING SEARCH LISTS
When using gbwget with option -l and 'searchterm' it will query genbank and
dump a human readable list to stdout or outfile. Display options and
database selection work as above. When given option -S it will display/save
in gbwget listfile format. These lists can be reprocessed with gbwget in
offline mode.
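
Listfiles can also be read by other scripts. The only thing this page
states about the format is that one line holds one entry with fields
separated by +++, so the Python sketch below does nothing more than split
on that separator. The function name, the sample line and the order of its
fields are made up for illustration.

```python
FIELD_SEPARATOR = "+++"  # the one detail of the format that is documented


def parse_listfile(lines):
    """Yield one list of fields per non-empty listfile line (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        yield line.split(FIELD_SEPARATOR)


# Hypothetical sample line; the real field order may differ.
sample = [
    "AF053017+++Actinobacillus pleuropneumoniae galU gene+++gi|3372536\n",
    "\n",
]
for fields in parse_listfile(sample):
    print(fields)
```

For a real file, pass an open file object, e.g. parse_listfile(open(path)).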

OFFLINE MODE
Option -O or -L listfile enters gbwget without prior connection to genbank.

Screenshots (not GUI!)
invoked as: "gbwget actinobacil*+pleuropneu*" (combined search)

Mainscreen:

Searchresults 1 -  22 of 100 (Dopt: g / max: 100 / Fullview / cache: 0)
[   1] - AF363363  Actinobacillus pleuropneumoniae RTX-toxin IIIA gene, complete
[   2] - AF363362  Actinobacillus pleuropneumoniae RTX toxin IIA gene, complete 
[   3] - AF363361  Actinobacillus pleuropneumoniae RTX toxin IA gene, complete c
[   4] - AY017472  Actinobacillus pleuropneumoniae HS143 16S ribosomal RNA gene,
[   5] - AF013776  Salmonella typhimurium PagJ (pagJ) and SspH1 (sspH1) genes, c
[   6] - AE005174  Escherichia coli O157:H7, complete genome
[   7] - AE005215  Escherichia coli O157:H7 EDL933 genome, contig 1 of 3, sectio
[   8] - AP002551  Escherichia coli O157:H7 DNA, complete genome, section 2/20
[   9] - AF275732  Actinobacillus pleuropneumoniae KDO transferase (msbB) gene, 
[  10] - AF275731  Actinobacillus pleuropneumoniae DNA helicase (dnaB) gene, par
[  11] - AF275730  Actinobacillus pleuropneumoniae DNA topoisomerase III (topB) 
[  12] - AF275729  Actinobacillus pleuropneumoniae hypothetical protein gene, co
[  13] - AF275728  Actinobacillus pleuropneumoniae aminopeptidase gene, partial 
[  14] - AF275727  Actinobacillus pleuropneumoniae hypothetical protein gene, pa
[  15] - AF275726  Actinobacillus pleuropneumoniae fatty acid CoA ligase gene, c
[  16] - X99607    A.pleuropneumoniae omlaA gene, partial
[  17] - AL583918  Mycobacterium leprae strain TN complete genome; segment 2/10
[  18] - AF167577  Actinobacillus pleuropneumoniae transcriptional regulator Apu
[  19] - AF143906  Actinobacillus pleuropneumoniae CpxD (cpxD) gene, partial cds
[  20] - AF143905  Actinobacillus pleuropneumoniae putative LPS biosynthesis pro
[  21] - AF143904  Actinobacillus pleuropneumoniae putative galactosyl transfera
[  22] - AF053017  Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate uridy
 e(X)it (P)rev. (D)nld. (V)iew (F)ull (G)rep (B)ack (N)ew (O)pt go(T)o f(I)le
 (M)ark (U)nmark (H)elp Enter=next page 

(V)iew entry 22:

Genbank AccNo: AF053017
Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate 
uridylyltransferase (galU) gene, complete cds; and unknown gene.
gi|3372536|gb|AF053017.1|AF053017

Press Enter to go back

(F)ull view

LOCUS       AF053017     1850 bp    DNA             BCT       10-JAN-2001
DEFINITION  Actinobacillus pleuropneumoniae UTP-glucose-1-phosphate
            uridylyltransferase (galU) gene, complete cds; and unknown gene.
ACCESSION   AF053017
VERSION     AF053017.1  GI:3372536
KEYWORDS    .
SOURCE      Actinobacillus pleuropneumoniae.
  ORGANISM  Actinobacillus pleuropneumoniae
            Bacteria; Proteobacteria; gamma subdivision; Pasteurellaceae;
            Actinobacillus.
REFERENCE   1  (bases 1 to 1850)
  AUTHORS   Rioux,S., Galarneau,C., Harel,J., Frey,J., Nicolet,J., Kobisch,M.,
            Dubreuil,J.D. and Jacques,M.
  TITLE     Isolation and characterization of mini-Tn10 lipopolysaccharide
            mutants of Actinobacillus pleuropneumoniae serotype 1
  JOURNAL   Can. J. Microbiol. 45 (12), 1017-1026 (1999)
  MEDLINE   20161471
   PUBMED   10696481
REFERENCE   2  (bases 1 to 1850)
  AUTHORS   Rioux,S., Harel,J., Frey,J., Nicolet,J., Kobisch,M., Dubreuil,J.D.
            and Jacques,M.
  TITLE     Direct Submission

Press Q for quit or Enter for next page 


(G)rep to select only entries with "putative"

Searchresults 1 -  6 of 6 (Dopt: g / max: 100 / Grepmode / cache: 1)
[   1] - AF167577  Actinobacillus pleuropneumoniae transcriptional regulator Apu
[   2] - AF143905  Actinobacillus pleuropneumoniae putative LPS biosynthesis pro
[   3] - AF143904  Actinobacillus pleuropneumoniae putative galactosyl transfera
[   4] - AF329453  Actinobacillus pleuropneumoniae strain 4074 putative glycosyl
[   5] - AF329452  Actinobacillus pleuropneumoniae putative O-antigen biosynthes
[   6] - AF030523  Sinorhizobium meliloti putative periplasmic iron-binding prot



 e(X)it (P)rev. (D)nld. (V)iew (F)ull (G)rep (B)ack (N)ew (O)pt go(T)o f(I)le
 (M)ark (U)nmark (H)elp Enter=next page 

f(I)le

S: Save list to listfile,
M: Save marked entries to listfile
A: Append list to listfile,
P: Append marked entries to listfile
L: Load list from listfile
>> 


INTEGRATION INTO EMBOSS AS EXTERNAL APP
You need to have a working installation of EMBOSS. To use gbwget as an app,
just edit your .embossrc file and add entries like:
------------------------.embossrc--------------------
DB gb [
#required parameters
method: app
format: genbank
app: "/home/seb/bin/gbwget -u "
#optional parameters
type: N
comment: "(gb) gbwget in genbank format"
]

DB embl [
#required parameters
method: app
format: genbank
app: "/home/seb/bin/gbwget -u "
#optional parameters
type: N
comment: "(embl) gbwget genbank format"
]

DB swiss [
#required parameters
method: app
format: genbank
app: "/home/seb/bin/gbwget -u "
#optional parameters
type: P
comment: "(swiss) gbwget genbank format"
]
-----------------------------------------------------

Since the EMBOSS applications start the external app with a parameter of the
form "gb:XXXXXXX" for a USA (Uniform Sequence Address) of ::gb:XXXXXXX, you
have to use the database names exactly as above (gb, embl, swiss), or you
need to change the code in gbwget.
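
Why the database names matter can be sketched as follows: the name used in
the DB block ends up as the prefix that gbwget has to recognize. This
Python snippet is only a model of the behaviour described on this page;
external_app_argv is a hypothetical name, and the way EMBOSS really
composes the command line is an assumption here.

```python
import shlex

# Prefixes gbwget understands (see DATABASE SELECTION), without the colon.
GBWGET_PREFIXES = ("embl", "em", "gb", "genbank", "swiss", "sw", "gi")


def external_app_argv(app, db_name, entry_id):
    """Model the call for a USA of ::db_name:entry_id (hypothetical helper)."""
    if db_name not in GBWGET_PREFIXES:
        raise ValueError(
            f"DB name {db_name!r} is not a prefix gbwget recognizes")
    # The 'app:' value from .embossrc followed by 'db:entry'.
    return shlex.split(app) + [f"{db_name}:{entry_id}"]


print(external_app_argv("/home/seb/bin/gbwget -u ", "gb", "AF053017"))
```

A DB block named, say, "mydb" would lead to a parameter "mydb:AF053017",
which is not in the prefix list, hence the restriction on names.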

Some programs of the EMBOSS suite, such as textsearch, might not work with
gbwget; however, gbwget offers similar features.

BUGS
I hope not so many. And Perl experts: please do not flame me for writing spaghetti code, inefficient code or whatever. It was written partly as an experiment with Perl.

Please send bug reports to me: Sebastian.Bunka@vu-wien.ac.at

Have fun!