Discussion:
Creating a refGene.txt file
Shaun Tyler
2009-08-06 22:59:41 UTC
Permalink
I would like to make better use of the 454 software for mapping genome
variations and in order to do that it requires annotations to be in the
refGene.txt format. Currently with just a reference sequence we get
variation coordinates but coding changes are what we’re ultimately after.
Our work is mainly with bacterial pathogens so I assume we’ll have to
create these annotation files ourselves. Is a script available for
converting a standard GenBank file (or other format) into refGene.txt
format? Or are there any suggestions on how to go about this? Our
bioinformatics guys are stretched pretty thin, so if there is a ready-made
solution out there I'd rather not bug them for this. Thanks.
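
As a rough illustration of the first step such a conversion script would need, the sketch below turns a simple GenBank feature location into the 0-based, half-open coordinates used by genePred-style files. The function name `parse_simple_location` is hypothetical, and only plain and `complement(...)` locations are handled; `join(...)` and partial (`<`, `>`) locations are left out on the assumption that most bacterial CDS features are unspliced.

```python
import re

# Only the two simplest GenBank location forms are recognised here.
_PLAIN = re.compile(r"^(\d+)\.\.(\d+)$")
_COMPLEMENT = re.compile(r"^complement\((\d+)\.\.(\d+)\)$")


def parse_simple_location(location):
    """Return (start0, end, strand) for a simple GenBank location string.

    GenBank coordinates are 1-based and inclusive. The result is 0-based
    and half-open, so only the start is shifted down by one.
    """
    text = location.strip()
    for pattern, strand in ((_PLAIN, "+"), (_COMPLEMENT, "-")):
        match = pattern.match(text)
        if match is not None:
            start, end = int(match.group(1)), int(match.group(2))
            if start < 1 or end < start:
                raise ValueError(f"invalid coordinates: {location!r}")
            return start - 1, end, strand
    # join(...), order(...), and partial locations end up here.
    raise ValueError(f"unsupported location: {location!r}")
```

For example, `parse_simple_location("complement(337..2799)")` gives `(336, 2799, "-")`.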

Shaun


***********************************************
Shaun Tyler
Head, DNA Core Facility and
International Depositary Authority of Canada
National Microbiology Laboratory
Public Health Agency of Canada
Canadian Science Centre for Human and Animal Health
1015 Arlington St., Suite H3130
Winnipeg, MB R3E 3R2
Ph: 204-789-6030
Fax: 204-789-2018
EMail: ***@phac-aspc.gc.ca

_______________________________________________
Genome maillist - ***@soe.ucsc.edu
https://list
Hiram Clawson
2009-08-07 02:56:50 UTC
Permalink
Good Evening Shaun:

Can you give a pointer to the definition of a refGene.txt file?
I'm not sure it is clear what this means as a format.

--Hiram
Brooke Rhead
2009-08-07 07:27:02 UTC
Permalink
Hi Shaun,

By refGene.txt file do you mean that you need to get your file into our
genePred format (described here:
http://genome.ucsc.edu/FAQ/FAQformat#format9)? We do not have anything
that will pull annotations out of GenBank entries and get them into
genePred format, but there are some tools in the Genome Browser source
tree to convert from GTF and GFF to genePred.
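
One way to get from GenBank-style CDS coordinates to input for these converters is to write a minimal GTF file first. The sketch below is only an illustration under stated assumptions: the function name `gtf_lines_for_cds`, the `source` value, and the example identifiers are hypothetical, each gene is treated as a single unspliced exon, and the gene ID is reused as the transcript ID.

```python
def gtf_lines_for_cds(seqname, gene_id, start, end, strand, source="genbank"):
    """Return GTF 'exon' and 'CDS' lines for one unspliced gene.

    start and end are 1-based and inclusive, as in both GenBank and GTF.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    # Assumption: one transcript per gene, so both IDs are the same.
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    lines = []
    # The exon line has no frame; the single CDS segment starts in frame 0.
    for feature, frame in (("exon", "."), ("CDS", "0")):
        fields = [seqname, source, feature, str(start), str(end),
                  ".", strand, frame, attributes]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical sequence and gene names, for illustration only.
    print("\n".join(gtf_lines_for_cds("NC_EXAMPLE", "geneA", 190, 255, "+")))
```

GenBank CDS features normally include the stop codon, whereas GTF input is expected to keep the stop codon out of the CDS, so the CDS end (or start, on the minus strand) may need a three-base adjustment before conversion.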

"gtfToGenePred" converts a GTF file to a genePred file. "ldHgGene" with
the "-out=someFile" option can convert a GFF to a genePred. You can see
the usage statements by typing them at the command line:


$ gtfToGenePred
gtfToGenePred - convert a GTF file to a genePred
usage:
   gtfToGenePred gtf genePred

options:
   -genePredExt - create a extended genePred, including frame
                  information and gene name
   -allErrors - skip groups with errors rather than aborting.
                Useful for getting infomation about as many errors as possible.
   -infoOut=file - write a file with information on each transcript
   -sourcePrefix=pre - only process entries where the source name has the
                       specified prefix. May be repeated.
   -impliedStopAfterCds - implied stop codon in after CDS


$ ldHgGene
ldHgGene - load database with gene predictions from a gff file.
usage:
   ldHgGene database table file(s).gff
options:
   -bin          Add bin column (now the default)
   -nobin        don't add binning (you probably don't want this)
   -exon=type    Sets type field for exons to specific value
   -oldTable     Don't overwrite what's already in table
   -noncoding    Forces whole prediction to be UTR
   -gtf          input is GTF, stop codon is not in CDS
   -predTab      input is already in genePredTab format
   -requireCDS   discard genes that don't have CDS annotation
   -out=gpfile   write output, in genePred format, instead of loading
                 table. Database is ignored.
   -genePredExt  create a extended genePred, including frame
                 information and gene name
   -impliedStopAfterCds - implied stop codon in GFF/GTF after CDS


I hope one of these will work for you. The source tree is available
free for academic and non-profit use:
http://hgdownload.cse.ucsc.edu/downloads.html#SOURCE_DOWNLOADS
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Shaun Tyler
2009-08-07 13:52:41 UTC
Permalink
Thanks Brooke.

This may come as a surprise, but the 454 software manual is not actually
overflowing with information about this :-( Regarding the genome
annotation file, they simply state: “The format of this file must match that
of the GoldenPath ‘refGene.txt’ file.” However, after much Googling I was
able to discover that yes, this is the genePred format, so maybe your
suggestion will be of use. Unfortunately I think I’m going to have to run
this by our bioinformatics guys after all. Once you get beyond clicking a
mouse my computer skills are severely limited ;-)
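
For anyone writing the file by hand, the sketch below builds one refGene.txt-style row for an unspliced gene. It assumes the UCSC refGene table layout (a bin column followed by the extended genePred fields) and the standard UCSC binning scheme; whether the 454 software actually reads the bin column is not stated in this thread. The function names and example identifiers are hypothetical.

```python
def ucsc_bin(start0, end):
    """Smallest standard UCSC bin containing the 0-based range [start0, end)."""
    offsets = (585, 73, 9, 1, 0)  # bin-level offsets, finest (128 kb) first
    start_bin = start0 >> 17
    end_bin = (end - 1) >> 17
    for offset in offsets:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= 3
        end_bin >>= 3
    raise ValueError("range too large for the standard binning scheme")


def refgene_row(name, chrom, strand, start0, end):
    """Return one tab-separated row for a single-exon, fully coding gene.

    start0 is 0-based and end is exclusive, as in genePred.
    """
    fields = [
        ucsc_bin(start0, end),  # bin
        name, chrom, strand,
        start0, end,            # txStart, txEnd
        start0, end,            # cdsStart, cdsEnd (whole gene is coding)
        1,                      # exonCount
        f"{start0},",           # exonStarts, with trailing comma
        f"{end},",              # exonEnds, with trailing comma
        0,                      # score
        name,                   # name2, reusing the gene name
        "cmpl", "cmpl",         # cdsStartStat, cdsEndStat
        "0,",                   # exonFrames
    ]
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    # Hypothetical names and coordinates, for illustration only.
    print(refgene_row("geneA", "NC_EXAMPLE", "+", 189, 255))
```

Each row has 16 columns; dropping the first (bin) column leaves an extended genePred line.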

Shaun


Ann Zweig
2009-08-07 15:17:07 UTC
Permalink
Hello Shaun,

Good luck with that and be sure to write back to this list if you run into any more problems.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu