g***@soe.ucsc.edu
2014-09-18 17:19:27 UTC
=============================================================================
Today's topic summary
=============================================================================
Group: ***@soe.ucsc.edu
Url:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/topics
- UCSC and problem with ClinVar / LOVD variants [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/8606b5b68c73797e
- GENCODE gtf [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cc7fa5fe83720933
- Liftover [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/747458544e5b6f76
- BLAT [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/3d0f97c5b749a8ed
- blastz-run-ucsc ignores -dropSelf [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/29af0d253b94a7f1
=============================================================================
Topic: UCSC and problem with ClinVar / LOVD variants
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/8606b5b68c73797e
=============================================================================
---------- 1 of 1 ----------
From: Konstantinos Varvagiannis <***@unige.ch>
Date: Sep 18 05:01PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/f56041185d9f1a7c
Dear Colleague,
I am writing about a problem with the UCSC representation of ClinVar and LOVD variants.
I might be doing something wrong, but when I search for a variant that appears in both ClinVar (here<http://www.ncbi.nlm.nih.gov/clinvar/variation/67020/>) and the LOVD database (here<http://www.genomed.org/lovd2/variants.php?select_db=KCNQ1&action=view&view=0006234%2C0000634%2C0>), I cannot see it in the UCSC browser (link to the graphic here<http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr11%3A2608835-2608865&hgsid=389271325_NIos3iyfZUvfaXKsvVN2C4YqQfyt>). This happens even though both the ClinVar and LOVD tracks are set to dense. For your convenience, I am also sending the graphic in PDF format.
Could you please help me with this matter?
Thank you in advance for your time and your consideration.
Sincerely,
Konstantinos Varvagiannis
University of Geneva
Konstantinos VARVAGIANNIS
Médecin Interne
Département de Génétique Médicale
Hôpitaux Universitaires de Genève
Tél. +41 (0) 22 37 95 722 / +41 (0) 79 55 33 511
=============================================================================
Topic: GENCODE gtf
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cc7fa5fe83720933
=============================================================================
---------- 1 of 1 ----------
From: "Trakhtenberg, Feliks" <***@childrens.harvard.edu>
Date: Sep 18 05:17AM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/6c2ef19898567987
Hello,
Do you know the expected date when the Gencode M3 track will become available in the Table Browser?
I need it in order to use the intersection tool to identify UCSC Genes entries that do not overlap with Gencode M3. I presume that uploading the Gencode M3 GTF as a custom track would be problematic, because if it were as simple as that, I guess it would already be available in the Table Browser. Is this so? Or might uploading it as a custom track work for my purposes?
Thank you,
Ephraim
________________________________
From: Steve Heitner [***@soe.ucsc.edu]
Sent: Thursday, September 11, 2014 1:06 PM
To: Trakhtenberg, Feliks; 'Jonathan Casper'
Cc: ***@soe.ucsc.edu
Subject: RE: [genome] GENCODE gtf
Hello, Ephraim.
There is no specific order in a GTF file, so it should not be a problem to cat both files into a single file. Regarding the gene symbols being a part of the GTF output, this is a limitation of the way the Table Browser creates GTF output. If you would like the gene symbols to be a part of your GTF files, it will require some scripting on your part. We cannot provide advice on creating a script, but if you would like to use the Table Browser to provide output that will equate transcript ID to gene symbol and RefSeq ID for use in your script, you can follow these instructions:
For GENCODE:
1. As your output format, select "selected fields from primary and related tables"
2. Click the "get output" button
3. In the "Select Fields from mm10.wgEncodeGencodeCompVM2" section, check the "name" and "name2" checkboxes
4. Click the "get output" button
For UCSC Genes:
1. As your output format, select "selected fields from primary and related tables"
2. Click the "get output" button
3. In the "Select Fields from mm10.knownGene" section, check the "name" checkbox
4. In the "mm10.kgXref fields" section, check the "geneSymbol" and "refseq" checkboxes
5. Click the "get output" button
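The scripting step mentioned above could look roughly like the following Python sketch. It assumes the Table Browser output was saved as tab-separated text with the transcript ID in the first column and the gene symbol in the second, and that header lines start with "#". The function names, the gene_name attribute key, and the placeholder symbol "GeneA" are illustrative assumptions, not part of any UCSC tool.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def load_symbols(lines):
    """Build a transcript ID -> gene symbol map from tab-separated lines.

    Assumption: column 1 is the transcript ID and column 2 the gene symbol.
    Blank lines and header lines starting with '#' are skipped.
    """
    symbols = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            symbols[fields[0]] = fields[1]
    return symbols


def add_gene_name(gtf_line, symbols):
    """Append gene_name "<symbol>"; to a GTF line when the symbol is known."""
    if gtf_line.startswith("#"):
        return gtf_line
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return gtf_line  # not a nine-column GTF record; leave it untouched
    match = TRANSCRIPT_ID.search(fields[8])
    if match and match.group(1) in symbols and "gene_name" not in fields[8]:
        attrs = fields[8].rstrip()
        if not attrs.endswith(";"):
            attrs += ";"
        fields[8] = '%s gene_name "%s";' % (attrs, symbols[match.group(1)])
    return "\t".join(fields) + "\n"


# Example with made-up coordinates and a placeholder symbol:
# symbols = load_symbols(open("knownGene_symbols.txt"))   # hypothetical file
# for line in open("ucsc_genes.gtf"):                     # hypothetical file
#     print(add_gene_name(line, symbols), end="")
```

Lines whose transcript ID is missing from the mapping are passed through unchanged, so the output still has one line per input line.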
Please contact us again at ***@soe.ucsc.edu if you have any further questions. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genome-***@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: Trakhtenberg, Feliks [mailto:***@childrens.harvard.edu]
Sent: Sunday, September 07, 2014 1:11 PM
To: Jonathan Casper
Cc: ***@soe.ucsc.edu
Subject: RE: [genome] GENCODE gtf
Hello,
Thank you for the advice. My goal is to predict novel genes/transcripts. I would like to compile a comprehensive mouse GTF, so that it does not turn out that the novel transcripts I find in my RNA-seq data have already been predicted in some major database. So, I thought that merging Gencode and UCSC Genes would provide such a comprehensive set. Please let me know if this is insufficient.
Using the intersection tool you recommended below, even with the no-overlap selection, there are about 8k UCSC Genes transcripts that are not in Gencode. Does the Table Browser have an option for merging these entries with the Gencode GTF? If not, would the command "cat out.gtf0[0-1] > merged.gtf" produce a GTF that is compatible with the Table Browser?
The UCSC Genes GTF produced by the Table Browser reports gene and transcript IDs like this: gene_id "uc007aet.1"; transcript_id "uc007aet.1". However, it does not add the original database accession (e.g., RefSeq) or the gene name to the entry. The Gencode GTF from the Table Browser is also missing the gene names. How can I have the original database IDs and the gene names included in the UCSC Genes GTF produced by the Table Browser, and the gene names included in the Gencode GTF from the Table Browser?
Thanks,
Ephraim
________________________________
From: Jonathan Casper [***@soe.ucsc.edu]
Sent: Thursday, August 14, 2014 9:10 PM
To: Trakhtenberg, Feliks
Cc: ***@soe.ucsc.edu
Subject: Re: [genome] GENCODE gtf
Hello Ephraim,
Our engineers comment that it is difficult to advise you on how to combine gene sets without knowing what you're trying to accomplish specifically. Different gene sets use different predictive models, making it hard to combine them in a scientifically meaningful way.
That said, you can use the UCSC Table Browser intersection tool to get a list of entries found in UCSC Genes but not in GENCODE.
1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables
2. Use the following settings
clade: Mammal
genome: Mouse
assembly: Dec. 2011 (GRCm38/mm10)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: genome
3. Click the "intersection: create" button
4. On the "Intersect with UCSC Genes" page, set the following options:
group: Genes and Gene Predictions
track: GENCODE Genes VM2 (or V3, after it is released)
table: Basic (wgEncodeGencodeBasicVM2)
If you decide after reading the GENCODE track page that the Comprehensive table would be more useful to you, that is also an option.
5. Choose to return "All UCSC Genes records that have no overlap with GENCODE Genes VM2"
Note that the "no overlap" requirement here is fairly strict. You may wish to instead restrict to UCSC Genes records with no more than 50% overlap, for example, depending on your needs.
6. Click "submit" to return to the main Table Browser page
Note that the output format has been changed to BED. You can leave it that way or change it to GTF output. Just remember that the GTF output of the UCSC Table Browser will not exactly match the format of your GENCODE GTF file.
7. Click "get output"
We also have command line tools that will perform this kind of operation, but they are not designed to work with GTF files. If you would like to explore this alternative, the relevant programs are called "featureBits" and "overlapSelect". They are available as part of the kent utilities on our download server at http://hgdownload.soe.ucsc.edu/. We provide precompiled binaries for these utilities at http://hgdownload.soe.ucsc.edu/admin/exe/, but only for a few computer architectures. You may need to download the source code and compile these tools yourself if your computer is not listed there. You can run each program by itself on a command line with no arguments to see a description of how to use it.
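To make the "no overlap" and "no more than 50% overlap" selections in the steps above concrete, here is a minimal Python sketch. It is not how the Table Browser, featureBits, or overlapSelect are implemented. It compares whole record spans only (no exon structure, no strand), uses a naive scan over all targets on the same chromosome, and all function names and example coordinates are hypothetical.

```python
from collections import defaultdict


def overlap_fraction(start, end, others):
    """Fraction of the half-open interval [start, end) covered by `others`.

    `others` is a list of (start, end) pairs; overlapping pairs are only
    counted once because `cursor` tracks the rightmost base already covered.
    """
    if end <= start:
        return 0.0
    covered = 0
    cursor = start
    for o_start, o_end in sorted(others):
        o_start = max(o_start, cursor)
        o_end = min(o_end, end)
        if o_end > o_start:
            covered += o_end - o_start
            cursor = o_end
    return covered / (end - start)


def select_low_overlap(queries, targets, max_overlap=0.0):
    """Keep (chrom, start, end, name) queries covered by at most max_overlap.

    max_overlap=0.0 corresponds to a strict "no overlap" selection;
    max_overlap=0.5 keeps records with no more than 50% overlap.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end, name in queries:
        if overlap_fraction(start, end, by_chrom[chrom]) <= max_overlap:
            kept.append((chrom, start, end, name))
    return kept


# Made-up BED-style records (0-based, half-open coordinates).
ucsc_genes = [("chr1", 100, 200, "a"), ("chr1", 300, 400, "b"), ("chr2", 100, 200, "c")]
gencode = [("chr1", 150, 250), ("chr1", 390, 500)]
strict = select_low_overlap(ucsc_genes, gencode)
relaxed = select_low_overlap(ucsc_genes, gencode, max_overlap=0.5)
```

With these inputs, only record "c" survives the strict selection, while all three records pass the 50% threshold.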
As for your other question, RefSeq is a curated set of transcripts drawn from GenBank. As with GenBank, it is quite possible that some RefSeq transcripts are not represented in GENCODE.
I hope this is helpful. If you have any further questions, please reply to ***@soe.ucsc.edu or genome-***@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genome-***@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
On Tue, Aug 12, 2014 at 11:26 AM, Trakhtenberg, Feliks <***@childrens.harvard.edu> wrote:
Hello,
Regarding your answer in point 4 below, is it possible to identify which UCSC Genes track transcripts from GenBank are not found in Ensembl and GENCODEv3? I would like to add them to the GENCODE gtf but do not want redundancies.
What about Refseq transcripts - might there also be some that are included in the UCSC Genes track but not in GENCODEv3, similar to how you explained about the GenBank transcripts?
Thank you,
Ephraim
________________________________
From: Steve Heitner [***@soe.ucsc.edu]
Sent: Monday, August 11, 2014 5:46 PM
To: Trakhtenberg, Feliks; ***@soe.ucsc.edu
Subject: RE: [genome] GENCODE gtf
Hello, Ephraim.
To address all of your questions:
1. We recommend that you get the GTF files from GENCODE (http://www.gencodegenes.org). The Table Browser generates least common denominator GTFs for a lot of tracks and will not contain all of the information available in the official GENCODE GTFs.
2. The GENCODE mouse V3 track will hopefully be available this month (August 2014).
3. For information regarding the different GENCODE subtracks available at UCSC, I recommend reading through the description page at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm10&g=wgEncodeGencodeVM2.
4. Concerning whether or not the GENCODE track contains everything contained in the UCSC Genes track, I don't believe this can be answered definitively. The UCSC Genes track is based on GenBank while the GENCODE track is based on Ensembl. Because these are constructed using completely different methods, you will find in many cases that GenBank contains items that Ensembl does not and vice versa.
Please contact us again at ***@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: Trakhtenberg, Feliks [mailto:***@childrens.harvard.edu]
Sent: Sunday, August 10, 2014 3:12 PM
To: ***@soe.ucsc.edu
Subject: [genome] GENCODE gtf
Hello,
I would appreciate it if someone could explain why the GENCODE GTF generated through the Table Browser is lacking the gene, transcript, UTR, and Selenocysteine rows, which are present in the original GENCODE file. I plan to use this GTF for TopHat/Cufflinks RNA-seq analysis and just wanted to make sure I am using the right file.
When will the GENCODE mouse V3 be available through the Table Browser?
Does the table option called Comprehensive have the most GENCODE transcripts, including those that are only predicted? Or do other GENCODE tables, such as pseudogenes, have additional transcripts?
Is everything that is in the UCSC Gene table also included in the Comprehensive GENCODE table?
Thank you
Ephraim Trakhtenberg, PhD
=============================================================================
Topic: Liftover
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/747458544e5b6f76
=============================================================================
---------- 1 of 1 ----------
From: Priya Moorjani <***@columbia.edu>
Date: Sep 17 05:41PM -0400
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/28b24e1a06ccd2c3
Hello there,
I am a postdoc at Columbia University. I was trying to lift over some regions
from panTro3 to hg19, and I found that the results from the online tool and
the Linux command-line version are not consistent. Can you please help me
understand why this might be?
I am using the following command line:
./liftOver -minMatch=0.1 query.bed panTro3.hg19.all.chain query_hg19.bed liftOver_unMapped.bed
I set minMatch=0.1 to match the online query.
The query.bed is:
chr1 78005 78636
chr1 83751 84027
chr1 84073 84351
chr1 84515 84805
chr1 130390 130810
chr1 131076 131229
chr1 131232 131532
chr1 132441 132562
chr1 132589 133033
chr1 133041 133450
The results are as follows:
Linux version - only one region is mapped and the other regions have the
following error.
#Duplicated in new
chr1 78005 78636
#Duplicated in new
chr1 83751 84027
#Duplicated in new
chr1 84073 84351
#Duplicated in new
chr1 84515 84805
#Duplicated in new
chr1 130390 130810
#Duplicated in new
chr1 131076 131229
#Duplicated in new
chr1 131232 131532
#Deleted in new
chr1 132441 132562
#Deleted in new
chr1 132589 133033
Online version - 8 / 10 regions are mapped and 2 have the error "#Deleted
in new".
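When comparing the two runs, it can help to tally the failure reasons in the unmapped file rather than reading it by eye. The Python sketch below assumes the layout shown in this message: a line starting with "#" that gives the reason, followed by the BED line it applies to. The function name is hypothetical, and the sketch only summarizes the file; it does not explain or resolve the discrepancy.

```python
from collections import Counter


def tally_unmapped(lines):
    """Count liftOver failure reasons and collect the regions under each.

    Assumption: every '#' line names the reason for the BED line(s) that
    follow it, as in the unmapped output quoted above.
    """
    counts = Counter()
    regions = {}
    reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line.lstrip("#").strip()
        elif reason is not None:
            counts[reason] += 1
            regions.setdefault(reason, []).append(tuple(line.split()[:3]))
    return counts, regions


# The unmapped output reported for the Linux run in this message.
unmapped = """\
#Duplicated in new
chr1 78005 78636
#Duplicated in new
chr1 83751 84027
#Duplicated in new
chr1 84073 84351
#Duplicated in new
chr1 84515 84805
#Duplicated in new
chr1 130390 130810
#Duplicated in new
chr1 131076 131229
#Duplicated in new
chr1 131232 131532
#Deleted in new
chr1 132441 132562
#Deleted in new
chr1 132589 133033
""".splitlines()

counts, regions = tally_unmapped(unmapped)
for reason, n in counts.most_common():
    print(reason, n)
```

For a real run, the same function can be fed an open file handle, for example tally_unmapped(open("liftOver_unMapped.bed")).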
Can you please suggest what I can do to resolve the discrepancy? Thank you
in advance.
Best,
Priya
--
Priya Moorjani
Postdoctoral Fellow - Przeworski Lab
Columbia University
Phone: 607-727-2250
Email: ***@columbia.edu
=============================================================================
Topic: BLAT
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/3d0f97c5b749a8ed
=============================================================================
---------- 1 of 2 ----------
From: "Lagerstedt, Susan A." <***@mayo.edu>
Date: Sep 17 08:40PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/3d0300b407690f61
Hello,
Occasionally, I have 1000-3000 sequences that I would like to process through UCSC Blat.
Please tell me about more efficient ways to access your data.
Thank you,
Susan
Susan Lagerstedt | Department of Laboratory Medicine and Pathology | Development Technologist II | Ph: 507-284-4844 | ***@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 | www.mayoclinic.org
---------- 2 of 2 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 17 01:47PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/42001db7c789c56e
Hello Susan,
Thanks for your question. For large batch queries, we recommend downloading
BLAT and running it locally. Please see the following links for more
information:
http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
Also, here are a couple pages that may be useful:
http://genome.ucsc.edu/goldenPath/help/blatSpec.html
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign
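As a rough illustration of the local approach, the Python sketch below counts the sequences in a multi-FASTA query file and builds a standalone BLAT invocation of the form "blat database query output.psl". The file names are hypothetical placeholders, the helper names are not part of BLAT, and the command is only executed when dry_run is set to False and a blat binary is on the PATH. See the BLAT documentation linked above for the available options.

```python
import subprocess


def count_fasta_records(lines):
    """Count sequences in FASTA text by counting '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def run_blat(database, query, output, dry_run=True):
    """Build, and optionally run, a standalone BLAT command.

    Assumes the basic usage `blat database query output.psl`, where the
    database is a local .2bit, .nib, or FASTA file.
    """
    command = ["blat", database, query, output]
    if not dry_run:
        # Raises CalledProcessError if blat exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


# Hypothetical file names; with dry_run=True nothing is executed.
command = run_blat("hg19.2bit", "queries.fa", "queries.psl")
```

A single multi-FASTA file with a few thousand sequences can be passed as the query, so no manual batching is needed for the sizes mentioned in the question.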
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
=============================================================================
Topic: blastz-run-ucsc ignores -dropSelf
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/29af0d253b94a7f1
=============================================================================
---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 17 12:55PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/3cf2eceb032a01d1
Hello Michael,
Thank you for contacting us. This is indeed a bug. One of our engineers is
working on a fix; however, we cannot provide a release date at this
time. We will contact you once it has been resolved. Thanks again for
bringing this to our attention.
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
--
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/join
.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+***@soe.ucsc.edu.
Today's topic summary
=============================================================================
Group: ***@soe.ucsc.edu
Url:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/topics
- UCSC and problem with ClinVar / LOVD variants [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/8606b5b68c73797e
- GENCODE gtf [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cc7fa5fe83720933
- Liftover [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/747458544e5b6f76
- BLAT [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/3d0f97c5b749a8ed
- blastz-run-ucsc ignores -dropSelf [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/29af0d253b94a7f1
=============================================================================
Topic: UCSC and problem with ClinVar / LOVD variants
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/8606b5b68c73797e
=============================================================================
---------- 1 of 1 ----------
From: Konstantinos Varvagiannis <***@unige.ch>
Date: Sep 18 05:01PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/f56041185d9f1a7c
Dear Colleague,
I am writing concerning a problem concerning the UCSC representation of ClinVar and LOVD variants.
I might be doing something wrong but as I am searching for a variant that figures both in ClinVar (here<http://www.ncbi.nlm.nih.gov/clinvar/variation/67020/>) and the LOVD database (here<http://www.genomed.org/lovd2/variants.php?select_db=KCNQ1&action=view&view=0006234%2C0000634%2C0>), I cannot manage to see it in the UCSC browser (link for the graphic here<http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr11%3A2608835-2608865&hgsid=389271325_NIos3iyfZUvfaXKsvVN2C4YqQfyt>). This happens even though both ClinVar and LOVD fields are set to dense. For your convenience I also send you the graph in pdf format.
Could you please help me with this matter?
Thank you in advance for your time and your consideration.
Sincerely,
Konstantinos Varvagiannis
University of Geneva
Konstantinos VARVAGIANNIS
Médecin Interne
Département de Génétique Médicale
HÎpitaux Universitaires de GenÚve
Tél. +41 (0) 22 37 95 722 / +41 (0) 79 55 33 511
=============================================================================
Topic: GENCODE gtf
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cc7fa5fe83720933
=============================================================================
---------- 1 of 1 ----------
From: "Trakhtenberg, Feliks" <***@childrens.harvard.edu>
Date: Sep 18 05:17AM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/6c2ef19898567987
Hello,
Do you know the expected date when the Gencode M3 track would become available in the Table Browser?
I need it for using the intersection tool to identify UCSC Gene entries that do not overlap with the Gencode M3. I presume that uploading Gencode M3 GTF as a costume track to accomplish my goal would be problematic, because if it was as simple as that I guess it would already have been available in the Table Browser. Is this so? Or uploading it as a costume track may work for my purposes?
Thank you,
Ephraim
________________________________
From: Steve Heitner [***@soe.ucsc.edu]
Sent: Thursday, September 11, 2014 1:06 PM
To: Trakhtenberg, Feliks; 'Jonathan Casper'
Cc: ***@soe.ucsc.edu
Subject: RE: [genome] GENCODE gtf
Hello, Ephraim.
There is no specific order in a GTF file, so it should not be a problem to cat both files into a single file. Regarding the gene symbols being a part of the GTF output, this is a limitation of the way the Table Browser creates GTF output. If you would like the gene symbols to be a part of your GTF files, it will require some scripting on your part. We cannot provide advice on creating a script, but if you would like to use the Table Browser to provide output that will equate transcript ID to gene symbol and RefSeq ID for use in your script, you can follow these instructions:
For GENCODE:
1. As your output format, select âselected fields from primary and related tablesâ
2. Click the âget outputâ button
3. In the âSelect Fields from mm10.wgEncodeGencodeCompVM2â section, check the ânameâ and âname2â checkboxes
4. Click the âget outputâ button
For UCSC Genes:
1. As your output format, select âselected fields from primary and related tablesâ
2. Click the âget outputâ button
3. In the âSelect Fields from mm10.knownGeneâ section, check the ânameâ checkbox
4. In the âmm10.kgXref fieldsâ section, check the âgeneSymbolâ and ârefseqâ checkboxes
5. Click the âget outputâ button
Please contact us again at ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu> if you have any further questions. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genome-***@soe.ucsc.edu<mailto:genome-***@soe.ucsc.edu>.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: Trakhtenberg, Feliks [mailto:***@childrens.harvard.edu]
Sent: Sunday, September 07, 2014 1:11 PM
To: Jonathan Casper
Cc: ***@soe.ucsc.edu
Subject: RE: [genome] GENCODE gtf
Hello,
Thank you for the advice. My goal is to predict novel genes/transcripts. I would like to compile a comprehensive mouse GTF, so that it does not turn out that the novel transcripts I find in my RNAseq have already been predicted in some major database. So, I thought that merging Gencode and UCSC Genes would provide such comprehensive set. Please let me know if this is insufficient.
Using the intersection tool you recommended below, even with no overlap selection, there are about 8k UCSC Gene transcripts not in the Gencode. Does the Table Browser have an option for merging these entries with the Gencode GTF? If not, would this command "cat out.gtf0[0-1] > merged.gtfâ produce a GTF that is compatible with the Table Browser?
The UCSC Gene GTF produced by the Table Browser reports gene and transcript IDs like this: gene_id "uc007aet.1"; transcript_id "uc007aet.1". However, it does not add to the entry the original database (e.g., RefSeq) accession nor gene name. Gencode GTF from the Table Browser also missing the gene names. How could I have the original database IDs and the gene names included in the UCSC Gene GTF produced by the Table Browser, and the gene names included in the Gencode GTF from the Table Browser?
Thanks,
Ephraim
________________________________
From: Jonathan Casper [***@soe.ucsc.edu]
Sent: Thursday, August 14, 2014 9:10 PM
To: Trakhtenberg, Feliks
Cc: ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>
Subject: Re: [genome] GENCODE gtf
Hello Ephraim,
Our engineers comment that it is difficult to advise you on how to combine gene sets without knowing what you're trying to accomplish specifically. Different gene sets use different predictive models, making it hard to combine them in a scientifically meaningful way.
That said, you can use the UCSC Table Browser intersection tool to get a list of entries found in UCSC Genes but not in GENCODE.
1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables
2. Use the following settings
clade: Mammal
genome: Mouse
assembly: Dec. 2011 (GRCm38/mm10)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: genome
3. Click the "intersection: create" button
4. On the "Intersect with UCSC Genes" page, set the following options:
group: Genes and Gene Predictions
track: GENCODE Genes VM2 (or V3, after it is released)
table: Basic (wgEncodeGencodeBasicVM2)
If you decide after reading the GENCODE track page that the Comprehensive table would be more useful to you, that is also an option.
5. Choose to return "All UCSC Genes records that have no overlap with GENCODE Genes VM2"
Note that the "no overlap" requirement here is fairly strict. You may wish to instead restrict to UCSC Genes records with no more than 50% overlap, for example, depending on your needs.
6. Click "submit" to return to the main Table Browser page
Note that the output format has been changed to BED. You can leave it in that way or change to GTF output. Just remember that the GTF output of the UCSC Table Browser will not exactly match the format of your GENCODE GTF file.
7. Click "get output"
We also have command line tools that will perform this kind of operation, but they are not designed to work with files in GTF. If you would like to explore this alternative, the relevant programs are called "featureBits" and "overlapSelect". They are available as part of the kent utilities on our download server at http://hgdownload.soe.ucsc.edu<http://hgdownload.soe.ucsc.edu/>. We provide precompiled binaries for these utilities at http://hgdownload.soe.ucsc.edu/admin/exe/, but only for a few computer architectures. You may need to download the source code and compile these tools yourself if your computer is not listed there. You can run each program by itself on a command line with no arguments to see a description of how to use it.
As for your other question, RefSeq is a curated set of transcripts drawn from GenBank. Like GenBank, it is quite possible that there will be RefSeq transcripts that are not represented in GENCODE.
I hope this is helpful. If you have any further questions, please reply to ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu> or genome-***@soe.ucsc.edu<mailto:genome-***@soe.ucsc.edu>. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genome-***@soe.ucsc.edu<mailto:genome-***@soe.ucsc.edu>.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
On Tue, Aug 12, 2014 at 11:26 AM, Trakhtenberg, Feliks <***@childrens.harvard.edu<mailto:***@childrens.harvard.edu>> wrote:
Hello,
Regarding your answer in point 4 below, is it possible to identify which UCSC Genes track transcripts from GenBank are not found in Ensembl and GENCODEv3? I would like to add them to the GENCODE gtf but do not want redundancies.
What about Refseq transcripts - might there also be some that are included in the UCSC Genes track but not in GENCODEv3, similar to how you explained about the GenBank transcripts?
Thank you,
Ephraim
________________________________
From: Steve Heitner [***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>]
Sent: Monday, August 11, 2014 5:46 PM
To: Trakhtenberg, Feliks; ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>
Subject: RE: [genome] GENCODE gtf
Hello, Ephraim.
To address all of your questions:
1. We recommend that you get the GTF files from GENCODE (http://www.gencodegenes.org). The Table Browser generates least common denominator GTFs for a lot of tracks and will not contain all of the information available in the official GENCODE GTFs.
2. The GENCODE mouse V3 track will hopefully be available this month (August 2014).
3. For information regarding the different GENCODE subtracks available at UCSC, I recommend reading through the description page at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm10&g=wgEncodeGencodeVM2.
4. Concerning whether or not the GENCODE track contains everything contained in the UCSC Genes track, I donât believe this can be answered definitively. The UCSC Genes track is based on GenBank while the GENCODE track is based on Ensembl. Because these are constructed using completely different methods, you will find in many cases that GenBank contains items that Ensembl does not and vice versa.
Please contact us again at ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu> if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu<mailto:genome-***@soe.ucsc.edu>.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: Trakhtenberg, Feliks [mailto:***@childrens.harvard.edu<mailto:***@childrens.harvard.edu>]
Sent: Sunday, August 10, 2014 3:12 PM
To: ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>
Subject: [genome] GENCODE gtf
Hello,
I would appreciate if some could explain why the GENCODE gtf generated through the Table Browser is lacking gene, transcript, UTR, and Selenocysteine rows, which are present in the original GENCODE file. I plan to use this gtf for Tophat/Cufflinks RNA-seq analysis and just wanted to make sure I am using the right file.
When will the GENCODE mouse V3 be available through the Table Browser?
Is the table option called Comprehensive have the most of GENCODE transcripts, including those that are only predicted? Or other GENCODE tables, such as pseudogenes, have additional transcripts?
Is everything that is in the UCSC Gene table also included in the Comprehensive GENCODE table?
Thank you
Ephraim Trakhtenberg, PhD
--
--
--
=============================================================================
Topic: Liftover
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/747458544e5b6f76
=============================================================================
---------- 1 of 1 ----------
From: Priya Moorjani <***@columbia.edu>
Date: Sep 17 05:41PM -0400
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/28b24e1a06ccd2c3
Hello there,
I am postdoc in Columbia University. I was trying to liftover some regions
from Pantro3 to Hg19 and I found that the results by running the online
tool and the linux version are not consistent. Can you please help me
understand why this might be?
I am using the following command-line -
./liftOver -minMatch=0.1 query.bed panTro3.hg19.all.chain query_hg19.bed
liftOver_unMapped.bed
Parameter minMatch=0.1 to match the online query.
The query.bed is:
chr1 78005 78636
chr1 83751 84027
chr1 84073 84351
chr1 84515 84805
chr1 130390 130810
chr1 131076 131229
chr1 131232 131532
chr1 132441 132562
chr1 132589 133033
chr1 133041 133450
The results are as follows:
Linux version - only one region is mapped, and the other regions are
reported with the following errors.
#Duplicated in new
chr1 78005 78636
#Duplicated in new
chr1 83751 84027
#Duplicated in new
chr1 84073 84351
#Duplicated in new
chr1 84515 84805
#Duplicated in new
chr1 130390 130810
#Duplicated in new
chr1 131076 131229
#Duplicated in new
chr1 131232 131532
#Deleted in new
chr1 132441 132562
#Deleted in new
chr1 132589 133033
Online version - 8 / 10 regions are mapped and 2 have the error "#Deleted
in new".
Can you please suggest what I can do to resolve the discrepancy? Thank you
in advance.
Best,
Priya
--
Priya Moorjani
Postdoctoral Fellow - Przeworski Lab
Columbia University
Phone: 607-727-2250
Email: ***@columbia.edu
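
When comparing the two runs, it can help to summarize the unmapped file by failure reason. The following is a minimal Python sketch, assuming the layout shown in the message above (a "#" reason line followed by the BED record it applies to); the function name is hypothetical and this is not part of liftOver itself. The sample data is the unmapped output quoted in the message.

```python
from collections import Counter


def tally_unmapped(lines):
    """Count unmapped records per reason in a liftOver-style unmapped file."""
    counts = Counter()
    reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line[1:].strip()  # remember the reason for the next record
        elif reason is not None:
            counts[reason] += 1
            reason = None
    return counts


# Unmapped output as quoted in the message above.
UNMAPPED = """\
#Duplicated in new
chr1 78005 78636
#Duplicated in new
chr1 83751 84027
#Duplicated in new
chr1 84073 84351
#Duplicated in new
chr1 84515 84805
#Duplicated in new
chr1 130390 130810
#Duplicated in new
chr1 131076 131229
#Duplicated in new
chr1 131232 131532
#Deleted in new
chr1 132441 132562
#Deleted in new
chr1 132589 133033
""".splitlines()

if __name__ == "__main__":
    for reason, n in tally_unmapped(UNMAPPED).most_common():
        print(reason, n)
    # prints:
    # Duplicated in new 7
    # Deleted in new 2
```
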
=============================================================================
Topic: BLAT
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/3d0f97c5b749a8ed
=============================================================================
---------- 1 of 2 ----------
From: "Lagerstedt, Susan A." <***@mayo.edu>
Date: Sep 17 08:40PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/3d0300b407690f61
Hello,
Occasionally, I have 1000-3000 sequences that I would like to process through UCSC Blat.
Please tell me about more efficient ways to access your data.
Thank you,
Susan
Susan Lagerstedt | Department of Laboratory Medicine and Pathology | Development Technologist II | Ph: 507-284-4844 | ***@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 | www.mayoclinic.org
---------- 2 of 2 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 17 01:47PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/42001db7c789c56e
Hello Susan,
Thanks for your question. For large batch queries, we recommend downloading
BLAT and running it locally. Please see the following links for more
information:
http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
Also, here are a couple pages that may be useful:
http://genome.ucsc.edu/goldenPath/help/blatSpec.html
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
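
As a rough illustration of the local approach recommended above, the following Python sketch counts the records in a multi-sequence FASTA and builds the argument list for one standalone BLAT run, following the standalone usage "blat database query output.psl". The file names hg19.2bit, queries.fa, and queries.psl are placeholders, the helper names are hypothetical, and the sketch only prints the command rather than running it.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def blat_command(database, query, output, blat="blat"):
    """Build the argument list: blat <database> <query> <output.psl>."""
    return [blat, database, query, output]


# Made-up sequences, for illustration only.
SAMPLE_FASTA = [
    ">seq1",
    "ACGTACGTAC",
    ">seq2",
    "TTGACCA",
    ">seq3",
    "GGGCATTA",
]

if __name__ == "__main__":
    print(count_fasta_records(SAMPLE_FASTA), "query sequences")
    # Placeholder file names; substitute your own genome and query files.
    print(" ".join(blat_command("hg19.2bit", "queries.fa", "queries.psl")))
```

With BLAT installed, the returned list could be passed to subprocess.run to execute the search; all query sequences go in a single FASTA file, so one run covers the whole batch.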
=============================================================================
Topic: blastz-run-ucsc ignores -dropSelf
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/29af0d253b94a7f1
=============================================================================
---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 17 12:55PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/3cf2eceb032a01d1
Hello Michael,
Thank you for contacting us. This is indeed a bug. One of our engineers is
working on a fix; however, we can't provide a release date at this time. We
will contact you once it has been resolved. Thanks again for bringing this
to our attention.
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
--
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/join
.
To unsubscribe from this group and stop receiving emails from it send an email to genome+***@soe.ucsc.edu.