g***@soe.ucsc.edu
2014-09-13 17:05:36 UTC
=============================================================================
Today's topic summary
=============================================================================
Group: ***@soe.ucsc.edu
Url:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/topics
- should be simple in table browser [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/321f65b11d9c4794
- C. elegans assembly version [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/f52433356d409205
- about gene sequence [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/c4062283cfc10bb3
- About ENCODE TFBS peaks [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/daf3eed2dec10eec
- Genome browser and exon data conventions [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/a20686ead53bb3aa
=============================================================================
Topic: should be simple in table browser
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/321f65b11d9c4794
=============================================================================
---------- 1 of 1 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Sep 12 03:41PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/79d1896c18c8937c
Hi Bill,
Thank you for your question about getting gene symbols as part of your
output from the Table Browser. You are on the right track with your current
Table Browser settings, and the only issue is your output settings. The
reason you are not getting gene symbols as part of your output is
because these are not stored in the knownGene table for the UCSC Genes
track, but instead stored in a linked table. When you select the output
option "all fields from selected table", you are only getting the
information contained in the knownGene table. I recommend using the
"selected fields from primary and related tables" output option. After
you click "get output", you will be taken to another page where you will
be able to select fields from both the knownGene table and various
linked tables that you want as part of your output. On this page, select
those fields from the "Select Fields from hg19.knownGene" section that
you are interested in. In the "hg19.kgXref fields" section, you will
find a number of alternative IDs for the transcripts in the knownGene
table. Check the box next to "geneSymbol", and any other IDs you are
interested in. Finally, click "get output". Your output will consist of
the fields you selected as columns in order starting from the top of the
"hg19.knownGene" section. While this output option doesn't necessarily
format your output in a terribly useful way, you can use a simple UNIX
command line utility such as awk to rearrange the columns however you want.
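As a rough illustration of that column rearrangement, here is a minimal
sketch in Python (standing in for awk). It assumes the Table Browser output
is tab-separated with a header line. The helper name reorder_columns and the
example rows are made up for illustration and are not real Table Browser
output.

```python
import csv
import io


def reorder_columns(text, order):
    """Return tab-separated text with columns rearranged to match `order`.

    `order` is a list of header names; columns not listed are dropped.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Look up where each requested column sits in the input.
    positions = [header.index(name) for name in order]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(order)
    for row in reader:
        writer.writerow([row[i] for i in positions])
    return out.getvalue()


# Hypothetical one-row example; real output will contain the fields you selected.
example = (
    "name\tchrom\tgeneSymbol\n"
    "tx1\tchr1\tGENE1\n"
)
print(reorder_columns(example, ["geneSymbol", "name", "chrom"]), end="")
```

The header names in your own file will differ, so pass the names exactly as
they appear on the first line of your output.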
I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group
On 9/11/14, 7:54 AM, LaFramboise, William A wrote:
=============================================================================
Topic: C. elegans assembly version
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/f52433356d409205
=============================================================================
---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 12 02:45PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/8c86023b63d8a284
Hello Peter,
Our funding mandates that we focus on vertebrate genomes, so we have been
unable to pursue an update to C. elegans. You are welcome to create an
assembly hub, however, as noted in this mailing list question:
https://groups.google.com/a/soe.ucsc.edu/d/topic/genome/eZ_wBLH66I0/discussion.
Assembly hubs are a tool we developed to allow users to display their own
genome assemblies and accompanying annotation in the UCSC Genome Browser.
For more information on creating assembly hubs, please review the track hub
help pages at http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html and
the assembly hubs wiki page at
http://genomewiki.ucsc.edu/index.php/Assembly_Hubs.
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
On Fri, Sep 12, 2014 at 1:07 AM, Peter Frommolt <***@uni-koeln.de
=============================================================================
Topic: about gene sequence
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/c4062283cfc10bb3
=============================================================================
---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 12 12:56PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/7d8c0fa9becc1d14
Hello,
Thanks for your question. I'm assuming you used the Human mRNAs track on
hg19 to download the sequence:
http://genome.ucsc.edu/cgi-bin/hgc?db=hg19&g=mrna&i=AY590150. This track is
generated by aligning GenBank human mRNAs against the genome using BLAT. It
appears PIF (proteolysis inducing factor) has not yet been annotated in the
mouse reference assembly, see http://www.ncbi.nlm.nih.gov/gene/100126829.
If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
=============================================================================
Topic: About ENCODE TFBS peaks
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/daf3eed2dec10eec
=============================================================================
---------- 1 of 1 ----------
From: Brian Lee <***@soe.ucsc.edu>
Date: Sep 12 12:16PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/d7dafc963a6c0b7b
Dear Hai-Bing Xie,
Thank you for using the UCSC Genome Browser and for your question about the
chromosomal coordinates for Factorbook-identified canonical motifs seen as
green highlighted bars in the clustered transcription factor binding sites
track.
The Factorbook motif identifications and localizations were provided by
the Zlab (http://zlab.umassmed.edu/zlab/) at the UMass Medical School and
are available in two tables, the first providing the position of each
factorbook item, factorbookMotifPos, the second providing the position
weight matrix, factorbookMotifPwm.
These are located in the general hg19 annotation database section of our
hgdownload server along with a corresponding .sql file:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/factorbookMotifPos.txt.gz
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/factorbookMotifPwm.txt.gz
You can access these tables via the Public MySQL server:
http://genome.ucsc.edu/goldenPath/help/mysql.html
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e 'show tables like "factorbook%";' hg19
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e 'select * from factorbookMotifPos;' hg19
There are two additional tables, factorbookMotifCanonical and
factorbookGeneAlias, that help map the information from the Zlab to the
target terms used in the UCSC Genome Browser.
You can alternatively use the hg19 Table Browser to access these tables:
http://genome.ucsc.edu/cgi-bin/hgTables
1. Set the "group:" to "All tables"
2. Set the table to "factorbookMotifPos"
3. Click "genome" to get the entire table, or click the "define regions"
button and enter coordinates of interest, such as "chrX 14000000
150000000".
4. Click "get output". If desired, you could set "output format" to "custom
track" and see the results in the browser.
What is displayed in the wgEncodeRegTfbsClustered track is the result of a
computational mapping of the factorbookMotifPos items to the clustered TFBS
locations filtered for the highest score per cluster. There is not an easy
path to obtain these exact mappings, but you can perform similar operations
with the Table Browser.
For example, if you were looking at the region around SOD1,
chr21:33,031,597-33,041,570, you could enter this as the defined region in
the Table Browser (step 3).
4. Click the "create" button next to "filter".
5. Set the "score" filter to ">" with a desired threshold, such as "2", and
click "submit".
6. Click the "create" button next to "intersection".
7. Select "group: Regulation" and "track: Txn Factor ChIP" and "table:
wgEncodeRegTfbsClusteredV3" then click "submit".
8. Click "get output". If desired, you could set "output format" to "custom
track" and see the results in the browser.
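If you prefer to work from the downloaded factorbookMotifPos.txt.gz file, the
region and score filter (steps 3 to 5) could be sketched in Python roughly as
follows. This covers only the filtering, not the intersection with
wgEncodeRegTfbsClusteredV3. The column layout (bin, chrom, chromStart,
chromEnd, name, score, strand) is an assumption that should be checked
against the accompanying .sql file, and the function name and example rows
are made up for illustration.

```python
def filter_motifs(lines, chrom, start, end, min_score):
    """Yield rows overlapping chrom:start-end whose score exceeds min_score.

    `start` and `end` use 0-based start / 1-based end coordinates.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: bin, chrom, chromStart, chromEnd, name, score, strand
        _bin, c, s, e, name, score, strand = fields[:7]
        s, e, score = int(s), int(e), float(score)
        # Two intervals overlap when each one starts before the other ends.
        if c == chrom and s < end and e > start and score > min_score:
            yield c, s, e, name, score, strand


# Made-up rows for illustration only; not taken from the real table.
rows = [
    "585\tchr21\t33031600\t33031615\tMOTIF_A\t2.5\t+\n",
    "585\tchr21\t33031700\t33031712\tMOTIF_B\t1.0\t-\n",
    "585\tchr1\t33031600\t33031615\tMOTIF_C\t3.0\t+\n",
]

# chr21:33,031,597-33,041,570 as displayed becomes start 33031596, end 33041570.
hits = list(filter_motifs(rows, "chr21", 33031596, 33041570, 2))
print(hits)
```

For the real file, the lines could be read with gzip.open(path, "rt") from
the Python standard library and passed to the same function.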
Thank you again for your inquiry and for using the UCSC Genome Browser. If you
have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group
=============================================================================
Topic: Genome browser and exon data conventions
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/a20686ead53bb3aa
=============================================================================
---------- 1 of 1 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Sep 12 10:35AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/684bcfe89ee96d4
Hi Turgay,
Thank you for your questions about the UCSC Genome Browser. You are
correct about the coordinate systems used in our tables, and
subsequently in the files you downloaded. In our tables, we use 0-based
start coordinates and 1-based end coordinates. You can read more about this on
the following pages:
* http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
* http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
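To make the convention concrete, here is a small Python sketch of converting
between the table coordinates (0-based start, 1-based end) and fully 1-based
inclusive coordinates. The function names are made up for illustration.

```python
def table_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based start / 1-based end interval to 1-based inclusive."""
    return chrom_start + 1, chrom_end


def one_based_to_table(start, end):
    """Convert a 1-based inclusive interval to 0-based start / 1-based end."""
    return start - 1, end


def interval_length(chrom_start, chrom_end):
    """Length in bases of a table interval; no +1 is needed."""
    return chrom_end - chrom_start


# The first 100 bases of a chromosome are stored as (0, 100) in the tables.
print(table_to_one_based(0, 100))
print(interval_length(0, 100))
```

Only the start changes between the two systems; the end value is the same
number in both.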
By default, the Genome Browser displays the sequence of the plus strand.
The NBPF1 gene, however, is on the minus strand. When you were looking
at this position in the Browser, it is likely that you were looking at
the sequence for this codon on the plus strand, which is "AGA". If you
were to view this on the minus strand, you would see that the sequence
for this codon is "UCU", which does in fact code for serine. Please see
this answer to a previous mailing list question for a great explanation
of our strand display for genes and how to view the minus strand in the
Browser:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/_EjI7ddU_PY/O9UB7DwBvc8J.
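The relationship between the two strands can be checked with a few lines of
Python. This is only a sketch of the reverse-complement step for the codon
discussed above; the function names are made up for illustration.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna):
    """Write a DNA sequence in RNA letters (T becomes U)."""
    return dna.replace("T", "U").replace("t", "u")


plus_strand_codon = "AGA"
minus_strand_codon = reverse_complement(plus_strand_codon)
print(minus_strand_codon)          # TCT
print(to_rna(minus_strand_codon))  # UCU
```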
I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group
On 9/11/14, 8:40 AM, Turgay Aytac wrote:
--
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/join
.
To unsubscribe from this group and stop receiving emails from it send an email to genome+***@soe.ucsc.edu.