Discussion:
Digest for genome@soe.ucsc.edu - 12 updates in 7 topics
g***@soe.ucsc.edu
2014-09-15 22:24:11 UTC
=============================================================================
Today's topic summary
=============================================================================

Group: ***@soe.ucsc.edu
Url:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/topics


- sequence +/- 1kb from txStart [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/45e6c1d61cc029d0
- enquiry: UCSC genome browser can't display my data [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/5c5638cb88a12369
- HBB gene - SNP rs334 - Feb2009 - GRCh37/hg19 [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cd9029b1570a9cbf
- Table browser looking up sequences by positions [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cb3113b186616b0
- LiftOver Help [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/11e858671e94473e
- DNA methylation track hub (Smith Lab) not functioning [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/7aa1a65a71e87b27
- should be simple in table browser [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/321f65b11d9c4794


=============================================================================
Topic: sequence +/- 1kb from txStart
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/45e6c1d61cc029d0
=============================================================================

---------- 1 of 2 ----------
From: Jessilyn Dunn <***@gmail.com>
Date: Sep 15 02:10PM -0400
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/7bdfeb5a34d9ce62

Hello,

I am trying to use the UCSC browser to download the sequence for all genes
in the mm9 genome, but only in the region +/-1kb from the txStart.

While I understand that there is a way to retrieve promoter sequences
(-1kb, etc.) using the Table Browser (
http://genome.ucsc.edu/FAQ/FAQdownloads.html#download18) this appears to
only be useful for the upstream sequence, not *both* the up and downstream
sequences. I've also tried using the Table Browser "region" definition and
the user-defined regions (which are limited to 1,000, but there are
~34,000 genes I would need).

Any insight you can provide would be greatly appreciated!
Thank you very much!
Sincerely,
Jessilyn


---------- 2 of 2 ----------
From: "Steve Heitner" <***@soe.ucsc.edu>
Date: Sep 15 02:30PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/279987992413889

Hello, Jessilyn.

It is possible to do this by creating a custom track that contains your chromosome, transcription start site and gene name. You will need to perform some basic scripting to process your output. The general strategy will be as follows:

1. Use the Table Browser to create an output file with output like: chr1 134212701 Nuak2
2. Create a script that adds the start coordinate a second time: chr1 134212701 134212701 Nuak2
3. Load this edited file as a custom track in the Table Browser
4. Obtain sequence and specify 1,000 extra bases both upstream and downstream

Using the Table Browser, perform the following steps:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
Clade: Mammal
Genome: Mouse
Assembly: July 2007 (NCBI37/mm9)
Group: Genes and Gene Predictions
Track: RefSeq Genes
Table: refGene
Region: genome
Output format: selected fields from primary and related tables
Output file: enter a name for your file

3. Click the “get output” button

4. In the “Select Fields from mm9.refGene” section, check the “chrom”, “txStart” and “name2” checkboxes

5. Click the “get output” button

6. At this point, you will need to write a script to insert an additional txStart column into your output file. Note that the contents of the refGene table are not sorted, so if you want your results to be ordered, you will also need to sort your file.

7. In the Table Browser, on the right side of the “group” line, click the “add custom tracks” button

8. Next to “Paste URLs or data”, click the “Browse” button to select your edited output file

9. Click the “Submit” button

10. Click the “go to table browser” button

11. Change “output format” to “sequence”

12. Click the “get output” button

13. Insert 1,000 into the upstream and downstream text boxes

14. Click the “get sequence” button
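
The script mentioned in step 6 could look something like the following minimal Python sketch. It assumes the Table Browser output is tab-separated with the columns chrom, txStart and name2, and that any header line starts with "#"; the function name add_end_column is made up for illustration. It repeats txStart as the end coordinate and sorts the rows, since the refGene table is not sorted.

```python
def add_end_column(table_text):
    """Turn 'chrom txStart name2' rows into 'chrom txStart txStart name2' rows."""
    rows = []
    for line in table_text.splitlines():
        # Skip blank lines and a header line such as '#mm9.refGene.chrom ...'
        # (assumed to start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        chrom, tx_start, name = line.split("\t")[:3]
        rows.append((chrom, int(tx_start), int(tx_start), name))
    # Plain string order on the chromosome name (so chr10 sorts before chr2),
    # then numeric order on the start coordinate.
    rows.sort(key=lambda row: (row[0], row[1]))
    return "".join("\t".join(str(value) for value in row) + "\n" for row in rows)


# The example row from the strategy outline above.
example = "#chrom\ttxStart\tname2\nchr1\t134212701\tNuak2\n"
print(add_end_column(example), end="")
```

To use it on a real file, read the saved Table Browser output into a string, pass it to the function, and write the result to the file you upload in step 8.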


Please contact us again at ***@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group






=============================================================================
Topic: enquiry: UCSC genome browser can't display my data
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/5c5638cb88a12369
=============================================================================

---------- 1 of 2 ----------
From: Mariko Ozu <***@gmail.com>
Date: Sep 15 05:28PM +0900
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/6e44a7b54d6b96bb

Dear person in charge,

I am writing because your Genome Browser didn't display my data after I
uploaded it via the upload box under Tools, Genome Graphs.
The data were in .bed format, produced by TopHat analysis of NGS data
(rat RNA-seq, originally FASTQ), and were named 'rat_test' and 'rat_test1'
when uploading.

Strangely, even minutes after uploading, and although the site indicated
that the upload had completed and the data were available, the data were
not displayed on the Genome Graphs page at all.

Would you check what's wrong with the browser or your server?

Thank you for your assistance in advance.

best regards
Mariko Ozu, DVM


---------- 2 of 2 ----------
From: "Steve Heitner" <***@soe.ucsc.edu>
Date: Sep 15 12:40PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/96e9f7b8b79aa8f1

Hello, Mariko.

The Genome Graphs tool does not use BED format, but rather uses its own format. From http://genome.ucsc.edu/cgi-bin/hgGenome, when you click the “upload” button, please carefully read the “Upload file formats” section just below the “Paste URLs or data” text box.

If you cannot supply your data in Genome Graphs format, you could create a custom track from your BED file in our Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables; see also http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html) and then use the “import” button in Genome Graphs to import your custom track. Note that this will transform the track into Genome Graphs format with a certain window size and it might look different depending on the scale of view.

Please contact us again at ***@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group






=============================================================================
Topic: HBB gene - SNP rs334 - Feb2009 - GRCh37/hg19
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cd9029b1570a9cbf
=============================================================================

---------- 1 of 2 ----------
From: Sami Khuri <***@sjsu.edu>
Date: Sep 13 09:18AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/49a856721e776433

--Hello
My name is Sami Khuri and I teach Bioinformatics at SJSU. I have been
using the UCSC Genome Browser for many years and noticed an interesting
phenomenon in the latest version.
Under dbSNP build 138, rs334,

you used to have

Observed: -/A/C/G/T

And now you have:

Observed: A/C/G/T

So you removed the "-", which represented a deletion (which leads to Beta
Thalassemia).
So what is the reason for the removal? Is it because the deletion leads to
Beta Thalassemia and not Sickle Cell Anemia? And how do you explain the "A"
in "Observed", since the reference is A?
Thanks.
Sincerely,
--Sami Khuri
Sami Khuri, Professor and Chair
Department of Computer Science
San Jose State University
One Washington Square
San Jose, CA 95192-0249
USA
Tel: (408) 924-5081
Fax: (408) 924-5062
eMail: ***@sjsu.edu
URL: http://www.cs.sjsu.edu/faculty/khuri


---------- 2 of 2 ----------
From: "Steve Heitner" <***@soe.ucsc.edu>
Date: Sep 15 12:21PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/1f70f695dca9591f

Hello, Sami.

Our SNPs track merely reports the data contained at dbSNP. If you have a question pertaining to one of their entries, I recommend contacting them at snp-***@ncbi.nlm.nih.gov.

Please contact us again at ***@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group






=============================================================================
Topic: Table browser looking up sequences by positions
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/cb3113b186616b0
=============================================================================

---------- 1 of 2 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Sep 15 09:57AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/c3bb606a0448d033

Hi Kevin,

Thank you for your question about getting DNA sequence for various
genomic positions. You are not getting the output you want from the
Table Browser because when you define your positions and select a table,
your output will include sequence for any items in that table that
overlap the defined positions, even if those items extend outside the
regions you defined. Your best option would be to download the 2bit
files containing the genomic sequence, and then use our twoBitToFa
utility to extract the sequence for your positions from the 2bit file.
You can run the twoBitToFa utility on the command line without any
arguments to see the various input options, such as the -bed option that
allows you to define a set of regions to extract sequence for. You can find
the 2bit files for the assembly of your choice on our download server:
http://hgdownload.soe.ucsc.edu/downloads.html, under the "Full data set"
link for that assembly. You can find the twoBitToFa utility for your
appropriate system here: http://hgdownload.soe.ucsc.edu/admin/exe/.
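
As a rough illustration of the command-line route described above, the Python sketch below only assembles a twoBitToFa call that uses the -bed option. The file names (hg19.2bit, regions.bed, regions.fa) and the helper name are placeholders chosen for the example, not files from this thread; run twoBitToFa without arguments to confirm the options available in your copy.

```python
import shlex


def two_bit_to_fa_command(two_bit_path, bed_path, fasta_path):
    """Build the argument list for: twoBitToFa -bed=<bed> <2bit> <fasta>."""
    return ["twoBitToFa", f"-bed={bed_path}", two_bit_path, fasta_path]


# Hypothetical file names, for illustration only.
command = two_bit_to_fa_command("hg19.2bit", "regions.bed", "regions.fa")
print(shlex.join(command))
```

With twoBitToFa on your PATH, the same list can be passed to subprocess.run(command, check=True) to perform the extraction.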

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 9/11/14, 9:43 AM, Kevin Lopez wrote:


---------- 2 of 2 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Sep 15 10:37AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/ef962275fd2f7114

Hi Kevin,

An easier option would be to use a custom track of your positions and
the Table Browser. First, format your positions in bed format,
http://genome.ucsc.edu/FAQ/FAQformat.html#format1, and upload them to
the Genome Browser as a custom track:
http://genome.ucsc.edu/cgi-bin/hgCustom. Then navigate to the Table
Browser, http://genome.ucsc.edu/cgi-bin/hgTables, and use the following
settings to output the sequences for those items in the custom track:

group: Custom Tracks
track: choose the custom track of your positions
table: the default table will be the primary table for that track
region: genome
output format: sequence
output file: enter a file name to save your results to a file, or
leave blank to display results in the browser

Then click "get output". Select your sequence output options, and then
click "get sequence".
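
If your positions are written in the browser's chrom:start-end style, a small script can convert them to BED lines before upload. This is only a sketch under that assumption (the format of your own list may differ), and the function name and item names are invented for the example. Browser positions are 1-based and inclusive, while BED starts are 0-based and BED ends are exclusive, so only the start shifts by one.

```python
import re

# Matches positions such as chr11:5,246,696-5,248,301 (commas optional).
POSITION = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def position_to_bed(position, name):
    """Convert a 1-based 'chrom:start-end' position into a 4-column BED line."""
    match = POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


# Illustrative positions only.
for index, pos in enumerate(["chr1:1001-2000", "chr11:5,246,696-5,248,301"], start=1):
    print(position_to_bed(pos, f"region{index}"))
```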

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 9/11/14, 9:43 AM, Kevin Lopez wrote:



=============================================================================
Topic: LiftOver Help
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/11e858671e94473e
=============================================================================

---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Sep 15 09:53AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/d401bd89c2d11715

Hello Nick,

Thanks for your question. It's likely the results you obtained from BLAT
are in pslx format, which is not recognized by the LiftOver program. Your
best option would be to select "psl" as your output type, and then use our
pslToBed utility to convert it to a BED file. You can download our
utilities here: http://hgdownload.soe.ucsc.edu/admin/exe/. Once your file
is in BED format, you can use LiftOver to convert your data.
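
For the command-line versions of these tools, the two steps might be chained roughly as in the Python sketch below. It only builds the two commands. The input name, the chain file name (oldToNew.over.chain) and the output naming scheme are placeholders, since the assemblies involved are not stated here; substitute the chain file for your own pair of assemblies.

```python
import shlex


def liftover_commands(psl_path, chain_path, prefix):
    """Return the pslToBed and liftOver argument lists for one PSL file."""
    bed_path = f"{prefix}.bed"
    return [
        # pslToBed <in.psl> <out.bed>
        ["pslToBed", psl_path, bed_path],
        # liftOver <oldFile> <map.chain> <newFile> <unMapped>
        ["liftOver", bed_path, chain_path, f"{prefix}.lifted.bed", f"{prefix}.unmapped.bed"],
    ]


# Hypothetical file names, for illustration only.
for command in liftover_commands("blat_results.psl", "oldToNew.over.chain", "blat_results"):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) in turn once both utilities are installed.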

More information on BED format can be found here:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1

If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group





=============================================================================
Topic: DNA methylation track hub (Smith Lab) not functioning
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/7aa1a65a71e87b27
=============================================================================

---------- 1 of 2 ----------
From: Enrique Medina-Acosta <***@gmail.com>
Date: Sep 15 11:20AM -0300
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/4d49a06a03d307bd

DNA methylation track hub (Smith Lab) not functioning.


Dear Genome Browser mailing list,


Beginning a few hours ago, I have been unable to connect to the public DNA
methylation track hub (Smith Lab).


This is the error message I am getting:

Message: ERROR: TCP non-blocking connect() to smithlab.usc.edu timed-out in
select() after 10000 milliseconds - Cancelling! Couldn't open
http://smithlab.usc.edu/trackdata/methylation/hub.txt


Here is the information of my system:

Assembly: hg19

Track name: DNA methylation

The exact location that I am viewing: any location of the genome

The name and version of the web browser: Mozilla Firefox

Operating system: Windows

Series of items I clicked on to reach the problem: UCSC genome browser, My
Data, Track Hubs, Show all hubs, Connect "DNA Methylation".


Best regards


Enrique

Prof. Dr. Enrique Medina-Acosta, M.Sc., PhD
<http://www.uenf.br/Uenf/Pages/CBB/LBT/?modelo=1&cod_pag=650&id=1187976458&np=&tpl=1&grupo=LBT>
.
MyCVLattes - CNPq <http://lattes.cnpq.br/0494533350878531>
MyResearcherID (ISI Web of Knowledge)
<http://www.researcherid.com/ProfileView.action?queryString=KG0UuZjN5WmTO3JU3BsaLuXY%252F2fYPzmlZ42clr%252BMEqQ%253D&Init=Yes&SrcApp=CR&returnCode=ROUTER.Unauthorized>
MyORCID <http://orcid.org/0000-0002-2529-0548>

Senior Associate Professor
Unit Coordinator - Molecular Identification and Diagnostics Unit - NUDIM
Laboratory of Biotechnology, Center for Biosciences and Biotechnology
Universidade Estadual do Norte Fluminense Darcy Ribeiro - UENF
Avenida Alberto Lamego 2000, Parque Califórnia, CEP 28013-602, Campos dos
Goytacazes, RJ, Brazil.
Tel: +55 22 2726 6758 / +55 22 2739 7086

========================================================================================

*UENF - Serviços à comunidade oriundos de pesquisa translacional
<http://www.uenf.br/portal/index.php/br/servicos/nucleo-diagnostico-investigacao-molecular.html>*
=========================================================================================


---------- 2 of 2 ----------
From: "Steve Heitner" <***@soe.ucsc.edu>
Date: Sep 15 09:09AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/77f2f889768fb4eb

Hello, Enrique.

The entire Smith Lab site where their hub.txt file is hosted appears to be
down, so nobody is able to access that public hub at the moment (see
http://genome.ucsc.edu/cgi-bin/hgHubConnect). We have contacted them
regarding this, so they are definitely aware of the problem. Hopefully the
hub will be available again soon.

Please contact us again at ***@soe.ucsc.edu if you have any further
questions. All messages sent to that address are archived on a
publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group






=============================================================================
Topic: should be simple in table browser
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/321f65b11d9c4794
=============================================================================

---------- 1 of 1 ----------
From: "LaFramboise, William A" <***@upmc.edu>
Date: Sep 14 12:24PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/2fd7ffad32be0917

This solution worked nicely. One niggling issue--- I get many outputs (multiple variants) for my entries but cannot align them with my original entries since none of my input is retained in the output file. Is there a simple way to retain one of my entry coordinates or add a series number to the input to sort and align the output?

Thanks,

Bill.
________________________________________
From: Matthew Speir [***@soe.ucsc.edu]
Sent: Friday, September 12, 2014 6:41 PM
To: LaFramboise, William A; ***@soe.ucsc.edu
Subject: Re: [genome] should be simple in table browser

Hi Bill,

Thank you for your question about getting gene symbols as part of your
output from the Table Browser. You are on the right track with your current
Table Browser settings, and the only issue is your output settings. The
reason you are not getting gene symbols as part of your output is
because these are not stored in the knownGene table for the UCSC Genes
track, but instead stored in a linked table. When you select the output
option "all fields from selected table", you are only getting the
information contained in the knownGene table. I recommend using the
"selected fields from primary and related tables" output option. After
you click "get output", you will be taken to another page where you will
be able to select fields from both the knownGene table and various
linked tables that you want as part of your output. On this page, select
those fields from the "Select Fields from hg19.knownGene" section that
you are interested in. In the "hg19.kgXref fields" section, you will
find a number of alternative IDs for the transcripts in the knownGene
table. Check the box next to "geneSymbol", and any other IDs you are
interested in. Finally, click "get output". Your output will consist of
the fields you selected as columns in order starting from the top of the
"hg19.knownGene" section. While this output option doesn't necessarily
format your output in a terribly useful way, you can use a simple UNIX
command line utility such as awk to rearrange the columns however you want.
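
If awk is not at hand, the same column shuffling can be sketched in Python. The example assumes tab-separated output and uses a made-up row; the function name, the placeholder values and the column order are illustrative only.

```python
def reorder_columns(line, order):
    """Return the tab-separated line with its fields rearranged by index."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[index] for index in order)


# Made-up row: transcript ID, chrom, txStart, txEnd, geneSymbol.
row = "uc_example.1\tchr1\t1000\t2000\tGENE1\n"
# Put the gene symbol first, then the coordinates, then the transcript ID.
print(reorder_columns(row, [4, 1, 2, 3, 0]))
```

Lines that begin with "#" are header lines and can be passed through the same function so the column names stay aligned with the data.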

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 9/11/14, 7:54 AM, LaFramboise, William A wrote:






--
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/join
.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+***@soe.ucsc.edu.