Discussion:
Digest for genome@soe.ucsc.edu - 15 updates in 10 topics
g***@soe.ucsc.edu
2014-10-10 17:22:42 UTC
=============================================================================
Today's topic summary
=============================================================================

Group: ***@soe.ucsc.edu
Url:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/topics


- request for overchain files [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/c1d101cc95e2f243
- Need help with running BLAT on CentOS [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/6397885d9d8f22bd
- Transcriptional start site [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/d192e896916d08bb
- Pombe browser [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/ad5af40d2cc4724c
- The Genome Browser in a Box (GBiB) [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/e9854b701f1e528b
- miRNA sequence alignment in the Browser [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/92803a4f2191c8a1
- multiWig [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/18437fd4fbb84198
- search for gene sequences [1 Update]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/bdb12bc37e981ca3
- Fw: [genome] DNA sequence data [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/6e7ed5b952fd915b
- Question regarding permission using snap shot images of UCSC browser [2 Updates]
http://groups.google.com/a/soe.ucsc.edu/group/genome/t/99ffed9d9cbef3d2


=============================================================================
Topic: request for overchain files
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/c1d101cc95e2f243
=============================================================================

---------- 1 of 1 ----------
From: Luvina Guruvadoo <***@soe.ucsc.edu>
Date: Oct 10 09:17AM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/c07d09081dbc815c

Hello Jinpu,

Thank you for bringing this to our attention. The hg19 to nomLeu3
over.chain file is now available for you to download:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToNomLeu3.over.chain.gz
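
For readers who have not opened an over.chain file before, here is a minimal Python sketch of reading one header line. The field order follows the UCSC chain format description. The names `ChainHeader` and `parse_chain_header` are hypothetical, and the sample line uses invented values rather than anything taken from the real hg19ToNomLeu3 file.

```python
from typing import NamedTuple


class ChainHeader(NamedTuple):
    """One 'chain' header line; field order follows the UCSC chain format."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse a line of the form 'chain score tName tSize ... qEnd id'."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    (score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields[1:]
    return ChainHeader(int(score), t_name, int(t_size), t_strand,
                       int(t_start), int(t_end), q_name, int(q_size),
                       q_strand, int(q_start), int(q_end), chain_id)


# Illustrative header line (values are invented, not from the real file).
sample = "chain 4900 chr1 249250621 + 10000 10235 chr5 1000000 - 2000 2236 1"
header = parse_chain_header(sample)
print(header.t_name, header.t_end - header.t_start)
```

To scan the downloaded file itself, the same function could be applied to each line starting with "chain" read through Python's `gzip.open(path, "rt")`.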

If you have any further questions, please reply to ***@soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-***@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group





=============================================================================
Topic: Need help with running BLAT on CentOS
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/6397885d9d8f22bd
=============================================================================

---------- 1 of 2 ----------
From: Jonathan Casper <***@soe.ucsc.edu>
Date: Oct 09 04:21PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/2d886d9092f3f6a2

Hello Aneesha,

This is definitely a puzzle. Your environment is quite similar to our
development machines, but we can't seem to reproduce your errors. You could
instead try building BLAT using the source code in our userApps package at
http://hgdownload.soe.ucsc.edu/admin/exe/. There are also some precompiled
binaries for several architectures in that directory. If your system really
is that similar to our development environment, you might be able to use
those and skip the compilation step altogether.

I hope this is helpful. If you have any further questions, please reply to
***@soe.ucsc.edu or genome-***@soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-***@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group



---------- 2 of 2 ----------
From: Aneesha Das <***@gmail.com>
Date: Oct 10 05:51AM +0530
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/8f8bd8c06a9f4166

Hi Jonathan,

Thank you for replying. I have the following queries:
1. We want to install blat as a part of the PASA package (
http://pasa.sourceforge.net), where we have successfully installed GMAP and
FASTA on one of the client nodes of a server machine. Will the blat program
run fine on a single node of a cluster, or does it work more efficiently
when run across the whole cluster?
2. Do we have to install gfServer, gfClient and webBlat along with blat for
the program to run successfully?

Regards,
Aneesha.

On Fri, Oct 10, 2014 at 4:51 AM, Jonathan Casper <***@soe.ucsc.edu>
wrote:




=============================================================================
Topic: Transcriptional start site
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/d192e896916d08bb
=============================================================================

---------- 1 of 1 ----------
From: Jonathan Casper <***@soe.ucsc.edu>
Date: Oct 09 05:17PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/b47cc2e7ec0a7dee

Hello Laura,

I used the hg19 assembly in my example because that is still the default
human assembly for the UCSC Genome Browser. While GRCh38/hg38 is a more
complete assembly than GRCh37/hg19, it is still quite new. Much of the
annotation on the hg19 assembly (and there is a lot) has not yet been
constructed for the hg38 assembly. That does not mean that hg19 is more
accurate or trustworthy.

If you are interested in learning more about genome assemblies, NCBI
provides a short primer at http://www.ncbi.nlm.nih.gov/assembly/basics/.
You may also be interested in the NCBI Insights blog, which gives further
information about some of their projects. You can find posts about the hg38
genome assembly at http://ncbiinsights.ncbi.nlm.nih.gov/tag/grch38/.

I hope this is helpful. If you have any further questions, please reply to
***@soe.ucsc.edu or genome-***@soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-***@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

On Wed, Oct 8, 2014 at 9:28 AM, ruvalcabatrejo <***@gmail.com>
wrote:




=============================================================================
Topic: Pombe browser
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/ad5af40d2cc4724c
=============================================================================

---------- 1 of 2 ----------
From: Lana Schaffer <***@scripps.edu>
Date: Oct 09 01:24PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/9eadb9572b422df8

Hi,
I have looked at the Pombe genome browser and didn't find the updated version 2 for Pombe.
Could you provide this version for your browser?

Lana Schaffer
The Scripps Research Institute
Biostatistics, Informatics
DNA Array Core Facility
858-784-2263


---------- 2 of 2 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Oct 09 04:16PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/96c95cf500958024

Hi Lana,

Thank you for your question about an S. pombe Genome Browser at UCSC. It
looks like you may be using a mirror site to access the S. pombe genome,
as we don't host one here at UCSC, http://genome.ucsc.edu/. Are you
using the S. pombe browser provided by the NIH at
http://pombe.nci.nih.gov/? If so, you may want to contact the NIH, who
are in charge of running this mirror, at ***@helix.nih.gov and see if
they can add this newer S. pombe genome to their site. However, if they
can't add it to their site, you can add your own genome and annotation to
the UCSC Genome Browser as an assembly hub. Assembly hubs allow users to
host their genome and related annotations on a publicly-accessible web
server and then visualize these within our browser. For information on
creating your own assembly hub, see this answer to a previous mailing
list question:
https://groups.google.com/a/soe.ucsc.edu/forum/#!msg/genome/ozSm1vjaxRY/yZNRpWHRcvQJ.

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 10/9/14, 1:24 PM, Lana Schaffer wrote:



=============================================================================
Topic: The Genome Browser in a Box (GBiB)
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/e9854b701f1e528b
=============================================================================

---------- 1 of 2 ----------
From: "Yuan, Qiaoping (NIH/NIAAA) [E]" <***@mail.nih.gov>
Date: Oct 09 07:22PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/ff0dfadfb9dc7149

Dear Colleagues at UCSC:

I am a frequent user of the great resources/tools from the UCSC Genome Browser. Because of the sensitivity of our data and the network/firewall settings in our lab, I cannot upload some of our data or provide a URL to the UCSC Genome Browser. Your Genome Browser in a Box (GBiB) seems like a very good solution for my problem. I have installed VirtualBox, but I could not find "the Genome Browser in a Box ZIP file" (browserbox.vbox) mentioned on this web page (http://genome.ucsc.edu/goldenPath/help/gbib). Would you please let me know where I can get all the related files for setting up the Genome Browser in a Box, or any other similar solution for using your great resources/tools?

Thank you very much.

Qiaoping

======================
Qiaoping Yuan, Ph.D.
NIH/NIAAA/DICBR/LNG
5625 Fishers Lane
Suite 3S32A, MSC 9412
Rockville, MD 20852
Phone: 1-301-443-7632
Fax: 1-301-480-2839
E-mail: ***@mail.nih.gov<mailto:***@mail.nih.gov>



---------- 2 of 2 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Oct 09 03:48PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/610df0487eb8604a

Hi Qiaoping,

Thank you for your interest in the Genome Browser in a Box (GBiB). We
are currently in the final stages of testing the GBiB, and we plan to
release it during the week of October 20th. After it has been officially
released, there will be a link on that GBiB help page,
http://genome.ucsc.edu/goldenPath/help/gbib, to download the GBiB.

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 10/9/14, 12:22 PM, Yuan, Qiaoping (NIH/NIAAA) [E] wrote:



=============================================================================
Topic: miRNA sequence alignment in the Browser
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/92803a4f2191c8a1
=============================================================================

---------- 1 of 1 ----------
From: "Ryan, Brid (NIH/NCI) [E]" <***@mail.nih.gov>
Date: Oct 09 10:06PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/820cf8eb6e57ced3

Good afternoon,

I was wondering if you could help me make sense of an observation we came across this evening.
When looking at the chromosomal location of miR-142 on the genome browser (hg19 assembly, Feb. 2009), we are observing an extra G nucleotide that does not exist in the miRBase sequence of the gene.

For example, using the coordinates below, I pull out a sequence on the browser that reads as follows:
CATCCATAAAGTAGGAAA_CACTA (the space indicates where we expected to see a G based on the miRBase sequence, the red G indicates the extra G).

The sequence we expected to see, based on what is deposited in miRBase, corresponds to that which the UCSC browser outputs when you click on the miR and request the DNA sequence, yet what is seen on the browser interface (see screenshot below) does not match.

Would you be able to help/advise on which sequence we should work with/consider to be correct?

Many thanks,
Brid

Query seq:
chr17:56408604-56408626

[Screenshot attachment not included]

Bríd M Ryan PhD MPH
Earl Stadtman Investigator
Laboratory of Human Carcinogenesis
Centre for Cancer Research, NCI
Building 37, Room 3060
Bethesda, MD, 20892
Tel: 301 496 5886
Email: ***@mail.nih.gov<mailto:***@mail.nih.gov>



=============================================================================
Topic: multiWig
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/18437fd4fbb84198
=============================================================================

---------- 1 of 1 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Oct 09 02:19PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/9b67370542ddf112

Hi Ben,

Thank you for your question about the smoothingWindow option and
multiWig tracks. All you need to do to get the smoothingWindow function
to work properly with your multiWig track is to add the smoothingWindow
line to the parent track stanza. Your parent track stanza should now
look like this:

track multiHigh
longLabel Lister multiwig wald
shortLabel waldhigh
container multiWig
aggregate stacked
showSubtrackColorOnUi on
type bigWig 0 1000
viewLimits 0:1
maxHeightPixels 100:32:8
smoothingWindow 8

After you've added the smoothingWindow line to your parent track stanza,
you can remove it from your subtrack stanzas.
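
To double-check a trackDb file after making this change, a small script can confirm that smoothingWindow is set on the parent stanza and report any subtracks that still carry it. This is only a sketch in Python: the helper names and the two subtrack stanzas (`waldA`, `waldB`) are hypothetical, and the parser handles only simple "key value" lines.

```python
from typing import Dict, List


def parse_stanza(text: str) -> Dict[str, str]:
    """Turn the 'key value' lines of one trackDb stanza into a dict."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


def subtracks_still_setting(key: str, subtracks: List[Dict[str, str]]) -> List[str]:
    """Return the names of subtracks that still define the given setting."""
    return [s.get("track", "?") for s in subtracks if key in s]


parent = parse_stanza("""
track multiHigh
container multiWig
aggregate stacked
type bigWig 0 1000
smoothingWindow 8
""")

# Hypothetical subtracks; the real subtrack stanzas were not shown in the thread.
subtracks = [
    parse_stanza("track waldA\nparent multiHigh\ntype bigWig\nsmoothingWindow 8"),
    parse_stanza("track waldB\nparent multiHigh\ntype bigWig"),
]

print("parent smoothingWindow:", parent.get("smoothingWindow"))
print("subtracks still setting it:",
      subtracks_still_setting("smoothingWindow", subtracks))
```

In this made-up input, `waldA` is reported because its stanza still contains the smoothingWindow line.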

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 10/8/14, 6:03 PM, Ben Decato wrote:



=============================================================================
Topic: search for gene sequences
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/bdb12bc37e981ca3
=============================================================================

---------- 1 of 1 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Oct 09 01:53PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/f4464e57796ce80f

Hello Chiara,

Thank you for your question about finding the IL15RA gene in the UCSC
Genome Browser. The IL15RA gene has numerous isoforms, and each isoform
has its own genomic coordinates. If you look at this gene in the UCSC
Genome Browser,
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mspeir&hgS_otherUserSessionName=hg38_IL15RAmlq,
you can quickly see that GENCODE (Ensembl), UCSC, and NCBI RefSeq
all annotate numerous isoforms of this gene. You can click on the
isoforms in the different gene tracks for more details about that
particular isoform. On each of the details pages, you should see a link
called "Genomic Sequence", which you can click to get the entire sequence
(introns, exons, UTRs) for that particular isoform. You should also find
an mRNA sequence link on those details pages, which you can click to get
the mRNA sequence.

Unfortunately, HapMap doesn't appear to have mapped their data past the
NCBI36/hg18 version of the Human genome. However, at UCSC, we have used
our liftOver tool, http://genome.ucsc.edu/cgi-bin/hgLiftOver, to convert
these NCBI36/hg18 coordinates to GRCh37/hg19, but we have not converted
the coordinates to the newest GRCh38/hg38 assembly. If you are
interested in using the HapMap coordinates for GRCh37/hg19, you can find
the track here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=hapmapSnps. If you
want the HapMap coordinates converted into GRCh38/hg38 coordinates, you
may be able to convert them yourself, and you can find more information
about that in the following Biostars post:
https://www.biostars.org/p/93618/.
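
One possible route for such a conversion is the command-line liftOver utility, whose documented argument order is: input file, chain file, output file, unmapped file. The Python sketch below only assembles that command. It assumes the HapMap positions have already been exported as BED, that the hg19ToHg38.over.chain.gz file has been downloaded, and that liftOver is installed locally; the function name and all file names are hypothetical.

```python
import shlex
from typing import List


def liftover_command(bed_in: str, chain: str, bed_out: str, unmapped: str) -> List[str]:
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", bed_in, chain, bed_out, unmapped]


# All file names below are placeholders chosen for illustration.
cmd = liftover_command(
    "hapmap.hg19.bed",
    "hg19ToHg38.over.chain.gz",
    "hapmap.hg38.bed",
    "hapmap.unmapped.bed",
)
print(shlex.join(cmd))
# To actually run it (requires the liftOver binary on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Positions that cannot be converted are written to the unmapped file, so it is worth checking that file before using the results.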

Lastly, I highly recommend taking advantage of some of the introductory
resources that are available for the Genome Browser. I would start by
watching these tutorials: http://www.openhelix.com/ucsc. You can find
information on further training and tutorials here:
http://genome.ucsc.edu/training.html.

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 10/9/14, 6:17 AM, Chiara Zanusso wrote:



=============================================================================
Topic: Fw: [genome] DNA sequence data
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/6e7ed5b952fd915b
=============================================================================

---------- 1 of 2 ----------
From: Konstantinos Xylogiannopoulos <***@ucalgary.ca>
Date: Oct 09 06:38PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/50cc169f47f5ed22

Dear Madam/Sir,



My apologies for disturbing you again at your personal email.

I have carefully read your email and followed your instructions. I have downloaded the files (whole genome (2bit) and some chromosomes (fa)). From what I have understood, the files are compressed using repeated sequences to save space. Therefore, I have also downloaded Tandem Repeats Finder (opened some files) and I have also tried the RepeatMasker application. If I'm correct, these two tools just present on screen information about repeated patterns extracted from the sequence file based on the user's search criteria. However, what I want as input for my research is a simple, non-compressed sequence of A, C, G and T, which can be 3GB for the whole genome or smaller if it describes a single chromosome. From my research on your website I haven't found such files. Do you know if there is any reliable and trustworthy converter from .2bit or compressed .fa files to standard .txt files for DNA sequences, or if there is any other location where I can find plain text files of DNA? I care mostly about pattern matching on text files and, therefore, the sequences must be plain, without hidden information such as compression.


Please, if you need additional information about my work or any other kind of clarification do not hesitate to contact me.
Thank you very much in advance for your help.

Kind Regards,

Konstantinos Xylogiannopoulos



________________________________
From: Steve Heitner <***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>>
Sent: October-06-14 1:04 PM
To: Konstantinos Xylogiannopoulos; ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>
Subject: RE: [genome] DNA sequence data

Hello, Konstantinos.

Occasionally, you will see items in a DNA sequence that are not in ACGT. These represent cases in which the base could not be positively identified as A, C, G or T. In these cases, you will see various IUPAC ambiguity codes. See http://en.wikipedia.org/wiki/Nucleic_acid_notation#IUPAC_notation for an explanation of these codes. When you see a long string of Ns, it typically represents a gap.

We do have the entire genomic sequence available for download in a single 2bit file. For more information about the 2bit format, see http://genome.ucsc.edu/FAQ/FAQformat.html#format7. For hg19, see http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ and for hg38, see http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/. See the README description of the 2bit files on those pages for important information. The README also explains that the difference in upper case/lower case is based on repeat sequences. For more information about RepeatMasker and Tandem Repeats Finder, see the respective track description pages at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=rmsk and http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=simpleRepeat.
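
For pattern-matching work that needs one plain string, the case and header conventions described above can be flattened with a few lines of Python. This is a minimal sketch, assuming the input is an uncompressed FASTA text file; the function name `fasta_to_plain` and the tiny in-memory example are hypothetical. Note that it concatenates every record it is given, so it is best applied to one chromosome at a time.

```python
import io
from typing import Iterable, Iterator


def fasta_to_plain(lines: Iterable[str], uppercase: bool = True) -> Iterator[str]:
    """Yield sequence text only: header lines (starting with '>') are dropped.

    With uppercase=True the lower-case (repeat-masked) bases are folded to
    upper case, so 'acgt' and 'ACGT' compare equal.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        yield line.upper() if uppercase else line


# Tiny in-memory stand-in for a FASTA file (made-up sequence).
example = io.StringIO(">chrDemo\nACGTnnnn\nacgtACGT\n")
plain = "".join(fasta_to_plain(example))
print(plain)
```

For a real file, passing an open file handle instead of the `io.StringIO` object and writing each yielded chunk to an output file avoids holding a whole chromosome in memory.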

Please contact us again at ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu> if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-***@soe.ucsc.edu<mailto:genome-***@soe.ucsc.edu>.

---
Steve Heitner
UCSC Genome Bioinformatics Group

From: Konstantinos Xylogiannopoulos [mailto:***@ucalgary.ca]
Sent: Friday, October 03, 2014 6:30 PM
To: ***@soe.ucsc.edu<mailto:***@soe.ucsc.edu>
Subject: [genome] DNA sequence data


Dear Madam/Sir,



My name is Konstantinos Xylogiannopoulos and I am a PhD student at the University of Calgary, Alberta, Canada, at the department of Computer Science. My work is on data mining in large sequences. I was trying to find a DNA sequence and during my research on the internet I discovered your website. I have visited the "Sequence and Annotation Downloads" page (link at the end of the message) and found a list of all human chromosomes. However, I would kindly like to ask you three questions:

1) In each chromosome string sequence there are large areas with only "N" characters. What does this mean and how can it affect a data mining (pattern matching) analysis?

2) Some parts of the string sequence are in capital letters while others are in small letters. What does this mean? Is there any difference between substrings with capital and small letters, i.e. is "ACGT" different from "acgt"?

3) Do you have the full human genome sequence in one large file instead of the per chromosome representation or do you know any place where I can find it?

Thank you very much in advance for your help on my work. If you have any questions or you need any clarifications regarding my work I would be glad to respond.



Kind Regards,



Konstantinos F. Xylogiannopoulos

PhD Student

University of Calgary

Calgary, Alberta, Canada





Sequence and Annotation Downloads webpage:

ftp://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/chromosomes/
--


---------- 2 of 2 ----------
From: "Linlin Yan (颜林林)" <***@mail.cbi.pku.edu.cn>
Date: Oct 10 03:29AM +0800
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/484bdd5e68364b05

Thanks, Steve.

By checking the gaps in the AGP file of hg38, the smallest ones are
10 bp in length. By counting runs of N in the fasta file, however, I got
the following list (number of runs, then run length):
#cnt N
233 1
20 2
3 3
1 4
1 5
2 6
1 7
7 10
48 20
1 23
... ...

Therefore, I guess 10bp might be the cut-off in practice.
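
A tally like the one above can be reproduced with a short Python sketch. The function name `n_run_lengths` and the toy sequence are hypothetical; for a real genome the sequence would need to be read per chromosome, with FASTA line breaks removed first so that runs are not split across lines.

```python
import re
from collections import Counter


def n_run_lengths(sequence: str) -> Counter:
    """Map each run length of consecutive N/n characters to how often it occurs."""
    return Counter(len(m.group()) for m in re.finditer(r"[Nn]+", sequence))


# Toy sequence with N runs of length 1, 2, 1 and 10.
toy = "ACGTN" + "ACGNN" + "TTN" + "AC" + "N" * 10 + "GT"
counts = n_run_lengths(toy)

print("#cnt\tN")
for length in sorted(counts):
    print(f"{counts[length]}\t{length}")
```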





=============================================================================
Topic: Question regarding permission using snap shot images of UCSC browser
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/t/99ffed9d9cbef3d2
=============================================================================

---------- 1 of 2 ----------
From: "Alazzam, Melanie F" <***@unc.edu>
Date: Oct 09 06:47PM
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/123148ecdd45ed6d

Hi,

I am a graduate student in Dr. Everett's lab at UNC Chapel Hill. I have a question regarding a snapshot image of the genome browser (attached) that I intend to include in my dissertation. The dissertation will at some point be available online.

I wanted to know whether I need permission to include such an image in my dissertation. If yes, how can I obtain the permission? And if not, how would you like me to acknowledge such usage?



Thank you!

Melanie



Melanie Alazzam , BDS
PhD Candidate
Oral Biology Curriculum
School of Dentistry
Chapel Hill, NC. 27599
Tel:(919) 537-3218
Email: ***@email.unc.edu


---------- 2 of 2 ----------
From: Matthew Speir <***@soe.ucsc.edu>
Date: Oct 09 12:35PM -0700
Url: http://groups.google.com/a/soe.ucsc.edu/group/genome/msg/7509c02d1a09f90e

Hi Melanie,

Thank you for your question about including a screenshot of the UCSC
Genome Browser in your dissertation. You don't need copyright permission
to publish screenshots from the Genome Browser in your article, but
please be sure to cite your source. You'll find citation guidelines for
screenshots at http://genome.ucsc.edu/cite.html, near the bottom of
the "Genome Browser Software and Website" section.

I hope this is helpful. If you have any further questions, please reply
to ***@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-***@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


On 10/9/14, 11:47 AM, Alazzam, Melanie F wrote:






--
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page:
https://groups.google.com/a/soe.ucsc.edu/forum/?utm_source=digest&utm_medium=email/#!forum/genome/join
.
To unsubscribe from this group and stop receiving emails from it send an email to genome+***@soe.ucsc.edu.
