Discussion:
how to understand the coordinates in "Refgene.txt" file
Li, Yunfei
2010-06-24 08:23:20 UTC
Hello,

I would like to ask how to interpret the "Refgene.txt" file.
For example:
"
1654 NM_031497 chr5 + 140180782 140183255 140180782 140183255 1 140180782, 140183255, 0 PCDHA3 cmpl incmpl 0,
971 NR_024227 chr19 - 50595745 50595866 50595866 50595866 1 50595745, 50595866, 0 SNAR-A6 unk unk -1,
"
If I want to locate the TSS of the first gene, do I walk along the sequence of chr5, counting the first base as "0", until I reach the base whose count equals "cdsStart"? Is this correct?
Also, how do I locate the second gene? Do I count from the beginning of the reverse complement of the chr19 sequence? That is, is the first base at the 5' end of the other strand of chr19 now treated as the first base and counted as "0"?

Best,

Yunfei Li
--------------------------------------------------------------------------------------
Research Assistant
Department of Statistics &
School of Molecular Biosciences
Biotechnology Life Sciences Building 427
Washington State University
Pullman, WA 99164-7520
Phone: 509-339-5096
http://www.wsu.edu/~ye_lab/people.html
Mary Goldman
2010-06-24 22:28:04 UTC
Hi Yunfei,

The txStart, txEnd, cdsStart and cdsEnd fields are always given in
reference to the positive strand. To find the transcription start site,
use txStart for genes on the positive strand and txEnd for genes on the
negative strand; cdsStart and cdsEnd mark the coding region instead.
Your second example, NR_024227, is a special case with no coding region,
since it is a non-coding gene. This is indicated by having the cdsStart
equal the cdsEnd.
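
To make the strand rule concrete, here is a minimal Python sketch. It
assumes the standard refGene column order (bin, name, chrom, strand,
txStart, txEnd, cdsStart, cdsEnd, ...) and whitespace-separated fields.
It reads the transcript bounds from the txStart/txEnd columns, which is
where refGene stores the transcription start and end, and uses
cdsStart/cdsEnd only to detect a non-coding entry. The names
RefGeneRow, parse_refgene_line, tss_zero_based and is_noncoding are
made up for this illustration and are not part of any UCSC tool.

```python
from typing import NamedTuple


class RefGeneRow(NamedTuple):
    """The subset of refGene columns needed here (hypothetical helper type)."""
    name: str
    chrom: str
    strand: str
    tx_start: int   # 0-based, positive-strand coordinate
    tx_end: int     # end-exclusive, positive-strand coordinate
    cds_start: int
    cds_end: int


def parse_refgene_line(line: str) -> RefGeneRow:
    # Assumed column order: bin, name, chrom, strand, txStart, txEnd,
    # cdsStart, cdsEnd, ... (remaining columns are ignored).
    f = line.split()
    return RefGeneRow(f[1], f[2], f[3], int(f[4]), int(f[5]), int(f[6]), int(f[7]))


def tss_zero_based(row: RefGeneRow) -> int:
    # On "+" the transcript starts at txStart. On "-" it starts at the
    # right-hand end; since txEnd is exclusive, the last base is txEnd - 1.
    return row.tx_start if row.strand == "+" else row.tx_end - 1


def is_noncoding(row: RefGeneRow) -> bool:
    # cdsStart == cdsEnd means the entry has no coding region.
    return row.cds_start == row.cds_end


# The two example lines from the original question.
pcdha3 = parse_refgene_line(
    "1654 NM_031497 chr5 + 140180782 140183255 140180782 140183255 1 "
    "140180782, 140183255, 0 PCDHA3 cmpl incmpl 0,"
)
snar_a6 = parse_refgene_line(
    "971 NR_024227 chr19 - 50595745 50595866 50595866 50595866 1 "
    "50595745, 50595866, 0 SNAR-A6 unk unk -1,"
)

print(pcdha3.name, tss_zero_based(pcdha3), is_noncoding(pcdha3))
print(snar_a6.name, tss_zero_based(snar_a6), is_noncoding(snar_a6))
```

Both positions are 0-based offsets into the positive-strand chromosome
sequence; no recounting from the other end of the chromosome is needed
for the minus-strand gene.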

As far as where to start numbering the bases, note that the UCSC Browser
uses zero-based start coordinates (with one-based, i.e. exclusive, end
coordinates) in our internal databases, including the file you mention,
and a one-based coordinate system for display. For more information,
please see this FAQ:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1.
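
As a rough illustration of the two coordinate systems, the Python
sketch below converts a database-style interval (0-based start,
exclusive end) to a one-based display string, and shows one way to pull
out a feature's sequence. Because coordinates always refer to the
positive strand, a minus-strand feature is sliced from the forward
sequence first and then reverse-complemented. The function names and
the toy sequence are assumptions made for this example only.

```python
def to_display(chrom: str, start: int, end: int) -> str:
    # Database interval: 0-based start, exclusive end.
    # Display interval: 1-based start, inclusive end.
    # Only the start shifts by one; the end value is unchanged.
    return f"{chrom}:{start + 1}-{end}"


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


def feature_sequence(chrom_seq: str, start: int, end: int, strand: str) -> str:
    # Python slices are also 0-based and end-exclusive, so the database
    # coordinates can be used directly on the positive-strand sequence.
    sub = chrom_seq[start:end]
    return sub if strand == "+" else reverse_complement(sub)


# txStart/txEnd of NM_031497 from the question, as a display range.
print(to_display("chr5", 140180782, 140183255))

# Toy 8-base "chromosome" (made-up data) to show the strand handling.
toy = "AACCGGTT"
print(feature_sequence(toy, 2, 5, "+"))
print(feature_sequence(toy, 2, 5, "-"))
```

The interval length is simply end - start in the database convention,
which is one reason half-open coordinates are convenient in code.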

I hope this information is helpful. Please feel free to contact the
mail list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group