World Families Forums - Let's All Start Using Terminal SNP Labels Instead of Y Haplogroup Subclade Names

Author Topic: Let's All Start Using Terminal SNP Labels Instead of Y Haplogroup Subclade Names  (Read 2508 times)
df.reynolds
Old Hand

Posts: 126


« Reply #50 on: September 27, 2012, 12:25:51 AM »

As DF Reynolds has pointed out, Geno 2.0 testing is likely to change the tree so much that R1b will soon be archaic....

ISOGG's tree is probably as current as, if not more current than, Geno 2.0 already in terms of new SNPs. Am I missing something? I recognize there will be some SNPs identified in more than one haplogroup that we don't need to know about, but why are we expecting the known Y DNA tree to change because of Geno 2.0, at least the R1b parts of it?

No, you're not missing something. Geno 2.0 is not an SNP discovery trip like WTY is. The Geno 2.0 chip is "loaded up" with known SNPs. The following quote is from Thomas Krahn, a member of the Geno 2.0 design team:

"Also you shouldn't have a too high expectation to find some ground-breaking new SNP with the Geno 2.0 test. The phylogeny of the contained markers is pretty much established and only in a few occasions we will find unexpected parallel or reverse mutations. Geno 2.0 cannot find absolutely new SNPs, however it will bring high resolution SNP testing to a very broad user-base."

http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-09/1348544973

Further to the above, it strikes me that there may be a point of confusion in regard to what we might call "new" SNPs versus "known" SNPs.

As Thomas tells it (above), all of the SNPs on the chip are "known" by definition because they have to be -- the chip is programmed to recognize (and report instances of) known SNPs.

However, of those thousands of "known" SNPs on the chip, a subset could be called "new" as far as FTDNA customers are concerned because they haven't been available for testing via FTDNA until they were programmed into this chip because they were only "known" to sources outside of FTDNA and its customer community.

If that's the case, then those of us who have currently tested "to the death" as it were with FTDNA may have the opportunity to be found positive for SNPs that were hitherto unavailable to us via FTDNA, so we could consider those to be new to us (if not to science).
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.
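
To make the "known" versus "new" distinction concrete, here is a minimal Python sketch. It assumes a chip can be modelled as a simple lookup table of catalogued positions. The position numbers, the placeholder SNP name, and the names `CHIP_CATALOGUE`, `genotype_on_chip` and `aliases` are all hypothetical, invented purely for illustration; only the labels rs11799226, L21 and S145 come from the discussion above.

```python
# Hypothetical model of a genotyping chip: it can only report on positions
# that were programmed into it. Position numbers are made up for illustration.
CHIP_CATALOGUE = {
    1001: ["rs11799226", "L21", "S145"],  # one SNP, three labels (from the thread)
    1002: ["hypothetical-SNP-A"],         # placeholder entry, not a real SNP
}


def genotype_on_chip(sample_variants, catalogue=CHIP_CATALOGUE):
    """Return calls for catalogued positions only.

    sample_variants: set of positions where the sample carries the derived allele.
    Variants at positions absent from the catalogue are never reported.
    """
    return {
        labels[0]: ("+" if pos in sample_variants else "-")
        for pos, labels in catalogue.items()
    }


def aliases(label, catalogue=CHIP_CATALOGUE):
    """Return every label that refers to the same SNP as `label`."""
    for labels in catalogue.values():
        if label in labels:
            return labels
    return []


# A sample with one catalogued variant (1001) and one truly novel one (9999).
sample = {1001, 9999}
calls = genotype_on_chip(sample)
print(calls)
print(aliases("L21"))
```

In this toy model the variant at position 9999 never appears in `calls`: the chip cannot discover it, whereas a sequencing effort such as WTY could. The SNP at position 1001 was "known" all along, yet it is "new" to anyone who had no way of testing it before.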

--david
Mike Walsh
Guru

Posts: 2963


« Reply #51 on: September 27, 2012, 08:20:12 AM »

...
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.

I guess I'm nitpicking, but I think it is important to make the distinction that Geno 2.0 isn't discovering new SNPs; it is really just a new and much, much more comprehensive "deep clade package". However, that is a big thing. Add to that the fact that the National Genographic Project is for the first time offering a deep clade kind of package, which will mean a very large influx of additional people tested fairly deeply on SNPs. This is all bound to cause refinements to the Y DNA tree.

My perception is that the citizen-scientist team has done a great job of finding differentiating SNPs within R1b based on Y chromosome scanning projects. I don't know how difficult this was, but I know folks combed through large amounts of "raw" data. Any opinions? Maybe Richard R or Greg have one on this: do they feel a lot of Y locations in the raw data have not yet been analyzed and cross-referenced against known Y DNA SNPs?

I guess I'm asking if there is much of a chance of a latent L21 lying around in the data, per the 23andMe example. There was no WTY back then and less publicly available human genome data in general... plus the citizen-scientist team wasn't as fully engaged.

I can see something akin to L459 and/or Z245 finding their own unique places on the Y DNA tree that split off fairly small paragroups or what have you, but I'd be surprised by a major division within L21 or P312 or something. Am I wrong?
« Last Edit: September 27, 2012, 12:26:29 PM by Mikewww »

R1b-L21>L513(DF1)>L705.2
gtc
Old Hand

Posts: 238


« Reply #52 on: September 27, 2012, 08:30:58 AM »

...
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.

I guess I'm nitpicking, but Geno 2.0 isn't discovering new SNPs; it is really just a new and much, much more comprehensive "deep clade package". The fact that the National Genographic Project is for the first time offering a deep clade kind of package will mean a very large influx of additional people tested fairly deeply on SNPs. This is all bound to cause refinements to the Y DNA tree.

My perception is that the citizen-scientist team has done a great job of finding differentiating SNPs within R1b based on Y chromosome scanning projects. I don't know how difficult this was, but I know folks combed through large amounts of "raw" data. Any opinions? Maybe Richard R or Greg have one on this: do they feel a lot of Y locations in the raw data have not yet been analyzed and cross-referenced against known Y DNA SNPs?

I guess I'm asking if there is much of a chance of a latent L21 lying around in the data, per the 23andMe example. There was no WTY back then and less publicly available human genome data in general... plus the citizen-scientist team wasn't as fully engaged.

Chris Tyler-Smith trawled through the 1K Genomes Project data and reportedly found ~3500 SNPs. I don't know what, if any, overlap there is between his findings and those of the citizen scientists, but it seems there's potential for "new" SNPs in that component of Geno 2, if not others.

http://www.isogg.org/wiki/Genographic_Project

Y-DNA: R1b-Z12* (R1b1a2a1a1a3b2b1a1a1) GGG-GF Ireland (roots reportedly Anglo-Norman)
mtDNA: I3b (FMS) Maternal lines Irish
Heber
Old Hand

Posts: 448


« Reply #53 on: September 27, 2012, 09:32:55 AM »

...
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.

I guess I'm nitpicking, but Geno 2.0 isn't discovering new SNPs; it is really just a new and much, much more comprehensive "deep clade package". The fact that the National Genographic Project is for the first time offering a deep clade kind of package will mean a very large influx of additional people tested fairly deeply on SNPs. This is all bound to cause refinements to the Y DNA tree.

My perception is that the citizen-scientist team has done a great job of finding differentiating SNPs within R1b based on Y chromosome scanning projects. I don't know how difficult this was, but I know folks combed through large amounts of "raw" data. Any opinions? Maybe Richard R or Greg have one on this: do they feel a lot of Y locations in the raw data have not yet been analyzed and cross-referenced against known Y DNA SNPs?

I guess I'm asking if there is much of a chance of a latent L21 lying around in the data, per the 23andMe example. There was no WTY back then and less publicly available human genome data in general... plus the citizen-scientist team wasn't as fully engaged.

Chris Tyler-Smith trawled through the 1K Genomes Project data and reportedly found ~3500 SNPs. I don't know what, if any, overlap there is between his findings and those of the citizen scientists, but it seems there's potential for "new" SNPs in that component of Geno 2, if not others.

http://www.isogg.org/wiki/Genographic_Project


Tyler-Smith used 525 diverse males from the 1000 Genomes dataset and 36 from the Complete Genomics dataset. Here are some of the findings (I don't know if this is the final status, i.e. the publication tree):
 •1K-Genomes: 523 individuals and 15,953 sites (SNPs)
 •Complete Genomics: 36 individuals and 6,662 sites (SNPs)
 •Expansion of DE/GR calculated at ca. 66,000 years
 •Contemporary expansion of GR confirmed
 •Late extreme expansion of R1b calculated at ca. 11,000 years
 •R1b, O and E with very good coverage
 •Not much I, J, N diversity; some D, Q and R1a samples; very few A, G and T samples. L individuals completely missing?
 •I understood that new SNPs will simply have an rs-number
 •Y haplogroups: C. Tyler-Smith asks whether a nomenclature or abbreviated names for major clusters (R1b-M269) are the best solution

The current Genographic database has 750K samples.
The new Geno 2.0 SNP chip contains roughly the following SNPs:
 •~3,200 mtDNA SNPs
 •~12,000 Y-DNA SNPs
 •~130,000 autosomal and X-chromosomal AIMs

If they manage to achieve the same numbers with Geno 2.0 (~150K SNPs vs 12 STRs), imagine the potential discovery of new SNPs and expansion of the phylogenetic tree.
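
As a rough check on the "~150K" figure, the approximate marker counts listed above can simply be added up. This is only a sketch: the counts are the rounded values quoted in this post, and the dictionary name `GENO2_MARKERS` is just an illustrative label.

```python
# Approximate marker counts quoted above for the Geno 2.0 chip (rounded values).
GENO2_MARKERS = {
    "mtDNA SNPs": 3_200,
    "Y-DNA SNPs": 12_000,
    "autosomal and X-chromosomal AIMs": 130_000,
}

total = sum(GENO2_MARKERS.values())
y_share = GENO2_MARKERS["Y-DNA SNPs"] / total

print(f"total markers: ~{total:,}")
print(f"Y-DNA share of the chip: {y_share:.1%}")
```

The sum comes to roughly 145,000 markers, in line with "~150K", and the Y-DNA SNPs that matter for the haplogroup tree make up under a tenth of the chip.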

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1



Richard Rocca
Old Hand

Posts: 523


« Reply #54 on: September 27, 2012, 09:41:49 AM »

...
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.

I guess I'm nitpicking, but I think it is important to make the distinction that Geno 2.0 isn't discovering new SNPs; it is really just a new and much, much more comprehensive "deep clade package". However, that is a big thing. Add to that the fact that the National Genographic Project is for the first time offering a deep clade kind of package, which will mean a very large influx of additional people tested fairly deeply on SNPs. This is all bound to cause refinements to the Y DNA tree.

My perception is that the citizen-scientist team has done a great job of finding differentiating SNPs within R1b based on Y chromosome scanning projects. I don't know how difficult this was, but I know folks combed through large amounts of "raw" data. Any opinions? Maybe Richard R or Greg have one on this: do they feel a lot of Y locations in the raw data have not yet been analyzed and cross-referenced against known Y DNA SNPs?

I guess I'm asking if there is much of a chance of a latent L21 lying around in the data, per the 23andMe example. There was no WTY back then and less publicly available human genome data in general... plus the citizen-scientist team wasn't as fully engaged.

I can see something akin to L459 and/or Z245 finding their own unique places on the Y DNA tree that split off fairly small paragroups or what have you, but I'd be surprised by a major division within L21 or P312 or something. Am I wrong?

In the 1KG data, there were many positions on the Y that either did not sequence or did not sequence to a quality worth reporting. This was due to the low number of passes (2x-4x). So, while unlikely, it is possible that some major branch is waiting to be discovered. Will that branch be revealed in Geno 2.0? I think Thomas has said 'no'. To be honest, I am so well tested that I did not order Geno 2.0 until I found out that Sardinian full genomes were sequenced, even though U152 in Sardinia seems not to be L2+. Since Sardinia is prone to insular founder SNPs, I will probably remain L2*.

On the other hand, there were many 1KG same-level and singleton SNPs located in positions for which it was impossible to create primers. Those would not be an issue for sequencing and may produce some sub-branching. Also interesting are the SNPs that are a little less stable that could act as proxies for unknown SNPs.
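
To give a feel for why 2x-4x passes leave gaps, here is a small idealized sketch. It assumes read depth at any one position follows a Poisson distribution with mean equal to the average coverage; that assumption is mine, not something stated in the 1KG data, and real Y-chromosome coverage is far less even. The function names and the "at least 3 reads" threshold are illustrative choices only.

```python
import math


def prob_uncovered(mean_coverage):
    """P(a position gets zero reads), assuming Poisson-distributed read depth."""
    return math.exp(-mean_coverage)


def prob_fewer_than(k, mean_coverage):
    """P(a position gets fewer than k reads) under the same Poisson assumption."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )


for depth in (2, 4):
    missed = prob_uncovered(depth)
    thin = prob_fewer_than(3, depth)  # 3 reads is an arbitrary illustrative threshold
    print(f"{depth}x: no reads {missed:.1%}, fewer than 3 reads {thin:.1%}")
```

Under this simplified model, a noticeable fraction of positions gets no reads at all at 2x, and many more get too few reads for a confident call, which fits the picture of positions that "did not sequence to a quality worth reporting".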

Paternal: R1b-U152+L2*
Maternal: H
Maliclavelli
Guru

Posts: 2146


« Reply #55 on: September 27, 2012, 11:05:16 AM »

To be honest, I am so well tested that I did not order Geno 2.0 until I found out that Sardinian full genomes were sequenced, even though U152 in Sardinia seems not to be L2+. Since Sardinia is prone to insular founder SNPs, I will probably remain L2*.

But Sicily is also an island, and there were exchanges between the two: see the R-M269* haplotype of that Elymian I cannot mention, which has clear links with Sardinia. Certainly the Sardinian U152s are derived from the continent, but for that very reason they may carry many SNPs from the continent. And judging by the haplotype of my friend (Grassi) from Liguria, I think that L20 (and hence L2) is older than is thought and may have arisen there and not elsewhere.

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

df.reynolds
Old Hand

Posts: 126


« Reply #56 on: September 27, 2012, 03:22:13 PM »

...
Exactly. SNP rs11799226 was added to the NCBI database in Feb 2004, so it had been "known" for over four years at the point 23andMe added it to their v2 chip. Then in Oct 2008, a whole bunch of R-P312* folks were pleasantly surprised to find out they were rs11799226+, which led to Thomas Krahn offering the SNP as L21 and Jim Wilson offering it as S145.

I guess I'm nitpicking, but I think it is important to make the distinction that Geno 2.0 isn't discovering new SNPs; it is really just a new and much, much more comprehensive "deep clade package". However, that is a big thing. Add to that the fact that the National Genographic Project is for the first time offering a deep clade kind of package, which will mean a very large influx of additional people tested fairly deeply on SNPs. This is all bound to cause refinements to the Y DNA tree.

My perception is that the citizen-scientist team has done a great job of finding differentiating SNPs within R1b based on Y chromosome scanning projects. I don't know how difficult this was, but I know folks combed through large amounts of "raw" data. Any opinions? Maybe Richard R or Greg have one on this: do they feel a lot of Y locations in the raw data have not yet been analyzed and cross-referenced against known Y DNA SNPs?

I guess I'm asking if there is much of a chance of a latent L21 lying around in the data, per the 23andMe example. There was no WTY back then and less publicly available human genome data in general... plus the citizen-scientist team wasn't as fully engaged.

I can see something akin to L459 and/or Z245 finding their own unique places on the Y DNA tree that split off fairly small paragroups or what have you, but I'd be surprised by a major division within L21 or P312 or something. Am I wrong?
I did not mean to imply that I thought an SNP of the magnitude of L21 would be found via Geno 2.0. It is certainly possible, but I wouldn't think it very probable.

My expectation is that many of the changes we will see will be in the mid to upper reaches of the haplogroup trees. Certainly consumer testing is focused on the terminal branches and there are hundreds of SNPs higher up in the trees that have been only very loosely characterized.

--david
Heber
Old Hand

Posts: 448


« Reply #57 on: September 27, 2012, 03:29:27 PM »

I am also interested in the 130,000 AIMs which are used for deep ancestry. This is not something we have had available before. The first results and the detailed report should shed more light on this.
There will be many new papers presented on the 7th of November, including Ancestry Painting 2.0 and POBI; I am wondering if Geno 2.0 will be scheduled for that time frame.

gtc
Old Hand

Posts: 238


« Reply #58 on: September 27, 2012, 11:58:52 PM »

I am also interested in the 130,000 AIMs which are used for deep ancestry. This is not something we have had available before. The first results and the detailed report should shed more light on this.
There will be many new papers presented on the 7th of November, including Ancestry Painting 2.0 and POBI; I am wondering if Geno 2.0 will be scheduled for that time frame.

I believe a commencement time of October was mentioned in the initial announcement of Geno 2.

However, it seems the "go live" is also dependent upon publication of Spencer Wells's paper.