World Families Forums - Let's All Start Using Terminal SNP Labels Instead of Y Haplogroup Subclade Names

Heber
Old Hand
****
Offline Offline

Posts: 448


« on: September 21, 2012, 04:14:53 AM »

CeCe raises a very important issue in her blog, i.e., naming conventions.

At the DNA in Forensics conference in Innsbruck Dr. Chris Tyler-Smith made the point that if we continue using the current naming convention when we get to full genome sequencing and a much expanded Phylogenetic Tree we will be obliged to use names with up to 38 characters. Of course this is unsustainable and unwieldy.
I routinely use terminal SNP and they are understood across ISOGG, 23andme, FTDNA and Academia.
They would also like to use better toolsets to manage the Phylogenetic Tree. Importing ISOGG into Excel is not an option.
My suggestion is the following. We have been using GEDCOM for 20 years now in the Genetic Genealogy community and it works. It is designed to manage ancestry trees, and there is an abundance of software (free and commercial) for handling this, with millions of trees managed in databases such as Ancestry, Geni, MyHeritage etc. Could we extend the GEDCOM standard to support the requirements of the Phylogenetic Tree and Academia?
One of the benefits to the Genetic Genealogy Community would be the ability to link your Family Tree or Clan Structure to your Terminal SNP.
Does anyone have any suggestions for good tree software or a better naming convention?
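
As a rough illustration of what such tree tooling has to store, here is a minimal Python sketch of a parent-pointer tree keyed by SNP label. The PARENT table is a hand-picked, heavily simplified fragment (intermediate levels are left out), and the function name lineage is made up for this example. It is not an existing tool's format and not a proposed GEDCOM extension.

```python
# Hypothetical, simplified fragment of the Y tree: child SNP -> parent SNP.
# Intermediate levels are deliberately omitted; for illustration only.
PARENT = {
    "M343": "M207",   # R1b marker under R (the R1 level is omitted)
    "L21": "M343",    # many levels omitted
    "DF13": "L21",
    "DF21": "DF13",
    "L513": "DF13",
}

def lineage(terminal_snp, parent=PARENT):
    """Return the path from the root of the fragment down to terminal_snp."""
    path = [terminal_snp]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))

print(">".join(lineage("DF21")))  # M207>M343>L21>DF13>DF21
```

A family-tree record would then only need to carry the terminal SNP label; the rest of the path can be looked up, and it stays valid when new levels are inserted into the table.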

http://www.yourgeneticgenealogist.com/2012/09/lets-all-start-using-terminal-snp.html?showComment=1348214401616#c4789567673719488513
Logged

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1



gtc
Old Hand
****
Offline Offline

Posts: 238


« Reply #1 on: September 21, 2012, 07:04:02 AM »

The alphanumeric string is useful as a hierarchical sort key in spreadsheets and databases, but that's about all. It's already impractical to use in correspondence.
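
To illustrate the sort-key point, a minimal Python sketch: because each longhand name extends its parent's name, a plain string sort places every subclade directly after its ancestor. The names are ones quoted in this thread. The sketch assumes every level is a single character, which would stop holding as soon as a level needs a two-digit number.

```python
# Longhand names quoted in this thread, in arbitrary order.
names = [
    "R1b1a2a1a1b4",
    "R1b",
    "R1b1a2a1a1b",
    "R1a",
    "R1b1a2a1a1a3b2b1a1a1",
    "R1",
]

def is_ancestor(a, b):
    """True if longhand name a is a strict ancestor of longhand name b."""
    return b.startswith(a) and a != b

# A plain string sort already yields hierarchical order.
for name in sorted(names):
    print(name)
```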

And when it comes to using the "shorthand" in the R haplogroup, because there are now too many SNPs to remember I use R1b-xxx rather than R-xxx.

Going forward, with Geno 2.0 and the like, I gather that SNP names will not be used but simply the 'rs' numbers.
Logged

Y-DNA: R1b-Z12* (R1b1a2a1a1a3b2b1a1a1) GGG-GF Ireland (roots reportedly Anglo-Norman)
mtDNA: I3b (FMS) Maternal lines Irish
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #2 on: September 21, 2012, 08:13:21 AM »

I asked the Hg I guys on another forum a couple of years ago to start using the terminal SNP. At the time, some were good with that, but some laughed, indicating they had no problems relating to their I1 or I1d or whatever it was they had. They are changing their tune now, though.
Logged

R1b-L21>L513(DF1)>L705.2
brunetmj
Member
**
Offline Offline

Posts: 38


« Reply #3 on: September 21, 2012, 09:50:48 AM »

Well I suppose it depends on the haplogroup. The R1b1 group is huge compared to say the A group.
Likely, as someone else suggested, a combination like R1b1 DF13+ makes sense for larger groups.
Logged

L21 DF13** French
Heber
Old Hand
****
Offline Offline

Posts: 448


« Reply #4 on: September 21, 2012, 02:49:35 PM »

I would be happy with an R1b-L21-DF21 description for my terminal SNP.
One of the reasons R1b is so unwieldy is that it is one of three expansions, Bantu (E), Han (O) and European (R1b), and the only one classified as "extreme". An additional post-out-of-Africa expansion of lineages F to R points to a previously unknown event.
To visualise it, R1b on the Full Genome Phylogenetic Tree would look like a mighty oak in Summer and, for example, "I" would look like a sapling in Winter. It is interesting that all three expansions were associated with the development of farming on three different continents.

"Phase 1 of the 1000 Genomes Project has generated low-coverage whole-genome sequence data from 1,094 individuals from worldwide populations, including 528 males. SNP calls on the Y chromosome were made using SAMtools. In low coverage data, there are errors and uncertainty in the genotype calls. We developed a filtering strategy to reduce these, including restricting the analysis to 8.9 Mb of Y unique regions. We called a total of 18,692 Y-SNPs, 16,679 with the ancestral allele known. The false negative rate and false positive variant site identification rates were measured at 14% and 1.72% respectively by comparison with Complete Genomics calls on an overlapping subset of samples. The genotype accuracy was 97.4% compared with HapMap3 chip genotypes and 96.6% compared with Complete Genomics sequences. Using known literature variants, we assigned each sample to a haplogroup and these samples covered most of the major lineages except F, K, L, and M. A phylogenetic tree was constructed based on all the sites with known ancestral states using the RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analysis. The tree was consistent with the established structure. It confirmed Hg E (Bantu), O (China) and R1b (Europe) expansions associated with the Neolithic transitions in different parts of the world, and revealed that the expansion in Europe was the most extreme. One novel finding was a striking expansion of lineages F to R ~20 thousand years after the out-of-Africa movement, suggesting a previously unknown event of importance to male demography at this time."
« Last Edit: September 21, 2012, 03:05:50 PM by Heber » Logged

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1



eochaidh
Old Hand
****
Offline Offline

Posts: 400


« Reply #5 on: September 21, 2012, 02:58:24 PM »

I'm Miles Kehoe, R1b DF23
Logged

Y-DNA: R1b DF23
mtDNA: T2g
rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #6 on: September 21, 2012, 07:30:21 PM »

I think the way the shorthand is supposed to be used is the major y haplogroup designator (i.e., one letter, no numbers) followed by the terminal SNP, thus: I-M253 or R-L20, etc.

The thinking is that the string of sub-designators is likely to change but not the main y haplogroup letter designator.
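
The convention described above can be sketched in a few lines of Python. The function name shorthand and the prefix_len parameter are made up for this example; the sample longhand and SNP come from a signature earlier in this thread.

```python
def shorthand(longhand, terminal_snp, prefix_len=1):
    """Build a shorthand label such as 'R-DF21' from a longhand name.

    prefix_len=1 keeps only the major haplogroup letter (the convention
    described above); prefix_len=3 gives the 'R1b-...' variant.
    """
    return f"{longhand[:prefix_len]}-{terminal_snp}"

print(shorthand("R1b1a2a1a1b4", "DF21"))     # R-DF21
print(shorthand("R1b1a2a1a1b4", "DF21", 3))  # R1b-DF21
```

The one-letter form survives any reshuffling of the sub-designators, while the three-character form depends on "R1b" continuing to mean the same branch.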

"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.
Logged

rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #7 on: September 21, 2012, 07:41:31 PM »

BTW, I was kind of surprised by the title of this thread and the blog post that inspired it. It seems anachronistic.

I think most of us have been using the shorthand since y haplogroup designators started getting out of hand back in 2008.
Logged

Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #8 on: September 21, 2012, 08:46:44 PM »

....
"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.

I generally use R-"SNPn" but I actually think R1b-"SNP" is better just because I think the separation between R1a and R2 with R1b is very important.
« Last Edit: September 21, 2012, 08:47:31 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
gtc
Old Hand
****
Offline Offline

Posts: 238


« Reply #9 on: September 21, 2012, 11:44:57 PM »

....
"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.

I generally use R-"SNPn" but I actually think R1b-"SNP" is better just because I think the separation between R1a and R2 with R1b is very important.

Yes, and I'm for renaming R1b altogether.
Logged

Y-DNA: R1b-Z12* (R1b1a2a1a1a3b2b1a1a1) GGG-GF Ireland (roots reportedly Anglo-Norman)
mtDNA: I3b (FMS) Maternal lines Irish
rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #10 on: September 22, 2012, 04:18:31 AM »

....
"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.

I generally use R-"SNPn" but I actually think R1b-"SNP" is better just because I think the separation between R1a and R2 with R1b is very important.

I agree, to an extent, but I also think it is important to have an agreed-upon convention, and some people might think the difference between P297 and xP297 is important enough to memorialize in the shorthand, or the distinction between L11 and xL11, or the distinction between P312 and U106, and so on.

« Last Edit: September 22, 2012, 04:19:20 AM by rms2 » Logged

rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #11 on: September 23, 2012, 09:44:02 AM »

Have you all seen the following notice on your myFTDNA Haplotree page?

Quote


Important Notice

Long time customers of Family Tree DNA have seen the YCC-tree of Homo Sapiens evolve over the past several years as new SNPs have been discovered. Sometimes these new SNPs cause a substantial change in the “longhand” explanation of your terminal Haplogroup. Because of this confusion, we introduced a shorthand version a few years ago that lists the branch of the tree and your terminal SNP, i.e. J-L147, in lieu of J1c3d. Therefore, in the very near term, Family Tree DNA will discontinue showing the current “longhand” on the tree and we will focus all of our discussions around your terminal defining SNP.

This changes no science - it just provides an easier and less confusing way for us all to communicate.

Bennett Greenspan, Family Tree DNA
Dr. Michael Hammer, University of Arizona

Logged

wing_genealogist
Senior Member
***
Offline Offline

Posts: 81


WWW
« Reply #12 on: September 23, 2012, 09:49:35 AM »

I'm fairly sure this change by FT-DNA is related to the pending release of the Nat Geno 2.0 testing. It will almost certainly be a game changer with regard to y-DNA SNPs.

Logged

Y-DNA - R1b M157.2 (a private/family subclade of Z6/Z352) 111 markers tested

mt-DNA - J1c2g with the following private mutations: 315.1C 522.1A 522.2C C9974T C16256T (FMS tested and submitted to GenBank)

Autosomal - shows as a typical English ancestry. Tested with 23andMe, FTDNA
razyn
Old Hand
****
Offline Offline

Posts: 405


« Reply #13 on: September 23, 2012, 10:17:28 AM »

I'm fairly sure this change by FT-DNA is related to the pending release of the Nat Geno 2.0 testing. It will almost certainly be a game changer with regard to y-DNA SNPs.

I hope you're right.  There are a few other cosmetic changes in that Haplotree stuff since I last looked at it (maybe six weeks ago, but not forever ago).  It says:

"Your test results show that your haplogroup is R1b1a2a1a1b. A Y-DNA SNP extension test is not available." That's in spite of the fact that higher on the same page, one may read that I have in fact taken six further available tests (positive ones in bold, negative in red):

Your Haplogroup   R1b1a2a1a1b
Tests Taken          Z220+ Z209+ Z196+ P312+ L484+ Z216- U152- U106- M65- M153- L238- L21- L176.2-

And if one clicks on the blue arrow for "Your Match," the next page has some colorful new highlighting of things that were in the Deep Clade test a couple of years ago, for which one has already tested negative.  It looks as if the IT department (or whoever tells them what to do with the Haplotree page) has no visible connection with the lab, and is still going its own way with a set of 2010 blinders firmly affixed.
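
For anyone who wants to work with a "Tests Taken" line like the one quoted above outside the web page, here is a minimal Python sketch that splits it into positive and negative SNPs. The function name parse_snp_results is made up for this example, and the format (name followed by + or -, separated by spaces) is assumed from the quoted line only.

```python
def parse_snp_results(text):
    """Split a string like 'Z196+ P312+ U106-' into positive and negative SNP lists."""
    positive, negative = [], []
    for token in text.split():
        name, sign = token[:-1], token[-1]
        if sign == "+":
            positive.append(name)
        elif sign == "-":
            negative.append(name)
        else:
            raise ValueError(f"unrecognised result token: {token!r}")
    return positive, negative

# The "Tests Taken" line quoted in the post above.
results = "Z220+ Z209+ Z196+ P312+ L484+ Z216- U152- U106- M65- M153- L238- L21- L176.2-"
positive, negative = parse_snp_results(results)
print(len(positive), "positive:", positive)
print(len(negative), "negative:", negative)
```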
Logged

R1b Z196*
gtc
Old Hand
****
Offline Offline

Posts: 238


« Reply #14 on: September 23, 2012, 06:37:32 PM »

Thanks for the heads-up on that notice.

Even without Geno 2.0, had FTDNA tried to catch up with ISOGG I feel they would have had to significantly re-engineer their haplogroup tree anyway.

I look forward to being more than tired old R-L48 ... which is a long way up the tree from R-Z12.
Logged

Y-DNA: R1b-Z12* (R1b1a2a1a1a3b2b1a1a1) GGG-GF Ireland (roots reportedly Anglo-Norman)
mtDNA: I3b (FMS) Maternal lines Irish
rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #15 on: September 23, 2012, 07:10:08 PM »

I am also looking forward to a YCC Tree update, but then I was kind of looking forward to seeing my newly acquired long alphanumeric string. Ah, well. R-DF41 will do nicely.
Logged

df.reynolds
Old Hand
****
Offline Offline

Posts: 126


« Reply #16 on: September 23, 2012, 08:27:26 PM »

....
"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.

I generally use R-"SNPn" but I actually think R1b-"SNP" is better just because I think the separation between R1a and R2 with R1b is very important.
I suspect we will see "R1b" (and "R1a") in use by the hobbyist community for a long time to come. I still see references to "E3b", and how many years ago was that change? :)

Note one thing though. Any use of "R1b" is likely to be metaphorical only. I.e., M343, the R1b marker, is currently two levels down from the top of the Hg R tree, which currently looks like this:
. R   M207/Page37/UTY2, P224, P227, P229, P232, P280, P285, S4, S9
. . R1   M173/P241/Page29, M306/S1, P225, P231, P233, P234, P236, P238, P242, P245, P286, P29

That is 21 SNPs listed for R and R1 in the ISOGG tree. We know from other sources there are even more SNPs at that level that no one has bothered to publish. Many of these will be included in the Geno 2.0 test. Out of the tens of thousands of Geno 2.0 tests that are going to be taken, we will almost certainly find out that some number of these are in fact not equivalent, resulting in additional levels being inserted into the top level of the Hg R tree. And as more tests are taken, this could easily continue to change. I think the odds are pretty good that M343 is going to get shoved down at least a level or two.

It's going to be interesting, to say the least.
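
The way a block of supposedly equivalent SNPs turns into extra tree levels can be sketched in Python. The function name split_block is made up, and the sample result is invented purely to show the mechanism; it is not a real test result.

```python
def split_block(block, positives):
    """Split a block of presumed-equivalent SNPs using one sample's results.

    SNPs the sample is positive for must be older (upstream); the ones it
    is negative for arose later (downstream). Returns (upstream, downstream).
    Assumes the sample was tested for every SNP in the block.
    """
    upstream = [snp for snp in block if snp in positives]
    downstream = [snp for snp in block if snp not in positives]
    return upstream, downstream

# A few of the R1-level SNPs listed above; the sample result is INVENTED.
r1_block = ["M173", "M306", "P225", "P231"]
hypothetical_sample = {"M173", "P225"}

upstream, downstream = split_block(r1_block, hypothetical_sample)
print("upstream:", upstream)      # ['M173', 'P225']
print("downstream:", downstream)  # ['M306', 'P231']
```

If both lists come back non-empty, one level has become two, and every longhand name below that point gets one character longer. A terminal SNP label is unaffected.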

--david
Logged
wing_genealogist
Senior Member
***
Offline Offline

Posts: 81


WWW
« Reply #17 on: September 23, 2012, 08:40:26 PM »

It is certainly HOPED the Nat Geno 2.0 project uses some of the funds collected by the public participation portion of their project to test areas of the globe which have been short-changed by the 1K Genome project et al. Eastern Europe and Western Asia are areas of historic importance for the early history of R1b (as well as many other clades now found in Western Europe), but were entirely overlooked in previous full-genome scans.

Unfortunately, we cannot really expect all that many public participants whose direct male-line ancestry comes from this region.
Logged

Y-DNA - R1b M157.2 (a private/family subclade of Z6/Z352) 111 markers tested

mt-DNA - J1c2g with the following private mutations: 315.1C 522.1A 522.2C C9974T C16256T (FMS tested and submitted to GenBank)

Autosomal - shows as a typical English ancestry. Tested with 23andMe, FTDNA
eochaidh
Old Hand
****
Offline Offline

Posts: 400


« Reply #18 on: September 23, 2012, 09:05:28 PM »

Shouldn't this Forum's name be changed to "R and Subclades"? Hasn't "R1b" been called a bastardization?
Logged

Y-DNA: R1b DF23
mtDNA: T2g
Heber
Old Hand
****
Offline Offline

Posts: 448


« Reply #19 on: September 24, 2012, 02:39:56 AM »

....
"R1b-whatever" is kind of a (pardon the expression) bastardization of the shorthand.

I generally use R-"SNPn" but I actually think R1b-"SNP" is better just because I think the separation between R1a and R2 with R1b is very important.
I suspect we will see "R1b" (and "R1a") in use by the hobbyist community for a long time to come. I still see references to "E3b", and how many years ago was that change? :)

Note one thing though. Any use of "R1b" is likely to be metaphorical only. I.e., M343, the R1b marker, is currently two levels down from the top of the Hg R tree, which currently looks like this:
. R   M207/Page37/UTY2, P224, P227, P229, P232, P280, P285, S4, S9
. . R1   M173/P241/Page29, M306/S1, P225, P231, P233, P234, P236, P238, P242, P245, P286, P29

That is 21 SNPs listed for R and R1 in the ISOGG tree. We know from other sources there are even more SNPs at that level that no one has bothered to publish. Many of these will be included in the Geno 2.0 test. Out of the tens of thousands of Geno 2.0 tests that are going to be taken, we will almost certainly find out that some number of these are in fact not equivalent, resulting in additional levels being inserted into the top level of the Hg R tree. And as more tests are taken, this could easily continue to change. I think the odds are pretty good that M343 is going to get shoved down at least a level or two.

It's going to be interesting, to say the least.

--david

With Geno 2.0 and Next Gen Sequencing, we will be inundated with new SNPs.
I understand Geno 2.0 will also use the 'rs' number notation, which makes sense.

Both Roberta's and CeCe Moore's blog posts on the announcement are very good.

http://dna-explained.com/2012/07/25/national-geographic-geno-2-0-announcement-the-human-story/

http://dna-explained.com/2012/07/26/geno-2-0-qa-with-bennett-greenspan/

http://www.yourgeneticgenealogist.com/2012/07/national-geographic-and-family-tree-dna.html

http://www.yourgeneticgenealogist.com/2012/07/more-information-from-spencer-wells-on.html
Logged

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1



gtc
Old Hand
****
Offline Offline

Posts: 238


« Reply #20 on: September 24, 2012, 04:58:57 AM »

Shouldn't this Forum's name be changed to "R and Subclades"? Hasn't "R1b" been called a bastardization?

:-)
Logged

Y-DNA: R1b-Z12* (R1b1a2a1a1a3b2b1a1a1) GGG-GF Ireland (roots reportedly Anglo-Norman)
mtDNA: I3b (FMS) Maternal lines Irish
Richard Rocca
Old Hand
****
Offline Offline

Posts: 523


« Reply #21 on: September 24, 2012, 07:27:08 AM »

Shouldn't this Forum's name be changed to "R and Subclades"? Hasn't "R1b" been called a bastardization?

There is an R1b forum and an R1a forum. They are both large enough to warrant separate threads.
Logged

Paternal: R1b-U152+L2*
Maternal: H
rms2
Board Moderator
Guru
*****
Offline Offline

Posts: 5023


« Reply #22 on: September 24, 2012, 07:36:53 AM »

Shouldn't this Forum's name be changed to "R and Subclades"? Hasn't "R1b" been called a bastardization?


1) "R1b" has never been referred to by anybody here as a "bastardization". What was referred to as a bastardization was altering the shorthand outside the agreed-upon convention, regardless of the sub-haplogroup, e.g., using R1b as the shorthand prefix instead of the actual major y haplogroup branch. Using G2a or I1 or O3 as prefixes in a similar manner would also be a bastardization of the shorthand convention.

2) This forum is a forum, not a person, and thus has no terminal SNP. It could have been dubbed the R Forum, except that was too broad, and R1a has its own forum.



« Last Edit: September 24, 2012, 07:38:59 AM by rms2 » Logged

eochaidh
Old Hand
****
Offline Offline

Posts: 400


« Reply #23 on: September 24, 2012, 08:07:24 AM »

I think using "R" alone in the shorthand designation is too broad. When we are on a Forum like this and we all know what we are referring to, "R" works fine, but in reading a book or a paper, I think R1b plus the terminal SNP, or R1a plus the terminal SNP, works better than simply R and the terminal SNP. I don't believe we are all familiar with all of R1a's and R1b's terminal SNPs. Even on the thread about "mapping the origin of Indo-European languages", I believe R1b and R1a were needed to keep things clear.

Haplogroups J1 and J2 would be the same for me. I believe that J1 and J2 are large enough and different enough to warrant their use in the shorthand designation; J1 plus terminal SNP, J2 plus terminal SNP.

I'm R1b DF23 and I believe this helps even someone who is R1b P312 to know I'm not an R1a. This is especially true since the flurry of newly discovered SNPs under L21. It works well for me as well, since I don't know of all the SNPs under P312. To be honest, R1b P312 (DF27) helps me at this point. It's still much shorter and easier to understand than the actual designation.
« Last Edit: September 24, 2012, 02:12:46 PM by eochaidh » Logged

Y-DNA: R1b DF23
mtDNA: T2g
Heber
Old Hand
****
Offline Offline

Posts: 448


« Reply #24 on: September 24, 2012, 01:55:46 PM »

Another important issue facing the Genetic Genealogy community, apart from naming conventions, is the tsunami of data we will have to handle when we get to Next Gen Sequencing and Geno 2.0. It is estimated that the lifecycle of a genome will generate a terabyte of data. With the exponential fall in the cost of sequencing a genome and the explosion of applications, we will soon be in the range of petabytes and exabytes of data. Even an army of citizen scientists cannot handle this deluge. How can innovation keep pace with this relentless rate of progress? One way is to emulate the model of the mobile apps sector (Apple, Android ...), which uses Cloud Computing and Open APIs and leverages third-party developers. The recently announced Open APIs from 23andMe, Illumina BaseSpace and the MyGenome apps appear to be the way to go.
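
To put the scale in perspective, a back-of-the-envelope Python sketch. It takes the one-terabyte-per-genome figure from the post at face value and uses decimal units; both are assumptions, and the function name genomes_needed is made up for this example.

```python
TB_PER_GENOME = 1          # figure quoted in the post above, taken at face value
PETABYTE_TB = 1_000        # decimal units: 1 PB = 1,000 TB
EXABYTE_TB = 1_000_000     # 1 EB = 1,000,000 TB

def genomes_needed(total_tb, tb_per_genome=TB_PER_GENOME):
    """How many genomes it takes to accumulate total_tb terabytes."""
    return total_tb // tb_per_genome

print(genomes_needed(PETABYTE_TB))  # 1000
print(genomes_needed(EXABYTE_TB))   # 1000000
```

On those assumptions, a thousand sequenced genomes already amount to a petabyte, and a million to an exabyte.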
Logged

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1


