World Families Forums - TMRCA calculations

August 22, 2014, 08:34:33 PM
Author Topic: TMRCA calculations  (Read 8343 times)
ironroad41
Old Hand

Posts: 219


« on: April 30, 2012, 10:03:35 AM »

Based on some recent messages from Hans van Vliet and Machiavelli, I am beginning to believe that only SNP subsets whose entries have that SNP and no subsequent (downstream) SNPs can be used for TMRCA calculations?

This is something I may be observing in Z253, and it is also observed in Z220, if I understand the messages correctly.

This may be part of the problem of underestimating the age of R-L21 (if it is being underestimated), which has so many subsets?
rms2
Board Moderator
Guru

Posts: 5023


« Reply #1 on: April 30, 2012, 10:37:10 AM »

Wouldn't having that as a requirement make it nearly impossible to calculate TMRCA? How can you be reasonably sure you have arrived at the terminal SNP short of whole genome testing?

razyn
Old Hand

Posts: 406


« Reply #2 on: April 30, 2012, 11:16:25 AM »

I think it has more to do with comparing SNPs that are at the same "level" (the same number of steps down from a common ancestor) than SNPs that are "terminal," which is even more of a moving target. I may be mistaken, but that is why I've been pestering the FTDNA help desk (for more Z-SNPs to test) since last June: partly to get M153 down to a more realistic level (about seven to nine steps lower than it has been appearing on the ISOGG tree), but mainly to facilitate MRCA comparisons between nodes on the NS cluster side and nodes on the L176.2 side of Z196. The confusion of tongues (as it were) between the SNP naming systems (Z, L, DF or whatever) doesn't help. There may be people for whom it isn't confusing, but I'm not the only one who finds it so.
« Last Edit: April 30, 2012, 11:17:38 AM by razyn »

R1b Z196*
ironroad41
Old Hand

Posts: 219


« Reply #3 on: April 30, 2012, 12:09:21 PM »

Quote from: rms2
Wouldn't having that as a requirement make it nearly impossible to calculate TMRCA? How can you be reasonably sure you have arrived at the terminal SNP short of whole genome testing?

I think this has been our fundamental problem with TMRCA estimations. My Clan Gregor Ian Cam calculations use father/son rates, and I can estimate the founder's date pretty well. At one time I did the same for M226 and got c. 400 AD, which I believe is near when O'Neill was born.

The problem is that the M226 entries end at the SNP for that class of entrants; there is absolutely no M226 man older than that date. It therefore follows that, to make an intelligent TMRCA calculation, the data set has to consist of entries with only that one SNP; any younger SNPs will bias the estimate lower!

I've been mulling this over the weekend and have convinced myself that this is the only way to make sense of it. This is why, I believe, all TMRCAs have appeared too young: they were a "gemischt" (a mixed bag) of subsequent, sometimes unfound or unknown, SNPs, which confounded the calculation. JMHO!
ironroad41
Old Hand

Posts: 219


« Reply #4 on: April 30, 2012, 12:12:38 PM »

Quote from: razyn
I think it has more to do with comparing SNPs that are at the same "level" (the same number of steps down from a common ancestor) than SNPs that are "terminal," which is even more of a moving target. I may be mistaken, but that is why I've been pestering the FTDNA help desk (for more Z-SNPs to test) since last June: partly to get M153 down to a more realistic level (about seven to nine steps lower than it has been appearing on the ISOGG tree), but mainly to facilitate MRCA comparisons between nodes on the NS cluster side and nodes on the L176.2 side of Z196. The confusion of tongues (as it were) between the SNP naming systems (Z, L, DF or whatever) doesn't help. There may be people for whom it isn't confusing, but I'm not the only one who finds it so.

I laud your objective. I'm not sure how many SNPs are in some sense parallel; there has to be some kind of pecking order, I believe, but that question may be in the noise when you are comparing two SNPs of comparable age relative to the total TMRCA?
Jdean
Old Hand

Posts: 678


« Reply #5 on: April 30, 2012, 12:47:10 PM »

Quote from: ironroad41
I've been mulling this over the weekend and have convinced myself that this is the only way to make sense of it. This is why, I believe, all TMRCAs have appeared too young: they were a "gemischt" (a mixed bag) of subsequent, sometimes unfound or unknown, SNPs, which confounded the calculation. JMHO!

Why do you think all the TMRCA calcs 'appear' too young?

If you would like an upper bound for L21, calculate the interclade age for P312; you don't even have to use L21 people in the calculation.


« Last Edit: April 30, 2012, 12:49:00 PM by Jdean »

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #6 on: April 30, 2012, 01:32:15 PM »

The question of the age of SNPs is contentious; it is not clear, certainly in academia, whether Zhivotovsky is right or straight father-son meiosis calculations are correct. So my statement stands on its own.

What data do you use to calculate the interclade age for P312? If it's from subclades, it should probably use only entries from the subclades themselves, not from downstream SNPs?
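The rate question raised here matters because, in the simplest variance-based estimate, the computed age scales inversely with whatever mutation rate is assumed. A minimal Python sketch, assuming age = average per-marker STR variance / rate under a symmetric single-step model; the two rates, the variance and the 25-year generation are hypothetical placeholders, not the actual father-son or Zhivotovsky figures:

```python
def tmrca_generations(avg_variance: float, mu: float) -> float:
    """Variance-based age: under a symmetric single-step model, STR variance
    grows by roughly mu per generation, so age is about variance / mu."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return avg_variance / mu

# Hypothetical placeholder values, for illustration only.
FAST_RATE = 0.0025         # per marker per generation ("faster" rate)
SLOW_RATE = 0.0007         # per marker per generation ("slower" rate)
YEARS_PER_GENERATION = 25  # assumed generation length
variance = 0.5             # average per-marker variance of some sample

for label, mu in (("fast", FAST_RATE), ("slow", SLOW_RATE)):
    generations = tmrca_generations(variance, mu)
    years = generations * YEARS_PER_GENERATION
    print(f"{label} rate: {generations:.0f} generations, about {years:.0f} years")
```

Whatever data set is chosen, the two estimates differ by the ratio of the rates (here 0.0025 / 0.0007, about 3.6).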
Jdean
Old Hand

Posts: 678


« Reply #7 on: April 30, 2012, 01:57:06 PM »

Quote from: ironroad41
The question of the age of SNPs is contentious; it is not clear, certainly in academia, whether Zhivotovsky is right or straight father-son meiosis calculations are correct. So my statement stands on its own.

What data do you use to calculate the interclade age for P312? If it's from subclades, it should probably use only entries from the subclades themselves, not from downstream SNPs?

Two subgroups of P312, U152 and L21, spring to mind, but you could use U152 and what's predicted to be under DF27.

You can get Ken Nordtvedt's Generation 7 spreadsheet here:

http://knordtvedt.home.bresnan.net/

If you don't like a particular locus, just leave it out or delete its column.

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #8 on: April 30, 2012, 02:02:40 PM »

How do you know, or does it make any difference, whether you have U152 alone or U152 plus all its downstream SNPs? I'm arguing you shouldn't include downstream SNPs in TMRCA calculations. I'm not sure what the impact is on interclade calculations; it depends on the implied assumptions. Carrying all the extra baggage makes the dates seem smaller. That's what I am finding out.
Jdean
Old Hand

Posts: 678


« Reply #9 on: April 30, 2012, 02:09:21 PM »

Quote from: ironroad41
How do you know, or does it make any difference, whether you have U152 alone or U152 plus all its downstream SNPs? I'm arguing you shouldn't include downstream SNPs in TMRCA calculations. I'm not sure what the impact is on interclade calculations; it depends on the implied assumptions. Carrying all the extra baggage makes the dates seem smaller. That's what I am finding out.

I don't think it would make any difference; I doubt there are any P312+ people who don't have a known (probably almost all of them now) or unknown downstream SNP anyway.

The important thing is to make sure you have two clearly defined groups.

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #10 on: April 30, 2012, 02:18:37 PM »

I am suggesting that a dataset containing multiple SNPs is not a clearly defined group!

When I mix SNP groups, I muddle up the TMRCA. No M226 man is older than the M226 founder (c. 400 AD), so including those entries in a calculation for an older haplotype is wrong. I am living today and am M226-; as far as I know, my line's last SNP was Z253. To compute the TMRCA of Z253, I have to use a group of Z253 men. If I include downstream SNPs, the calculation will be shortened. I would conclude that if you wanted to do a U106 vs. R-L21 interclade, you would only use entries that have U106 or L21 as their last SNP; otherwise you're "gemischting" (mixing).
Jdean
Old Hand

Posts: 678


« Reply #11 on: April 30, 2012, 02:28:55 PM »

Quote from: ironroad41
I am suggesting that a dataset containing multiple SNPs is not a clearly defined group!

When I mix SNP groups, I muddle up the TMRCA. No M226 man is older than the M226 founder (c. 400 AD), so including those entries in a calculation for an older haplotype is wrong. I am living today and am M226-; as far as I know, my line's last SNP was Z253. To compute the TMRCA of Z253, I have to use a group of Z253 men. If I include downstream SNPs, the calculation will be shortened. I would conclude that if you wanted to do a U106 vs. R-L21 interclade, you would only use entries that have U106 or L21 as their last SNP; otherwise you're "gemischting" (mixing).

Interclade calculations find the approximate TMRCA of two groups; Ken's spreadsheet also outputs the intraclade ages.
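A minimal Python sketch of the two quantities, assuming a symmetric single-step mutation model and one hypothetical rate mu for every marker. This is not the formula used in Ken's spreadsheet: the interclade part simply averages squared repeat differences over every cross-group pair (which, under that model, grows by about 2*mu per generation since the two groups' lines split), and the intraclade part divides the within-group variance by mu. The haplotypes and rate are made up.

```python
from itertools import product
from statistics import mean, pvariance

def interclade_asd(group_a, group_b):
    """Average squared repeat difference over all (a, b) pairs, averaged over markers."""
    n_markers = len(group_a[0])
    return mean(
        mean((a[k] - b[k]) ** 2 for a, b in product(group_a, group_b))
        for k in range(n_markers)
    )

def intraclade_variance(group):
    """Per-marker population variance within one group, averaged over markers."""
    return mean(pvariance(column) for column in zip(*group))

def interclade_generations(group_a, group_b, mu):
    # Two lines that split T generations ago differ by about 2 * mu * T (squared repeats).
    return interclade_asd(group_a, group_b) / (2 * mu)

def intraclade_generations(group, mu):
    # Within one group, variance grows by about mu per generation.
    return intraclade_variance(group) / mu

# Hypothetical two-marker haplotypes and rate.
group_a = [[14, 24], [15, 24]]
group_b = [[13, 23], [13, 24]]
mu = 0.005
print(interclade_generations(group_a, group_b, mu), intraclade_generations(group_a, mu))
```

In this toy case the cross-group figure is about 150 generations and the within-group figure for `group_a` about 25.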

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #12 on: April 30, 2012, 02:45:23 PM »

I would like a good discussion on this point, because I believe it explains the problems we've been having in making time estimates using STRs.

Suppose you wanted to estimate the TMRCA of M269. The only entries that should be used are those that are M269+: people living today who have not experienced a SNP in their family tree since M269 arose. Including entries with SNPs subsequent to M269 would shorten the estimate. It appears that the most recent SNP a person has determines his most recent common ancestor with entries of the same SNP.

Think of it as a "bottom-up" approach. Take the most recent SNP we are aware of and, using the entries that have that SNP, estimate the age of that group. What we have left is the founder's haplotype, which carries a previous SNP; do the same thing for that SNP. You will be working back in time as far as founders go, but you will still have entries living today whose lines go back to the next previous SNP, and so on.
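The bottom-up procedure described here can be sketched in Python as follows. It is a minimal illustration of the proposal as stated in this post, not an established method: the tree is a toy with intermediate SNPs omitted, each entry is a hypothetical (terminal SNP, haplotype) pair, and each group's age is taken as within-group STR variance divided by a hypothetical rate mu.

```python
from statistics import mean, pvariance

# Toy tree (child -> parent); intermediate SNPs omitted, illustrative only.
PARENT = {"L11": "M269", "P312": "L11", "L21": "P312", "U152": "P312",
          "M222": "L21", "Z253": "L21"}

def depth(snp):
    """Steps from the root (M269 in this toy tree) down to `snp`."""
    steps = 0
    while snp in PARENT:
        snp = PARENT[snp]
        steps += 1
    return steps

def terminal_only(entries, snp):
    """Haplotypes of entries whose most recent known SNP is exactly `snp`."""
    return [hap for terminal, hap in entries if terminal == snp]

def avg_variance(haplotypes):
    """Per-marker population variance, averaged over markers."""
    return mean(pvariance(col) for col in zip(*haplotypes))

def bottom_up(entries, mu, min_size=2):
    """Estimate each terminal-only group, deepest SNP first (ties broken by name)."""
    snps = sorted({terminal for terminal, _ in entries}, key=lambda s: (-depth(s), s))
    results = []
    for snp in snps:
        group = terminal_only(entries, snp)
        if len(group) >= min_size:
            results.append((snp, len(group), avg_variance(group) / mu))
    return results

# Hypothetical entries: (terminal SNP, two-marker haplotype).
entries = [("M222", [13, 25]), ("M222", [13, 25]), ("Z253", [14, 24]),
           ("Z253", [14, 25]), ("L21", [14, 24]), ("L21", [15, 23]),
           ("M269", [14, 24]), ("M269", [16, 22])]
for snp, n, generations in bottom_up(entries, mu=0.005):
    print(snp, n, round(generations))
```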

I believe this is the approach one has to use to compute TMRCAs intelligently. JMHO.
« Last Edit: April 30, 2012, 02:46:07 PM by ironroad41 »
Jdean
Old Hand

Posts: 678


« Reply #13 on: April 30, 2012, 02:51:49 PM »

Quote from: ironroad41
I would like a good discussion on this point, because I believe it explains the problems we've been having in making time estimates using STRs.

Suppose you wanted to estimate the TMRCA of M269. The only entries that should be used are those that are M269+: people living today who have not experienced a SNP in their family tree since M269 arose. Including entries with SNPs subsequent to M269 would shorten the estimate. It appears that the most recent SNP a person has determines his most recent common ancestor with entries of the same SNP.

Think of it as a "bottom-up" approach. Take the most recent SNP we are aware of and, using the entries that have that SNP, estimate the age of that group. What we have left is the founder's haplotype, which carries a previous SNP; do the same thing for that SNP. You will be working back in time as far as founders go, but you will still have entries living today whose lines go back to the next previous SNP, and so on.

I believe this is the approach one has to use to compute TMRCAs intelligently. JMHO.


?????

Perhaps you mean people who haven't tested positive for any known downstream SNP, but I can't see the logic in that either.

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #14 on: April 30, 2012, 03:13:32 PM »

You are correct. In those entries there are no downstream/subsequent SNPs after M269. This implies that their haplotypes will be very diverse, since their TMRCA is when the M269 mutation occurred.

The logic is as I stated above: the haplotype of such an entry will have had a long time to experience STR mutations and will therefore reflect the time back to its founder. It's becoming clearer to me that including younger SNPs will reduce the diversity, since their founders lived more recently and their descendants have had less time to accumulate STR mutations.

On another thread Mike asked about the importance of diversity. I think I can now answer his question: a haplotype has to have the SNP of interest and no subsequent SNP mutations for its diversity to reflect the age of the SNP of interest.

Certainly you will agree that time zero for a SNP is the founder's haplotype, and that subsequent descendants reflect diversity from that haplotype only?
MHammers
Old Hand

Posts: 347


« Reply #15 on: April 30, 2012, 03:33:01 PM »

Quote from: ironroad41
You are correct. In those entries there are no downstream/subsequent SNPs after M269. This implies that their haplotypes will be very diverse, since their TMRCA is when the M269 mutation occurred.

The logic is as I stated above: the haplotype of such an entry will have had a long time to experience STR mutations and will therefore reflect the time back to its founder. It's becoming clearer to me that including younger SNPs will reduce the diversity, since their founders lived more recently and their descendants have had less time to accumulate STR mutations.

On another thread Mike asked about the importance of diversity. I think I can now answer his question: a haplotype has to have the SNP of interest and no subsequent SNP mutations for its diversity to reflect the age of the SNP of interest.

Certainly you will agree that time zero for a SNP is the founder's haplotype, and that subsequent descendants reflect diversity from that haplotype only?

I agree to an extent about not including the downstream SNPs; it seems more realistic as an estimate. However, if the samples being compared in an interclade are small and the result of a recent founder effect, for example, it may become important to include downstream SNPs so that a more accurate variance is reflected.

Ydna: R1b-Z253**


Mtdna: T

ironroad41
Old Hand

Posts: 219


« Reply #16 on: April 30, 2012, 03:44:52 PM »

I think we have a lot of work ahead of us trying to fully understand this effect, if it turns out to be true. As razyn pointed out, you may be able to include parallel SNP data of comparable age; that certainly seems feasible.

I think one thing that has to happen is for FTDNA project data to be organized by SNP within each clade/subclade. Among other things, this will permit calculations to determine which SNPs are parallel.

rms2's issue of whether we have all the SNPs is still valid and may confuse the data?
Jdean
Old Hand

Posts: 678


« Reply #17 on: April 30, 2012, 06:58:36 PM »

Quote from: ironroad41
You are correct. In those entries there are no downstream/subsequent SNPs after M269. This implies that their haplotypes will be very diverse, since their TMRCA is when the M269 mutation occurred.

The logic is as I stated above: the haplotype of such an entry will have had a long time to experience STR mutations and will therefore reflect the time back to its founder. It's becoming clearer to me that including younger SNPs will reduce the diversity, since their founders lived more recently and their descendants have had less time to accumulate STR mutations.

On another thread Mike asked about the importance of diversity. I think I can now answer his question: a haplotype has to have the SNP of interest and no subsequent SNP mutations for its diversity to reflect the age of the SNP of interest.

Certainly you will agree that time zero for a SNP is the founder's haplotype, and that subsequent descendants reflect diversity from that haplotype only?

That's not what I meant.

I'm afraid I don't understand what you are saying and struggle to find any logic in your explanations.

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #18 on: May 01, 2012, 07:01:58 AM »

I'll try again. SNPs are a time-ordered set, characterized by haplogroups. As I understand it, the formation of a haplogroup is accompanied by a unique SNP occurrence. Further, within a haplogroup are subclades, which are also characterized by the occurrence of a SNP mutation, such as M269, L11, R-L21, Z253, M226, etc. This is a hierarchical set, with M269 being the eldest.

Associated, I believe, with each subclade is a particular sequence of STRs, called the founder's haplotype. This is what we converge to when we do an STR TMRCA.

We form sets of people in FTDNA projects with different sets of SNPs, names, etc. All the entries we use are from people who are alive today or who recently passed away. Each person has a set of SNPs on his Y chromosome, which appears to be a historical record through time.

My point is that when I want to estimate the TMRCA of R-L21, I should only use entries whose last mutation is that one, with no subsequent SNPs. Each subsequent SNP has a shorter TMRCA and will reduce the time estimate for R-L21, since its "diversity" started at a more recent point in time. It's like comparing apples and oranges: they're just not the same kind of thing and, more especially, they do not show the same kind of "diversity".
JeanL
Old Hand

Posts: 425


« Reply #19 on: May 01, 2012, 09:18:56 AM »

I don't think downstream SNPs should be dismissed. The thing is that while, say, R-L21 people might have a common ancestor who lived 3400 ybp, and R-U152 people a common ancestor who lived 5000 ybp, when you calculate the common ancestor of a set that has both L21 and U152, you are finding the common ancestor of both L21 and U152, and that TMRCA should ideally be older than both the common ancestor of L21 and that of U152. That would in fact be a good way to test the reliability of TMRCA estimates. I'm not sure how the sample dynamics would affect the TMRCA, i.e., if a set is overpopulated by L21 folks with little U152, would the TMRCA be driven down to a number closer to the L21 TMRCA, or would it not change? I suspect there would be a significant impact on the TMRCA, just because of the way it is calculated. In fact, let's check something very quickly:

Using the data from Myres et al. (2010), let's explore the variance of each SNP group individually, and then as a combined group.

I used the following 10 markers: DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS439 and DYS461 (A7.2).

The variance for L21 (n=126) is 0.2238, the variance for U152 (n=203) is 0.2089, and the variance of the combined sample L21+U152 (n=329) is 0.2146. So there is definitely something off here: the variance of two different SNP groups should be greater when combined than for either one separately. If variance is a direct measure of TMRCA, the earliest point at which R-L21 and R-U152 could share their MRCA would be R-P312; their MRCA could have lived later than R-P312, but it would still have to be older than both the MRCA of L21 and that of U152.

PS: I know there is an anomaly here with U152; this is probably caused by the Bashkirs, who have a very young U152. Nonetheless, that doesn't change the fact that the variance of any two SNP groups that descend from a common ancestor should, at least in theory, be greater than their individual variances. But to make sure this anomaly isn't causing this, Mikeww, could you direct me to sets of L21 and U152 that use the 36 most linear STRs you have mentioned before, so that I can repeat this test?
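For comparison with the expectation above, the variance of a pooled sample decomposes (per marker, and therefore also when averaged over markers) into the size-weighted average of the within-group variances plus a non-negative between-group term that depends on how far apart the group means are. A small Python sketch, assuming the posted figures are population (divide-by-n) variances; the group means are not given in the post, so the sketch can only reproduce the within-group part:

```python
def pooled_variance(groups):
    """Law of total variance for one marker.

    groups: list of (n, mean, variance) with population (divide-by-n) variances.
    Returns (pooled, within, between), where pooled = within + between.
    """
    total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total
    within = sum(n * v for n, _, v in groups) / total
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / total
    return within + between, within, between

# Sizes and variances from the post; the group means are not reported,
# so they are set equal here, which makes the between-group term zero.
pooled, within, between = pooled_variance([(126, 0.0, 0.2238), (203, 0.0, 0.2089)])
print(round(within, 4))  # 0.2146

# Hypothetical: the same groups with means half a repeat apart.
print(pooled_variance([(126, 0.0, 0.2238), (203, 0.5, 0.2089)])[2] > 0)  # True
```

With the posted sizes, the size-weighted within-group average alone is about 0.2146, the same as the combined figure to four decimals (about 0.2139 if the posted figures are n-1 sample variances). Because the pooled variance is this weighted average plus the between-group term, it can land between the two group variances when the group means are close.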

« Last Edit: May 01, 2012, 09:22:19 AM by JeanL »
ironroad41
Old Hand

Posts: 219


« Reply #20 on: May 01, 2012, 12:11:27 PM »

They are not being dismissed; "excluded" is a better word. They reduce the variance of the R-L21 sample. All the M222 entries converge to one younger man. If I looked at a tree of R-L21, I would see a small set of entries whose lines run from R-L21 to the current time. For others, I would see lines from subsequent SNPs such as M222 to a larger set of entries. To compute their variance, I would have to add the variance back to the M222 founder and then the variance from M222 back to R-L21? I don't believe we are including the latter variance? We are computing the variance back to M222 and saying that it is the same as computing the total variance back to L21? JMHO

I am just exploring ways to express my concerns. Your first calculation indicates something is not right here. I will be exploring the Z253 data set in more detail to see what I can observe from the data.
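The effect described above can be illustrated with a toy simulation. It assumes a symmetric single-step STR mutation model, one hypothetical rate for every marker, and made-up ages and sample sizes: a set of "starred" lines that carry only the older SNP, plus a large, younger subclade (standing in for something like M222) whose members all share one line back to the older founder.

```python
import random
from statistics import mean, pvariance

def mutate(allele, generations, mu, rng):
    """Symmetric single-step model: each generation, with probability mu,
    the repeat count moves up or down by one."""
    for _ in range(generations):
        if rng.random() < mu:
            allele += rng.choice((-1, 1))
    return allele

def descend(haplotype, generations, mu, rng):
    """One line of descent: every marker mutates independently."""
    return [mutate(a, generations, mu, rng) for a in haplotype]

def avg_variance(haplotypes):
    """Per-marker population variance, averaged over markers."""
    return mean(pvariance(col) for col in zip(*haplotypes))

def simulate(n_star=30, n_sub=300, t_root=160, t_sub=60,
             markers=25, mu=0.005, seed=1):
    """Hypothetical numbers throughout (generations, sample sizes, rate)."""
    rng = random.Random(seed)
    founder = [15] * markers  # arbitrary starting repeat counts
    # Lines carrying only the older SNP: each has t_root generations to diverge.
    star = [descend(founder, t_root, mu, rng) for _ in range(n_star)]
    # Younger subclade: one shared line runs down to its founder, then it expands.
    sub_founder = descend(founder, t_root - t_sub, mu, rng)
    sub = [descend(sub_founder, t_sub, mu, rng) for _ in range(n_sub)]
    return avg_variance(star), avg_variance(sub), avg_variance(star + sub)

star_v, sub_v, pooled_v = simulate()
print(f"older-SNP-only: {star_v:.3f}  subclade: {sub_v:.3f}  pooled: {pooled_v:.3f}")
```

Under this model the expected variances are roughly mu times the number of generations: about 0.8 for the older lines and 0.3 for the subclade. The shared stretch from the older founder down to the subclade founder enters the pooled figure only once, through the subclade's mean offset, so with the subclade outnumbering the older lines the pooled value typically lands much closer to 0.3 than to 0.8.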
Mike Walsh
Guru

Posts: 2964


« Reply #21 on: May 01, 2012, 02:12:52 PM »

Quote from: ironroad41
They are not being dismissed; "excluded" is a better word. They reduce the variance of the R-L21 sample. All the M222 entries converge to one younger man. If I looked at a tree of R-L21, I would see a small set of entries whose lines run from R-L21 to the current time. For others, I would see lines from subsequent SNPs such as M222 to a larger set of entries. To compute their variance, I would have to add the variance back to the M222 founder and then the variance from M222 back to R-L21? I don't believe we are including the latter variance? We are computing the variance back to M222 and saying that it is the same as computing the total variance back to L21? JMHO

I am just exploring ways to express my concerns. Your first calculation indicates something is not right here. I will be exploring the Z253 data set in more detail to see what I can observe from the data.

Just to give you my perspective, I try to avoid TMRCA or variance estimates that lump people from different subclades into one TMRCA. However, this depends on the level of the Y DNA tree I'm working at. For instance, if I do a calculation for R-L21, I include all of R-M222 along with all of the other L21 subclades. If I were doing P312, I'd also add in Z196, U152, etc.

I would not add in U106, parts of U106, or L11* though. That would be akin to including a "partial" and perhaps arbitrary data set.
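This approach treats a SNP as a filter that keeps everyone at or below it, rather than only those whose terminal SNP is that one. A minimal Python sketch, using a simplified, hypothetical tree (parent links skip intermediate SNPs) and made-up kit labels:

```python
# Toy tree (child -> parent); intermediate SNPs omitted, illustrative only.
PARENT = {"L11": "M269", "P312": "L11", "U106": "L11", "L21": "P312",
          "U152": "P312", "Z196": "P312", "M222": "L21", "Z253": "L21"}

def is_at_or_below(snp, target):
    """True if `snp` is `target` itself or lies anywhere downstream of it."""
    while True:
        if snp == target:
            return True
        if snp not in PARENT:
            return False
        snp = PARENT[snp]

def subclade_members(entries, target):
    """Keep every (kit, terminal SNP) entry at or below `target`."""
    return [(kit, snp) for kit, snp in entries if is_at_or_below(snp, target)]

# Hypothetical kits with their terminal SNPs.
entries = [("kit1", "L21"), ("kit2", "M222"), ("kit3", "Z253"),
           ("kit4", "U152"), ("kit5", "U106"), ("kit6", "L11")]
print([kit for kit, _ in subclade_members(entries, "L21")])   # ['kit1', 'kit2', 'kit3']
print([kit for kit, _ in subclade_members(entries, "P312")])  # ['kit1', 'kit2', 'kit3', 'kit4']
```

U106 and L11* entries drop out of a P312 set because their parent chain never passes through P312.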

I think of SNPs as great filters. As you have noted, TMRCA estimates generally are not very precise and are subject to mutation-rate controversies. However, if I filter everyone out except those who are in the subclade in question (as marked by a derived (+) result for some SNP), then I've reduced the potential for error.

We should not think of SNPs themselves as the subclades. They are just markers on the subclade branches of the Y DNA tree. They could be representative of only a portion of a bigger, but very closely related, branch of people that they all sit on.
« Last Edit: May 01, 2012, 02:39:53 PM by Mikewww »

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
Mike Walsh
Guru

Posts: 2964


« Reply #22 on: May 01, 2012, 02:18:29 PM »

Quote from: JeanL
... PS: I know there is an anomaly here with U152; this is probably caused by the Bashkirs, who have a very young U152. Nonetheless, that doesn't change the fact that the variance of any two SNP groups that descend from a common ancestor should, at least in theory, be greater than their individual variances. But to make sure this anomaly isn't causing this, Mikeww, could you direct me to sets of L21 and U152 that use the 36 most linear STRs you have mentioned before, so that I can repeat this test?

When I can get back to my home computer, I'll update the Haplotype_Data_P312xL21 file. Do you want to compare L21 to U152? I don't have the Myres data set in my file; I just have data straight from FTDNA project screens. As has been discussed, there should probably be some random sampling runs based on a cross-sectional "representative" reference by geography or by subclade (of L21 or U152). I don't do that because I'm not smart enough, and I think our data is limited as it is.

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
Mike Walsh
Guru

Posts: 2964


« Reply #23 on: May 01, 2012, 02:36:56 PM »

Quote from: ironroad41
Based on some recent messages from Hans van Vliet and Machiavelli, I am beginning to believe that only SNP subsets whose entries have that SNP and no subsequent (downstream) SNPs can be used for TMRCA calculations?

I think I'm missing the point and am just catching up on this thread, but there are plenty of SNPs out there that we just haven't discovered yet. If we literally wanted to find a group with no lower-level SNPs, we might have to restrict ourselves to groups only as big as a father and his sons, or maybe also the uncle, grandfather and great-grandfather, but maybe not all of the sons.

... I've found more on the occurrence of Y DNA SNPs. The following is from Vince Tilroe, an ISOGG representative, on Rootsweb.
Quote from: Vince Tilroe
if the 3x10^-8 SNPs per site per generation approximation holds (implying 0.78 SNPs per generation across the ~ 26,000,000 base-pair coverage of the sequence-able Y-chromosome)
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334603956

He has calculated a rate of 0.78 SNPs per father-son transmission, so about three-quarters of a SNP per generation. ...
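The arithmetic behind this figure, and what it implies for how small a group with no undiscovered downstream SNPs would be, can be sketched in a few lines of Python. The two constants are the ones quoted above; the Poisson assumption and the 10-generation example are illustrative assumptions, not from the quoted post.

```python
import math

SITE_RATE = 3e-8      # SNPs per site per generation (figure quoted above)
SITES = 26_000_000    # approximate sequence-able Y coverage (figure quoted above)

snps_per_generation = SITE_RATE * SITES         # about 0.78
generations_per_snp = 1 / snps_per_generation   # about 1.28
# Assuming new SNPs arrive as a Poisson process, the chance that one
# father-son transmission adds no new SNP in that region:
p_no_new_snp = math.exp(-snps_per_generation)   # about 0.46

def expected_private_snps(tmrca_generations: float) -> float:
    """Expected SNPs separating two men whose common ancestor lived
    `tmrca_generations` ago: both lines accumulate SNPs independently."""
    return 2 * snps_per_generation * tmrca_generations

print(f"{snps_per_generation:.2f} SNPs/generation, "
      f"{generations_per_snp:.2f} generations/SNP, "
      f"P(no new SNP) = {p_no_new_snp:.2f}")
print(expected_private_snps(10))  # two men 10 generations from their common ancestor
```

On these assumptions, two men whose common ancestor lived only 10 generations ago would already be separated by roughly 15 to 16 SNPs, most of them undiscovered.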

I asked Vince where he got his SNPs-per-site-per-generation figure, and he cited this study:
"Human Y Chromosome Base-Substitution Mutation Rate Measured by Direct Sequencing in a Deep-Rooting Pedigree" by Xue, 2009. http://download.cell.com/current-biology/pdf/PIIS0960982209014547.pdf?intermediate=true
« Last Edit: May 01, 2012, 02:39:08 PM by Mikewww »

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
JeanL
Old Hand

Posts: 425


« Reply #24 on: May 01, 2012, 03:00:37 PM »


Quote from: Mike Walsh
When I can get back to my home computer, I'll update the Haplotype_Data_P312xL21 file. Do you want to compare L21 to U152? I don't have the Myres data set in my file; I just have data straight from FTDNA project screens. As has been discussed, there should probably be some random sampling runs based on a cross-sectional "representative" reference by geography or by subclade (of L21 or U152). I don't do that because I'm not smart enough, and I think our data is limited as it is.

I don't have access to the Yahoo project; you will have to tell me how to sign up using a Hotmail account. It's fine if it's FTDNA data; I just want to try it out at a higher number of STRs, to make sure the small numbers aren't tricking me. As for the random sampling, that would actually be a good thing to test. I can write a program to extract 75 random haplotypes from each set. It would be good to test the whole set of L21 and the whole set of U152, then see whether their total population numbers have any influence on the variance, and then repeat the same test using 75 or 100 randomly sampled haplotypes from each set.
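The subsampling test proposed here might look something like the following Python sketch. Haplotypes are assumed to be lists of repeat counts in a fixed marker order; k=75 is the figure mentioned above, while the number of runs and the seed are arbitrary defaults. Draws are made without replacement using the standard library's `random.sample`.

```python
import random
from statistics import mean, pvariance, stdev

def avg_variance(haplotypes):
    """Per-marker population variance, averaged over markers."""
    return mean(pvariance(col) for col in zip(*haplotypes))

def subsample_variances(haplotypes, k=75, runs=200, seed=0):
    """Average variance of `runs` random subsets of size k, drawn without replacement."""
    rng = random.Random(seed)
    return [avg_variance(rng.sample(haplotypes, k)) for _ in range(runs)]

def balanced_pooled_variances(group_a, group_b, k=75, runs=200, seed=0):
    """Pool k random haplotypes from each group so neither dominates by sheer size."""
    rng = random.Random(seed)
    return [avg_variance(rng.sample(group_a, k) + rng.sample(group_b, k))
            for _ in range(runs)]

def summarize(values):
    """Mean and standard deviation of a list of variance estimates."""
    return mean(values), stdev(values)
```

Comparing `avg_variance` of each full set with `summarize` of its subsampled runs, and of the balanced pooled runs, would show how much of any difference comes from sample size and group imbalance alone.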