World Families Forums - STR Wars: Is diversity meaningful? more meaningful than Hg frequency?

April 19, 2014, 04:20:32 PM

R1b General (Moderator: rms2)
Topic: STR Wars: Is diversity meaningful? more meaningful than Hg frequency? (Read 15934 times)
Autochthon
Member

Posts: 18


« Reply #325 on: October 12, 2012, 04:59:45 PM »

I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes, Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various haplogroups, and his data include the contribution each marker makes to his results. Perhaps he can tell us what contribution the fast markers make, say the 10 fastest. He can do this by summing the 10 highest variance values and dividing the result by the sum of all the variances.
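A minimal Python sketch of that check: the share of the total variance coming from the k highest-variance markers. The function name and the toy variances are hypothetical, not taken from Mark's sheets.

```python
def top_k_variance_share(variances, k=10):
    """Fraction of the summed variance contributed by the k largest per-marker variances."""
    if not variances:
        raise ValueError("need at least one marker variance")
    total = sum(variances)
    if total == 0:
        raise ValueError("total variance is zero")
    # Largest variances first; at a common time depth these are usually the fastest markers
    ranked = sorted(variances, reverse=True)
    return sum(ranked[:k]) / total


# Toy example with six markers (invented values)
share = top_k_variance_share([4.0, 3.0, 2.0, 0.5, 0.25, 0.25], k=2)
print(f"{share:.0%}")  # 70%
```

Ranking by variance follows the wording above; ranking by mutation rate instead would only require sorting the marker indices by rate.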

However, look at it this way. Would you expect an accurate prediction of the upcoming election from polling 100 voters in Salt Lake City and 1,000 voters in Detroit and summing the resulting voting intentions? Excuse the politics.
Mike Walsh
Guru

Posts: 2963


« Reply #326 on: October 12, 2012, 06:20:57 PM »

I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various Haplogroups and his data includes the contribution that each marker makes to his results. Perhapse he can advise us what contribution the fast markers make to his results. Let's say the 10 fastest markers. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.

However look at it this way. Would you expect to get an accurate prediction of the upcomming election by poling a sample of 100 voters from Salt Lake city and 1000 voters from detroit and summing the resulting voting intentions? Excuse the politics.  

What I'm saying is that I've read posts where Nordtvedt has done simulations and concluded that aggregating the markers versus running them individually made insignificant differences. I'm pretty sure I've read Klyosov's response to a similar objection, and his answer was similar. This is an insignificant issue. I'm pretty sure Vineviz said the same.

How are you showing that your objection has a significant impact? That's my question to you. Voters and STRs on the Y chromosome do not behave similarly, as far as I know.

In my Accounting 101 class, I still remember the first item taught: the concept of materiality. If it's not material, we don't need to report it.
http://www.dwmbeancounter.com/tutorial/theorybook.html

What is the impact of your objection? Can you demonstrate it? Others say they have demonstrated that it is not material.
« Last Edit: October 12, 2012, 06:21:44 PM by Mikewww »

R1b-L21>L513(DF1)>L705.2
Jdean
Old Hand

Posts: 678


« Reply #327 on: October 12, 2012, 06:25:56 PM »

Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.

To the extent that this is true (and I'm not technically competent to argue it), there's also the question of whether a marker is moving up or down, since it can mutate in either direction. The "modal" numbers reflect whichever direction the markers took in the lines that were most successful at reproducing males, and they therefore look today as if they had been the prehistoric norm. I've seen no real scientific reason to believe that. The WAMH, etc. may or may not have been modal a few thousand years ago, whenever pappy L11 (as one example) had his sons. It is now modal for a majority of their West Atlantic survivors.

This post doesn't mean to deny that pappy L11 had a haplotype; it questions how sure we can be of what that was until we have had the opportunity to dig up a few more really old guys and test their Y-DNA to a phylogenetically meaningful level.

If you calculate the modal values for U106, U152 and L21 over 67 loci, you will find the results are very similar. WAMH may not be the exact ancestral values, but it can't be far off!
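A sketch of the modal calculation in Python, assuming each haplotype is a list of repeat counts in a fixed locus order. Comparing U106, U152 and L21 would mean running it on each subclade's list and comparing the outputs locus by locus. The sample haplotypes are invented for illustration.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common repeat count at each locus across a list of haplotypes.

    Each haplotype is a list of repeat counts in the same locus order.
    Ties go to the value encountered first (Counter.most_common ordering).
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must cover the same loci")
    return [
        Counter(h[locus] for h in haplotypes).most_common(1)[0][0]
        for locus in range(n_loci)
    ]


# Three invented three-locus haplotypes (e.g. DYS393, DYS390, DYS19)
sample = [[13, 24, 14], [13, 23, 14], [12, 24, 14]]
print(modal_haplotype(sample))  # [13, 24, 14]
```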

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

alan trowel hands.
Guru

Posts: 2012


« Reply #328 on: October 12, 2012, 06:26:12 PM »

Saw this on Dienekes' blog:

http://dienekes.blogspot.co.uk/2012/10/ann-gibbons-on-slower-mutation-rate.html

Autochthon
Member

Posts: 18


« Reply #329 on: October 13, 2012, 05:53:57 PM »

I wasn't comparing the behaviour of voters and STRs. I was trying to demonstrate the effects of bad processing and interpretation of the results.

Analysis of the R-L21 haplogroup using a 5633 (67-marker) sample from the latest of Mike's spreadsheets and Ken N's 67-marker TMRCA calculator, with the original mutation rates, produces the following basic result:

1a) MRCA estimate = 138 generations = 4140 YBP with a 30-year generation interval.

Using the same data, but taking the mean of each marker's variance divided by its own mutation rate instead of dividing the sum of the variances by the sum of the mutation rates, produces the following result:

1b) MRCA estimate = 180 generations = 5400 YBP with a 30-year generation interval.

I consider this to be a significant difference.
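The gap between 1a and 1b is easier to see written out. Writing V_i for the variance and μ_i for the mutation rate of marker i, and G_i = V_i / μ_i for the age that marker alone implies (notation assumed here, not taken from Ken's sheet), the two procedures are:

```latex
G_{\text{sum}} = \frac{\sum_{i=1}^{n} V_i}{\sum_{j=1}^{n} \mu_j}
             = \sum_{i=1}^{n} w_i \, G_i ,
\qquad w_i = \frac{\mu_i}{\sum_{j=1}^{n} \mu_j} ,
\qquad
G_{\text{mean}} = \frac{1}{n} \sum_{i=1}^{n} G_i .
```

So the aggregated estimate (1a) is itself an average of the per-marker ages, but weighted by mutation rate, which is why the fast markers dominate it; the per-marker mean (1b) weights every marker equally. If every marker followed the simple linear model V_i ≈ μ_i G, all the G_i would equal G and the two estimates would agree; they separate only to the extent that the per-marker ages disagree.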

Repeating the analysis using Marko Heinila's mutation rates produces the following results:

2a) MRCA estimate = 163 generations = 4890 YBP with a 30-year generation interval.

2b) MRCA estimate = 181 generations = 5430 YBP with a 30-year generation interval.

With Marko H's mutation rates the difference between the two results is smaller but still significant.

More significant still is how much the fast and slow markers contribute to the result when the sum of the variances is divided by the sum of the mutation rates.

Using Marko H's mutation rates, the 7 fastest markers contribute 50% of the total, while the 10 slowest contribute just 1% (one percent).

The fastest marker on its own predicts an MRCA of 705 generations (21,150 YBP), while the slowest predicts 27 generations (810 YBP). Clearly the method being used is flawed; furthermore, the mutation rates are not representative of the R-L21 haplogroup sample.
Mike Walsh
Guru

Posts: 2963


« Reply #330 on: October 13, 2012, 08:20:28 PM »

Thank you for the analysis. You have legitimate concerns. Please bring them up with the author of the tool, Ken Nordtvedt. He frequents this Hg I forum and he will respond.
http://archiver.rootsweb.ancestry.com/th/index/Y-DNA-HAPLOGROUP-I
« Last Edit: October 13, 2012, 08:21:48 PM by Mikewww »
stoneman
Old Hand

Posts: 141


« Reply #331 on: October 14, 2012, 05:28:23 AM »

The 5400 YBP for L21 looks more realistic, and that would mean they could have been involved in the building of Newgrange.
alan trowel hands.
Guru

Posts: 2012


« Reply #332 on: October 14, 2012, 07:40:42 AM »

I am not remotely mathematical, but that sort of change to the method seems to make sense to me. Can anyone with a maths brain please comment on this?
Mark Jost
Old Hand

Posts: 707


« Reply #333 on: October 14, 2012, 03:20:21 PM »

Just to compare results between Ken's various versions of his Gen engines and my mod:

G Coalescence Age = Variance of Whole Population (n)         
G Founder's Age = Variance from Modal (sample n-1)         
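One reading of these two definitions, sketched in Python: coalescence age from each marker's variance about its mean (dividing by n), founder's age from squared deviations about the modal value (dividing by n - 1), each summed over markers and divided by the summed mutation rates. This is an assumed interpretation, not Ken Nordtvedt's spreadsheet code; the haplotypes and rates are made up.

```python
from collections import Counter


def variance_about_mean(values):
    """Population variance: squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_about_modal(values):
    """Squared deviations from the modal value, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    modal = Counter(values).most_common(1)[0][0]
    return sum((x - modal) ** 2 for x in values) / (n - 1)


def age_in_generations(haplotypes, rates, per_marker_variance):
    """Sum the chosen per-marker variance over all markers, then divide by the summed rates."""
    total = sum(
        per_marker_variance([h[i] for h in haplotypes]) for i in range(len(rates))
    )
    return total / sum(rates)


# Invented two-marker example with hypothetical mutation rates per generation
haps = [[13, 24], [13, 24], [13, 25], [14, 24], [12, 24]]
rates = [0.002, 0.003]
print(round(age_in_generations(haps, rates, variance_about_mean), 1))   # 112.0
print(round(age_in_generations(haps, rates, variance_about_modal), 1))  # 150.0
```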

I used 25 HTs for all four.

Gen6         GCoal = 98.2   GFounders = 111.8  M = 0.12393  (11 unused markers) 385,389i,459,464,CDY removed

Gen7.1       GCoal = 97     GFounders = 110    M = 0.12635  (11 unused markers) 385,389i,459,464,CDY removed

Gen111T      Gcoal = 101.98 GFounders = 116.2  M = 0.119669 (11 unused markers) Marko MR  385,389i,459,464,CDY removed

Gen111TMod   Gcoal = 105.7  GFounders = 111.0  M = 0.111128 (17 unused Markers) Marko MR Chg rate for s/b 389b's rate. 385,389i,459,464,CDY,YCAII,395S1 & 413 Removed. Excel Functions used.



Gen111TMod using a set of 111-marker haplotypes, then reduced to 67 markers. Notice the tighter spread (StdDevInGen) at 111 markers than at 67.

67 (50) markers, mutation rate 0.11113
YrsPerGen* 30, N=1048, L21 ALL (111 markers using 67): founder's age 114.6 generations, StdDevInGen 32.1
YBP 3,438.8, ±YBP 963.5, YBPMax 4,402.3, VAR 12.738, SD 3.569


111 (94) markers, mutation rate 0.22894
YrsPerGen* 30, N=1048, L21 ALL (111 markers): founder's age 113.4 generations, StdDevInGen 22.3
YBP 3,401.3, ±YBP 667.6, YBPMax 4,068.9, VAR 25.956, SD 5.095

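The columns in the two summary tables above appear to be tied together by simple arithmetic: Generations = VAR / mutation rate (12.738 / 0.11113 ≈ 114.6), YBP = Generations × 30, ±YBP = StdDevInGen × 30, YBPMax = YBP + ±YBP, and SD = √VAR. The Python sketch below checks that reading; these relationships are inferred from the printed numbers rather than documented in the sheet, and the function name is hypothetical.

```python
import math


def table_row(var, mutation_rate, sd_generations, years_per_gen=30):
    """Rebuild the derived columns of one summary row from VAR, the summed rate and StdDevInGen.

    The relationships are inferred from the printed figures, not documented by the sheet.
    """
    generations = var / mutation_rate
    ybp = generations * years_per_gen
    plus_minus = sd_generations * years_per_gen
    return {
        "generations": generations,
        "ybp": ybp,
        "plus_minus_ybp": plus_minus,
        "ybp_max": ybp + plus_minus,
        "sd": math.sqrt(var),
    }


# 111-marker row as printed: VAR 25.956, rate 0.22894, StdDevInGen 22.3
row = table_row(25.956, 0.22894, 22.3)
print({k: round(v, 1) for k, v in row.items()})
```

Small mismatches (for example ±YBP of 669 here against 667.6 in the table) come from StdDevInGen being printed rounded to one decimal.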
« Last Edit: October 14, 2012, 03:23:42 PM by Mark Jost »

148326
Pos: Z245 L459 L21 DF13**
Neg: DF23 L513 L96 L144 Z255 Z253 DF21 DF41 (Z254 P66 P314.2 M37 M222  L563 L526 L226 L195 L193 L192.1 L159.2 L130 DF63 DF5 DF49)
WTYNeg: L555 L371 (L9/L10 L370 L302/L319.1 L554 L564 L577 P69 L626 L627 L643 L679)
Autochthon
Member

Posts: 18


« Reply #334 on: October 15, 2012, 04:00:17 PM »

Mark,
So that we are singing from the same sheet, can you run Ken N's Gen7.1 using 67 markers of Mike's latest 111-marker set (N=1134) and let me know whether you agree with the following results?

MRCA  G=133
SigmaG = 10
Mark Jost
Old Hand

Posts: 707


« Reply #335 on: October 15, 2012, 06:32:20 PM »

MikeW's Age Estimator Gen7.1

Mike's latest spreadsheet list is 1048 haplotypes.

GA = 132.3
SigmaGA = 10.244

Yep we are.

MJost

Mike Walsh
Guru

Posts: 2963


« Reply #336 on: October 16, 2012, 12:00:24 AM »

I take it you feel pretty good about the 3400 years before present?  ... so that gets us to 1500 BC and maybe 2000 BC.
Mark Jost
Old Hand

Posts: 707


« Reply #337 on: October 16, 2012, 08:47:23 AM »

Unless someone comes up with another method, then yes, it is very likely, with a maximum of 4,069 years using the data's standard deviation. What I didn't show is that a 95.45% confidence level gives ±987 YBP, adding just over 300 years to the ±668 years of the best calculated estimate.

To test a question I had, I removed the two fastest markers in the 68-111 panel, STRs 712 and 710. The number of generations effectively did not change; only the variance changed, causing the standard deviation in generations to increase. I was expecting GenSD to decrease, not the opposite.

Generations 113.4, StdDevInGen 24.2, YBP 3,402.6, ±YBP 724.9


MJost
Mark Jost
Old Hand

Posts: 707


« Reply #338 on: October 16, 2012, 09:02:18 AM »

Also here is a comment worth repeating.

"Finally, note that the resolution for t offered by using n=5 markers is very poor, but rather fine precision is offered by using 100 markers." (t = time)


"Estimating the Time to the Most Recent Common Ancestor for the Y chromosome or Mitochondrial DNA for a Pair of Individuals"

Genetics Society of America
Bruce Walsh
March 22, 2001
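A back-of-envelope illustration of that quote, under a deliberately simple assumption that is not Walsh's full model: mutations on a single lineage arrive as a Poisson process at an average rate μ per marker per generation. The total count K over n markers and t generations then has mean and variance nμt, so the estimate t̂ = K/(nμ) has relative error 1/√(nμt). The rate and depth below are hypothetical.

```python
import math


def relative_error(n_markers, mean_rate, generations):
    """Coefficient of variation of t_hat = K / (n * mean_rate), where K ~ Poisson(n * mean_rate * t)."""
    expected_mutations = n_markers * mean_rate * generations
    return 1.0 / math.sqrt(expected_mutations)


# Hypothetical average rate of 0.002 per marker per generation, true depth 100 generations
for n in (5, 100):
    print(n, f"{relative_error(n, 0.002, 100):.0%}")
```

With these assumed numbers the relative error falls from about 100% with 5 markers to about 22% with 100, which is the kind of gain in precision the quote describes.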
Autochthon
Member

Posts: 18


« Reply #339 on: October 16, 2012, 03:32:16 PM »

Mark,
Let us start by examining how the above result was derived. We have already agreed that the age in generations is obtained by dividing the sum of the variances by the sum of the mutation rates; in this version of the model the sum of the mutation rates is fixed at m = 0.12635.

In our example the sum of the variances is about 16.7 (16.7 / 0.12635 ≈ 132).
To see how that sum is made up, we can examine the values in row 536 (Vara) of the KNCalcs sheet.
These show that 60% of the variance sum is contributed by just the 10 fastest markers, while only 1% comes from the 10 slowest.

Thus, even though the analysis involves 55 separate markers, the result is dominated by a few fast-moving markers, with the slow movers contributing close to zero.

I suggest that the procedure used does not give equal representation to each marker and is therefore not a reliable model.

To give each marker equal representation, we can modify the procedure to carry out a statistical analysis across the markers as well as on the individual markers.

We do this as follows.

1) Insert an additional formula in cell C550: =C536*1000/C5
2) Copy the formula to every marker on row 550, from D550 to BR550.
3) Insert the formula =AVERAGE(C550:BR550) in cell BS550.
Cell BS550 displays the MRCA in generations.
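The three spreadsheet steps can be mirrored in Python for anyone checking the arithmetic outside Excel. Row and cell references follow the post; the function names, the toy numbers, and the reading of row 5 as mutation rates scaled by 1000 are assumptions.

```python
from statistics import mean


def per_marker_generations(vara, rates_per_1000):
    """Row 550: =C536*1000/C5 copied across, i.e. each marker's variance / its own rate.

    Assumes (hypothetically) that row 5 holds each rate multiplied by 1000,
    which is what the *1000 in the formula suggests.
    """
    if len(vara) != len(rates_per_1000):
        raise ValueError("both rows must cover the same markers")
    return [v * 1000 / r for v, r in zip(vara, rates_per_1000)]


def compare_estimators(vara, rates_per_1000):
    per_marker = per_marker_generations(vara, rates_per_1000)
    return {
        # Cell BS550: =AVERAGE(C550:BR550), every marker weighted equally
        "mean_of_ratios": mean(per_marker),
        # Existing method: sum of variances / sum of rates (rates converted back from x1000)
        "ratio_of_sums": sum(vara) * 1000 / sum(rates_per_1000),
        # Spread of single-marker ages, for the consistency check
        "min_marker": min(per_marker),
        "max_marker": max(per_marker),
    }


# Invented two-marker example: one fast marker, one slow marker
print(compare_estimators([4.0, 0.125], [20.0, 1.0]))
```

In this toy case the fast marker alone implies 200 generations and the slow one 125; the plain average gives 162.5, while the ratio of sums is pulled toward the fast marker at about 196.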

In our example the MRCA estimate is Ga = 178 generations (5,340 YBP), a significant departure from the result above.

This procedure also lets us examine how the individual markers in the haplogroup have behaved relative to the mutation rates. If the mutation rates are representative of the haplogroup, the values in cells C550 to BR550 should be reasonably consistent.

In our example the values vary widely, between 38 and 701 generations (1,140 YBP to 21,030 YBP), suggesting that the mutation rate set used is not appropriate for R-L21.

The results using Marko H mutation rates do not differ significantly.



 
Mike Walsh
Guru

Posts: 2963


« Reply #340 on: October 16, 2012, 04:58:01 PM »

Autochthon, since Ken Nordtvedt is the author of this methodology, please go to the forum he is on and explain the problem you are seeing.
http://archiver.rootsweb.ancestry.com/th/index/Y-DNA-HAPLOGROUP-I

I'm sure he will answer.

Alternatively, rather than couching your position in terms of Ken's tool, you could describe the problem generically on the forum below, and Anatole Klyosov will respond. If he doesn't, I'll ask him to. I think his method generally does the same thing.
http://archiver.rootsweb.ancestry.com/th/index/GENEALOGY-DNA
« Last Edit: October 16, 2012, 08:23:46 PM by Mikewww »
Mark Jost
Old Hand

Posts: 707


« Reply #341 on: October 16, 2012, 05:54:12 PM »

I assume Ken's Gen111T engine, which uses the variance and mutation rate generation formula, has been tested against paper lineages.

The estimator sheet I modified did use a per-marker age formula; the sum was much higher than with the existing method, and it is in rows 30 and 31, labelled CladeAmarkerGen.

You will need to take your hypothesis and prove it against a known paper trail with plenty of documented haplotypes.

MJost
ironroad41
Old Hand

Posts: 219


« Reply #342 on: October 18, 2012, 12:36:10 PM »

Just to compare results between Ken's various versions of his Gen engines and the my mod.

G Coalescence Age = Variance of Whole Population (n)         
G Founder's Age = Variance from Modal (sample n-1)         

I used 25 HTs for all four.

Gen6         GCoal = 98.2   GFounders = 111.8  M = 0.12393  (11 unused markers) 385,389i,459,464,CDY removed

Gen7.1       GCoal = 97     GFounders = 110    M = 0.12635  (11 unused markers) 385,389i,459,464,CDY removed

Gen111T      Gcoal = 101.98 GFounders = 116.2  M = 0.119669 (11 unused markers) Marko MR  385,389i,459,464,CDY removed

Gen111TMod   Gcoal = 105.7  GFounders = 111.0  M = 0.111128 (17 unused Markers) Marko MR Chg rate for s/b 389b's rate. 385,389i,459,464,CDY,YCAII,395S1 & 413 Removed. Excel Functions used.



Gen111TMod run on a set of 111-marker haplotypes, then on the same set reduced to 67 markers. Note the tighter standard deviation in generations at 111 markers than at 67.

67 (50) markers   Sheet mutation rate: 0.11113
YrsPerGen*   Count    Sample                            Founder's Age (generations)   StdDevInGen
30           N=1048   L21 ALL (111 markers using 67)    114.6                         32.1

YBP       +-YBP    YBPMax    VAR      SD
3,438.8   963.5    4,402.3   12.738   3.569

111 (94) markers   Sheet mutation rate: 0.22894
YrsPerGen*   Count    Sample                            Founder's Age (generations)   StdDevInGen
30           N=1048   L21 ALL (111 markers)             113.4                         22.3

YBP       +-YBP    YBPMax    VAR      SD
3,401.3   667.6    4,068.9   25.956   5.095
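The figures in these two tables are consistent with Generations = VAR / M, StdDevInGen = SD / M with SD = sqrt(VAR), YBP = YrsPerGen x Generations, and YBPMax = YBP + (+-YBP); for example 12.738 / 0.11113 is about 114.6 and 3.569 / 0.11113 is about 32.1. Those relations are inferred from the numbers, not from the sheet's documentation. A minimal sketch that reproduces them (the function name is hypothetical):

```python
import math


def age_from_variance(var_sum, rate_sum, years_per_gen=30):
    """Return (generations, sd_generations, ybp, plus_minus_ybp, ybp_max).

    Assumed relations, inferred from the posted figures:
    generations = VAR / M, sd = sqrt(VAR) / M, YBP = generations * years_per_gen.
    """
    gens = var_sum / rate_sum
    sd_gens = math.sqrt(var_sum) / rate_sum
    ybp = gens * years_per_gen
    plus_minus = sd_gens * years_per_gen
    return gens, sd_gens, ybp, plus_minus, ybp + plus_minus


# 67 (50) marker row: roughly 114.6 generations, +-32.1
print(age_from_variance(12.738, 0.11113))
# 111 (94) marker row: roughly 113.4 generations, +-22.3
print(age_from_variance(25.956, 0.22894))
```

Under these relations the spread scales as 1/sqrt(VAR) relative to the age, which is one way to see why the 111-marker run, with twice the variance sum, gives the tighter standard deviation.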



Mark,
So that we are singing from the same sheet, can you run Ken N's Gen7.1 using 67 markers of Mike's latest 111-marker set (N=1134) and advise whether you agree with the following results.

MRCA  G=133
SigmaG = 10

MikeW's Age Estimator Gen7.1

Mike's latest spreadsheet list is 1048 HTs.

GA = 132.3
SigmaGA = 10.244

Yep, we are.

MJost



Mark,
If we start by examining how the above result was derived: we have already agreed that the age in generations is obtained by dividing the sum of the variances by the sum of the mutation rates, which for this version of the model is fixed at m = 0.12635.

In our example the sum of the variances is ~16.7 (16.7 / 0.12635 ≈ 132).
If we now look at how that sum is made up, we can examine the values in row 536 (Vara) of the KNCalcs sheet.
This shows that 60% of the variance sum is contributed by just the 10 fastest markers, while just 1% is contributed by the 10 slowest markers.

Thus, even though the analysis involves 55 separate markers, the result is dominated by a few fast-moving markers, with the slow movers contributing close to zero.
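The check behind the 60% / 1% figures (sum the variances of the k fastest markers and divide by the total variance) can be sketched as follows. It assumes the per-marker variances (for example row 536 of the KNCalcs sheet) and the per-marker rates are available as plain lists; ranking markers by mutation rate is a choice made here, and the function name and sample numbers are hypothetical.

```python
def variance_share_by_speed(variances, rates, k=10):
    """Fraction of the variance sum from the k fastest and k slowest markers.

    Markers are ranked by mutation rate (fastest = highest rate).
    Returns (fast_share, slow_share).
    """
    ranked = sorted(zip(rates, variances))   # slowest marker first
    total = sum(variances)
    slow_share = sum(v for _, v in ranked[:k]) / total
    fast_share = sum(v for _, v in ranked[-k:]) / total
    return fast_share, slow_share


# Hypothetical four-marker example, looking at the single fastest/slowest marker
print(variance_share_by_speed([1, 2, 3, 4], [0.001, 0.002, 0.003, 0.004], k=1))
```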

I suggest that this procedure does not give equal representation to each marker and is therefore not a reliable model.

In order to provide equal representation to each marker we can modify the procedure to carry out a statistical analysis across the markers as well as on the individual markers.

We do this as follows.

1) Insert an additional function in cell C550 (=C536*1000/C5)
2) Copy the function to all markers on row 550 from D550 to BR550
3) Insert in cell BS550 the function =AVERAGE(C550:BR550)
Cell BS550 displays the MRCA in generations.
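Steps 1 to 3 replace the pooled estimate ΣV / Σμ with a plain average of the per-marker ages V_i / μ_i. Since ΣV / Σμ = Σ μ_i (V_i / μ_i) / Σ μ_i, the existing method is itself an average of per-marker ages, but weighted by mutation rate, which is why the fast markers dominate it. A minimal sketch, assuming the cell formula =C536*1000/C5 reduces to V_i / μ_i (the x1000 presumably compensating for how rates are stored in row 5, which is an assumption); the function names and numbers are hypothetical.

```python
def pooled_age(variances, rates):
    """Sum of variances divided by sum of rates (the existing method)."""
    return sum(variances) / sum(rates)


def per_marker_ages(variances, rates):
    """Age implied by each marker on its own, V_i / mu_i, in generations."""
    return [v / r for v, r in zip(variances, rates)]


def mean_per_marker_age(variances, rates):
    """Equal-weight average of the per-marker ages (the proposed method)."""
    ages = per_marker_ages(variances, rates)
    return sum(ages) / len(ages)


# Hypothetical two-marker example: one fast marker, one slow marker
v, mu = [0.5, 0.02], [0.005, 0.0005]
print(per_marker_ages(v, mu))       # about 100 and 40 generations
print(pooled_age(v, mu))            # about 94.5, pulled toward the fast marker
print(mean_per_marker_age(v, mu))   # about 70
```

The spread of `per_marker_ages` is also the consistency check described next: if the rates suit the haplogroup, the individual ages should cluster.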

In our example this gives an MRCA estimate of Ga = 178 generations (5,340 YBP), a significant departure from the result above.

This procedure also lets us examine how the individual markers in the haplogroup have behaved relative to the mutation rates. If the mutation rates are representative of the haplogroup, the values in cells C550 to BR550 should be reasonably consistent.

In our example the values vary widely, from 38 to 701 generations (1,140 YBP to 21,030 YBP), suggesting that the mutation-rate set used is not appropriate for R-L21.

The results using Marko H mutation rates do not differ significantly.



 

I have been following this thread and would like to support your position. The present variance approach, in my judgment, underestimates TMRCAs. I've said this for quite some time, but it is difficult to prove because examples with known dates are rare.

Goldstein and Stumpf, in their review paper published in Science in March 2001, used a different approach. For each DYS locus they computed the TMRCA by dividing the ASD by the mutation rate for that locus. This weights each locus equally.
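A minimal sketch of the per-locus approach as described here: the ASD at each locus divided by that locus's rate, then averaged so each locus counts equally. The modal allele stands in for the unknown ancestral allele, which is an assumption of this sketch rather than a detail taken from the paper; the function name and data are hypothetical.

```python
from collections import Counter


def per_locus_asd_tmrca(haplotypes, rates):
    """Return (mean TMRCA, per-locus TMRCAs) in generations.

    ASD at each locus is the mean squared distance from the modal allele,
    used here as a proxy for the ancestral allele.
    """
    estimates = []
    for column, mu in zip(zip(*haplotypes), rates):
        ancestral = Counter(column).most_common(1)[0][0]
        asd = sum((x - ancestral) ** 2 for x in column) / len(column)
        estimates.append(asd / mu)
    return sum(estimates) / len(estimates), estimates


# Hypothetical toy data: four haplotypes, two loci
print(per_locus_asd_tmrca([[13, 24], [13, 25], [14, 24], [13, 24]], [0.002, 0.004]))
```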

The problem has been identified, but not accepted, by this community. The data show that the slower mutators contribute very little variance; they mutate around the modal. Faster mutators can take a limited range of values, wider than +/- 1, so they contribute some ASD/variance, but they too saturate after a while. In my opinion, variance/ASD does not model the mutational process.
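The saturation argument can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the post: independent lineages descending from one founder, a symmetric single-step mutation model, and a hard allele range modelled by discarding any mutation that would leave it. With no bound the variance keeps growing roughly like μ·t; with a tight bound it stays small even though mutations keep happening, so the variance stops tracking the number of mutations. The function name and parameters are hypothetical.

```python
import random
from statistics import pvariance


def simulate_variance(n_lineages, generations, mu, lo=None, hi=None, seed=1):
    """Return (allele variance, mean accepted mutations per lineage).

    Each lineage starts at offset 0 (the founder's allele) and evolves
    independently. Each generation it mutates with probability mu by +1 or -1.
    If lo/hi are given, a mutation that would leave [lo, hi] is discarded.
    """
    rng = random.Random(seed)
    alleles, counts = [], []
    for _ in range(n_lineages):
        allele, count = 0, 0
        for _ in range(generations):
            if rng.random() < mu:
                proposed = allele + rng.choice((-1, 1))
                if (lo is None or proposed >= lo) and (hi is None or proposed <= hi):
                    allele = proposed
                    count += 1
        alleles.append(allele)
        counts.append(count)
    return pvariance(alleles), sum(counts) / n_lineages


free = simulate_variance(200, 3000, 0.02)                 # variance near mu * t
bounded = simulate_variance(200, 3000, 0.02, lo=-1, hi=1)  # variance capped
print(free, bounded)
```

In the bounded run the mean mutation count is far larger than the variance, which is the "hidden mutations" problem in miniature.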

What one needs to do is count mutations, but that is very difficult because of hidden mutations at fast and medium mutators. For longer durations, slow mutators can be used, but this requires care.

I hope you continue this effort you have initiated.
Mike Walsh
Guru

Posts: 2963


« Reply #343 on: November 07, 2012, 05:51:14 PM »

I thought it was interesting so I'll just archive it here.

Quote from: Anatole Klyosov
Subject: Re: [DNA] DNA] Coincidental convergence (or lack of divergence)
Quote from: Pietrzakstan
> How it is possible to count mutations between the two older haplotypes,
> if in a period of about several thousand years in one rapidly mutating
> locus
> may be 5-10 mutations?
> How it is possible to count mutations, if they are parallel in the same
> locus in both haplotypes?
> It is puzzling me.

MY RESPONSE:

1. It does not make sense to count mutations between two (!) haplotypes "in
a period of about several thousand years". Who on Earth would want to do it
and for what purpose? Mutations are governed by statistics, and two
haplotypes do not provide any good statistics with their mutations.

2. If you suspect many back and forth mutations in some loci, just exclude
them. As simple as that. For example, for thousands years back I employ 22
marker haplotypes, in which one mutations happens in several thousand years.
This 22 marker panel is described in the Adv. Anthropology (2011) v. 1, No.
2, 26-34.

3. With the slowest 16 marker haplotypes I can calculate timespans to a
common ancestor of man and chimpanzee.

4. "How it is possible to count mutations, if they are parallel in the same
locus in both haplotypes?" - please elaborate. Your question is hard to
understand. However, please remember that you cannot work reliably with two
haplotypes. You cannot toss a coin two times only and hope to calculate
something out of this "statistics".
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-11/1352323225

I agree with Anatole's general point here that doing things like TMRCA estimates between just two people is not reliable. You need more data to apply statistical averages.

I neither agree nor disagree with his point about the 16 slowest markers, but apparently he thinks they remain linear for a few million years.

R1b-L21>L513(DF1)>L705.2
Maliclavelli
Guru

Posts: 2105


« Reply #344 on: November 07, 2012, 10:25:41 PM »

I have been writing about this for many years, and my principles were (at least) three:

1)   mutations happen around the modal
2)   there is a convergence to the modal as time passes
3)   sometimes a mutation goes off on a tangent

DYS391 mutates mostly around the values 10 and 11.
DYS439 mutates mostly around 11, 12, 13, etc.
Of course I am speaking of hg. R, but the same principle explains all the other haplogroups, which diverged only because they started from different values that had gone off on a tangent; frequently, though, they started from the same value, and these values are almost the same even in very distant haplogroups.

My theory of the antiquity of hg. R in Europe presupposes this, and I think it will win out.


Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Autochthon
Member

Posts: 18


« Reply #345 on: November 10, 2012, 06:26:15 AM »

Anatole's method is relatively simple and seems to go as follows.
A) Choose a set of markers to suit the approximate age of the haplogroup: fast markers for recent times, slow markers for several thousand years, and very slow markers for longer periods.
B) Discard any markers whose alleles are suspected of having gone non-linear because of back (reverse) mutations.
C) Count the total number of mutations from the haplogroup modal at the chosen markers.
D) Apply a constant, derived from probability calculations, to the total number of mutations to correct for back (reverse) mutations, giving a new (increased) total.
E) Divide the new total by a single mutation rate derived from haplogroups of "known?" age in generations. The result is the TMRCA in generations.
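Steps A to E can be sketched as below. This is not Anatole's code: it assumes single-step mutations (so |allele - modal| counts mutations), treats the step D correction as a hypothetical multiplier supplied by the caller because no formula is given here, and assumes the step E rate is per haplotype per generation for the whole chosen panel, so the total is first averaged over haplotypes. All names and data are hypothetical.

```python
from collections import Counter


def mutation_count_tmrca(haplotypes, markers, rate_per_haplotype, back_mutation_factor):
    """TMRCA in generations following steps A-E (a sketch).

    markers: indices of the markers kept after steps A and B
    rate_per_haplotype: assumed mutations per haplotype per generation for the
        chosen panel (step E); its value must come from elsewhere
    back_mutation_factor: hypothetical multiplier (>= 1) for step D
    """
    panel = [[h[i] for i in markers] for h in haplotypes]
    modals = [Counter(col).most_common(1)[0][0] for col in zip(*panel)]

    # Step C: count single-step mutations from the haplogroup modal
    total = sum(abs(x - m) for hap in panel for x, m in zip(hap, modals))

    # Step D: inflate the count to allow for hidden back mutations
    corrected = total * back_mutation_factor

    # Step E: per-haplotype average divided by the per-haplotype rate
    return corrected / len(panel) / rate_per_haplotype


haps = [[13, 24, 10], [13, 25, 10], [14, 24, 11], [13, 24, 10]]
print(mutation_count_tmrca(haps, [0, 1], 0.01, 1.0))   # about 50 generations
```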
seferhabahir
Old Hand

Posts: 271


« Reply #346 on: November 11, 2012, 02:00:40 PM »


There is another interesting conversation (with some points of agreement) between Ray Banks and Anatole Klyosov, posted today on RootsWeb, about new data on using SNP counts to estimate time intervals.

Y-DNA: R-L21 (Z251+ L583+)

mtDNA: J1c7a

razyn
Old Hand

Posts: 405


« Reply #347 on: November 11, 2012, 02:20:25 PM »


http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-11/1352614028

R1b Z196*
alan trowel hands.
Guru

Posts: 2012


« Reply #348 on: November 11, 2012, 07:11:40 PM »


That sounds very promising.  What would the implication of this be for R1b?
Heber
Old Hand

Posts: 448


« Reply #349 on: November 11, 2012, 08:04:17 PM »


Indeed, it looks like progress. I took away a basic formula of 50 years per SNP. As R1b has the highest density of SNPs on the phylogenetic tree, it is a matter of counting SNPs between clades and multiplying by 50 to estimate the age: a nice, simple rule of thumb. I like the fact that Anatole broadly agrees with the methodology, which gives me greater confidence in the checks and balances.
As we are currently seeing rapid expansion of the phylogenetic tree and of the number of new SNPs discovered, this will be of great benefit in calculating rough migration routes and timelines.
We should get an updated tree (I hope) in the next few weeks with the release of Geno 2.0.
That should be a good opportunity to test the theory.
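The 50-years-per-SNP rule of thumb is simple arithmetic; this sketch assumes the tree is given as a parent map with a SNP count on each branch and walks from a clade up to one of its ancestors. The clade labels and SNP counts below are made up for illustration, and the function name is hypothetical.

```python
def snp_age_years(parent, snps_on_branch, clade, ancestor, years_per_snp=50):
    """Years between `clade` and an upstream `ancestor` clade.

    parent: maps each clade to its parent clade
    snps_on_branch: number of SNPs on the branch leading to each clade
    Raises KeyError if `ancestor` is not upstream of `clade`.
    """
    count = 0
    node = clade
    while node != ancestor:
        count += snps_on_branch[node]
        node = parent[node]
    return count * years_per_snp


# Hypothetical tree A > B > C with made-up SNP counts
parent = {"B": "A", "C": "B"}
snps = {"B": 5, "C": 3}
print(snp_age_years(parent, snps, "C", "A"))  # 400
```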
« Last Edit: November 11, 2012, 08:11:16 PM by Heber »

Heber


 
R1b1a2a1a1b4  L459+ L21+ DF21+ DF13+ U198- U106- P66- P314.2- M37- M222- L96- L513- L48- L44- L4- L226- L2- L196- L195- L193- L192.1- L176.2- L165- L159.2- L148- L144- L130- L1-
Paternal L21* DF21


Maternal H1C1


