World Families Forums - Age of L238?

Author Topic: Age of L238?  (Read 1135 times)
GoldenHind
« on: April 03, 2013, 02:43:51 PM »


We now have a dozen or so L238 examples with at least 67 markers. Is this sufficient to calculate an approximate age of L238? Is anyone who is competent at this sort of thing willing to give it a go?

It has a very tight STR profile, and I believe at least two of them are relatively fast mutators. Its distribution so far is largely limited to Scandinavia, with a few in England and Scotland. This suggests to me that it is a relatively recent subclade, probably not much over 2000 YBP, and perhaps even younger.

rocketman
« Reply #1 on: April 03, 2013, 10:45:39 PM »

Using the data for the 12 L-238 haplotypes tested to at least 67 markers yields an estimated convergence age of 1300 ± 180 ybp. This convergence could be to an individual rather than to the occurrence of the L-238 SNP mutation itself.

CED
R-U106**

GoldenHind
« Reply #2 on: April 04, 2013, 03:20:14 PM »

Thanks very much. That does reinforce my belief that it is a comparatively recent subclade of P312, and that there are likely one or more as-yet-undiscovered SNPs between P312 and L238.

rocketman
« Reply #3 on: April 04, 2013, 05:20:50 PM »

Mutation rates do vary from individual to individual, family group to family group, subclade to subclade, and haplogroup to haplogroup. The variation can range from 40 percent to 160 percent of the haplogroup average mutation rate. There is an anomaly in the L-238 convergence age, since the apparent average mutation rate of the group is only 67 percent of the R1b rate. The R1b average mutation rate was estimated using a large data set of over 5000 haplotypes. Taking this mutation rate difference into account results in an estimated potential convergence age of 1965 ± 250 ybp. This would probably be the upper limit of the convergence age.
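
The rate adjustment described in this reply can be roughly checked by hand. The sketch below assumes the simplest possible correction, namely dividing the convergence age by the ratio of the group's apparent mutation rate to the haplogroup average. The function name `rate_adjusted_age` is hypothetical, and the 1965 ybp figure in the post presumably came from the poster's own programs, so the two numbers are only expected to land in the same neighbourhood.

```python
def rate_adjusted_age(age_ybp: float, rate_ratio: float) -> float:
    """Scale a convergence age by an apparent mutation-rate ratio.

    Assumption (not stated in the thread): the estimated age is inversely
    proportional to the mutation rate, so a group that appears to mutate
    at 67% of the reference rate looks younger than it is by that factor.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return age_ybp / rate_ratio


# Figures quoted in the thread: 1300 ybp, at 67 percent of the R1b rate.
adjusted = rate_adjusted_age(1300, 0.67)
print(round(adjusted))  # 1940, close to but not the same as the quoted 1965
```

The small gap between the naive result and the quoted figure suggests the poster's correction was not a plain division, which is consistent with the simulation-based method described later in the thread.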

GoldenHind
« Reply #4 on: April 04, 2013, 08:10:14 PM »

Thanks again for your additional information. You are obviously very well informed, and I very much appreciate your taking the time to work this out.

I am a little taken aback that my initial ballpark estimate of 2000 YBP came so close to your final figure.

chris1
« Reply #5 on: April 05, 2013, 08:24:27 AM »

That's interesting. How would one work out the mutation variation for a particular subclade? What might cause the anomaly in L238?

Mark Jost
« Reply #6 on: April 05, 2013, 09:11:41 AM »

For what it's worth, from my TMRCA Estimator:

R1b-P312 n=1180
R1b-P312 > L238 n=13

Bird's q 67 marker STRs

IntraClade Coalescence (n-1) Age
1,705.7 ± 735.1 YBP

Intraclade Founder's Modal Age
1,998.1 ± 795.6 YBP

InterClade Modal R1b-P312 & L238
2,862.1 ± 549.6 YBP

MJost

148326
Pos: Z245 L459 L21 DF13**
Neg: DF23 L513 L96 L144 Z255 Z253 DF21 DF41 (Z254 P66 P314.2 M37 M222  L563 L526 L226 L195 L193 L192.1 L159.2 L130 DF63 DF5 DF49)
WTYNeg: L555 L371 (L9/L10 L370 L302/L319.1 L554 L564 L577 P69 L626 L627 L643 L679)

rocketman
« Reply #7 on: April 07, 2013, 01:03:30 PM »

Mutation histogram (R1b data set, 67 markers)

Col 1        Col 2         Col 3
(mutations)  (haplotypes)  (Col 1 x Col 2)
26           0             0
25           2             50
24           2             48
23           7             161
22           15            330
21           42            882
20           74            1480
19           108           2052
18           143           2574
17           251           4267
16           316           5056
15           373           5595
14           368           5152
13           378           4914
12           345           4140
11           267           2937
10           194           1940
9            120           1080
8            57            456
7            36            252
6            16            96
5            2             10
4            2             8
3            2             6
2            0             0
Totals       3120          43486

Genetic distance histogram (R1b data set, 67 markers)

Col 4               Col 5         Col 6
(genetic distance)  (haplotypes)  (Col 4 x Col 5)
33                  0             0
32                  1             32
31                  2             62
30                  6             180
29                  6             174
28                  9             252
27                  21            567
26                  30            780
25                  58            1450
24                  69            1656
23                  104           2392
22                  114           2508
21                  157           3297
20                  172           3440
19                  225           4275
18                  289           5202
17                  296           5032
16                  290           4640
15                  310           4650
14                  253           3542
13                  222           2886
12                  176           2112
11                  121           1331
10                  81            810
9                   46            414
8                   35            280
7                   20            140
6                   4             24
5                   3             15
4                   0             0
Totals              3120          52143

R1b Data Set
# of Hts             3120
# of Loci            67
Total = Loci x Hts   209040
# of Mutations       43486
# Non-mutations      165554
# Non-muts/Total     0.791973
Avg # Muts/Hts       13.94
Genetic Distance     52143
Avg # GDs/Hts        16.71

Investigating mutation variations in detail requires about ten steps using MS Excel, MS Word and specialized software applications. However, it is worth the effort because of what it reveals. Enclosed is compiled histogram data on infinite-allele mutation counts and genetic distances for 3120 R1b haplotypes tested to 67 markers. Each locus is included whether or not it is palindromic. Consequently, each component of the following markers is treated as a separate locus: DYS385a,b; DYS459a,b; DYS464a,b,c,d; YCAIIa,b; CDYa,b; DYF395s1a,b; and DYS413a,b. The allele count at each locus is compared to the R1b modal value to determine the number of mutations and the genetic distance.

The columns are as follows. Column one is the number of mutations for the row. Column two is the number of haplotypes that had the number of mutations shown in column one. Column three is the product of columns one and two, giving the total number of mutations for the row. Column four is the genetic distance for the row. Column five is the number of haplotypes that had that genetic distance. Column six is the product of columns four and five, giving the total genetic distance for the row. The total mutations (43486) divided by the number of haplotypes (3120) yields an average of 13.94 mutations per haplotype for 67 markers. The total genetic distance (52143) divided by the number of haplotypes (3120) yields an average genetic distance of 16.71 per haplotype for 67 markers.

This same methodology was used to determine the averages for the 12 L-238 haplotypes. The average number of mutations for the 12 haplotypes tested to 67 markers was 9.33, which is 67 percent of the 13.94 R1b figure.

The histogram data invites assumptions. Either there are multiple MRCAs spread between 20 and 400 generations back, or the mutation rates are random variables influenced by factors such as time, allele location, number of repeats, organism stress, etc. I hope that the shift in emphasis to SNPs does not stop the study of STRs, since the future of Homo sapiens may hinge on yet-to-be-determined STR factors.
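
The totals and averages in this reply can be reproduced directly from the two histograms. The sketch below simply re-enters the posted counts as Python dictionaries and recomputes the summary rows; the names `MUTATION_HIST`, `GD_HIST` and `summarize` are made up for illustration and are not part of the poster's software.

```python
# Number of mutations -> number of haplotypes (columns one and two above).
MUTATION_HIST = {
    26: 0, 25: 2, 24: 2, 23: 7, 22: 15, 21: 42, 20: 74, 19: 108,
    18: 143, 17: 251, 16: 316, 15: 373, 14: 368, 13: 378, 12: 345,
    11: 267, 10: 194, 9: 120, 8: 57, 7: 36, 6: 16, 5: 2, 4: 2, 3: 2, 2: 0,
}

# Genetic distance -> number of haplotypes (columns four and five above).
GD_HIST = {
    33: 0, 32: 1, 31: 2, 30: 6, 29: 6, 28: 9, 27: 21, 26: 30, 25: 58,
    24: 69, 23: 104, 22: 114, 21: 157, 20: 172, 19: 225, 18: 289,
    17: 296, 16: 290, 15: 310, 14: 253, 13: 222, 12: 176, 11: 121,
    10: 81, 9: 46, 8: 35, 7: 20, 6: 4, 5: 3, 4: 0,
}

N_LOCI = 67


def summarize(hist):
    """Return (haplotype count, weighted total, average per haplotype)."""
    n_haplotypes = sum(hist.values())
    total = sum(value * count for value, count in hist.items())
    return n_haplotypes, total, total / n_haplotypes


n_hts, n_mutations, avg_mutations = summarize(MUTATION_HIST)
_, total_gd, avg_gd = summarize(GD_HIST)

# Fraction of locus comparisons that show no difference from the modal.
non_mutated_fraction = 1 - n_mutations / (n_hts * N_LOCI)

print(n_hts, n_mutations, round(avg_mutations, 2))  # 3120 43486 13.94
print(total_gd, round(avg_gd, 2))                   # 52143 16.71
print(round(non_mutated_fraction, 6))               # 0.791973
```

The recomputed values agree with the summary rows in the post, which is a useful sanity check that the table was transcribed correctly.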

mcg11
« Reply #8 on: April 07, 2013, 02:36:29 PM »

Some pretty neat work with Excel. Do you have any way of accommodating back mutations/hidden mutations? These do happen at the faster mutators and make TMRCA estimates variable.

CDYa,b are no longer useful beyond 400 or so years, and others soon become problematic.

Jdean
« Reply #9 on: April 07, 2013, 02:40:33 PM »

I might be wrong, but it looks to me as though you have discovered that different subclades have different ages.
« Last Edit: April 07, 2013, 02:43:09 PM by Jdean »

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

rocketman
« Reply #10 on: April 07, 2013, 04:59:57 PM »

To Jdean: Yes, different subclades exhibit different ages. The cleaner the subclade data, i.e., all at the same SNP level, the better the estimate.

To mcg11: Excel is only used to condition the data. The analysis is performed with various software programs that I have developed. The main analysis programs (25, 37, 67 and 111 markers) are forward and backward random-walk Markov processes using one-step, two-step, and three-step probabilities based on mutation rates. Mutation rates for each marker are calculated to the fifth decimal place for use in the programs. The non-mutations and the mutations at each plus and minus step are computed at each marker over the allele range (generally modal value +7 and -7). Running the 67-marker, 3120-haplotype case requires approximately 420,000 random number calls for each generation. The beauty of these programs is that nearly every aspect of the mutation environment is taken into account. The skewness of the data can be emulated a posteriori; however, it cannot be predicted a priori. The distribution of the mutated values around the non-mutated value follows a Poisson probability distribution. The non-mutated value ratio (non-mutated value divided by the number of haplotypes) is the P(0) of the Poisson probability.
An interesting aspect of the mutation rates across haplogroups is that, for example, even though haplogroups E3a, E3b, G, I, J2, R1a, and R1b all seem to have approximately the same overall average mutation rate (0.0022), the mutation rates for each marker appear to be different for each haplogroup. It looks like Nature keeps track of the big picture while allowing and balancing randomness within the structure.
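
For readers who want a feel for what a forward random-walk simulation of this kind looks like, here is a heavily simplified sketch. It is not the poster's program: it assumes a single rate of 0.0022 for every locus (the post uses per-marker rates), one-step mutations only (the post also uses two-step and three-step probabilities), and independent descendant lines from one ancestor. All function names are hypothetical.

```python
import math
import random


def simulate_haplotype(n_loci, generations, mu, rng):
    """Random-walk one descendant line forward in time.

    Each locus starts at offset 0 (the modal value) and, in each
    generation, moves one repeat up or down with probability mu.
    """
    offsets = [0] * n_loci
    for _ in range(generations):
        for i in range(n_loci):
            if rng.random() < mu:
                offsets[i] += rng.choice((-1, 1))
    return offsets


def count_mutations(offsets):
    """Infinite-alleles style count: loci that differ from the modal."""
    return sum(1 for d in offsets if d != 0)


def genetic_distance(offsets):
    """Stepwise style count: total repeat steps away from the modal."""
    return sum(abs(d) for d in offsets)


def poisson_p0(lam):
    """Poisson probability of zero events for a mean of lam."""
    return math.exp(-lam)


rng = random.Random(42)  # fixed seed so the run is repeatable
lines = [simulate_haplotype(67, 100, 0.0022, rng) for _ in range(100)]
avg_mut = sum(count_mutations(h) for h in lines) / len(lines)
avg_gd = sum(genetic_distance(h) for h in lines) / len(lines)
print(avg_mut, avg_gd)

# Poisson mean implied by the non-mutated ratio 0.791973 posted earlier.
lam_observed = -math.log(0.791973)
print(lam_observed, poisson_p0(lam_observed))
```

Because back mutations can return a locus to its modal value, the genetic distance is always at least as large as the mutation count, which mirrors the gap between the 13.94 and 16.71 averages reported earlier in the thread.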

GoldenHind
« Reply #11 on: April 07, 2013, 08:09:59 PM »

This is way over my head. Out of this jumble of numbers, what is your take on the age of P312 itself?

rocketman
« Reply #12 on: April 08, 2013, 09:27:28 AM »

Sorry about any confusion. The data in the table was from a conglomeration of potential R1b1a2 candidates, most of whom had not been SNP tested to determine their actual subclade. Group Aa. R1b-P312* in the FTDNA R1b-P312 & Subclades Project has about 115 kits tested to 67 markers. I will use that data to estimate the P312 age point and let you know the results.

mcg11
« Reply #13 on: April 08, 2013, 10:07:18 AM »

Good work. Several additional comments. 1. For small sets of closely related data, only unique mutational events should be counted (see Kerchner's blog re: UMEs). Apparently the mutation rate may change with the modal marker value; look at DYS388 in I and J vs. R1b. I wonder whether five-place precision is warranted for mutation rates. 2. What do you do to handle multisteps? 3. Have you considered using different sets of rates for different applications, e.g., slow mutators for long periods of time, faster ones for recent genealogical events, and medium ones for in-between situations? Klyosov (google him) has written extensively about this approach, especially with respect to slow mutators.

GoldenHind
« Reply #14 on: April 08, 2013, 02:34:25 PM »

Thanks, I would be very interested in the result. Sorry to put you to any extra work. I assumed the answer was hidden somewhere in the jumble of numbers above, and that I wasn't clever enough to spot it.

I wonder though why you would choose Group Aa from the P312 project pages for your calculations. This is not a random group of P312. It is those who have tested negative for L21 and U152, but have not yet tested for DF27, DF19 or L238. As such, the vast majority, probably something around 80%, will end up being DF27+. I wouldn't be surprised if DF27 is very nearly as old as P312 itself, but would using a group that excludes L21 and U152 give a skewed result?

rocketman
« Reply #15 on: April 08, 2013, 04:02:18 PM »

I chose Group Aa because I assumed they were true R1b-P312* who had already tested negative for all subclades. Does Group D consist of true P312* individuals who have tested negative for all subclades, or is there a better sample? I have not been paying much attention to clade P312, since I am U106* and negative for all subclades.

GoldenHind
« Reply #16 on: April 09, 2013, 02:19:08 PM »

I can understand the confusion. Group A at one time consisted of those who had tested negative for all known subclades. However that was before the discovery a year or so ago of DF27 and DF19. Group A was still useful as it included those who had already tested negative for the two major subclades L21 and U152. So the group was kept intact with the suggestion that they next test for DF27. About 80% of those on the P312* list who have tested for DF27 have got positive results.

Those who have tested negative for all currently known P312 subclades are classified as P312** and listed in section D. Although the list is steadily growing, there are only 34 on it at the moment.

That being said, I cannot see why you think the P312** list would give an accurate assessment of the age of P312 itself. I have no doubt that this is just a list of those whose subclade under P312 remains unidentified, and there is no reason to think they are an older variety of P312 than those in any other P312 subclade.

rocketman
« Reply #17 on: April 11, 2013, 09:07:44 AM »

The following information is not claimed to be totally accurate, since many assumptions have to be made about both the available data and the analysis methodologies used. There is no absolute or perfect way to establish the age of any clade or subclade. However, investigating each aggregated subclade group provides a feel for the potential age of the overarching clade. Mutation counts are based on the Infinite Alleles Model, while the dynamic computation methods are based on the Stepwise Mutation Model. Both models are described in Bruce Walsh’s 2001 article in Genetics, Volume 158, pp. 897-912, entitled “Estimating the Time to the MRCA for the Y Chromosome or mtDNA for a Pair of Individuals”. His article may be the basis of the rationale for FTDNA's conversion to the infinite alleles method for comparison purposes, a source of frustration for many.
I have analyzed both Group Aa and Group D for convergence age. Data from 110 haplotypes in Group Aa tested to 67 markers or more were used in the analysis. Group Aa had only one modal value different from that of an R1b group of over 5000 haplotypes: a value of 15 versus 16 at locus 30, DYS456, one of the faster mutating markers. However, each haplotype’s mutation and GD counts were compared to the Group Aa modals. The age estimated using a computer modeling and simulation program is 3455 ± 180 ybp. The confidence interval is 95 percent (± 2 sigma).
Data from 30 haplotypes tested to 67 markers or more were used as the Group D population. Group D had only one modal value different from that of an R1b group of over 5000 haplotypes: a value of 30 versus 29 at locus 21, DYS449, another of the faster mutating markers. However, each haplotype’s mutation and GD counts were compared to the Group D modals. The age estimated by a computer modeling and simulation program is 4160 ± 300 ybp. This is a surprisingly significant increase over the Group Aa estimate. However, removing the five fastest mutating markers from consideration did reduce the convergence age to 4000 ± 210 ybp. The markers taken out of consideration were loci 21, 30, 32, 34 and 35, i.e., DYS449, DYS456, DYS576, CDYa and CDYb.
There are some apparent parallel mutation events in both the Group Aa and Group D samples which would influence the results, i.e., cause an increase in the computed convergence age. Parallel mutations are redundant copies of a mutation, carried through multiple descendant lines from generation to generation, that result from a single mutation event in some earlier ancestor. Consequently, a number of the observed mutation counts may not be independent events. For example, loci 4, 9, 10, 12, 15, 24, 27, 31, 33, 36, 43, 47, 49 and 59 could be candidates for loci that contain some degree of parallel mutations. Removing parallel mutations from consideration will reduce the convergence age. As a back-of-the-envelope adjustment, the values would be approximately minus 100 years per 42 mutations removed in Group Aa and minus 100 years per 11 mutations removed in Group D. However, the problem is to determine which and how many observed mutations are redundant and can be removed from consideration in the computations.
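
The back-of-the-envelope adjustment at the end of this reply amounts to a simple linear correction, sketched below. The per-group figures (42 and 11 mutations per 100 years) come from the post; the function name `adjusted_age`, the assumption that the correction stays linear, and the example counts of removed mutations are purely illustrative.

```python
# Mutations that must be removed to lower the convergence age by
# 100 years, as quoted in the post for each group.
MUTATIONS_PER_100_YEARS = {"Aa": 42, "D": 11}


def adjusted_age(age_ybp, group, mutations_removed):
    """Lower a convergence age linearly for removed parallel mutations.

    Assumption: the quoted rule of thumb holds linearly for any number
    of removed mutations, which the post does not claim.
    """
    per_100 = MUTATIONS_PER_100_YEARS[group]
    return age_ybp - 100.0 * mutations_removed / per_100


# Hypothetical removal counts, chosen only to show the arithmetic.
print(adjusted_age(3455, "Aa", 42))  # 3355.0
print(adjusted_age(4160, "D", 22))   # 3960.0
```

The smaller Group D sample is far more sensitive to each removed mutation, which is why deciding which mutations are redundant matters more there.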

mcg11
« Reply #18 on: April 11, 2013, 09:28:50 AM »

Parallel mutations = unique mutational events. That said, if a subsequent mutation occurs at some other locus, then the haplotypes are different, but you still shouldn't count the UME more than once.
« Last Edit: April 11, 2013, 09:29:12 AM by mcg11 »
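
The difference between counting every observed deviation and counting each unique mutational event once can be shown with a toy example. The sketch below assumes, as a simplification, that any identical deviation at the same locus was inherited from one shared event; it ignores genuinely independent recurrences. The haplotypes and modal values are invented for illustration and the function names are hypothetical.

```python
def naive_mutation_count(haplotypes, modal):
    """Count every locus in every haplotype that differs from the modal."""
    return sum(
        1
        for hap in haplotypes
        for allele, modal_allele in zip(hap, modal)
        if allele != modal_allele
    )


def unique_event_count(haplotypes, modal):
    """Count each distinct (locus, allele) deviation only once.

    Assumption: the same off-modal allele at the same locus in several
    haplotypes reflects a single ancestral mutation event.
    """
    events = {
        (locus, allele)
        for hap in haplotypes
        for locus, (allele, modal_allele) in enumerate(zip(hap, modal))
        if allele != modal_allele
    }
    return len(events)


# Invented three-locus data: three lines share one deviation at locus 1,
# and one of them has a further mutation at locus 2.
modal = (13, 24, 14)
haplotypes = [(13, 25, 14), (13, 25, 14), (13, 25, 15), (13, 24, 14)]

print(naive_mutation_count(haplotypes, modal))  # 4
print(unique_event_count(haplotypes, modal))    # 2
```

In this toy case the naive count doubles the number of events, which is the kind of inflation of the convergence age that the posts above describe.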