World Families Forums

General Forums => R1b General => Topic started by: Mike Walsh on April 12, 2012, 12:17:08 PM



Title: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 12:17:08 PM
This is always a contentious issue. I think STR diversity is useful. There are challenges and they must be considered in context.

In my opinion, people are fine with it until it disagrees with their theory, then they must shoot it down rather than adjust their theory. To me it is just another data point, and unfortunately we are in dire need of those.

Anyway, let's discuss this topic here so we don't have to argue the points over and over again in other topics, drowning them out.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 12:24:35 PM
Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 12, 2012, 12:30:24 PM
This is why I like looking at the relative STR variance numbers because that avoids the mutation rate issues you are talking about, but variance still gives indications of direction/migration.

My only issue with relative STR variance, is that one measures STR variance off what one presumes to be the modal haplotype for a given population. I mentioned on a different thread the two key assumptions that are made when calculating modals:

1- The ancestral allele for a given locus is the one that minimizes the number of mutations at that locus for a population.

2- The ancestral allele is still present in the sample being analyzed.

Moreover, oftentimes, if not always, the STR variance is given as a function of the overall variance, not for each locus analyzed. So what does it matter if population-A has an excessive variance coming from locus DYS-XXX, if locus DYS-XXX is known to mutate very fast? Does that somehow make that population older because it has a higher overall variance? What if population-B doesn't have as many mutations in DYS-XXX, but has more than twice the number of mutations population-A has on a different locus, DYS-XXY, which is known to mutate very slowly? When one looks at the overall variance, population-A is still going to have more variance than population-B, but once the variance is broken down per locus, we find that population-B accumulated more variance in the slower marker than population-A. Of course there are at least two possible explanations for this phenomenon:

1-) Population-B for some odd reason (environmental, positive selection, modal allele having more repetitions) actually accumulates mutations on DYS-XXY at a faster rate than population-A.

2-) Population-B accumulates mutations on DYS-XXY at the same rate as population-A, but it just so happens that on locus DYS-XXX population-B has experienced more back-mutations than population-A.
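The per-locus breakdown described above can be sketched in a few lines. This is only an illustration: the haplotypes are made-up toy numbers, DYS-XXX and DYS-XXY are the placeholder loci from the post, and variance is taken around each population's mean rather than around a presumed modal, which is an assumption of this sketch.

```python
from statistics import pvariance

def per_locus_variance(haplotypes):
    """Return {locus: variance of repeat counts} for a list of haplotype dicts."""
    loci = haplotypes[0].keys()
    return {locus: pvariance([h[locus] for h in haplotypes]) for locus in loci}

# Hypothetical toy data: DYS-XXX stands for a fast locus, DYS-XXY for a slow one.
pop_a = [{"DYS-XXX": x, "DYS-XXY": y}
         for x, y in zip([12, 14, 15, 16, 13, 14], [10, 10, 10, 10, 10, 11])]
pop_b = [{"DYS-XXX": x, "DYS-XXY": y}
         for x, y in zip([14, 14, 15, 14, 13, 14], [10, 11, 10, 9, 10, 10])]

var_a = per_locus_variance(pop_a)
var_b = per_locus_variance(pop_b)
total_a = sum(var_a.values())  # overall variance = sum over loci
total_b = sum(var_b.values())

print("A per locus:", var_a, "total:", total_a)
print("B per locus:", var_b, "total:", total_b)
```

In this toy set population-A has the larger overall variance, yet population-B has the larger variance on the slow locus, which is the situation the post describes. The overall number alone hides that.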



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 04:43:44 PM
This is why I like looking at the relative STR variance numbers because that avoids the mutation rate issues you are talking about, but variance still gives indications of direction/migration.

My only issue with relative STR variance, is that one measures STR variance off what one presumes to be the modal haplotype for a given population. I mentioned on a different thread the two key assumptions that are made when calculating modals:

1- The ancestral allele for a given locus is the one that minimizes the number of mutations at that locus for a population.

2- The ancestral allele is still present in the sample being analyzed.

Moreover, oftentimes, if not always, the STR variance is given as a function of the overall variance, not for each locus analyzed. So what does it matter if population-A has an excessive variance coming from locus DYS-XXX, if locus DYS-XXX is known to mutate very fast? Does that somehow make that population older because it has a higher overall variance? What if population-B doesn't have as many mutations in DYS-XXX, but has more than twice the number of mutations population-A has on a different locus, DYS-XXY, which is known to mutate very slowly? When one looks at the overall variance, population-A is still going to have more variance than population-B, but once the variance is broken down per locus, we find that population-B accumulated more variance in the slower marker than population-A. Of course there are at least two possible explanations for this phenomenon:

1-) Population-B for some odd reason (environmental, positive selection, modal allele having more repetitions) actually accumulates mutations on DYS-XXY at a faster rate than population-A.

2-) Population-B accumulates mutations on DYS-XXY at the same rate as population-A, but it just so happens that on locus DYS-XXX population-B has experienced more back-mutations than population-A.

The use of variance in statistical analysis is pretty standard stuff.
Quote
variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution...

Real-world distributions ... not fully known, unlike the behavior of perfect dice or an ideal distribution such as the normal distribution, because it is impractical to account for every raindrop. Instead one estimates the mean and variance of the whole distribution as the computed mean and variance of a sample of n observations drawn suitably randomly from the whole sample space,
http://en.wikipedia.org/wiki/Variance


If I understand your concerns about variance, I think there is a counter-consideration - The Law of Large Numbers -
Quote
According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
http://en.wikipedia.org/wiki/Law_of_large_numbers

This is fairly intuitive but guys much smarter than I have figured this out long ago.

Ken Nordtvedt describes STR variance based calculations as "individual experiments", one per STR. It is true that any one STR's set of allele frequencies for a sample population may not be representative of the total population. However, the more STR "experiments" you run, the more likely you are to receive an accurate result. Most of the variance calculations I've displayed lately have been on 49 STRs. That is a pretty healthy set, particularly compared to academics performing analysis on only 10 or 15 STRs.

Having more STRs is a good thing.  So is having more haplotypes (a larger sample.)
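The "individual experiments" idea can be tried out in a small simulation. What follows is a rough sketch, not Nordtvedt's or anyone else's actual method: it assumes independent lineages descending from one founder, a symmetric single-step mutation model, one shared mutation rate for every locus, and a made-up rate of 0.02 chosen only so the run finishes quickly. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev, pvariance

def simulate_locus(rng, n_lineages, generations, mu):
    """Repeat counts (relative to the founder) for independent lineages at one locus."""
    counts = []
    for _ in range(n_lineages):
        value = 0
        for _ in range(generations):
            if rng.random() < mu:
                value += rng.choice((-1, 1))  # symmetric single-step mutation
        counts.append(value)
    return counts

def estimate_generations(rng, n_loci, n_lineages=30, generations=40, mu=0.02):
    """Average the per-locus variances, then divide by the (assumed known) rate."""
    variances = [pvariance(simulate_locus(rng, n_lineages, generations, mu))
                 for _ in range(n_loci)]
    return mean(variances) / mu

def spread_of_estimates(n_loci, trials=20, seed=1):
    """Standard deviation of the age estimate over repeated simulated samples."""
    rng = random.Random(seed)
    return pstdev([estimate_generations(rng, n_loci) for _ in range(trials)])

spread_10 = spread_of_estimates(10)
spread_49 = spread_of_estimates(49)
print(f"spread with 10 STRs: {spread_10:.1f} generations")
print(f"spread with 49 STRs: {spread_49:.1f} generations")
```

Under these idealized assumptions the estimates scatter less when more loci are averaged. The sketch says nothing about what happens when the loci have very different rates or when the sample is not random.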



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 04:50:36 PM
There is a concern that STR variance is not linear with time (number of generations.)
There is no variance when mutations have happened many times forwards and backwards, as it happened for a long time ago. This principle is worth only for a short lapse of time. I have said this to you many times in the past, and you are free to believe in what you like, but for very ancient times I'm afraid that your theories will come out wrong. Already the ADNA whose also JeanL spoke has demonstrated this.

Has anyone done any analysis on STRs that have short durations?  Maliclavelli, what do you consider a short lapse of time?

I am aware that Busby et al did an analysis of 15 or 20 STRs. Marko Heinila has evaluated all 67 markers of FTDNA's 67-STR marker set across tens of thousands of haplotypes, in an effort to determine linear duration.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 05:02:48 PM
Some people have concerns that back mutations are "hidden" and therefore cause an error.
There is no variance when mutations have happened many times forwards and backwards...  

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, and true bottlenecks are quite rare (since total extinction is the usual outcome of a steep decline). However, variable fecundity, whether systematic or random, introduces an unknown distortion into any statistical method based solely on the sampling of the current population. In other words, the "coalescence time" is necessarily a biased estimate of the TMRCA -- the bias direction is known, but the amount is not.
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

Chandler confirmed that back mutations are accounted for, but I quoted the whole answer because Chandler alludes to "coalescence time."  Most intraclade TMRCA estimates we see are estimated from the "coalescence time." The true TMRCA is unknowable and all of these methods just estimate the original time of expansion.
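Chandler's remark that back mutations are automatically accounted for can be checked against the simplest textbook model. The following is only a sketch under assumptions not stated in the thread: a symmetric single-step mutation model, a constant per-generation rate μ, and no upper or lower bound on the repeat count. The symbols S_i and X_t are introduced here for illustration.

```latex
% S_i: change in repeat count in generation i along one lineage (assumed model)
P(S_i = +1) = P(S_i = -1) = \frac{\mu}{2}, \qquad P(S_i = 0) = 1 - \mu

\mathbb{E}[S_i] = 0, \qquad \operatorname{Var}(S_i) = \mathbb{E}[S_i^2] = \mu

% X_t: net change after t generations; the steps are independent
X_t = \sum_{i=1}^{t} S_i
\quad\Longrightarrow\quad
\operatorname{Var}(X_t) = \sum_{i=1}^{t} \operatorname{Var}(S_i) = \mu t
```

A back mutation is just a step of the opposite sign, so it is already inside the sum, and the expected squared distance from the founder value still grows as μt. The argument stops holding once repeat counts are confined to a range or μ depends on allele length, and it says nothing about the variable fecundity that Chandler says biases the coalescence time.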

M222 might be a good example. Its TMRCAs are generally youthful, less than 2000 ybp, but this is really the coalescence time. M222 has a distinctive haplotype and interclade TMRCA estimates with other DF23* subclades indicate M222's lineage broke away from DF23* long ago, maybe 4000 ybp. M222 could have been "born" anywhere from 4000 to 1500 years ago and there is no way of knowing for sure where in that time period.  We just know that fairly recently M222 began expanding in earnest.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 12, 2012, 05:51:57 PM

Ken Nordtvedt describes STR variance based calculations as "individual experiments", one per STR. It is true that any one STR's set of allele frequencies for a sample population may not be representative of the total population. However, the more STR "experiments" you run, the more likely you are to receive an accurate result. Most of the variance calculations I've displayed lately have been on 49 STRs. That is a pretty healthy set, particularly compared to academics performing analysis on only 10 or 15 STRs.

Having more STRs is a good thing.  So is having more haplotypes (a larger sample.)


While I agree that the more loci one analyzes the more accurate the results would be, my main concern is that when one mixes slow and fast mutating loci, one is undermining the relative variance on each locus. I can tell you that a set of 49 STRs where some STRs are three or four orders of magnitude slower than others isn’t going to yield results more accurate than a set of 10 or 15 STRs where all STRs mutate at a very similar rate. I would say that when it comes to STRs quantity matters, but quality matters more. Choose STRs that have similar (i.e. they are not two orders of magnitude apart) mutation rates and calculate the variance using those, and you should be ok.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Skip McDonald on April 12, 2012, 06:23:39 PM
I agree that there are orders of magnitude differences in the rate that STR markers mutate, but they also stay in predictable ranges.   That indicates that there are different rates for the same STR depending on its value.   The chance of a 16 changing to a 17 is probably not the same rate as a 17 moving to an 18 or back to a 16.


Our rates are averages over a large population and ranges of values, and should be applied to the Macro questions of comparing different large populations against each other.   If you try to apply them to a Micro sized question the answers you get will be absolutely wrong, but not entirely useless.   What we get is an educated guess, but a guess nonetheless.

Testing more and more markers is absolutely the best way to help improve these guesses.   Excluding fast moving markers is probably a bad idea as there is useful information there and it should improve your guesses in a large population.  

More sophisticated models are called for. Perhaps one day we will have better mutation rates based on the STR and the STR value.   But to do that we need more people to test and to test more STRs.

The other "Elephant in the room" from a statistical standpoint is that the population we have is NOT a random sample,  often large clusters of close/distant kin get tested.   Age estimates that ignore known kinship of participants may skew the results.   Likewise people with virtually identical haplotypes that can document they have no common ancestor for 6 or 7 generations don't have those years added to estimates either.   Many researchers assume they are the same and may even throw out the duplicates.

The bottom line is that Statistics isn't perfect but is one of the best tools we have.

My 2 cents..

Skip


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 12, 2012, 06:33:32 PM
Ken Nordtvedt describes STR variance based calculations as "individual experiments", one per STR. It is true that any one STR's set of allele frequencies for a sample population may not be representative of the total population. However, the more STR "experiments" you run, the more likely you are to receive an accurate result. Most of the variance calculations I've displayed lately have been on 49 STRs. That is a pretty healthy set, particularly compared to academics performing analysis on only 10 or 15 STRs.

Having more STRs is a good thing.  So is having more haplotypes (a larger sample.)

While I agree that the more loci one analyzes the more accurate the results would be, my main concern is that when one mixes slow and fast mutating loci, one is undermining the relative variance on each locus. I can tell you that a set of 49 STRs where some STRs are three or four orders of magnitude slower than others isn’t going to yield results more accurate than a set of 10 or 15 STRs where all STRs mutate at a very similar rate. I would say that when it comes to STRs quantity matters, but quality matters more....

Ken Nordtvedt runs simulations on the different methodologies he evaluates.  He does concur there is a potential saturation effect with faster STRs, but in his simulations he says the positives of removing some of those STRs are outweighed by the negatives of cutting out STRs, and cutting out fast STRs definitely reduces precision. It's a question of using a watch to measure hours versus using a calendar.

An M222 hobbyist/researcher, Sandy Paterson, has done simulations on the number of STRs to use and he comes up with 50.  This is partially why I'm using the 49 non-multi-copy/non-null STRs of FTDNA's first 67.
http://archiver.rootsweb.ancestry.com/th/read/dna-r1b1c7/2012-03/1332498888

Quote from: JeanL
Choose STRs that have similar (i.e. they are not two orders of magnitude apart) mutation rates and calculate the variance using those, and you should be ok.

Do you have any papers or research that demonstrates this is effective?

I don't know where and how to draw the line based on statistics and I don't run true simulations, but I've made some comparison runs to see if I could "eyeball" any distinctions.

I've run through multiple comparisons, some of which you can probably find on this forum, of selected STR sets based on Marko Heinila's analysis of the linearity of STRs.  Generally, there is not much difference in the relative positioning of variance between haplogroups between using 49 mixed speed markers or Marko's 36 "best" linear markers (out of the first 67.) There is one exception - U198.

I've also tried to weight each STR against its maximum variance so that no STR would have more weight than another. That didn't work out so well. I received some crazy results. I think it goes back to using a calendar to measure hours, and every now and then even the slowest STRs have fairly quick successive mutations. It's like the calendar page turned on that STR when I'm only trying to measure 10 or 12 hours worth of time.

Ultimately, the Law of Large Numbers can average or "wash" out aberrations. Nothing is perfect, though.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: rms2 on April 12, 2012, 07:49:21 PM
I'm certainly no expert on the math of variance calculations, but I do think they must be considered within the context of other evidence, like history, the distribution of a y haplogroup, its known ethnolinguistic affiliations, etc. I think the SNP trail, where one can be established, is probably more important than variance.

Variance cannot be the sole consideration in trying to determine where a particular y haplogroup originated or where it was at a particular time. It can only establish an upper bound on the age of a haplogroup in a place, for one thing, barring something odd like a bottleneck or genetic drift, both of which are nearly impossible to prove outside of known historical incidents (like a plague, for example).

A haplogroup could have a fairly high variance in a place and yet be a relatively late arrival there.

Witness Mike's recent North American U106 variance calculation. It was fairly high, even relative to places in Europe. If it were the sole or even paramount consideration, someone might be tempted to conclude that U106 has been in North America for millennia.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 12, 2012, 08:54:16 PM
Do you have any papers or research that demonstrates this is effective?
While there aren’t any papers that address the issue directly, the Busby et al(2011) publication talked about an appreciable effect of microsatellite choice on age estimates.
Quote from: Busby et al(2011)
We further investigate the young, STR-based time to the most recent common ancestor estimates proposed so far for R-M269-related lineages and find evidence for an appreciable effect of microsatellite choice on age estimates.

Ultimately, the Law of Large Numbers can average or "wash" out aberrations. Nothing is perfect, though.
Yes, and no. The law of large numbers would average or “wash” out aberrations when these aberrations are nothing but outliers, but when you have 49 markers, and half of them are slow, and the other half are fast, then the law of large numbers doesn’t do squat for us. Now if in your sample you have 44 STRs which have a somewhat similar mutation rate, and 5 that have a different (slower or faster) one, then yeah, chances are any effects would be “washed” out.




-----------------------------------------------------------------------------------------------------------
I agree that there are orders of magnitude differences in the rate that STR markers mutate, but they also stay in predictable ranges.   That indicates that there are different rates for the same STR depending on its value.   The chance of a 16 changing to a 17 is probably not the same rate as a 17 moving to an 18 or back to a 16.

In fact, what happens is that the mutation rate increases as the total length of the repetitions increases. So a mutation from 17 to 18 is more likely than a mutation from 16 to 17. So as we try to estimate the time that it took population A to mutate from ancestral allele 13 to allele 16, one needs to take into account that the mutation rate was slower from 13 to 14, than from 14 to 15, than from 15 to 16. Now imagine folks that simply use a mean mutation rate or rather a constant mutation rate. If the change in mutation rate was linear then sure, one could use an average mutation rate, but the thing is that it  isn’t linear.

Testing more and more markers is absolutely the best way to help improve these guesses.   Excluding fast moving markers is probably a bad idea as there is useful information there and it should improve your guesses in a large population.   

More sophisticated models are called for. Perhaps one day we will have better mutation rates based on the STR and the STR value.   But to do that we need more people to test and to test more STRs.

Couldn’t agree more on that. 

The other "Elephant in the room" from a statistical standpoint is that the population we have is NOT a random sample,  often large clusters of close/distant kin get tested.   Age estimates that ignore known kinship of participants may skew the results.   

Indeed a lot of samples used here come from FTDNA Projects, so unfortunately there is a lack of randomness, which is vital for statistical analyses. I tried to use only samples from published studies, but even those oftentimes offer too little resolution (i.e. they only test a limited number of STRs, or give too basic a resolution at the SNP level). Hopefully this situation will change in the near future.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 08:12:16 AM
I'm certainly no expert on the math of variance calculations, but I do think they must be considered within the context of other evidence, like history, the distribution of a y haplogroup, its known ethnolinguistic affiliations, etc. I think the SNP trail, where one can be established, is probably more important than variance.
I agree, although I'd add a couple of other disciplines to the list to consider in context, including archeology, geography, terrain, climate and what's known about prehistory. I'm sure you would agree.

I think that looking at variance and the SNP trail go hand in glove as we try to look at deeper resolution as new SNPs are discovered.

Variance cannot be the sole consideration in trying to determine where a particular y haplogroup originated or where it was at a particular time. It can only establish an upper bound on the age of a haplogroup in a place, for one thing, barring something odd like a bottleneck or genetic drift, both of which are nearly impossible to prove outside of known historical incidents (like a plague, for example).

A haplogroup could have a fairly high variance in a place and yet be a relatively late arrival there.....

This is one of the difficulties about looking at variance by geography that is not inherent in looking at variance by haplogroup.  We know the haplogroup is all related people but within a geography there are really probably a mix of sub-haplogroups, some of which came in at different times.

Variance can be high in a geography, but how does one tell a pooling point or crossroads from a launch/origin point?

Nevertheless, if variance is low for a haplogroup in a location, it is young there.




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 08:21:43 AM
Ultimately, the Law of Large Numbers can average or "wash" out aberrations. Nothing is perfect, though.
Yes, and no. The law of large numbers would average or “wash” out aberrations when these aberrations are nothing but outliers, but when you have 49 markers, and half of them are slow, and the other half are fast, then the law of large numbers doesn’t do squat for us. Now if in your sample you have 44 STRs which have a somewhat similar mutation rate, and 5 that have a different (slower or faster) one, then yeah, chances are any effects would be “washed” out.
That is not true, the Law of Large Numbers is still applicable. You may argue that 49 markers is not enough. That is fine, but Sandy Paterson (the M222 researcher) has done simulations and determined that 50 was enough for reasonable precision.

As I said, Ken Nordtvedt has also run simulations on this and concludes you want a mix of slow and fast markers. You don't want to discard the fast markers unless you have to, like is done with the multi-copy markers.

Most of the scientific genetic research available today is based on 15, 10 or fewer markers.  You should write a counter-argument paper to tell them they are all wrong.

Even Busby, who you cite, uses STR diversity on only 10 markers to justify their primary point against Balaresque, that there are no clines across Europe for R1b-L11/S127.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 08:31:40 AM
Choose STRs that have similar (i.e. they are not two orders of magnitude apart) mutation rates and calculate the variance using those, and you should be ok.
Do you have any papers or research that demonstrates this is effective?
While there aren’t any papers that address the issue directly, the Busby et al(2011) publication talked about an appreciable effect of microsatellite choice on age estimates.

The Busby discussion was based on picking microsatellites (STR markers) that had long linear durations. They did not try to pick STRs based on whether they had similar mutation rates or not.

Perhaps Busby's real problem was not considering enough STRs and taking advantage of more "individual experiments." I can see that the fewer STRs you have, the more critical it becomes to pick ones that are representative and linear within your target population.  The problem is, how do you really know you are picking the right ones and not just cherry picking the data?

BTW, Busby left a gaping logic hole in the STRs they used for their analysis.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 13, 2012, 08:43:23 AM
That is not true, the Law of Large Numbers is still applicable. You may argue that 49 markers is not enough. That is fine, but Sandy Paterson (the M222 researcher) has done simulations and determined that 50 was enough for reasonable precision.

As I said, Ken Nordtvedt has also run simulations on this and concludes you want a mix of slow and fast markers. You don't want to discard the fast markers unless you have to, like is done with the multi-copy markers.

Most of the scientific genetic research available today is based on 15, 10 or fewer markers.  You should write a counter-argument paper to tell them they are all wrong.

Even Busby, who you cite, uses STR diversity on only 10 markers to justify their primary point against Balaresque, that there are no clines across Europe for R1b-L11/S127.

Well, if you think the law of large numbers still applies, check the variance on a set of slow markers, then on a set of fast markers, and then on the combined set of the two.  See if your combined variance falls anywhere within the standard deviation of the mean variance of either of the other two. Again, I’m not talking about a set of 49 STRs where 44 STRs have similar mutation rates; I’m talking about a set of 49 STRs where about 50% are slow markers and 50% are fast markers.
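For anyone who wants to run that kind of check, here is one way the mechanics could look. It is a toy under loud assumptions: simulated independent lineages, a symmetric single-step model with no saturation and no length-dependent rates, and two made-up mutation rates only one order of magnitude apart (chosen for run time, not taken from any published table). The age estimate used is summed variance divided by summed rate, which is one common choice, not necessarily the one either poster uses.

```python
import random
from statistics import pvariance

def locus_variance(rng, n_lineages, generations, mu):
    """Variance of repeat counts across independent lineages at one locus."""
    counts = []
    for _ in range(n_lineages):
        value = 0
        for _ in range(generations):
            if rng.random() < mu:
                value += rng.choice((-1, 1))  # symmetric single-step mutation
        counts.append(value)
    return pvariance(counts)

def compare_marker_sets(seed=7, n_lineages=100, generations=200):
    """Age estimates (in generations) from slow, fast and combined marker sets."""
    rng = random.Random(seed)
    # Hypothetical rates, not measured values: 25 slow loci and 24 fast loci.
    slow_rates = [0.0005] * 25
    fast_rates = [0.005] * 24
    slow_var = [locus_variance(rng, n_lineages, generations, mu) for mu in slow_rates]
    fast_var = [locus_variance(rng, n_lineages, generations, mu) for mu in fast_rates]
    return {
        "slow": sum(slow_var) / sum(slow_rates),
        "fast": sum(fast_var) / sum(fast_rates),
        "combined": (sum(slow_var) + sum(fast_var)) / (sum(slow_rates) + sum(fast_rates)),
    }

estimates = compare_marker_sets()
for name, value in estimates.items():
    print(f"{name}: about {value:.0f} generations")
```

With this particular estimator the combined figure is a rate-weighted average of the two subset figures, so it always lands between them. Whether real markers behave like this toy model is exactly what is being argued about here, so the sketch shows how to set up the comparison rather than who is right.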

As for Dr. Nordtvedt, yeah, one could mix fast and slow markers if one presumes that the TMRCA of the set is fairly recent, and that the number of mutations coming from each slow marker is either zero or just one, as the time frame isn’t long enough for any of the very slow ones to have back-mutated.

I don’t think scientists are wrong in using only 10-15 STRs; it would be preferable to use larger numbers, but oftentimes budget constraints lead us to choose the most cost-effective option.

You are right Busby used 10-15 STRs, but his team also showed that the TMRCA varied a lot when he changed the choice of STRs, see Figure S4 in his study.  

The Busby discussion was based on picking microsatellites (STR markers) that had long linear durations. They did not try to pick STRs based on whether they had similar mutation rates or not.

You wanna take a guess which STRs have the longer linearity: fast or slow mutating ones? See figure-1 in the Busby study to get your answer.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 11:14:31 AM
...
The Busby discussion was based on picking microsatellites (STR markers) that had long linear durations. They did not try to pick STRs based on whether they had similar mutation rates or not.

You wanna take a guess which STRs have the longer linearity: fast or slow mutating ones? See figure-1 in the Busby study to get your answer.

My point is still correct. Busby et al sought, as they should, to find linear correlation with time. Yes, that generally means slower markers rather than faster, but that is NOT the criterion and is not a 100% rule.

Anyway, Busby's analysis was only half-hearted. Marko Heinila's is much more thorough.  The real finding is that STR markers with high absolute allele values (i.e. 30, 31, etc.) are the ones that are saturated and have linearity concerns.

This is the study that shows that.  "Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit" by Jarve et al, 2009. Marko Heinila's work also reflects this but it is not a 100% rule either.

Marko's perspective is that the mutation rate actually increases as a marker reaches the high end number of repeat units, but that back-mutations increase dramatically so they appear to "saturate."

It is not the mutation rate that is the issue although any marker with a high mutation rate could easily end up in the high end of the allele range.

Still, Ken Nordtvedt has run this through simulations and his statistical outcomes show that the gain in linearity is not worth the precision lost by excluding faster markers.  Again, he does agree that multi-copy markers should be removed in such calculations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 01:12:28 PM
... I don’t think scientists are wrong in using only 10-15 STRs; it would be preferable to use larger numbers, but oftentimes budget constraints lead us to choose the most cost-effective option.

You are right Busby used 10-15 STRs, but his team also showed that the TMRCA varied a lot when he changed the choice of STRs, see Figure S4 in his study.  

The Busby discussion was based on picking microsatellites (STR markers) that had long linear durations. They did not try to pick STRs based on whether they had similar mutation rates or not.

You wanna take a guess which STRs have the longer linearity: fast or slow mutating ones? See figure-1 in the Busby study to get your answer.

I've looked at the Busby data in detail and compared it with Marko Heinila's analysis and the study I just cited.

Scientists are not arbitrarily wrong if their budget is limited and they use only 10-15 STRs, but they are dramatically decreasing their precision and dramatically increasing their risk of having a wrong conclusion.

Let's look at one of Busby's illogical applications.

"The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269" by Busby et al, 2011.

The 2nd column is the ybp.  I added the "xxx"s to show you which markers Busby used in their R1b-L11/S127 STR diversity calculations.

Quote from: Busby
Fifteen Y-STRs with mutation rates, range of alleles and estimate of duration of linearity. All STRs investigated in this study are shown with their mutation rates (μ), estimated from Ballantyne et al, and range of observed alleles, R, with 95% CI taken from the YHRD. θ(R)/2μ is an estimate of the duration of linearity.
(from Table 1)

Y-STR        θ(R)/2μ
DYS448         25381
DYS392         19244   XXX
DYS438         12465   XXX
DYS390          9211   XXX
DYS393          5648   XXX <<<
DYS439          4861   XXX <<<
DYS437          4357   XXX <<<
DYS635          4221
DYS456          3289
DYS389II        3111   XXX <<<
DYS391          2554   XXX <<<
DYS458          1944
DYS19           1888   XXX <<<
Y-GATA-H4       1630
DYS389I          953   XXX <<<

Please note that Busby's key conclusion, which is a counter-argument to Balaresque's R1b Neolithic argument, is based on the STR diversity of R1b-L11/S127.
Quote from: Busby
(Abstract)
Our analysis reveals no
geographical trends in diversity, in contradiction to expectation under the Neolithic hypothesis...
(Conclusions)
Alternatively,if R-S127 originated prior to the Neolithic wave of expansion, then either it was already present in most of Europe before the expansion, or the mutation occurred in the east, and was spread before or after the expansion, in which case we would expect higher diversity in the east closer to the origins of agriculture, which is not what we observe.

Notice that the Neolithic revolution started some 10k ybp and was spreading across Europe about 7k ybp.
Quote from: Busby
(Introduction)
Following the development of agriculture in the Fertile Crescent some 10000 years ago, this technology spread from the Near East westward into Europe...

Go back up and look at Busby's Table 1. Only three of the ten STRs they used to draw their conclusion have enough linear duration according to their own evaluation! I put "<<<"s next to the STRs with a linear duration of less than 7k ybp.

I've asked this on this forum, on DNA-forums and on Rootsweb. Isn't this a major flaw in their logic? Their own analysis argues against the validity of their primary conclusion, which was to argue against Balaresque.  No one has yet responded as to the logic.
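
For anyone who wants to check the count, here is a minimal sketch that re-tallies the table quoted above. The marker names, durations and "XXX" flags are copied from that table. The 7,000 ybp cut-off is my own reading of "spreading across Europe about 7k ybp", not a threshold Busby states.

```python
# Rows copied from the Table 1 excerpt quoted above:
# (marker, theta(R)/2mu in ybp, used in the R-L11/S127 diversity calculation?)
TABLE_1 = [
    ("DYS448", 25381, False),
    ("DYS392", 19244, True),
    ("DYS438", 12465, True),
    ("DYS390", 9211, True),
    ("DYS393", 5648, True),
    ("DYS439", 4861, True),
    ("DYS437", 4357, True),
    ("DYS635", 4221, False),
    ("DYS456", 3289, False),
    ("DYS389II", 3111, True),
    ("DYS391", 2554, True),
    ("DYS458", 1944, False),
    ("DYS19", 1888, True),
    ("Y-GATA-H4", 1630, False),
    ("DYS389I", 953, True),
]

# Assumption: 7k ybp is the time depth the linearity has to cover.
THRESHOLD_YBP = 7000


def split_by_linearity(table, threshold):
    """Split the markers that were used into those at/above and below the threshold."""
    used = [(name, duration) for name, duration, is_used in table if is_used]
    long_enough = [name for name, duration in used if duration >= threshold]
    too_short = [name for name, duration in used if duration < threshold]
    return long_enough, too_short


long_enough, too_short = split_by_linearity(TABLE_1, THRESHOLD_YBP)
print(len(long_enough), "of", len(long_enough) + len(too_short), "used STRs reach the threshold:", long_enough)
print("Below the threshold:", too_short)
```

Changing THRESHOLD_YBP shows how sensitive the "three of ten" count is to the time depth you think matters.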


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 01:18:35 PM
... The other "Elephant in the room" from a statistical standpoint is that the population we have is NOT a random sample,  often large clusters of close/distant kin get tested.   ...
Skip, I agree. I think a scientifically designed, cross-sectional, random sampling of Europe and Western Asia, including the Near East, is needed. It should be based on long haplotypes and high resolution deep clade testing. We don't have that anywhere that I can see. Hence, we are speculating.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 01:26:20 PM
Ken Nordtvedt describes STR variance based calculations as "individual experiments", one per each STR. It is true that any one STR set of allele frequencies for a sample population may not be representative of the total population. However, the more STR "experiments" you run the more likely you are to receive an accurate result.  Most of the variance calculations I've displayed lately have been on 49 STRs. That is a pretty healthy set, particularly compared to academics performing analysis on only 10 or 15 STRs.

Having more STRs is a good thing.  So is having more haplotypes (a larger sample.)

While I agree that the more loci one analyzes the more accurate the results would be, my main concern is that when one mixes slow and fast mutating loci, one is undermining the relative variance at each locus. I can tell you that a set of 49 STRs where some STRs are three or four orders of magnitude slower than others isn’t going to yield results more accurate than a set of 10 or 15 STRs where all STRs mutate at a very similar rate. I would say that when it comes to STRs quantity matters, but quality matters more....

Ken Nordtvedt runs simulations on different methodologies he evaluates.  He does concur there is a potential saturation effect with faster STRs but in his simulations he says the positives of removing some of those STRs are outweighed by the negatives of cutting out STRs, and cutting out fast STRs definitely reduces precision. It's a question of using a watch to measure hours versus using a calendar.

An M222 hobbyist/researcher, Sandy Paterson, has done simulations on the number of STRs to use and he comes up with 50.  This is partially why I'm using the 49 non-multi-copy/non-null STRs of FTDNA's first 67.
http://archiver.rootsweb.ancestry.com/th/read/dna-r1b1c7/2012-03/1332498888

Quote from: JeanL
Choose STRs that have similar (i.e. they are not two orders of magnitude apart) mutation rates and calculate the variance using those, and you should be ok.

Do you have any papers or research that demonstrates this is effective?

I don't know where and how to draw the line based on statistics and I don't run true simulations, but I've made some comparison runs to see if I could "eyeball" any distinctions.

I've run through multiple comparisons, some of which you can probably find on this forum, of selected STR sets based on Marko Heinila's analysis of the linearity of STRs.  Generally, there is not much difference in the relative positioning of variance between haplogroups between using 49 mixed speed markers or Marko's 36 "best" linear markers (out of the first 67.) There is one exception - U198.
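
To make it concrete what I mean by comparing "relative positioning" under two marker sets, here is a rough sketch. Everything in it is hypothetical: the marker names (STR_A, STR_B, STR_C), the two toy groups and the repeat counts are made up for illustration, and I am assuming variance is taken about the sample mean at each locus and then averaged across loci with equal weight.

```python
from statistics import mean, pvariance


def per_locus_variance(haplotypes, markers):
    """Population variance of repeat counts at each marker, about the sample mean."""
    return {m: pvariance([h[m] for h in haplotypes]) for m in markers}


def mean_variance(haplotypes, markers):
    """Average the per-locus variances, one 'experiment' per STR."""
    return mean(list(per_locus_variance(haplotypes, markers).values()))


def relative_variance(group_a, group_b, markers):
    """Ratio of mean variances; above 1 means group_a is the more diverse."""
    return mean_variance(group_a, markers) / mean_variance(group_b, markers)


# Hypothetical toy haplotypes, not real project data.
group_x = [
    {"STR_A": 12, "STR_B": 24, "STR_C": 13},
    {"STR_A": 12, "STR_B": 25, "STR_C": 13},
    {"STR_A": 13, "STR_B": 23, "STR_C": 13},
    {"STR_A": 12, "STR_B": 24, "STR_C": 14},
]
group_y = [
    {"STR_A": 12, "STR_B": 24, "STR_C": 13},
    {"STR_A": 12, "STR_B": 24, "STR_C": 13},
    {"STR_A": 12, "STR_B": 25, "STR_C": 13},
    {"STR_A": 12, "STR_B": 24, "STR_C": 14},
]

all_markers = ["STR_A", "STR_B", "STR_C"]
subset = ["STR_A", "STR_C"]  # pretend STR_B was dropped as non-linear

ratio_all = relative_variance(group_x, group_y, all_markers)
ratio_subset = relative_variance(group_x, group_y, subset)
print("x vs y, all markers:", ratio_all)
print("x vs y, subset:", ratio_subset)
```

In this toy case the ranking is the same under both marker sets, which is the kind of check I was "eyeballing". It says nothing about whether real data behave that way.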

I've also tried to weight each STR against its maximum variance so that no STR would have more weight than another. That didn't work out so well. I received some crazy results. I think it goes back to using a calendar to measure hours, and every now and then even the slowest STRs have fairly quick successive mutations. It's like the calendar page turned on that STR when I'm only trying to measure 10 or 12 hours worth of time.

Ultimately, the Law of Large Numbers can average or "wash" out aberrations. Nothing is perfect, though.
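
Here is a small simulation sketch of that averaging effect. It is not Ken's or Sandy's method. It assumes a star phylogeny (every lineage descends independently from the founder), a simple one-step symmetric mutation model, and a single made-up mutation rate for every marker. The function names and all parameter values are mine.

```python
import random
from statistics import mean, pstdev, pvariance


def simulate_locus(n_lineages, generations, mu, rng):
    """Repeat-count offsets from the founder for independent lineages at one STR."""
    offsets = []
    for _ in range(n_lineages):
        x = 0
        for _ in range(generations):
            if rng.random() < mu:          # a mutation happens this generation
                x += rng.choice((-1, 1))   # one repeat up or down
        offsets.append(x)
    return offsets


def spread_of_estimate(n_loci_options, replicates=40, n_lineages=20,
                       generations=100, mu=0.01, seed=1):
    """Standard deviation, across replicate samples, of the mean per-locus variance."""
    rng = random.Random(seed)
    max_loci = max(n_loci_options)
    estimates = {n: [] for n in n_loci_options}
    for _ in range(replicates):
        locus_vars = [
            pvariance(simulate_locus(n_lineages, generations, mu, rng))
            for _ in range(max_loci)
        ]
        for n in n_loci_options:
            # use the first n simulated STRs as the "panel"
            estimates[n].append(mean(locus_vars[:n]))
    return {n: pstdev(values) for n, values in estimates.items()}


spread = spread_of_estimate((10, 49))
print("spread of the variance estimate by panel size:", spread)
```

Under these assumptions the 49-marker estimate scatters less from sample to sample than the 10-marker one. Real samples share ancestry and real markers differ in speed, so treat this only as an illustration of the averaging argument.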

There are those that disagree with some of Vincent Vizachero's arguments. He has been a long term project admin for the R1b ht35 project and has a very large database for R1b. My position is that he is very credible on R1b, just like I consider Ken Nordtvedt very credible on TMRCAs, John Chandler on mutation rates and Marko Heinila on STR linear durations and TMRCAs.

Quote from: Vincent Vizachero
For young haplogroups (e.g. within R-M269) that random component in the GD matrix swamps the true phylogenetic signal with such short (e.g. 67 marker) haplotypes such that the relationship between the haplotypes proposed by the algorithms is almost entirely phantom.

For old haplogroups (e.g. more than 25 ky old) the problem of non-linear accumulation of GD due to marker saturation becomes the dominant problem. Creating trees from STRs in this timeframe is typically not necessary, thankfully, now that our SNP-based trees are so much more complete than they were several years ago.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334313410

Linear duration of STRs is an issue (I don't deny that,) but you can see that Vizachero considers this to be the issue with haplogroups that are old (of 25k ybp age) and he does not consider R-M269 in that category.  

This is in general agreement with Ken Nordtvedt's simulations, although Ken never gives a time break at which STR linear duration causes a negative (accuracy risk) return for shorter-duration STRs.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 13, 2012, 02:07:26 PM
Maybe I've missed it, but has there ever been a case where a population with the highest frequency (percentage and/or numbers) has had a greater diversity than places with a lower frequency (percentage and /or numbers)?

And, if an SNP DF952b+ (just for fun) was found in Ireland among 95% of the L21+ men and was found in Germany  and France among 20 L21+ men total, would the origin of the SNP be Continental if the diversity was higher among the 20 German men? I would say the answer would be yes among most people on these forums.

I will also say again that as soon as one L226+ is found on the Continent then L226 becomes Continental. Actually, I'd say that L226+ is already thought of as Continental, but nothing on the Continent has been found yet. :)

The big question is why this is accepted theory.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 02:21:23 PM
Maybe I've missed it, but has there ever been a case where a population with the highest frequency (percentage and/or numbers) has had a greater diversity than places with a lower frequency (percentage and /or numbers)?....
I don't know.

I don't think even those two numbers, in context of each other, are enough to declare an origination point.  The archaeology, the cultures, linguistics, terrain, etc. all must be considered.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: NealtheRed on April 13, 2012, 02:23:49 PM
Maybe I've missed it, but has there ever been a case where a population with the highest frequency (percentage and/or numbers) has had a greater diversity than places with a lower frequency (percentage and /or numbers)?

And, if an SNP DF952b+ (just for fun) was found in Ireland among 95% of the L21+ men and was found in Germany  and France among 20 L21+ men total, would the origin of the SNP be Continental if the diversity was higher among the 20 German men? I would say the answer would be yes among most people on these forums.

I will also say again that as soon as one L226+ is found on the Continent then L226 becomes Continental. Actually, I'd say that L226+ is already thought of as Continental, but nothing on the Continent has been found yet. :)

The big question is why this is accepted theory.

I would bet that L226 arose in the Isles. I don't think it has been found on the Continent, but I may be wrong.

I would say the likelihood is stronger that Z253, L226's father, has a Continental origin.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 13, 2012, 02:48:13 PM
If the place with the highest diversity is never in the place of highest frequency, then something is wrong.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on April 13, 2012, 04:03:59 PM
If the place with the highest diversity is never in the place of highest frequency, then something is wrong.

It will depend on the pattern of movement. Where a mutation occurs in a fairly stable population, we would expect it to very gradually spread out in all directions from the point of origin. That leaves a pattern with a high frequency centre which should also be high in diversity. R1b-U152 (http://www.u152.org/) looks roughly like that.

Where a mutation occurs at the spearhead of a migration, you can expect to see the highest density at the point where the migration hits a barrier such as an ocean and is forced to stop. See clines and waves (http://www.buildinghistory.org/distantpast/geneticdebate.shtml#clines).

In reality of course the nice neat patterns created by one kind of movement are likely to be messed up later by another movement. So we can't expect everything to look exactly like a computer model.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 04:57:23 PM
Maybe I've missed it, but has there ever been a case where a population with the highest frequency (percentage and/or numbers) has had a greater diversity than places with a lower frequency (percentage and /or numbers)?....
I don't know.

I don't think even those two numbers, in context of each other, are enough to declare an origination point.  The archaeology, the cultures, linguistics, terrain, etc. all must be considered.


I don't know the answer to your question, but I would be surprised if there weren't some situations, perhaps many, where highest diversity and highest frequency correspond.

However, I don't get the point in looking for that just for the sake of looking for that. We have many challenges and difficulties with all of this data, which is being well discussed. Why go looking for hypothetical situations?

If the place with the highest diversity is never in the place of highest frequency, then something is wrong.

What's the point you are trying to make?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 13, 2012, 05:11:18 PM
The point I'm trying to make is that I have never even seen a spot that has a higher diversity AND a higher frequency, even within L21+ or a subclade of L21+.

For instance, a made up example: Lx+ is highest in Ireland #1, Scotland #2, Wales #3, England #4, France #5, Germany #6, but the diversity of Scotland #2 is higher than the diversity in Germany #6.

It seems that "low frequency equals high diversity" and "high frequency equals low diversity". If this is the case, then either we have an amazing mathematical coincidence or something is wrong.

Does it mean that a SNP originates in an area and then the descendants ALWAYS go to a new area and prosper, leaving the highest diversity in the area of lowest frequency?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 05:27:20 PM
The point I'm trying to make is that I have never even seen a spot that has a higher diversity AND a higher frequency, even within L21+ or a subclade of L21+.

For instance, a made up example: Lx+ is highest in Ireland #1, Scotland #2, Wales #3, England #4, France #5, Germany #6, but the diversity of Scotland #2 is higher than the diversity in Germany #6.

It seems that "low frequency equals high diversity" and "high frequency equals low diversity". If this is the case, then either we have an amazing mathematical coincidence or something is wrong.

Does it mean that a SNP originates in an area and then the descendants ALWAYS go to a new area and prosper, leaving the highest diversity in the area of lowest frequency?
I think, as has been pointed out, U152 may be a good example.  

L226 could be a case under L21.  I was about to say M222, but I actually get M222 with higher diversity in England... however, the STR variance differences on M222 are too close to say anything conclusively.

L21's history/prehistory may be different than U152.  Perhaps that is the story.

You should probably read the two studies on "Surfing the Wave". I think Klopfstein was the primary author.  I've pointed to them before.  No one is saying that concept always applies, just that it may apply in some situations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Skip McDonald on April 13, 2012, 05:29:26 PM
Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?
I can not locate one right now but I do recall the hypothesis in at least one paper.   The only one I can show you is that there is a difference between closely related species.

http://www.genetics.org/content/168/1/383.full.pdf+html

chimps vs. people may be too many generations apart, and this study was not Y DNA specific but it does support the concept.  

Does someone have a link to a DNA mutation rates for "family groups" paper?  I believe that was the topic, my google skill has failed me so it may have been an offline paper.

Skip



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 13, 2012, 05:41:01 PM
It just seems that a person could say that with L21 and its subclades, "Higher Frequency" will equal "Lower Diversity" and that person would be correct.

When I see someone bring up "Diversity" with L21 and/or its Subclades I think, "Why read it?" I know that the place of Highest Frequency has no chance of being the place of Highest Diversity. Which means that in EVERY case the descendants of the first man with a mutated SNP ALWAYS left the area of origin and prospered elsewhere. As a matter of fact, wherever they prospered the most, that place will be the place of Lowest Diversity.

So, when there is one L226 found on the Continent, or even in Britain L226's origin will move out of Ireland because Ireland, as the place of Highest Frequency, cannot be the place of Highest Diversity. I say this with 100% certainty.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 13, 2012, 05:45:49 PM
Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?
I can not locate one right now but I do recall the hypothesis in at least one paper.   The only one I can show you is that there is a difference between closely related species.

http://www.genetics.org/content/168/1/383.full.pdf+html

chimps vs. people may be too many generations apart, and this study was not Y DNA specific but it does support the concept.   ...
The reason I ask is that it is brought up as an objection sometimes, but I've never really seen any strong case to say true mutation rates are different by haplogroups.

When I asked this on Rootsweb or in the past with Vizachero ... BTW Klyosov is a bio-chemist so he should understand...  I consistently get the answer from the scientist/hobbyists that the same mutation rates apply across all haplogroups.  We are all homo sapiens sapiens and are much more alike than different.

I don't know of a study of Y DNA where they use different mutation rates by haplogroup.

It is definitely true that you can observe different mutation rates in different haplogroups but the response to that is ...   it's like flipping a coin, just because it came up heads three times in a row doesn't mean it will be heads the fourth time. What matters is the expected mutation rate and that doesn't change.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 13, 2012, 06:27:03 PM

I've looked at the Busby data in detail and compared it with Marko Heinila's analysis and the study I just cited.

Scientists are not arbitrarily wrong if their budget is limited and they use only 10-15 STRs, but they are dramatically decreasing their precision and dramatically increasing their risk of having a wrong conclusion.

Let's look at one of Busby's illogical applications.

"The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269" by Busby et al, 2010.

The 2nd column is the ybp.  I added the "xxx"s to show you which markers Busby used in their R1b-L11/S127 STR diversity calculations.

Quote from: Busby
Fifteen Y-STRs with mutation rates, range of alleles and estimate of duration of linearity. All STRs investigated in this study are shown with their mutation rates (μ), estimated from Ballantyne et al, and range of observed alleles, R, with 95% CI is taken from the YHRD. θ(R)/2μ is an estimate of the duration of linearity.
(from Table 1)

Y-STR____ θ(R)/2μ
DYS448___ 25381   
DYS392___ 19244 XXX
DYS438___ 12465 XXX
DYS390___ 9211 XXX
DYS393___ 5648 XXX<<<
DYS439___ 4861 XXX<<<
DYS437___ 4357 XXX<<<
DYS635___ 4221   
DYS456___ 3289   
DYS389II_ 3111 XXX<<<
DYS391___ 2554 XXX<<<
DYS458___ 1944   
DYS19____ 1888 XXX<<<
Y-GATA-H4_ 1630   
DYS389I___ 953 XXX<<< 

Please note that Busby's key conclusion, which is a counter-argument to Balaresque's R1b Neolithic argument, is based on the STR diversity of R1b-L11/S127.
Quote from: Busby
(Abstract)
Our analysis reveals no
geographical trends in diversity, in contradiction to expectation under the Neolithic hypothesis...
(Conclusions)
Alternatively,if R-S127 originated prior to the Neolithic wave of expansion, then either it was already present in most of Europe before the expansion, or the mutation occurred in the east, and was spread before or after the expansion, in which case we would expect higher diversity in the east closer to the origins of agriculture, which is not what we observe.

Notice that the Neolithic revolution started some 10k ybp and was spreading across Europe about 7k ybp.
Quote from: Busby
(Introduction)
Following the development of agriculture in the Fertile Crescent some 10000 years ago, this technology spread from the Near East westward into Europe...

Go back up and look at Busby's Table 1. Only three of the ten STRs they used to draw their conclusion have enough linear duration according to their own evaluation! I put "<<<"s next to the STRs with a linear duration of less than 7k ybp.

I've asked this on this forum, on DNA-forums and on Rootsweb. Isn't this a major flaw in their logic? Their own analysis argues against the validity of their primary conclusion, which was to argue against Balaresque.  No one has yet responded as to the logic.

Their primary conclusion:

Quote from: Busby et al(2011)
We further investigate the young, STR-based time to the most recent common ancestor estimates proposed so far for R-M269-related lineages and find evidence for an appreciable effect of microsatellite choice on age estimates.


It is still well supported by Table-S4 and Figure-4, therefore there isn’t any flaw in their logic pertaining to that primary finding. As for them using a mixed set of STRs to counter Balaresque’s argument, well that just goes to show that with the addition of some new populations sampled in this study + the Myres et al. set, the whole East-West quasi-gradient that Balaresque et al. thought existed disappears. Yes, it is wrong for them to use a mixed set, but what they did was just to show that even under the assumption that STR linearity does not affect the age estimates there isn’t any perceivable variance distribution in Europe that shows a Neolithic Eastern expansion. Moreover, they even went as far as to use Balaresque’s own set with a different Irish sample and showed that the apparent east-west gradient disappeared, which can be seen in Figure-S2.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 13, 2012, 06:44:50 PM
Linear duration of STRs is an issue (I don't deny that,) but you can see that Vizachero considers this to be the issue with haplogroups that are old (of 25k ybp age) and he does not consider R-M269 in that category. 

This is in general agreement with Ken Nordtvedt's simulations although Ken never gives a time break when STR linear duration causes a negative (accuracy risk) return for shorter durations STRs.

So basically because this Vizachero person has a pre-conceived notion that R-M269 has to be less than 25k ybp, then there is no need to worry about loss of linearity on the STRs mutations?  Is this some sort of appeal to authority, because a couple of paragraphs above you mentioned how much you trusted him and all that? This is exactly the same problem I have with Klyosov’s calculation and data, because he thought for some reason that R1b was less than 6000 ybp in Iberia, then he could use whichever STRs he wanted without worrying about loss of linearity, and surprise, surprise, he got a TMRCA that was less than 6000 ybp. As someone who has done a good amount of projects and research I can tell you that it is a very rare instance when one gets something so close to what one predicts. I'm not saying that always has to be the rule; optimally one would get something close to the predictions. But the thing is that when the data is manipulated to yield the desired results then the research loses all its value.  Look, what I’m saying basically, is that maybe R1b-M269 is older than 25k ybp, maybe it isn’t, but assuming that it isn’t is not a well thought-out reason to ignore the effects of loss of linearity. Which, I assure you, takes place in fast mutating STRs way sooner than in a 25 kya period.
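
To illustrate what loss of linearity looks like, here is a toy calculation. It is not Busby's θ(R)/2μ formula. It assumes a one-step symmetric mutation model on a hypothetical marker that can only take 7 allele values (a step that would leave that range simply does not happen), and the two mutation rates are made up to stand for a "slow" and a "fast" STR. It tracks the exact allele probability distribution of a lineage, so there is no random noise involved.

```python
def variance_over_time(mu, n_alleles, generations):
    """Exact variance of the allele distribution after a bounded one-step walk."""
    p = [0.0] * n_alleles
    p[n_alleles // 2] = 1.0            # founder sits in the middle of the range
    half = mu / 2.0
    for _ in range(generations):
        new = [0.0] * n_alleles
        for i, mass in enumerate(p):
            new[i] += (1.0 - mu) * mass            # no mutation this generation
            for j in (i - 1, i + 1):
                # a step that would leave the allowed range is not taken
                target = j if 0 <= j < n_alleles else i
                new[target] += half * mass
        p = new
    centre = sum(i * q for i, q in enumerate(p))
    return sum((i - centre) ** 2 * q for i, q in enumerate(p))


N_ALLELES = 7        # hypothetical allele range
GENERATIONS = 1000   # hypothetical time depth, in generations
RATES = {"slow": 0.0005, "fast": 0.005}  # hypothetical mutation rates per generation

ratios = {}
for label, mu in RATES.items():
    observed = variance_over_time(mu, N_ALLELES, GENERATIONS)
    linear_expectation = mu * GENERATIONS  # what an unbounded, linear model predicts
    ratios[label] = observed / linear_expectation
    print(label, "observed / linear expectation:", round(ratios[label], 3))
```

In this toy model the slow marker stays close to the linear expectation over the whole period, while the fast marker falls clearly short of it, because its variance is capped at (7² − 1)/12 = 4, the variance of a uniform spread over 7 alleles. An age estimate that assumes linearity would therefore come out too young on the fast marker.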


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 13, 2012, 07:48:44 PM
It just seems that a person could say that with L21 and its subclades, "Higher Frequency" will equal "Lower Diversity" and that person would be correct.

When I see someone bring up "Diversity" with L21 and/or its Subclades I think, "Why read it?" I know that the place of Highest Frequency has no chance of being the place of Highest Diversity. Which means that in EVERY case the descendants of the first man with a mutated SNP ALWAYS left the area of origin and prospered elsewhere. As a matter of fact, wherever they prospered the most, that place will be the place of Lowest Diversity.

So, when there is one L226 found on the Continent, or even in Britain L226's origin will move out of Ireland because Ireland, as the place of Highest Frequency, cannot be the place of Highest Diversity. I say this with 100% certainty.

It's not so much of an snp leaving and proliferating elsewhere.  Older populations have more time for some lines to go extinct or daughter out, thus leaving more variance/less modality in the case of R1b among the living. All things being equal.
High variance and diversity can correlate with high frequency in places like cities in a relatively recent time frame for reasons outside of genetics.  

As for L21 and really all of L11+, I don't think we can narrow down any particular countries as being the origin anymore.  We will have to see how things shake out with all the new snps.  Then we can revisit the variance of haplotypes.  I think a better approach for a proposed  origin will depend on how the oldest upstream snps cluster.  A couple of years ago France was proposed as an origin for then monolithic L21, but now we have DF21, DF23, and others that are going to be more informative.  The Isles are going to be the origin for some, the continent for others.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: rms2 on April 13, 2012, 09:21:35 PM
The Isles will no doubt be the origin point for some subclades of L21 but not for L21 itself (probably not, anyway).

Although it is still possible that L21 could have originated in Ireland (I could be wrong, but I think that is what Miles is hinting at) or somewhere else in the British Isles, that is seeming less and less likely. Look at the recent L21 results among the Basques from the Begoña Martinez-Cruz et al study, for example. It doesn't seem at all likely that that could be the result of Irish input. Likewise, there is too much L21 in France for it to have been the result of Irish input. If one wants to argue for a prehistoric migration from Ireland to the continent, he should have some reason, some evidence, for believing such a thing occurred. Instead, the flow of newcomers, even in prehistoric times, has been into the Isles from the Continent and not really the other way around.

Variance does not absolutely exclude Ireland as the possible place of origin for L21, although Ireland's L21 variance is not the highest in Europe (that honor still belongs to France, I believe). But variance cannot be the sole consideration. There are a number of other factors that, when taken together with variance, militate against Ireland as the birthplace of L21.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 13, 2012, 09:40:04 PM
The Isles will no doubt be the origin point for some subclades of L21 but not for L21 itself (probably not, anyway).

Although it is still possible that L21 could have originated in Ireland (I could be wrong, but I think that is what Miles is hinting at) or somewhere else in the British Isles, that is seeming less and less likely. Look at the recent L21 results among the Basques from the Begoña Martinez-Cruz et al study, for example. It doesn't seem at all likely that that could be the result of Irish input. Likewise, there is too much L21 in France for it to have been the result of Irish input. If one wants to argue for a prehistoric migration from Ireland to the continent, he should have some reason, some evidence, for believing such a thing occurred. Instead, the flow of newcomers, even in prehistoric times, has been into the Isles from the Continent and not really the other way around.

Variance does not absolutely exclude Ireland as the possible place of origin for L21, although Ireland's L21 variance is not the highest in Europe (that honor still belongs to France, I believe). But variance cannot be the sole consideration. There are a number of other factors that, when taken together with variance, militate against Ireland as the birthplace of L21.


Out of 12 L21** members, 2 are Irish, 3 English, 1 Welsh, 1 German, 1 Belarussian, and 4 unknown who might be at least British Isles.  The origin could be British Isles, but not necessarily Ireland.  More testing needed, of course.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: rms2 on April 13, 2012, 10:02:06 PM
The Isles will no doubt be the origin point for some subclades of L21 but not for L21 itself (probably not, anyway).

Although it is still possible that L21 could have originated in Ireland (I could be wrong, but I think that is what Miles is hinting at) or somewhere else in the British Isles, that is seeming less and less likely. Look at the recent L21 results among the Basques from the Begoña Martinez-Cruz et al study, for example. It doesn't seem at all likely that that could be the result of Irish input. Likewise, there is too much L21 in France for it to have been the result of Irish input. If one wants to argue for a prehistoric migration from Ireland to the continent, he should have some reason, some evidence, for believing such a thing occurred. Instead, the flow of newcomers, even in prehistoric times, has been into the Isles from the Continent and not really the other way around.

Variance does not absolutely exclude Ireland as the possible place of origin for L21, although Ireland's L21 variance is not the highest in Europe (that honor still belongs to France, I believe). But variance cannot be the sole consideration. There are a number of other factors that, when taken together with variance, militate against Ireland as the birthplace of L21.


Out of 12 L21** members, 2 are Irish, 3 English, 1 Welsh, 1 German, 1 Belarussian, and 4 unknown who might be at least British Isles.  The origin could be British Isles, but not necessarily Ireland.  More testing needed, of course.

It could be, but those totals would seem to indicate the Continent rather than the Isles, given the fact that the British Isles are overwhelmingly better represented in commercial dna testing than anywhere on the Continent.

I think L21 originated on the Continent and got to the Isles as part of the Atlantic Bronze Age trade network, perhaps with Maritime Bell Beaker Folk. My vote goes to somewhere in the vicinity of Morbihan in Bretagne, which was a Maritime Beaker hub.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 13, 2012, 10:21:01 PM
Rich!!!    We go through this every time and yet you always persist! I am ABSOLUTELY NOT hinting, implying, stating or otherwise that L21 originated in Ireland! This is YOUR easy way of not addressing the question! The question is about frequency and variance!

Miles Kehoe does NOT believe AND is NOT suggesting that L21 originated in  Ireland!!!! Will you EVER stop Rich? Where could a guy like you go to have this obsession looked at? :) Very, very odd.

AGAIN, doesn't it seem just a wee bit odd to any of you that High Frequency ALWAYS means Low Variance/Diversity. As soon as a High Frequency is found, then HEY! let's look for the smallest frequency, see if it has a High Variance/Diversity, and lo and behold we'll have our origin! As Mike H posted above, this no longer works with all of the subclades, yet there the pattern will persist.

And so, I state with 100% certainty that as soon as L226+ is found outside of Ireland, Ireland will be ruled out as the place of origin! One! One result with a non-Irish name in Britain will do it. Because at that point Ireland will become one of two places, the one that has the Highest Frequency, and therefore it can't possibly have the Highest Variance/Diversity. IT HAS NEVER HAPPENED IN THE HISTORY OF L21 RESEARCH THAT THE HIGHEST FREQUENCY HAS EVER BEEN CONSIDERED THE POINT OF ORIGIN ON ANY SNP. Why would L226+ be different?

And, of course, when L226+ is found on the Continent, then the Continent will be the origin. If it had EVER happened the other way I wouldn't be posting this. I can feel you people thinking, "Yea, well of course Ireland won't be the origin as soon as one is found outside of Ireland. What's his point?"

I think that High Frequency equals Low Variance/Diversity is fishy. Now, go about twisting my words to say that I am saying that Ireland is the origin of L21. You know no other way.  :)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 13, 2012, 11:47:38 PM
Rich!!!    We go through this every time and yet you always persist! I am ABSOLUTELY NOT hinting, implying, stating or otherwise that L21 originated in Ireland! This is YOUR easy way of not addressing the question! The question is about frequency and variance!

Miles Kehoe does NOT believe AND is NOT suggesting that L21 originated in  Ireland!!!! Will you EVER stop Rich? Where could a guy like you go to have this obsession looked at? :) Very, very odd.

AGAIN, doesn't it seem just a wee bit odd to any of you that High Frequency ALWAYS means Low Variance/Diversity. As soon as a High Frequency is found, then HEY! let's look for the smallest frequency, see if it has a High Variance/Diversity, and lo and behold we'll have our origin! As Mike H posted above, this no longer works with all of the subclades, yet there the pattern will persist.

And so, I state with 100% certainty that as soon as L226+ is found outside of Ireland, Ireland will be ruled out as the place of origin! One! One result with a non-Irish name in Britain will do it. Because at that point Ireland will become the one of the two places that has the Highest Frequency and therefore it can't possibly have the Highest Variance/Diversity. IT HAS NEVER HAPPENED IN THE HISTORY OF L21 RESEARCH THAT THE HIGHEST FREQUENCY HAS EVER BEEN CONSIDERED THE POINT OF ORIGIN ON ANY SNP. Why would L226+ be different?

And, of course, when L226+ is found on the Continent, then the Continent will be the origin. If it had EVER happened the other way I wouldn't be posting this. I can feel you people thinking, "Yea, well of course Ireland won't be the origin as soon as one is found outside of Ireland. What's his point?"

I think that High Frequency equals Low Variance/Diversity is fishy. Now, go about twisting my words to say that I am saying that Ireland is the origin of L21. You know no other way.  :)

Miles,

As the number of haplotypes increases in a population there tends to be a saturation towards a modal haplotype, which causes the low variance.  The behavior of STRs and their relationship with SNPs is uncertain.  The trail of L21 is unlikely to be solved by variance anyway.  Most of them (the SNPs) seem to cluster by origin within a time frame of less than a millennium.  Basically, that suggests rapid expansion via maritime networks, possibly in the Bronze Age.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 14, 2012, 12:13:32 AM
Mike H,

     And, of course, the descendants of the first boy born with the mutated SNP could have stayed in the origin for one generation or 10. With the chance that a brother(s) or cousin(s) could have migrated alone or with male relations at anytime between or after. Some may have prospered, some may have died out. Somehow the logic of multiple possibilities seems to be missing when it all comes down to Frequency and Variance/Diversity.

     Luckily for all of us the onslaught of new SNPs has begun to cause some entrenched thinking to wobble.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 14, 2012, 12:28:40 AM
Mike H,

     And, of course, the descendants of the first boy born with the mutated SNP could have stayed in the origin for one generation or 10. With the chance that a brother(s) or cousin(s) could have migrated alone or with male relations at anytime between or after. Some may have prospered, some may have died out. Somehow the logic of multiple possibilities seems to be missing when it all comes down to Frequency and Variance/Diversity.

     Luckily for all of us the onslaught of new SNPs has begun to cause some entrenched thinking to wobble.

The logic is sound.  It's only a rule of thumb, there will always be exceptions.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: eochaidh on April 14, 2012, 01:05:19 AM
I think "rules of thumb" too easily become entrenched thinking.

When L21+ was first discovered we were still under an archaeological spell; if there wasn't a pot or ornament to connect two groups, then it didn't happen. Since then we have grown to embrace a vibrant Atlantic coast. Many of us were laughed at when we brought up the idea of such a thing.

Still, as many people embrace this vibrant, seafaring, Atlantic Coast, many are unable to picture a group of people leaving Ireland on a boat. As I've said before, Ireland is the Black Hole of genetics; what goes in never comes out. Once in Ireland, the seafarer loses all vibrancy, he and his descendants will never leave. Atlantic Bronze Age travel means migration to and from all coastal and island countries other than Ireland. Ireland is a one way street. That is why I say with 100% certainty that L226+ will lose any chance of having Ireland as its origin as soon as one result is found outside of Ireland.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Heber on April 14, 2012, 02:49:38 AM
I created a strawman which attempts to explain the Atlantic route. Morbihan in France would appear to be a central hub in the four ages Atlantic, Megalithic, Copper, Bronze and Iron.
Prior to arriving at Morbihan, IMO, L21 or its ancestor came from Tartessos in Iberia and Galicia, where there is strong evidence of Celtic language and culture development.
Of course there is an equally strong Rivers route which also includes U106 and U152. I believe the use of river and maritime routes was a matter of routine and bidirectional, and accomplished in months rather than by a demic diffusion type wave taking centuries. The demic diffusion did happen, but they were following in the footsteps of those who went before and spreading inland away from coasts and rivers. I don't believe that L21 marched across the continent like lemmings and walked over the cliffs at Dun Aengus or were sucked into a black hole. There were many back migrations and onward migrations which cannot be explained by STR diversity, e.g. L21 or U106 in the US.
 
As always I am wary of the age estimates but it is the best we have to go with at the moment.
The STR clock is broken but even a broken clock tells the right time twice a day.
We depend on STR analysis and new SNPs but ultimately the granularity and certainty will be provided by new SNP discovery and the holy grail is full Y sequencing. We also need new methodologies and tools such as matching haplogroup analysis, both Y and mtDNA, to match the DNA to localities and tribal affiliations.
The "People of the British Isles" and "Irish DNA Atlas" projects and good old fashioned genealogy should help in this regard.
We have underestimated the role of mtDNA, although it is half the population and recent analysis would point to similar migration routes. We also have a lot to learn from aDNA, both Y and mtDNA.
Above all we need a large amount of common sense and patience. The tools will come and full sequencing will arrive. We can barely keep up with the flood of new SNPs as it stands.

http://m.box.com/view_shared/d0nr7768zv18ht6tk28i

https://www.box.net/shared/pf653l1r181ry7r61ix4


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: rms2 on April 14, 2012, 07:16:03 AM
Rich!!!    We go through this every time and yet you always persist! I am ABSOLUTELY NOT hinting, implying, stating or otherwise that L21 originated in Ireland! This is YOUR easy way of not addressing the question! The question is about frequency and variance!

Miles Kehoe does NOT believe AND is NOT suggesting that L21 originated in  Ireland!!!! Will you EVER stop Rich? Where could a guy like you go to have this obsession looked at? :) Very, very odd.

AGAIN, doesn't it seem just a wee bit odd to any of you that High Frequency ALWAYS means Low Variance/Diversity. As soon as a High Frequency is found, then HEY! let's look for the smallest frequency, see if it has a High Variance/Diversity, and lo and behold we'll have our origin! As Mike H posted above, this no longer works with all of the subclades, yet there the pattern will persist.

And so, I state with 100% certainty that as soon as L226+ is found outside of Ireland, Ireland will be ruled out as the place of origin! One! One result with a non-Irish name in Britain will do it. Because at that point Ireland will become the one of the two places that has the Highest Frequency and therefore it can't possibly have the Highest Variance/Diversity. IT HAS NEVER HAPPENED IN THE HISTORY OF L21 RESEARCH THAT THE HIGHEST FREQUENCY HAS EVER BEEN CONSIDERED THE POINT OF ORIGIN ON ANY SNP. Why would L226+ be different?

And, of course, when L226+ is found on the Continent, then the Continent will be the origin. If it had EVER happened the other way I wouldn't be posting this. I can feel you people thinking, "Yea, well of course Ireland won't be the origin as soon as one is found outside of Ireland. What's his point?"

I think that High Frequency equals Low Variance/Diversity is fishy. Now, go about twisting my words to say that I am saying that Ireland is the origin of L21. You know no other way.  :)

Lol.

Now, golly, why would I ever think that you were once again (for the umpteenth time) pushing the overweening Irish nationalist schtick?

It gets old, Miles.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 14, 2012, 07:26:28 AM
You should probably read the two studies on "Surfing the Wave". I think Klopfstein was the primary author.  I've pointed to them before.  No one is saying that concept always applies, just that it may apply in some situations.

Would L459 & Z245 cause problems for "Surfing the Wave" with respects to L21 ?

It's looking increasingly unlikely that an L21+ L459- and/or Z245- person is going to turn up (or L21- L459+ and/or Z245+) and even if such a person were found they probably wouldn’t have much company.

How likely is it that all three of these SNPs formed at the front of a population expansion without splitting up, and then there is also Z260 & Z290 to think about which will probably cover at the least a fairly large proportion of L21+ folk.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 11:26:09 AM
.. As the number of haplotypes increases in a population there tends to be a saturation towards a modal haplotype, which causes the low variance.  The behavior of STRs and their relationship with SNPs is uncertain.  The trail of L21 is unlikely to be solved by variance anyway.

Agreed, Skip M was trying to make the same point on the need for true scientifically designed random sampling.

The trail of SNPs in the phylogeny of L21 or any clade is critical and, I agree, is more important than STR variance.  STR variance is just another data point to be used in conjunction with the phylogeny (which is what interclade calculations support), the archaeology, etc.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 11:29:17 AM
You should probably read the two studies on "Surfing the Wave". I think Klopfstein was the primary author.  I've pointed to them before.  No one is saying that concept always applies, just that it may apply in some situations.

Would L459 & Z245 cause problems for "Surfing the Wave" with respects to L21 ?

It's looking increasingly unlikely that an L21+ L459- and/or Z245- person is going to turn up (or L21- L459+ and/or Z245+) and even if such a person were found they probably wouldn’t have much company.

How likely is it that all three of these SNPs formed at the front of a population expansion without splitting up, and then there is also Z260 & Z290 to think about which will probably cover at the least a fairly large proportion of L21+ folk.
I don't know, but I don't think your argument changes much.

How often do SNPs occur?  Isn't it something like one per generation?   I've heard statements like that.  If it is some frequency like that, having 3 SNPs in 3 generations that align is not a big deal.

The issue is just finding all of those SNPs, or at least the ones that don't go extinct. Probably more L21 people have been through full genome or WTY testing than any other subclade. We shouldn't look at SNPs as all that rare. This is their great promise for the future.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 12:23:30 PM
Linear duration of STRs is an issue (I don't deny that,) but you can see that Vizachero considers this to be the issue with haplogroups that are old (of 25k ybp age) and he does not consider R-M269 in that category.  

This is in general agreement with Ken Nordtvedt's simulations although Ken never gives a time break when STR linear duration causes a negative (accuracy risk) return for shorter durations STRs.

So basically because this Vizachero person has a pre-conceived notion that R-M269 has to be less than 25 kbp, then there is no need to worry about loss of linearity on the STRs mutations?  Is this some sort of appeal to authority, because a couple of paragraphs above you mentioned how much you trusted him and all that? ....

JeanL, I'm not really hoping to change your mind so I wasn't really thinking of invoking Vincent Vizachero's name as an appeal to authority although I guess you could consider it that. My intent is really just to let the undecided read along and make up their own minds. I am not the statistician to argue this stuff in-depth so I wanted to refer to who I find credible.

At one time I believed R1b was Cro-Magnon in Europe and was up to 35k ybp old. I was influenced by National Genographic's Spencer Wells. However, people like Vizachero, Chandler, Nordtvedt, Vernande and Stevens convinced me, really not by their authority (Spencer Wells has more authority in this field) but by logic and data. I'm just telling you where I'm getting my thinking from. Chandler is the "rates" guy and Nordtvedt is the "TMRCA" guy in this field as far as hobbyists go, and it is my opinion they out-think the actual academic folks on these issues. No doubt about it, Nordtvedt has provided several innovations to TMRCA calculation methods. I think I should add Marko Heinila as well, but there are others. People like Sandy Patterson can create their own statistical simulations and try out different approaches.

I have changed my mind in the past and I'm sure I will in the future so I'm open if you can present a logical case. Most people seem to just throw out objections, but don't really provide a full counter-proposal.  I think the Busby paper is an example.

This is exactly the same problem I have with Klyosov’s calculation and data, because he thought for some reason that R1b was less than 6000 ybp in Iberia, then he could use whichever STRs he wanted without worrying about loss of linearity, and surprise, surprise, he got a TMRCA that was less than 6000 ybp. ....

Like it or not, hobbyists who invested the most time and discussion (and I think are most credible) on TMRCAs and STR diversity - Ken, Marko, Anatole (Klyosov), Tim Janzen, Vince, etc. all come out with R-M269 being fairly young, like the 4-8k ybp age.  The Chief Scientist at FTDNA, Michael Hammer, says R-M269 is "4-8k years" old.

Klyosov's method is different than Nordtvedt as is Heinila's.  Still their results are similar.  They do explain their methods and they are available.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 14, 2012, 12:47:41 PM
You should probably read the two studies on "Surfing the Wave". I think Klopfstein was the primary author.  I've pointed to them before.  No one is saying that concept always applies, just that it may apply in some situations.

Would L459 & Z245 cause problems for "Surfing the Wave" with respects to L21 ?

It's looking increasingly unlikely that an L21+ L459- and/or Z245- person is going to turn up (or L21- L459+ and/or Z245+) and even if such a person were found they probably wouldn’t have much company.

How likely is it that all three of these SNPs formed at the front of a population expansion without splitting up, and then there is also Z260 & Z290 to think about which will probably cover at the least a fairly large proportion of L21+ folk.
I don't know, but I don't think your argument changes much.

How often do SNPs occur?  Isn't it something like one per generation?   I've heard statements like that.  If it is some frequency like that, having 3 SNPs in 3 generations that align is not a big deal.

The issue is just finding all of those SNPs, or at least the ones that don't go extinct. We shouldn't look at SNPs as all that rare.

Ken quoted an estimated figure of 1 SNP per generation on average but I was told this would work out as 1/2 because only 1/2 the Y chromosome is readable or something, got to be honest I got lost in the conversation at that point :)

however as you pointed out most male lines die out, which is presumably what happened to L21-,L459+ or  L21+,Z245- (and any of the other possible combinations), but I thought reduced extinction rates was one of the facets of the wave surfing idea ?

At the end of the day probably the biggest limiting factor isn't the rarity of SNPs but the difficulty in identifying them.

This brings me back round to the point that 5 SNPs (L21 included) were found at the root of L21 in the 1000 genome project but only one for U106, P312 & U152 as far as I can remember.

Of course this could just be down to happenstance but to my mind there is also a reasonable possibility that this is relevant to the history of L21.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 12:50:24 PM
The STR clock is broken but even a broken clock tells the right time twice a day.

I wouldn't use the word broken nor treat this as a singular clock.  There are 67 (up to 111) STR clocks now widely available.  Some may actually be practically broken for long time periods.  Some STRs probably work better than others, but the problem is we don't truly know which is which.  This is where the power of statistics and the law of large numbers is of immense help, to blend all of the clocks together as is appropriate.
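
To make the "blend all of the clocks" idea concrete, here is a minimal sketch of one common way to pool loci: sum the per-locus variances and divide by the sum of the per-locus mutation rates. The haplotypes and mutation rates below are made-up illustrative values, not real data, and measuring variance about the mean of each locus (rather than about a presumed modal) is an assumption of this sketch, not something settled in this thread.

```python
from statistics import pvariance

# Hypothetical haplotypes: each tuple is one man's repeat counts at three STR loci.
HAPLOTYPES = [(13, 24, 14), (13, 25, 14), (14, 24, 14), (13, 24, 15)]
# Hypothetical per-locus mutation rates (mutations per generation).
RATES = [0.002, 0.003, 0.002]

def per_locus_generations(haplotypes, rates):
    """Return one age estimate per locus: variance at that locus / its rate."""
    columns = zip(*haplotypes)  # regroup the repeat counts by locus
    return [pvariance(column) / rate for column, rate in zip(columns, rates)]

def pooled_generations(haplotypes, rates):
    """Blend all loci: summed variance over summed mutation rate."""
    columns = zip(*haplotypes)
    total_variance = sum(pvariance(column) for column in columns)
    return total_variance / sum(rates)

print(per_locus_generations(HAPLOTYPES, RATES))
print(pooled_generations(HAPLOTYPES, RATES))
```

The individual loci disagree with each other, while the pooled figure sits between them. With only three loci that is not much comfort, but the same blending over 67 or 111 markers is where the averaging starts to help.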

ultimately the granularity and certainty will be provided by new SNP discovery and the holy grail is full Y sequencing.

I agree although I wouldn't use the same terminology. The greater benefits are realized when the SNP phylogeny AND STR diversity are considered in conjunction with each other.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 12:56:09 PM
You should probably read the two studies on "Surfing the Wave". I think Klopfstein was the primary author.  I've pointed to them before.  No one is saying that concept always applies, just that it may apply in some situations.

Would L459 & Z245 cause problems for "Surfing the Wave" with respects to L21 ?

It's looking increasingly unlikely that an L21+ L459- and/or Z245- person is going to turn up (or L21- L459+ and/or Z245+) and even if such a person were found they probably wouldn’t have much company.

How likely is it that all three of these SNPs formed at the front of a population expansion without splitting up, and then there is also Z260 & Z290 to think about which will probably cover at the least a fairly large proportion of L21+ folk.
I don't know, but I don't think your argument changes much.

How often do SNPs occur?  Isn't it something like one per generation?   I've heard statements like that.  If it is some frequency like that, having 3 SNPs in 3 generations that align is not a big deal.

The issue is just finding all of those SNPs, or at least the ones that don't go extinct. We shouldn't look at SNPs as all that rare.

Ken quoted an estimated figure of 1 SNP per generation on average but I was told this would work out as 1/2 because only 1/2 the Y chromosome is readable or something, got to be honest I got lost in the conversation at that point :)
Believe me, I get lost plenty.

If only 1/2 an SNP occurs per generation, then it would take only 6 generations to get 3, but these are just averages anyway.
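
As a rough check on that arithmetic, here is a small sketch. The average waiting time is just the number of SNPs divided by the rate; the probability part additionally assumes SNPs arrive as a Poisson process, which is my assumption for illustration and not something Ken or anyone else here has stated.

```python
import math

def expected_generations(n_snps, rate_per_generation):
    """Average number of generations needed to accumulate n_snps."""
    return n_snps / rate_per_generation

def prob_at_least(n_snps, rate_per_generation, generations):
    """Chance of seeing at least n_snps within the given number of
    generations, assuming Poisson-distributed SNP counts (an assumption)."""
    lam = rate_per_generation * generations
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n_snps))
    return 1.0 - below

print(expected_generations(3, 1.0))   # one SNP per generation
print(expected_generations(3, 0.5))   # half an SNP per generation
print(prob_at_least(3, 0.5, 6))       # chance of 3+ SNPs within 6 generations
```

Under these assumptions the averages come out at 3 and 6 generations, and the spread around them is wide, which is why these should only be read as averages.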

however as you pointed out most male lines die out, which is presumably what happened to L21-,L459+ or  L21+,Z245- (and any of the other possible combinations), but I thought reduced extinction rates was one of the facets of the wave surfing idea ?

We don't know if L459 or Z245 is upstream of L21 or not.  Unfortunately, NOT many P312* folks have tested for those.

.. but you just said it, most lineages die out.  Everyone on the wave of an expansion does not prosper. As far as paternal lineages, it looks like those who prosper are fairly limited in number, it's just they do a ton of damage (so to speak.)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 14, 2012, 12:59:19 PM

Like it or not, hobbyists who invested the most time and discussion (and I think are most credible) on TMRCAs and STR diversity - Ken, Marko, Anatole (Klyosov), Tim Janzen, Vince, etc. all come out with R-M269 being fairly young, like the 4-8k ybp age.  The Chief Scientist at FTDNA, Michael Hammer, says R-M269 is "4-8k years" old.

Klyosov's method is different than Nordtvedt as is Heinila's.  Still their results are similar.  They do explain their methods and they are available.

They are not overtly different, the methodologies. I can't speak for Heinila's or even Janzen's, or say much about Nordtvedt's methodology. But I can say that most of the work Klyosov has done comes from FTDNA projects, and not from randomly collected data.  May I also remind the readers that Klyosov, and at least from what I’ve seen Nordtvedt, also get extremely young TMRCAs for almost all European haplogroups including I1, I-M253, I-M26, etc. The only thing I would say is that, from what I have observed, unlike Klyosov, who appears to be very closed-minded when it comes to criticism of his methodology and resorts to a cesspool of every sort of logical fallacy one could imagine, Ken Nordtvedt appears to be more open-minded, and even willing to modify his methodology if he believes something is wrong.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 01:05:41 PM
Like it or not, hobbyists who invested the most time and discussion (and I think are most credible) on TMRCAs and STR diversity - Ken, Marko, Anatole (Klyosov), Tim Janzen, Vince, etc. all come out with R-M269 being fairly young, like the 4-8k ybp age.  The Chief Scientist at FTDNA, Michael Hammer, says R-M269 is "4-8k years" old.

Klyosov's method is different than Nordtvedt as is Heinila's.  Still their results are similar.  They do explain their methods and they are available.

I should add, a non-STR diversity based method, counting SNPs on branch lengths, was used by Karafet et al in 2008 to estimate the age of R1 (not R1b or R1a but their common ancestor) as 18.5k ybp.  FTDNA's Michael Hammer was in that author group.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 14, 2012, 01:26:48 PM
Believe me, I get lost plenty.

If only 1/2 an SNP occurs per generation, then it would take only 6 generations to get 3, but these are just averages anyway.

however as you pointed out most male lines die out, which is presumably what happened to L21-,L459+ or  L21+,Z245- (and any of the other possible combinations), but I thought reduced extinction rates was one of the facets of the wave surfing idea ?

We don't know if L459 or Z245 is upstream of L21 or not.  Unfortunately, NOT many P312* folks have tested for those.

.. but you just said it, most lineages die out.  Everyone on the wave of an expansion does not prosper. As far as paternal lineages, it looks like those who prosper are fairly limited in number, it's just they do a ton of damage (so to speak.)

I don’t know if it makes that much difference to my point which order the SNPs occurred in apart from the more limited testing under P312, but L21- people are testing L459 & Z245. According to Ymap 274 people have tested for L459 of which 94 were positive. I’m assuming the other 180 were P312+, L21- folk but I suppose Thomas could have been adding tests from WTY as well.

These SNPs could have occurred in 6 generations (or more if we include the untested Z260 & Z290) but what are the chances that all of them were then discovered in the 1000 genome project, at 1 SNP every 2 generations there are presumably many yet to be unearthed !!!


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: seferhabahir on April 14, 2012, 01:32:05 PM
Out of 12 L21** members, 2 are Irish, 3 English, 1 Welsh, 1 German, 1 Belarussian, and 4 unknown who might be at least British Isles.  The origin could be British Isles, but not necessarily Ireland.  More testing needed, of course.

I should remind you that I'm really not a Belarussian, and that my Belarussian (or perhaps even my "continental" status) is likely an accident of historical migration of the 1111EE cluster, due to various expulsions over time. If there were a lot more U.S. testers whose Ashkenazi great-grandparents lived in modern day Belarus or Ukraine, it would really skew the R-L21 frequency maps. Of course, I suffer from a huge case of Male Haplogroup Disorder, which makes me believe the origin of L21 is nowhere near the British Isles. But if any non-Ashkenazi person ever shows up in 1111EE, I'm open to considering myself to be a pre-proto-Celt just for fun.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 14, 2012, 01:33:04 PM
I should add, a non-STR diversity based method, counting SNPs on branch lengths, was used by Karafet et al in 2008 to estimate the age of R1 (not R1b or R1a but their common ancestor) as 18.5k ybp.  FTDNA's Michael Hammer was in that author group.

Indeed, that was done by fixing the age of CT at 70 kya, and then interpolating how far downstream each haplogroup was located. In fact here is the quote about it:

Quote from: Karafet et al(2008)
The time to the most recent common ancestral Y chromosome and the estimated ages of 11 major clades are presented in Table 2. To provide estimates of the age of the nodes, we chose to fix the time to the most recent common ancestor of CT (defined by P9.1, M168, and M294) at 70 thousand years ago (Kya), which is consistent with previous estimates from genetic and archaeological data (Lahr and Foley 1998; Hammer and Zegura 2002; Macaulay et al. 2005), and is the chronological approximation given in Jobling et al. (2004) (p250) for the first major human out-of-Africa dispersals. We estimated the times for intermediate nodes by using a linear interpolation. The age estimates in years should be viewed with caution because we do not know if the calibration date chosen above is accurate.

Moreover, per Table-2 of that study the age of R1 is 18,500 ybp (95% CI 12,500-25,700), and the age of I is 22,200 ybp (95% CI 15,300-30,000), so it seems I is about 1.2 times older than R1. Also, if we move TMRCA of CT upwards, then R1 TMRCA goes upward, and I TMRCA goes up, the same happens if we move it downwards.
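
To illustrate that last point, here is a minimal sketch. It takes the two point estimates quoted above and rescales them for a different CT calibration, assuming (as the linear interpolation implies) that every node age scales by the same factor. The alternative calibration values of 60 and 80 kya are hypothetical choices of mine, picked only to show the effect.

```python
# Point estimates quoted above from Karafet et al. (2008), Table 2 (years before present).
CT_CALIBRATION = 70_000
AGES = {"R1": 18_500, "I": 22_200}

def rescale(ages, old_calibration, new_calibration):
    """Scale every node age by the same factor as the calibration point."""
    factor = new_calibration / old_calibration
    return {clade: age * factor for clade, age in ages.items()}

# 60 and 80 kya are hypothetical alternative calibrations, for illustration only.
for ct in (60_000, 70_000, 80_000):
    scaled = rescale(AGES, CT_CALIBRATION, ct)
    ratio = scaled["I"] / scaled["R1"]
    print(ct, {clade: round(age) for clade, age in scaled.items()}, round(ratio, 2))
```

The absolute ages slide up and down with the calibration, but the I to R1 ratio stays at 1.2 whichever CT date is picked, so the relative ordering is the more robust part of the result.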


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: seferhabahir on April 14, 2012, 02:09:00 PM
Out of 12 L21** members, 2 are Irish, 3 English, 1 Welsh, 1 German, 1 Belarussian, and 4 unknown who might be at least British Isles.  The origin could be British Isles, but not necessarily Ireland.  More testing needed, of course.
I should remind you that I'm really not a Belarussian, and that my Belarussian (or perhaps even my "continental" status) is likely an accident of historical migration of the 1111EE cluster, due to various expulsions over time. If there were a lot more U.S. testers whose Ashkenazi great-grandparents lived in modern day Belarus or Ukraine, it would really skew the R-L21 frequency maps. Of course, I suffer from a huge case of Male Haplogroup Disorder, which makes me believe the origin of L21 is nowhere near the British Isles. But if any non-Ashkenazi person ever shows up in 1111EE, I'm open to considering myself to be a pre-proto-Celt just for fun.

And one of the Irish L21** just now came back as DF41+ so is no longer in the L21** list.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Dubhthach on April 14, 2012, 04:23:59 PM
Out of 12 L21** members, 2 are Irish, 3 English, 1 Welsh, 1 German, 1 Belarussian, and 4 unknown who might be at least British Isles.  The origin could be British Isles, but not necessarily Ireland.  More testing needed, of course.
I should remind you that I'm really not a Belarussian, and that my Belarussian (or perhaps even my "continental" status) is likely an accident of historical migration of the 1111EE cluster, due to various expulsions over time. If there were a lot more U.S. testers whose Ashkenazi great-grandparents lived in modern day Belarus or Ukraine, it would really skew the R-L21 frequency maps. Of course, I suffer from a huge case of Male Haplogroup Disorder, which makes me believe the origin of L21 is nowhere near the British Isles. But if any non-Ashkenazi person ever shows up in 1111EE, I'm open to considering myself to be a pre-proto-Celt just for fun.

And one of the Irish L21** just now came back as DF41+ so is no longer in the L21** list.

That would be me! :-)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: seferhabahir on April 14, 2012, 04:29:26 PM
And one of the Irish L21** just now came back as DF41+ so is no longer in the L21** list.
That would be me! :-)

Yes, very cool. Congratulations on getting into a probable new son of L21...


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 10:54:46 PM
Believe me, I get lost plenty.

If only 1/2 an SNP occurs per generation, then it would take only 6 generations to get 3, but these are just averages anyway.

however as you pointed out most male lines die out, which is presumably what happened to L21-,L459+ or  L21+,Z245- (and any of the other possible combinations), but I thought reduced extinction rates was one of the facets of the wave surfing idea ?

We don't know if L459 or Z245 is upstream of L21 or not.  Unfortunately, NOT many P312* folks have tested for those.

.. but you just said it, most lineages die out.  Everyone on the wave of an expansion does not prosper. As far as paternal lineages, it looks like those who prosper are fairly limited in number, it's just they do a ton of damage (so to speak.)

I don’t know if it makes that much difference to my point which order the SNPs occurred in apart from the more limited testing under P312, but L21- people are testing L459 & Z245. According to Ymap 274 people have tested for L459 of which 94 were positive. I’m assuming the other 180 were P312+, L21- folk but I suppose Thomas could have been adding tests from WTY as well.

The point is that we don't know if L459, L21 and Z245 all happened at about the same time. L459, for instance could have occurred many generations upstream of L21 in some P312* lineage that is now mostly extinct except the L21 sub-element of it.

I don't think you can assume the other 180 were P312xL21. There aren't nearly that many P312* guys in WTY, not even close. Outside of WTY, only a few P312* have tested for L459.

These SNPs could have occurred in 6 generations (or more if we include the untested Z260 & Z290) but what are the chances that all of them were then discovered in the 1000 genome project, at 1 SNP every 2 generations there are presumably many yet to be unearthed !!!

I don't know the odds and you don't know the odds, but neither of us has much idea of the generations between these three SNPs.  You are making assumptions about those three SNPs to support your objections.

We should be careful about what we assume.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 14, 2012, 11:01:52 PM
I should add, a non-STR diversity based method, counting SNPs on branch lengths, was used by Karafet et al in 2008 to estimate the age of R1 (not R1b or R1a but their common ancestor) as 18.5k ybp.  FTDNA's Michael Hammer was in that author group.

Indeed, that was done by fixing the age of CT at 70 kya, and then interpolating how far downstream each haplogroup was located. In fact here is the quote about it:

Quote from: Karafet et al(2008)
The time to the most recent common ancestral Y chromosome and the estimated ages of 11 major clades are presented in Table 2. To provide estimates of the age of the nodes, we chose to fix the time to the most recent common ancestor of CT (defined by P9.1, M168, and M294) at 70 thousand years ago (Kya), which is consistent with previous estimates from genetic and archaeological data (Lahr and Foley 1998; Hammer and Zegura 2002; Macaulay et al. 2005), and is the chronological approximation given in Jobling et al. (2004) (p250) for the first major human out-of-Africa dispersals. We estimated the times for intermediate nodes by using a linear interpolation. The age estimates in years should be viewed with caution because we do not know if the calibration date chosen above is accurate.

Moreover, per Table-2 of that study the age of R1 is 18,500 ybp (95% CI 12,500-25,700), and the age of I is 22,200 ybp (95% CI 15,300-30,000), so it seems I is about 1.2 times older than R1. Also, if we move TMRCA of CT upwards, then R1 TMRCA goes upward, and I TMRCA goes up, the same happens if we move it downwards.
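
A minimal sketch of the scaling argument, assuming the linear interpolation described in the Karafet quote means each node's age is a fixed fraction of the CT calibration age. The ages are the Table 2 values quoted above; the function name `rescale` and the alternative 60 kya calibration are made up purely for illustration.

```python
# Ages in years before present, as quoted from Table 2 of Karafet et al (2008).
CT_CALIBRATION = 70_000
AGES = {"R1": 18_500, "I": 22_200}


def rescale(ages, old_ct, new_ct):
    """Rescale node ages proportionally to a new CT calibration age.

    Assumption: pure linear interpolation, so every node keeps the same
    fraction of the CT age when the calibration point moves.
    """
    return {hg: age * new_ct / old_ct for hg, age in ages.items()}


ratio = AGES["I"] / AGES["R1"]                    # age of I relative to R1
shifted = rescale(AGES, CT_CALIBRATION, 60_000)   # hypothetical younger CT
shifted_ratio = shifted["I"] / shifted["R1"]

print(f"I / R1 at CT = 70 kya: {ratio:.2f}")
print(f"I / R1 at CT = 60 kya: {shifted_ratio:.2f}")
print({hg: round(age) for hg, age in shifted.items()})
```

Under that assumption the absolute ages move with the calibration date, but the ratio between I and R1 does not.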


This is no proof. These are just estimates.

Nevertheless, the "most likely" case for an R1 TMRCA estimate using a totally non-STR based (SNP counting) method aligns very nicely with our top scientist-hobbyist TMRCA estimates for R1b and its subclades, and our top scientist-hobbyists are using at least three different methods - Nordtvedt's Gen7, Klyosov's, and Heinila's "most probable outcome."

The net is we have an SNP based method that supports STR variance based methods and three of those methods generally agree.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 14, 2012, 11:09:24 PM
This is no proof. These are just estimates.

Nevertheless, the "most likely" case for an R1 TMRCA estimate using a totally non-STR based (SNP counting) method aligns very nicely with our top scientist-hobbyist TMRCA estimates for R1b and its subclades, and our top scientist-hobbyists are using at least three different methods - Nordtvedt's Gen7, Klyosov's, and Heinila's "most probable outcome."

The net is we have an SNP based method that supports STR variance based methods and three of those methods generally agree.


What are you talking about when you say "this is no proof"? The net from Karafet et al(2008) is that R1 is 18500 ybp if CT is 70000 ybp, and under that assumption I is 22200 ybp. If you use three different methods which do not take into account the effects of microsatellite choice, you would still get the same age estimates, because all three methods would underestimate the age of the haplogroup, so nothing new there.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 15, 2012, 07:01:59 AM

The point is that we don't know if L459, L21 and Z245 all happened at about the same time. L459, for instance could have occurred many generations upstream of L21 in some P312* lineage that is now mostly extinct except the L21 sub-element of it.

I don't think you can assume the other 180 were P312xL21. There aren't nearly that many P312* guys in WTY, not even close. Outside of WTY, only a few P312* have tested for L459.

These SNPs could have occurred in 6 generations (or more if we include the untested Z260 & Z290) but what are the chances that all of them were then discovered in the 1000 genome project, at 1 SNP every 2 generations there are presumably many yet to be unearthed !!!

I don't know the odds and you don't know the odds, but neither of us has much idea of the generations between these three SNPs.  You are making assumptions about those three SNPs to support your objections.

We should be careful about what we assume.



I’m not assuming anything, simply wondering if this data can be used to some effect :)

Whether or not Thomas included WTY results in the Ymap data is an important consideration (that’s why I mentioned it), the best way to know for sure of course would be to ask him but he can be a little erratic with his replies but of course he’s a busy man. However I think it’s at least more than likely that he does and since there are 122 negative results in WTY for L459 of which 117 are something other than P312*, I think we can reasonably comfortably remove them from the 180 neg results reported at Ymap.

That still leaves 63 and unless we can think of another source of random testing Thomas could be using (I can’t other than the 1000 genome which seems unlikely) I think it’s reasonable to assume (bum, am I allowed one ? :) these 63 are P312*

That would leave a roughly guessed 63 P312+, L459- against 94 L21+, L459+ results which is probably still a little light on numbers to draw concrete conclusions from but at least gives enough detail to say if an L21+, L459- or L21-, L459+ fellow did turn up he would be quite lonely.
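
The subtraction above, written out. The counts are the ones given in this post. The last step is an extra that is not part of the original argument: a standard exact binomial upper bound for a category that was never observed, under the strong (and probably unrealistic) assumption that the 63 were randomly sampled P312* men. The helper name `zero_hit_upper_bound` is made up for illustration.

```python
ymap_tested = 274        # L459 results reported at Ymap
ymap_positive = 94       # L459+ results (taken here to be L21+)
wty_non_p312_neg = 117   # WTY L459- results that are something other than P312*

ymap_negative = ymap_tested - ymap_positive            # L459- results at Ymap
presumed_p312_neg = ymap_negative - wty_non_p312_neg   # guessed P312+, L459-


def zero_hit_upper_bound(n, confidence=0.95):
    """Upper bound on a frequency when 0 hits are seen in n random samples.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


bound = zero_hit_upper_bound(presumed_p312_neg)
print(ymap_negative, presumed_p312_neg, f"{bound:.3f}")
```

Read loosely, seeing no L459+ among 63 such men would only rule out frequencies above a few percent, which fits the "a little light on numbers" caveat.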

Without going through all that again I think it’s fair to say the results for Z245 are going to be roughly in line with those for L459, we at least know with certainty that L21+, Z245- or L21-, Z245+ hasn’t been found.

So getting back to the original question

We have a reasonably good idea that L21, L459 & Z245 are pretty much the same thing, but we don’t know the order they arrived (and probably never will).

What we can say is this suggests there was a reasonable time frame between the P312* grandfather  and the first L21+, L459+, Z245+ fellow, unless these three SNPs happened right on top of each other which sounds less likely.

But from interclade calculations we know there wasn’t that much time between P312, U152 and L21 (or L459 / Z245, whichever was last)

This tells us that L21 (or whatever) split from P312 earlier than interclade calculations can tell us and in my opinion draws questions around the idea that L21’s spread is due to it being born on the crest of a wave. Of course this idea (and it’s only that) doesn’t completely quash the ‘surfing the wave’ idea but to my mind at least suggests L21 sat around somewhere fairly sedentarily for a reasonable time (possibly building up numbers) before surfing out.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 15, 2012, 11:01:12 AM

The point is that we don't know if L459, L21 and Z245 all happened at about the same time. L459, for instance could have occurred many generations upstream of L21 in some P312* lineage that is now mostly extinct except the L21 sub-element of it.

I don't think you can assume the other 180 were P312xL21. There aren't nearly that many P312* guys in WTY, not even close. Outside of WTY, only a few P312* have tested for L459.

These SNPs could have occurred in 6 generations (or more if we include the untested Z260 & Z290) but what are the chances that all of them were then discovered in the 1000 genome project, at 1 SNP every 2 generations there are presumably many yet to be unearthed !!!

I don't know the odds and you don't know the odds, but neither of us has much idea of the generations between these three SNPs.  You are making assumptions about those three SNPs to support your objections.

We should be careful about what we assume.
I’m not assuming anything, simply wondering if this data can be used to some effect :)
Your counter-arguments are based on assumptions, whether you call them wondering or whatever. They apparently are constructed to argue that you've found an exception to statistically researched methods like what Ken Nordtvedt or Marko Heinila have constructed.

Whether or not Thomas included WTY results in the Ymap data is an important consideration (that’s why I mentioned it), the best way to know for sure of course would be to ask him but he can be a little erratic with his replies but of course he’s a busy man. However I think it’s at least more than likely that he does and since there are 122 negative results in WTY for L459 of which 117 are something other than P312*, I think we can reasonably comfortably remove them from the 180 neg results reported at Ymap.

That still leaves 63 and unless we can think of another source of random testing Thomas could be using (I can’t other than the 1000 genome which seems unlikely) I think it’s reasonable to assume (bum, am I allowed one ? :) these 63 are P312*
You can assume all you want, but then the weight of your counter-arguments mean little if your assumptions are false or unknown, which is the case.

That would leave a roughly guessed 63 P312+, L459- against 94 L21+, L459+ results which is probably still a little light on numbers to draw concrete conclusions from but at least gives enough detail to say if an L21+, L459- or L21-, L459+ fellow did turn up he would be quite lonely....

Maybe you missed it. Do you agree most Y lineages go extinct?  If so, then it is very conceivable that a P312* lineage had the L459 mutation and then many generations later had the L21 mutation, but all the L459+ L21- lineages died off

.... or have not been found and tested yet. A lot can happen in 4000 years. What % of the population do you think we have tested for P312, L21 and L459?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 15, 2012, 11:15:16 AM

The point is that we don't know if L459, L21 and Z245 all happened at about the same time. L459, for instance could have occurred many generations upstream of L21 in some P312* lineage that is now mostly extinct except the L21 sub-element of it.

I don't think you can assume the other 180 were P312xL21. There aren't nearly that many P312* guys in WTY, not even close. Outside of WTY, only a few P312* have tested for L459.

These SNPs could have occurred in 6 generations (or more if we include the untested Z260 & Z290) but what are the chances that all of them were then discovered in the 1000 genome project, at 1 SNP every 2 generations there are presumably many yet to be unearthed !!!

I don't know the odds and you don't know the odds, but neither of us has much idea of the generations between these three SNPs.  You are making assumptions about those three SNPs to support your objections.

We should be careful about what we assume.
I’m not assuming anything, simply wondering if this data can be used to some effect :)
Your counter-arguments are based on assumptions, whether you call them wondering or whatever. They apparently are constructed to argue that you've found an exception to statistically researched methods like what Ken Nordtvedt or Marko Heinila have constructed.

Whether or not Thomas included WTY results in the Ymap data is an important consideration (that’s why I mentioned it), the best way to know for sure of course would be to ask him but he can be a little erratic with his replies but of course he’s a busy man. However I think it’s at least more than likely that he does and since there are 122 negative results in WTY for L459 of which 117 are something other than P312*, I think we can reasonably comfortably remove them from the 180 neg results reported at Ymap.

That still leaves 63 and unless we can think of another source of random testing Thomas could be using (I can’t other than the 1000 genome which seems unlikely) I think it’s reasonable to assume (bum, am I allowed one ? :) these 63 are P312*
You can assume all you want, but then the weight of your counter-arguments mean little if your assumptions are false or unknown, which is the case.

That would leave a roughly guessed 63 P312+, L459- against 94 L21+, L459+ results which is probably still a little light on numbers to draw concrete conclusions from but at least gives enough detail to say if an L21+, L459- or L21-, L459+ fellow did turn up he would be quite lonely....

Maybe you missed it. Do you agree most Y lineages go extinct?  If so, then it is very conceivable that a P312* lineage had the L459 mutation and then many generations later had the L21 mutation, but all the L459+ L21- lineages died off

.... or have not been found and tested yet. A lot can happen in 4000 years. What % of the population do you think we have tested for P312, L21 and L459?

Oh well never mind, I thought this was a potentially interesting observation, never mind I'll crawl back under my rock.

BTW I'm not sure which statistical evidence you think I'm trying to overturn and yes I'm extremely aware of extinction rates.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 15, 2012, 11:34:24 AM
This was part of the R-P312 Basque discussions but since we are getting into STR variance issues I copied it over here.

Quote from: Mikewww

Just three or ten STRs are just not enough. Of course, with such a limited experiment you are going to get erratic results.  Remember Sandy Paterson's simulations where it was determined you need a minimum of 50 STRs to have any precision?

Also, I think you are doing comparisons of R-L23xM412. This is NOT a haplogroup. It is a paragroup. Since it is not really a single group with a single common ancestor I'm not sure that is a valid comparison to make between geographies.

I'm working with what I got, the data from Myres et al(2010) was sampled using 10 STRs, and the point of the exercise I did was to show how much the variance changed when one used more linear vs. less linear STRs. As you can see when all of the STRs are used, Turkey turns out to have a higher variance than Western Europe, but when the slowest, most linear ones are used, it turns out Western Europe has a higher variance. It’s not about the numbers, but about the choice. Of course 10-20 slow STRs trump 3 slow STRs, however to say that a set of 50 STRs regardless of their mutation rate is better than a set of 10-20 slow STRs is just not logical to me.
I think it is fine to work with what you have but that doesn't mean you've demonstrated your point well.  Three or ten STRs is just not enough. Of course, you can get erratic results when working with such small data sets.

Yeah I’m doing comparisons on the L23(xM412) data from Myres et al(2010), and yes maybe the folks from the Caucasus have some SNP that the folks from Europe do not have, but that doesn’t change the fact that they both descend from an L23 man, so yeah it is a group with a single common ancestor.
Yes, but this is not representative of the whole group. All of the M412 (L51) folks are excluded and that's nine tenths of the group, or probably more.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 15, 2012, 12:05:05 PM
I think it is fine to work with what you have but that doesn't mean you've demonstrated your point well.  Three or ten STRs is just not enough. Of course, you can get erratic results when working with such small data sets.

Yes, but this is not representative of the whole group. All of the M412 (L51) folks are excluded and that's nine tenths of the group, or probably more.

First off, my point was to show that the relative variance in populations varies as a function of the microsatellite choice. I accept that 3 STRs are rather small, but well if you can find me a scientific paper that uses more than 10-15 STRs I’ll be more than happy to work with it. As for the inclusion of the L51 folks, well, per Table-S3 of the Myres et al(2010) study, R-L23 has 214 samples, whereas M412(L51) only has 14, that is nowhere near nine tenths of the group, they are in fact a small minority. Moreover 13/14 M412 are found in Europe, with the additional one coming from Turkey, so it would be kind of pointless to work with a single Turkish haplotype.

PS: Not to get into gossip, but here is what Dr.Klyosov just told Sandy Paterson over at Rootsweb:

Quote
There are several problems with your approach, and it would be good if you
listen if you really want to understand where the problems are.

Some of those problems are technical, but they show that you are not "in" as
yet. Some of them are fundamental.

Let me explain. What you do, you pick something such as "the sum of
variance" which can always be picked for any senseless series of numbers,
you divide by something which is highly uncertain, you get something, which
you would always get when you divide something by something, and you say -
voila, I got it. What is the worst in all of it, you do not even read papers
which explain all what you do, you ignore them, you do not take the data
into account.

This is a recipe for a complete disaster which you get.

You insist that the mutation rate constant for the 111 marker haplotypes is
0.41. Have you seen HOW it was "obtained"? It was obtained for only 34
markers from those 111 markers, and only 9 of them were in the 68-111 marker
row. Do you call it "science"?
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334407313

Here is what he said very recently in regards to John Chandler’s mutation rates:
Quote

For the record, Chandler's estimates are good only for the 12 marker panel.
For the 25 and 37 marker panel they are grossly incorrect. They do not in
agreement with his own data on the 12 marker haplotypes.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334423409

It seems to me there isn’t much harmony in the hobbyist community.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 15, 2012, 04:46:29 PM
I think it is fine to work with what you have but that doesn't mean you've demonstrated your point well.  Three or ten STRs is just not enough. Of course, you can get erratic results when working with such small data sets.

Yes, but this is not representative of the whole group. All of the M412 (L51) folks are excluded and that's nine tenths of the group, or probably more.

First off, my point was to show that the relative variance in populations varies as a function of the microsatellite choice. I accept that 3 STRs are rather small, but well if you can find me a scientific paper that uses more than 10-15 STRs I’ll be more than happy to work with it.
I'm not going to defend the scientific papers' use of 10-15 STRs. I think it is too small of a number.  We know that for matching people in R1b, using 67 markers is highly beneficial even versus 37. I don't know why anyone thinks 3, for sure, is worth even looking at. I don't think anyone in the hobbyist community is claiming to do valid TMRCA's with 3 markers or 12 or 15. FTDNA has come out with 111 STRs and has used this reasoning - to do more precise TMRCA's.

As for the inclusion of the L51 folks, well, per Table-S3 of the Myres et al(2010) study, R-L23 has 214 samples, whereas M412(L51) only has 14, that is nowhere near nine tenths of the group, they are in fact a small minority. Moreover 13/14 M412 are found in Europe, with the additional one coming from Turkey, so it would be kind of pointless to work with a single Turkish haplotype.

All of L51 includes all of L11 and all of U106 and all of P312. Paragroups do not represent a group of people with a single common ancestor. Paragroups may be missing large chunks of the data that is available for total group. In your example, you are missing the bulk of L23 and/or L51.

PS: Not to get into gossip, but here is what Dr.Klyosov just told Sandy Paterson over at Rootsweb:...

Here is what he said very recently in regards to John Chandler’s mutation rates:
....
For the record, Chandler's estimates are good only for the 12 marker panel.
For the 25 and 37 marker panel they are grossly incorrect.
....
It seems to me there isn’t much harmony in the hobbyist community.

If you are certain Chandler's estimates are grossly incorrect you should go on to Rootsweb and make your case. I know that Leo Little's rates are also used on certain panels, but unfortunately, he is no longer with us.

I agree there is not harmony in the hobbyist community, but regardless of methodology, prognostication and personal and communications differences, that makes the following even more significant.

Nevertheless, the "most likely" case for an R1 TMRCA estimate using a totally non-STR based (SNP counting) method aligns very nicely with our top scientist-hobbyist TMRCA estimates for R1b and its subclades, and our top scientist-hobbyists are using at least three different methods - Nordtvedt's Gen7, Klyosov's, and Heinila's "most probable outcome."

The net is we have an SNP based method that supports STR variance based methods and three of those methods generally agree.

No matter how they do it, R-M269 comes out about the same age. I should add Tim Janzen to the list, but he is just using a variant of Nordtvedt's methods. We could add Vince Vizachero as well but he may use Nordtvedt's stuff too.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 15, 2012, 05:43:28 PM
I'm not going to defend the scientific papers' use of 10-15 STRs. I think it is too small of a number.  We know that for matching people in R1b, using 67 markers is highly beneficial even versus 37. I don't know why anyone thinks 3, for sure, is worth even looking at. I don't think anyone in the hobbyist community is claiming to do valid TMRCA's with 3 markers or 12 or 15. FTDNA has come out with 111 STRs and has used this reasoning - to do more precise TMRCA's.

Well 37, 67, 111 markers sets are good to get personal matches, to find TMRCA in population databases a good 10-15 panel of linear STRs would just do it. Again the usage of the 3 slowest/more linear STRs vs. the 4 faster/less linear STRs was to show how the relative variance changed, and how places that had a greater variance overall ended up having less variance than other places when the most linear markers were used.

All of L51 includes all of L11 and all of U106 and all of P312. Paragroups do not represent a group of people with a single common ancestor. Paragroups may be missing large chunks of the data that is available for total group. In your example, you are missing the bulk of L23 and/or L51.

Well the L23 samples I analyzed were L23(xL51), so again it wasn’t a paragroup, or at least not in the way you are describing it. So no, I’m not missing the bulk of L23, because I only included the L23(xL51) samples, not the L23+ samples.

If you are certain Chandler's estimates are grossly incorrect you should go on to Rootsweb and make your case. I know that Leo Little's rates are also used on certain panels, but unfortunately, he is no longer with us.

I didn’t say that, that was Anatole Klyosov who said that, did you even care to check the links I provided?

I agree there is not harmony in the hobbyist community, but regardless of methodology, prognostication and personal and communications differences, that makes the following even more significant.

Nevertheless, the "most likely" case for an R1 TMRCA estimate using a totally non-STR based (SNP counting) method aligns very nicely with our top scientist-hobbyist TMRCA estimates for R1b and its subclades, and our top scientist-hobbyists are using at least three different methods - Nordtvedt's Gen7, Klyosov's, and Heinila's "most probable outcome."

The net is we have an SNP based method that supports STR variance based methods and three of those methods generally agree.

No matter how they do it, R-M269 comes out about the same age. I should add Tim Janzen to the list, but he is just using a variant of Nordtvedt's methods. We could add Vince Vizachero as well but he may use Nordtvedt's stuff too.

Well as long as a bunch of less linear and a few more linear STRs are being thrown together you are going to get TMRCAs that are saturated by the inherent loss of linearity of most of the STRs that were used.  What’s the point of using 111 markers if most of them lose linearity in less than 5000 years? No wonder they get TMRCAs that are between 4000-8000 ybp.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 16, 2012, 04:54:11 PM

All of L51 includes all of L11 and all of U106 and all of P312. Paragroups do not represent a group of people with a single common ancestor. Paragroups may be missing large chunks of the data that is available for total group. In your example, you are missing the bulk of L23 and/or L51.

Well the L23 samples I analyzed were L23(xL51), so again it wasn’t a paragroup, or at least not in the way you are describing it. So no, I’m not missing the bulk of L23, because I only included the L23(xL51) samples, not the L23+ samples.

It looks like we have a disagreement on terminology or understanding.
Quote from: wikipedia
Paragroup is a term used in population genetics to describe lineages within a haplogroup that are not defined by any additional unique markers. In human Y-chromosome DNA haplogroups, paragroups are typically represented by an asterisk (*) placed after the main haplogroup[1].
[1] The Y Chromosome Consortium, T. Y C. (2002). "A Nomenclature System for the Tree of Human Y-Chromosomal Binary" Genome Research
http://en.wikipedia.org/wiki/Paragroup

R-L23xL51 or call it R-L23* if you wish, is a paragroup.

There may be people that are R-L23xL51 in your sample that are closer related to R-L51 people than to all of the other R-L23xL51 people in the sample. Make sense? We don't know how many subclades there are hidden in R-L23xL51. Their Most Recent Common Ancestor can not be determined by SNP knowledge, other than to say they have the same Most Recent Common Ancestor as all of R-L23, including all of the L51 (on down to P312, U106) guys.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 16, 2012, 05:05:36 PM
Here is what he said very recently in regards to John Chandler’s mutation rates:
Quote
For the record, Chandler's estimates are good only for the 12 marker panel.
For the 25 and 37 marker panel they are grossly incorrect. They do not in
agreement with his own data on the 12 marker haplotypes.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334423409

If you are certain Chandler's estimates are grossly incorrect you should go on to Rootsweb and make your case. I know that Leo Little's rates are also used on certain panels, but unfortunately, he is no longer with us.

I didn’t say that, that was Anatole Klyosov who said that, did you even care to check the links I provided?

I apologize, I thought the way you referred to him meant that you agreed with him. Why quote someone unless you are explicit in your agreement or disagreement?  I guess you are just saying there is not harmony in the hobbyist community.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 16, 2012, 05:59:45 PM
R-L23xL51 or call it R-L23* if you wish, is a paragroup.

There may be people that are R-L23xL51 in your sample that are closer related to R-L51 people than to all of the other R-L23xL51 people in the sample. Make sense? We don't know how many subclades there are hidden in R-L23xL51. Their Most Recent Common Ancestor can not be determined by SNP knowledge, other than to say they have the same Most Recent Common Ancestor as all of R-L23, including all of the L51 (on down to P312, U106) guys.

Ok let’s put it this way, the yet-to-be-discovered SNPs under L23 that aren’t L51 appear to be older in Western Europe than in Eastern Europe. At the same time Western European variance appears to be slightly younger than the Caucasus one in all instances, and older than Turkey when the most linear markers are used; otherwise Turkey appears to be the oldest when using the four less linear STRs.

But here is something interesting, if we assume that L23 was born outside of Europe, and that the clades that entered Europe were either L11, or P312/U106, then one would expect the European L23 to be relatively scarce, and very young; because any L23 in Europe would be newcomers from outside very recently, yet Western Europe has L23 that have a TMRCA as old as Turkey, and almost as old as the Caucasus. So does that mean that L23 was part of the initial wave of colonization, and that L11 was born along the way?

Ok, mind everyone, this is all based on the data provided by Myres et al(2010); this is nowhere near conclusive of the European genetic panorama. I want to make sure everyone understands that I am advancing these observations based on very limited data.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 16, 2012, 06:01:26 PM

I apologize I thought the way you referred to him meant that you agreed with him. Why quote someone unless you are explicit in your agreement or disagreement?  I guess you are just saying there is not harmony in the hobbyist community.


You got it.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 16, 2012, 06:15:28 PM
I think it is fine to work with what you have but that doesn't mean you've demonstrated your point well.  Three or ten STRs is just not enough. Of course, you can get erratic results when working with such small data sets....

First off, my point was to show that the relative variance in populations varies as a function of the microsatellite choice. I accept that 3 STRs are rather small, but well if you can find me a scientific paper that uses more than 10-15...

I've actually played with "microsatellite choice" in the past, because of concern about your point.  I ran through the R-L21 file of long haplotypes and tried 12, 25, 37, 67 length haplotypes and, after throwing out the multi-copy and null STRs, I would run variance calculations adding or subtracting an STR or two.   What I found was the variance relationships between the subclades of L21 were fairly stable when you start using above 15-20 STRs.

Generally, I find very little jostling of the relationships in R1b subclades when you start using 25 or so markers and get up to about 30 haplotypes.
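
For readers who want to try this themselves, here is a rough sketch of one common way to compute STR variance: the mean squared deviation from the modal value, averaged over the chosen markers. This is an assumption about the general approach, not the actual spreadsheet behind the numbers in this post. The marker names A, B, C and the four haplotypes are toy data, and the helper names are made up.

```python
from collections import Counter


def modal(values):
    """Most common repeat count; ties are broken by taking the smallest."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def mean_variance(haplotypes, markers):
    """Average, over the chosen markers, of the mean squared deviation
    from each marker's modal repeat count."""
    total = 0.0
    for marker in markers:
        values = [h[marker] for h in haplotypes]
        mode = modal(values)
        total += sum((v - mode) ** 2 for v in values) / len(values)
    return total / len(markers)


# Toy haplotypes: B is the "fast" marker, C never mutates.
toy = [
    {"A": 13, "B": 24, "C": 14},
    {"A": 13, "B": 25, "C": 14},
    {"A": 13, "B": 24, "C": 14},
    {"A": 12, "B": 23, "C": 14},
]

var_all = mean_variance(toy, ["A", "B", "C"])
var_slow = mean_variance(toy, ["A", "C"])
var_fast = mean_variance(toy, ["B"])
print(var_all, var_slow, var_fast)
```

Swapping the marker list is all it takes to see how much the chosen STRs move the number, which is the effect being argued about in this thread.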

Here is a "test" run for you on R-L21's major subclades based on different sets of markers.

Relative variance with the 49 mixed speed, non-multicopy, non-null STRs from FTDNA's 1st 67:
L21__________:  Var=0.99 (N=2590)
DF21_________:  Var=0.80 (N=116)
L513_________:  Var=0.75 (N=157)
Z253_________:  Var=0.61 (N=145)
M222_________:  Var=0.49 (N=540)
Z255_________:  Var=0.39 (N=102)

Relative variance with the 36 best* linear duration, non-multicopy, non-null STRs from FTDNA's 1st 67:
L21__________:  Var=1.02  (N=2590)
DF21_________:  Var=0.73 (N=116)   
L513_________:  Var=0.64 (N=157)
Z253_________:  Var=0.60 (N=145)
M222_________:  Var=0.45 (N=540)
Z255_________:  Var=0.35 (N=102)

Relative variance with the 24 mixed speed, non-multicopy, non-null STRs from FTDNA's 1st 37:
L21_________:  Var=0.95 (N=3234)
DF21________:  Var=0.95 (N=125)   
L513________:  Var=0.73 (N=166)
Z253________:  Var=0.64 (N=170)   
M222________:  Var=0.57 (N=734)
Z255________:  Var=0.45 (N=128)

Relative variance with the 16 best* linear duration, non-multicopy, non-null STRs from FTDNA's 1st 37:
L21_________:  Var=0.95 (N=3234)
DF21________:  Var=0.92 (N=125)
L513________:  Var=0.64 (N=166)
Z253________:  Var=0.63 (N=170)
M222________:  Var=0.54 (N=734)
Z255________:  Var=0.43 (N=128)

* Linear durations greater than 7000 years according to Marko Heinila's analysis.


See how stable the order of the above haplogroups stays?  The percentage differences between the different haplogroups do change depending on the STRs used. I am not trying to say that STR variance is precise. It isn't, but the more data you have, the more you can improve precision.
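
For anyone who wants to check the ordering claim mechanically, here is a minimal sketch. It only re-types the relative variances from the four runs listed above; the function names and the run labels are my own and are not part of any spreadsheet used in this thread. It reports every pair of subclades whose order is strictly reversed relative to the L21 > DF21 > L513 > Z253 > M222 > Z255 sequence (a tie does not count as a reversal).

```python
# Relative variances re-typed from the four runs listed above.
ORDER = ["L21", "DF21", "L513", "Z253", "M222", "Z255"]
RUNS = {
    "49 mixed (of 67)": [0.99, 0.80, 0.75, 0.61, 0.49, 0.39],
    "36 linear (of 67)": [1.02, 0.73, 0.64, 0.60, 0.45, 0.35],
    "24 mixed (of 37)": [0.95, 0.95, 0.73, 0.64, 0.57, 0.45],
    "16 linear (of 37)": [0.95, 0.92, 0.64, 0.63, 0.54, 0.43],
}


def inversions(values):
    """Index pairs (i, j), i < j, where the later entry is strictly larger."""
    n = len(values)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if values[j] > values[i]]


def ordering_report(runs=RUNS, order=ORDER):
    """Map each run label to the list of subclade pairs that swap order."""
    return {
        label: [(order[i], order[j]) for i, j in inversions(values)]
        for label, values in runs.items()
    }


if __name__ == "__main__":
    for label, swaps in ordering_report().items():
        print(label, "swapped pairs:", swaps)
```

This only looks at rank order. It says nothing about how much the gaps between subclades move from run to run, which they clearly do.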

Generally, what I've found is that the linear 36 STR (most of which are slower) and the 49 STR mixed speed marker calculation runs rarely change the positioning of haplogroups.

Most variance relationships between R1b haplogroups work well at 16 or 24 markers on 37 length haplotypes. M222 did flip-flop with Z255 for us on the low marker runs above. The notable exception, however, is that U198 looks quite old (high variance compared to U106 or Z381) with the 37 length haplotypes.  If you ratchet up the U198 analysis to 36 or 49 markers on 67 length haplotypes, everything seems to fit back into place (younger than Z381.)

I just think it is the law of large numbers at work and the value of having more STR "experiments."


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 16, 2012, 06:25:48 PM
R-L23xL51 or call it R-L23* if you wish, is a paragroup.

There may be people that are R-L23xL51 in your sample that are more closely related to R-L51 people than to all of the other R-L23xL51 people in the sample. Make sense? We don't know how many subclades are hidden in R-L23xL51. Their Most Recent Common Ancestor cannot be determined from SNP knowledge, other than to say they have the same Most Recent Common Ancestor as all of R-L23, including all of the L51 (on down to P312, U106) guys.

Ok let’s put it this way, the yet-to-be-discovered SNPs under L23 that aren’t L51 appear to be older in Western Europe than in Eastern Europe. ...

You can't really say that.  There may be some not yet discovered SNPs under L23* that are older than L51, but we really don't know.  There could just be a well "balanced" distribution of four or five major SNPs (A, B, C, D, & E) under L23* that are each younger than L51, and one or two (A & B perhaps) could actually be more closely related to L51 than to C, D and E.

If some L23* not yet discovered SNPs are older in Western Europe versus Eastern Europe I don't think that means STR diversity is not meaningful, which is the topic of this thread.  

Regardless, I don't think we have enough long Western European R-L23* haplotypes to be very conclusive in comparing with the East, do we?...  well that question belongs on another thread.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 16, 2012, 07:21:03 PM
I'm just adding this reply FYI because of the comments from Vincent Vizachero on Rootsweb related to weighting STRs.

.... I've also tried to weight each STR against its maximum variance so that no STR would have more weight than another. That didn't work out so well. I received some crazy results. I think it goes back to using a calendar to measure hours: every now and then even the slowest STRs have fairly quick successive mutations. It's like the calendar page turned on that STR when I'm only trying to measure 10 or 12 hours' worth of time....

Quote from: Rootsweb question
> Is it reasonable to downweight the GDs for specific markers as a function of their mutation rates (analogous to what is done in Ken's Generations spreadsheets)?

Quote from: Vincent Vizachero
If there was a time-efficient way adjust reweight the markers for each pair of haplotypes, I suppose it might be helpful. But the benefit would be far outweighed by the cost of developing the algorithm and computer code to make it happen, I suspect. Theoretically, yes, this would be an improvement but the magnitude would be small.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334616624

Despite the lack of harmony among hobbyist researchers, what Vincent posted on this specific topic is supportive of positions Anatole Klyosov has taken, where he uses the average rate for a set of STRs in his TMRCA calculations rather than applying each rate individually, which would be needed if you wanted to weight STRs.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 16, 2012, 09:40:56 PM
First off, I’m quite curious: how do you get variances that are so close to 1, or even greater than 1? Generally, what I understand as variance is average mutations per marker, or is it per haplotype? Could you show me what the mean mutation rate per marker is for the 36 or 16 best linear duration markers, and if possible the standard deviation? Like I said, the law of large numbers works when the overtly different markers behave as outliers; if one has overtly different markers in terms of mutation rate, it doesn’t matter whether you have 10, 100 or 500 STRs, your estimates are going to be poor. Here is an example of what I am saying:

Say one has a series of 10 STRs with the following mutation rates (x10^-3):

10 7 8 7 8 9 50 20 3 6

The mean value would be 12.8 and the standard deviation about 13.80. This distribution would result in a poor estimate, because any mutation occurring in the marker with the mutation rate of 50 would overestimate the TMRCA by a lot, while any mutation occurring in the marker with the mutation rate of 3 would underestimate the TMRCA somewhat.

Now let’s say we increase our testing set to 37 markers with the following mutation rates (x10^-3):

10 5 7 9 20 50 74 3 4 8 22 34 96 8 5 6 23 87 56 43 54 32 43 89 43 5 6 8 9 7 34 56 47 44 32 51 69

The mean value would be 32.41 and the standard deviation 27.1656 (granted, relative to the mean that is an improvement over the 10 STR distribution). With this distribution, mutations occurring in the markers with mutation rates of 96, 89, 50 or 56 would all overestimate the TMRCA by a lot, while a large portion of the mutations found in loci with mutation rates below 32.41 would lead to an underestimate.

Now say instead one chose a panel with the following set of markers, again mutation rates (x10^-3):

5 8 9 11 7 8 4 5 8 7 9 10

That is 12 markers, with a mean mutation rate of 7.5833 and a standard deviation of 2.1088. This is actually a really good distribution, because even if most mutations were in, say, the marker with the slowest mutation rate, the amount by which the TMRCA would be underestimated would be far smaller than for mutations located in the slow markers (i.e. with mutation rate less than 20) of the 37 marker set.

So as you can see, the law of large numbers can only do so much to harmonize the distribution. If one has a distribution where the numbers differ from one another by orders of magnitude, it is never going to get fixed, no matter how many numbers you keep adding. The other solution would be to add a large number of fast STRs, which might drive the slow STRs into a minority position where they would act as outliers; the more fast STRs one adds, the smaller the effect of the outliers. However, if one tries to measure a TMRCA that is outside the linear range of those STRs, then the margins of error are going to be huge, and the value would be greatly underestimated.
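
The summary figures for the three hypothetical panels can be reproduced with a few lines of Python. This is only a sketch using the standard library; the panel labels and the extra coefficient-of-variation column (standard deviation divided by mean) are my additions, included because the raw standard deviation grows from the 10 marker panel to the 37 marker panel while the spread relative to the mean shrinks.

```python
from statistics import mean, stdev

# Hypothetical mutation rates (x10^-3) from the three example panels above.
PANELS = {
    "10 STRs": [10, 7, 8, 7, 8, 9, 50, 20, 3, 6],
    "37 STRs": [10, 5, 7, 9, 20, 50, 74, 3, 4, 8, 22, 34, 96, 8, 5, 6, 23,
                87, 56, 43, 54, 32, 43, 89, 43, 5, 6, 8, 9, 7, 34, 56, 47,
                44, 32, 51, 69],
    "12 STRs": [5, 8, 9, 11, 7, 8, 4, 5, 8, 7, 9, 10],
}


def summarize(rates):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(rates)
    s = stdev(rates)  # sample standard deviation (n - 1 in the denominator)
    return m, s, s / m


if __name__ == "__main__":
    for label, rates in PANELS.items():
        m, s, cv = summarize(rates)
        print(f"{label}: mean={m:.2f} sd={s:.4f} cv={cv:.2f}")
```

The coefficient of variation is one convenient way to compare how uneven each panel is; it is not something either of us has used elsewhere in this thread.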



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 16, 2012, 09:58:34 PM
Despite the lack of harmony among hobbyist researchers, what Vincent posted on this specific topic is supportive of positions Anatole Klyosov has taken, where he uses the average rate for a set of STRs in his TMRCA calculations rather than applying each rate individually, which would be needed if you wanted to weight STRs.

In a nutshell, this is the current thought process of some of the folks in the hobbyist community:

1-We know that there might be loss of linearity and saturation of mutations in the long run if a haplogroup is older than x kyp.

2-Well it is safe to use it in haplogroup A, because I think haplogroup A isn’t older than x kyp.

3-I got a TMRCA y which is younger than x kyp, so it must be correct.

4-It has been crosschecked using different sets from FTDNA all of them yielded a TMRCA younger than x kyp for haplogroup A, so it must definitely be correct.

5-In the case of Klyosov, he disregards the observed mutation rates in father-son pairs, as he considers those to be statistically insignificant due to small sample sizes. However, this is one of those cases where instead of the theory adjusting to meet the practical results, the practical results are dismissed because they do not agree with the theory.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 12:24:49 AM
There are disagreements about the phylogeny when going back to those "base" SNPs in the "Out of Africa" discussion. Those are also much, much older haplogroups than R-M269.

We have much better knowledge of the SNP phylogeny in the R1b family. My example shows you SNPs within the known phylogeny of the L21 family. There is no circular reasoning in that. I'm not scaling anything to mutation rates.

In the case of the Out of Africa arguments or mutation rates in general, you might want to start another thread for those topics.

I just set this up to talk about STR diversity and variance and its usefulness or lack of usefulness.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 12:40:27 AM
First off, I’m quite curious: how do you get variances that are so close to 1, or even greater than 1? Generally, what I understand as variance is average mutations per marker, or is it per haplotype?

I'm using standard variance of a population based on a sample (subset) of the population.  It is the same as the Excel VAR function.
http://en.wikipedia.org/wiki/Variance

I'm calculating the sum of the variance for the STR markers, which is pretty standard statistics. However, since those results are not easy to comprehend in their absolute form I divide every sum of the variance for every calculation by a standard population's sum of the variance. I'm using P312 = 1.0 as the standard although my data is a little old as far as P312 "all".  All I'm doing is rescaling the results to that base of 1.0. It does not change any of the percentage differences between calculations (haplogroups in the examples.)
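
To make the rescaling concrete, here is a minimal sketch of the calculation as described above: take the sample variance of each STR across the haplotypes in a group, sum those per-marker variances, then divide by the same sum for a reference group. The function names and the tiny three-marker haplotypes are made up purely for illustration; they are not real P312 or L21 data, and this is not the actual spreadsheet.

```python
from statistics import variance


def summed_variance(haplotypes):
    """Sum of per-marker sample variances (same idea as Excel's VAR per column).

    `haplotypes` is a list of equal-length lists of repeat counts,
    one inner list per person, one position per STR marker.
    """
    columns = zip(*haplotypes)  # regroup the values marker by marker
    return sum(variance(column) for column in columns)


def relative_variance(group, reference):
    """Rescale a group's summed variance so the reference group equals 1.0."""
    return summed_variance(group) / summed_variance(reference)


# Made-up haplotypes (3 markers each), for illustration only.
reference_group = [[13, 24, 14], [13, 25, 14], [14, 24, 15], [12, 23, 14]]
younger_group = [[13, 24, 14], [13, 24, 14], [14, 24, 14], [13, 25, 14]]

if __name__ == "__main__":
    print(relative_variance(reference_group, reference_group))
    print(relative_variance(younger_group, reference_group))
```

Because every group is divided by the same reference sum, the ratios between groups are the same before and after rescaling. A group can also come out above 1.0 simply by having more summed variance than the reference sample.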

Could you show me what the mean mutation rate per marker is for the 36 or 16 best linear duration, and if possible the standard deviation?
...
That is 12 markers
...
So as you can see, the law of large numbers can only do so much to harmonize the distribution. If one has a distribution where the numbers differ from one another by orders of magnitude, it is never going to get fixed, no matter how many numbers you keep adding. The other solution would be to add a large number of fast STRs, which might drive the slow STRs into a minority position where they would act as outliers; the more fast STRs one adds, the smaller the effect of the outliers. However, if one tries to measure a TMRCA that is outside the linear range of those STRs, then the margins of error are going to be huge, and the value would be greatly underestimated.

STRs do not all have to be the same speed to be aggregated for calculations. There is nothing to fix as long as the "expected" mutation rate for each STR does not change per group (haplogroup in my example) and as long as you are using the same STRs for each group compared.

I'm not using mutation rates at all. You would only need them if you want to calculate a TMRCA. I'm just calculating the relationships between groups.

I've tried to show you that changing the STRs used, when you use a good sized (25 or more) set of STRs and a good sized group of haplotypes, doesn't change the relationships of the STR diversity between different groups of haplotypes (that are related.)  To me that is the law of large numbers in action.  You can come up with hypothetical problems, but with real data the concepts work.

Ultimately, STR variance really does have a relationship with age.  It can be argued that it is not completely linear or that it is linear for a limited duration, but I've shown you that whether you use the researched "linear" markers (at least according to Heinila) or a combination of STRs, it makes little difference for haplogroups within the age of the R-M269 family.

Companies like FTDNA, a large number of academic scientists and hobbyist-scientists use STR variance, accepting their generally linear relationship with time. It works! Now you can disagree with mutation rates, but I'm not using any, so all you can say about STR variance is you don't trust Y DNA STRs. That's okay, but the scientific community, by and large, is using them.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 09:09:28 AM
Quote from: Jdean
... Would L459 & Z245 cause problems for "Surfing the Wave" with respects to L21 ?   ....

Quote from: Mikewww
How often do SNPs occur?  Isn't it something like one per generation?   I've heard statements like that.  If it is some frequency like that, having 3 SNPs in 3 generations that align is not a big deal.

The issue is just finding all of those SNPs, or at least the ones that don't go extinct. We shouldn't look at SNPs as all that rare.

Ken quoted an estimated figure of 1 SNP per generation on average but I was told this would work out as 1/2 because only 1/2 the Y chromosome is readable or something, got to be honest I got lost in the conversation at that point :)

... If only 1/2 an SNP occurs per generation, then it would take only 6 generations to get 3, but these are just averages anyway.

This is just FYI to catalog this data item.  I've found more on the occurrence of Y DNA SNPs. The following is from Vince Tilroe, an ISOGG representative, on Rootsweb.
Quote from: Vince Tilroe
if the 3x10^-8 SNPs per site per generation approximation holds (implying 0.78 SNPs per generation across the ~ 26,000,000 base-pair coverage of the sequence-able Y-chromosome)
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334603956

He has calculated a rate of 0.78 SNPs per father-son transmission, so about 3/4.  I don't know how much of the Y chromosome can now be scanned by testing. I would think it's mostly just a matter of cost.
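
The arithmetic behind the 0.78 figure is just the per-site rate multiplied by the number of base pairs covered. A small sketch, using only the two numbers quoted above; the function names are mine, and the half-coverage line is simply an illustration of how the figure scales, not a claim about any particular test:

```python
def snps_per_generation(rate_per_site, sites_covered):
    """Expected new SNPs per father-son transmission over the covered region."""
    return rate_per_site * sites_covered


def generations_for(n_snps, per_generation):
    """Average number of generations needed to accumulate n_snps."""
    return n_snps / per_generation


RATE_PER_SITE = 3e-8          # SNPs per site per generation, as quoted
SITES_COVERED = 26_000_000    # approximate sequence-able base pairs, as quoted

if __name__ == "__main__":
    full = snps_per_generation(RATE_PER_SITE, SITES_COVERED)
    half = snps_per_generation(RATE_PER_SITE, SITES_COVERED / 2)
    print(f"full coverage: {full:.2f} SNPs/generation")
    print(f"half coverage: {half:.2f} SNPs/generation")
    print(f"generations for 3 SNPs at full coverage: {generations_for(3, full):.1f}")
```

So at the quoted rate, three aligned SNPs would take a little under four generations on average, and the expected count scales directly with how much of the chromosome a test actually reads. These are averages only.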

I asked Vince where he got his SNPs per site per generation data and he cited this study.
"Human Y Chromosome Base-Substitution Mutation Rate Measured by Direct Sequencing in a Deep-Rooting Pedigree" by Xue, 2009. http://download.cell.com/current-biology/pdf/PIIS0960982209014547.pdf?intermediate=true


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 17, 2012, 10:09:55 AM
STRs do not all have to be the same speed to be aggregated for calculations. There is nothing to fix as long as the "expected" mutation rate for each STR does not change per group (haplogroup in my example) and as long as you are using the same STRs for each group compared.

I'm not using mutation rates at all. You would only need them if you want to calculate a TMRCA. I'm just calculating the relationships between groups.

I've tried to show you that changing the STRs used, when you use a good sized (25 or more) set of STRs and a good sized group of haplotypes, doesn't change the relationships of the STR diversity between different groups of haplotypes (that are related.)  To me that is the law of large numbers in action.  You can come up with hypothetical problems, but with real data the concepts work.

Ultimately, STR variance really does have a relationship with age.  It can be argued that it is not completely linear or that it is linear for a limited duration, but I've shown you that whether you use the researched "linear" markers (at least according to Heinila) or a combination of STRs, it makes little difference for haplogroups within the age of the R-M269 family.
 

We are going in circles; it seems to me like this is a never-ending discussion. OK, let’s reiterate some points:

1-You said that you are not using mutation rates, because you are not interested in calculating the TMRCA. Well, I’m not referring to you specifically, but to the bunch of other people who do use mutation rates and mixed (mutation-wise) STR samples to calculate TMRCA. After all, you mentioned not too long ago that R1b-M269 was between 4-8 kybp following estimates by Heinila, Klyosov, etc. Moreover, you accept that there is a relationship with age, so while you say that you do not want to get into the details of TMRCA calculation, you would still take the mixed STR sets as indicative of age.

2-I was providing hypothetical examples, just to show the public how the law of large numbers works. The other thing you seem to forget is that there is a direct (nonlinear) relationship between mutation rates and the number of mutations observed in any locus. So when using a large sample of mixed (mutation-wise) STRs, there is a very real possibility of one sample displaying a relatively higher variance than a different sample while having that variance mostly accumulated in fast-mutating STR markers. Like I said, a higher variance in the slower, more stable loci is a better indicator than the overall variance. What I’m talking about is not just calculating the variance of a haplogroup using 37, 67, or 25 STRs. I’m talking about comparing the variance of Haplogroup Y in population X to the variance of the same haplogroup in population Z.
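
To illustrate the point about breaking the variance down per locus, here is a small sketch with entirely made-up data. The locus names, the fast/slow labels and both sets of haplotypes are hypothetical; nothing here comes from Myres et al. or from FTDNA projects. Population X has the larger overall variance, but all of it sits in the two loci labelled fast, while population Z carries more variance in the two loci labelled slow.

```python
from statistics import variance

# Hypothetical marker panel: two fast and two slow loci (labels are made up).
LOCI = ["fast1", "fast2", "slow1", "slow2"]
SLOW_LOCI = {"slow1", "slow2"}

# Made-up haplotypes (repeat counts per locus) for the same haplogroup
# sampled in two populations.
POP_X = [[10, 20, 12, 15], [12, 22, 12, 15], [14, 18, 12, 15], [12, 20, 12, 15]]
POP_Z = [[12, 20, 11, 15], [12, 21, 12, 14], [13, 20, 13, 15], [12, 20, 12, 16]]


def per_locus_variance(haplotypes, loci=LOCI):
    """Sample variance of the repeat count at each locus."""
    return {locus: variance(col) for locus, col in zip(loci, zip(*haplotypes))}


def slow_fast_split(haplotypes, slow_loci=SLOW_LOCI):
    """Return (variance summed over slow loci, variance summed over fast loci)."""
    by_locus = per_locus_variance(haplotypes)
    slow = sum(v for locus, v in by_locus.items() if locus in slow_loci)
    fast = sum(v for locus, v in by_locus.items() if locus not in slow_loci)
    return slow, fast


if __name__ == "__main__":
    for name, pop in (("X", POP_X), ("Z", POP_Z)):
        slow, fast = slow_fast_split(pop)
        print(f"population {name}: slow={slow:.2f} fast={fast:.2f} total={slow + fast:.2f}")
```

With real data one would need an agreed list of which loci count as slow, which is exactly where the mutation rate disagreements come back in.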


Companies like FTDNA, a large number of academic scientists and hobbyist-scientists use STR variance, accepting their generally linear relationship with time. It works! Now you can disagree with mutation rates, but I'm not using any, so all you can say about STR variance is you don't trust Y DNA STRs. That's okay, but the scientific community, by and large, is using them.

Well, that’s quite an oversimplification there.  What makes you think that I don’t trust variance in loci? Have I ever specifically said that STR variance is useless? I said I disagree with the concept of accepting the linear relationship of certain STRs with time. I said I think loci should be carefully selected in terms of purpose of the test, mutation rate, etc. There is quite some difference between saying “this car ought to be fixed” and saying “cars don’t work at all”.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 17, 2012, 01:34:08 PM
Yes I noticed 0.78 stat mentioned on rootsweb as well.

Thanks for the link, I tried ploughing through it and only had to contend with three interruptions (one about 1 1/2 hours long) whilst trying to fathom it, just as well we have Vince at hand for the interpretation :)

Would I be right in guessing that the mutation rate they came up with needs to be multiplied out depending on what length of DNA is being investigated? It seemed very small.

Presumably in order to make more use of this information we also need to know what % of the Y chromosome is being covered by 1000 Genome. I know it's a lot bigger than WTY but presumably it's still a fraction, but how big ?


I was looking the other day at the spreadsheet detailing kits in the 1000 Genome project with singleton mutations, which I found a bit odd. There was no mention of U152 or P312 (which presumably was to do with lack of data on the part of the compiler ?) but the nos. being reported for L21 were much lower than that of the other haplogroups.

https://docs.google.com/spreadsheet/ccc?key=0Au_yP14v4kTIdDZUS0h4M1hzckRYUTM0ME1ndFJUT3c&hl=en_US#gid=0

Either way, if the SNP mutation rate stat is accurate it would appear 1000 Genome is still quite a long way off whole genome analysis.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 05:06:45 PM
1-You said that you are not using mutation rates, because you are not interested in calculating the TMRCA. Well, I’m not referring to you specifically, but to the bunch of other people who do use mutation rates and mixed (mutation-wise) STR samples to calculate TMRCA. After all, you mentioned not too long ago that R1b-M269 was between 4-8 kybp following estimates by Heinila, Klyosov, etc. Moreover, you accept that there is a relationship with age, so while you say that you do not want to get into the details of TMRCA calculation, you would still take the mixed STR sets as indicative of age.

I think there is a general linear relationship between the variance of non-multicopy/non-null STRs and the number of generations (which implies time) to the initial time of expansion for a related group of people (that have a common ancestor.)

I'm interested in TMRCAs as well but I think that is a more complex topic and there is definitely a disagreement in the academic community and to some degree in the hobbyist community about whether to use evolutionary rates or germ-line rates.  The hobbyist community, at least the scientists in it, seem heavily inclined towards germ-line rates, but I don't want to try to argue that as there is a general stalemate in those arguments.

The way I look at it, I'll just calculate the variance of one haplogroup relative to another and then you tell me what mutation rates you want to use and we'll slide the whole scale (when multiplying to get years) whichever direction you want for the discussion.
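
Here is a rough sketch of what "sliding the scale" means. It assumes the simplest linear model, where average variance per marker is roughly the average mutation rate times the number of generations; that model, the two variances, the two rates and the 30 years per generation are all hypothetical placeholders, not figures anyone in this thread has endorsed. The point is only that swapping the rate moves every age by the same factor and leaves the ratio between the two haplogroups untouched.

```python
def age_in_years(variance_per_marker, rate_per_generation, years_per_generation=30):
    """Linear-model age estimate: variance / rate gives generations.

    Assumes variance grows linearly with time, which is exactly the
    assumption under debate in this thread.
    """
    generations = variance_per_marker / rate_per_generation
    return generations * years_per_generation


# Hypothetical average per-marker variances for two haplogroups.
VARIANCE = {"older_group": 0.30, "younger_group": 0.15}

# Two hypothetical average mutation rates per marker per generation.
RATES = {"rate_1": 0.002, "rate_2": 0.0007}

if __name__ == "__main__":
    for rate_name, rate in RATES.items():
        older = age_in_years(VARIANCE["older_group"], rate)
        younger = age_in_years(VARIANCE["younger_group"], rate)
        print(f"{rate_name}: older={older:.0f} y, younger={younger:.0f} y, "
              f"ratio={older / younger:.2f}")
```

Under this model the relative positions of haplogroups do not depend on which rate camp you are in; only the absolute dates do.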


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 05:27:05 PM
2-I was providing hypothetical examples, just to show the public how the law of large numbers works. The other thing you seem to forget is that there is a direct (nonlinear) relationship between mutation rates and the number of mutations observed in any locus. So when using a large sample of mixed (mutation-wise) STRs, there is a very real possibility of one sample displaying a relatively higher variance than a different sample while having that variance mostly accumulated in fast-mutating STR markers. Like I said, a higher variance in the slower, more stable loci is a better indicator than the overall variance. What I’m talking about is not just calculating the variance of a haplogroup using 37, 67, or 25 STRs. I’m talking about comparing the variance of Haplogroup Y in population X to the variance of the same haplogroup in population Z.

Let me say that anything is possible in a particular situation; however, my experience backs up what folks like Ken Nordtvedt say and do. They use mixed speed markers to calculate variance and TMRCAs.  The explanation from Ken is that there are greater accuracy benefits from including more STRs than there are from reducing the number of STR "experiments."

You are talking about comparing the same haplogroup in two different populations.  This would be similar to a geographic comparison.  I think this can be done for analysis but there will obviously be greater risk, since the haplogroup in the first geography may be sourced from two or three different geographies while the haplogroup in the second geography may be from a single common ancestor.  I don't think this invalidates STR variance or diversity; it just means that the application has additional challenges complicating the interpretation of the results.

Companies like FTDNA, a large number of academic scientists and hobbyist-scientists use STR variance, accepting their generally linear relationship with time. It works! Now you can disagree with mutation rates, but I'm not using any, so all you can say about STR variance is you don't trust Y DNA STRs. That's okay, but the scientific community, by and large, is using them.

Quote from: JeanL
Well, that’s quite an oversimplification there.  What makes you think that I don’t trust variance in loci? Have I ever said anything where I have specifically said that STR variance is useless? I said I disagree with the concept of accepting the linear relationship of certain STRs with time. I said I think loci should be carefully selected in terms of purpose of the test, mutation rate, etc. There is quite some difference between saying: “this car ought to be fixed, and cars don’t work at all”.

I'm all for using STRs smartly too.  For instance, per Ken Nordtvedt and Vince Vizachero, I throw out multi-copy STRs. I also throw out STRs that have null values.  I am also saying that I've tried using just highly linear STRs as well as mixed speed STRs and the results are about the same as long as you get the number of STRs up into a good high range. Mixed STR groups also seem to be a little more precise than when faster markers are thrown out.  Ken says he has demonstrated this in simulations.  What I have observed is neutral to slightly positive on what Ken says.
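
The question of where the variance accumulates can be checked by splitting it by marker speed. The following is a sketch under stated assumptions: the two populations, the four per-marker rates, and the 0.002 threshold separating "slow" from "fast" are all hypothetical, chosen only so that the overall figure and the slow-marker figure disagree.

```python
from statistics import pvariance

def mean(values):
    return sum(values) / len(values) if values else 0.0

def variance_by_speed(haplotypes, rates, slow_threshold=0.002):
    """Return overall, slow-marker and fast-marker mean variance."""
    per_marker = [pvariance(counts) for counts in zip(*haplotypes)]
    slow = [v for v, r in zip(per_marker, rates) if r < slow_threshold]
    fast = [v for v, r in zip(per_marker, rates) if r >= slow_threshold]
    return {"overall": mean(per_marker), "slow": mean(slow), "fast": mean(fast)}

# Hypothetical per-generation rates: two slow markers, two fast markers.
rates = [0.0005, 0.001, 0.004, 0.008]

# Hypothetical population A: all variation sits in the fast markers.
pop_a = [(12, 14, 20, 30), (12, 14, 22, 32), (12, 14, 18, 28), (12, 14, 20, 30)]
# Hypothetical population B: less variation overall, but more in the slow markers.
pop_b = [(12, 14, 20, 30), (13, 15, 20, 30), (11, 13, 21, 30), (12, 14, 20, 30)]

result_a = variance_by_speed(pop_a, rates)
result_b = variance_by_speed(pop_b, rates)
print("A:", result_a)
print("B:", result_b)
```

In this constructed case A has the higher overall variance while B has the higher slow-marker variance, so the two summaries rank the populations in opposite order. Whether real data behave this way is exactly what is under dispute in this thread.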


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 17, 2012, 05:42:59 PM
I've actually played with "microsatellite choice" in the past, because of concern about your point.  I ran through the R-L21 file of long haplotypes and tried 12, 25, 37, 67 length haplotypes and, after throwing out the multicopy and null STRs, I would run variance calculations adding or subtracting an STR or two.   What I found was the variance relationships between the subclades of L21 were fairly stable once you start using more than 15-20 STRs.

Generally, I find very little jostling of the relationships in R1b subclades when you start using 25 or so markers and get up to about 30 haplotypes.

Here is a "test" run for you on R-L21's major subclades based on different sets of markers.

I copied the data from that reply and I'll add some comparison with Z196 and U152 also.

Relative variance with the 49 mixed speed, non-multicopy, non-null STRs from FTDNA's 1st 67:
L21__________:  Var=0.99 (N=2590)
DF21_________:  Var=0.80 (N=116)
L513_________:  Var=0.75 (N=157)
Z253_________:  Var=0.61 (N=145)
M222_________:  Var=0.49 (N=540)
Z255_________:  Var=0.39 (N=102)

Relative variance with the 36 best* linear duration, non-multicopy, non-null STRs from FTDNA's 1st 67:
L21__________:  Var=1.02 (N=2590)
DF21_________:  Var=0.73 (N=116)
L513_________:  Var=0.64 (N=157)
Z253_________:  Var=0.60 (N=145)
M222_________:  Var=0.45 (N=540)
Z255_________:  Var=0.35 (N=102)

* Linear durations greater than 7000 years according to Marko Heinila's analysis.


Immediately below are the big subclades of the Z196 family along with M153, the "Basque" diagnostic marker.

Relative variance with the 49 mixed speed, non-multicopy, non-null STRs
Z196_________:  Var=1.00 (N=285)   
Z196-1418(NS):  Var=0.92 (N=97)   
SRY2627______:  Var=0.83 (N=151)   
M153_________:  Var=0.31 (N=7)

Relative variance with the 36 linear duration, non-multicopy, non-null STRs
Z196_________:  Var=1.02 (N=285)
Z196-1418(NS):  Var=0.86 (N=97)
SRY2627______:  Var=0.76 (N=151)
M153_________:  Var=0.21 (N=7)

My M153 data is very limited. I also included everyone with 67 markers in the 1418 North-South cluster which encompasses M153.  This cluster may be marked by Z209 and is quite old.

Below is U152 and its large subclades.  I think it is still the oldest subclade of P312 but DF27 may some day challenge it.

Relative variance with the 49 mixed speed, non-multicopy, non-null STRs
U152________:  Var=1.07 (N=806)
L2__________:  Var=1.02 (N=287)
Z56_________:  Var=0.97 (N=32)   
Z36_________:  Var=0.92 (N=34)   

Relative variance with the 36 linear duration, non-multicopy, non-null STRs
U152________:  Var=0.97 (N=520)
L2__________:  Var=0.94 (N=287)
Z36_________:  Var=0.89 (N=34)   
Z56_________:  Var=0.87 (N=32)   


Z56 and Z36 did flip-flop on me.  I wouldn't interpret this as significant.  They are just both about the same age.

Generally, I observe there isn't much difference in the relationships if you de-select markers that Heinila calculates don't have as high a confidence of being linear for >7k years.
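
One way to check "not much difference in the relationships" is to look for pairs of subclades whose order flips between the two marker sets. The sketch below uses the relative variance figures posted above for the subclades; the function name `rank_flips` is just an illustrative helper, and the check says nothing about whether a flip is statistically significant.

```python
from itertools import combinations

def rank_flips(var_a, var_b):
    """List pairs of clades whose relative order differs between two marker sets."""
    flips = []
    for x, y in combinations(var_a, 2):
        if (var_a[x] - var_a[y]) * (var_b[x] - var_b[y]) < 0:
            flips.append((x, y))
    return flips

# Relative variances as posted: 49 mixed speed markers vs 36 "linear" markers.
l21_mixed = {"DF21": 0.80, "L513": 0.75, "Z253": 0.61, "M222": 0.49, "Z255": 0.39}
l21_linear = {"DF21": 0.73, "L513": 0.64, "Z253": 0.60, "M222": 0.45, "Z255": 0.35}

u152_mixed = {"L2": 1.02, "Z56": 0.97, "Z36": 0.92}
u152_linear = {"L2": 0.94, "Z56": 0.87, "Z36": 0.89}

l21_flips = rank_flips(l21_mixed, l21_linear)
u152_flips = rank_flips(u152_mixed, u152_linear)
print("L21 subclade flips:", l21_flips)
print("U152 subclade flips:", u152_flips)
```

On these figures the L21 subclades keep the same order under both marker sets, and the only flip among the U152 subclades is the Z56/Z36 pair already mentioned.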

umm....   I still think U152, L2, Z36, Z56, L21, DF23, Z196 (and I guess now DF27 and Z209) all must have expanded fairly rapidly.  Not everyone goes for this, but I think if we were doing a family surname project about 400-500 years after P312 started expanding, we'd include them all in the same cluster.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 17, 2012, 08:27:19 PM

I think there is a general linear relationship between the variance of non-multicopy/non-null STRs and the number of generations (which infers time) to the initial time of expansion for a related group of people (that have a common ancestor.)

I'm interested in TMRCAs as well but I think that is a more complex topic and there is definitely a disagreement in the academic community and to some degree in the hobbyist community about whether to use evolutionary rates or germ-line rates.  The hobbyist community, at least the scientists in it, seem heavily inclined towards germ-line rates, but I don't want to try to argue that as there is a general stale-mate in those arguments.

The way I look at it, I'll just calculate the variance of one haplogroup relative to another and then you tell me what mutation rates you want to use and we'll slide the whole scale (when multiplying to get years) whichever direction you want for the discussion.

I think our main disagreement comes in the part I highlighted. You think there is a linear relationship between TMRCA and variance regardless of the STRs used, as long as they aren’t multicopy/null STRs. I don’t think so, and I already explained my reasons. So I propose we wait and see what comes up, hopefully we’ll get some good aDNA studies soon. I think the samples from SJAPL and Longar all dating to the 4500-5000 ybp in the fringe of the Basque Country would probably be tested for Y-DNA soon; I mean, they were already tested for lactose tolerance, so it seems Y-DNA will follow.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 19, 2012, 05:22:13 PM
I think there is a general linear relationship between the variance of non-multicopy/non-null STRs and the number of generations (which infers time) to the initial time of expansion for a related group of people (that have a common ancestor.)
I think our main disagreement comes in the part I highlighted. You think there is a linear relationship between TMRCA and variance regardless of the STRs used, as long as they aren’t multicopy/null STRs. I don’t think so, and I already explained my reasons. So I propose we wait and see what comes up, hopefully we’ll get some good aDNA studies soon. I think the samples from SJAPL and Longar all dating to the 4500-5000 ybp in the fringe of the Basque Country would probably be tested for Y-DNA soon; I mean, they were already tested for lactose tolerance, so it seems Y-DNA will follow.

Please be cautious in reading my thoughts which is what you did on the sentence I highlighted (emboldened.)

I went up to my original statement above and underlined the word "general" which you omitted when paraphrasing me.  I'm NOT asserting that every non-multicopy non-null STR has a strict linear relationship with number of generations, which is related to time.  I think that, generally speaking, the STRs that FTDNA tests for in the first 111 markers (less the multicopy/null) have a general relationship with number of generations. In aggregate, statistical use of these markers can provide improved precision to the relationship with time.

Think about it...  FTDNA, and science in general, probably have tested a broader range of STRs.  The only reasons they would select these STRs is because they have some value in measuring the "closeness" in relationship, which is a function of generations.

Some STRs are probably better than others for different timeframes, but the problem is we don't know which are better and which are worse.  Only Heinila has really attempted any kind of thorough analysis that I can find.  The best way to address this problem is with large numbers and statistics. This is why I value it when folks like Ken Nordtvedt say they run simulations and the benefits of including more STRs, rather than less, outweigh the negatives.

You can wait for more ancient DNA, but I'm not. Here's why - We do NOT have adequate data (long haplotypes and SNPs) across the board to do the proper cross-sectionally representative random sampling.  We don't have this with tens of thousands of haplotypes of modern people. How long do you think it'll be before we have that much ancient DNA?  I'm NOT saying ancient DNA is useless. It is very valuable, but it's just another piece of data.

Do you think FTDNA should drop their Tip TMRCA calculator?  Should the academics, i.e. Busby, Balaresque, Myres, etc. quit using STR diversity to estimate time?

I agree that improvements are needed.

                


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 19, 2012, 08:00:47 PM
I went up to my original statement above and underlined the word "general" which you omitted when paraphrasing me.  I'm NOT asserting that every non-multicopy non-null STR has a strict linear relationship with number of generations, which is related to time.  I think that, generally speaking, the STRs that FTDNA tests for in the first 111 markers (less the multicopy/null) have a general relationship with number of generations. In aggregate, statistical use of these markers can provide improved precision to the relationship with time.

Think about it...  FTDNA, and science in general, probably have tested a broader range of STRs.  The only reasons they would select these STRs is because they have some value in measuring the "closeness" in relationship, which is a function of generations.

Like I said before, we can go on forever. A large number of markers is good for determining close relationships, i.e. if one gets a match on a 12 STR set it could mean common ancestry anywhere from a recent ancestor to a very distant one. A match on the 37, 67 or 111 STR set is another story: if two people's haplotypes are only 2 mutations apart on the 111 STR set, one could definitely estimate the time of common ancestry for those two people. That works on an individual level; what I’m talking about is population genetics, where that doesn’t work. The more I think about it, the more I realize how things like the choice of microsatellite in age estimates are being overlooked in population studies.

Some STRs are probably better than others for different timeframes, but the problem is we don't know which are better and which are worse.  Only Heinila has really attempted any kind of thorough analysis that I can find.  The best way to address this problem is with large numbers and statistics. This is why I value it when folks like Ken Nordtvedt say they run simulations and the benefits of including more STRs, rather than less, outweigh the negatives.

The best way to address that problem is to actually do an experiment using empirically measured mutation rates, and see if microsatellite choice has an effect on age estimates. However, to see any considerable effects one must look at large time frames.  I doubt the loss of linearity would have any effect on folks that share common ancestry in the last 1000 years; but when we are working to determine the age of haplogroups that could presumably be older than 5000 ybp, then it is definitely a big issue. This is something that large numbers cannot fix, as what we see is that certain STRs are only good to measure certain time spans. So to try to measure TMRCAs that are older than 5000 ybp with STRs that lose their linearity in less than 5000 years would be like measuring a mile with a 6 inch ruler. Yes, I applaud Ken Nordtvedt for taking the time to run the simulations. Now is there any practical example out there where the simulations could be tested? I mean Dr. Nordtvedt might have run some simulations, and perhaps he found that using 37 STRs instead of 12 STRs was a better predictor of TMRCA for, say, a set of people that descend from a guy who lived in 1700. That doesn’t mean that one could extrapolate those results and use them in a time span of 5000+ ybp.
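
The "loss of linearity" argument can be illustrated with a toy model. This sketch assumes a symmetric single-step mutation model in which alleles cannot leave a fixed range around the founder value; the range of +/- 3 repeats, the two mutation rates, and the 200 generations are hypothetical. It computes the exact expected squared distance from the founder allele rather than simulating random draws.

```python
def allele_distribution(rate, generations, half_range):
    """Exact distribution of the offset from the founder allele.

    Symmetric single-step model; a step that would leave the allowed
    range [-half_range, +half_range] simply does not happen.
    """
    size = 2 * half_range + 1
    dist = [0.0] * size
    dist[half_range] = 1.0                      # everyone starts at the founder allele
    for _ in range(generations):
        new = [0.0] * size
        for i, p in enumerate(dist):
            new[i] += p * (1 - rate)            # no mutation this generation
            for step in (-1, 1):
                j = i + step
                if 0 <= j < size:
                    new[j] += p * rate / 2      # one step up or down
                else:
                    new[i] += p * rate / 2      # blocked at the range limit
        dist = new
    return dist

def mean_squared_offset(dist, half_range):
    return sum(p * (i - half_range) ** 2 for i, p in enumerate(dist))

GENERATIONS = 200        # hypothetical time depth
HALF_RANGE = 3           # hypothetical allele range limit

def linearity_ratio(rate):
    """Observed variance divided by the linear prediction rate * generations."""
    dist = allele_distribution(rate, GENERATIONS, HALF_RANGE)
    return mean_squared_offset(dist, HALF_RANGE) / (rate * GENERATIONS)

slow_ratio = linearity_ratio(0.0005)   # hypothetical slow marker
fast_ratio = linearity_ratio(0.01)     # hypothetical fast marker
print(f"slow marker: {slow_ratio:.4f}, fast marker: {fast_ratio:.4f}")
```

A ratio of 1 means the marker is still on the straight line. In this toy model the slow marker stays essentially linear while the fast marker falls short, but how strongly real STRs are range-limited is the open question, so the numbers only show the shape of the effect.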


You can wait for more ancient DNA, but I'm not. Here's why - We do NOT have adequate data (long haplotypes and SNPs) across the board to do the proper cross-sectionally representative random sampling.  We don't have this with tens of thousands of haplotypes of modern people. How long do you think it'll be before we have that much ancient DNA?  I'm NOT saying ancient DNA is useless. It is very valuable, but it's just another piece of data.

Do you think FTDNA should drop their Tip TMRCA calculator?  Should the academics, i.e. Busby, Balaresque, Myres, etc. quit using STR diversity to estimate time?

I agree that improvements are needed.

The only thing I would say is that we should start exploring the effects of microsatellite choice in age estimates, and that we shouldn’t neglect the effects of loss of linearity, saturation, and non-constant mutation rates. I think Busby et al. (2012) already noticed that, so I don’t think his team should quit using STR diversity; on the contrary, they should work on it even more, to explore the effects using large STR sets.

               


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 01:22:24 PM
....  Anyway the problem will be resolved by the aDNA and it all turns around the age of these haplogroups. I remind you all that many markers have the same value in hg. Q, and the separation happened at least 20,000 years ago.

Remember:
1)   mutations around the modal
2)   convergence to the modal as time passes
3)   sometimes a value goes for the tangent

I've never understood the relevance of your three points other than that they are your objections.  The key is: are they well-founded objections or baseless?

Do you have any statistical analysis that demonstrates the value of your objections?  or are these just concerns that you feel?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 22, 2012, 01:28:37 PM
I'll begin my comments with a simple presentation of the observed properties of Y STRs and datasets.  Hopefully these comments will be reviewed and commented on to the point where we can all agree on the fundamental properties of the process we are trying to model.

First a definition from Schaums outline on random processes:  A random process is the mathematical model of an empirical process whose development is governed by probability laws.

Observation #1:  A probability space for the STR mutational process can be created by taking a set of, say 69,  Y STR's, summing the mutational rates to determine the probability of a mutation and then defining the P (no mutations) = 1 - Sum.  Given that we also observe a dynamic range exceeding 100 for the mutational rates, we conclude that the Y STR is not a random process with equally likely events.

Observation #2: In data sets Y STR single step mutations are most probable.  Multi-step occur some 5% to 10% of the time with step changes up to 4 or more.

Observation #3: For haplogroup R1b it is observed that Y STR frequency distributions over the first 69 FtDNA dys loci (as presented at www.freepages.genealogy.rootsweb.ancestry.com) include 95% of the entry values in only 3 values, modal and +/- 1 from modal, for some 59 of the 69 entries.  This suggests that if a mutation to a non-modal value has occurred, then the most probable next mutation is back to the modal.  This also suggests that as time accumulates many hidden mutations may have occurred.

Observation #4:  Most data sets, especially family sets, are highly correlated.  Many entries have apparent mutations that are all derived from a single ancestor.  (See Kerchner's family analysis and his definition of "unique mutational events").  This effect will cause an overestimation of diversity.

Observation #5:  Within a data set all descending from a common ancestor such as the Ian Cam of Clan Gregor, there is a wide range in the number of mutations observed.  In the case of Clan Gregor from 0 to 7 within the data set.  (Note the apparent direct descendant of the founder appears to have not had a mutation in his family line for almost 700 years).  There appears to be some correlation as to when the line separated and age but as shown above it's not all that direct.

A parting comment re: technique.  I use the equation developed by Stumpf and Goldstein , 2 March 2001, SCIENCE, "Genealogical and Evolutionary Inference with the Human Y Chromosome", p. 1740.  I only count mutations, I do not use ASD/variance.  In their analysis, they point out that you calculate ASD for one locus and then average over many to increase the accuracy of the observation.  This implies that using Y STR's with the same or similar mutation rate will improve precision and increase accuracy.
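
To make the two techniques concrete, here is a sketch of a generic ASD estimate (computed per locus, then averaged) next to a plain mutation count. This is not claimed to be the exact equation from the Stumpf and Goldstein paper; it assumes a known founder haplotype and a star-like genealogy, and the haplotypes and per-locus rates are hypothetical.

```python
def asd_per_locus(haplotypes, founder):
    """Average squared distance from the founder allele, one value per locus."""
    n = len(haplotypes)
    return [
        sum((h[i] - founder[i]) ** 2 for h in haplotypes) / n
        for i in range(len(founder))
    ]

def generations_from_asd(haplotypes, founder, rates):
    """Estimate generations at each locus as ASD / rate, then average over loci."""
    per_locus = [a / r for a, r in zip(asd_per_locus(haplotypes, founder), rates)]
    return sum(per_locus) / len(per_locus)

def generations_from_counts(haplotypes, founder, rates):
    """Count repeat-step differences from the founder; divide by lines * total rate."""
    steps = sum(
        abs(h[i] - founder[i]) for h in haplotypes for i in range(len(founder))
    )
    return steps / (len(haplotypes) * sum(rates))

# Hypothetical founder, descendants and per-generation rates.
founder = (13, 24, 14)
descendants = [(13, 24, 14), (14, 24, 14), (13, 25, 14), (13, 24, 14), (13, 23, 15)]
rates = [0.002, 0.004, 0.002]

asd_estimate = generations_from_asd(descendants, founder, rates)
count_estimate = generations_from_counts(descendants, founder, rates)
print(f"ASD: {asd_estimate:.1f} generations, counting: {count_estimate:.1f} generations")
```

With only single-step differences, as in this made-up data, the two estimates agree. They start to diverge once multi-step differences appear, because ASD squares the distance while counting does not.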




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 08:17:59 PM
... Observation #1:  A probability space for the STR mutational process can be created by taking a set of, say 69,  Y STR's, summing the mutational rates to determine the probability of a mutation and then defining the P (no mutations) = 1 - Sum.  Given that we also observe a dynamic range exceeding 100 for the mutational rates, we conclude that the Y STR is not a random process with equally likely events....

You are mixing your observations with your conclusions. 

Are you saying that it is required for STR mutations to be perfectly random for them to be useful?  What in life is perfect? I can think of only one.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 08:25:59 PM
Observation #2: In data sets Y STR single step mutations are most probable.  Multi-step occur some 5% to 10% of the time with step changes up to 4 or more....

That seems very plausible, but I would think different STRs would have different properties. I don't know.  Do you have a study that has determined the distribution of single step and multi-step mutations by STR or on average?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 08:36:41 PM
Observation #3: For haplogroup R1b it is observed that Y STR frequency distributions over the first 69 FtDNA dys loci (as presented at www.freepages.genealogy.rootsweb.ancestry.com) include 95% of the entry values in only 3 values, modal and +/- 1 from modal, for some 59 of the 69 entries.

Why choose R1b?   Why not R1b-L21 or R1b-U106 or R1b-L226. Each has different modals, sometimes dramatically.   For that matter, why not use R1?

Are you saying that Y STRs have different expected properties depending on  the haplogroup?   Y SNPs don't generally or necessarily have any biological connection to Y STRs.  I've asked this before (on this thread even) but I have no reason to think that the expected property of one Y STR is different by haplogroup. We are all homo sapiens sapiens and are much more alike than different.

This suggests that if a mutation to a non-modal value has occurred , then the most probable next mutation is back to the modal.

Why? We use modal haplotypes as proxies for ancestral haplotypes, but the mode is really just a statistical concept.  Naturally, we might expect mutations within STRs that are primarily single-step focused would revolve around the mode. That's essentially circular reasoning or perhaps I should say self-defining.

This also suggests that as time accumulates many hidden mutations may have occurred.

Why? I agree there are back-mutations, but so what?  John Chandler has told us that mutation rate calculations and variance account for back mutations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 08:57:45 PM
... Observation #4:  Most data sets, especially family sets, are highly correlated.  Many entries have apparent mutations that are all derived from a single ancestor.  (See Kerchner's family analysis and his definition of "unique mutational events").  This effect will cause an overestimation of diversity.

I agree that our DNA project data is not representative. I think most academic studies try to guard against this but I don't know if they are doing a good job.

I think you mean this will cause an underestimation of diversity, right?  Fortunately, interclade calculations effectively eliminate this concern.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 22, 2012, 09:02:46 PM
Observation #5:  Within a data set all descending from a common ancestor such as the Ian Cam of Clan Gregor, there is a wide range in the number of mutations observed.  In the case of Clan Gregor from 0 to 7 within the data set.  (Note the apparent direct descendant of the founder appears to have not had a mutation in his family line for almost 700 years).  There appears to be some correlation as to when the line separated and age but as shown above it's not all that direct.

You are assuming in the data set you are citing that everyone has a common ancestor.  Maybe so, but how do we know?  How long ago did the Most Recent Common Ancestor live?  It sounds like at least 700 years. That's a long time for genealogical records and zero NPE's to happen.  I don't know  the situation for this group, but since you bring it up as point of evidence, how do we know if this group has the common ancestor it is thought to have?  This is not defined by a series of SNPs, is it?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 06:34:21 AM
... Observation #1:  A probability space for the STR mutational process can be created by taking a set of, say 69,  Y STR's, summing the mutational rates to determine the probability of a mutation and then defining the P (no mutations) = 1 - Sum.  Given that we also observe a dynamic range exceeding 100 for the mutational rates, we conclude that the Y STR is not a random process with equally likely events....

You are mixing your observations with your conclusions. 

Are you saying that it is required for STR mutations to be perfectly random for them to be useful?  What in life is perfect? I can think of only one.


  These observations were, hopefully, meant to be rhetorical.  I was hoping that we could discuss whether these observations are generally true and then try to determine the best method for modelling the mutational process.  I should have also pointed out that the mutational process appears to be a linear, independent process, which suggests that probabilities of events can be multiplied.
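
If the loci mutate independently, so that probabilities can be multiplied, the "1 - Sum" form in Observation #1 is the first-order approximation of a product. A short sketch, using a hypothetical set of 69 per-generation rates that spans a 100x dynamic range:

```python
from math import prod

def p_no_mutation_sum(rates):
    """First-order approximation: 1 minus the summed mutation rates."""
    return 1 - sum(rates)

def p_no_mutation_product(rates):
    """Independent loci: multiply the per-locus probabilities of no mutation."""
    return prod(1 - r for r in rates)

# Hypothetical rates for 69 markers: 30 slow, 30 medium, 9 fast.
rates = [0.0002] * 30 + [0.002] * 30 + [0.02] * 9

dynamic_range = max(rates) / min(rates)
approx = p_no_mutation_sum(rates)
exact = p_no_mutation_product(rates)
print(f"dynamic range: {dynamic_range:.0f}x")
print(f"P(no mutation), 1 - sum: {approx:.4f}")
print(f"P(no mutation), product: {exact:.4f}")
```

The sum form always gives a value at or below the product form, and the gap grows as the summed rate gets larger, so the approximation is tightest for small marker sets or slow markers.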


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 06:39:09 AM
Observation #2: In data sets Y STR single step mutations are most probable.  Multi-step occur some 5% to 10% of the time with step changes up to 4 or more....

That seems very plausible, but I would think different STRs would have different properties. I don't know.  Do you have a study that has determined the distribution of single step and multi-step mutations by STR or on average?

The only work I am aware of is a work in progress(?) by Charles Kerchner.  The only hiccup there was he wasn't keeping any track of multiple step mutations, just single step.  He was/is trying to determine average mutational rates for different sets of FtDNA dys loci.  If you look at the rootsweb Y STR frequency tables, it is evident that some of the outlier mutations were multi-step.  I've also heard John Chandler espouse this view, but I don't know if he's published any data?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 06:54:46 AM
Observation #3: For haplogroup R1b it is observed that Y STR frequency distributions over the first 69 FtDNA dys loci (as presented at www.freepages.genealogy.rootsweb.ancestry.com) include 95% of the entry values in only 3 values, modal and +/- 1 from modal, for some 59 of the 69 entries.

Why choose R1b?   Why not R1b-L21 or R1b-U106 or R1b-L226. Each has different modals, sometimes dramatically.   For that matter, why not use R1?

Are you saying that Y STRs have different expected properties depending on  the haplogroup?   Y SNPs don't generally or necessarily have any biological connection to Y STRs.  I've asked this before (on this thread even) but I have no reason to think that the expected property of one Y STR is different by haplogroup. We are all homo sapiens sapiens and are much more alike than different.
It appears that mutation rate depends on the modal value for some STRs in the table I referenced, which includes data for 7 Hgs (not only R1b).  DYS 388 is quite different for I1 and J2.  They have a higher modal value, 14 and 15 respectively.
This suggests that if a mutation to a non-modal value has occurred , then the most probable next mutation is back to the modal.

Why? We use modal haplotypes as proxies for ancestral haplotypes, but the mode is really just a statistical concept.  Naturally, we might expect mutations within STRs that are primarily single-step focused would revolve around the mode. That's essentially circular reasoning or perhaps I should say self-defining.
I do not have a lot of confidence in the modal concept personally.  In the case of disasters it can lead to erroneous comments about modals for a Hg.  In the data set referenced, the data shows that the modal +/- 1 are the most frequent values observed.  If 95% or more of the entries have these values and if multisteps occur at the 5% or greater rate, then my observation stands.
This also suggests that as time accumulates many hidden mutations may have occurred.

Why? I agree there are back-mutations, but so what?  John Chandler has told us that mutation rate calculations and variance account for back mutations.
That comment may be true for DYS loci like CDYa,b where, over a few thousand years, many mutations will have occurred.  Most DYS loci have very few mutations relative to CDYa,b and a mutation is a low probability event.  But the point is that there is no range of values for most DYS loci, unlike CDYa,b.  If a mutation from the modal has occurred, then the most probable next event is a mutation back to the modal.  Again, this can't be modelled by a random walk process, which I believe assumes that the process is unbounded?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 07:04:22 AM
... Observation #4:  Most data sets, especially family sets, are highly correlated.  Many entries have apparent mutations that are all derived from a single ancestor.  (See Kerchner's family analysis and his definition of "unique mutational events").  This effect will cause an overestimation of diversity.

I agree that our DNA project data is not representative. I think most academic studies try to guard against this but I don't know if they are doing a good job.

I think you mean this will cause an underestimation of diversity, right?  Fortunately, interclade calculations effectively eliminate this concern.

In Kerchner's case, there are 14 apparent mutations and only 8 unique mutational events.  If you consider all 14 as real mutations you overestimate diversity, as I said, i.e. you will estimate a TMRCA older than is real.

I believe interclade calculations, like most of the Variance calculations only consider coalescent time, the time to the build-up of the population, it doesn't penetrate the "disaster" and give real TMRCA.  So, the value of the interclade estimate will be dependent on the population history under examination.  I can certainly stand correction if this observation is incorrect?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 07:13:01 AM
I have tried to reply to your questions as honestly as I can.  I presented my comments in this format, because I wanted to explore the assumptions built into the calculations presented for Variance/diversity.

It is critical that the assumptions represent what we observe in the data generated to date, whether it has been published as a study or is just a dataset.

The ASD/Variance model was developed by Goldstein et al., and modified by Nordtvedt and others.  At the time of development, not much data existed to verify the assumptions.

We will never have enough data, but sufficient data has been collected, both germ-line and family data sets, to make observations about the "real" properties of the Y STR mutational process.

I do not believe your subject question can ever be rationally answered until these properties are understood and agreed upon and a model created using these properties.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 11:29:43 AM
.... We will never have enough data, but sufficient data has been collected, both germ-line and family data sets, to make observations about the "real" properties of the Y STR mutational process.

I agree we don't have enough data on the real properties of these Y STRs.

I do not believe your subject question can ever be rationally answered until these properties are understood and agreed upon and a model created using these properties.

I agree we can't prove anything beyond a reasonable doubt but I think there have been enough simulation runs, and the statistical modeling is improving so we are getting some useful results.  My thinking is that something is better than nothing, but we should have no illusions of final answers.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 11:33:49 AM
Quote from: Mikewww
Why? I agree there are back-mutations, but so what?  John Chandler has told us that mutation rate calculations and variance account for back mutations.
That comment may be true for a DYS locus like CDYa,b where, over a few thousand years, many mutations will have occurred.  Most DYS loci have very few mutations relative to CDYa,b, and a mutation is a low-probability event.  But the point is that there is no range of values for most DYS loci, unlike CDYa,b.  If a mutation from the modal has occurred, then the most probable next event is a mutation back to the modal. Again, this can't be modelled by a random walk process, which I believe assumes that the process is unbounded?
STR diversity is used in study after study but I don't know of any study that says a back mutation towards the modal is more probable than another mutation away from it.   The exception is an STR that is at the high end of the full allele range from an absolute STR count.... which I interpret as STR counts in the 30's or approaching the 30's, primarily.

I do calculate variance using only the linear markers (according to Heinila) as well as mixed but I don't really see that it makes much difference, for R1b subclades anyway.  I don't want to misrepresent any of this, though. I don't think we have a lot of precision.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 12:01:30 PM
Quote from: Mikewww
Why? I agree there are back-mutations, but so what?  John Chandler has told us that mutation rate calculations and variance account for back mutations.
That comment may be true for a DYS locus like CDYa,b where, over a few thousand years, many mutations will have occurred.  Most DYS loci have very few mutations relative to CDYa,b, and a mutation is a low-probability event.  But the point is that there is no range of values for most DYS loci, unlike CDYa,b.  If a mutation from the modal has occurred, then the most probable next event is a mutation back to the modal. Again, this can't be modelled by a random walk process, which I believe assumes that the process is unbounded?
STR diversity is used in study after study but I don't know of any study that says a back mutation towards the modal is more probable than another mutation away from it.   The exception is an STR that is at the high end of the full allele range from an absolute STR count.... which I interpret as STR counts in the 30's or approaching the 30's, primarily.

I do calculate variance using only the linear markers (according to Heinila) as well as mixed but I don't really see that it makes much difference, for R1b subclades anyway.  I don't want to misrepresent any of this, though. I don't think we have a lot of precision.

I refer again to the only significant data set on Y STR frequencies at:  Http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.  Under the assumptions I stated previously about multi-steps, you will observe that most STRs are bounded by the modal +/-1.  The drunkard's walk model, which underlies the variance/ASD model, doesn't treat the case where the set of values obtained in the process is bounded.  This is an important point and I don't believe Nordtvedt considered it.  He needs increasing variance to make his model make sense.  That's why he insists on including the faster mutators, whose path through time may more resemble a random walk until they saturate (hit an upper or lower bound), at which time they bounce back and forth within the set of values permissible for these DYS loci.  Unless a multistep occurs, it appears that most (59 out of 67) act the same way.
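To make the bounded-versus-unbounded point concrete, here is a minimal simulation sketch in Python. It is not anyone's published model: the mutation probability, the number of lineages, the symmetric +/-1 step, and the rule that a step leaving the modal +/-1 band is simply rejected are all assumptions chosen for illustration. It only shows how the two assumptions behave differently; it does not say which one real STRs follow.

```python
import random

def simulate(n_lineages, generations, mu, bound=None, seed=1):
    """Return the variance about the founder's value after a stepwise walk.

    bound=None -> unbounded drunkard's walk.
    bound=k    -> any step that would leave [-k, +k] is rejected (assumed rule).
    """
    rng = random.Random(seed)
    total_sq = 0
    for _ in range(n_lineages):
        offset = 0  # repeat count relative to the founder's (modal) value
        for _ in range(generations):
            if rng.random() < mu:              # a mutation happens this generation
                step = rng.choice((-1, 1))     # single step, up or down
                if bound is None or abs(offset + step) <= bound:
                    offset += step
        total_sq += offset * offset
    return total_sq / n_lineages

# Illustrative parameters only: 300 lineages, 1000 generations, mu = 0.01
unbounded = simulate(300, 1000, 0.01)
bounded = simulate(300, 1000, 0.01, bound=1)
print(f"unbounded variance:      {unbounded:.2f}")
print(f"bounded (+/-1) variance: {bounded:.2f}")
```

In the unbounded case the variance keeps growing roughly as mu times the number of generations, while in the bounded case it cannot exceed 1 no matter how much time passes, which is the saturation being argued about.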

In the table I reference, for DYS393 there are 23k samples with the range of values of 13 +/-1 for R1b.  The data is similar for all seven Hgs at 393 (95% or more of the values are modal +/-1).  So are 390, 19, 391, and so on.

I believe that TMRCA estimates and diversity are underestimated.

I showed how unique mutation events add mutations, i.e. variance/diversity.  So what I have to show is that there is something in the analysis process that decreases variance across a wide set of DYS loci. I believe that mutations to the modal, either forward or backwards, occur much more frequently than mutations greater than +/-1.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 12:21:52 PM
...
I believe interclade calculations, like most of the Variance calculations only consider coalescent time, the time to the build-up of the population, it doesn't penetrate the "disaster" and give real TMRCA.  So, the value of the interclade estimate will be dependent on the population history under examination.  I can certainly stand correction if this observation is incorrect?

We may be looking for two different things.  I'm generally not that interested in who the first person born with a particular SNP was or where he was.  That is a very difficult thing to ever find out --- it would just about take the father without the SNP and the son with the new SNP buried at the same grave site.

I'm generally trying to understand population expansions and movements. In that case, I don't really care if there was a disaster bottleneck (although I think that is an interesting topic in and of itself) or if the time of coalescence were the actual first people with the SNP.

What's important is the coalescence or "coming together backward in time" of the STR diversity to a GD=0. That's the approximate time of the most recent common ancestor. It's just an approximation though.  We know the SNP can't be any younger than this time.

The cool thing about interclade calculations is they filter out the bias that intraclade calculations have towards the largest sub-populations samples (since intraclades are just averages.) If the two clades in an interclade calculation are of roughly the same age the precision can be relatively great. Of course we have to know they are two separate clades. The phylogenetic tree of SNPs provides the framework.

If we look at a number of interclade calculations in context of each other, we are effectively cornering in the upper and lower bounds of the subclades. http://tech.groups.yahoo.com/group/R-P312Project/files/Haplogroup_Timeline_R-L11_Subclades.gif   The chart is based on Chandler/Little mutation rates so you could argue the years should be rescaled but the relative nature of L11 subclades won't change. The methodology is Nordtvedt's.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 12:45:54 PM
...
I refer again to the only significant data set on Y STR frequencies at: Http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.  Under the assumptions I stated previously about multi-steps, you will observe that most STR's are bounded by the Modal, +/-1.

I don't know how old the data is for the frequency chart you cite, but I have all the detail on a large number of confirmed P312 people in files posted at the L21 and P312 Yahoo groups.  I just checked those files.  Out of the first 67 FTDNA STRs, only DYS472 was restricted to its mode or +1 to -1 the mode.

That's just for P312 deep clade tested people I can find in FTDNA projects. If you look at all the haplogroups and all of the data FTDNA has, I don't think what you are saying is true.   ... but why would FTDNA bother to keep an STR that had no variance?  It would be a dead STR, useless.

I also think you are pointing out the importance of having more STRs (more individual experiments) in the calculations. Any one STR might be aberrant but if we look at populations and use statistics to take advantage of the law of large numbers, we can still find value.

Quote from: ironroad41
The drunkards walk model, which underlies the variance/ASD model doesn't treat the case where the set of values obtained in the process are bounded.  This is an important point and I don't believe Nordtvedt considered it.  He needs increasing variance to make his model make sense.  
I agree that it is important not to measure hours with a calendar, which is what you'd be doing if you rely on STRs that hardly move.

I don't think you can say Ken Nordtvedt hasn't considered this though. I can't find the posts but I know he has run simulations with different sets of markers and concluded that the loss of precision from excluding many of the faster markers was greater than the risks run by so-called saturation.

Quote from: ironroad41
I believe that TMRCA estimates and diversity are underestimated.

It could be, but the primary controversy there is the mutation rates - germ-line versus evolutionary.  This has no impact on the relative positioning of one haplogroup to another according to their STR variance.

Quote from: ironroad41
I believe that mutations to the modal, either forward or backwards occur much more frequently than mutations greater than +/-1.

Do you have any studies or analysis that this is true?  


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 12:55:48 PM
Mikewww says: “Do you have any studies or analysis that this is true?”

You asked the same to me. And the same is the answer.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 12:56:27 PM
Mikewww says: “Do you have any studies or analysis that this is true?”

You asked the same to me. And the same is the answer.
Do you speak for Ironroad?

BTW, I'm sorry but I lost track of your answer.  At least on this thread.  I think you generally believe STR mutations revolve around the modal, but I don't believe it just because you do.

I interpret your belief as advocating the trashing of Y STRs as far as usefulness for the molecular clock concept. There is a large scientific community and commercial testing community that are not in agreement with you.   Even legally, I think what you advocate could mean that paternity tests are useless because of convergence back to the identical.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 01:00:36 PM
I don't know Ironroad, but it seems to me that what he says is what I am saying from many years, and also to sacred monsters like Nordtvedt, Klyosov and many "professionists" if we are amateurs, and the numerous peer review papers I have destroyed in these last years are demonstrating this.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 01:03:54 PM
I don't know Ironroad, but it seems to me that what he says is what I am saying from many years, and also to sacred monsters like Nordtvedt, Klyosov and many "professionists" if we are amateurs, and the numerous peer review papers I have destroyed in these last years are demonstrating this.

Who can argue with a giant slayer like yourself?  Particularly one who can interpret others' thoughts and speak on their behalf.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 01:06:41 PM
I am not able to write to you if you add answer to answer. I am not fluent in english like you, but you should know that my principles aren't only the mutations around the modal, but also the convergence to the modal as time passes and that sometime a mutation goes for the tangent. There are then the outliers, like that R-Z253 which falsifies all your calculations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 01:10:58 PM
I am not able to write to you if you add answer to answer. I am not fluent in english like you, but you should know that my principles aren't only the mutations around the modal, but also the convergence to the modal as time passes and that sometime a mutation goes for the tangent. There are then the outliers, like that R-Z253 which falsifies all your calculations.
It looks like between convergence and tangents you've got all the bases covered.  That's a good plan on your part.  I accept that you have different perspectives on STR mutation stuff. That's fine. You could be right.

However, then you lost me.  Can you be more specific on what Z253 falsifies?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 01:16:41 PM
From another thread:

Quote from: ironroad41 on Today at 08:21:46 AM
I looked at your age estimate for Z253 Mike, and I know it is consistent with your other work, but as I've noted before, I have a great deal of difficulty reconciling the haplotype I have z5hg3 (Ysearch) with your age estimates of this subclade.  I know I don't fit the mold, but I have been tested positive for this SNP.

Of course if you are R-Z253 and your values don’t fit with the clade Mikewww has individuated for this subclade, it does mean that that clade was one of the clades of R-Z253 and yours is the witness of the fact that Z253 is more ancient than Mikewww thinks. The subclades of R-L21 have a casual order in the haplotree and we don’t know which is more ancient. Your values are the classical “outlier”, and the outliers are the witness of the mutations that a haplogroup has had beyond the lines extinct and the clades mutated around the modal. Every subclade like every haplogroup has a modal which is a fiction, till the outliers like yours demolish it.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 01:19:26 PM
...
I refer again to the only significant data set on Y STR frequencies at: Http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.  Under the assumptions I stated previously about multi-steps, you will observe that most STR's are bounded by the Modal, +/-1.

I don't know how old the data is for the frequency chart you cite, but I have all the detail on a large number of confirmed P312 people in files posted at the L21 and P312 Yahoo groups.  I just checked those files.  Out of the first 67 FTDNA STRs, only DYS472 was restricted to its mode or +1 to -1 the mode.

That's just for P312 deep clade tested people I can find in FTDNA projects. If you look at all the haplogroups and all of the data FTDNA has, I don't think what you are saying is true.   ... but why would FTDNA bother to keep an STR that had no variance?  It would be a dead STR, useless.

I also think you are pointing out the importance of having more STRs (more individual experiments) in the calculations. Any one STR might be aberrant but if we look at populations and use statistics to take advantage of the law of large numbers, we can still find value.

Quote from: ironroad41
The drunkards walk model, which underlies the variance/ASD model doesn't treat the case where the set of values obtained in the process are bounded.  This is an important point and I don't believe Nordtvedt considered it.  He needs increasing variance to make his model make sense.  
I agree that it is important not to measure hours with a calendar, which is what you'd be doing if you rely on STRs that hardly move.

I don't think you can say Ken Nordtvedt hasn't considered this though. I can't find the posts but I know he has run simulations with different sets of markers and concluded that the loss of precision from excluding many of the faster markers was greater than the risks run by so-called saturation.

Quote from: ironroad41
I believe that TMRCA estimates and diversity are underestimated.

It could be, but the primary controversy there is the mutation rates - germ-line versus evolutionary.  This has no impact on the relative positioning of one haplogroup to another according to their STR variance.

Quote from: ironroad41
I believe that mutations to the modal, either forward or backwards occur much more frequently than mutations greater than +/-1.

Do you have any studies or analysis that this is true?  

I briefly looked at the P312 data for 393.  What is needed is a distribution, by number, for each DYS locus.  My observation is that 95% of the entries will be modal +/-1.  That leaves room for the 5% multisteps observed.  This would entail counting the occurrences of each value and plotting them as the reference I cited did.  Note many DYS loci appear tighter than 5%.
Re: mutation rates.  I have used Chandler and a more recent set of 110 published on-line (Burgarella).  Since Chandler crossed boundaries of Hgs, I am suspicious of his value for 388, say.  I think the Burgarella rates are certainly valid for many of the applications we look at, such as Clan Gregor founder.  I'm inclined to think at present that Zhivotovsky's fudge factor may be due to hidden mutations?
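The per-locus count described here can be sketched in a few lines of Python. The sample values below are made up purely to show the mechanics (they are not taken from the P312 files or the rootsweb table), and the function name is hypothetical.

```python
from collections import Counter

def modal_spread(values):
    """Return (modal value, share of entries within modal +/-1, full distribution)."""
    counts = Counter(values)
    modal, _ = counts.most_common(1)[0]          # most frequent repeat count
    near = sum(n for allele, n in counts.items() if abs(allele - modal) <= 1)
    return modal, near / len(values), dict(sorted(counts.items()))

# Hypothetical repeat counts at one locus for 100 men (illustration only)
sample = [13] * 90 + [12] * 4 + [14] * 3 + [15] * 2 + [11]
modal, share, distribution = modal_spread(sample)
print("modal:", modal)
print("share within modal +/-1:", share)
print("distribution:", distribution)
```

Running the same function over every locus in a real haplotype file would give the table being asked for, and the distribution also shows how many entries sit at +1 versus -1.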

one other observation is that for the 7 Hgs studied in the data set I referenced, there is no obvious change in modal value with time for a dys loci.  A change appears from time to time, but it appears to be due to a multistep mutation and then the drift over time for the Hg is back to a common modal.  This may be due to the chemical kinetics Klyosov talks about?  The point is we don't see over time as we evolve from Hg E to the R's any significant change in modals or dynamic range around the modals.  I think someone observed once on rootsweb that his 439 had the same value as a chimpanzee.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 23, 2012, 01:34:08 PM
The only reason why most mutations observed in FTDNA datasets are within +-1 mutation of the presumed modal, is because most people in the FTDNA projects share a TMRCA that is fairly recent, at least in the clan projects. I think a more important thing to look at is not whether 95% of mutations in DYS393 are within +-1 mutation of the modal, but how many are +1 and how many are -1, this would be indicative of whether or not a large number of back mutations could have occurred. I don't think mutations converge to a modal, in fact the one thing I would need to confirm is that the presumed modal of a set is in fact the ancestral haplotype of the set. Although for all intended purposes a set where the modal of a given microsatellite is 12 and most mutations are 14 would have the same variance as a set where the modal is 13 and most mutations are either 12 or 14 when calculating the modal using the assumption of minimization of mutations. Also, there is an observed direct relationship between the number of repeats and the mutation rate. For example a locus with 16 repetitions is more likely to mutate to 17, than the same locus with 13 repetitions is to mutate to 14.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 01:42:50 PM
I have spoken of this about the aDNA found in France of the hg. G: without those data we wouldn't have understood which was the modal 7000 years ago for some loci.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 23, 2012, 01:49:04 PM
Thinking about this mutating around the modal idea, I think it sounds quite exciting.

Clearly there is more than one modal, lots in fact !!

Presumably these modals consist of values that peoples DNA gravitates to ?

Now if we can work out what special properties certain values have at specific loci to create this effect then we could maybe predict other values that have this strange property and use them to discover other as yet unidentified modals !!!!

BTW does this mean that people with no real connection could end up with extremely similar values as their DNA converges on a random modal ?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 01:51:16 PM
But does anybody answer this post of mine?

This posting of mine, posted here and published also by Dienekes, is waiting some response, above all from Anatole Klyosov:


An interesting haplotype of R1a1a (M17) has been found in the paper of Gunjan Sharma et al., Genetic Affinities of the Central Indian Tribal Population, PLoS one, February 2012:
DYS19=18
DYS385=14-17
DYS389=15-30
DYS390=28
DYS391=12
DYS392=14
DYS393=13
DYS437=17
DYS439=13
DYS448=22
DYS456=17
DYS458=17

At first sight it could seem we have found the R-M420 not found so far in India with its DYS492=14, which presupposes a 13, whereas all the other R1a1a haplotypes have 11 or 10 and 12 from 11, but this haplotype has been tested for M17, then it isn’t an R-M420. Also the extremely large variance of the other markers makes us think that this value 14 derives from a modal 11 (or what was the modal at the origin of this subclade). Then again all the discourses about “modal” and “variance”, as I have supported many times, are worth nothing.
But I think it would be something to say about the TMRCA of 10.97+/-1.86 kya (25 y for generation) even though calculated by the Zhivotovsky rate. It is clear that these R1a1a-s belong to different clades and the massive presence of the clade most usually found falsifies the calculation. It is clear that this haplotype is an outlier, but for this more interesting, because testifies all the mutation gone mostly for the tangent and not around the modal. If we calculate the intraclade between two of these haplotypes, for instance with this closer to the modal: 15, 11-14, 14-32, 24,10, 11, 12,14,10, 20, 15,16 we have 32 mutations. Also using the usual mutation rate of 0,0022, we have:
(454x32)/28=518
518x25=12,950
and I have used a generation of 25 years and not 32 as I use usually, and I haven’t considered other mutations around the modal.
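The arithmetic in this post can be replayed as follows. This is only a sketch of the calculation as written: the count of 32 mutations is taken from the post as given and not recounted, 454 is read as the post's rounding of 1/0.0022, and the reading of 28 as 14 marker values times 2 lineages is an assumption.

```python
mutations = 32             # step differences, as counted in the post
inv_rate = 454             # the post's rounding of 1 / 0.0022
transmissions = 28         # assumed: 14 marker values x 2 lineages
years_per_generation = 25

generations = inv_rate * mutations / transmissions
years = int(generations) * years_per_generation   # the post truncates to 518

print(round(generations, 1))   # 518.9
print(years)                   # 12950
```

The result depends directly on the mutation rate and on the mutation count fed in, so any change to either scales the answer in proportion.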

Conclusions? The ancientness of the haplogroups is much much more than it is usually thought.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 01:54:52 PM
I have spoken of this about the aDNA found in France of the hg. G: without those data we wouldn't have understood which was the modal 7000 years ago for some loci.
I won't argue with you on what mutation rates are right. That is a black hole of a discussion on its own, and the mutation rates are the application of time (years), so they are critical for TMRCA estimations.  However, this thread is about STR diversity, not necessarily mutation rates.

Does this have something to do with what you said about R-Z253 (a subclade of R-L21) falsifying something?

Marko Heinila estimated the TMRCA for G-M201 as 27k ybp based on just over 2200 haplotypes. He uses STR diversity in a different way than Nordtvedt in what he calls a "maximum likelihood" method.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 23, 2012, 02:05:02 PM
We have discussed about this about hg. J on eng.molgen with the great geneticist Roy King. I have had the impression that Roy was frustrated in his desire to find that his J was Jewish and not European. In the short time of his haplogroup the method of Heinila wasn’t able to decide, because, by calculating the variance without taking in consideration mutations around the modal etc., this method cannot decide. It is possible that for more ancient times (27kya are many) it fits.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 02:10:57 PM
But does anybody answer this post of mine?....
Conclusions? The ancientness of the haplogroups is much much more than it is usually thought.
Do you have these two haplotypes in Ysearch with 67 STRs?  What are the terminal SNPs for each haplotype? Comparing haplotypes on a limited number of STRs is not really something you can expect much precision with.  

Comparing any two individual haplotypes can produce unusual results.  I pretty much ignore FTDNA's tip calculator when looking at my matches.  I think this is part of the reason they felt that 111 STRs are useful, but if you are only using 10 or 15 for just two people I don't know if it is worth your time chasing down.

I agree with you that some people have values at some STRs that are far off the modals, if that is what you mean by a tangent.   I think this underscores the importance of the law of large numbers and using statistical tools for populations, not necessarily individuals.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 23, 2012, 02:26:49 PM
We have discussed about this about hg. J on eng.molgen with the great geneticist Roy King. I have had the impression that Roy was frustrated in his desire to find that his J was Jewish and not European. In the short time of his haplogroup the method of Heinila wasn’t able to decide, because, by calculating the variance without taking in consideration mutations around the modal etc., this method cannot decide. It is possible that for more ancient times (27kya are many) it fits.

Heinila has J-M304's TMRCA estimated at 20k ybp and he has the interclade for J-M304's and I-M170's common ancestor as 25k ybp.   What's out of whack or what's falsified? These are quite old haplogroups so I don't think you can expect much precision, and at these ages the linear duration of the STRs does become an issue (per Vince Vizachero.)

Nevertheless, the IJ interclade TMRCA is 25k ybp while using the same method Heinila gets the R1a-SRY10831.2 and R1b-M343 interclade TMRCA as 15k ybp.  

Again these estimates are not precise to 1000 years, but that R1a1 / R1b interclade age of 15k is not that far different than Karafet's estimate for the R1 TMRCA of 18.5 k ybp. Karafet used a completely different method not using STRs, but by counting SNP branch lengths.  The SNP branch length "molecular clock" seems to align with the STR variance "molecular clock."   ...  an amazing coincidence.

R1 could clearly be older than 15k ybp or 18.5k ybp, and there could always be an aberrant STR value or two, but we have these estimates (based on large numbers of haplotypes) for the most recent common ancestors of R1a and R1b in support of each other.

BTW, using the same method and scale, Heinila has the interclade TMRCA for R1b-L21 and R1b-U152 as 4.2k ybp. The TMRCA for R1b-Z253 (a son of L21) shouldn't be older than that if at all.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 23, 2012, 02:27:05 PM
The only reason why most mutations observed in FTDNA datasets are within +-1 mutation of the presumed modal, is because most people in the FTDNA projects share a TMRCA that is fairly recent, at least in the clan projects. I think a more important thing to look at is not whether 95% of mutations in DYS393 are within +-1 mutation of the modal, but how many are +1 and how many are -1, this would be indicative of whether or not a large number of back mutations could have occurred. I don't think mutations converge to a modal, in fact the one thing I would need to confirm is that the presumed modal of a set is in fact the ancestral haplotype of the set. Although for all intended purposes a set where the modal of a given microsatellite is 12 and most mutations are 14 would have the same variance as a set where the modal is 13 and most mutations are either 12 or 14 when calculating the modal using the assumption of minimization of mutations. Also, there is an observed direct relationship between the number of repeats and the mutation rate. For example a locus with 16 repetitions is more likely to mutate to 17, than the same locus with 13 repetitions is to mutate to 14.
You make some good points.  The dataset I referred to was not limited to clans, however.  We don't know what really affects what the modal is other than the type of STR (di, tri, etc.); this property appears to affect the mutation rate (see the mutation table I referred to, Burgarella et al.).  The other issue might be chemical kinetics and what that entails (I am no expert in that field).  I do believe there have been small changes in the modal over time, but not much.  My argument has been that most mutations are around the modal (regardless of the modal value).  Unlike the drunkard's walk model, which shows an expanding range of states with time, the STR mutational process seems to be confined to a narrow band, except when a multistep occurs.  I don't think this is due to the age of haplotypes but is more inherent to the process.  I don't think your last statement, re: which DYS locus is most likely to mutate to a higher value, agrees with the data of Burgarella.  Mutation rates, as I mentioned, seem more defined by the type of STR, i.e., two, three, four or more G,C,A,Ts in the increment.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 23, 2012, 05:35:38 PM
I checked, and in my haplotype cluster there are 20 (xCDY) off-modals, with four mutations that are less than the modal value, which is about 22%, or 3 to 1 mutating upwards. In slow-to-fast order, the 8th-10th and the 16th STRs are downward dogs.

Order   1130A1   L21 Modal
1   531=>12   11
2   497=15   14
3   511=11   10
4   19=>15   14
5   385a=12   11
6   441=14   13
7   552=25   24
8   447=24   25
9   513=11   12
10   557=<15   16
11   446=14   13
12   464d=18   17
13   456=18   16
14   534=16   15
15   449=31   30
16   576=17   18
17   710=36   35
18   712=>21   20
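The up/down split in the table above can be tallied mechanically. The sketch below just copies the 18 rows as printed (the ">" and "<" qualifiers on a few values are dropped, which is a simplification) and counts which way each marker sits relative to the L21 modal.

```python
# (marker, 1130A1 value, L21 modal) copied from the 18 rows above;
# the ">" / "<" qualifiers in the original table are ignored here
rows = [
    ("531", 12, 11), ("497", 15, 14), ("511", 11, 10), ("19", 15, 14),
    ("385a", 12, 11), ("441", 14, 13), ("552", 25, 24), ("447", 24, 25),
    ("513", 11, 12), ("557", 15, 16), ("446", 14, 13), ("464d", 18, 17),
    ("456", 18, 16), ("534", 16, 15), ("449", 31, 30), ("576", 17, 18),
    ("710", 36, 35), ("712", 21, 20),
]

down = [marker for marker, value, modal in rows if value < modal]
up = [marker for marker, value, modal in rows if value > modal]
down_share = len(down) / len(rows)

print("below modal:", down)
print("above modal:", len(up))
print(f"share below modal: {down_share:.0%}")
```

On these rows, four markers (447, 513, 557, 576) sit below the modal and fourteen above, which matches the "about 22%" figure in the post.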


Title: Re: STR Wars: Is diversity meaningful? TMRCAs? Mutation rates?
Post by: Mike Walsh on April 24, 2012, 01:09:51 PM
I'm just reposting this here to help keep other threads on topic.  I'm opening up the subject of this thread to include TMRCAs and mutation rates.  I don't necessarily take firm positions on TMRCAs and mutation rates, but rather trust what I understand from others on these things...  however, we can discuss them if you want.

I'll respond to Ironroad later.

As mike has said, even the experts can't decide on the appropriate model.  I spent a lot of time on the Busby paper this winter and also pondered again the Zhivotovsky conundrum.
There are several mathematical methodologies that produce similar results so I do want to be clear that there are alternative methods that seem to support each other in TMRCA calculations.

What is not agreed upon are the mutation rates, although the leading hobbyist-scientists seem to come down pretty much on the side of the germ-line rates...  this would include Chandler, Nordtvedt, Heinila and Klyosov. I don't know if Vizachero and Dienekes are scientists but they also are against using the evolutionary rates rather than the germ-line rates.
I am responding to your previous post and this one.  We are all Homo sapiens, but what does that mean?  We have many differences due to environment and evolution.  The same is true in this area of study.  If you have accessed the table I referred to, you can look at the distribution of 388 for Hgs I1, J2, and R1b.  I am emphasizing this DYS locus because its behavior can significantly affect the TMRCA estimate.  For this locus Chandler gives a value of .00022 per generation and Burgarella .00046.  From the table I note that R1b had approximately 221 mutations out of 22129 entries.  We have no idea how many are unique and how many are inherited.  J2 had 265 out of 915 and I had 2508 out of 5700!  Additionally, the data spread is across 5 to 6 values for I and J2 and across essentially 3 values for R1b. In no way can one rate support these data.   Additionally, the variance calculation will show a large contribution to TMRCA due to the very low mutation rate and concomitant long time period expected between mutations at this locus.  No wonder I and J appear older in Ken's work.
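For scale, the DYS388 counts quoted above work out to very different shares per haplogroup. The short Python sketch below only converts the figures in the post into percentages; it makes no claim about how many of those entries are independent mutations.

```python
# (count reported in the post, total entries) for DYS388, per haplogroup
dys388 = {"R1b": (221, 22129), "J2": (265, 915), "I": (2508, 5700)}

share = {hg: off / total for hg, (off, total) in dys388.items()}
for hg, value in share.items():
    print(f"{hg}: {value:.1%}")
```

That is roughly 1% for R1b against 29% for J2 and 44% for I, which is the contrast the post is pointing at.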

I really believe you have to get into this level of detail to understand the Y STR mutation process and its current problems.  Most DYS loci that mutate within the modal +/-1 generate no appreciable variance, and certainly no increase occurs with time as the drunkard's walk model suggests.  Most of the variance is generated by multisteps, especially steps greater than 2, and the faster mutators such as CDYa,b.

My conclusion is that the Variance/ASD model does not represent the data properties.
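
To put the DYS388 comparison above into numbers, here is a minimal Python sketch that turns the quoted counts into fractions. The counts are the approximate ones given in the post; reading "mutations" as "entries that differ from the modal value" is an assumption, since the post does not define how they were counted.

```python
# Approximate DYS388 counts quoted in the post: (off-modal entries, total entries).
# Treating "mutations" as "entries off the modal value" is an assumption.
dys388_counts = {
    "R1b": (221, 22129),
    "J2": (265, 915),
    "I": (2508, 5700),
}

def off_modal_fraction(off_modal, total):
    """Share of haplotypes whose DYS388 value differs from the modal."""
    return off_modal / total

fractions = {hg: off_modal_fraction(*c) for hg, c in dys388_counts.items()}

for hg, frac in fractions.items():
    print(f"{hg}: {frac:.1%}")
```

The shares come out near 1% for R1b, 29% for J2 and 44% for I, which is the gap the post argues a single rate cannot explain.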


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 24, 2012, 02:13:11 PM
I have tried to reply to your questions as honestly as I can.  I presented my comments in this format, because I wanted to explore the assumptions built into the calculations presented for Variance/diversity.

It is critical that the assumptions represent what we observe in the data generated to date, whether it has been published as a study or is just a dataset.

The ASD/Variance model was developed by Goldstein et al. and modified by Nordtvedt and others.  At the time of development, not much data existed to verify the assumptions.

We will never have enough data, but sufficient data has been collected, both germ-line and family data sets, to make observations about the "real" properties of the Y STR mutational process.

I do not believe your subject question can ever be rationally answered until these properties are understood and agreed upon and a model created using these properties.

I would like to emphasize one other aspect of the Goldstein derivation, in which he states that each DYS locus can be used to infer the TMRCA, but in practice several are used and averaged.  Note:  I do not believe this calculation can be made using Ken's approach since he uses averages of mutation rates?  What this approach permits is an estimate of the SD of the computation.  First you compute the TMRCA using each DYS locus of interest.  Let x_avg be the average of the N per-locus TMRCAs x; then SD = sqrt( sum((x - x_avg)^2) / (N - 1) ).  This also explains why using STRs of similar rates yields higher confidence.
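
The per-locus SD calculation described above can be sketched in a few lines of Python. This is only an illustration: the marker names, variances and rates are made-up placeholders, and the step "TMRCA = variance / rate" assumes the linear ASD/Variance relationship that is itself under debate in this thread.

```python
from statistics import mean, stdev

# Hypothetical per-locus variances and mutation rates (per generation).
# Marker names and numbers are placeholders, not real data.
variances = {"M1": 0.24, "M2": 0.25, "M3": 0.44}
rates = {"M1": 0.002, "M2": 0.0025, "M3": 0.004}

def per_locus_tmrca(variances, rates):
    """TMRCA in generations for each locus, assuming variance = rate * time."""
    return {m: variances[m] / rates[m] for m in variances}

estimates = per_locus_tmrca(variances, rates)
values = list(estimates.values())
tmrca_mean = mean(values)
tmrca_sd = stdev(values)  # sample SD, i.e. divides by N - 1

print(estimates)
print(f"mean = {tmrca_mean:.1f} generations, SD = {tmrca_sd:.1f}")
```

With these placeholder numbers the three loci give about 120, 100 and 110 generations, so the mean is about 110 with an SD of about 10. The closer the per-locus estimates agree, the smaller the SD.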


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 24, 2012, 06:23:16 PM
...
I would like to emphasize one other aspect of the Goldstein derivation in which he states that each dys loci can be used to infer the TMRCA but in practice several are used and averaged.  Note:  I do not believe this calculation can be made using Kens approach since he uses averages of mutation rates? ...

If you are critiquing Ken Nordtvedt's TMRCA methodology you should probably read his web site documentation and understand his spreadsheet. http://knordtvedt.home.bresnan.net/  You can also get direct answers from him on the Rootsweb Hg I forum.  He'll answer, particularly if you have a critique.

I've seen where Anatole Klyosov uses an average rate across a set of markers.   Nordtvedt aggregates STRs into a summary TMRCA but he does call them individual experiments and he does use the individual STR mutation rates in his spreadsheet formulas. He has a column for each STR.  Anyway, I don't think this is averaging the rates together in the sense that you mean, but I'm not sure what you mean.  I think when you get down to the specifics you have to talk about the details of the formulas.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 25, 2012, 07:04:18 AM
...
I would like to emphasize one other aspect of the Goldstein derivation in which he states that each dys loci can be used to infer the TMRCA but in practice several are used and averaged.  Note:  I do not believe this calculation can be made using Kens approach since he uses averages of mutation rates? ...

If you are critiquing Ken Nordtvedt's TMRCA methodology you should probably read his web site documentation and understand his spreadsheet. http://knordtvedt.home.bresnan.net/  You can also get direct answers from him on the Rootsweb Hg I forum.  He'll answer, particularly if you have a critique.

I've seen where Anatole Klyosov uses an average rate across a set of markers.   Nordtvedt aggregates STRs into a summary TMRCA but he does call them individual experiments and he does use the individual STR mutation rates in his spreadsheet formulas. He has a column for each STR.  Anyway, I don't think this is averaging the rates together in the sense that you mean, but I'm not sure what you mean.  I think when you get down to the specifics you have to talk about the details of the formulas.
  I left my statement re: Ken's approach as a question mark, since I haven't looked over his work in quite a while.  If he uses individual DYS locus rates then his approach should be amenable to the same SD calculation.  My major point was that if the rates of the loci are similar, then the estimates are closer and the SD is smaller.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 25, 2012, 08:51:14 AM
...
I would like to emphasize one other aspect of the Goldstein derivation in which he states that each dys loci can be used to infer the TMRCA but in practice several are used and averaged.  Note:  I do not believe this calculation can be made using Kens approach since he uses averages of mutation rates? ...

If you are critiquing Ken Nordtvedt's TMRCA methodology you should probably read his web site documentation and understand his spreadsheet. http://knordtvedt.home.bresnan.net/  You can also get direct answers from him on the Rootsweb Hg I forum.  He'll answer, particularly if you have a critique.

I've seen where Anatole Klyosov uses an average rate across a set of markers.   Nordtvedt aggregates STRs into a summary TMRCA but he does call them individual experiments and he does use the individual STR mutation rates in his spreadsheet formulas. He has a column for each STR.  Anyway, I don't think this is averaging the rates together in the sense that you mean, but I'm not sure what you mean.  I think when you get down to the specifics you have to talk about the details of the formulas.
  I left my statement re: Ken's approach as a question mark, since I haven't looked over his work in quite a while.  If he uses individual DYS locus rates then his approach should be amenable to the same SD calculation.  My major point was that if the rates of the loci are similar, then the estimates are closer and the SD is smaller.
Okay, so you are not critiquing Ken's methodology then, because you haven't read his work for quite a while. Since you are mentioning him by name and using hypotheticals like "if he uses", then to be fair to him why don't you challenge him directly?  If you feel uncomfortable, craft a set of very specific questions and I'll ask them on the Hg I Rootsweb forum so he will answer. That way the questions are somewhat anonymous from your perspective.  The guy is good with math, so I doubt he hasn't spent a lot of time on the issues related to this.




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 25, 2012, 12:54:09 PM
I'm just cataloging this from the Busby thread since Busby did an analysis of the linear duration of STRs, which somewhat questions the concept, but then seems to rely on them (STRs) to make their case about various forms of R1b in Europe.

I also agree that STR evaluation is useful.  I just think that using limited numbers like 10 or 15 is not enough.  That's what I see when I do my own comparisons on hundreds of long haplotypes anyway.  I also think Busby's application of STRs does not match their own linear duration standards. That is an attack, but perhaps I just don't understand. Can you explain?

You are right; they showed that microsatellite choice has a significant effect on age estimates, and they should have used that finding when calculating the TMRCA of the R-S127 haplogroup in figure 4a. However, in figure 2 they did not calculate TMRCA in generations but explored the bootstrapped variance, and in fact they do not seem to think that variance is affected by the choice of STRs, which is why they used 10 STRs in figure 2.  In a nutshell, they showed that microsatellite choice can have an effect on age estimates, but still used a combined set of 10 STRs to explore variance.  Perhaps they think one should choose the STRs for a TMRCA calculation based on similarity of mutation rates and the presumed time span for common ancestry, i.e. use the average mut/marker for the slowest or fastest STRs depending on the presumed TMRCA, not the average mut/marker for the whole set, but use the combined set of STRs if you want to calculate variance.

This is where I get confused about Busby's theme. I don't really understand which methods they think are best, but at least I see they value STR diversity in their analyses, just using different techniques I guess.




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 25, 2012, 04:04:26 PM
This is always a contentious issue. I think STR diversity is useful. There are challenges and they must be considered in context.

In my opinion, people are fine with it until it disagrees with their theory, then they must shoot it down rather than adjust their theory. To me it is just another data point, and unfortunately we are in dire need of those.

Anyway, let's discuss this topic here so we don't have to argue the points over and over again in other topics, drowning them out.

As always seems to happen, we have strayed from your original question and can't see the forest for the trees (or something like that).  My observations have been little discussed.  My major point in answering you is that I do not believe most Y STR DYS loci follow a drunkard's walk model, which is mathematically equivalent to using ASD/Variance to describe the process.  I know that Nordtvedt is using Variance, but my reference for that derivation has been Goldstein et al. (who, by the way, heads up the human genome lab at Duke Univ.).  I believe, based on analyzing the data set I referenced, that his model does not match the data.  I'm not throwing rocks at anyone; he had no data!  1.  No distribution of allele values around the modal for the set of DYS loci. 2.  No knowledge of multisteps.  When you include these factors I have to conclude that the model doesn't work.

Additionally, the data also suggest that if many of the DYS loci mutate away from their modal, then the most probable next mutation is back to the modal, because except for the 5% of multisteps, there aren't any entries with values beyond +/- 1 of the modal.

So, to bluntly answer your original question, I would say that diversity isn't meaningful since it's masked by hidden mutations, which makes the estimated time shorter: we count fewer mutations than really occurred, and I don't think ASD/Variance can handle that. (Note: the original statement re ASD/Var compensating for hidden mutations was based on the drunkard's walk model, where the distance from the modal increases with time and the squaring of the difference between the modal and the present value does compensate for back mutations.)

I would be very interested in seeing some data from existing R1b data sets re: STR locus distributions around the modal.  I simply don't have the math tools to extract that from the datasets myself.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 25, 2012, 05:25:02 PM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 25, 2012, 06:10:24 PM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  

You should probably look at his formulas and his powerpoint charts where he charts out and tries to explain his methodology.

I think the answer is something along the lines of what John Chandler is saying.  This is from reply #5 of this thread.

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, ...
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

My understanding of the explanation is that their mathematical model does not care about hidden mutations or even multi-step mutations. The mutation rates were derived based on visible mutations so, as long as they have adequate data to build the mutation rates, the way the TMRCA method uses them is consistent.  We should not think of the published mutation rate as literally the physical rate of change per the STR, but rather the observable rate of change.

What is required is that the STRs act somewhat consistently, in other words the expected (predicted) rates up and down should be the same and the rates shouldn't change given the allele value, etc.   This would be where the concern about STRs reaching saturation and high alleles values comes into play.  If an STR doesn't show linear duration (of its rate) during the timeframe we care about then it is not helpful.   The goal of the math model is to include STRs that are linear or "on average" (in aggregate) linear.
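
As a rough check on the point about back mutations, here is a small Python simulation sketch of an unbounded, symmetric, single-step walk. It is not Ken's or John Chandler's method, only an illustration under stated assumptions: every lineage descends independently from the founder (a star-shaped tree), each generation mutates with probability `mu`, and steps up and down are equally likely. The rate is inflated to 0.01 purely to keep the run short.

```python
import random

def simulate_lineages(mu, generations, lineages, seed=1):
    """Return (net offset from founder value, true mutation count) per lineage."""
    rng = random.Random(seed)
    results = []
    for _ in range(lineages):
        offset = 0   # observable difference from the founder allele
        events = 0   # mutations that really happened, including hidden ones
        for _ in range(generations):
            if rng.random() < mu:
                events += 1
                offset += rng.choice((-1, 1))
        results.append((offset, events))
    return results

def summarize(results):
    n = len(results)
    mean_offset = sum(o for o, _ in results) / n
    variance = sum((o - mean_offset) ** 2 for o, _ in results) / (n - 1)
    mean_events = sum(e for _, e in results) / n
    mean_visible = sum(abs(o) for o, _ in results) / n
    return variance, mean_events, mean_visible

mu, generations = 0.01, 300  # illustrative values, not published rates
variance, mean_events, mean_visible = summarize(
    simulate_lineages(mu, generations, lineages=3000)
)
print(f"expected mu * generations  = {mu * generations:.2f}")
print(f"simulated variance         = {variance:.2f}")
print(f"true mutations per lineage = {mean_events:.2f}")
print(f"visible steps per lineage  = {mean_visible:.2f}")
```

In this idealized model the visible step count falls well short of the true mutation count, yet the variance still tracks mu times generations. Whether real STRs behave like this model is exactly what is being argued in this thread.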


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 25, 2012, 06:24:16 PM
...  My major point in answering you is that I do not believe most Y STR dys loci follows a drunkards walk model which is mathematically equivalent to using ASD/Variance to describe the process.  I know that Nordtvedt is using Variance but my reference for that derivation has been Goldstein,et.al. ( who by-the-way heads up the human genome lab at Duke Univ.).  I believe, based on analyzing the data set I referenced that his model does match the data.  I'm not throwing rocks at anyone, he had no data!  1.  No distribution of allele values around the  modal for the set of dys loci. 2.  No knowledge of multisteps.  When you include these factors I have to conclude that the model doesn't work...


I haven't read Goldstein's report. Would you mind posting it again?

All I can say is that it is apparent when looking at R1b haplogroup haplotypes... real ones, lots of them and long ones ...   that STR diversity generally increases with haplogroups that are bigger (older) branches on the Y DNA tree.  In other words, it actually happens that STR variance is higher for haplogroups that the SNP based Y DNA tree says are older.  -  This is observable. Not hypothetical. Please check reply #72 in this thread and around it. I've done this for pretty much all of R-L11. It works nicely.

Is STR variance precise?  No, but folks like Nordtvedt take great pains to produce confidence ranges that you can use and use advanced techniques like interclade comparisons to improve precision.

Academics and testing companies also use STR diversity and have been for a long time.

I know you are aware of Marko Heinila's TMRCA method. He said it is NOT ASD/variance based so that might alleviate your fears.  He calls it a "maximum likelihood" method which I believe is especially well suited for back or multi-step mutations.....
but it matters little. Marko comes up with TMRCAs for the R1b haplogroups that are similar to what Nordtvedt's method does.

Are all STRs good in terms of their linearity with time? No, surely not. The multi-copy ones aren't very linear at all. Some of the faster ones, or at least the high allele value ones may not be reliable either.

Is it possible that some samples of haplotypes are biased by a particular group?  Sure, that is what the "resampling" thing is all about in the Busby and Myres work.  However, this is primarily an intraclade problem. Nordtvedt's interclade approach can reduce or eliminate those biases significantly.  

Maybe the mutation rates are all wrong, but I don't think anyone can effectively argue that most of FTDNA's STRs don't accumulate variance with time.  It's also intuitive, if you consider that most of these STRs are single steppers per event and you overlay that on to the family structure (tree).


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 25, 2012, 08:17:03 PM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  

You should probably look at his formulas and his powerpoint charts where he charts out and tries to explain his methodology.

I think the answer is something along the lines of what John Chandler is saying.  This is from reply #5 of this thread.

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, ...
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

My understanding of the explanation is that their mathematical model does not care about hidden mutations or even multi-step mutations. The mutation rates were derived based on visible mutations so, as long as they have adequate data to build the mutation rates, the way the TMRCA method uses them is consistent.  We should not think of the published mutation rate as literally the physical rate of change per the STR, but rather the observable rate of change.

What is required is that the STRs act somewhat consistently, in other words the expected (predicted) rates up and down should be the same and the rates shouldn't change given the allele value, etc.   This would be where the concern about STRs reaching saturation and high alleles values comes into play.  If an STR doesn't show linear duration (of its rate) during the timeframe we care about then it is not helpful.   The goal of the math model is to include STRs that are linear or "on average" (in aggregate) linear.

Thanks Mike.

What about using a Poisson distribution process to help gauge how many hidden mutations are accumulated over time?  For example,  Let's say the average observable genetic distance between any two L11+'s is 20.  Poisson should show us how many should be the average at x point in time.  Maybe 30 at 6000 years, 40 at 8000, or only a small increase.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: MHammers on April 25, 2012, 09:26:34 PM
I ran a simple Poisson distribution with Excel using an average mutation rate of .0023 and average generation time of 30 years/G over 49 markers.  This is to see how many mutation events can be expected in x time between two haplotypes.

For 67 generations or 2000 years, I get 7 mutations with the probability mass function.  At 10,000 years, 37 mutations with the same.  

This hypothetically includes hidden mutations.  Many L11 members are 20+ away from others in observable mutations, so approximately 37 on average when including back or multi-step mutations might not be far off.  However, this is still a simple model for what we are trying to answer and the SNP L11 is probably closer to 2,000 than 10,000 years old.





Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 25, 2012, 10:58:46 PM
MikeH

I manually calculated the rate used and here is what I show.
Using the same average 0.0023 mutation rate equals 1 mutation per 435 birth events.

435/49 markers equals 8.9 birth events per mutation

@49 markers: 8.9 x 30 years per generation equals 267 years
(using 25yrs per gen equals 222.5)

2000 years divided by 267 equals 7.5 mutations will occur, so at 10K yrs 37.5.
(2000/222.5 = 9.0 mutations, 10K = 45)

Pretty close.

MJost
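
The hand calculation above is easy to re-run for other generation lengths. A minimal Python sketch, using the .0023 rate and 49 markers from the earlier post (the function names are only illustrative):

```python
def years_per_mutation(rate, markers, years_per_generation):
    """Average years between mutations across the whole marker set."""
    generations_per_mutation = 1.0 / (rate * markers)
    return generations_per_mutation * years_per_generation

def mutations_in(years, rate, markers, years_per_generation):
    return years / years_per_mutation(rate, markers, years_per_generation)

for gen_len in (30, 25):
    interval = years_per_mutation(0.0023, 49, gen_len)
    print(f"{gen_len} yrs/gen: one mutation per {interval:.0f} years, "
          f"{mutations_in(2000, 0.0023, 49, gen_len):.1f} in 2000 years, "
          f"{mutations_in(10000, 0.0023, 49, gen_len):.1f} in 10000 years")
```

Without the intermediate rounding the intervals are about 266 and 222 years, so the totals shift only in the first decimal place. The choice of generation length moves the answer far more than the rounding does.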


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 07:57:35 AM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  

You should probably look at his formulas and his powerpoint charts where he charts out and tries to explain his methodology.

I think the answer is something along the lines of what John Chandler is saying.  This is from reply #5 of this thread.

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, ...
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

My understanding of the explanation is that their mathematical model does not care about hidden mutations or even multi-step mutations. The mutation rates were derived based on visible mutations so, as long as they have adequate data to build the mutation rates, the way the TMRCA method uses them is consistent.  We should not think of the published mutation rate as literally the physical rate of change per the STR, but rather the observable rate of change.

What is required is that the STRs act somewhat consistently, in other words the expected (predicted) rates up and down should be the same and the rates shouldn't change given the allele value, etc.   This would be where the concern about STRs reaching saturation and high alleles values comes into play.  If an STR doesn't show linear duration (of its rate) during the timeframe we care about then it is not helpful.   The goal of the math model is to include STRs that are linear or "on average" (in aggregate) linear.

I didn't see the term multisteps discussed by John?  I do note that when he refers to compensation for hidden mutations he is making reference to DYS loci that behave like a drunkard's walk model and are unbounded.  His comment about the linearity of a DYS locus is appropriate, and I believe the number of mutations is undercounted because of the boundedness of many of the DYS loci.

I like his presentation of the Zhivotovsky problem and how they found a constant fudge factor to compensate for some unknown factor in the mutational process.  I happen to believe the unknown factor is real and is related to the hidden mutation issue.

You don't need the fudge factor if you can intelligently count mutations; when you can't, then maybe it is the best option when you're trying to infer large TMRCAs.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 09:36:05 AM
... What about using a Poisson distribution process to help gauge how many hidden mutations are accumulated over time?  For example,  Let's say the average observable genetic distance between any two L11+'s is 20.  Poisson should show us how many should be the average at x point in time.  Maybe 30 at 6000 years, 40 at 8000, or only a small increase.
I don't know the statistics well enough to comment on the advantages or disadvantages. I know the "Maximum Likelihood" method that Marko Heinila uses can be applied to a Poisson distribution but I don't have any details on Marko's formulas.  He might have them posted somewhere.

John Chandler would probably comment if you post this on Rootsweb GENEALOGY-DNA.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 09:40:48 AM
...  My major point in answering you is that I do not believe most Y STR dys loci follows a drunkards walk model which is mathematically equivalent to using ASD/Variance to describe the process.  I know that Nordtvedt is using Variance but my reference for that derivation has been Goldstein,et.al. ( who by-the-way heads up the human genome lab at Duke Univ.).  I believe, based on analyzing the data set I referenced that his model does match the data.  I'm not throwing rocks at anyone, he had no data!  1.  No distribution of allele values around the  modal for the set of dys loci. 2.  No knowledge of multisteps.  When you include these factors I have to conclude that the model doesn't work...


I haven't read Goldstein's report. Would you mind posting it again?

All I can say is that it is apparent that when looking at R1b haplogroup haplotypes... real ones, lots of them and long ones ...   that STR diversity generally increases with haplogroups that are bigger (older) branches on the Y DNA tree.  In other words, it actually happens STR variance is higher for haplogroups that the SNP based Y DNA tree says are older.  -  This is observable. Not hypothetical. Please check reply #72 in this thread and around it. I've done this for pretty much all of R-L11. It works nicely.

Is STR variance precise?  No, but folks like Nordtvedt take great pains to produce confidence ranges that you can use and used advanced techniques like interclade comparisons to improve precision.

Academics and testing companies also use STR diversity and have been for a long time.

I know you are aware of Marko Hienila's TMRCA method. He said it is NOT ASD/variance based so that might alleviate your fears.  He calls it a "maximum likelihood" method which I believe is especially well suited for back or multi-step mutations.....
but it matters little. Marko comes up with TMRCAs for the R1b haplogroups that are similar to what Nordtvedt's method does.

Are all STRs good in terms of their linearity with time? No, surely not. The multi-copy ones aren't very linear at all. Some of the faster ones, or at least the high allele value ones may not be reliable either.

Is it possible that some samples of haplotypes are biased by a particular group?  Sure, that is what the "resampling" thing is all about in the Busby and Myres work.  However, this is primarily an intraclade problem. Nordtvedt's interclade approach can reduce or eliminate those biases significantly.  

Maybe the mutation rates are all wrong, but I don't think anyone can effectively argue that most of FTDNA's STRs don't accumulate variance with time.  It's also intuitive, if you consider that most of these STRs are single steppers per event and you overlay that on to the family structure (tree).

The Goldstein/Stumpf paper is from Science, Vol. 291, 2 March 2001

I would expect diversity of a set of haplotypes to increase with time.  As time elapses, more of the slower mutations occur, which have a very small probability of reoccurring.  I think the medium rate haplotypes (mostly tetra motif) go in and out randomly as they mutate around the modal?

Marko traces/uses apparent mutations as does Ken.

I am arguing that most tetra motif DYS loci don't accumulate variance with time.  An increase in variance requires an unbounded model.  I don't see that in the small amount of data I have looked at?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 09:55:19 AM
Marko traces/uses apparent mutations as does Ken.
Yes, of course, because that's all that is observable.

...
I am arguing that most tetra motif dys loci don't accumulate variance with time.
What tetra STR markers out of FTDNA's first 67 should be eliminated?  Please provide the list.  It should be easy to run a couple of comparisons.   Maybe this will line up with Marko Heinila's linear duration analysis in which case the "36 linear" markers that I use will be appropriate.

Variance increases requires an unbounded model.

An infinitely unbounded model is not required, just a general linear relationship for the time duration that is applicable.
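
To see how much the bounded/unbounded distinction matters over a given time span, here is a Python sketch that computes the exact variance of a walk confined to modal +/- 1 and compares it with the unbounded value (rate times generations). The boundary rule used here, where a mutation that would leave the allowed range simply does not happen, is an assumption made for illustration; it is only one of several ways to model a "bounded" locus. The .0023 rate is the average used elsewhere in this thread.

```python
def bounded_variance(mu, generations, bound=1):
    """Variance of allele offset for a symmetric single-step walk
    confined to [-bound, +bound], starting at the modal (offset 0)."""
    size = 2 * bound + 1
    probs = [0.0] * size
    probs[bound] = 1.0  # everyone starts on the modal value
    for _ in range(generations):
        nxt = [0.0] * size
        for i, p in enumerate(probs):
            nxt[i] += p * (1.0 - mu)          # no mutation this generation
            for step in (-1, 1):
                j = i + step
                if 0 <= j < size:
                    nxt[j] += p * mu / 2.0    # allowed single step
                else:
                    nxt[i] += p * mu / 2.0    # blocked at the boundary (assumption)
        probs = nxt
    # The mean offset stays 0 by symmetry, so E[offset^2] is the variance.
    return sum(p * (i - bound) ** 2 for i, p in enumerate(probs))

mu = 0.0023
for generations in (100, 500, 2000):
    print(f"{generations:5d} gens: bounded {bounded_variance(mu, generations):.3f}, "
          f"unbounded {mu * generations:.3f}")
```

Under these assumptions the two curves stay fairly close for the first hundred or so generations, then the bounded one flattens out toward 2/3 while the unbounded one keeps climbing. That is one way to frame the disagreement: it comes down to whether a given marker is still in its roughly linear stretch over the time span of interest.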


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 10:15:10 AM
The set you probably should use depends on the time frame of interest.  This was Busby's observation, but not his practice, if I read his paper correctly.  It's a probability issue.  For independent events, as mutations are, the probability of two mutations at a locus is equal to the probability of one mutation squared.  I don't have a good rule for picking.  I observe, whatever their rates are, that CDYa,b can have more than one mutation per entry in a relatively short time, hundreds of years.  Maybe you can scale from their rate to estimate which DYS loci have a low probability of two mutations in 1K years and so on?

When I say bounded I mean that (excepting multisteps) the mutational process at a DYS locus is bounded/confined to modal +/-1.
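
The "low probability of two mutations in 1K years" screen suggested above could be sketched like this in Python. It uses a Poisson approximation, which is an assumption on my part, not something from Busby. For small expected counts the result is about half the square of the expected count, so it is the same order of magnitude as the squared rule of thumb. The slow and average rates are the .00022 and .0023 values quoted in this thread; the fast rate is a made-up placeholder, not a published CDY rate.

```python
import math

def prob_at_least_two(rate, years, years_per_generation=30):
    """P(two or more mutations at one locus on one lineage), Poisson model."""
    lam = rate * years / years_per_generation  # expected number of mutations
    return 1.0 - math.exp(-lam) * (1.0 + lam)

rates = {
    "slow": 0.00022,           # Chandler DYS388 value quoted in this thread
    "average": 0.0023,         # average rate used in the Poisson post
    "fast_hypothetical": 0.03, # placeholder for a fast marker, not real data
}

for label, rate in rates.items():
    for years in (1000, 5000):
        print(f"{label:18s} {years:5d} yrs: "
              f"P(>=2) = {prob_at_least_two(rate, years):.5f}")
```

A cutoff on this probability (say a few percent, the exact value being a judgment call) would give a time-dependent marker list of the kind described.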


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 10:16:05 AM
I haven't read Goldstein's report. Would you mind posting it again?

The Goldstein/Stumpf paper is from Science, Vol. 291, 2 March 2001

That paper is only available for fee.  Please post the excerpts that apply.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 26, 2012, 10:24:49 AM
I posted this on DNA-forums last year.

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007276

Our findings suggest that Y chromosome STRs of increased repeat unit size have a lower rate of evolution, which has significant relevance in population genetic and evolutionary studies.


Principal Findings
In order to study the evolutionary dynamics of STRs according to repeat unit size, we analysed variation at 24 Y chromosome repeat loci: 1 tri-, 14 tetra-, 7 penta-, and 2 hexanucleotide loci. According to our results, penta- and hexanucleotide repeats have approximately two times lower repeat variance and diversity than tri- and tetranucleotide repeats, indicating that their mutation rate is about half of that of tri- and tetranucleotide repeats.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 26, 2012, 10:28:54 AM
Here is the Google Cache of the thread.

http://webcache.googleusercontent.com/search?q=cache:skOm4nTP5SQJ:dna-forums.org/index.php%3F/topic/16142-star-wars-i-mean-str-wars-for-r1b/page__st__20&hl=en&gl=us&prmd=imvns&strip=1


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 10:35:49 AM
I haven't read Goldstein's report. Would you mind posting it again?

The Goldstein/Stumpf paper is from Science, Vol. 291, 2 March 2001

That paper is only available for fee.  Please post the excerpts that apply.
Did you try www.sciencemag.org?  For older issues I believe you can access without cost, but it may require you to register?  If I have it stored on-line I'll email it to you.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 10:49:32 AM
You have an argument. This is fine.

...
I am arguing that most tetra motif dys loci don't accumulate variance with time.

I ask you for some detail so I can modify my variance calculations and look at STRs you think are appropriate. I'm volunteering to do this for you.  I don't really think this effort is going to lead to anything, but I'm willing to test your argument with real data like I have on Marko's "linear markers" or Ken's idea of "more markers is better except multi-copy, etc."

What tetra STR markers out of FTDNA's first 67 should be eliminated?  Please provide the list.  It should be easy to run a couple of comparisons.   Maybe this will line up with Marko Heinila's linear duration analysis in which case the "36 linear" markers that I use will be appropriate.

Below is your answer.  My request to help you is simple but you are not helping me help you.

The set you probably should use depends on the time frame of interest. This was Busby's observation, but not his practice, if I read his paper correctly. It's a probability issue. For independent events, as mutations are, the probability of two mutations at a locus is equal to the probability of one mutation, P(1), squared. I don't have a good rule for picking; I observe that, whatever their rates are, CDYa,b can have more than one mutation per entry in a relatively short time, hundreds of years. Maybe you can scale from their rate to estimate which DYS loci have a low probability of two mutations in 1K years and so on?

When I say bounded I mean that (excepting multisteps) the mutational process at a DYS locus is bounded/confined to modal +/-1.
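
A rough sketch of the scaling idea in the quoted paragraph, for anyone who wants to try it. It assumes mutations at a locus arrive as a Poisson process, 25 years per generation, and three made-up per-generation rates (slow, medium, fast); none of these numbers come from the thread. Under that model the chance of two or more mutations is roughly the squared expected count divided by two when the expected count is small, which is close to, but not exactly, the "P(1) squared" rule of thumb.

```python
import math

def prob_at_least(k, rate_per_gen, years, years_per_gen=25.0):
    """P(at least k mutations) at one locus, assuming a Poisson process.

    rate_per_gen and years_per_gen are illustrative assumptions.
    """
    lam = rate_per_gen * years / years_per_gen  # expected number of mutations
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

# Hypothetical rates, only to show how the probabilities scale with rate.
for label, mu in [("slow", 0.0005), ("medium", 0.002), ("fast", 0.01)]:
    p1 = prob_at_least(1, mu, 1000)
    p2 = prob_at_least(2, mu, 1000)
    print(f"{label:6s} mu={mu:.4f}  P(>=1)={p1:.4f}  P(>=2)={p2:.4f}")
```

A locus could then be kept for a given time frame if P(>=2) stays under whatever cutoff one is comfortable with; the cutoff itself is a judgment call.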

You don't have to agree with the results, but please provide specifics on your argument so it can be tested in some manner.

I think we've gone over this, but CDYa,b are multi-copy markers and no one that I know of uses them in TMRCA calculations. They are already excluded from the argument.  I exclude DYS385, YCAII, DYS464, DYS459, DYS413, DYS395s1, DYS425 (possible null), DYS439 (possible null) in any of my STR variance calculations. I do include those on straight GD calculations using modified infinite allele techniques.

I have played with adding and subtracting STRs and comparing relative variance across haplogroups. I've done this more systematically with the linearity estimates Marko Heinila has provided.  I can tell you, it doesn't make much difference as long as you get enough STRs (individual experiments) going. The benefits of the law of large numbers seem to apply.

I am not going to do extra research and gyrations unless you can be specific on what you want to test and do your own homework.   Do you want to improve the processes? Or do you just not like the answers?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 11:23:21 AM
I posted this on DNA-forums last year.

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007276

Our findings suggest that Y chromosome STRs of increased repeat unit size have a lower rate of evolution, which has significant relevance in population genetic and evolutionary studies. ...

umm... this is making a little more sense to me in terms of the academic back and forth.

"Decreased Rate of Evolution in Y Chromosome STR Loci of Increased Size of the Repeat Unit" by Jarve also includes Zhivotovsky as an author.  Zhivotovsky is the guy who gets his name hung on as the label for the famous (or infamous) evolutionary mutation rates.  I should go try to find Nordtvedt's Rootsweb posts. He really just plain calls the Zhivotovsky evolutionary rates bad science.  That's another side discussion, but it would make sense that given criticism, Zhivotovsky would need to go out and find some bad STRs to help support what some people call his times 3 fudge factor.

Nevertheless, some STRs probably do behave non-linearly outside of certain time ranges. Marko Heinila addressed this with a statistical analysis across tens of thousands of haplotypes. Don't ask me about his method. He's way beyond me. It seemed  logical when he presented it on the "TMRCA report" thread (Aug 2011) on DNA forums. I don't remember any arguments against his methods.

Here were all the markers where "timeframe for each locus where saturation effects are relatively insignificant" were greater than 5000 years.  I don't use the multi-copy markers, even if he included them.
 
Quote
 DYS426            > 100000
 DYS447            > 100000
 DYS590            > 100000
 DYS641            > 100000
 DYS472            > 100000
 DYS425            > 100000
 DYS436            > 100000
 DYS490            > 100000
 DYS450            > 100000
 DYS617            > 100000
 DYS492            > 100000
 DYF395S1b           93052
 DYS455              92365
 DYS388              91912
 DYS392              63939
 DYS438              44590
 DYS578              42906
 DYS448              35579
 DYS454              32780
 YCAIIa              32468
 DYS385a             31095
 DYS520              26205
 DYS531              24566
 DYS446              24038
 DYS594              24008
 YCAIIb              23585
 DYS385b             23191
 DYS640              22915
 DYS568              16304
 DYS607              15957
 DYS557              15291
 DYS481              14970
 DYS413b             14512
 DYS537              13943
 DYS437              13858
 DYF395S1a           13021
 DYS487              11721
 DYF406S1            11405
 DYS570              10071
 DYS565               9546
 DYS393               9512
 DYS459a              8550
 DYS413a              8471
 DYS449               8044
 DYS19a               7964
 DYS390               7178
 DYS511               5475
 DYS572               5285
 DYS442               5260
 DYS444               5163

In my "36 linear" marker set I'm not using the ones at the bottom, like DYS572. I'm only using STRs with timeframes greater than 7000 years (to cover the Neolithic time.)  As I've said, I don't use any multi-copy markers.
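
For anyone who wants to repeat the selection, here is a minimal sketch of the filter described above: keep markers whose linear timeframe exceeds 7000 years and drop the excluded ones. The durations are a subset copied from the quoted list (entries shown as "> 100000" are stored as 100000), the exclusion prefixes follow the markers named earlier in this thread, and the function and variable names are made up for illustration. It is not claimed to reproduce the actual "36 linear" set.

```python
# Subset of the quoted linear-duration list, in years (">100000" stored as 100000).
LINEAR_YEARS = {
    "DYS426": 100000, "DYS425": 100000, "DYS392": 63939, "DYS438": 44590,
    "YCAIIa": 32468, "DYS385a": 31095, "DYS393": 9512, "DYS459a": 8550,
    "DYS390": 7178, "DYS511": 5475, "DYS572": 5285,
}

# Multi-copy / possible-null markers excluded in the thread. "DYF395S1" is an
# assumed spelling for the marker written "DYS395s1" in the post.
EXCLUDED_PREFIXES = ("DYS385", "YCAII", "DYS464", "DYS459", "DYS413",
                     "DYF395S1", "DYS425", "DYS439", "CDY")

def select_linear(durations, min_years=7000, excluded=EXCLUDED_PREFIXES):
    """Return marker names with duration > min_years, longest first."""
    kept = [m for m, years in durations.items()
            if years > min_years and not m.startswith(excluded)]
    return sorted(kept, key=lambda m: durations[m], reverse=True)

print(select_linear(LINEAR_YEARS))
```

Changing min_years to 5000 lets DYS511 and DYS572 back in, which is an easy way to see how sensitive a variance comparison is to the cutoff.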


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 12:29:56 PM
You have an argument. This is fine.

...
I am arguing that most tetra motif dys loci don't accumulate variance with time.

I ask you for some detail so I can modify my variance calculations and look at STRs you think are appropriate. I'm volunteering to do this for you.  I don't really think this effort is going to lead to anything, but I'm willing to test your argument with real data like I have on Marko's "linear markers" or Ken's idea of "more markers is better except multi-copy, etc."

What tetra STR markers out of FTDNA's first 67 should be eliminated?  Please provide the list.  It should be easy to run a couple of comparisons.   Maybe this will line up with Marko Heinila's linear duration analysis in which case the "36 linear" markers that I use will be appropriate.

Below is your answer.  My request to help you is simple but you are not helping me help you.

The set you probably should use depends on the time frame of interest. This was Busby's observation, but not his practice, if I read his paper correctly. It's a probability issue. For independent events, as mutations are, the probability of two mutations at a locus is equal to the probability of one mutation, P(1), squared. I don't have a good rule for picking; I observe that, whatever their rates are, CDYa,b can have more than one mutation per entry in a relatively short time, hundreds of years. Maybe you can scale from their rate to estimate which DYS loci have a low probability of two mutations in 1K years and so on?

When I say bounded I mean that (excepting multisteps) the mutational process at a DYS locus is bounded/confined to modal +/-1.

You don't have to agree with the results, but please provide specifics on your argument so it can be tested in some manner.

I think we've gone over this, but CDYa,b are multi-copy markers and no one that I know of uses them in TMRCA calculations. They are already excluded from the argument.  I exclude DYS385, YCAII, DYS464, DYS459, DYS413, DYS395s1, DYS425 (possible null), DYS439 (possible null) in any of my STR variance calculations. I do include those on straight GD calculations using modified infinite allele techniques.

I have played with adding and subtracting STRs and comparing relative variance across haplogroups. I've done this more systematically with the linearity estimates Marko Heinila has provided.  I can tell you, it doesn't make much difference as long as you get enough STRs (individual experiments) going. The benefits of the law of large numbers seem to apply.

I am not going to do extra research and gyrations unless you can be specific on what you want to test and do your own homework.   Do you want to improve the processes? Or do you just not like the answers?

I can't answer many of your queries.  I think it is important first to agree, or disagree, on my premise that many of the DYS loci (medium rate) are limited/bounded.  I've provided a dataset that suggests they are, but I think we need more data.

A prior paper by Goldstein, referenced in Busby, gives a linearity equation.  That's what Busby used.  I don't know what range of values for each STR Marko used.  If he didn't recognize the problem with multisteps, I would question his definition of linearity.

I'm not asking you to run any test cases yet since I don't know how to specify what you are asking.  If someone who is much cleverer with S/W than I am could create some distribution tables, then we can evaluate that data and determine the next step.

I know Ken's opinion of Zhiv.  That said, a lot of folks, as you know, who are knowledgeable are supportive of his approach.  What I'm trying to do is to come up with an understanding of why he had to fudge the data sets referenced by Chandler.  I don't think we are chasing ghosts here.

I appreciate all the attention you've paid to my comments.  I am limited in what guidance I can provide.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 26, 2012, 12:46:05 PM

Here were all the markers where "timeframe for each locus where saturation effects are relatively insignificant" were greater than 5000 years.  I don't use the multi-copy markers, even if he included them.
 
Quote
 DYS426            > 100000
 DYS447            > 100000
 DYS590            > 100000
 DYS641            > 100000
 DYS472            > 100000
 DYS425            > 100000
 DYS436            > 100000
 DYS490            > 100000
 DYS450            > 100000
 DYS617            > 100000
 DYS492            > 100000
 DYF395S1b           93052
 DYS455              92365
 DYS388              91912
 DYS392              63939
 DYS438              44590
 DYS578              42906
 DYS448              35579
 DYS454              32780
 YCAIIa              32468
 DYS385a             31095
 DYS520              26205
 DYS531              24566
 DYS446              24038
 DYS594              24008
 YCAIIb              23585
 DYS385b             23191
 DYS640              22915
 DYS568              16304
 DYS607              15957
 DYS557              15291
 DYS481              14970
 DYS413b             14512
 DYS537              13943
 DYS437              13858
 DYF395S1a           13021
 DYS487              11721
 DYF406S1            11405
 DYS570              10071
 DYS565               9546
 DYS393               9512
 DYS459a              8550
 DYS413a              8471
 DYS449               8044
 DYS19a               7964
 DYS390               7178
 DYS511               5475
 DYS572               5285
 DYS442               5260
 DYS444               5163

In my "36 linear" marker set I'm not using the ones at the bottom, like DYS572. I'm only using STRs with timeframes greater than 7000 years (to cover the Neolithic time.)  As I've said, I don't use any multi-copy markers.

Perhaps it would be good to know what methodology he used, because he gets linearity durations that are three to four times greater than those previously reported in the Busby et al. (2011) study.

For example, Busby et al. get 19244 ybp of linearity for DYS392, whereas the list above shows 63939 ybp, which is 3.3 times greater. DYS438: 12465 ybp (Busby et al.) vs. 44590 ybp (above), again 3.6 times greater. DYS437: 4357 ybp (Busby et al.) vs. 13858 ybp (above). DYS19: 1888 ybp (Busby et al.) vs. 7964 ybp (above).

There are some STRs, such as DYS439, DYS635, DYS456, DYS389I, DYS389II, DYS458 and Y-GATA-H4, that I couldn’t find above. Others do not differ by as much: DYS448 gets 25381 ybp in Busby et al. vs. 35579 ybp above, and DYS393 gets 5648 ybp in Busby et al. vs. 9512 ybp above. The exception is DYS390, which gets 9211 ybp in Busby et al. vs. 7178 ybp above. The main point here is that out of 7 STRs that overlap in both cases, 6 have their linearity inflated. What’s worse, STRs such as DYS437, DYS19 and DYS393, which are being used as “most linear” because they show a linearity of more than 7000 ybp above, actually show a linearity well below 7000 ybp in Busby et al.

Again, I don’t know how that person came up with those numbers. I know how Busby et al. came up with theirs: they were based on the observed range of alleles at each locus and the mutation rates measured in father-son pairs.
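
The comparison above is easy to reproduce. This is a small sketch using only the seven overlapping markers and the figures quoted in this thread; the dictionary and variable names are just for illustration, and DYS19 is matched to the entry listed as DYS19a.

```python
# Duration-of-linearity estimates in years, as quoted in this thread.
busby = {"DYS448": 25381, "DYS392": 19244, "DYS438": 12465, "DYS390": 9211,
         "DYS393": 5648, "DYS437": 4357, "DYS19": 1888}
heinila = {"DYS448": 35579, "DYS392": 63939, "DYS438": 44590, "DYS390": 7178,
           "DYS393": 9512, "DYS437": 13858, "DYS19": 7964}  # DYS19 listed as DYS19a

# Ratio of the second list to Busby et al. for each shared marker.
ratios = {m: heinila[m] / busby[m] for m in busby}
larger = sorted(m for m, r in ratios.items() if r > 1)

# Markers that clear a 7000-year cutoff in one list but not in Busby et al.
flipped = sorted(m for m in busby if busby[m] < 7000 <= heinila[m])

for m in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{m:8s} {busby[m]:6d} {heinila[m]:6d}  x{ratios[m]:.1f}")
print(len(larger), "of", len(busby), "markers are longer in the second list")
print("cross the 7000-year cutoff only in the second list:", flipped)
```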


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 12:46:12 PM
....    I don't know what range of values for each STR Marko used.  If he didn't recognize the problem with multisteps, I would question his definition of linearity. ...

You were the one who referred me to Marko Heinila. I was not familiar with him until you put me in contact with him.  He definitely recognizes and tries to account for back-mutations and multi-step mutations. It is my understanding that is why he chose to use the "maximum likelihood" method.

His definition of linearity is very clear.
Quote from: Marko Heinila
timeframe for each locus where saturation effects are relatively insignificant

I am beginning to think you won't accept anything that does not fit your preset theories on Doggerland or on various clans.  If that is the actual basis for your disagreement, that is fine - just say so.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 12:56:19 PM
Perhaps it would be good to know what methodology he used, because he gets linearity durations that are three to four times greater than those previously reported in the Busby et al. (2011) study. ....

I agree.  I do not understand how Marko T. Heinila derived the linear duration estimates.  I read his original postings at DNA forums (I think it was under U106 TMRCA report) and I was impressed with the large amount of data he used and his application of statistics, but he lost me. He is a scientist at the Helsinki University of Technology (something to do with their Low Temperature Lab.)

This is from memory, but my impression of Busby's research on linearity was that the data was very limited.  I've read on Rootsweb where you'd need to study some unbelievably high number of father-son transmissions to determine linearity that way.
 
Quote from: Busby
Fifteen Y-STRs with mutation rates, range of alleles and estimate of duration of linearity. All STRs investigated in this study are shown with their mutation rates (μ), estimated from Ballantyne et al. [33], and range of observed alleles, R, with 95% CI is taken from the YHRD [34]. θ(R)/2μ is an estimate of the duration of linearity of an STR."

This is an excerpt of that table with the duration of linearity column. I added the "S127"s and "xxx"s to show which markers he used in his S127 (L11) STR variance calculations and which ones appeared, according to his own method, to be too short in duration to go back over 5000 years.
Quote from: Busby

Y-STR       θ(R)/2μ
DYS448      25381
DYS392      19244    S127
DYS438      12465    S127
DYS390       9211    S127
DYS393       5648    S127
DYS439       4861    S127 xxx
DYS437       4357    S127 xxx
DYS635       4221
DYS456       3289
DYS389II     3111    S127 xxx
DYS391       2554    S127 xxx
DYS458       1944
DYS19        1888    S127 xxx
Y-GATA-H4    1630
DYS389I       953    S127 xxx

Maybe you can figure this out.  This is where Busby lost me. Six of the ten STRs he used weren't close to fit for the purpose unless you assume R-L11 is younger still.  Why did he use these STRs in his analysis to show R-L11 was of about the same age all across Europe? It's probably true, but it seems like Busby et al. argue against themselves in their efforts to knock Balaresque.

Anyway, given 1) my inability to see consistency in Busby's stuff, 2) criticism that much larger data sets on father-son events are needed, 3) Heinila's strong background and apparent grasp of statistics, 4) the large number of haplotypes Heinila analyzed, and 5) the fact that he analyzed all of FTDNA's 1st 67 STRs...     I started using Heinila's results as the basis for analyzing haplotype data with "linear" STRs only.
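
The "six of the ten" count can be checked directly from the excerpt. A minimal sketch, using the ten markers flagged S127 and their duration figures as given in the table; the 5000-year horizon is the one mentioned in the post, and the function name is just for illustration.

```python
# The ten STRs marked "S127" in the table excerpt, with their
# duration-of-linearity estimates in years.
BUSBY_S127 = {
    "DYS392": 19244, "DYS438": 12465, "DYS390": 9211, "DYS393": 5648,
    "DYS439": 4861, "DYS437": 4357, "DYS389II": 3111, "DYS391": 2554,
    "DYS19": 1888, "DYS389I": 953,
}

def too_short(durations, horizon_years):
    """Markers whose estimated linear duration falls short of the horizon."""
    return sorted(m for m, years in durations.items() if years < horizon_years)

short = too_short(BUSBY_S127, 5000)
print(len(short), "of", len(BUSBY_S127), "markers fall short of 5000 years:", short)
```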

... but it doesn't matter much anyway, at least in the large R1b subclades I have data for.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 26, 2012, 01:11:49 PM
This is from memory, but my impression of Busby's research on linearity was that the data was very limited.  I've read on Rootsweb where you'd need to study some unbelievably high number of father-son transmissions to determine linearity that way.

So you think that calibration using some hypothetical clan project where everyone is presumed to be equally removed from the same ancestor, and that the mutation rates of STRs are linear throughout is a better fit, than father-son pairs where one can get a grasp of the actual number of mutations that occur across a generation?

The person who claims that father-son rates are useless is Klyosov, but we know that father-son rates are by far way more reliable than the mutation rates found using calibrations of the Donald Clan.

For one, when one counts mutations at a locus in a clan project, one is assuming the following:

1-Everyone is equally removed from the presumed common ancestor.
2- The haplotype that minimizes mutations is the modal haplotype, however we have no way of knowing if that was the haplotype of that ancestor, unless we actually test his DNA.
3-While loss of linearity might have no effect on determining the mutation rate of locus X in a time range of 1300 ybp, one can’t extrapolate that to longer time frames.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 26, 2012, 01:19:41 PM

I can't answer many of your queries.  I think it is important first to agree, or disagree, on my premise that many of the dys loci (medium rate) are limited/bounded.  I've provided a dataset that suggests they are, but I think we need more data.

A prior paper by Goldstein, referenced in Busby, gives a linearity equation.  That's what Busby used.  I don't know what range of values for each STR Marko used.  If he didn't recognize the problem with multisteps, I would question his definition of linearity.

I'm not asking you to run any test cases yet since I don't know how to specify what you are asking.  If someone who is much cleverer with S/W than I am could create some distribution tables, then we can evaluate that data and determine the next step.

I know Ken's opinion of Zhiv.  That said, a lot of folks, as you know, who are knowledgeable are supportive of his approach.  What I'm trying to do is to come up with an understanding of why he had to fudge the data sets referenced by Chandler.  I don't think we are chasing ghosts here.

I appreciate all the attention you've paid to my comments.  I am limited in what guidance I can provide.

How big is this problem of multi-step mutations?

I can't say I come across it very often myself, and I just had a look at a large family group in a project run by a well known and respected researcher.

Out of the 67 people in this group there were a total of 90 off-modal values in the first 37 markers. Only three people had values that were more than 1 away from the modal values: one was in YCAII and the other two in CDY, and these loci aren't used in variance calculations.

70% of the loci in this group had at least one person with a mutation from the family modal.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 01:30:52 PM
This is from memory, but my impression of Busby's research on linearity was that the data was very limited.  I've read on Rootsweb where you'd need to study some unbelievably high number of father-son transmissions to determine linearity that way.
So you think that calibration using some hypothetical clan project where everyone is presumed to be equally removed from the same ancestor, and that the mutation rates of STRs are linear throughout is a better fit, than father-son pairs where one can get a grasp of the actual number of mutations that occur across a generation?
No, no, I do not agree with Klyosov and in no way intend to support the hypothetical clan model. To me it would be like flipping a coin ten times at the North Pole and, if it came out 7 heads and 3 tails, assuming that to be the case everywhere, millions of times over.

The person who claims that father-son rates are useless is Klyosov, but we know that father-son rates are by far way more reliable than the mutation rates found using calibrations of the Donald Clan.

For one, when one counts mutations at a locus in a clan project, one is assuming the following:

1-Everyone is equally removed from the presumed common ancestor.
2- The haplotype that minimizes mutations is the modal haplotype, however we have no way of knowing if that was the haplotype of that ancestor, unless we actually test his DNA.
3-While loss of linearity might have no effect on determining the mutation rate of locus X in a time range of 1300 ybp, one can’t extrapolate that to longer time frames.

I was talking about how Busby determined linearity timeframes (not mutation rates), but I honestly don't understand the "supposed" limitation in the data he used to calculate linearity.  I was going from memory on what I thought he did.  I'll go back and recheck. Maybe what he did is very thorough; it just seemed limited at the time I read it.

BTW, I did go back to the Myres and Balaresque papers and reread where they got their data from and how they put it together.  I can repeat the variance calculations, etc., but I can't at all defend how they got their data from different sources.  It makes me nervous.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 01:42:58 PM
How big is this problem of multi-step mutations?

I can't say I come across it very often myself, and I just had a look at a large family group in a project run by a well known and respected researcher.

Out of the 67 people in this group there were a total of 90 off-modal values in the first 37 markers. Only three people had values that were more than 1 away from the modal values: one was in YCAII and the other two in CDY, and these loci aren't used in variance calculations.

70% of the loci in this group had at least one person with a mutation from the family modal.

As far as I can tell it isn't a big problem, but there are some people who are very concerned about it.

I do have one case in a project that I have where it appears that one subgroup jumped from 568=11 to 568=8, at least we think it did.  This does throw off TMRCA calculations when you have small groups.    When you examine the haplotypes, the faster STRs barely moved so we concluded this was a multi-step jump.

However, at the subclade level, R-L193 in this case, it gets washed out (TMRCA wise) anyway. It may be akin to saying one of the 67 STR experiments went bad, but with that many individual STR experiments and 100 haplotypes it doesn't impact anything.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 26, 2012, 01:56:20 PM
How big is this problem of multi-step mutations?

I can't say I come across it very often myself, and I just had a look at a large family group in a project run by a well known and respected researcher.

Out of the 67 people in this group there were a total of 90 off-modal values in the first 37 markers. Only three people had values that were more than 1 away from the modal values: one was in YCAII and the other two in CDY, and these loci aren't used in variance calculations.

70% of the loci in this group had at least one person with a mutation from the family modal.

As far as I can tell it isn't a big problem, but there are some people who are very concerned about it.

I do have one case in a project that I have where it appears that one subgroup jumped from 568=11 to 568=8, at least we think it did.  This does throw off TMRCA calculations when you have small groups.    When you examine the haplotypes, the faster STRs barely moved so we concluded this was a multi-step jump.

However, at the subclade level, R-L193 in this case, it gets washed out (TMRCA wise) anyway. It may be akin to saying one of the 67 STR experiments went bad, but with that many individual STR experiments and 100 haplotypes it doesn't impact anything.

My feelings exactly.

I recently came across somebody who had 12-30 at 389 in a relatively young cluster, whereas nearly everybody else had 12-28, which presumably was a jump of two, but it's not common.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 26, 2012, 03:23:22 PM
Perhaps it would be good to know what methodology he used, because he gets linearity durations that are three to four times greater than those previously reported in the Busby et al. (2011) study. ....

I agree.  I do not understand how Marko T. Heinila derived the linear duration estimates.  I read his original postings at DNA forums (I think it was under U106 TMRCA report) and I was impressed with the large amount of data he used and his application of statistics, but he lost me. He is a scientist at the Helsinki University of Technology (something to do with their Low Temperature Lab.)

He definitely analyzed the differences between up and down STR mutations. Unfortunately his links are no longer active.

Quote from: Rootsweb Heinila and Vernade
From: Marko Heinila
Subject: Re: [DNA] Understanding STR mutation rates
Date: Fri, 10 Jun 2011 09:37:32 -0600

Thanks for posting those links below. This is more obviously related
to the more detailed Markov chain model discussions that are
(finally!) becoming more popular on this list... the links provide
some information related the many parameters involved. There seems to
be no perfect estimation methods for these features, what is suggested
empirically in the links is hopefully better than nothing for a
starting point. Marko Heinila

From: vernade didier
Subject: Re: [DNA] Understanding STR mutation rates
Date: Tue, 7 Jun 2011 12:36:09 +0100 (BST)

This is off list because I don't know what Marko Heinila wants. I was
contacted by Marko Heinila and he provided links to his site :
http://beforepresent.dyndns.info/
http://beforepresent.dyndns.info/updownratio.php
http://beforepresent.dyndns.info/lengtheffect.php
http://beforepresent.dyndns.info/rates.php
http://beforepresent.dyndns.info/weights.php

The point is about mutation rates. This work seems interesting.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 26, 2012, 03:34:45 PM
I have no theory on Doggerland; it's pure speculation from my point of view.

I very much respect Marko's work.  I simply was trying to respond to your query. If Marko assumed that each DYS locus's spread was all due to single steps, then I would question how he defined linear, based on work I did this winter.

As I understand it Marko is currently a quant working for a hedge fund in NYC and is now an American citizen.

Additionally, I have no agenda.  I simply think I may have an idea why the Zhiv folks had to apply a fudge factor to father/son rates.

As I pointed out previously, we need to better understand all aspects of the Y STR mutational process based on the data that has been accumulated.

re: multisteps.  I have seen estimates as high as 10% of all mutations, but I think 5% is a more solid number.  A multistep of 4 generates 16 times as much variance/ASD as a single step at a DYS locus; that's just the way the model works.
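
The arithmetic behind that can be sketched quickly. Variance/ASD is built from squared repeat differences, so a jump of k repeats counts k squared times as much as a single step. The sketch below also shows how much a given share of multisteps would inflate the average squared step; it assumes, purely for illustration, that every multistep has the same size, which is a simplification and not something measured in this thread.

```python
def squared_contribution(step_size):
    """Contribution of one mutation of the given size to squared distance."""
    return step_size ** 2

def mean_squared_step(multistep_fraction, step_size=2):
    """Average squared step when a fraction of mutations jump step_size repeats.

    Assumes all multisteps share one size (a simplification for illustration).
    """
    single = (1.0 - multistep_fraction) * squared_contribution(1)
    multi = multistep_fraction * squared_contribution(step_size)
    return single + multi

print(squared_contribution(4) / squared_contribution(1))  # ratio for a 4-step jump
for frac in (0.05, 0.10):
    for size in (2, 4):
        print(f"fraction={frac:.2f} size={size}: "
              f"mean squared step = {mean_squared_step(frac, size):.2f}")
```

With 5% two-step mutations the average squared step rises from 1.00 to 1.15, so the effect on variance depends heavily on how large the rare jumps are, not just how often they occur.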


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: razyn on April 26, 2012, 04:19:43 PM
Perhaps it would be good to know what methodology he used, because he gets linearity durations that are three to four times greater than those previously reported in the Busby et al. (2011) study. ....

I agree.  I do not understand how Marko T. Heinila derived the linear duration estimates.  I read his original postings at DNA forums (I think it was under U106 TMRCA report) and I was impressed with the large amount of data he used and his application of statistics, but he lost me. He is a scientist at the Helsinki University of Technology (something to do with their Low Temperature Lab.)

He definitely analyzed the differences between up and down STR mutations. Unfortunately his links are no longer active.

His links aren't, but he is.  A few days ago, he was recovering from long distance flights.  He's aware of the current activity on this forum, and with luck we may hear from him.  He will be analyzing data from 4/17/12, which is recent enough to have results for several of the new SNPs under Z196 (among other new things).  I don't know when his new analysis will be complete, what length haplotypes it will cover, or where it will be visible.  But I'm sure it will be interesting, if we can find it.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 27, 2012, 09:54:57 AM
I have no theory on Doggerland; it's pure speculation from my point of view.

I very much respect Marko's work.  I simply was trying to respond to your query. If Marko assumed that each DYS locus's spread was all due to single steps, then I would question how he defined linear, based on work I did this winter.

As I understand it Marko is currently a quant working for a hedge fund in NYC and is now an American citizen.

Additionally, I have no agenda.  I simply think I may have an idea why the Zhiv folks had to apply a fudge factor to father/son rates.

As I pointed out previously, we need to better understand all aspects of the Y STR mutational process based on the data that has been accumulated.

re: multisteps.  I have seen estimates as high as 10% of all mutations, but I think 5% is a more solid number.  A multistep of 4 generates 16 times as much variance/ASD as a single step at a DYS locus; that's just the way the model works.

It's okay to have questions and challenges, but it is also important to do some homework. We all had access to his web links last year. I thought I'd better get the information while I could, so he sent me a file folder of the documentation he had there when he closed it down.

I don't know if Marko's model is correct, but he certainly made efforts to account for concerns like the ones you have.  As you can see, he tries to calculate both up and down rates as well as multi-steps.

Quote from: Marko Heinila web link 2011

Y-Tree Mutation Rates

locus    mean RN    population rate    modal RN    modal rate    modal downrate    modal uprate    multisteps    Delta ln(uprate)    Delta ln(downrate)
DYS393    13.1    0.00104    13    0.00095    0.000336    0.000611    0.03    0.11    0.94
DYS390    23.6    0.00228    24    0.00243    0.001202    0.001227    0.02    0.15    0.58
DYS19a    14.3    0.00161    14    0.00129    0.000241    0.001046    0.03    0.23    0.96
DYS391    10.5    0.00289    10    0.00149    0.000355    0.001137    0.01    0.00    1.88
DYS385a    12.1    0.00172    11    0.00134    0.000421    0.000921    0.08    0.11    0.29
DYS385b    14.6    0.00334    14    0.00294    0.001112    0.001829    0.10    0.11    0.22
DYS426    11.6    0.00013    12    0.00017    0.000081    0.000090    0.07    0.95    0.82
DYS388    12.6    0.00052    12    0.00034    0.000074    0.000263    0.15    0.33    0.57
DYS439    11.6    0.00499    12    0.00523    0.002273    0.002953    0.00    0.25    1.02
DYS389i    13.0    0.00213    13    0.00188    0.000615    0.001267    0.02    0.05    1.26
DYS392    12.3    0.00058    13    0.00069    0.000221    0.000469    0.10    0.34    0.51
DYS389B    16.4    0.00292    16    0.00231    0.000624    0.001689    0.03    0.18    0.86
DYS458    16.7    0.00697    17    0.00711    0.003145    0.003968    0.04    0.17    0.40
DYS459a    8.7    0.00045    9    0.00043    0.000286    0.000141    0.07    -0.85    1.19
DYS459b    9.6    0.00118    10    0.00152    0.001339    0.000177    0.01    -0.81    1.87
DYS455    10.6    0.00026    11    0.00028    0.000151    0.000130    0.06    0.17    0.46
DYS454    11.1    0.00029    11    0.00026    0.000061    0.000198    0.09    0.52    1.40
DYS447    24.7    0.00304    25    0.00313    0.001629    0.001501    0.06    0.11    0.12
DYS437    14.9    0.00085    15    0.00082    0.000509    0.000314    0.03    0.08    0.74
DYS448    19.4    0.00138    19    0.00124    0.000593    0.000647    0.03    0.14    0.31
DYS449    29.5    0.00843    29    0.00777    0.003120    0.004647    0.05    0.06    0.18
DYS460    10.7    0.00342    11    0.00346    0.001811    0.001652    0.01    -0.09    1.42
Y-GATA-H4    10.7    0.00230    11    0.00248    0.001398    0.001082    0.01    0.12    0.91
YCAIIa    19.2    0.00046    19    0.00042    0.000188    0.000228    0.30    -0.04    0.51
YCAIIb    22.1    0.00094    23    0.00106    0.000737    0.000326    0.18    -0.03    0.27
DYS456    15.4    0.00508    15    0.00380    0.001457    0.002340    0.02    0.17    0.70
DYS607    14.6    0.00217    15    0.00235    0.001288    0.001062    0.04    0.18    0.38
DYS576    17.5    0.01085    18    0.01152    0.005357    0.006166    0.03    0.14    0.44
DYS570    17.9    0.00874    17    0.00677    0.002876    0.003894    0.04    0.17    0.28
CDYa    35.4    0.01265    36    0.01300    0.006602    0.006399    0.07    0.02    0.14
CDYb    37.5    0.01632    38    0.01681    0.008045    0.008763    0.08    0.08    0.18
DYS442    12.0    0.00317    12    0.00291    0.001211    0.001699    0.02    0.24    0.73
DYS438    11.1    0.00049    12    0.00058    0.000270    0.000305    0.07    0.08    0.37
DYS531    11.0    0.00044    11    0.00043    0.000088    0.000339    0.08    0.34    1.05
DYS578    8.5    0.00019    9    0.00023    0.000064    0.000163    0.08    0.32    1.08
DYF395S1a    15.2    0.00042    15    0.00040    0.000113    0.000286    0.08    -0.66    0.78
DYF395S1b    15.9    0.00032    16    0.00031    0.000143    0.000171    0.07    0.38    0.64
DYS590    8.0    0.00013    8    0.00013    0.000045    0.000084    0.07
DYS537    10.5    0.00104    10    0.00076    0.000088    0.000671    0.00    0.03    1.49
DYS641    10.0    0.00030    10    0.00030    0.000066    0.000238    0.05
DYS472    8.0    0.00002    8    0.00002    0.000000    0.000016    0.00
DYF406S1    10.3    0.00164    10    0.00137    0.000422    0.000950    0.02    0.19    0.67
DYS511    9.9    0.00138    10    0.00133    0.000456    0.000877    0.02    0.19    1.22
DYS425    12.1    0.00013    12    0.00012    0.000059    0.000064    0.19    0.53    0.49
DYS413a    22.0    0.00197    23    0.00233    0.001778    0.000552    0.20    -0.06    0.32
DYS413b    22.8    0.00139    23    0.00136    0.000579    0.000781    0.17    0.08    0.46
DYS557    15.9    0.00331    16    0.00327    0.001308    0.001958    0.05    0.12    0.27
DYS594    10.2    0.00046    10    0.00044    0.000109    0.000331    0.08    -0.04    0.67
DYS436    12.0    0.00010    12    0.00010    0.000029    0.000070    0.18
DYS490    12.0    0.00021    12    0.00021    0.000041    0.000167    0.19    0.36    0.64
DYS534    15.4    0.00755    15    0.00643    0.002552    0.003879    0.04    0.14    0.41
DYS450    8.0    0.00018    8    0.00018    0.000055    0.000125    0.10    0.33    0.54
DYS444    12.4    0.00334    12    0.00269    0.000993    0.001700    0.01    0.14    0.68
DYS481    23.1    0.00421    22    0.00334    0.001164    0.002178    0.11    0.10    0.25
DYS520    20.2    0.00171    20    0.00159    0.000565    0.001026    0.03    0.14    0.32
DYS446    13.2    0.00335    13    0.00312    0.001210    0.001906    0.04    0.13    0.23
DYS617    12.3    0.00061    12    0.00054    0.000150    0.000389    0.08    0.31    0.44
DYS568    11.1    0.00048    11    0.00040    0.000181    0.000224    0.08    0.26    1.41
DYS487    13.0    0.00087    13    0.00081    0.000172    0.000638    0.10    0.23    1.02
DYS572    10.9    0.00113    11    0.00110    0.000733    0.000372    0.02    -0.23    1.06
DYS640    11.2    0.00037    11    0.00029    0.000039    0.000256    0.04    0.13    2.12
DYS492    12.1    0.00025    12    0.00023    0.000068    0.000163    0.11    0.34    0.64
DYS565    11.7    0.00073    12    0.00081    0.000502    0.000309    0.05    0.29    1.26
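
As a rough check on the up/down question, here is a minimal Python sketch that takes a handful of rows from the table above and computes what share of the modal mutations go up. The rates are copied from the modal downrate and modal uprate columns; the function names are my own, and this is only arithmetic on the published columns, not a reproduction of Marko's model.

```python
# (locus, modal downrate, modal uprate), copied from the table above.
ROWS = [
    ("DYS393", 0.000336, 0.000611),
    ("DYS390", 0.001202, 0.001227),
    ("DYS391", 0.000355, 0.001137),
    ("DYS459b", 0.001339, 0.000177),
    ("DYS447", 0.001629, 0.001501),
    ("CDYa", 0.006602, 0.006399),
]


def up_fraction(down, up):
    """Share of up plus down mutations at the modal allele that go up."""
    return up / (up + down)


def asymmetry_table(rows):
    """Map each locus to its up share, rounded to three decimals."""
    return {locus: round(up_fraction(down, up), 3) for locus, down, up in rows}


if __name__ == "__main__":
    for locus, frac in asymmetry_table(ROWS).items():
        print(f"{locus:8s} up share = {frac:.3f}")
```

On these rows the up share runs from roughly 0.12 (DYS459b) to about 0.76 (DYS391), while DYS390 and CDYa sit close to one half.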

Once again, let me reiterate that the fine tuning and disagreement among hobbyist-scientists doesn't matter much, at least in terms of R1b.

Heinila's intraclade TMRCA for R-L21 is 4.2k ybp. His interclade for the L21-U152 TMRCA is also 4.2k ybp.  Anatole Klyosov gets similar estimates for R-L21. I don't know if he does interclade estimates.  Using Ken Nordtvedt's Generations 7, I get the intraclade Coalescence Age for R-L21 as 3.7k ybp. With Gen 7, I get the L21-U152 interclade TMRCA as 4.4k ybp. The R1b subclade TMRCA estimates line up nicely across the board for these guys.

Quote from: Marko Heinila web link 2011

Time Values

haplogroup ____ shorthand _ proven _ predicted  intraclade interclade _ interclade_hg...
_________________________ members _ members (kyBP) (kyBP)
R1b1a2a1a1a _____ R-U106 ___ 1203 __ 3720 __ 4.5 ___ 4.5 __ R1b1a2a1a1b R-P312
R1b1a2a1a1b _____ R-P312 ___ 3403 __ 10724 _ 4.3 ___ 4.5 __ R1b1a2a1a1a R-U106
R1b1a2a1a1b3 ____ R-U152 ___ 439 ___ 1502 __ 4.2 ___ 4.2 __ R1b1a2a1a1b4 R-L21
R1b1a2a1a1b3c ___ R-L2 _____ 254 ___ 996 ___ 4.2 ___ 4.2 __ R1b1a2a1a1b4 R-L21
R1b1a2a1a1b4 ____ R-L21 ____ 2038 __ 6813 __ 4.2 ___ 4.2 __ R1b1a2a1a1b3  R-U152
R1b1a2a1a1b5a ___ R-SRY2627_ 126 ___ 341 ___ 3.5 ___ 3.8 __ R1b1a2a1a1b5b    R-L165

Moving up a step on the Y tree, you can see that Marko has the interclade TMRCA for P312-U106 as 4.5k ybp. Using Ken's method I get 4.5k ybp as well.  I gather Maliclavelli thinks it is ridiculous but I think you can see why I think these major R-L11 subclades were essentially of one clan or tribe at one time.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 27, 2012, 11:22:52 AM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  

You should probably look at his formulas and his powerpoint charts where he charts out and tries to explain his methodology.

I think the answer is something along the lines of what John Chandler is saying.  This is from reply #5 of this thread.

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, ...
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

My understanding of the explanation is that their mathematical model does not care about hidden mutations or even multi-step mutations. The mutation rates were derived based on visible mutations so, as long as they have adequate data to build the mutation rates, the way the TMRCA method uses them is consistent.  We should not think of the published mutation rate as literally the physical rate of change per the STR, but rather the observable rate of change.

What is required is that the STRs act somewhat consistently; in other words, the expected (predicted) rates up and down should be the same and the rates shouldn't change given the allele value, etc.   This would be where the concern about STRs reaching saturation and high allele values comes into play.  If an STR doesn't show linear duration (of its rate) during the timeframe we care about then it is not helpful.   The goal of the math model is to include STRs that are linear or "on average" (in aggregate) linear.

Today, Ken Nordtvedt posted a response related to some of these questions.
Quote from: Ken Nordtvedt
After all these years folks on (other) forums are still wondering or doubting variance measure between haplotypes properly counts back mutations. It does

Suppose an STR mutates twice between two haplotypes. The two could be up,up or up,down or down,up, or down,down. These four possibilities have equal probabilities of 1/4 if up and down mutations are equally likely.

So expected value of variance (which involves squaring STR repeat differences) is 4 x 2/4 + 0 x 2/4 = 2

Suppose an STR mutates three times between two haplotypes. The three could be up,up,up or up,up,down or up,down,up or down,up,up, or up,down,down, or down,up,down, or down,down,up, or down,down,down
Each of these eight alternatives of three mutations has probability 1/8.

So expected value of variance (which involves squaring STR repeat differences) is 9 x 2/8 + 1 x 6/8 = 3

Etc. Yes, variance totally takes into account back mutations and multiple mutations in same direction.......
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-04/1335537421
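
The arithmetic in Ken's post can be checked by brute force. This is a minimal Python sketch, assuming exactly what the quote assumes (single-step mutations, up and down equally likely); the function name is my own. It lists every up/down sequence and averages the squared net change.

```python
from itertools import product


def expected_squared_difference(n_mutations):
    """Average squared net repeat change over all equally likely
    up/down sequences of n single-step mutations."""
    outcomes = list(product((+1, -1), repeat=n_mutations))
    return sum(sum(seq) ** 2 for seq in outcomes) / len(outcomes)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expected_squared_difference(n))
```

For two and three mutations this reproduces the 2 and 3 in the quote, and the same pattern (expected squared difference equal to the number of mutations) continues for larger counts under these assumptions.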


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 27, 2012, 11:32:52 AM

Today, Ken Nordtvedt posted a response related to some of these questions.
Quote from: Ken Nordtvedt
After all these years folks on (other) forums are still wondering or doubting variance measure between haplotypes properly counts back mutations. It does

Suppose an STR mutates twice between two haplotypes. The two could be up,up or up,down or down,up, or down,down. These four possibilities have equal probabilities of 1/4 if up and down mutations are equally likely.

So expected value of variance (which involves squaring STR repeat differences) is 4 x 2/4 + 0 x 2/4 = 2

Suppose an STR mutates three times between two haplotypes. The three could be up,up,up or up,up,down or up,down,up or down,up,up, or up,down,down, or down,up,down, or down,down,up, or down,down,down
Each of these eight alternatives of three mutations has probability 1/8.

So expected value of variance (which involves squaring STR repeat differences) is 9 x 2/8 + 1 x 6/8 = 3

Etc. Yes, variance totally takes into account back mutations and multiple mutations in same direction.......
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-04/1335537421


What I bolded is a key assumption: that up and down mutations are equally likely to occur (i.e., that they have the same mutation rates). In reality they aren't; they have different mutation rates.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 27, 2012, 12:43:27 PM

Today, Ken Nordtvedt posted a response related to some of these questions.
Quote from: Ken Nordtvedt
After all these years folks on (other) forums are still wondering or doubting variance measure between haplotypes properly counts back mutations. It does

Suppose an STR mutates twice between two haplotypes. The two could be up,up or up,down or down,up, or down,down. These four possibilities have equal probabilities of 1/4 if up and down mutations are equally likely.

So expected value of variance (which involves squaring STR repeat differences) is 4 x 2/4 + 0 x 2/4 = 2

Suppose an STR mutates three times between two haplotypes. The three could be up,up,up or up,up,down or up,down,up or down,up,up, or up,down,down, or down,up,down, or down,down,up, or down,down,down
Each of these eight alternatives of three mutations has probability 1/8.

So expected value of variance (which involves squaring STR repeat differences) is 9 x 2/8 + 1 x 6/8 = 3

Etc. Yes, variance totally takes into account back mutations and multiple mutations in same direction.......
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-04/1335537421


What I bolded is a key assumption: that up and down mutations are equally likely to occur (i.e., that they have the same mutation rates). In reality they aren't; they have different mutation rates.

But are they different enough to cause the calculations to be out by much, and if so, how difficult would it be to correct?
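
One way to get a feel for the size of the effect is a toy calculation. The sketch below is Python, under my own simplifying assumptions: a single lineage measured against its ancestor, single-step mutations only, and a fixed probability p_up of each mutation going up. It is not Ken's or Marko's method, and the function names are hypothetical. It computes the expected squared net change by enumeration and by the closed form (variance of the walk plus squared drift).

```python
from itertools import product


def biased_expected_squared_change(n_mutations, p_up):
    """Expected squared net change after n single-step mutations when each
    step goes up with probability p_up and down otherwise (enumeration)."""
    total = 0.0
    for seq in product((+1, -1), repeat=n_mutations):
        ups = seq.count(+1)
        prob = p_up ** ups * (1 - p_up) ** (n_mutations - ups)
        total += prob * sum(seq) ** 2
    return total


def closed_form(n_mutations, p_up):
    """Same quantity: 4*n*p*(1-p) spread plus squared drift n*(2p-1)."""
    drift = n_mutations * (2 * p_up - 1)
    return 4 * n_mutations * p_up * (1 - p_up) + drift ** 2


if __name__ == "__main__":
    for n in (3, 10):
        for p in (0.5, 0.55, 0.6):
            print(n, p, round(closed_form(n, p), 3))
```

In this toy, a 60/40 split gives about 3.24 instead of 3 after three mutations, and 13.6 instead of 10 after ten, so the bias matters more as mutations pile up on one marker. How that carries over to a real variance calculation across many haplotypes is a separate question that this sketch does not answer.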


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 27, 2012, 01:03:12 PM

Today, Ken Nordtvedt posted a response related to some of these questions.
Quote from: Ken Nordtvedt
After all these years folks on (other) forums are still wondering or doubting variance measure between haplotypes properly counts back mutations. It does

Suppose an STR mutates twice between two haplotypes. The two could be up,up or up,down or down,up, or down,down. These four possibilities have equal probabilities of 1/4 if up and down mutations are equally likely.

So expected value of variance (which involves squaring STR repeat differences) is 4 x 2/4 + 0 x 2/4 = 2

Suppose an STR mutates three times between two haplotypes. The three could be up,up,up or up,up,down or up,down,up or down,up,up, or up,down,down, or down,up,down, or down,down,up, or down,down,down
Each of these eight alternatives of three mutations has probability 1/8.

So expected value of variance (which involves squaring STR repeat differences) is 9 x 2/8 + 1 x 6/8 = 3

Etc. Yes, variance totally takes into account back mutations and multiple mutations in same direction.......
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-04/1335537421


What I bolded is a key assumption: that up and down mutations are equally likely to occur (i.e., that they have the same mutation rates). In reality they aren't; they have different mutation rates.

But are they different enough to cause the calculations to be out by much, and if so, how difficult would it be to correct?

Ken will respond quickly today, I think, if you ask your questions.  
To subscribe to that forum, please send an email to Y-DNA-HAPLOGROUP-I-request@rootsweb.com with the word 'subscribe' without the quotes in the subject and the body of the message.

I'm sure he has thought about your issues and run simulations to determine how he should handle them in his methodologies, but if you are uncomfortable that he either hasn't worked this out thoroughly or doesn't understand, then you should ask him directly. My only advice is to read his web site documentation thoroughly first so we don't waste his time.

My opinion is that Nordtvedt really understands and I know he runs simulations against different mathematically modeled concepts.  I'm not saying you think differently of him, but you should delve into the areas you are concerned about directly.

I've got the conversation going on this topic at the forum Ken frequents, so please jump in.

I think it is easy to misunderstand simple illustrations. One of your concerns is his assumption
Quote from: JeanL
that up and down mutations are equally likely to occur (i.e., that they have the same mutation rates). In reality they aren't; they have different mutation rates.

I think he is just trying to simply illustrate the concept he is using so that a lay person could understand it.  If you get deeper into these assumptions I think you will find that he has thoroughly investigated errors or aberrant data to determine how to control and report confidence intervals and improve precision.  In other words, you'll quickly dive deep into some math formulas and the understanding of the terms and conditions in the equations will be the terminology of the conversation.

However, please, please, please, do not take my word on any of this. Go to Rootsweb Hg I and ask Ken what you wish.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 27, 2012, 01:17:01 PM
@Mikewww or anyone familiar with Generations7 spreadsheet

Do you know if there is an explanation somewhere as to what math operation Ken uses to account for hidden mutations?  

You should probably look at his formulas and his powerpoint charts where he charts out and tries to explain his methodology.

I think the answer is something along the lines of what John Chandler is saying.  This is from reply #5 of this thread.

A recent conversation from Rootsweb:
Quote from: general question
My own layman's viewpoint has always been to wonder how such unknowable factors like bottle-necks, back mutations, etc. can ever be adequately compensated for
Here is a response from a Scientist at MIT. John Chandler is the guy who calculated the mutation rates most of us use.
Quote from: John Chandler
That "etc." is exactly the difficulty. I'll point out in passing that back mutations are automatically accounted for in the variance method, ...
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2012-03/1333051203

My understanding of the explanation is that their mathematical model does not care about hidden mutations or even multi-step mutations. The mutation rates were derived based on visible mutations so, as long as they have adequate data to build the mutation rates, the way the TMRCA method uses them is consistent.  We should not think of the published mutation rate as literally the physical rate of change per the STR, but rather the observable rate of change.

What is required is that the STRs act somewhat consistently; in other words, the expected (predicted) rates up and down should be the same and the rates shouldn't change given the allele value, etc.   This would be where the concern about STRs reaching saturation and high allele values comes into play.  If an STR doesn't show linear duration (of its rate) during the timeframe we care about then it is not helpful.   The goal of the math model is to include STRs that are linear or "on average" (in aggregate) linear.

Today, Ken Nordtvedt posted a response related to some of these questions.
Quote from: Ken Nordtvedt
After all these years folks on (other) forums are still wondering or doubting variance measure between haplotypes properly counts back mutations. It does

Suppose an STR mutates twice between two haplotypes. The two could be up,up or up,down or down,up, or down,down. These four possibilities have equal probabilities of 1/4 if up and down mutations are equally likely.

So expected value of variance (which involves squaring STR repeat differences) is 4 x 2/4 + 0 x 2/4 = 2

Suppose an STR mutates three times between two haplotypes. The three could be up,up,up or up,up,down or up,down,up or down,up,up, or up,down,down, or down,up,down, or down,down,up, or down,down,down
Each of these eight alternatives of three mutations has probability 1/8.

So expected value of variance (which involves squaring STR repeat differences) is 9 x 2/8 + 1 x 6/8 = 3

Etc. Yes, variance totally takes into account back mutations and multiple mutations in same direction.......
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-04/1335537421

I continued the conversation and asked him - As far as multi-increment single event mutations, what is the best way to handle them?  In one of my projects I edit what appears to be an obvious jump before calculating TMRCAs.  In large haplogroup related calculations, these seem to "wash out" so I don't worry about them. Are there recommendations or guidelines for them?

Quote from: Ken Nordtvedt
Problem with multi-step mutations --- and we do know they happen now and then --- is that we do not know their rates very well. Actually, we don't know individual STR single step mutation rates too well.

But if n step mutation rates are known, equal up and down (another fiction), then they are easily included in mutation rate and contribute to variance in a straightforward manner by changing:

m --> m* = m(1) + 4 m(2) + 9 m(3) + .... with m(n) being mutation rate for n step mutations. In other words, don't worry about multiple steps and just evaluate variance according to normal rules, but adjust the STR mutation rates upward according to formula given.

Of course the higher step mutations wreak havoc with statistical sigma for variance after G generations --- sigmas go up significantly.

Note: m ---> m* is necessarily an INCREASE, which DECREASES tmrca estimates.
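
The m* formula in the quote is easy to put into a few lines of Python. In this minimal sketch the function name and the example rates are hypothetical (a 0.002 total rate split 95% single-step and 5% two-step, chosen only to illustrate the 5% multistep figure mentioned earlier in the thread); only the formula itself comes from Ken's post.

```python
def effective_rate(step_rates):
    """m* = m(1) + 4 m(2) + 9 m(3) + ...

    step_rates maps step size n to the mutation rate m(n) for n-step events.
    """
    return sum(n * n * rate for n, rate in step_rates.items())


if __name__ == "__main__":
    # Hypothetical marker: 0.0019 single-step plus 0.0001 two-step per generation.
    rates = {1: 0.0019, 2: 0.0001}
    plain = sum(rates.values())
    adjusted = effective_rate(rates)
    print(f"plain rate    = {plain:.4f}")
    print(f"adjusted m*   = {adjusted:.4f}")
    # A variance-based age scales as 1 / rate, so the estimate shrinks by this factor.
    print(f"TMRCA scaling = {plain / adjusted:.3f}")
```

With these made-up inputs the adjusted rate is 0.0023 against a plain 0.0020, which pulls a variance-based TMRCA down by roughly 13%. The n squared weighting is also where the "multistep of 4 generates 16 times as much variance" point comes from.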

I also asked - Is the "maximum likelihood" method useful for TMRCA calculations? Does it have advantages or disadvantages? I've looked at Marko Heinila's work on this and his TMRCA estimates for R1b subclades come out very similar to what I get when I use your Generations 7 methodology.

Quote from: Ken Nordtvedt
For branch segments which are not too long, GD and GD-based maximum likelihood should not differ too much from variance as long as many STRs are used.
But there are devils in the details of programs which look for maximum likelihoods of trees; so one should know what's in the black boxes.

I'm not saying Nordtvedt's methodologies are perfect, nor even perfectly precise. They aren't, but at least he takes great care in understanding and testing issues and managing (and reporting) the confidence intervals. His interclade methodology is probably the best thing out there... just my opinion.

Did you notice how he noted that the differences in the methodologies shouldn't make much difference if a lot of STRs are used?   That's what I've been trying to say.   If we use a lot of STRs and a lot of haplotypes, the results don't vary much through various iterations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 27, 2012, 01:19:02 PM
For the Archive access,

http://archiver.rootsweb.ancestry.com/th/index/Y-DNA-HAPLOGROUP-I



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 27, 2012, 05:48:51 PM

My opinion is that Nordtvedt really understands and I know he runs simulations against different mathematically modeled concepts.  I'm not saying you think differently of him, but you should delve into the areas you are concerned about directly.


The problem is that in a simulation everything will turn out as expected, which is why we collect empirical data to compare against the simulations. Therefore the simulations are used as the benchmark, the ideal that should be, but often times the reality is quite different from the ideal.

There was a recently published (2011) paper called:

Germline mutations of STR-alleles include multi-step mutations as defined by sequencing of repeat and flanking regions (http://www.sciencedirect.com/science/article/pii/S1872497311001505)

Quote from: Dauber et al(2011)
Well defined estimates of mutation rates are a prerequisite for the use of short tandem repeat (STR-) loci in relationship testing. We investigated 65 isolated genetic inconsistencies, which were observed within 50,796 allelic transfers at 23 STR-loci (ACTBP2 (SE33), CD4, CSF1PO, F13A1, F13B, FES, FGA, vWA, TH01, TPOX, D2S1338, D3S1358, D5S818, D7S820, D8S1132, D8S1179, D12S391, D13S317, D16S539, D17S976, D18S51, D19S433, D21S11) in Caucasoid families residing in Austria and Switzerland. Sequencing data of repeat and flanking regions and the median of all theoretically possible mutational steps showed valuable information to characterise the mutational events with regard to parental origin, change of repeat number (mutational step size) and direction of mutation (losses and gains of repeats). Apart from predominant single-step mutations including one case with a double genetic inconsistency, two double-step and two apparent four-step mutations could be identified. More losses than gains of repeats and more mutations originating from the paternal than the maternal lineage were observed (31 losses, 22 gains, 12 losses or gains and 47 paternal, 11 maternal mutations and 7 unclear of parental origin). The mutation in the paternal germline was 3.3 times higher than in the maternal germline. The results of our study show, that apart from the vast majority of single-step mutations rare multi-step mutations can be observed. Therefore, the interpretation of mutational events should not rigidly be restricted to the shortest possible mutational step, because rare but true multi-step mutations can easily be overlooked, if haplotype analysis is not possible.
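
To put the counts in that abstract into proportions, here is a minimal Python sketch. All numbers are taken from the quoted abstract; the function names are my own. Note that the loci listed are autosomal, so these figures are not Y-STR rates.

```python
# Counts as given in the quoted abstract.
INCONSISTENCIES = 65
ALLELIC_TRANSFERS = 50_796
LOSSES, GAINS, LOSS_OR_GAIN = 31, 22, 12


def per_transfer_rate(events, transfers):
    """Observed inconsistencies per allelic transfer."""
    return events / transfers


def loss_share(losses, gains):
    """Share of losses among the events whose direction was resolved."""
    return losses / (losses + gains)


if __name__ == "__main__":
    print(f"rate per transfer = {per_transfer_rate(INCONSISTENCIES, ALLELIC_TRANSFERS):.5f}")
    print(f"loss share        = {loss_share(LOSSES, GAINS):.3f}")
```

That works out to roughly 0.0013 inconsistencies per transfer, with losses making up about 58% of the events where the direction could be determined.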



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 28, 2012, 08:42:53 AM
I have analyzed the data set I referenced earlier and it is clear to me that up/down rates are not equal.  As Razyn points out though, how important is that issue?

I find that several of Ken's assumptions are not met: equal up/down rates and equally likely forward and backward mutations.  For a tightly knit DYS locus, as many are, if you are at +1 from the modal, there is little or no probability of mutating to step two.  The only possible step is then back to the modal.

By definition, the modal is the value most frequently observed; I can currently think of no way to count the number of mutations that have moved away from the modal and then back.  We really can't depend on a count of the mutations which have occurred in the past at any allele value, since we don't know the time history of the mutational process.

I am starting the slow process (for me) of creating a distribution table for as many of the R-L21 DYS loci as I can.  It is already pretty obvious that the up/down rates are not similar, as previous data suggests.  The bigger question is the issue of bounded mutations at a DYS locus and the impact of multisteps.  For the faster mutators such as CDYa,b it will probably be impossible to sort out single steps from multisteps.  From distributions I have observed in the past, the faster mutators approach a distribution which is almost uniform: two to four alleles with similar values and then rapidly falling-off tails.  It seems obvious to me that all DYS loci have a set of bounded values, which is exacerbated by the occasional multistep.
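
A distribution table of this kind can be built with a few lines of Python. This is a minimal sketch with made-up allele values for one locus (the sample list is hypothetical, not R-L21 data), and the function names are my own. It tallies how many haplotypes sit at each offset from the modal.

```python
from collections import Counter


def allele_distribution(values):
    """Return the modal allele and a {offset from modal: count} table."""
    counts = Counter(values)
    # Ties for the most frequent value go to the lowest allele.
    modal = max(sorted(counts), key=lambda allele: counts[allele])
    offsets = {allele - modal: n for allele, n in sorted(counts.items())}
    return modal, offsets


def below_above(offsets):
    """Count haplotypes below and above the modal."""
    below = sum(n for off, n in offsets.items() if off < 0)
    above = sum(n for off, n in offsets.items() if off > 0)
    return below, above


if __name__ == "__main__":
    # Hypothetical repeat counts at a single locus.
    sample = [12] * 70 + [11] * 8 + [13] * 20 + [14] * 2
    modal, offsets = allele_distribution(sample)
    print("modal:", modal)
    print("offsets:", offsets)
    print("below/above:", below_above(offsets))
```

Keep in mind that counts of haplotypes above and below the modal are not counts of mutations: related haplotypes share inherited mutations, and a step away followed by a step back leaves no trace in the table.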

I reiterate my objective as being to determine if we can explain the "fudge factor" of Zhivotovsky from properties of the STR mutational process.  Something is changing the father/son meioses data as the real process plays out.  JMHO.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 28, 2012, 10:49:23 AM
I find that several of Ken's assumptions are not met: equal up/down rates and equally likely forward and backward mutations.  For a tightly knit DYS locus, as many are, if you are at +1 from the modal, there is little or no probability of mutating to step two.  The only possible step is then back to the modal.

By definition, the modal is the value most frequently observed; I can currently think of no way to count the number of mutations that have moved away from the modal and then back.  We really can't depend on a count of the mutations which have occurred in the past at any allele value, since we don't know the time history of the mutational process.
I thank you for what you are saying. I have expressed these concepts in the past with my golden principles:
1) mutations happen around the modal
2) there is a convergence to the modal as time passes
3) sometimes a mutation goes off on a tangent (and we have the outlier)

That said, my analysis, which calculates the time elapsed since the BT haplogroup by counting how the BT modal has generated other values forwards and backwards, with all the due calculations done, perhaps merits consideration.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 10:50:21 AM
I have analyzed the data set I referenced earlier and it is clear to me that up/down rates are not equal.  As Razyn points out though, how important is that issue?

I find that several of Ken's assumptions are not met: equal up/down rates and equally likely forward and backward mutations.

Keep in mind that Marko Heinila accounts for up and down mutation rates as well as multi-increment STR mutations and comes out with about the same TMRCA estimations as Nordtvedt's - as I've shown you earlier in this thread.

.... I reiterate my objective as being to determine if we can explain the "fudge factor" of Zhivotovsky from properties of the STR mutational process.  Something is changing the father/son meioses data as the real process plays out.  JMHO.
I'm sorry, I didn't realize that was your objective.

If it is Zhivotovsky's "fudge factor" then why should we attempt to explain it? You should consider reviewing his studies closely and determining if you think his conclusions are valid.  If he can't explain it to your satisfaction, then it is Zhivotovsky whom you should be questioning.

Do you think his rates are the correct ones? If so, why?

In terms of the relative STR diversity between known groups of related people (i.e. haplogroups), the mutation rate doesn't really matter. We aren't applying time to the result when just looking at variance. Their relative positioning amongst themselves (the haplogroups) can be useful information, but you can slide the time scale forward or backward as you wish under different assumptions.   Just keep in mind that if you slide the timeline back (slower mutation rates) then you have to account for the archaeology of Homo Sapiens Sapiens' (modern man) appearance in Europe.
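
The point that relative diversity does not depend on the mutation rate can be shown with a small Python sketch. The haplotypes below are made up, the function names are hypothetical, and the age formula (variance divided by rate) is the simplest possible assumption rather than anyone's published method. It also assumes both groups are measured on the same markers with the same rate.

```python
from statistics import pvariance


def mean_variance(haplotypes):
    """Average per-marker population variance of repeat counts."""
    markers = list(zip(*haplotypes))
    return sum(pvariance(column) for column in markers) / len(markers)


def age_in_generations(variance, rate_per_generation):
    """Simplest variance-based age: variance divided by mutation rate."""
    return variance / rate_per_generation


if __name__ == "__main__":
    # Hypothetical three-marker haplotypes for two groups.
    group_a = [(13, 24, 14), (13, 25, 14), (14, 24, 15), (13, 23, 14)]
    group_b = [(13, 24, 14), (13, 24, 14), (13, 25, 14), (13, 24, 14)]
    var_a, var_b = mean_variance(group_a), mean_variance(group_b)
    for rate in (0.002, 0.0007):
        age_a = age_in_generations(var_a, rate)
        age_b = age_in_generations(var_b, rate)
        print(f"rate={rate}: A={age_a:.0f} gen, B={age_b:.0f} gen, ratio={age_a / age_b:.2f}")
```

Changing the rate moves both ages together, so the ratio between the groups stays the same; only the absolute time scale slides.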
Quote from: Wikipedia
H. sapiens reached Europe around 40,000 years ago


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 28, 2012, 10:58:46 AM
The analyses of aDNA in many places of Europe have demonstrated that the TMRCA of those haplotypes, calculated by your method, was too low by a factor of at least 2.5.
Voilà Zhivotovsky!


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:02:16 AM
My opinion is that Nordtvedt really understands and I know he runs simulations against different mathematically modeled concepts.  I'm not saying you think differently of him, but you should delve into the areas you are concerned about directly.

The problem is that in a simulation everything will turn out as expected, which is why we collect empirical data to compare against the simulations. Therefore the simulations are used as the benchmark, the ideal that should be, but often times the reality is quite different from the ideal.

Nothing is ever ideal, which is what science tries to deal with. Everything I've read from Nordtvedt is that he is evaluating the data and trying to account for what he calls "fictions." That doesn't mean the mathematical models don't work; it's a matter of precision.

I have no issue with your concern. I don't think your concern has a significant impact given the number of markers I'm evaluating. As I've shown on this thread, the results seem to show a correlation with the known SNP defined phylogenetic Y DNA tree, at least for the R1b subclade.  We know Ken thinks the correlation of STR diversity to time applies to the Hg I subclades appropriately as you can see from his web site.

... but I am NOT an expert so I'll request again:

Please, please, please make your challenges directly to Nordtvedt on the Rootsweb haplogroup I forum. I'm trying to be a proxy for some of your concerns, but you are much better at arguing your point of view than I am so please go ask Ken.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:10:54 AM
Analyses of aDNA from many places in Europe have demonstrated that the TMRCA of those haplotypes, as calculated by your method, was too low by a factor of at least 2.5.
Voilà Zhivotovsky!
Please be specific on how that is demonstrated.  What level of SNP testing for what sample and in comparison to what hobbyist-scientist's TMRCA estimate? Please specifically direct us to the TMRCA estimate that you are challenging. I'm not well versed in non-R1b SNPs so please provide as much relevant background as you can on the other haplogroups if that is your area of challenge.

BTW, I think you realize that the MRCA of a group of people living today is not necessarily the same person as the original bearer or an early bearer of an SNP.  The surviving folks' MRCA may have come from a totally different place.

I agree with you that there is a chance that the MRCA may only represent one lopsided branch of an old haplogroup. There is no doubt that this is a real possibility.  What is the question we seek to answer? ...
Is it when an SNP was born, or is it when the MRCA of a group of people living today lived and where they came from?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 28, 2012, 11:19:38 AM
As I remember Marko's work, he also had an unexplained 5K to 6K years where he couldn't find enough data between L11 and subsequent mutations?

I have read Zhiv's papers and understand them as well as I can.  He doesn't try to explain the difference in the first paper and has been on a hunt in subsequent papers.  At this point in time, I don't think he knows why either.  An additional point is that he may have not recognized the issues of unique event mutations and multisteps and this omission may be part of his fudge factor for all I know.

I happen to believe it lies in the intrinsic mutation process and the models being used to represent it.  Up/down rates are generally not the same, and many loci are bounded around the modal; both are observable in the data and do not match the criterion Ken put forth.  Marko did try to include up/down rates; I'm not sure about multisteps, and I don't think he considered or evaluated the impact of bounded DYS loci on his model.

I think that Zhiv's estimates are better applied to time spans longer than 1K years or so.  For shorter spans I can show that father/son rates are directly applicable.  I've applied this technique to both Kerchner's family and the Ian Cam and other clans and got sensible results. In both cases the probability of a hidden mutation at the medium to slow DYS loci is very low.

No, I hold no great expectations for Zhiv either.  But I do believe that the current time estimates for many of our R-1b subclades are too short.  I am asking why.  Note: it isn't only my opinion here; it's about half or more of the genetic "guru" community.  So, I am not Don Quixote out chasing windmills.  Reasonable minds have made the same observation.  I think the best chance to resolve the issue is "data mining", which I have some hands-on experience in.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:32:42 AM
As I remember Marko's work, he also had an unexplained 5K to 6K years where he couldn't find enough data between L11 and subsequent mutations?
No, we went through that before. There is no inexplicable 5-6K years. It is just that at the L11 level of the tree there is no peer under L23 to do an interclade age with so he had to go up and over to an "uncle" outside of R-M269 for a valid haplogroup to compare with for interclade calculations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:39:11 AM
... But I do believe that the current time estimates for many of our R-1b subclades are too short.  I am asking why.

Why do you think the estimates from Nordtvedt and Heinila are low? You've listed concerns about up/down mutation rates, multi-steps, etc.  That is fine, but the concerns do not necessarily make the estimates too young. They could be too old. I have no reason to think the estimates and their confidence ranges are automatically always too young.  When do you believe R1b entered or was born in Europe?

Note: it isn't only my opinion here; it's about half or more of the genetic "guru" community.  So, I am not Don Quixote out chasing windmills.  Reasonable minds have made the same observation.  I think the best chance to resolve the issue is "data mining", which I have some hands-on experience in.

I want to be clear that I agree mutation rates are an open issue. This is why I generally only calculate STR variance myself and only use other peoples' (i.e. Nordtvedt's or Heinila's) TMRCA estimates.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:43:51 AM
... So, I am not Don Quixote out chasing windmills....

I've never brought up Don Quixote, but since you've brought it up it is interesting that we have folks who have "slayed sacred monsters" - or something along that line.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 28, 2012, 12:12:46 PM

Nothing is ever ideal, which is what science tries to deal with. Everything I've read from Nordtvedt suggests that he is evaluating the data and trying to account for what he calls "fictions." That doesn't mean the mathematical models don't work; it's a matter of precision.

Well tell me something I don’t already know. Ok, I don’t mean to be harsh, but yeah, the main problem here is the great number of assumptions that come into play in the “data” that Dr. Nordtvedt is evaluating. Like I said before, when we look at father-son pairs, we know what the allele at a given locus is, and in that sense we know what the modal (ancestral) value is, but even in that scenario there could be cases of somatic mutations. As you can see in the study I provided, which used 23 autosomal STRs, which are actually a lot more stable than Y-chromosome STRs (which is self-evident given that even at the autosomal level the paternal germline rate is three-fold the maternal germline rate), there were 31 losses vs. 22 gains. If one were to run a mathematical simulation across a generation, one would never get a disparity of 9 mutations, at best 1-2. So again, these are family pairs, where all factors are known, and we can determine mutation rates with good precision.

When it comes to the X Clan, which has 100 participants and a presumed common ancestor who lived 1300 ybp, the ancestral haplotype of that ancestor isn’t known; whether everyone in the current generation is equally far removed, or at least on average x generations removed, from that ancestor isn’t known; and the mutation rate is assumed to be constant, or to change very little, in order to estimate mutation rates over a time span of 1300 years. Often the mutation rate will remain constant, because even with the fastest STRs the most mutations that will happen is 2 or 3 in 1300 years. However, when one gets to 3000, 4000, 5000 ybp, those fast-mutating STRs probably changed their mutation rate quite significantly along the way. One is presuming that the 2*10^-3 measured for locus DYS-B from Clan X would do, but in reality the average mutation rate could be something like 0.5*10^-3 for locus DYS-B when the time span is 3000-5000 years.

I have no issue with your concern. I don't think your concern has a significant impact given the number of markers I'm evaluating. As I've shown on this thread, the results seem to show a correlation with the known SNP defined phylogenetic Y DNA tree, at least for the R1b subclade.  We know Ken thinks the correlation of STR diversity to time applies to the Hg I subclades appropriately as you can see from his web site.


... but I am NOT an expert so I'll request again:

Please, please, please make your challenges directly to Nordtvedt on the Rootsweb haplogroup I forum. I'm trying to be a proxy for some of your concerns, but you are much better at arguing your point of view than I am so please go ask Ken.

Of course you don’t think my concern will impact your work; of course none of the folks who are averaging mutation rates and ignoring changes (loss of linearity) in mutation rate think it will have any impact on their work. If they did think so, they wouldn’t do it; I don’t think anyone is going to waste a lot of precious time on something they know is wrong.  As for directing my challenges, I don’t want to challenge Dr. Nordtvedt; on the contrary, I prefer to run my own experiments in a case-control scenario and see what results they yield.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 28, 2012, 12:28:07 PM
I'm not well versed in non-R1b SNPs so please provide as much relevant background as you can on the other haplogroups if that is your area of challenge.
Amongst the thousands of letters I wrote, I have found this one about the hg. G2a found in France at a site of 7000 years ago. My thinking is expressed here by Oh-Willeke and Argiedude, so that you don’t think I am alone:

Looking closely at the median joining network, the case for the Northern Caucasus as the source of G2a almost anywhere else seems remote. Instead, the Northern Caucasus looks like a recipient of G2a from Mediterranean or the Middle East, and honestly, the Western Mediterranean looks like the best fit for a Northern Caucasus G2a source based on that network with the original G2a bearing men in the Northern Caucasus in this scenario probably arriving by sea, rather than overland from the Middle East, and the migration probably taking place at some time before R1b became common in the Mediterranean, but probably post-Neolithic revolution.

The network is also suggestive of the idea that the Treilles group may have had immediate antecedent in Italy and only more remote antecedents in Iberia.

This direction of migration is quite unexpected (Oh-Willeke).

All TMRCA estimates always produce ages of just 2000 to 5000 years. So apparently the world can trace its y-dna to just 2 or 3 dozen men that lived barely 3000 years ago. It's a little more likely that the entire theory of TMRCA is a piece of ssss (must I mean that all these spreadsheets are spreadshits?) And this study has found a sample that wasn't supposed to exist 5000 years ago, hell, not even 2500 years ago (Argiedude).

Very good, both of you! Some years ago, after having read a paper of Yunusbaev, I said that the centre of hg. G was the Adriatic, and about the TMRCA I have written a lot on the mutations around the modal, laughing at some know-alls like Nordtvedt and Klyosov.
Very good you both and a third: me.

P.S. Watching the paper of Yunusbaev, the center of diffusion of hg. G (probably G2a) was Sardinia.

    Re: Neolithic Farmers and the Spread of Indo-European, The Case for Euphratic, etc. « Reply #69 on: June 04, 2011, 04:34:44 AM »



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 28, 2012, 12:31:52 PM
I've never brought up Don Quixote, but since you've brought it up it is interesting that we have folks who have "slayed sacred monsters" - or something along that line.
Some sacred monster of mtDNA has been slayed also recently in this forum, and not by me only.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 28, 2012, 12:51:39 PM

Nothing is ever ideal, which is what science tries to deal with. Everything I've read from Nordtvedt suggests that he is evaluating the data and trying to account for what he calls "fictions." That doesn't mean the mathematical models don't work; it's a matter of precision.

Well tell me something I don’t already know. Ok, I don’t mean to be harsh, but yeah, the main problem here is the great number of assumptions that come into play on the “data” that Dr.Nordtvedt is evaluating. Like I said before, when we look at father-son’s pairs, we know what the allele at a given loci is, we know in that sense what the modal(ancestral) is, but even in that scenario there  could be cases of somatic mutations. As you can see in the study I provided, which used 23 autosomal STRs, which actually are a lot more stable(Which is self-evident in the case that even at the autosomal level, the paternal germ-line is three folds the maternal germline) than Y-chromosome STRs, there were 31 losses vs.22 gains. If one was to run a mathematical simulation across a generation you will never get a disparity of 9 mutations, at best 1-2. So again, this is family pairs, where all factors are known, and with a good precision we can determine mutation rates.

When it comes to the X Clan which has 100 participants, and a presumed common ancestor who lived 1300 ybp, the ancestral haplotype of that ancestor isn’t known, whether everyone in that generation is equally far removed, or at least on average far removed x generations from that ancestor isn’t know, the mutation rate is assumed to be constant, or to change very little, in order to estimate mutation rates in a time span of 1300 ybp. Often times the mutation rate will remain constant, because even with the fastest STRs the most mutations that will happen is 2 or 3 in 1300 ybp. However when one gets into 3000, 4000, 5000 ybp those fast mutating STRs probably incremented their mutation rate quote significantly along the process and one is presuming that 2*10^-3 measured for loci DYS-B from Clan X would do it, but in reality the average mutation rate could be something like 0.5*10^-3 for loci DYS-B when the time span is 3000-5000 ybp. 

I have no issue with your concern. I don't think your concern has a significant impact given the number of markers I'm evaluating. As I've shown on this thread, the results seem to show a correlation with the known SNP defined phylogenetic Y DNA tree, at least for the R1b subclade.  We know Ken thinks the correlation of STR diversity to time applies to the Hg I subclades appropriately as you can see from his web site.


... but I am NOT an expert so I'll request again:

Please, please, please make your challenges directly to Nordtvedt on the Rootsweb haplogroup I forum. I'm trying to be a proxy for some of your concerns, but you are much better at arguing your point of view than I am so please go ask Ken.

Of course you don’t think my concern will impact your work, of course none of the folks who are averaging mutation rates, and ignoring changes(loss of linearity) in mutation rate think it will have any impact in their work. If they did think so, then they wouldn’t do it, I don’t think anyone is going to waste a lot of precious time in something they know is wrong.  As for directing my challenges, I don’t want to challenge Dr.Nordtvedt, on the contrary, I prefer to run my own experiments on a case-control scenario, and see what results they yield.


I think that your comments are correct and timely.  As I said on my thread, I will try to present data showing that a large number of STRs are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite: an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used did not span more than a thousand or so years, but if he was using the faster mutators such as CDYa,b, then I would expect a similar observation to be made, and apparently that's what he observed?
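
The "hidden mutation" argument above can be illustrated with a small calculation. This is only a sketch under assumed conditions, not anyone's published method: a symmetric single-step model, a made-up fast rate of 0.02 per generation, and a made-up bound of two repeats either side of the modal (steps that would cross the bound are simply blocked). It propagates the exact probability distribution rather than drawing random samples, then reports variance divided by generations as the "apparent" rate.

```python
def stepwise_variance(mu, generations, bound=None):
    """Exact variance of a symmetric single-step mutation model.

    State is the offset from the starting (ancestral) repeat count.
    Each generation the locus moves +1 or -1 with probability mu/2 each.
    If `bound` is given, a step beyond +/-bound is blocked (the locus stays put).
    """
    probs = {0: 1.0}
    for _ in range(generations):
        nxt = {}
        for state, p in probs.items():
            # No mutation this generation.
            nxt[state] = nxt.get(state, 0.0) + p * (1.0 - mu)
            for step in (-1, 1):
                target = state + step
                if bound is not None and abs(target) > bound:
                    target = state  # blocked step: an assumption of this sketch
                nxt[target] = nxt.get(target, 0.0) + p * mu / 2.0
        probs = nxt
    mean = sum(s * p for s, p in probs.items())
    return sum((s - mean) ** 2 * p for s, p in probs.items())


MU = 0.02   # hypothetical fast-marker rate per generation
BOUND = 2   # hypothetical bound around the modal

for g in (40, 400):
    free = stepwise_variance(MU, g)
    capped = stepwise_variance(MU, g, bound=BOUND)
    # Apparent rate = variance / generations.
    print(g, "generations: unbounded", free / g, "bounded", capped / g)
```

In the unbounded case the variance grows as rate times generations, so the apparent rate stays at the true rate. With the bound in place the variance levels off, so the apparent rate falls as the time span grows, which is the pattern described above. Whether real loci are bounded this way is exactly the open question.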


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 28, 2012, 03:19:10 PM
I think that your comments are correct and timely.  As I said on my thread I will try to present data that shows that a large number of STR's are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used was not more than a thousand or so years, but if he was using the Faster mutators such as CDYa,b; then I would expect a similar observation would be made and apparently thats what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down; also, it's likely that loci with different values will have slightly (probably very slightly) different mutation rates.

However how big is this issue and do we need to worry about it that much, especially when talking about L11 and its offspring ?

DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.

Looking at all the values for this locus in the P312 project, I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

whereas in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12, and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this locus, it would behave in exactly the same way as somebody who had 13 and was U106.
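
To make the comparison a little more concrete, here is a minimal sketch that re-indexes each distribution by its distance from its own modal and adds up the share sitting above and below it. The percentages are the ones quoted above; the function names are made up for illustration.

```python
# Share (percent) of each DYS492 repeat value, as quoted above.
p312 = {9: 0.10, 10: 0.21, 11: 0.94, 12: 95.60, 13: 1.68, 14: 1.47}
u106 = {10: 0.12, 12: 1.06, 13: 95.89, 14: 2.82, 15: 0.12}


def centre_on_modal(shares):
    """Return the modal value and the distribution keyed by offset from it."""
    modal = max(shares, key=shares.get)
    return modal, {value - modal: pct for value, pct in shares.items()}


def up_down(shares):
    """Return (modal, percent above the modal, percent below the modal)."""
    modal, offsets = centre_on_modal(shares)
    up = sum(pct for off, pct in offsets.items() if off > 0)
    down = sum(pct for off, pct in offsets.items() if off < 0)
    return modal, round(up, 2), round(down, 2)


for name, shares in (("P312", p312), ("U106", u106)):
    modal, up, down = up_down(shares)
    print(name, "modal", modal, "above", up, "below", down)
```

Centred this way, both projects have about 3% of men above the modal and a little over 1% below it, even though the modal itself differs by one repeat.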


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 28, 2012, 03:57:17 PM
I added up all the Ups and downs in MikeW's L21 ALL 67 marker spreadsheet by STR and totaled.

Higher or lower than the modal:
Allele values higher than the modal: 41789 (57%)
Allele values lower than the modal: 31974 (43%)
Out of 67 markers, 49 STRs had a majority up and 18 a majority down (73% vs. 27%).
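
As a quick check on the figures above, the sketch below recomputes the percentages and runs a simple two-sided sign test on the 49 vs. 18 split. The sign test treats each marker as an independent coin flip, which is an assumption and probably too generous for linked or multi-copy markers, so read the p-value as a rough guide only.

```python
from math import comb

up_total, down_total = 41789, 31974   # allele values above / below the modal
markers_up, markers_down = 49, 18     # markers with a majority up / down


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def two_sided_sign_test(k, n):
    """Exact two-sided binomial p-value for k of n under a 50/50 null."""
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


alleles = up_total + down_total
markers = markers_up + markers_down
print("alleles up/down %:", pct(up_total, alleles), pct(down_total, alleles))
print("markers up/down %:", pct(markers_up, markers), pct(markers_down, markers))
print("sign test p-value:", two_sided_sign_test(markers_up, markers))
```

Under that independence assumption a 49 to 18 split is very unlikely to be chance, so the upward bias in this spreadsheet looks real, whatever its cause.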




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 28, 2012, 04:00:41 PM
MikeW,

Have you ever tried to incorporate Chandler's mutation rate calculation formula into your spreadsheets to see whether there were differences in various Haplogroups or Subclades?

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 28, 2012, 04:06:18 PM
I think that your comments are correct and timely.  As I said on my thread I will try to present data that shows that a large number of STR's are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used was not more than a thousand or so years, but if he was using the Faster mutators such as CDYa,b; then I would expect a similar observation would be made and apparently thats what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down also it's likely that loci with different values will have slightly (probably very slightly) diffrent mutation rates.

However how big is this issue and do we need to worry about it that much, especially when talking about L11 and its offspring ?

DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.

looking at all the values for this loci in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

where as in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12 and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this loci that it would behave in exactly the same way as somebody who had 13 and was U106.

 The distribution about the modal is close in both sets of data.  Whether 5% of the mutations are multisteps is hard to discern.  Look at the behavior of 388 in R1b, I and J.  I think you'll be surprised?

An additional comment would be that this is a very slow mutator, and the nearly equal values for 13 and 14 in the P312 data set are interesting.  In general, there aren't a lot of mutations at this locus.  Two options might be: a. a mutation from 12 to 13 and then from 13 to 14, after which there may be descendants who carry that unusual value, or b. a multistep with the same scenario as above; i.e. we don't know whether the proliferation of 14 is a family event or a random event.  I've seen data where an apparent multistep of about 4 steps occurred and then a population built up around that value.  In the case of 388, we observe a modal of 12 in the E Hgs, then successively higher values for G (50% 12 and 50% 13), I (14) and J (15), and then a return to 12 for R1a and R1b, as if they had evolved directly from E3a and E3b. This data is obtained from the dataset I referenced earlier. Note the width of the distribution for E3a and b is essentially 3, only two states for G, 4 for I and 6 for J.  R1a is unusual in that the modal is 12 (.83) and the allele state 10 has .15.  Clearly a multistep occurred and then population growth followed?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 28, 2012, 04:26:29 PM
I think that your comments are correct and timely.  As I said on my thread I will try to present data that shows that a large number of STR's are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used was not more than a thousand or so years, but if he was using the Faster mutators such as CDYa,b; then I would expect a similar observation would be made and apparently thats what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down also it's likely that loci with different values will have slightly (probably very slightly) diffrent mutation rates.

However how big is this issue and do we need to worry about it that much, especially when talking about L11 and its offspring ?

DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.

looking at all the values for this loci in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

where as in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12 and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this loci that it would behave in exactly the same way as somebody who had 13 and was U106.

  The distribution about the modal is close in both sets of data.  Whether 5% of the mutations are multisteps is hard to discern.  Look at the behavior of 388 in R1b, I and J.  I think you'll be surprised?

Exactly but the modal is different in both cases, you seem to be suggesting (I think) that there is something special about the modal value itself that causes these distribution patterns but it's just normal distribution.

I will have a closer look at DYS388 but I don't think I will be terribly surprised. I just had a look at a small surname project that I knew had tight groups in J2b, U106, L21, I1 and E1b1b1. They had different values for that locus, which isn't a great surprise considering the time to the common ancestor.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 28, 2012, 04:30:41 PM
DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.
Looking at all the values for this loci in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

where as in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.
There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12 and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this loci that it would behave in exactly the same way as somebody who had 13 and was U106.
Actually these data are very interesting. They demonstrate that R-P312 is probably a little bit older than R-U106; that, by chance, the U106 mutation happened in a man with DYS492=13; and that, as men with DYS492=13 were very few amongst the ancestral R-L11, this mutation happened amongst those few rather than the great majority with DYS492=12. It is also clear that from then on the upward mutations happen at a slower rate because they start from 13 and not 12, or, better, the mutation from 13 to 14 happens at a faster rate, but from 14 to 15 the mutation is much slower. Upward mutations are faster than downward ones, but the values of 9 for R-P312 and 10 and 15 for R-U106 are close to saturation.
Unfortunately we don't have the SMGF data for this marker. The mutation rate of DYS492 (Ballantyne) is very slow (3.92x10^-4), so the loss of about 5% of the modal may also mean many thousands of years.
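
For a rough feel of what that rate implies, here is a back-of-the-envelope sketch. It assumes the quoted 3.92x10^-4 is a per-generation rate, that every lineage descends independently from one ancestor carrying the modal value (a star-like tree), and that nobody mutates back to the modal. None of those assumptions is established in this thread, and the 25 and 30 year generation lengths are placeholders.

```python
import math

MU_DYS492 = 3.92e-4  # rate quoted above, assumed to be per generation


def generations_for_modal_share(modal_share, mu):
    """Generations until only `modal_share` of lineages are still unmutated.

    Solves (1 - mu) ** g = modal_share for g, ignoring back-mutations.
    """
    return math.log(modal_share) / math.log(1.0 - mu)


# Modal shares taken from the DYS492 tables quoted above.
for label, share in (("P312", 0.9560), ("U106", 0.9589)):
    g = generations_for_modal_share(share, MU_DYS492)
    print(label, round(g), "generations, about",
          round(g * 25), "to", round(g * 30), "years")
```

On these assumptions both clades come out at a little over a hundred generations. Back-mutations, a non-star-like tree, or a different effective rate would all move that figure, so it is a sanity check and not an age estimate.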


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 28, 2012, 04:35:11 PM
DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.
Looking at all the values for this loci in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

where as in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.
There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12 and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this loci that it would behave in exactly the same way as somebody who had 13 and was U106.
Actually these data are very interesting. They demonstrate that probably R-P312 is a little bit older that R-U106, that, casually, the mutation U106 happened in a man with DYS492=13, and, as the DYS492=13 were very few amongst the ancestor R-L11, this mutation happened amongst those few rather than the great majority with DYS492=12. It is also clear that from then the mutations forwards happen with a slower mutations rate because they start from 13 and not 12, or, better, the mutation from 13 to 14 happens with a faster mutation rate, but from 14 to 15 the mutation is much slower. The mutations forwards are faster than those downwards, but the values of 9 for R-P312 and 10 and 15 for R-U106 are next to the saturation.
Unfortunately we haven’t the data of SMGF about this marker. The mutation rate of DYS492 (Ballantyne) is very slow (3.92x10-4), then the lost of about 5% of the modal may also mean many thousands of years.


Funnily enough I agree with most of what you've written here, which doesn’t happen very often.

You can't say that U106 is younger than P312 from this data alone, though; however, I think it probably is, but not by much.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 28, 2012, 06:22:33 PM
In general, I don't believe the modal changes over time for a Hg.  So, I'm not emphasizing the modal value.  Each distribution about the modal appears to be unique.  In part, because of multistep occurrences.  It would be great if the board had someone with a chemical kinetics background who could help enlighten us.

My whole thrust is to get us interested in looking at the data which has been generated and try to understand it.  The model has to fit the data and not the other way around.  One problem with physicists in general is that they love to postulate a theory, the less the data the better, and then prove that it is correct through modelling.   My approach is to use the data to suggest the best model.  FWIW I have an MS in physics.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:04:36 PM
MikeW,

Have you ever tried to incorporate Chandler's mutation rate calculation formula into your spreadsheets to see whether there were differences in various Haplogroups or Subclades?

MJost

I don't apply mutation rates to any formulas in the Haplotype_Data spreadsheets I post. For any time calculation, I use whatever Ken Nordtvedt has in the Y DNA Tools Generations 7 methodology.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:07:41 PM
DYS492 offers a convenient way to see if loci are likely to behave in a dramatically different way after a mutation.
Looking at all the values for this loci in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

where as in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.
There definitely doesn’t seem to be any particular tendency for U106 to try and get back to a value of 12 and I think it's reasonable to conclude that if somebody who was P312 had a value of 13 at this loci that it would behave in exactly the same way as somebody who had 13 and was U106.
Actually these data are very interesting. They demonstrate that probably R-P312 is a little bit older that R-U106, that, casually, the mutation U106 happened in a man with DYS492=13, and, as the DYS492=13 were very few amongst the ancestor R-L11, this mutation happened amongst those few rather than the great majority with DYS492=12. It is also clear that from then the mutations forwards happen with a slower mutations rate because they start from 13 and not 12, or, better, the mutation from 13 to 14 happens with a faster mutation rate, but from 14 to 15 the mutation is much slower. The mutations forwards are faster than those downwards, but the values of 9 for R-P312 and 10 and 15 for R-U106 are next to the saturation.
Unfortunately we haven’t the data of SMGF about this marker. The mutation rate of DYS492 (Ballantyne) is very slow (3.92x10-4), then the lost of about 5% of the modal may also mean many thousands of years.

Funnily enough I agree with most of what you've written here, which doesn’t happen very often.

You can't say that U106 is younger than P312 from this data alone, though; however, I think it probably is, but not by much.

According to the relative STR variance calculations I've been doing the last couple of years, I always get U106 as younger than P312 as well.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 28, 2012, 11:21:43 PM

Nothing is ever ideal, which is what science tries to deal with. Everything I've read from Nordtvedt is that he is evaluating the data and trying to account for what he calls "fictions." That doesn't mean the mathematical models don't work; it's a matter of precision.

Well, tell me something I don’t already know. OK, I don’t mean to be harsh, but yes, the main problem here is the great number of assumptions that come into play in the “data” that Dr. Nordtvedt is evaluating. As I said before, when we look at father-son pairs, we know what the allele at a given locus is, and in that sense we know what the modal (ancestral) value is, but even in that scenario there could be cases of somatic mutations. As you can see in the study I provided, which used 23 autosomal STRs, which are actually a lot more stable than Y-chromosome STRs (which is self-evident given that even at the autosomal level the paternal germline rate is threefold the maternal one), there were 31 losses vs. 22 gains. If one were to run a mathematical simulation across a generation, one would never get a disparity of 9 mutations; at best 1-2. So again, these are family pairs, where all factors are known, and we can determine mutation rates with good precision.

When it comes to Clan X, which has 100 participants and a presumed common ancestor who lived 1300 ybp, the ancestral haplotype of that ancestor isn’t known; whether everyone in the sample is equally far removed from that ancestor, or at least on average x generations removed, isn’t known; and the mutation rate is assumed to be constant, or to change very little, in order to estimate mutation rates over a time span of 1300 years. Often the mutation rate will appear constant, because even with the fastest STRs at most 2 or 3 mutations will happen in 1300 years. However, when one gets to 3000, 4000 or 5000 ybp, those fast-mutating STRs have probably changed their effective mutation rate quite significantly along the way. One presumes that the 2*10^-3 measured for locus DYS-B from Clan X would do, but in reality the average mutation rate could be something like 0.5*10^-3 for locus DYS-B when the time span is 3000-5000 years.

I have no issue with your concern. I don't think your concern has a significant impact given the number of markers I'm evaluating. As I've shown on this thread, the results seem to show a correlation with the known SNP defined phylogenetic Y DNA tree, at least for the R1b subclade.  We know Ken thinks the correlation of STR diversity to time applies to the Hg I subclades appropriately as you can see from his web site.


... but I am NOT an expert so I'll request again:

Please, please, please make your challenges directly to Nordtvedt on the Rootsweb haplogroup I forum. I'm trying to be a proxy for some of your concerns, but you are much better at arguing your point of view than I am so please go ask Ken.

Of course you don’t think my concern will impact your work; of course none of the folks who are averaging mutation rates and ignoring changes (loss of linearity) in mutation rate think it will have any impact on their work. If they did think so, they wouldn’t do it; I don’t think anyone is going to waste a lot of precious time on something they know is wrong.  As for directing my challenges, I don’t want to challenge Dr. Nordtvedt; on the contrary, I prefer to run my own experiments in a case-control scenario and see what results they yield.

This is not work, just a hobby so I'm not worrying at night too much about it.  I try to apply whatever models apply as best I can to the data I have - primarily P312 subclades, particularly L21.  That's it.  I've just found Dr. Nordtvedt's work impressive.

Contrary to your opinion, if you can come up with something specific I'll try to at least run some data through it and see what comes up. Based on Ironroad's concerns and Busby's concerns about STR variance saturation, I took direction from Ironroad to understand what Heinila did to evaluate STR linearity. I implemented and use Heinila's work in the STR variance results I display. I've tried to normalize variance between STRs so no one STR outweighed another, but unfortunately I couldn't come up with a normalization technique that didn't throw out inconsistent results. Vince Vizachero has since explained that it was a difficult challenge and probably not worth the effort. I really don't care whose method does what, just whatever works best that I can implement (and, of course, understand).

If you don't want to challenge Nordtvedt directly, that's fine, but why critique his work in absentia when you can do it directly with him?  If your arguments are strong, it will be apparent on the Rootsweb forum and we'll all know.

If you really want to progress the situation, please consider that if you convince him of the significance of your work then he will probably try to figure out how to improve existing models with your concept.  Everything I've seen from him is that he is sincere in making improvements to make a model that works. Wikipedia says he has shown "a fiendish piece of mathematics" related to space work. I've seen the innovation he implemented with interclade calculations.  He's a guy you want on your side as he is adept at creating mathematical models.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 29, 2012, 12:54:25 AM
What do I have to say to Ken Nordtvedt? I agree with you that he is a kind person. I have corresponded with him many times, first on Rootsweb and, after my banishment, on DNA-Forums, besides many private letters. On DNA-Forums I unmasked his nickname and he raised it with some moderators. In the end I was banned from there too.

If we take the DYS492 mutations we spoke of above, we have:

3.92 x 10^-4 = 1 mutation every 2551 generations

The loss of the modal is about 5% in R-P312 and R-U106, allowing for some back mutations.
(2551 / 100) x 5 = 127.55 generations

127.55 x 32 = 4081.6 years, i.e. about your dates.

What is wrong with this?
What I have said about the outlier in R1a1a.
By doing this the outliers, which testify that the mutations did not happen overwhelmingly around the modal, are cut off, i.e. the more “modal” haplotypes, or simply the haplotypes that survived, get the upper hand. If this isn’t important over a short period, it becomes important over a long one. See my letter above, where the aDNA found in Europe is underestimated by a factor of 2.5.

Here is the mistake, which no mathematics can adjust.
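
The DYS492 arithmetic in this post can be reproduced with a minimal Python sketch. The rate, the 5% loss and the factor of 32 come from the post. The function names are illustrative, and the exponential variant is an added assumption for comparison (modal fraction decaying as (1 - mu)^g with back mutations ignored), not something the post uses.

```python
import math

MU = 3.92e-4        # DYS492 rate per generation quoted in the post (Ballantyne)
LOSS = 0.05         # about 5% of haplotypes no longer carry the modal value
YEARS_PER_GEN = 32  # generation length implied by the "x 32" step in the post


def generations_linear(mu: float, loss: float) -> float:
    """Linear approximation used in the post: loss ~= mu * generations."""
    return loss / mu


def generations_exponential(mu: float, loss: float) -> float:
    """Assumed alternative: modal fraction decays as (1 - mu) ** g.

    Back mutations are ignored, so this is only a rough comparison.
    """
    return math.log(1.0 - loss) / math.log(1.0 - mu)


if __name__ == "__main__":
    g_lin = generations_linear(MU, LOSS)
    g_exp = generations_exponential(MU, LOSS)
    print(f"generations per mutation: {1 / MU:.0f}")
    print(f"linear:      {g_lin:.2f} generations, {g_lin * YEARS_PER_GEN:.1f} years")
    print(f"exponential: {g_exp:.2f} generations, {g_exp * YEARS_PER_GEN:.1f} years")
```

Both variants rest on a single locus, so they inherit every caveat raised in this thread about one-marker estimates.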


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on April 29, 2012, 02:12:42 AM
If you don't want to challenge Nordtvedt directly, that's fine, but why critique his work in absentia when you can do it directly with him?  If your arguments are strong, it will be apparent on the Rootsweb forum and we'll all know.

When have I criticized Dr. Nordtvedt’s work directly? You have brought up certain assumptions made by folks in the hobbyist community, and I have explained my skepticism about some of them. I haven’t interacted much with Dr. Nordtvedt, but from what I have seen online, he seems like a reasonable, honest, humble person. I have actually argued before with Klyosov in other forums, and would gladly tell him straight to his face my concerns about his methodology, as I have done before. However, all I get from Klyosov is a cesspool of fallacious arguments, and the cowardice of posting fragments of our discussion in his Journal of Genetic Genealogy without giving me the opportunity to defend my arguments, so I’m done with him, as to me, he is nothing but an arrogant …….


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 29, 2012, 07:23:15 AM
I think that your comments are correct and timely.  As I said on my thread, I will try to present data that shows that a large number of STRs are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used covered not more than a thousand or so years, but if he was using the faster mutators such as CDYa,b, then I would expect a similar observation would be made, and apparently that's what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down; it's also likely that loci with different values will have slightly (probably very slightly) different mutation rates.

However, how big is this issue, and do we need to worry about it that much, especially when talking about L11 and its offspring?

DYS492 offers a convenient way to see whether loci are likely to behave in a dramatically different way after a mutation.

Looking at all the values for this locus in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

whereas in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try to get back to a value of 12, and I think it's reasonable to conclude that somebody who was P312 with a value of 13 at this locus would behave in exactly the same way as somebody who had 13 and was U106.

  The distribution about the modal is close in both sets of data.  Whether 5% of the mutations are multisteps is hard to discern.  Look at the behavior of 388 in R1b, I and J.  I think you'll be surprised?

Exactly, but the modal is different in both cases. You seem to be suggesting (I think) that there is something special about the modal value itself that causes these distribution patterns, but it's just a normal distribution.

I will have a closer look at DYS388 but I don't think I will be terribly surprised. I just had a look at a small surname project that I knew had tight groups in J2b, U106, L21, I1 and E1b1b1. They had different values for that locus, which isn't a great surprise considering the time to the common ancestor.


I've had a look at DYS388 in the I-L22 and P312 projects. It would have been pointless using the I1 project, as that would have been comparing apples and oranges; L22 is thought to be about the same sort of age as P312, so it should have similar variance.

P312

8      0.10%
10      0.10%
11      0.21%
12      97.41%
13      1.86%
14      0.31%

I-L22

12      0.60%
13      1.79%
14      93.45%
15      3.57%
16      0.60%


There are points of interest here but again both sets of data are remarkably similar, apart from the fact that the modal value for P312 is 12 and L22 14.
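
One way to put a number on "remarkably similar" is to compute, for each table above, the modal allele and the mean squared distance from it. This is a minimal Python sketch under stated assumptions: the percentages are copied from the tables in this post, the function name `modal_and_variance` is illustrative, and "variance about the modal" is only one of several possible dispersion measures (it is not necessarily the relative-variance method used elsewhere in the thread).

```python
from typing import Dict, Tuple

# Allele value -> percentage of the project, copied from the tables above.
DYS492_P312 = {9: 0.10, 10: 0.21, 11: 0.94, 12: 95.60, 13: 1.68, 14: 1.47}
DYS492_U106 = {10: 0.12, 12: 1.06, 13: 95.89, 14: 2.82, 15: 0.12}
DYS388_P312 = {8: 0.10, 10: 0.10, 11: 0.21, 12: 97.41, 13: 1.86, 14: 0.31}
DYS388_L22 = {12: 0.60, 13: 1.79, 14: 93.45, 15: 3.57, 16: 0.60}


def modal_and_variance(dist: Dict[int, float]) -> Tuple[int, float]:
    """Return (modal allele, mean squared step distance from the modal).

    Percentages are renormalised by their sum because the rounded
    figures do not add up to exactly 100.
    """
    total = sum(dist.values())
    modal = max(dist, key=lambda allele: dist[allele])
    variance = sum(p * (allele - modal) ** 2 for allele, p in dist.items()) / total
    return modal, variance


if __name__ == "__main__":
    tables = {
        "DYS492 P312": DYS492_P312,
        "DYS492 U106": DYS492_U106,
        "DYS388 P312": DYS388_P312,
        "DYS388 I-L22": DYS388_L22,
    }
    for name, dist in tables.items():
        modal, var = modal_and_variance(dist)
        print(f"{name}: modal={modal}, variance about modal={var:.4f}")
```

Because rare multi-step values are squared, a handful of outliers can dominate the result, so the figure should be read alongside the raw tables rather than in place of them.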


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on April 29, 2012, 10:32:00 AM
MikeW,

Have you ever tried to incorporate Chandler's mutation rate calculation formula into your spreadsheets to see whether there were differences in various haplogroups or subclades?

MJost

I don't apply mutation rates to any formulas in the Haplotype_Data spreadsheets I post. For any time calculation, I use whatever Ken Nordtvedt has in the Y DNA Tools Generations 7 methodology.

I asked if you ever tried as you have the sheer numbers.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 29, 2012, 11:23:54 AM
What do I have to say to Ken Nordtvedt? I agree with you that he is a kind person. I have corresponded with him many times, first on Rootsweb and, after my banishment, on DNA-Forums, besides many private letters. On DNA-Forums I unmasked his nickname and he raised it with some moderators. In the end I was banned from there too.

If we take the DYS492 mutations we spoke of above, we have:

3.92 x 10^-4 = 1 mutation every 2551 generations

The loss of the modal is about 5% in R-P312 and R-U106, allowing for some back mutations.
(2551 / 100) x 5 = 127.55 generations

127.55 x 32 = 4081.6 years, i.e. about your dates.

What is wrong with this?
What I have said about the outlier in R1a1a.
By doing this the outliers, which testify that the mutations did not happen overwhelmingly around the modal, are cut off, i.e. the more “modal” haplotypes, or simply the haplotypes that survived, get the upper hand. If this isn’t important over a short period, it becomes important over a long one. See my letter above, where the aDNA found in Europe is underestimated by a factor of 2.5.

Here is the mistake, which no mathematics can adjust.

Why do you keep calculating TMRCA estimates based on one STR only?  Everything I read from the hobbyist scientists like Nordtvedt, Klyosov and Chandler, is that using more STRs (more experiments) is better. I think at the level of only one or two STRs, I'm not even sure it is worth doing.  The population genetics scientists even use at least ten or so, and they don't have much money so I'm sure they'd use more if they could.

Nordtvedt and Chandler are clear to say back-mutations are handled in their calculations.

For R1a, you should probably take up this discussion with Anatole Klyosov. He is very interested in R1a, no doubt.  What TMRCA estimate (and by whom) are you arguing against with your R1a1a example?



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 29, 2012, 11:30:15 AM
I think that your comments are correct and timely.  As I said on my thread, I will try to present data that shows that a large number of STRs are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used covered not more than a thousand or so years, but if he was using the faster mutators such as CDYa,b, then I would expect a similar observation would be made, and apparently that's what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down; it's also likely that loci with different values will have slightly (probably very slightly) different mutation rates.

However, how big is this issue, and do we need to worry about it that much, especially when talking about L11 and its offspring?

DYS492 offers a convenient way to see whether loci are likely to behave in a dramatically different way after a mutation.

Looking at all the values for this locus in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

whereas in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try to get back to a value of 12, and I think it's reasonable to conclude that somebody who was P312 with a value of 13 at this locus would behave in exactly the same way as somebody who had 13 and was U106.

 The distribution about the modal is close in both sets of data.  Whether 5% of the mutations are multisteps is hard to discern.  Look at the behavior of 388 in R1b, I and J.  I think you'll be surprised?

Exactly, but the modal is different in both cases. You seem to be suggesting (I think) that there is something special about the modal value itself that causes these distribution patterns, but it's just a normal distribution.

I will have a closer look at DYS388 but I don't think I will be terribly surprised. I just had a look at a small surname project that I knew had tight groups in J2b, U106, L21, I1 and E1b1b1. They had different values for that locus, which isn't a great surprise considering the time to the common ancestor.


I've had a look at DYS388 in the I-L22 and P312 projects. It would have been pointless using the I1 project, as that would have been comparing apples and oranges; L22 is thought to be about the same sort of age as P312, so it should have similar variance.

P312

8      0.10%
10      0.10%
11      0.21%
12      97.41%
13      1.86%
14      0.31%

I-L22

12      0.60%
13      1.79%
14      93.45%
15      3.57%
16      0.60%


There are points of interest here but again both sets of data are remarkably similar, apart from the fact that the modal value for P312 is 12 and L22 14.


I understand what you are showing and I applaud the effort, but I still think an analysis of only one or two STRs is not enough to rely on.  There could be aberrations in any one STR in any one haplogroup. An example is U106's L1 subclade which has a lot of DYS439=null.

... but I think your point above for P312 and L22 is still valid.  Even though the modal for P312 and L22 are different, the dispersion is similar.    

This is just my perspective in looking at a glass of water and saying it is half full versus half empty, but what I think that people on this forum call "convergence around the modal" is really just divergence from the ancestral for scattered extant fairly young branches on the Y DNA tree.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 29, 2012, 11:34:46 AM
MikeW,

Have you ever tried to incorporate Chandler's mutation rate calculation formula into your spreadsheets to see whether there were differences in various haplogroups or subclades?

MJost

I don't apply mutation rates to any formulas in the Haplotype_Data spreadsheets I post. For any time calculation, I use whatever Ken Nordtvedt has in the Y DNA Tools Generations 7 methodology.

I asked if you ever tried as you have the sheer numbers.
I must not understand your question. Are you asking if I've run TMRCA calculations with multiple haplogroups? 

I've done that using Gen 7 for the major R-L11 haplogroups. That is displayed at the R-P312 Yahoo group Files under Haplogroup_Timeline_R-L11_Subclades.gif


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 29, 2012, 11:52:56 AM
Mikewww says: “Why do you keep calculating TMRCA estimates based on one STR only?  Everything I read from the hobbyist scientists like Nordtvedt, Klyosov and Chandler, is that using more STRs (more experiments) is better. I think at the level of only one or two STRs, I'm not even sure it is worth doing. The population genetics scientists even use at least ten or so, and they don't have much money so I'm sure they'd use more if they could.
Nordtvedt and Chandler are clear to say back-mutations are handled in their calculations.

For R1a, you should probably take up this discussion with Anatole Klyosov. He is very interested in R1a, no doubt.  What TMRCA estimate (and by who) are you arguing against with your R1a1a example?”

Mine was of course only an experiment and I made many caveats. I did the same with the data posted about DYS388 and got different results, but I think that there is something interesting in these data.

My calculation about R1a1a tried to demonstrate only that my results using the germ-line mutation rate weren’t different from those of the paper using the Zhiv rate, provided we find an outlier like this. Of course the Zhiv rate was applied wrongly by those scholars, because they didn’t separate the haplotypes and the most widespread one skewed the calculation; otherwise with the Zhiv rate the age would have been at least 3 times older.
I have said here or in another thread that I am not against Nordtvedt, who said clearly that a hg I subclade 6000 years old had an interclade age of 20000 years with another I. This demonstrates what we all know: that most of the lines are extinct.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 29, 2012, 12:20:25 PM

I understand what you are showing and I applaud the effort, but I still think an analysis of only one or two STRs is not enough to rely on.  There could be aberrations in any one STR in any one haplogroup. An example is U106's L1 subclade which has a lot of DYS439=null.

... but I think your point above for P312 and L22 is still valid.  Even though the modal for P312 and L22 are different, the dispersion is similar.    

This is just my perspective in looking at a glass of water and saying it is half full versus half empty, but what I think that people on this forum call "convergence around the modal" is really just divergence from the ancestral for scattered extant fairly young branches on the Y DNA tree.

Of course null values are a different story and shouldn't be included in variance calculations since they have nothing to do with STRs, however I understand and agree with your point.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 29, 2012, 01:08:24 PM
Mikewww says: “This is just my perspective in looking at a glass of water and saying it is half full versus half empty, but what I think that people on this forum call "convergence around the modal" is really just divergence from the ancestral for scattered extant fairly young branches on the Y DNA tree”.

Actually I haven’t spoken of “convergence around the modal” but of “mutations around the modal” and “convergence to the modal”.
For instance: hg. R1b1* etc: DYS426= 12, 11, 12, 13 ,12…. And these mutations are the modals of R1b1, R-M269, R-L23, R-L51, R-P312 etc.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 29, 2012, 07:04:42 PM
...
Actually I haven’t spoken of “convergence around the modal” but of “mutations around the modal” and “convergence to the modal”.
For instance: hg. R1b1* etc: DYS426= 12, 11, 12, 13 ,12…. And these mutations are the modals of R1b1, R-M269, R-L23, R-L51, R-P312 etc.
I confess I don't understand the finer points of "mutations around" and "convergence to" the modal. 

For the whole hypothesis of using Y STRs as molecular clocks, I think what is important is that there is some kind of relatively consistent divergence from the ancestral allele. I mean, that is an important premise for the hypothesis. Do you agree?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 30, 2012, 12:11:57 AM
Oh yes, I agree, and it could be useful if we add some caveats:
1)   mutations aren’t linear: there are back mutations and also multi-step ones, though very rare I think
2)   every line is new and begins a new cycle, i.e. R-L51+ happened in a man with DYS426=13 (or all the R-L51 men known so far descend from such a man), so 13 is the modal value and from there we should calculate the mutation rate etc. To consider all of R-L11 as a single unit is wrong
3)   I’d say that every family line is a new beginning
4)   How are the mutations of DYS426 in hg. R1b counted without taking into consideration what I have said?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 30, 2012, 04:38:57 AM
Indeed it is. The MRCA of the Z220 six is 4590!!. At the same time the MRCA of Z196xL176xM153xZ209-xZ220 (n=29) is 2733!!
Z196xL175 starts as a group with a MRCA of 3606; after exclusion of M153 (MRCA 1068) it is 3888; without Z209 (MRCA 4511) it changes into 3256; leaving Z220 out, it becomes 2733.
All these MRCA do not include RecLOH events etc.
Hans van Vliet

Here a Median Joining analysis after Star Contraction and Media Parsimony the Z220 six. We really need more Z209++ to test for Z220.
I'm very unsure how the old age of Z220 will connect easily with M153 though the Z278 and Z214 nodes.
Looking at Marko H's analysis it may come with a surprise.
Hans van Vliet
http://dl.dropbox.com/u/74936451/z220.pdf

The pair wise mismatch analysis of 67 STR markers of the Z196xL176 group shows a beautiful bell shape; indicative for a fast growing population.
Excluding the M153 subpopulation makes the profile more rugged showing that a more steady state in the growth had occurred.
The Z220 clade itself has a very rugged pair wise mismatch profile: a stabilising society?
Age and growth patterns hint a more slightly horizontal development in the downstream clades of Z196; with M153 way down the stream.
Hans

Mikewww, look at all these interesting analyses of Hans van Vliet in another thread and you’ll get a picture of all my theories (and doubts).


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on April 30, 2012, 04:54:19 AM
This reasoning is probably the same as Klyosov's about haplogroup A0: that it has nothing to do with A1b etc., but probably comes from a hominid who mixed with the incomers from Eurasia, like the Neanderthals or Denisovans, even though it is probably less old.
The same happens also among closer lines that have some discontinuities, i.e. many lines are missing because they are extinct. The same of course happens also in mtDNA, and from this come many mistakes, including in the latest classification by Behar et al.
And what is an outlier if not the survival of a line distant from the others that survived? Perhaps the analysis of Zhivotovsky is right because he takes into consideration people from the Austro-Asian migrations, who didn’t meet again after the separation. These are pure lines, if I may say so.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 30, 2012, 09:47:09 AM
I think that your comments are correct and timely.  As I said on my thread, I will try to present data that shows that a large number of STRs are bounded in their range, which implies that hidden mutations are prevalent.  I think that hidden mutations will produce the same observable effect you cite, an apparent decrease in mutation rate over longer time spans.  I know the Maori and Gypsy data Zhiv used covered not more than a thousand or so years, but if he was using the faster mutators such as CDYa,b, then I would expect a similar observation would be made, and apparently that's what he did?

I think it's at least reasonably clear that loci don't mutate equally up or down; it's also likely that loci with different values will have slightly (probably very slightly) different mutation rates.

However, how big is this issue, and do we need to worry about it that much, especially when talking about L11 and its offspring?

DYS492 offers a convenient way to see whether loci are likely to behave in a dramatically different way after a mutation.

Looking at all the values for this locus in the P312 project I found

9         0.10%
10         0.21%
11         0.94%
12         95.60%
13         1.68%
14         1.47%

whereas in the U106 project we see

10         0.12%
12         1.06%
13         95.89%
14         2.82%
15         0.12%


Both sets of data look remarkably similar apart from the obvious fact that 12 is modal in the first set and 13 in the second.

There definitely doesn’t seem to be any particular tendency for U106 to try to get back to a value of 12, and I think it's reasonable to conclude that somebody who was P312 with a value of 13 at this locus would behave in exactly the same way as somebody who had 13 and was U106.

 The distribution about the modal is close in both sets of data.  Whether 5% of the mutations are multisteps is hard to discern.  Look at the behavior of 388 in R1b, I and J.  I think you'll be surprised?

An additional comment would be that this is a very slow mutator, and the roughly equal values for 13 and 14 in the P312 data set are interesting.  In general, there aren't a lot of mutations at this locus.  Two options might be: a. a mutation from 12 to 13 and then from 13 to 14; once at 14, there may be descendants who carry that unusual value; or b. a multistep with the same scenario as above, i.e. we don't know if the proliferation of 14 is a family event or a random event?  I've seen data where an apparent multistep of about 4 steps occurred and then a population built up around that value.  In the case of 388, we observe a modal of 12 in the E Hgs, and then successively higher values for G (50% 12 and 50% 13), I (14) and J (15), and then a return to 12 for R1a and R1b, as if they had evolved directly from E3a and E3b. This data is obtained from the dataset I referenced earlier. Note the width of the distribution for E3a and b is essentially 3, only two states for G, 4 for I and 6 for J.  R1a is unusual in that the modal is 12 (.83) and the allele state 10 has .15.  Clearly a multistep occurred and then a population growth occurred?
 I have begun to develop observations re: the properties of the data set R - Z253+.  My initial work is with 74 entries and 23 of 37 dys loci.  Initially, I am calculating the TMRCA DYS locus by DYS locus, using the Burgarella mutation rates.  I assumed all mutations were single step, regardless of step size.  In my experience, it takes a while to become familiar with a data set, and there may be some incorrect observations/results?

Note:  I included all except for Z253-.  So 226,554 and 895 which are younger than 253 are included. From what I can gather from Machiavelli/VanVliet this will decrease the overall TMRCA's.

The results are highly variable.  Six dys loci give TMRCA's of approximately 8K to 10K BP.  These are: 391, 392, 456, 576 and 442.  392 is the cleanest; it has 6 mutations, with one more than +/-1 from the modal.  The others are almost all bimodal, and I used the allele value with the highest number of apparent mutations as the modal.  In the case of 391 there are 45 11's, 25 10's and 1 12.  456 has 1 at 12, 1 at 14, 33 at 15, 29 at 16 and 8 at 17.  The other two are similar to these two.  391 has always been interesting to me; it is as if a man had a mutation from 11 to 10 and then he had two sons, one of whom had a mutation from 10 to 11 (or vice versa).  These two brothers then began two dominant lines which we still have today??

Contrary to the above we have TMRCA's as follows:  390 = 1655; 19 = 1250; 388  = 3464; 426 = 0; 455 and 454 = 600 BP etc.

Comments:  I did not include, 385a,b; 389i,ii; 459a,b; 464a,b,c,d; YCAIIa,b and CDYa,b.  Also, I have no means, at present, to count "hidden mutations" at the modal.  The results may reflect their occurrence but the variability of mutation rates and other assumptions may affect the data results also.  My next step is to only include Z253+ and M226- entries to observe this effect.
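
The locus-by-locus counting described in this post can be sketched roughly as follows. This is a minimal Python sketch, not the spreadsheet actually used: the allele counts for 391 and 456 are the ones quoted above, but the mutation rate is a hypothetical placeholder (the Burgarella per-locus rates are not given in the thread), the 30-year generation is an assumption, the modal is assumed to be the most frequent allele, and the function names are illustrative.

```python
from typing import Dict

# Allele value -> number of entries, as quoted in the post.
DYS391 = {10: 25, 11: 45, 12: 1}
DYS456 = {12: 1, 14: 1, 15: 33, 16: 29, 17: 8}

PLACEHOLDER_RATE = 2e-3  # hypothetical rate per generation, NOT a published value
YEARS_PER_GEN = 30       # assumed generation length


def mutation_count(counts: Dict[int, int]) -> int:
    """Count every non-modal entry as one mutation, whatever its step size."""
    modal = max(counts, key=lambda allele: counts[allele])
    return sum(n for allele, n in counts.items() if allele != modal)


def tmrca_years(counts: Dict[int, int], rate: float,
                years_per_gen: float = YEARS_PER_GEN) -> float:
    """Single-locus estimate: (mutations per entry) / rate * years per generation."""
    entries = sum(counts.values())
    return mutation_count(counts) / entries / rate * years_per_gen


if __name__ == "__main__":
    for name, counts in (("DYS391", DYS391), ("DYS456", DYS456)):
        print(name, mutation_count(counts), "mutations,",
              round(tmrca_years(counts, PLACEHOLDER_RATE)), "years (placeholder rate)")
```

A near-bimodal locus such as 391 shows why the per-locus results swing so widely: a single early mutation inherited by a large branch is counted once per descendant.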


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 30, 2012, 03:58:48 PM
I have started another thread which discusses this issue from a different point of view, but to specifically answer the original question: diversity appears to be more important.  But we have to understand that diversity restarts, in some sense, with each SNP.  So when we are looking at a set of data, we have to use entries that all have the oldest SNP, and only the oldest SNP, that we are trying to estimate the time to.  Using entries with downstream SNPs will skew the estimate toward a smaller value.  Specifically, if you want to estimate the time of R-L21, then we have to use entries that have that as their last SNP.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 06:14:53 PM
I have started another thread which discusses this issue from a different point of view, but to specifically answer the original question: diversity appears to be more important.

I agree on that point, but I'll just add the caveat that diversity (and frequency) should not be considered stand-alone. As many of the pieces of the puzzle as you can gather should be considered in context.

Quote from: ironroad41
But we have to understand that diversity restarts, in some sense, with each SNP.

What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments. (I mean I've asked Chandler, et al, including Klyosov, who actually is a bio-chemist.)

I don't see anything in the academic studies that points to this. Is there such a study?

This is where I made the point on another thread that we are all "homo sapiens sapiens", a subspecies of hominids.  We are more alike than different and everything I read is that most SNPs used are the ones that are searched for and found in the "junk" DNA between the genes.  In other words, they impact nothing.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 30, 2012, 06:49:48 PM

What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.


Mike

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.

This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.

If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.

When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR.

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this locus could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on April 30, 2012, 07:01:43 PM
I have started another thread which discusses this issue from a different point of view but to specifically answer the original question is that diversity appears to be more important.

I agree on that point, but I'll just add the caveat that diversity (and frequency) should be considered stand-alone. As many of the pieces of the puzzle as you can gather should be considered in context.

Quote from: ironroad41
But we have to understand that diversity restarts, in some sense, with each SNP.

What do you mean by "diversity restarts?"   If you are implying that that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments. (I meant I've asked Chandler, et al, including Klyosov who actually is a bio-chemist.)

I don't see anything in the academic studies that point to this. Is there such study?

This is where I made the point on another thread that we are all "homo sapiens sapiens", a subspecies of hominids.  We are more alike than different and everything I read is that most SNPs used are the ones that are searched for and found in the "junk" DNA between the genes.  In other words, they impact nothing.

What I mean is that, simply, the founder of a line, the person who had an SNP, starts the mutational process over.  All descendants of that founder show diversity from him.  The diversity after him is from his haplotype.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 07:02:29 PM
What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.
This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.
If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.
When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this loci could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.

I understand. This is the deal that Leo Little uncovered. I'm just pointing to this as the only case I'm aware of where standard STR  "reading" has a direct physical link to an SNP.  I guess, technically, the null is a "no read" so that is what you are being diligent in pointing out.  I agree with you - a null is a testing "no read."

The point I'm trying to make is there is no direct cause-effect tie between an STR value and an SNP-based Y DNA tree haplogroup.  Any associations of STR values to SNPs are just coincidental. The reason different SNP-marked haplogroups probably have different STR ancestral values is that they are just the remnants or scattered surviving branches of the human Y DNA family tree.   There were many, many more Y branches but most have died out, leaving us with what we see today. R-P312 has the WAMH modal... U106 has something slightly different, Hg I has something different, etc.

I just felt that I had to mention the L1 correlation with the DYS439=null "reading" to be clear, because this is a case that doesn't really fit the point, but as you articulate, it really is not an exception, just not applicable to the point I was making.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 30, 2012, 07:09:00 PM
What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.
This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.
If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.
When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this loci could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.

I understand. This is the deal that Leo Little uncovered. I'm just pointing to this as the only case I'm aware of where standard STR  "reading" has a direct physical link to an SNP.  I guess, technically, the null is a "no read" so that is what you are being diligent in pointing out.  I agree with you - a null is a testing "no read."

The point I'm trying to make is there is no direct cause-effect tie between an STR value and a SNP based Y DNA tree haplogroup.  Any associations of STR values to SNPs are just coincidental.

 I just felt that I had to mention the L1 correlation with the DYS439=null "reading" to be clear because this a case that doesn't really fit the point, but as you articulate, it really is not an exception, just a not applicable to the point as far as the point I was making.

Sorry I wasn't intending to drive it home with a sledge hammer, I just wasn't sure you knew what a null result was :)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 07:11:09 PM
I have started another thread which discusses this issue from a different point of view but to specifically answer the original question is that diversity appears to be more important.

I agree on that point, but I'll just add the caveat that diversity (and frequency) should be considered stand-alone. As many of the pieces of the puzzle as you can gather should be considered in context.

Quote from: ironroad41
But we have to understand that diversity restarts, in some sense, with each SNP.

What do you mean by "diversity restarts?"   If you are implying that that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments. (I meant I've asked Chandler, et al, including Klyosov who actually is a bio-chemist.)

I don't see anything in the academic studies that point to this. Is there such study?

This is where I made the point on another thread that we are all "homo sapiens sapiens", a subspecies of hominids.  We are more alike than different and everything I read is that most SNPs used are the ones that are searched for and found in the "junk" DNA between the genes.  In other words, they impact nothing.

What I mean is that, simply, the founder of a line, the person which had an SNP starts the mutational process over.  All descendants of that founder show diversity from him.  The diversity after him is from his haplotype.

I agree with you, although I don't really think of it as "starting over" but rather as just a snapshot in time....  like a mile marker on the highway as we drive by.  Early in this thread, I tried to call this "divergence from the ancestral."   Other people use terms like "convergence to the modal" or "mutations around the modal" or the like, and I'm not exactly sure what those mean, so I will describe my perspective as just "divergence from the ancestral."


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 07:13:04 PM
What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.
This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.
If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.
When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this loci could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.

I understand. This is the deal that Leo Little uncovered. I'm just pointing to this as the only case I'm aware of where standard STR  "reading" has a direct physical link to an SNP.  I guess, technically, the null is a "no read" so that is what you are being diligent in pointing out.  I agree with you - a null is a testing "no read."

The point I'm trying to make is there is no direct cause-effect tie between an STR value and a SNP based Y DNA tree haplogroup.  Any associations of STR values to SNPs are just coincidental.

 I just felt that I had to mention the L1 correlation with the DYS439=null "reading" to be clear because this a case that doesn't really fit the point, but as you articulate, it really is not an exception, just a not applicable to the point as far as the point I was making.

Sorry I wasn't intending to drive it home with a sledge hammer, I just wasn't sure you knew what a null result was :)

No, I was being sloppy and that could confuse folks, so I appreciate that.  In fact, I learned a little more about the process from your description.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on April 30, 2012, 07:16:46 PM
What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.
This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.
If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.
When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this loci could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.

I understand. This is the deal that Leo Little uncovered. I'm just pointing to this as the only case I'm aware of where standard STR  "reading" has a direct physical link to an SNP.  I guess, technically, the null is a "no read" so that is what you are being diligent in pointing out.  I agree with you - a null is a testing "no read."

The point I'm trying to make is there is no direct cause-effect tie between an STR value and a SNP based Y DNA tree haplogroup.  Any associations of STR values to SNPs are just coincidental.

 I just felt that I had to mention the L1 correlation with the DYS439=null "reading" to be clear because this a case that doesn't really fit the point, but as you articulate, it really is not an exception, just a not applicable to the point as far as the point I was making.

Sorry I wasn't intending to drive it home with a sledge hammer, I just wasn't sure you knew what a null result was :)

No, I was being sloppy and that could confuse folks, so I appreciate that.  In fact, I learned a little more about the process from your description.

Well it's not the full story of course, apparently they employ lasers and other fancy gizmos as well.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 07:37:37 PM
I'm just cataloging this quote here as it is pertinent to the overall thread.

Sandy Paterson is an M222 researcher who is an actuary by profession. Here are his comments from Rootsweb on Ken Nordtvedt's model.

Quote from: Sandy Paterson
Ken Nordtvedt, erstwhile Emeritus Professor of Physics at Montana State has proved that

E(v) = mG

where E(v)= expected marker variance
m=mutation rate
G=number of generations

So the number of generations taken to reach a given level of dispersion of
marker scores can be estimated as (observed variance)/(mutation rate).
Obviously, the more markers the better. This means it's quite natural to
divide the observed sum of variance of one haplogroup by that of another in
order to get a feel for the age of one haplogroup relative to another.
That's what Mike did, and he did so in order to avoid arguments about poorly
researched mutation rates. I think that's perfectly valid.
http://archiver.rootsweb.ancestry.com/th/read/dna-r1b1c7/2012-03/1333010242

There is a secondary point in that he is agreeing with me that by comparing (relative) variance rather than TMRCA's I'm avoiding the evolutionary versus germ-line mutation rate controversy.
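
To make the arithmetic in Sandy's note concrete, here is a minimal sketch of E(v) = mG turned around to estimate G, plus the relative-variance ratio. The variances and mutation rates below are made-up placeholder numbers, not measured values, and the function names are only for illustration.

```python
def generations_from_variance(variances, mutation_rates):
    """Estimate G from E(v) = m*G summed over markers: G = sum(v) / sum(m)."""
    return sum(variances) / sum(mutation_rates)

def relative_age(variances_a, variances_b):
    """Ratio of summed variances over the same markers; the rates cancel out."""
    return sum(variances_a) / sum(variances_b)

# Hypothetical per-marker variances for two haplogroups on the same three STRs.
var_a = [0.30, 0.12, 0.45]
var_b = [0.20, 0.08, 0.30]
rates = [0.003, 0.001, 0.004]  # hypothetical per-generation mutation rates

print(generations_from_variance(var_a, rates))  # absolute estimate, needs rates
print(relative_age(var_a, var_b))               # relative estimate, no rates needed
```

The second function is the reason for preferring relative variance: as long as both haplogroups are measured on the same set of markers, the sum of the mutation rates appears in both numerator and denominator and drops out.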


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on April 30, 2012, 08:00:35 PM
...  I have begun to develop observations re: the properties of the data set R - Z253+.  My initial work is with 74 entries and 23 of 37 dys loci.  Initially, I am calculating the TMRCA dys locus by dys locus, using the Burgarella mutation rates.  I assumed all mutations were single step, regardless of step size.  In my experience, it takes a while to become familiar with a data set and there may be some incorrect observations/results?

Note:  I included all except for Z253-.  So 226,554 and 895 which are younger than 253 are included. From what I can gather from Machiavelli/VanVliet this will decrease the overall TMRCA's.

The results are highly variable.  Six dys loci give TMRCA's of approximately 8K to 10K BP.  These are: 391, 392, 456, 576 and 442.  392 is the cleanest; it has 6 mutations, with one more than +/-1 from the modal.  The others are almost all bimodal and I used the allele value with the highest number of apparent mutations as the modal.  In the case of 391 there are 45 11's, 25 10's and 1 12.  456 has 1 at 12, 1 at 14, 33 at 15, 29 at 16 and 8 at 17.  The other two are similar to these two.  391 has always been interesting to me; it is as if a man had a mutation from 11 to 10 and then he had two sons, one of which had a mutation 10 to 11 (or vice versa).  These two brothers then began two dominant lines which we still have today??

Contrary to the above we have TMRCA's as follows:  390 = 1655; 19 = 1250; 388  = 3464; 426 = 0; 455 and 454 = 600 BP etc....

I think what you are seeing is why Ken Nordtvedt has been suggesting all along that more STRs is better.

Here is the actuary's view on Ken's recommendation.
Quote from: Sandy Paterson
However, KN does indeed suggest that more is better....
What I've found is that you start getting reasonable results as n approaches 50. Anything less is dicey.
http://archiver.rootsweb.ancestry.com/th/read/dna-r1b1c7/2012-03/1332498888

That's why I quit doing STR variance on 16 and 25 markers. I can see in my own haplogroup comparisons that the results are not consistent when picking out markers amongst a low number. Sandy is saying that his simulations showed we should be using at least 50 STRs. This all makes me cringe knowing the academic studies typically use a number like 6, 10 or 15 STRs at the most in their diversity calculations.

Ken recently made the following comment on using just one STR for TMRCA estimations. The question was posed and Ken answered.
Quote
why not using DYS724, in a C14 dating sort of way, as as a very simple and rough indicator of time to a MCRA within one (sub)clade of a haplogroup?

Quote from: Ken Nordtvedt
Very Very Rough Indicator. Consider the sigma on tmrca using just this one STR.

But theory is fine: the underlying assumption is that each individual STR is a crude clock. We want a better clock, so we compose such by combining behaviors of many, many individual STR clocks.

Remember: C14 clock is the composite result of millions of radioactive atoms doing their thing.
http://archiver.rootsweb.ancestry.com/th/read/y-dna-haplogroup-i/2012-04/1335729772

So even the idea of Carbon-14 dating is based on aggregating many, many crude clocks together in a mathematically sound way.  That's all Ken is doing with STRs.

If you like math and want the math theory discussion, you should probably read these posts from a couple of years ago on the Central Limit Theorem in action, with Ken Nordtvedt, John Chandler and James Heald, who is also pretty sharp.
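
For anyone who wants to see the "many crude clocks" idea in numbers, here is a toy simulation. It is only a sketch under loud assumptions: lineages descend independently from the founder (a star-like tree), every mutation is a single step up or down with equal chance, and the rate of 0.01 per generation is a hypothetical value picked to keep the run short. None of this is Ken's actual program; all names are made up.

```python
import random
import statistics

def locus_variance(n_lineages, generations, rate, rng):
    """Mean squared offset from the founder value at one STR.

    Assumptions: lineages descend independently from the founder (star-like
    tree) and each mutation is a single step, up or down with equal chance.
    """
    total = 0
    for _ in range(n_lineages):
        offset = 0
        for _ in range(generations):
            if rng.random() < rate:
                offset += rng.choice((-1, 1))
        total += offset * offset
    return total / n_lineages

def estimate_generations(n_markers, n_lineages, generations, rate, rng):
    """Average the per-marker clocks: G is estimated as mean variance / rate."""
    variances = [locus_variance(n_lineages, generations, rate, rng)
                 for _ in range(n_markers)]
    return statistics.mean(variances) / rate

def spread_of_estimates(n_markers, trials=20, seed=1):
    """Standard deviation of repeated estimates of a true age of 100 generations."""
    rng = random.Random(seed)
    estimates = [estimate_generations(n_markers, 40, 100, 0.01, rng)
                 for _ in range(trials)]
    return statistics.stdev(estimates)

if __name__ == "__main__":
    for n in (1, 25):
        print(n, "markers: spread of estimate =", round(spread_of_estimates(n), 1))
```

The spread of the age estimate should shrink as more markers are averaged, which is the same reason a C14 date built from millions of atoms is usable while a single atom is not. Real haplotypes share ancestry, so the independence assumption here is optimistic.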
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2008-03/1204573397

The aggregation of STR clocks is very clearly not perfected, but Heald agrees.
Quote from: James Heald
I suspect Ken is quite right, that with enough markers, P(T | t) rapidly becomes approximately Gaussian, because of the Central Limit Theorem; with the mean of T = mean no of steps = mu t
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2008-03/1204573397

Please don't misinterpret things out of context. There are many disagreements among the mathematicians. However the thoughtful aggregation of STR clocks to estimate the relative age of clades is useful, no doubt.  I expect to see new breakthroughs over the next couple of years... maybe JeanL has one for us.  I think Heinila developed some new forms of analysis.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 07:26:08 AM
What do you mean by "diversity restarts?"   If you are implying that there is a biological link between Y SNPs and Y STRs, I don't think bio-chemists think that is true. I think L1 and DYS439=null might be an exception, but I've asked this question multiple times and I always get a "no" answer without any counter-arguments.

Since this is the second time you've mentioned this I thought to post this simplistic explanation (simplistic because that's the extent of my knowledge) of what a null reading is.

One of the steps in reading an STR is finding it.
This is done by employing a 'dyed' chemical that bonds to the DNA near the STR.
If there is an alteration in this section of DNA then the chemical dye doesn’t bond and the relevant section can't be found.
When that happens the result is reported as a null result, but it really means the area wasn't read and could be anything within the bounds of probability for that STR

In the case of the L1 DYS439=null the disruption is in fact the L1 SNP, but a null reading at this loci could (I assume) easily be produced by a private SNP.

As I said I'm no expert in this sort of thing but I'm sure if you asked somebody like Vince Tilroe or Thomas Krahn they would be delighted to explain in-depth all the ins and outs.

I understand. This is the deal that Leo Little uncovered. I'm just pointing to this as the only case I'm aware of where standard STR  "reading" has a direct physical link to an SNP.  I guess, technically, the null is a "no read" so that is what you are being diligent in pointing out.  I agree with you - a null is a testing "no read."

The point I'm trying to make is there is no direct cause-effect tie between an STR value and an SNP-based Y DNA tree haplogroup.  Any associations of STR values to SNPs are just coincidental. The reason different SNP-marked haplogroups probably have different STR ancestral values is that they are just the remnants or scattered surviving branches of the human Y DNA family tree.   There were many, many more Y branches but most have died out, leaving us with what we see today. R-P312 has the WAMH modal... U106 has something slightly different, Hg I has something different, etc.

 I just felt that I had to mention the L1 correlation with the DYS439=null "reading" to be clear because this a case that doesn't really fit the point, but as you articulate, it really is not an exception, just a not applicable to the point as far as the point I was making.
I can't prove that there is no direct relationship between an STR pattern and a SNP.  Are you arguing that they are independent?  Then how can we show groups of subclades with similar STR signatures and all having a common SNP?

The whole premise has been that a modal reflects the STR pattern of one man who has both an SNP value and an STR pattern. I don't think it can be proven that the SNP and the defining STR mutation occurred simultaneously, but they were close in time.
All this said, the coincidence of an SNP and an STR modal pattern may be a "red herring".  What is important is that using entries from younger SNPs is wrong. As an example take M226, which is not a real old SNP; c. 400 AD is the estimate.  All men with that SNP are descended from one man and we can infer his modal haplotype from his descendants.  All members of M226 reflect diversity accumulated from 400 AD to the present time, depending on when their line had a mutation from the ancestor's set of values.

This brings me to my point that when you are trying to estimate Z253, you should not include M226 entries.  They only have diversity to the time of the M226 defining mutation and will reduce the estimate from those entries of Z253 whose diversity started much earlier in time, the time when the defining mutations for Z253 occurred, be that an SNP or STR?

I'll repeat what I said before: SNPs form a hierarchical set of data.  If I take the set of entries with a younger SNP, I get a younger TMRCA.  When trying to determine the time of a modal value of an older SNP, only entries with just that SNP should be used; no entries having a subsequent SNP should be used because they will reduce the TMRCA.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 07:53:52 AM
I thought I might expand on my "red herring" comment. 

In the clan Gregor, I have made numerous estimates of the TMRCA of the "founder" of the clan.  But there always is an uncertainty.  We don't know precisely when the mutation occurred.  It could have occurred with ggf, gf, f or possibly son.  There is no way of knowing.  Our SD's are always such that we cannot be sure which person had the mutation.  I think the same level of uncertainty exists with the timing between an SNP and the STR modal.  The modal reflects the most frequent values across all the entries, dys locus by dys locus (note: if we include subsequent SNP entries we may bias the modal).

Our TMRCA  estimate is our best guess when the modal occurred and the time to the SNP defining that modal can't be too far off.  It might not be O'neill of the nine hostages, but it sure was probably a close relative (re M226).


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 09:12:23 AM
Our TMRCA  estimate is our best guess when the modal occurred and the time to the SNP defining that modal can't be too far off.  It might not be O'neill of the nine hostages, but it sure was probably a close relative (re M226).

I think that you mean M222 and in fact there is no evidence that Niall of the Nine Hostages carried M222. If he is anything more than fiction, he was from a part of Ireland which is actually low on M222. The whole idea that he was the ancestor of the men in Donegal carrying M222 rested on genealogies which were tampered with c. 700 AD to make this famous person the ancestor of various families of Donegal, who then claimed to be the Northern Uí Néill. See  Irish Surnames and y-DNA: Uí Néill (http://www.buildinghistory.org/distantpast/irishsurnames.shtml#Niall)

You may be chasing a similar will-o'-the-wisp with the founder of Clan Gregor. I can't really tell from what you have written. In general men of the same surname who are actually related should turn out to have a common ancestor at around the period that surnames developed, and that has been found to be the case for some  that have been investigated. However if you have been looking at everyone with a McGregor surname, including those not known to be related by paper trail to the clan chiefs, that could be throwing you out. As you know, not all McGregors will be descended from the same Gregor. Some may not be descended from a Gregor at all. Though we can expect some not descended from the clan founder to be at least in the same haplogroup, since R1b-L21 is so common in Scotland. (I take it that the chief's line is L21.)  


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 11:47:13 AM
I thought that was Trinity College's opinion re: Niall?

The Ian Cam are distinguished by a mutation to 10 at 391.  This is common to all the entries to date.  The Clan Gregor has over 600 entries; Grier, Grieg, Gregory etc.  The Ian Cam are one set of the entries but assert they are descendants of the founder of the Clan Gregor name.  So, if you go to the FtDNA Clan Gregor website and observe the entries, you will see that the Ian Cam are a pretty homogeneous group.  2124 is a direct descendant of the Clan founder and has had no observable mutations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: chris1 on May 01, 2012, 11:58:38 AM
Our TMRCA  estimate is our best guess when the modal occurred and the time to the SNP defining that modal can't be too far off.  It might not be O'neill of the nine hostages, but it sure was probably a close relative (re M226).

I think that you mean M222 and in fact there is no evidence that Niall of the Nine Hostages carried M222. If he is anything more than fiction, he was from a part of Ireland which is actually low on M222. The whole idea that he was the ancestor of the men in Donegal carrying M222 rested on genealogies which were tampered with c. 700 AD to make this famous person the ancestor of various families of Donegal, who then claimed to be the Northern Uí Néill. See  Irish Surnames and y-DNA: Uí Néill (http://www.buildinghistory.org/distantpast/irishsurnames.shtml#Niall)

You may be chasing a similar will-o'-the-wisp with the founder of Clan Gregor. I can't really tell from what you have written. In general men of the same surname who are actually related should turn out to have a common ancestor at around the period that surnames developed, and that has been found to be the case for some  that have been investigated. However if you have been looking at everyone with a McGregor surname, including those not known to be related by paper trail to the clan chiefs, that could be throwing you out. As you know, not all McGregors will be descended from the same Gregor. Some may not be descended from a Gregor at all. Though we can expect some not descended from the clan founder to be at least in the same haplogroup, since R1b-L21 is so common in Scotland. (I take it that the chief's line is L21.)  
If I've got it right, the 'Ian Cam' McGregor group is a younger subclade branching off the very large 'Scots Modal' cluster around 600 years ago. 'Scots Modal' is L21+ but I think it has yet to discover its defining SNP downstream of L21.

Regarding the O'Neill/UiNeill, this is what I picked up: One group is the L21+, M222+ (Ui Neill), a very large cluster known as the 'Nial cluster'/'NW Irish' and contains many surnames.

Another cluster, one that I don't think you mention in the link (with a number of O'Neill and MacShane surnames), is possibly P312* (named 'O'Neill Variety' or 'O'Neill Variant') and has not yet discovered its defining SNP downstream of P312 (L21-, U152- and Z196- so far).


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 01, 2012, 12:23:28 PM
... Please don't misinterpret things out of context. There are many disagreements among the mathematicians. However the thoughtful aggregation of STR clocks to estimate the relative age of clades is useful, no doubt.  I expect to see new breakthroughs over the next couple of years... maybe JeanL has one for us.  I think Heinila developed some new forms of analysis.

I can see Ken continues to work on enhancements.  I currently use his Generations7 methodology in conjunction with the Haplotype_Data spreadsheets I maintain for R1b deep clade tested people. I'll let initial testing settle a little and then I'll incorporate the 111T version.

Quote from: Ken Nordtvedt
I have upgraded my excel program for estimating intra and inter clade variance based age estimates for y haplotypes. Generations111T now takes haplotypes which include all the 111 standard FTDNA STRs (although 11 of the multi-copy ones are not used). But haplotype collections of mixed STR numbers can be used. I like to think the upgrade program is also more user friendly than the Generations7 it replaces.

Generations111T can be downloaded from link below. Please report any glitches, etc. The “T” stands for “test model”. Read instructions.
Single haplotypes can be entered into both clade A and clade B row spaces to obtain TMRCA for haplotype pair, or up to 400 haplotypes of each clade can be used. Or just one clade can be entered to obtain both coalescence age and TMRCA age estimates.
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-05/1335879324

BTW - good news!  I already have about 900 111 STR length haplotypes for just R-L21.  FTDNA developed the 68-111 panel to enhance TMRCA estimations. I don't think there are any multi-copy STRs in this panel.

Now is the time to consider upgrading to 111 STRs if you haven't already.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 12:59:58 PM
I thought that was Trinity College's opinion re: Niall?

Yes indeed. They put out the study claiming Niall as the daddy of M222 just months before an historian undermined the whole idea that Niall was the founder of the Northern Ui Neill. Then later testing blew more holes in the idea. But let us not digress. McGregor is your interest.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 01:18:53 PM
The Ian Cam are distinguished by a mutation to 10 at 391.  This is common to all the entries to date.  The Clan Gregor has over 600 entries; Grier, Grieg, Gregory etc.  The Ian Cam are one set of the entries but assert they are descendants of the founder of the Clan Gregor name.  So, if you go to the FtDNA Clan Gregor website and observe the entries, you will see that the Ian Cam are a pretty homogeneous group.  2124 is a direct descendant of the Clan founder and has had no observable mutations.

OK. Things are now crystal clear on certain points.  http://www.familytreedna.com/public/macgregor/ says that 2124 is descended from the MacGregors of Glencarnoch (Chief’s line), and that kit fits into the MacGregor (Ian Cam) section of the results. So we can assume that the Ian Cam lot are descendants of the founder of the clan, if by "clan", we mean persons descended from the first chief that historians can identify (according to Wikipedia) Gregor "of the Golden Bridles." Gregor's son, Iain Camm ("of the One-Eye") succeeded as the second Chief sometime prior to 1390.

Did you attempt to find a date for the founder of the Ian Cam section alone? Or did you include a lot of people named McGregor who may have no connection whatsoever with the said line of chiefs no matter how much they wish they did?



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 01:21:21 PM
Our TMRCA  estimate is our best guess when the modal occurred and the time to the SNP defining that modal can't be too far off.  It might not be O'neill of the nine hostages, but it sure was probably a close relative (re M226).

I think that you mean M222 and in fact there is no evidence that Niall of the Nine Hostages carried M222. If he is anything more than fiction, he was from a part of Ireland which is actually low on M222. The whole idea that he was the ancestor of the men in Donegal carrying M222 rested on genealogies which were tampered with c. 700 AD to make this famous person the ancestor of various families of Donegal, who then claimed to be the Northern Uí Néill. See  Irish Surnames and y-DNA: Uí Néill (http://www.buildinghistory.org/distantpast/irishsurnames.shtml#Niall)

You may be chasing a similar will-o'-the-wisp with the founder of Clan Gregor. I can't really tell from what you have written. In general men of the same surname who are actually related should turn out to have a common ancestor at around the period that surnames developed, and that has been found to be the case for some  that have been investigated. However if you have been looking at everyone with a McGregor surname, including those not known to be related by paper trail to the clan chiefs, that could be throwing you out. As you know, not all McGregors will be descended from the same Gregor. Some may not be descended from a Gregor at all. Though we can expect some not descended from the clan founder to be at least in the same haplogroup, since R1b-L21 is so common in Scotland. (I take it that the chief's line is L21.)  
If I've got it right, the 'Ian Cam' McGregor group is a younger subclade branching off the very large 'Scots Modal' cluster around 600 years ago. 'Scots Modal' is L21+ but I think it has yet to discover its defining SNP downstream of L21.

Regarding the O'Neill/UiNeill, this is what I picked up: One group is the L21+, M222+ (Ui Neill), a very large cluster known as the 'Nial cluster'/'NW Irish' and contains many surnames.

Another cluster, one that I don't think you mention in the link, (with a number of O'Neill and MacShane surnames) is possibly P312* (named 'O'Neill Variety' or 'O'Neill Variant') and has not yet discovered its defining SNP downstream of P312 (L21- , U152- and Z196- so far).

You are right on re: the clan gregor.  I believe I have shown that the R1b of Clan Donald, MacMillans and Buchanans all converge about 0 to 200 BC to what is known as the Scots Modal.

Of current interest to us is that 2124 the clan chieftain is having his entire genome evaluated, along with 6 other selected Scots, by Jim Wilson.  This might help identify some new SNPs in his line leading to the Scots Modal?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 01, 2012, 01:22:02 PM
.. The Ian Cam are distinguished by a mutation to 10 at 391.  ..

One STR, even a slow one, is not enough to discern with any certainty.

..So, if you go to the FtDNA Clan Gregor website and observe the entries, you will see that the Ian Cam are a pretty homogeneous group.  2124 is a direct descendant of the Clan founder and has had no observable mutations.

How many generations back does 2124's pedigree go?

You are right on re: the clan gregor.  I believe I have shown that the R1b of Clan Donald, MacMillans and Buchanans all converge about 0 to 200 BC to what is known as the Scots Modal.

I think I lost something in this discussion. You've mentioned Z253+ before. Are you saying the Ian Cam descendants have both Z253+ and Scots Modal people AND there is a single Ian Cam founding lineage?

EDIT: It looks like you are saying they do have separate founding lineages. If so, I don't think it is useful to determine the TMRCA for this mixed up group. It would be more appropriate to find all of the subclades within the group, be they Z253+, DF21+, M222+ or whatever and do the TMRCA for that level of the Y DNA tree involved. If Z253, DF21 and M222 covered it then either L21 or DF13 would be the lowest level of the Y DNA tree where the whole group of Ian Cam comes together. If that is the case, there is no use calculating the TMRCA for Ian Cam. Just calculate the TMRCA for all of L21 or all of DF13 and that would be as close as we could probably get to a common ancestor for all of Ian Cam.   There are then people outside of Ian Cam who are more closely related to Ian Cam members than all Ian Cam members are to each other.

Can you ask the project administrator to turn on the Y DNA SNP report?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 01:25:15 PM
The surname McGregor actually occurs before Ian Cam. Or at least someone called the son of Gregor. Duncan M'Greghere is recorded in 1292.

Where people may be getting confused is thinking that earlier McGregors are something to do with the later chiefs. Anyone could name a son Gregor.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 01:30:47 PM
The Ian Cam are distinguished by a mutation to 10 at 391.  This is common to all the entries to date.  The Clan Gregor has over 600 entries; Grier, Grieg, Gregory etc.  The Ian Cam are one set of the entries but assert they are descendants of the founder of the Clan Gregor name.  So, if you go to the FtDNA Clan Gregor website and observe the entries, you will see that the Ian Cam are a pretty homogeneous group.  2124 is a direct descendant of the Clan founder and has had no observable mutations.

OK. Things are now crystal clear on certain points.  http://www.familytreedna.com/public/macgregor/ says that 2124 is descended from the MacGregors of Glencarnoch (Chief’s line), and that kit fits into the MacGregor (Ian Cam) section of the results. So we can assume that the Ian Cam lot are descendants of the founder of the clan, if by "clan", we mean persons descended from the first chief that historians can identify (according to Wikipedia) Gregor "of the Golden Bridles." Gregor's son, Iain Camm ("of the One-Eye") succeeded as the second Chief sometime prior to 1390.

Did you attempt to find a date for the founder of the Ian Cam section alone? Or did you include a lot of people named McGregor who may have no connection whatsoever with the said line of chiefs no matter how much they wish they did?


 I just used a subset of the Ian Cam only.  I have run extensive TMRCAs on sets of other entries and have gotten convergences back into the BC range.    My estimate required that I identify shared mutations such as with the Stirling entries (who are all Ian Cam); further there is an occasional multistep and I had to exclude the entry with a mutation at 426, since he drove the estimate back more than 200 years.  With around 39 entries and the caveats above, I get 1350 +/-100 as the best estimate for the occurrence of the 391 mutation?  There is a lot that can be learned in studying one set of data in detail.

I should also mention that I did the same analysis with the smaller Kerchner family and got 1650 as the TMRCA.

I don't want to bring up old "discussions", but I asked Ken and VV repeatedly to use the variance approach on these two sets of data and they  never responded.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 02:04:47 PM
 I just used a subset of the Ian Cam only. ..    My estimate required that I identify shared mutations such as with the Stirling entries (who are all Ian Cam); further there is an occasional multistep and I had to exclude the entry with a mutation at 426, since he drove the estimate back more than 200 years.  With around 39 entries and the caveats above, I get 1350 +/-100 as the best estimate for the occurrence of the 391 mutation..

So that would be about 660 AD. So your feeling is that the mutation occurred a lot earlier than the 14th century Gregor "of the Golden Bridles". Makes sense. So where do we go from there? You feel that people in the Ian Cam section are not necessarily descended from Gregor "of the Golden Bridles", but from some ancestor of his. I see your point. I was wrong to assume that they were all descended from the founder. I freely confess it! :)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on May 01, 2012, 02:20:50 PM
I can see Ken continues to work on enhancements.  I currently use his Generations7 methodology in conjunction with the Haplotype_Data spreadsheets I maintain for R1b deep clade tested people. I'll let initial testing settle a little and then I'll incorporate the 111T version.

Now is the time to consider upgrading to 111 STRs if you haven't already.

As far as I can tell the major change (apart from the obvious additional loci) is the mutation rates, I wonder where they came from ?

Rather confusingly there are two sets, the ones Ken is using are in the row named 'backup'.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on May 01, 2012, 02:24:49 PM

Of current interest to us is that 2124 the clan chieftain is having his entire genome evaluated, along with 6 other selected Scots, by Jim Wilson.  This might help identify some new SNPs in his line leading to the Scots Modal?

That's really good news, finding a SNP that defined the Scots Modal would be something to look forward to. Any idea if the data will be made public ?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 03:51:56 PM

Of current interest to us is that 2124 the clan chieftain is having his entire genome evaluated, along with 6 other selected Scots, by Jim Wilson.  This might help identify some new SNPs in his line leading to the Scots Modal?

That's really good news, finding a SNP that defined the Scots Modal would be something to look forward to. Any idea if the data will be made public ?

As I said there are 6 other Scots being included by Jim.  I have no idea when the data will be generated nor who will announce what they find.  It could be quite political since Jim thinks the MacGregors are of Pictish descent as do Woolf and others at Edinburgh.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 04:01:39 PM
.. The Ian Cam are distinguished by a mutation to 10 at 391.  ..

One STR, even a slow one, is not enough to discern with any certainty.

..So, if you go to the FtDNA Clan Gregor website and observe the entries, you will see that the Ian Cam are a pretty homogeneous group.  2124 is a direct descendant of the Clan founder and has had no observable mutations.

How many generations back does 2124's pedigree go?

You are right on re: the clan gregor.  I believe I have shown that the R1b of Clan Donald, MacMillans and Buchanans all converge about 0 to 200 BC to what is known as the Scots Modal.

I think I lost something in this discussion. You've mentioned Z253+ before. Are you saying the Ian Cam descendants have both Z253+ and Scots Modal people AND there is a single Ian Cam founding lineage?

EDIT: It looks like you are saying they do have separate founding lineages. If so, I don't think it is useful to determine the TMRCA for this mixed up group. It would be more appropriate to find all of the subclades within the group, be they Z253+, DF21+, M222+ or whatever and do the TMRCA for that level of the Y DNA tree involved. If Z253, DF21 and M222 covered it then either L21 or DF13 would be the lowest level of the Y DNA tree where the whole group of Ian Cam comes together. If that is the case, there is no use calculating the TMRCA for Ian Cam. Just calculate the TMRCA for all of L21 or all of DF13 and that would be as close as we could probably get to a common ancestor for all of Ian Cam.   There are then people outside of Ian Cam who are more closely related to Ian Cam members than all Ian Cam members are to each other.

Can you ask the project administrator to turn on the Y DNA SNP report?

 The Ian Cam have that distinguishing feature, but obviously there is no info in that mutation.  I use some 39 dys loci and a set of entries of about the same size for analysis.  I would dearly love someone besides myself to run a TMRCA estimate on the Ian Cam?

The Ian Cam are Z253-, I and the clan moderator are +.  Until we get the Jim Wilson results on 2124, we have no info on any SNP except for R-L21+ for the clan.  As I said, I took the Ian Cam founder's haplotype, a converged MacMillan, a converged R1b Clan Donald set and got a TMRCA c. 0 to 200 BC.  For all we know, they are all just R-L21.  Note Campbells seem to fit in here also.

I'm very confused with your one para.  I can't make sense out of it.  All the data we have is that the Chief is R-L21+ and therefore all the Ian Cam are.  As I said we will learn more (hopefully) after Jim Wilson runs his studies.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 01, 2012, 04:21:19 PM
.... It looks like you are saying they do have separate founding lineages. If so, I don't think it is useful to determine the TMRCA for this mixed up group. It would be more appropriate to find all of the subclades within the group, be they Z253+, DF21+, M222+ or whatever and do the TMRCA for that level of the Y DNA tree involved. If Z253, DF21 and M222 covered it then either L21 or DF13 would be the lowest level of the Y DNA tree where the whole group of Ian Cam comes together. If that is the case, there is no use calculating the TMRCA for Ian Cam. Just calculate the TMRCA for all of L21 or all of DF13 and that would be as close as we could probably get to a common ancestor for all of Ian Cam.   There are then people outside of Ian Cam who are more closely related to Ian Cam members than all Ian Cam members are to each other...

.... I'm very confused with your one para.  I can't make sense out of it.  All the data we have is that the Chief is R-L21+ and therefore all the Ian Cam are.  

I was a bit wordy.

All I'm saying is that if you have a mixed bag of subclades in Ian Cam there is no use calculating a TMRCA for it.  

It would be better to calculate the TMRCA for all of the lowest common level of the Y DNA tree that encompasses all of the Ian Cam members.  In this case, it may be that everyone is some version of L21+.  Some are Z253+, some are Z253- and some may be L21*.  If that is the case then you might as well just look at the TMRCA for all of L21 as the TMRCA ancestor for Ian Cam.  DYS391 or any other STR commonality is just coincidental.
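
The "lowest common level" idea above can be sketched as a lowest-common-ancestor lookup on the SNP tree. This is only a sketch: the parent map is a simplified, hypothetical slice of the tree (P312 > L21 > DF13 > Z253 / DF21 / M222, with intermediate SNPs left out), and the function names are made up for illustration.

```python
# Hypothetical, simplified parent map (child -> parent).
# Intermediate SNPs are omitted; this is an assumption, not the full tree.
PARENT = {
    "L21": "P312",
    "DF13": "L21",
    "Z253": "DF13",
    "DF21": "DF13",
    "M222": "DF13",
}


def path_to_root(snp):
    """Return the clades from snp upward to the top of the map."""
    path = [snp]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_clade(snps):
    """Most downstream clade that contains every SNP in snps."""
    snps = list(snps)
    common = path_to_root(snps[0])
    for snp in snps[1:]:
        ancestors = set(path_to_root(snp))
        # Keep only clades shared with this SNP, preserving downstream-first order.
        common = [clade for clade in common if clade in ancestors]
    return common[0] if common else None


# A group containing Z253+, DF21+ and M222+ men only comes together at DF13.
print(lowest_common_clade(["Z253", "DF21", "M222"]))
# Add a man who is only known to be L21 and the group comes together at L21.
print(lowest_common_clade(["Z253", "L21"]))
```

Under these assumptions, a TMRCA computed for the group is really a TMRCA for the clade the function returns, which is the point being made about mixed groups.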


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jean M on May 01, 2012, 04:40:59 PM
It would be better to calculate the TMRCA for all of the lowest common level of the Y DNA that encompasses all of the Ian Cam members.  In this case, it may be that everyone is some version of L21+.  Some are Z253+ some are Z253-

Ironroad is saying that the Ian Cam are all L21+, but Z253-.

Ironroad himself is Z253+, but not in the Ian Cam.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 01, 2012, 05:05:43 PM
.... It looks like you are saying they do have separate founding lineages. If so, I don't think it is useful to determine the TMRCA for this mixed up group. It would be more appropriate to find all of the subclades within the group, be they Z253+, DF21+, M222+ or whatever and do the TMRCA for that level of the Y DNA tree involved. If Z253, DF21 and M222 covered it then either L21 or DF13 would be the lowest level of the Y DNA tree where the whole group of Ian Cam comes together. If that is the case, there is no use calculating the TMRCA for Ian Cam. Just calculate the TMRCA for all of L21 or all of DF13 and that would be as close as we could probably get to a common ancestor for all of Ian Cam.   There are then people outside of Ian Cam who are more closely related to Ian Cam members than all Ian Cam members are to each other...

.... I'm very confused with your one para.  I can't make sense out of it.  All the data we have is that the Chief is R-L21+ and therefore all the Ian Cam are.  

I was a bit wordy.

All I'm saying is that if you have a mixed bag of subclades in Ian Cam there is no use calculating a TMRCA for it.  

It would be better to calculate the TMRCA for all of the lowest common level of the Y DNA tree that encompasses all of the Ian Cam members.  In this case, it may be that everyone is some version of L21+.  Some are Z253+, some are Z253- and some may be L21*.  If that is the case then you might as well just look at the TMRCA for all of L21 as the TMRCA ancestor for Ian Cam.  DYS391 or any other STR commonality is just coincidental.
Based on a lot of things including family histories many of the Ian Cam can prove from whom their family derives.  By looking at the haplotypes it also becomes evident that this is not a mixed bag.  There are over 70 descended from one man c.1350AD.  As I have been stressing, it has taken a lot of tries and learning to be able to take a set of the Ian Cam, mostly independent in some sense and compute a TMRCA.  You cannot just grab a sample and say voila, you will get the wrong answer more times than not.  But, if you are careful and identify inherited mutations, multisteps etc. it can be done. And I wouldn't call this type of analysis "fudging" the data.

This is why I have always been skeptical of grabbing a set of data, applying a math rule to it, and then proclaiming veracity.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on May 02, 2012, 03:37:28 AM
There is a lot of talk about this hg. A0 thanks to Ted Kandell, and Argiedude has also reappeared on Rootsweb. Ted Kandell, who is moreover a lovely person, seems a little obsessed with this “ancestor”, as if he were searching for a new Abraham in the lands of Africa, as though I were to look for my being European in my 2.8% of Neanderthal. I studied this haplogroup and put some of its haplotypes on ySearch. On this I tend to agree with Anatole Klyosov, i.e. that this line (a unique one, I think), a survivor from prehistory, probably belongs to a hominid with whom back-migrants from Eurasia mixed, taking from him their being African.

But to demonstrate (once more) to Mikewww my theories, we could take my haplotype (KV7Y2) R-L150+ and that of Anutechia (MF7MA), A0, separated from our line at least 160,000 years ago, perhaps much more says Anatole Klyosov.

Locus        KV7Y2   MF7MA
DYS393       12      13
DYS390       24      19
DYS19        15      16
DYS391       10      10
DYS385a      11      15
DYS385b      14      17
DYS426       12      13
DYS388       12      11
DYS439       12      12
DYS389I      13      13
DYS392       12      12
DYS389II     29      30
DYS458       16      16
DYS459a      9       8
DYS459b      10      9
DYS455       11      10
DYS454       11      11
DYS447       24      24
DYS437       15      12
DYS448       19      21
DYS449       29      32.2
DYS464a      14      13
DYS464b      14      15
DYS464c      16      15
DYS464d      17      18
DYS460       10      12
GATAH4.1     11      9
YCAIIa       19      16
YCAIIb       23      18
DYS456       16      15
DYS442       12      12
DYS438       12      16
DYS444       12      12
DYS446       13      19
DYS461       11      12
DYS462       11      13
YGATAA10     14      12
DYS635       23      15
GGAAT1B07    10      10
DYS441       14      13
DYS445       12      12
DYS452       30      31
DYS463       24      20

If I have remembered my values correctly (a small change would matter little), we have:

MR: 0.0022
(74 / 86) / 0.0022 = 391 generations
Your theories (Nordtvedt, Vizachero, Klyosov etc. etc.) are completely absurd. At this level also Zhivotovsky isn’t worth.
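
The arithmetic in this post can be reproduced from the haplotype table. The sketch below reads 74 as the summed step difference between the two haplotypes and 86 as 2 x 43 markers (one count per lineage). That reading is inferred from the table rather than stated in the post, and treating the 32.2 at DYS449 as 32 is also an assumption.

```python
# (locus, KV7Y2, MF7MA) copied from the table in the post.
HAPLOTYPES = [
    ("DYS393", 12, 13), ("DYS390", 24, 19), ("DYS19", 15, 16),
    ("DYS391", 10, 10), ("DYS385a", 11, 15), ("DYS385b", 14, 17),
    ("DYS426", 12, 13), ("DYS388", 12, 11), ("DYS439", 12, 12),
    ("DYS389I", 13, 13), ("DYS392", 12, 12), ("DYS389II", 29, 30),
    ("DYS458", 16, 16), ("DYS459a", 9, 8), ("DYS459b", 10, 9),
    ("DYS455", 11, 10), ("DYS454", 11, 11), ("DYS447", 24, 24),
    ("DYS437", 15, 12), ("DYS448", 19, 21), ("DYS449", 29, 32.2),
    ("DYS464a", 14, 13), ("DYS464b", 14, 15), ("DYS464c", 16, 15),
    ("DYS464d", 17, 18), ("DYS460", 10, 12), ("GATAH4.1", 11, 9),
    ("YCAIIa", 19, 16), ("YCAIIb", 23, 18), ("DYS456", 16, 15),
    ("DYS442", 12, 12), ("DYS438", 12, 16), ("DYS444", 12, 12),
    ("DYS446", 13, 19), ("DYS461", 11, 12), ("DYS462", 11, 13),
    ("YGATAA10", 14, 12), ("DYS635", 23, 15), ("GGAAT1B07", 10, 10),
    ("DYS441", 14, 13), ("DYS445", 12, 12), ("DYS452", 30, 31),
    ("DYS463", 24, 20),
]

MUTATION_RATE = 0.0022  # per marker per generation, as quoted in the post

# Sum of absolute step differences; partial repeats (32.2) are truncated (assumption).
gd = sum(abs(int(a) - int(b)) for _, a, b in HAPLOTYPES)
n_markers = len(HAPLOTYPES)

# Naive pairwise estimate: both lineages accumulate mutations, hence 2 * n_markers.
generations = gd / (2 * n_markers) / MUTATION_RATE

print(f"GD = {gd} over {n_markers} markers")
print(f"Naive estimate: {generations:.0f} generations")
```

This only shows where the 391 comes from. It is a straight-line calculation that ignores back mutations and multi-step changes, which is the very thing in dispute at this time depth.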


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 04, 2012, 09:42:17 AM
... But to demonstrate (once more) to Mikewww my theories, we could take my haplotype (KV7Y2) R-L150+ and that of Anutechia (MF7MA), A0, separated from our line at least 160,000 years ago, perhaps much more says Anatole Klyosov......
Your theories (Nordtvedt, Vizachero, Klyosov etc. etc.) are completely absurd. At this level also Zhivotovsky isn’t worth.

As you said, these are the theories of others, not mine. I'm just using what folks like Ken Nordtvedt, Marko Heinila and John Chandler have provided. These folks are true scientists, albeit not in population genetics.... however, they know their math.  I'm just convinced their mathematical models are useful.

... and there you go again, arguing by exception.  The value of statistics is to look at large groups. It is not useful to apply statistics to (I'll use your word,) "absurd" cases.  There is not much use in applying a methodology to two modern people whose SNP phylogeny indicates they are tens (or hundreds) of thousands of years separated. Anutechia (MF7MA) is a modern person. He is not ancient. His lineage has separated from us long, long ago. The branches of the Y DNA family tree grow in many directions. The branches do not just grow away from each other (diverge.)  They also cross (converge.)  What you should be interested in is not your GD from Anutechia but time to your ancestral alleles, that of your common ancestor, a truly ancient person who died probably over 100K or double (?) years ago.  Even so, if you want to estimate such a number, why look at just two people? Why not look at the thousands of long haplotypes available - using the value of statistics?

Oh, speaking of "absurd", is anyone saying the linear duration of these STRs lasts for hundreds of thousands of years?  Your exception argument case should not be used to consider a mathematical model not designed to handle it. Even earlier on this thread, on reply #18, you'll find:
...
Quote from: Vincent Vizachero
... For old haplogroups (e.g. more than 25 ky old) the problem of non-linear accumulation of GD due to marker saturation becomes the dominant problem. Creating trees from STRs in this timeframe is typically not necessary, thankfully, now that our SNP-based trees are so much more complete than they were several years ago.

http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-04/1334313410

No one is saying STR linearity durations aren't an issue over the periods of time in your case. Marko Heinila's analysis, which I've posted on this thread (see reply #146), lists STR durations for most of the common STRs.
....
Nevertheless, some STRs probably do behave non-linearly outside of certain time ranges. Marko Heinila addressed this with a statistical analysis across tens of thousands of haplotypes.
....

Very few STRs might possibly apply to your exception case, but I doubt if Marko would recommend using any of them for 200k year periods. We know Vizachero would not recommend it.
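
One way to see the non-linearity being discussed is a toy simulation of a single marker on two lineages. It assumes a symmetric, single-step, unbounded mutation model and borrows the 0.0022 rate quoted earlier in the thread. The model, function names, and time spans are all illustrative assumptions, and it captures only back mutations hiding earlier steps, not every cause of saturation.

```python
import math
import random


def simulate_gd(generations, rate, rng):
    """Absolute repeat difference between two lineages on one marker."""
    diff = 0
    elapsed = 0
    total = 2 * generations  # mutations can occur on either lineage
    log_stay = math.log(1.0 - rate)
    while True:
        # Geometric waiting time (in lineage-generations) to the next mutation.
        elapsed += int(math.log(1.0 - rng.random()) / log_stay) + 1
        if elapsed > total:
            return abs(diff)
        diff += rng.choice((-1, 1))  # a step up or down is equally likely


def mean_gd(generations, rate, trials, rng):
    """Average observed difference over many independent simulated markers."""
    return sum(simulate_gd(generations, rate, rng) for _ in range(trials)) / trials


rng = random.Random(42)
RATE = 0.0022  # per marker per generation (assumed)

short = mean_gd(500, RATE, 2000, rng)
long = mean_gd(8000, RATE, 2000, rng)

print(f"mean GD per marker after 500 generations:  {short:.2f}")
print(f"mean GD per marker after 8000 generations: {long:.2f}")
print(f"time ratio 16x, observed GD ratio {long / short:.1f}x")
```

In this toy model the observed difference grows much more slowly than elapsed time once several mutations have piled up on the same marker, which is why a straight-line GD calculation breaks down for very old splits.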


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 04, 2012, 09:57:33 AM
... Please don't misinterpret things out of context. There are many disagreements among the mathematicians. However the thoughtful aggregation of STR clocks to estimate the relative age of clades is useful, no doubt.  I expect to see new breakthroughs over the next couple of years... maybe JeanL has one for us.  I think Heinila developed some new forms of analysis.


I can see Ken continues to work on enhancements.  I currently use his Generations7 methodology in conjunction with the Haplotype_Data spreadsheets I maintain for R1b deep clade tested people. I'll let initial testing settle a little and then I'll incorporate the 111T version.

Quote from: Ken Nordtvedt
I have upgraded my excel program for estimating intra and inter clade variance based age estimates for y haplotypes. Generations111T now takes haplotypes which include all the 111 standard FTDNA STRs (although 11 of the multi-copy ones are not used). But haplotype collections of mixed STR numbers can be used. I like to think the upgrade program is also more user friendly than the Generations7 it replaces.

Generations111T can be downloaded from link below...
http://archiver.rootsweb.ancestry.com/th/read/Y-DNA-HAPLOGROUP-I/2012-05/1335879324


Why not have a play with it ?

It's easy enough to use and you can delete entire loci from the calculation to see how it affects the outcome.

BTW does anybody know where the mutation rates Ken used in Generations111T are published ?

The mutation rates that Ken used in Generations111T are from Marko Heinila; I have so far not been able to find his mutation rates online.
....
I got a note from Marko.  He is still recovering, but he said that his new rates are about 15% faster than Chandler's.  That explains most of the difference between the 800 years BP for the Ian Cam he had calculated previously and MJost's data? I think he will be online soon (I hope).

This is interesting. Marko Heinila, as noted earlier in this thread, has evaluated mutation rates by STRs, up and down rates as well as multi-steps.  He actually comes out with faster mutation rates than Chandler's germ line rates.   This puts the situation even more at odds with the very slow evolutionary rates that Zhivotovsky et al. use.
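
For a simple linear estimate, age scales inversely with the assumed mutation rate, so a 15% faster rate shrinks an age estimate by a factor of 1/1.15. The sketch below uses 800 years purely as an illustrative input; the thread does not say which figure was computed with which rate set, and the function name is made up.

```python
def rescale_age(age, old_rate, new_rate):
    """Re-express a linear age estimate under a different mutation rate."""
    return age * old_rate / new_rate


# Rates are given relative to each other: the new set is assumed 15% faster.
rescaled = rescale_age(800, old_rate=1.00, new_rate=1.15)
print(f"800 years at the old rate is about {rescaled:.0f} years at the faster rate")
```

The same relation runs the other way: the very slow "evolutionary" rates stretch the same observed variance into a much older date.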


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on May 04, 2012, 10:10:05 AM
Mike, of course I am not convinced by your answer, also because you take something from VV, and you know how much I esteem him. He tested Romitti for L150 without his permission, then denied the importance of that SNP, disagreed with FTDNA when they accepted that SNP, and abandoned his work as administrator. Afterwards he went back to his plants, then returned and tried to get me banned again from Dna-forums (may it rest in peace!).
As administrator of the "ht35 project", he had the sample of DeMao, whom he recognized as R1b1* only after many letters of mine; he had the sample of Mangino (the Tuscan Mancini), for whom a WTY test would have been enough to understand where R1b1a2* was born, and he did nothing about it. He argued for an Eastern origin of R1b, giving bad suggestions to you all.
Your analyses were all wrong. That is becoming evident from these R1b found in Germany from 4600 YBP, and what a surprise it would be if they tested some SNPs downstream of M269! But I am sure, as I was sure of this some days ago and wrote so, that they will find in Italy, and in Tuscany above all, some R1b not 5000 years old but at least double that.
Where will your analyses go then?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on May 04, 2012, 10:41:10 AM
This is interesting. Marko Heinila, as noted earlier in this thread, has evaluated mutation rates by STRs, up and down rates as well as multi-steps.  He actually comes out with faster mutation rates than Chandler's germ line rates.   This puts the situation even more at odds with the very slow evolutionary rates that Zhivotosky et al uses.

Mike, the problem here is that the robustness of these mutation rates is being tested in sets that share common ancestry within a relatively short time frame. Then, those values are being extrapolated to time frames that are presumed to be three and four times longer. There is variability in the mutation rate as a function of repeats. In a time frame of 1000-1750 ybp, even the fastest markers would give you at most 2-3 mutations, so the effects of the variability can be reduced by:

a)Analyzing  a large number of markers, because even within the time frame of 1000-1750 ybp there is a possibility that 1 mutation is in fact 3, and that 0 is equal to 2 due to back mutations. However when a large number of markers are used that probability is greatly reduced. That is in fact where the law of large number comes into play.

b)When one calibrates the mutation rate of a haplotype, it will likely yield very good results if there are only 2 or 3 mutation rates per marker that are being averaged, however it doesn’t mean that one could use that average and extrapolate it to an older time frame, because now there are new mutation rates on each microsatellite that are not being accounted for.

Let me explain part b) slightly so that people might better understand. Say: one clan has a common ancestor that live 1000 ybp. You look at  DYS XXX and find that the maximum amount of mutations that have occurred are 2, this means that there were two mutation rates involved in the process(Assuming no back mutations occurred) so when DYS XXX mutated from modal to mut-1 it had mutation rate “a”, and when it mutated from mut-1 to mut-2 it had mutation rate “b”. When one does the calibration method, the mutation rate “c” is actually a very close estimate of the average mutation rate for that time span. Now when you take mutation rate “c” and try to apply it to a longer time span, what happens is that “c” was perfect for the time span in consideration, but in a longer time span one might encounter that there is “different modal” to mut-01 with mutation rate “a0”, and then mut-01 to mut02 with mutation rate “a1”, and so on until you actually reach mutation rates “a” and “b”. For simplicity purposes, I only assumed forward mutations, but in reality in could go anywhere, which further adds more uncertainty to the analyses. The problem is that when mutation rate “c” is applied to time frames that are much longer than where “c” was estimated, one is automatically assuming that all the mutation rates in the new time period being analyzed would average out to “c”, or a number close to it. In reality, due to the variability in the mutation rates, it doesn’t.  This is the reason why I’m thinking of creating a case-control simulation where I can implemented a mutation rate that is a function of the repeat number, and see how good of a fit do calibrated mutation rates found using say 50 generations are for sets that have common ancestry at 100 generations.  This should take me a while, because I want to design a project that could be used for multiple things, and I just got out of school this week, and I’m graduating Sunday. So I want to take a little break, before I get to it.
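
A minimal sketch of the kind of simulation described above, in Python. Everything in it is an assumption made for illustration: the exponential dependence of the rate on repeat number, the parameter values, the function names, and the star-shaped genealogy (independent lineages descending from a single ancestor). It calibrates an average rate “c” on a 50-generation set and then uses that rate to date a 100-generation set.

```python
import math
import random


def simulate_marker(generations, n_lineages, start_repeats, base_rate, slope, rng):
    """Return the final repeat counts of independent lineages from one ancestor.

    Assumed rate model (hypothetical): the per-generation mutation probability
    is base_rate * exp(slope * (r - start_repeats)) for current repeat count r.
    Every mutation is a single step, up or down with equal probability.
    """
    finals = []
    for _ in range(n_lineages):
        r = start_repeats
        for _ in range(generations):
            rate = min(1.0, base_rate * math.exp(slope * (r - start_repeats)))
            if rng.random() < rate:
                r += 1 if rng.random() < 0.5 else -1
                r = max(r, 1)  # a repeat count cannot drop below 1
        finals.append(r)
    return finals


def variance_about(values, reference):
    """Mean squared deviation of the values from a reference allele."""
    return sum((v - reference) ** 2 for v in values) / len(values)


def extrapolation_ratio(slope, seed=1, n_lineages=4000):
    """Estimated age / true age when a 50-generation rate dates a 100-generation set."""
    rng = random.Random(seed)
    start, base_rate = 14, 0.01  # hypothetical ancestral allele and base rate
    short = simulate_marker(50, n_lineages, start, base_rate, slope, rng)
    c = variance_about(short, start) / 50  # calibrated average rate "c"
    long_set = simulate_marker(100, n_lineages, start, base_rate, slope, rng)
    estimated_generations = variance_about(long_set, start) / c
    return estimated_generations / 100


if __name__ == "__main__":
    for slope in (0.0, 0.3):
        print(slope, round(extrapolation_ratio(slope), 3))
```

With slope set to 0 the rate is constant and the ratio should stay close to 1; a nonzero slope shows how far the calibrated rate drifts for that particular rate model, which is the comparison I have in mind.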



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on May 04, 2012, 11:15:47 AM
This is interesting. Marko Heinila, as noted earlier in this thread, has evaluated mutation rates by STRs, up and down rates as well as multi-steps. He actually comes out with faster mutation rates than Chandler's germ line rates. This puts the situation even more at odds with the very slow evolutionary rates that Zhivotovsky et al. use.

Mike, the problem here is that the robustness of these mutation rates is being tested in sets that share common ancestry within a relatively short time frame. Then, those values are being extrapolated to time frames that are presumed to be three or four times longer. There is variability in the mutation rate as a function of repeats. In a time frame of 1000-1750 ybp, even the fastest markers would give you at most 2-3 mutations, so the effects of the variability can be reduced by:

a) Analyzing a large number of markers, because even within the time frame of 1000-1750 ybp there is a possibility that 1 observed mutation is in fact 3, and that 0 is in fact 2, due to back mutations. However, when a large number of markers is used, that probability is greatly reduced. That is in fact where the law of large numbers comes into play.

b) When one calibrates the mutation rate of a haplotype, it will likely yield very good results if there are only 2 or 3 mutation rates per marker that are being averaged; however, it doesn’t mean that one could use that average and extrapolate it to an older time frame, because now there are new mutation rates on each microsatellite that are not being accounted for.

Let me explain part b) slightly so that people might better understand. Say one clan has a common ancestor that lived 1000 ybp. You look at DYS XXX and find that the maximum number of mutations that have occurred is 2; this means that there were two mutation rates involved in the process (assuming no back mutations occurred), so when DYS XXX mutated from modal to mut-1 it had mutation rate “a”, and when it mutated from mut-1 to mut-2 it had mutation rate “b”. When one does the calibration method, the mutation rate “c” is actually a very close estimate of the average mutation rate for that time span. Now when you take mutation rate “c” and try to apply it to a longer time span, what happens is that “c” was perfect for the time span in consideration, but in a longer time span one might encounter a step from a “different modal” to mut-01 with mutation rate “a0”, and then mut-01 to mut-02 with mutation rate “a1”, and so on until you actually reach mutation rates “a” and “b”. For simplicity purposes, I only assumed forward mutations, but in reality it could go either way, which further adds more uncertainty to the analyses. The problem is that when mutation rate “c” is applied to time frames that are much longer than the one where “c” was estimated, one is automatically assuming that all the mutation rates in the new time period being analyzed would average out to “c”, or a number close to it. In reality, due to the variability in the mutation rates, they don’t. This is the reason why I’m thinking of creating a case-control simulation where I can implement a mutation rate that is a function of the repeat number, and see how good a fit calibrated mutation rates found using, say, 50 generations are for sets that have common ancestry at 100 generations. This should take me a while, because I want to design a project that could be used for multiple things, and I just got out of school this week, and I’m graduating Sunday. So I want to take a little break before I get to it.



There is also the issue of how long a generation is; presumably this was estimated and would end up being built into the mutation rates.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 04, 2012, 11:33:46 AM
This is interesting. Marko Heinila, as noted earlier in this thread, has evaluated mutation rates by STRs, up and down rates as well as multi-steps. He actually comes out with faster mutation rates than Chandler's germ line rates. This puts the situation even more at odds with the very slow evolutionary rates that Zhivotovsky et al. use.

Mike, the problem here is that the robustness of these mutation rates is being tested in sets that share common ancestry within a relatively short time frame. Then, those values are being extrapolated to time frames that are presumed to be three or four times longer. There is variability in the mutation rate as a function of repeats.....

I agree that you have valid concerns.  I don't know what the correct mutation rates are.  The only thing that has me leaning towards the germ-line rates being applicable for the STR durations (as depicted by Heinila) is that if I look at the ages of all of the haplogroups (I, G, E, etc.) in context, the resulting TMRCAs make sense, again for reasonable timeframes.

This should take me a while, because I want to design a project that could be used for multiple things,

Very good. You could do a lot of good to progress the current models or come up with a new one.

and I just got out of school this week, and I’m graduating Sunday. So I want to take a little break, before I get to it.

Congratulations! That's great.  Make sure to enjoy your accomplishment. This is something that can never be taken away from you.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on May 04, 2012, 12:01:23 PM
I agree that you have valid concerns.  I don't know what the correct mutation rates are.  The only thing that has me leaning towards the germ-line rates being applicable for the STR durations (as depicted by Heinila) is that if I look at the ages of all of the haplogroups (I, G, E, etc.) in context, the resulting TMRCAs make sense, again for reasonable timeframes.

I agree with you, germ-line mutation rates are the way to go forward; however, there are still a lot more case-control checks that ought to be done to ensure that things such as variability in mutation rates and loss of linearity do not affect the outcome.

 

Congratulations! That's great.  Make sure to enjoy your accomplishment. This is something that can never be taken away from you.

Yes I’m pretty excited, although the job market here in the US is kind of bad. I’m graduating with a Bachelor of Science in Mechanical Engineering, and I have specialized (i.e. I have a minor) in Bio-Mechanics, specifically computational bio-dynamics.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on May 04, 2012, 03:17:30 PM
Mutation rates affect long term estimates the most I believe.  The faster mutators are characterized by fairly good sets of father/son meioses but the slower are very infrequent.

I think using tables of entries is questionable.  There are too many sets of families with redundant mutations. It is blatantly evident in the Ian Cam.  So, you overcount the occurrence of mutations.

As I said on another thread, 388 is a classic example of a DYS locus which changes rate with allele value. The difference in up/down rates would suggest a migration of the modal with time, but I haven't seen that in the data.

Burgarella's 110 DYS loci estimates are subdivided by motif (3, 4, 5 etc.), although only 3 and 4 are dominant in their data set. So the motif appears to affect rate.

There are additional subtleties in which changes in latitude (migration) may have a short-term effect.

We are all homo sapiens, but we are also sentient beings and we respond to our environment. I believe that is what Darwin first postulated?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on May 16, 2012, 11:19:28 PM
I found this study while browsing online today. While it is not directly related to humans or Y-STRs for that matter, I think it provides good insights into what I was talking about regarding the calibration of mutation rates and the time frames.

http://mbe.oxfordjournals.org/content/29/2/707.abstract (http://mbe.oxfordjournals.org/content/29/2/707.abstract)

Quote from: Crandall et al. 2012

The rate of change in DNA is an important parameter for understanding molecular evolution and hence for inferences drawn from studies of phylogeography and phylogenetics. Most rate calibrations for mitochondrial coding regions in marine species have been made from divergence dating for fossils and vicariant events older than 1–2 My and are typically 0.5–2% per lineage per million years. Recently, calibrations made with ancient DNA (aDNA) from younger dates have yielded faster rates, suggesting that estimates of the molecular rate of change depend on the time of calibration, decaying from the instantaneous mutation rate to the phylogenetic substitution rate. aDNA methods for recent calibrations are not available for most marine taxa so instead we use radiometric dates for sea-level rise onto the Sunda Shelf following the Last Glacial Maximum (starting ∼18,000 years ago), which led to massive population expansions for marine species. Instead of divergence dating, we use a two-epoch coalescent model of logistic population growth preceded by a constant population size to infer a time in mutational units for the beginning of these expansion events. This model compares favorably to simpler coalescent models of constant population size, and exponential or logistic growth, and is far more precise than estimates from the mismatch distribution. Mean rates estimated with this method for mitochondrial coding genes in three invertebrate species are elevated in comparison to older calibration points (2.3–6.6% per lineage per million years), lending additional support to the hypothesis of calibration time dependency for molecular rates.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on May 17, 2012, 12:12:00 AM
I found this study while browsing online today, while it is not directly related to humans or Y-STRs for that matter, I think it provides good insights as to what I was talking about regarding the calibration of mutation rates and the time frames.

http://mbe.oxfordjournals.org/content/29/2/707.abstract (http://mbe.oxfordjournals.org/content/29/2/707.abstract)

Quote from: Crandall et al. 2012

The rate of change in DNA is an important parameter for understanding molecular evolution and hence for inferences drawn from studies of phylogeography and phylogenetics. Most rate calibrations for mitochondrial coding regions in marine species have been made from divergence dating for fossils and vicariant events older than 1–2 My and are typically 0.5–2% per lineage per million years. Recently, calibrations made with ancient DNA (aDNA) from younger dates have yielded faster rates, suggesting that estimates of the molecular rate of change depend on the time of calibration, decaying from the instantaneous mutation rate to the phylogenetic substitution rate. aDNA methods for recent calibrations are not available for most marine taxa so instead we use radiometric dates for sea-level rise onto the Sunda Shelf following the Last Glacial Maximum (starting ∼18,000 years ago), which led to massive population expansions for marine species. Instead of divergence dating, we use a two-epoch coalescent model of logistic population growth preceded by a constant population size to infer a time in mutational units for the beginning of these expansion events. This model compares favorably to simpler coalescent models of constant population size, and exponential or logistic growth, and is far more precise than estimates from the mismatch distribution. Mean rates estimated with this method for mitochondrial coding genes in three invertebrate species are elevated in comparison to older calibration points (2.3–6.6% per lineage per million years), lending additional support to the hypothesis of calibration time dependency for molecular rates.
I haven't read their paper, but I'm not really following the whole line of reasoning.
Is this still applicable given the long period of time? I may be misunderstanding, but they mention a "two epoch" model. An epoch is a long, long time, way beyond our period of interest.
Quote
In a geologic time frame, Epochs divide Periods into smaller chunks, and the lengths of Epochs range in the tens of millions of years. In the most recent Era, the Cenozoic, they have ranged from 2 million to 22 million years long. There have been seven Epochs in the last 70 million years or so, and the current Epoch is called the Holocene. You can see the Geologic Time Scale
http://answers.ask.com/Science/Nature/how_long_is_an_epoch


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on June 15, 2012, 02:05:29 PM
This is on the STR Wars thread as well, I compare variance both with a set of mixed speed markers and then with a subset that meet a 7000 year linear duration according to Marko Heinila. In all cases multi-copy (i.e. CDY, 464, 459, etc.) and null potential STRs (i.e.439, 425) are thrown out. In other words, in variance comparisons I have a rational, quality driven approach to using STRs. As far as Ken's TMRCA tool, he is essentially using everything but multi-copy STRs and requires the user to adjust nulls to an incremental GD.

Indeed you did:

http://www.worldfamilies.net/forum/index.php?topic=10513.msg129272#msg129272 (http://www.worldfamilies.net/forum/index.php?topic=10513.msg129272#msg129272)

However:

1) I believe that the 36 most linear STRs (having linearity longer than 7000 ybp) that you used per Marko H.'s calculations still show a wide variability in mutation rates. In fact, only 24 STRs in the 111-marker set provided by Marko H. have mutation rates lower than 10^-3, thus making them slow STRs.

2) The calculations were performed on the R1b-L21+ dataset from the FTDNA Projects, which are heavily populated by folks of British descent, so if the TMRCA of L21 in Britain is indeed 4000 ybp, then both the most linear and the mixed sets of STRs ought to give you the same result.


3) In a nutshell, you can argue that based on the calculations on R1b-L21+, the difference between using 36 STRs that have a linearity of 7000+ ybp vs. a mixed bag of 49 appears to have little effect on variance for a set that is mostly populated by British guys.

4) Now can you extrapolate those conclusions to, say, P312+ (i.e. DF27, U152, etc.) folks from elsewhere in Europe? I for one wouldn’t do it.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on June 15, 2012, 05:27:03 PM
1) I believe that the 36 most linear STRs (having linearity longer than 7000 ybp) that you used per Marko H.'s calculations still show a wide variability in mutation rates. In fact, only 24 STRs in the 111-marker set provided by Marko H. have mutation rates lower than 10^-3, thus making them slow STRs.

Marko looked like he had a good methodology, he used a lot of data, and he covered all the markers we are using, so I chose to follow his methodology. About a year ago I was doing the same STR variance calculations with different sets of markers based solely on mutation rates. Tim Janzen has done similar stuff and posted it on RootsWeb. He is on the R1b-U106 S21 Yahoo Group if you'd like to discuss his findings with him.

My anecdotal observation in using only slow markers or very slow markers or what have you was that the relative variance still jumped around between haplogroups when it shouldn't. Since then, I have found that the 49 mixed speed set of STRs from the first 67 generally provides a consistent relationship of STR variance between haplogroups, as do the 36 Marko linear (for 7K) markers. I was pleased to see some cross-checked relationships hold consistent like this, so I've gone this route.

I have no statistical analysis of this, but I've read Ken Nordtvedt explain if you use only slow markers you lose precision. Essentially, this would be like measuring minutes with a calendar.  The size of sample you need to get this to average out with slow markers only consistently is enormous, at least that's my interpretation of what I've read.
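
A rough way to put numbers on this, as a Python sketch. It assumes independent lineages and rare mutations, so the total number of observed mutations is roughly Poisson and its relative standard error is about 1/sqrt(expected count). The rates, sample sizes and function name are made up for illustration; they are not taken from Ken's or Marko's work.

```python
import math


def relative_standard_error(n_haplotypes, n_markers, rate, generations):
    """Approximate relative error of an age estimate from mutation counts.

    Assumes a Poisson number of mutations with expected count
    n_haplotypes * n_markers * rate * generations (independent lineages).
    """
    expected_mutations = n_haplotypes * n_markers * rate * generations
    return 1.0 / math.sqrt(expected_mutations)


if __name__ == "__main__":
    # Hypothetical comparison: same sample, slow markers vs. markers 10x faster.
    slow = relative_standard_error(100, 20, 0.0005, 100)
    fast = relative_standard_error(100, 20, 0.005, 100)
    print(f"slow markers: {slow:.3f}  fast markers: {fast:.3f}")
```

Under these assumptions, matching the precision of the faster set with markers ten times slower takes about ten times as many haplotypes (or markers), which is the calendar-versus-minutes problem in numeric form.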

2) The calculations were performed on the R1b-L21+ dataset from the FTDNA Projects, which are heavily populated by folks of British descent, so if the TMRCA of L21 in Britain is indeed 4000 ybp, then both the most linear and the mixed sets of STRs ought to give you the same result.

I'm not sure of the exact set of calculations you referenced, but I also did similar comparisons between and within the U152, U106 and Z196 haplogroups. Yes, I agree with you there is a heavy bias towards Americans and folks of Isles descent, but I just counted and over 30 European countries are included in these P312 and U106 data sets.  

3) In a nutshell, you can argue that based on the calculations on R1b-L21+, the difference between using 36 STRs that have a linearity of 7000+ ybp vs. a mixed bag of 49 appears to have little effect on variance for a set that is mostly populated by British guys.

I actually think it is more important to understand the relationships between clades and subclades first, regardless of geography. We know people within subclades are related whereas we can't say all L21 in Britain is closely related.  Some may actually be historical migrations from France or Scandinavia...  and we have migrations going the other directions as well.

4) Now can you extrapolate those conclusions to, say, P312+ (i.e. DF27, U152, etc.) folks from elsewhere in Europe? I for one wouldn’t do it.

Are you saying we shouldn't attempt to calculate the age for P312 or some subclade unless we have a scientifically sampled set of data from everywhere P312 lives?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on June 15, 2012, 06:14:02 PM
My anecdotal observation in using only slow markers or very slow markers or what have you was that the relative variance still jumped around between haplogroups when it shouldn't. Since then, I have found that the 49 mixed speed set of STRs from the first 67 generally provides a consistent relationship of STR variance between haplogroups, as do the 36 Marko linear (for 7K) markers. I was pleased to see some cross-checked relationships hold consistent like this, so I've gone this route.

But your anecdotal observations come mostly from a set of R1b-L21 that is heavily dominated by one ethnic group. Also, you should indeed get consistent results if you used the 49 mixed speed set of STRs vs. the 36 most linear STRs, mainly because the fast mutating STRs would dominate in the variance calculation for either case, which explains why you get differences when you use the relative variance of slow markers.
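
To illustrate the point about fast markers dominating, here is a small Python sketch. It assumes the simple stepwise model in which each marker's expected variance grows as rate x generations (no saturation), and the marker counts and rates are hypothetical, not Marko H.'s values.

```python
def fast_marker_share(marker_groups, generations):
    """Share of the summed expected variance contributed by the fastest group.

    marker_groups maps a label to (number_of_markers, per-generation rate).
    Expected variance per marker is taken as rate * generations.
    """
    contributions = {
        label: count * rate * generations
        for label, (count, rate) in marker_groups.items()
    }
    total = sum(contributions.values())
    fastest = max(marker_groups, key=lambda label: marker_groups[label][1])
    return contributions[fastest] / total


if __name__ == "__main__":
    # Hypothetical mixed panel: many slow markers, a few fast ones.
    panel = {"slow": (10, 0.0005), "medium": (10, 0.002), "fast": (5, 0.008)}
    print(f"fast markers' share of total variance: {fast_marker_share(panel, 100):.1%}")
```

In this made-up panel the five fast markers are a fifth of the markers but carry most of the summed variance, so two marker sets that both contain them will tend to agree with each other.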

I have no statistical analysis of this, but I've read Ken Nordtvedt explain if you use only slow markers you lose precision. Essentially, this would be like measuring minutes with a calendar.  The size of sample you need to get this to average out with slow markers only consistently is enormous, at least that's my interpretation of what I've read.

How do you know that you aren’t measuring miles with a 12-inch ruler instead? Ken Nordtvedt probably said that referring to datasets that have a recent (i.e. less than 1500 ybp) known TMRCA, but unless it is tested, how can you assume that it is safe to go ahead and use fast markers on all R1b-L21 folks? Do we know that their time frame falls within what would be appropriate to measure using fast markers?

I'm not sure of the exact set of calculations you referenced, but I also did similar comparisons between and within the U152, U106 and Z196 haplogroups. Yes, I agree with you there is a heavy bias towards Americans and folks of Isles descent, but I just counted and over 30 European countries are included in these P312 and U106 data sets.

The only relative variance comparison of 49 mixed vs. 36 linear you have shown was for R1b-L21+ subclades, but if you have done it for U152 and Z196, go ahead and share the results; it would be interesting to see them. Yes, there are over 30 European countries, but that doesn’t change the fact that the majority of haplotypes come from folks of British Isles descent, so unless each one of the other European ethnic groups has a considerable sample size, their presence would merely act as outliers, which would probably not even affect the outcome by much given the total sample size. In fact, let’s bring some numbers into the question. I went ahead and downloaded your Haplotype_Data_L21_all excel file, and this is what I got:

Total Haplotypes: 6119

England: 588

Ireland: 1987

Scotland: 1142

Wales: 187

Total British Isles: 3904 (63.80% of the total sample)

Unknown origin: 1909

Now let’s look at other Europeans:

Denmark: 6

France: 84

Germany: 56

Italy: 11

Netherlands: 8

Norway: 28

Poland: 6

Portugal: 7

Russia: 7

Spain: 45

Sweden: 12

Switzerland: 6

All non-British Islands Europeans combined: 306

What effects do you think the 306 haplotypes from other European countries have against the 3904 haplotypes from the British Islands?
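
For anyone who wants to check the arithmetic, here is a short Python tally of the figures above. The counts are the ones I quoted from the spreadsheet; the variable names are just mine. Note that the twelve countries listed add up to 276, so the 306 figure presumably also includes countries with only a handful of haplotypes that I did not list (306 is exactly what remains after subtracting the British Isles and unknown-origin counts from the total).

```python
# Counts quoted in the post (from the "Old World Country" column).
british_isles = {"England": 588, "Ireland": 1987, "Scotland": 1142, "Wales": 187}
other_listed = {
    "Denmark": 6, "France": 84, "Germany": 56, "Italy": 11,
    "Netherlands": 8, "Norway": 28, "Poland": 6, "Portugal": 7,
    "Russia": 7, "Spain": 45, "Sweden": 12, "Switzerland": 6,
}
total_haplotypes = 6119
unknown_origin = 1909
other_europeans = 306  # combined figure quoted in the post

isles_total = sum(british_isles.values())
known_origin = isles_total + other_europeans

print(f"British Isles total: {isles_total}")
print(f"Share of all haplotypes: {100 * isles_total / total_haplotypes:.2f}%")
print(f"Share of haplotypes with a known origin: {100 * isles_total / known_origin:.2f}%")
print(f"Sum of the listed non-Isles countries: {sum(other_listed.values())}")
```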



I actually think it is more important to understand the relationships between clades and subclades first, regardless of geography. We know people within subclades are related whereas we can't say all L21 in Britain is closely related.  Some may actually be historical migrations from France or Scandinavia...  and we have migrations going the other directions as well.

Yes, you are right some L21 could be of historical migrations from France or Scandinavia, but a good percentage of it isn’t.

Are you saying we shouldn't attempt to calculate the age for P312 or some subclade unless we have a scientifically sampled set of data from everywhere P312 lives?

I’m saying that if you haven’t tried to calculate the age of P312 using different STR sets (i.e. 20 slow markers vs. 36 most linear markers vs. 49 mixed markers) on a truly representative sample of P312, there is no way of telling whether the age estimates you are getting for P312 using the current dataset, which is heavily dominated by L21 folks, are correct.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on June 29, 2012, 11:29:34 PM
It is important to understand the difference between a true Most Recent Common Ancestor and a Coalescence age calculation. I'm just cataloging this conversation here. John Chandler is a scientist from IT and Anatole Klyosov is a scientist in the biochemistry field.

Quote from: Rootsweb post by John Chandler
From: john.chandler@alum.mit.edu (John Chandler)
Subject: Re: [DNA] Calculation of TMRCA
Date: Fri, 29 Jun 2012 13:32:41 -0400

Anatole wrote:

> Welcome to DNA genealogy. The question which you have addressed is not simple, and even a scientist of
> such a great caliber as John Chandler came to a wrong estimate.

That's a remarkable conclusion. The fact is that I carefully refrained from expressing an estimate in this case because any solitary estimate would be more misleading than helpful. The passage from me that you quoted is illustrative:

> >...if the other two differ by just one step at each of the three
> >discrepant markers, the 95% confidence interval for
> >their TMRCA would extend to about 45 generations. ...

Observe that I emphasized the breadth of the probability distribution of TMRCA estimate to the exclusion of the estimate itself. It would improve communication if you actually *read* the posts you respond to. The bottom line for the calculation is that any TMRCA up to 45 generations is statistically plausible. If the surname in question were less distinctive, it would be foolish to assume that the MRCA even had a surname at all.

> In that situation the most applicable is the "permutation method",
> which is practically not known among folks in the field, though I
> have published it first in 2009,

Another way to improve communication is to use the same terminology as everyone else. The thing you are calling the "permutation method" has been widely used since time immemorial and is known as the variance method. The thing it estimates is called the coalescence age, which is not the same as a TMRCA, except in the case of a sample of just two haplotypes. With three or more, as in this case, the coalescence age is biased on the low side, unless you take the variance with respect to the (unknowable) ancestral haplotype instead of the mean haplotype. In your illustration, needless to say, the variance is taken with respect to the mean haplotype.

Quote from: Rootsweb post by Anatole Klyosov
My response:
Dear John,

I have noticed that you were thinking for exactly a week before to respond. Frankly, I thought that you have realized that to give such an answer as you quoted yourself above is wrong indeed. It is of no use. You could have said that the 99% confidence interval would extend to about 150 generations. Or that 99.9% confidence interval would extend to about 1500 generations (or whatever, I did not waste time for those calculations). Who cares how it would extend at a certain confidence interval? Either you provide an estimate when you are asked, or honestly say that you do not know. Do not give elusive answers, nobody forces you to answer in the first place.

As you probably have noticed, I did not give only the answer, I explained HOW to calculate. You did not bother to do so.  Maybe because I am a teacher, and my duty is to explain things, and you are not. Well, it is your business, of course,  to explain or not.  However, it seems that you missed the main point of my comment.    
 
>The thing you are calling the "permutation method" has been widely used since time immemorial...

Well, maybe. Nothing is new under the moon. However, you did not use it and you did not come up with a specific answer to the question which was addressed. As (almost) always, you came up with a negative comment, and with nothing else. As I have informed you earlier, I do not buy negative comments if they do not give a direct answer to the question addressed .  

Regards,
Anatole Klyosov
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-06/1341014863


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on June 30, 2012, 12:16:00 AM
Quote from: Rootsweb post by John Chandler response to A.Klyosov

…The thing it estimates is called the coalescence age, which is not the same as a TMRCA, except in the case of a sample of just two haplotypes. With three or more, as in this case, the coalescence age is biased on the low side, unless you take the variance with respect to the (unknowable) ancestral haplotype instead of the mean haplotype. In your illustration, needless to say, the variance is taken with respect to the mean haplotype.

Very true, the usage of mean haplotypes does tend to underestimate the true variance of a dataset, but not only that, there are other biological concepts that could do it too. In fact, by treating the mean haplotype as the ancestral one, one assumes that the TMRCA falls within a timeframe shorter than the fixation time for most STR markers used; however, that is a complete wild guess in this case.
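
A tiny Python illustration of why the variance taken about the mean can only come out lower than (or equal to) the variance taken about the ancestral value. The allele values and the ancestral value of 14 are hypothetical, and I use the simple divide-by-n variance for one marker.

```python
import math


def mean_squared_deviation(values, reference):
    """Average squared distance of the values from a reference allele."""
    return sum((v - reference) ** 2 for v in values) / len(values)


# Hypothetical repeat counts at one marker; assume the true ancestral allele is 14.
alleles = [14, 14, 15, 15, 16]
ancestral = 14

sample_mean = sum(alleles) / len(alleles)
about_mean = mean_squared_deviation(alleles, sample_mean)
about_ancestral = mean_squared_deviation(alleles, ancestral)

# Identity: deviation about any reference = variance about the mean
# plus the squared offset between the mean and that reference.
assert math.isclose(about_ancestral, about_mean + (sample_mean - ancestral) ** 2)

print(f"about the mean: {about_mean:.2f}  about the ancestral allele: {about_ancestral:.2f}")
```

The gap between the two numbers is exactly the squared distance between the sample mean and the ancestral allele, so the more the sample has drifted away from the ancestor, the more the mean-based figure undershoots.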

Now let’s analyze English Professor Anatole Klyosov and his wide usage of logical fallacies.(Actually I take it back, he barely used any fallacies :( )

Quote from: Rootsweb post by Anatole Klyosov
[…]
As you probably have noticed, I did not give only the answer, I explained HOW to calculate. You did not bother to do so.  Maybe because I am a teacher, and my duty is to explain things, and you are not. Well, it is your business, of course,  to explain or not.  However, it seems that you missed the main point of my comment. 

How this is relevant to the discussion, one might not know, but a good ole’ Ad Hominem is never bad when arguing, so one shouldn't be surprised if Klyosov brings up the “I have a PhD/I am a teacher/I’m an expert and you are not” argument.
 



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Dubhthach on June 30, 2012, 04:00:40 AM


What effects do you think the 306 haplotypes from other European countries have against the 3904 haplotypes from the British Islands?


I only see 1917 samples from "British Islands" in that list tbh.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on June 30, 2012, 11:49:46 AM

I only see 1917 samples from "British Islands" in that list tbh.


Well, 3904 have their most distant known ancestor coming from the "British Islands"; if one looks at column D, where it says “Old World Country”, one sees that Wales, Scotland, Ireland, and England add up to 3904.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Dubhthach on June 30, 2012, 01:54:00 PM

I only see 1917 samples from "British Islands" in that list tbh.


Well, 3904 have their most distant known ancestor coming from the "British Islands"; if one looks at column D, where it says “Old World Country”, one sees that Wales, Scotland, Ireland, and England add up to 3904.


The problem with your logic is Ireland isn't a "British island"


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on June 30, 2012, 02:10:07 PM

The problem with your logic is Ireland isn't a "British island"

Well, I'm sorry then, how would you refer to the Ireland+UK combo then? I actually would gladly change my terminology if you tell me a more appropriate one.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: alan trowel hands. on June 30, 2012, 02:30:05 PM
I don't see the fuss about the term British Isles.  British Isles is a collective term of great age that is far older than Britain as a political entity.  Ireland was one of the 'islands of the Pretani' (Cruithne in Gaelic), or at least it's proven that they were one of the elements in the prehistoric Irish population, so the name has a historical basis.  Each to their own, but I feel the problem with alternative collective terms is that they sound contrived or have no historical basis.  I think the problem is always going to be there because some people do not want a collective term anyway, so simply Britain and Ireland / Ireland and Britain is probably the safest way to go to avoid treading on sensitive toes.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on June 30, 2012, 04:36:55 PM
Quote from: Rootsweb post by John Chandler response to A.Klyosov

…The thing it estimates is called the coalescence age, which is not the same as a TMRCA, except in the case of a sample of just two haplotypes. With three or more, as in this case, the coalescence age is biased on the low side, unless you take the variance with respect to the (unknowable) ancestral haplotype instead of the mean haplotype. In your illustration, needless to say, the variance is taken with respect to the mean haplotype.

Very true, the usage of mean haplotypes does tend to underestimate the true variance of a dataset, but not only that, there are other biological concepts that could do it too.  In fact by using the concept of a mean haplotype being the ancestral one assumes that the TMRCA is within the timeframe that is less than the fixation time for most STR markers used, however that is a complete wild guess in this case.

Now let’s analyze English Professor Anatole Klyosov and his wide usage of logical fallacies.(Actually I take it back, he barely used any fallacies :( )

Quote from: Rootsweb post by Anatole Klyosov
[…]
As you probably have noticed, I did not give only the answer, I explained HOW to calculate. You did not bother to do so.  Maybe because I am a teacher, and my duty is to explain things, and you are not. Well, it is your business, of course,  to explain or not.  However, it seems that you missed the main point of my comment.  

How is this relevant to the discussion, one might not know, but a good ole’ Ad Hominem is never bad when arguing, so one shouldn't be surprise if Klyosov brings the I have a “PhD/I am a teacher/I’m an expert and you are not” argument .  
 


Some comments on your and Mike's conversations: 1.  I forwarded Mike a copy of a Science article by David Goldstein published in March 2001.  In that article he derived the variance equation and stated that the calculation of TMRCA was performed for each DYS locus and averaged.  If the variance equation is used this way, and not by averaging mutation rates of DYS loci, then the sum of squares for each DYS locus can be calculated and therefore the SD also.  It's a little cumbersome but it is defined in the literature.  Re: this discussion, estimates have higher SDs when the range of mutation rates used is greater.  2. Re: use of slower mutators, such as was done by D. Janszen(sp) on rootsweb, I still don't believe that a formula approach replaces a lot of work to understand the set of entries you are working with.  Counting only unique mutational events, not inherited ones, is very important.  On another thread, I have gone through in detail how I approached R-Z253 to estimate the TMRCA of this group.  Even though there are 3 entries with 388 = 13, this represents only one mutation; similarly for 426: 7 entries but only one mutation.  Unless this kind of care is used, the accuracy of the current coalescence ages of different haplogroups is highly questionable. Finally, for the faster mutators it is impossible to know the time history of what has occurred, and I don't believe it is modelled by the random walk model.  All I generally observe is a mutation pattern centered around the modal value.  Note, however, that this pattern is broken when multistep mutations occur.
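
As a rough illustration of the locus-by-locus calculation described in point 1, here is a minimal Python sketch. It assumes the simple stepwise model in which the expected variance at a locus grows as mutation rate times generations, so each locus gives its own age estimate of variance divided by rate. The variances and rates below are invented for the example and are not taken from the Goldstein article or from any real haplotype set.

```python
import math


def per_locus_ages(variances, rates):
    """Age estimate (in generations) for each locus: variance / mutation rate."""
    return [v / mu for v, mu in zip(variances, rates)]


def mean_and_sd(values):
    """Mean and sample SD, i.e. sqrt(sum of squares about the mean / (N - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return mean, math.sqrt(sum_sq / (n - 1))


# Made-up per-locus variances and per-generation mutation rates (assumptions).
variances = [0.10, 0.30, 0.45]
rates = [0.001, 0.002, 0.005]

ages = per_locus_ages(variances, rates)
avg_age, sd_age = mean_and_sd(ages)

# For contrast: pool the loci first, i.e. average variance / average rate.
pooled_age = (sum(variances) / len(variances)) / (sum(rates) / len(rates))

print("per-locus ages:", [round(a, 1) for a in ages])
print(f"mean of per-locus ages: {avg_age:.1f} generations (SD {sd_age:.1f})")
print(f"pooled estimate:        {pooled_age:.2f} generations")
```

With these invented inputs the two approaches give different answers, and only the locus-by-locus version yields a spread (SD) across loci.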

Finally, re Klyosov, I think it is still an open issue on whether he really has something different and correct to offer.  Intrinsically, for general Coalescence/TMRCA estimates, I believe that the use of slower mutators is necessary to make a rational estimate, that just makes good sense to me. Whether his permutation approach based on "chemical kinetics" is a valid approach is a TBD in my opinion.  final note:  that doesn't make him a nice person to work with however.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 06, 2012, 03:46:46 PM
Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?

I copied this over from another thread just to catalog it here. I interpret Hans' analysis as inferring that STR mutations and SNP mutations are not related cause and effect wise.  If so, I don't see evidence that STR mutations should gravitate around or back to an ancestral (possibly modal) value specific to an SNP defined haplogroup.

There has been an intensive discussion about the quality of TMRCA estimates. It has been looked into through theoretical considerations and reflected against historical information more or less backed up by empirical research. Carbon and isotopic dating of artefacts associated with past populations gave us time lines to hold against these TMRCA calculations.

In genetic genealogy we use SNPs as indicators of speciation events, while the diversity of the STR based haplotypes is seen as anagenetic development within monophyletic clusters.

By excluding the multi and fast markers in the STR based haplotypes an effort is made to replace the SNP based synapomorphic characters with sets of STR markers and in doing so we come to a TMRCA estimate.

The mechanism of the variation in the VNTR is (mostly) DNA replication slippage while SNP is the consequence of a fixed mutation caused by external influences such as metabolic stress and/or ionising radiation. STR events are more or less internal; SNP novelties have external causes.

Radiation pressure causing ionising effects is reflected in the Delta 14C data as published by the International Carbon Calibration curves (http://www.radiocarbon.org/IntCal09.htm).

I constructed two graphs: first a graph in which the SNP count (irrespective of its haplogroup) is set against the estimated TMRCA (data supplied by Marko Heinila).
https://dl.dropbox.com/u/74936451/SNP%20count%20%204-5.3.pdf
In the second graph the Delta 14C is plotted against the calibrated age (ybp).
https://dl.dropbox.com/u/74936451/delta%2014c%20versus%20ybp%204000-5200.pdf

I used the period from 4k till 5.3K for these graphs.
If there appears to be an interest in this approach I can make available the graphs of the whole period between present and 20K years ago.

Hans
http://www.worldfamilies.net/forum/index.php?topic=10752.msg133905


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 07, 2012, 01:49:07 PM
I haven't studied Hans's work, but I also believe that STR mutations and SNP mutations are not related cause and effect wise. What I have observed is that the STR mutational process doesn't follow a drunkard's walk model.  In that model the drunk departs from the lamppost by a square root law, i.e. after 25 steps he is, on average (in the root-mean-square sense), 5 steps from the lamppost.  I replotted the table of Y STR frequencies from rootsweb and I do not see any consistent drift.  A few of the faster mutators have a wider range, but in general the modal is constant across SNPs.  This doesn't necessarily imply independence, but it suggests the modal value(s) is a preferred state for the DYS loci. In other words, the drunk migrates around the modal, not away from it.
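
For reference, the square root law mentioned above can be checked exactly for the simplest case. The sketch below assumes a symmetric walk that moves exactly one step left or right each time (an idealisation, not a model of any particular STR) and computes the root-mean-square displacement from the binomial distribution of outcomes.

```python
from math import comb, sqrt


def rms_displacement(n_steps):
    """Exact root-mean-square displacement of a symmetric +/-1 random walk."""
    total = 0
    for k in range(n_steps + 1):          # k = number of steps to the right
        position = 2 * k - n_steps        # net position after n_steps
        total += comb(n_steps, k) * position ** 2
    # Each of the 2**n_steps paths is equally likely.
    return sqrt(total / 2 ** n_steps)


print(rms_displacement(25))
```

Note that this is an average over many independent walkers: the single most likely positions remain close to the starting point, so a distribution centered on the start does not by itself contradict the model.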

There's a lot we don't understand about STR mutations.  What is their real purpose?  They don't appear to influence genes?  Is their sole purpose to act as a clock?  Then why do they have such a dynamic range?  Food for thought.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 08, 2012, 03:24:55 AM
Perhaps what you are seeing is not a drunkard walking 25 steps and ending up only 5 steps away, but a family growing over time, sometimes one step, sometimes stationary, sometimes a step the other direction. Given the period of time (# generations and transportation available) the family reached only about 5 steps from its ancestor.  Of course major parts of the family may have disappeared (gone extinct) so the branching out is the same in all directions.

Why does everything have to have a sole purpose that we understand? The sun always arose in the east, even if we did not understand why.  Still it was a good event to measure time with - the start of a day. The ancients knew this before we knew what the sun really was.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 08, 2012, 07:35:38 AM
I'll repeat my assertion.  Over the haplogroups E1a to R1b (7 different hgs), there is no apparent drift in the modal value. As someone noted on rootsweb once, he had the same value at 439 as a chimpanzee and we parted ways millions of years ago.

I agree we don't have to understand everything in order to survive.  That's not the point.  Dienekes just published the results of 3+ studies debunking the out of Africa 60K years ago. I believe we also have quite a few myths that describe the properties and evolution of STRs that were postulated before sufficient data existed.

The current philosophy is that the process is random (whatever that means); that the 200:1 dynamic range of mutation rates doesn't matter (use the average of all DYS loci under consideration when making variance estimates); that the Gaussian model (called the great intellectual fraud by Taleb) describes the distributions we see, etc.  We can survive with these myths, but how will it help us to increase our understanding of the mutational process?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 09, 2012, 12:50:25 AM
I'll repeat my assertion.  Over the haplogroups E1a to R1b (7 different hgs); there is no apparent drift in the modal value. As someone noted on rootsweb once, He had the same value at 439 as a chimpanzee and we parted ways millions of years ago....
Are you saying E1a's modal is about the same as the Western Atlantic modal?
Are you talking about just DYS439 out of the 111 we can see from FTDNA haplotypes?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 09, 2012, 07:01:57 AM
No, I'm saying that the modal values don't change significantly across haplotypes. There are differences between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 DYS loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 10, 2012, 03:33:33 AM
No, I'm saying that the modal values don't change significantly across haplotypes. There are difference between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 dys loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?

I am not saying that mutations are completely and ultimately random. I think that most probably are practically random. By that I mean we can only observe somewhat random patterns. We don't have any other piece of data that can be used in a cause-effect predictive manner.

We do know that most, and I mean almost all, Y DNA lineages have gone extinct.  Out of the millions, we have only relatively few that survive.

Given the few surviving branches of the human Y DNA (paternal lineage) tree and the practically random nature of most STR mutations, I don't see anything extremely unusual or profound about a large old tree with some big branches, then smaller and smaller branches that sometimes cross.  The tree may be lopsided, with some branches (facing the sun perhaps) that have become bushy, full of twigs, compared to other parts of the tree.

If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is that meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on July 10, 2012, 06:09:37 AM
If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is that meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.
But this is very meaningful instead, i.e. it demonstrates that some markers have had many mutations around the modal, whereas others have had mutations for the tangent, what I have always said and what falsifies all your theories.
I am confident that aDNA will demonstrate all my theories.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 10, 2012, 07:05:45 AM
No, I'm saying that the modal values don't change significantly across haplotypes. There are difference between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 dys loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?

I am not saying that mutations are completely and ultimately random. I think that most probably are practically random. By that I mean we can only observe somewhat random patterns. We don't have any other piece of data that can be used in a cause-effect predictive manner.

We do know that most, and I mean almost all Y DNA lineages have gone extinct.  Out of the millions, we only of have a relatively few that survive.

Given the few surviving branches of the human Y DNA (paternal lineage) tree and the practically random nature of most STR mutations, I don't see anything extremely unusual or profound about a large old tree with some big branches, then smaller and smaller branches that sometimes cross.  The tree may be lopsided with some branching (facing the sun perhaps) that have become bushy, full of twigs, compared to other parts of the tree.

If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is that meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.

I think the reality is that many of our trees have run out of branches/branchpoints and are dead ends.  I gave an example of the problem I had with my problem in the states.  I have finally found two similar haplotypes in R - Z253 but one has only 67 dys loci measured. How can I use my 632 = 10 in a comparison of two entries?  It contributes thousands of years of mutational time.  And further, after I converge us two, I still have two other very dissimilar modal haplotypes to converge.  It is very clear to me that R - Z253 is much older than 2k years.  I haven't figured out a good way to estimate its age though.  In addition to the 632 we have 426 and 388 and 393 and others to compare.  These are some of the slower mutators.  The only way I can see is to use the approach Nordtvedt advocates and essentially average out the mutational rate differences?  I'm not convinced that is correct.

I don't really know for sure that Y STR DYS loci were designed to be a clock of male heredity.  However, if they were, then there is/was a purpose to the design, and having a range of 200 or so in mutational rate has some meaning?

In summary, I think we have at least 3 kinds of time estimates: a. TMRCA's b. Coalescence and c. Dead-ends.  The problem appears to be that we can't sort them out?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 11, 2012, 02:21:25 AM
I think the reality is that many of our trees have run out of branches/branchpoints and are dead ends.

Yes, absolutely, there are many extinct paternal lineages. This has been going on for a long time.

I gave an example of the problem I had with my problem in the states.  I have finally found two similar haplotypes in R - Z253 but one has only 67 dys loci measured. How can I use my 632 = 10 in a comparison of two entries?  It contributes thousands of years of mutational time.  

On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

 It is very clear to me that R - Z253 is much older than 2k years.

I agree. Who is saying that Z253 is 2K ybp? The initial TMRCA interclade estimates (with Nordtvedt's method) I did had Z253 back close to the time of L21, or I guess DF13 we should say.

I haven't figured out a good way to estimate its age though.  In addition to the 632 we have 426 and 388 and 393 and others to compare.  These are some of the slower mutators.  The only way I can see is to use the approach Nordtvedt advocates and essentially average out the mutational rate differences?  I'm not convinced that is correct.

Agreed, but I think that is the best we can do. We can throw out STRs that are clearly aberrations, like in the case where there are null DYS425s, but that is somewhat arbitrary. How do we know what's an aberration versus a signal of old age? This is where the light in my head comes on related to Nordtvedt's statements that more STRs means more experiments and more experiments means better chances at finding the true patterns.

I don't really know for sure that Y STR dys loci were designed to be a clock of male heredity.  However, if they were than there is/was a purpose to the design and having a range of 200 or so in mutational rate has some meaning?

I'm pretty sure our designer didn't really have STR mutations so that we could use them as a pseudo clock. That's our contrivance - Emile Zuckerkandl and Linus Pauling. http://en.wikipedia.org/wiki/Molecular_clock

In summary, I think we have at least 3 kinds of time estimates: a. TMRCA's b. Coalescence and c. Dead-ends.  The problem appears to be that we can't sort them out?

Why do you say that? There are different calculations and I think we include intraclade and interclade as variations of TMRCA estimates. As far as "dead-ends", they are accounted for. The variance models have formulas for both cases - the entire population is known, or for a partial population. The partial population assumes there are missing lineages, which would include the dead-ends.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on July 11, 2012, 08:17:04 AM
On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than in a fast marker. So in a sense, using a single slow marker is a risk, because there is still a probability that it could have had a multi-step mutation, or that it mutated recently; in any case, a TMRCA calculated with a single STR would have very wide confidence intervals. Now if instead something like 8-10 slow markers were used, the likelihood of all of them having recent mutations, or multi-steps, etc., can be considered as zero compared to any set of STRs that includes a mixed set of slow and fast markers. Fast markers do not wash out aberrations; they are only good for measuring certain time frames. What is known for sure is that the effective time frame for fast markers is far shorter than for slow markers. There is still a lot of argument as to the actual value of the time frames, but adding a marker to a calculation that is well outside the marker's effective time frame doesn't help refine the precision of the TMRCA calculation; if anything, it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.
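
A back-of-the-envelope sketch of the "8-10 slow markers" argument, in Python. The mutation rate of 0.0005 per generation and the 20-generation window are assumed values chosen only for illustration, and the markers are treated as mutating independently of each other, which is itself an assumption.

```python
def prob_at_least_one_mutation(rate, generations):
    """Chance that one marker mutates at least once within the given window."""
    return 1.0 - (1.0 - rate) ** generations


def prob_all_markers_mutated(rates, generations):
    """Chance that every marker mutates in the window, assuming independence."""
    p = 1.0
    for rate in rates:
        p *= prob_at_least_one_mutation(rate, generations)
    return p


# Assumed values for illustration: eight slow markers at 0.0005 per generation,
# and "recent" taken to mean the last 20 generations.
slow_rates = [0.0005] * 8
generations = 20

single = prob_at_least_one_mutation(slow_rates[0], generations)
all_eight = prob_all_markers_mutated(slow_rates, generations)

print(f"one slow marker mutating recently:   {single:.4f}")
print(f"all eight mutating recently:         {all_eight:.3e}")
```

Under these assumptions a recent mutation at any one slow marker is already unlikely (about 1%), and the chance of all eight being affected at once is vanishingly small, which is the sense in which it "can be considered as zero".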


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 11, 2012, 08:18:52 AM
Well written Mike.  Good food for thought.  A couple of points.  Re: age of Z253, I was referring to your R-L11 "Big Picture" Timeline which shows R-Z253 as most probably 400 AD or so?

I'm still not sure about "washing out aberrations" produced by slow mutators, although in fairness, the Ian Cam have a mutation at 426 which, if included, adds several hundred years of mutational time to the estimate even with some 40+ entries.  Since we can't be precisely sure when the 11 to 10 mutation occurred, it might be right or wrong.

Re: molecular clock.  Most of the early work was with autosomal DNA, not STRs.  That said, it's not clear why they should be different except for the fact that they appear "extra" in some sense and not related to our genetic picture as commonly thought of.  You have provided some fine references I haven't read before, so I should probably withhold any further comments at this time.  Edit:  weren't the STRs originally called "junk" DNA?

I'm still not sure that interclade handles the apparent major gap in R1b between P312 and prior SNPs? Look at Marko's estimates of TMRCA for R1b.

In sum; you've made some good points here and referenced some good sources (indirectly through wikipedia).  Professor Allan from NZ looks especially interesting. I'm still not sure that simply averaging out STR mutations is the right way to go ahead, but I can't disprove it either.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 11, 2012, 09:10:43 AM
On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than it occurring in a fast marker. So in a sense, using a single slow marker is a risk, because there is a still a probability that it could had a multi-step mutation, or that it had mutated recently, nonetheless any TMRCA calculated with a single STR would have very wide confidence intervals. Now if instead something like 8-10 Slow markers were used the likelihood of all of them having recent mutations, or multi-steps, etc, can be considered as zero compared to any set of STRs that includes a mixed set of slow and fast markers. Fast markers do no wash out aberrations, they are only good to measure certain time frames, what it is known for sure, is that the effective time frame for fast markers is far shorter than for slow markers, there is still a lot of arguments as to the actual value of the time frames, but adding any marker to a calculation that is well outside the effective time frame of the marker doesn’t help refine the precision of the calculation of TMRCA, if anything it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.

You're right on, Jean.  In fact the earliest formulation of the TMRCA calculation included the comment that you compute the TMRCA DYS locus by DYS locus and then average.  The SD of the estimate is the square root of the sum of squares divided by N - 1.  This clearly shows that including a slow mutator with fast mutators will increase the SD.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 11, 2012, 12:30:29 PM
 quote: On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations. end of quote.

This is one para of your comments that does bother me.  Yet it may be fully correct to say that we have to average out the range of mutational rates? However, as you know, several times I have referred to Taleb's book on the Black Swan.  It deals with trying to understand the occurrence of rare events and their impact.  It's a little pompous, and reading through some of the negative reviews at Amazon.com is instructive for trying to put his ideas in perspective.

For those who are investors, he likens a black swan to what happened to Long Term Capital Management Corp., a hedge fund that was playing in the derivatives market.  Two of the founders were economists who got the Nobel Prize for their work applying economic theory to the market.  They basically applied the Gaussian model to their investment model and got hit by a black swan.  They went belly up in the late 90's.

I believe the range of mutational rates can cause events similar to a black swan in the mutational process and distort or significantly affect our estimates of time.  I don't have any answer to this issue yet. It may be as you say it is an aberration and has to be averaged out.  I'm just not sure?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on July 11, 2012, 01:18:16 PM

On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than in a fast marker. So in a sense, using a single slow marker is a risk, because there is still a probability that it had a multi-step mutation, or that it mutated recently; in any case, any TMRCA calculated with a single STR would have very wide confidence intervals.

Yes, slow markers are more stable. That is essentially by definition.

Yes, calculating with a single STR has got to have very wide confidence intervals. I would guess that would make such an estimate not very useful.


Now if instead something like 8-10 slow markers were used, the likelihood of all of them having recent mutations, multi-steps, etc., can be considered as zero compared to any set of STRs that mixes slow and fast markers. Fast markers do not wash out aberrations; they are only good for measuring certain time frames. What is known for sure is that the effective time frame for fast markers is far shorter than for slow markers. There are still a lot of arguments as to the actual value of the time frames, but adding any marker to a calculation that is well outside the marker's effective time frame doesn't help refine the precision of the TMRCA calculation; if anything it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.

By "washing out" I just mean that the more data you have, generally, the less aberrations impact the final estimates.

Yes, I agree that we should deselect STRs that do not have a linear relationship with time for the duration in question.  The problem is figuring out which to deselect. It isn't necessarily the fast ones. There is some work suggesting that problems occur at STRs with higher allele values. The only deselections that I have a rationale for are multi-copy ones, null cases, and those from the research that Marko Heinilla did.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: JeanL on July 11, 2012, 01:56:56 PM
Yes, slow markers are more stable. That is essentially by definition.

Well, yeah, in fact they are slow markers, because they are regions in the Y-Chromosome less prone to slippage, thus they are more stable. So them being more stable is what makes them slow markers.

By "washing out" I just mean that the more data you have, generally, the less aberrations impact the final estimates.

Well bigger sample size in terms of population can only do good, so yeah, more data is generally better.

Yes, I agree that we should deselect STRs that do not have a linear relationship with time for the duration in question.  The problem is figuring out which to deselect. It isn't necessarily the fast ones. There is some work suggesting that problems occur at STRs with higher allele values. The only deselections that I have a rationale for are multi-copy ones, null cases, and those from the research that Marko Heinilla did.

There is yet another problem, which I recently started pondering, and which under a Wright-Fisher expansion model would definitely pose a problem, at least theoretically: if the TMRCA of a given population is older than the “time to fixation/reaching majority frequency” of a given STR, then that STR would yield an erroneous modal value, and thus cause the estimates to appear much younger than they should. I have started running simulations of it, but I would have to run a significant number of simulations before any meaningful result is produced. In the meantime, what I have observed is that when the TMRCA is 5 times the time to majority of an STR, the effects are immediate, and they are observable even within a few simulations. Marko H's research that has thus far been presented here has nothing to do with modal values; I believe he was observing the relationship between repeat number and mutation rates, as well as between the range of observed alleles and mutation rates. The effect I am talking about would actually manifest itself as a difference of probably 3- to 5-fold in the TMRCA when calculating it using a set of slow markers vs. a set of fast markers.  Essentially, if proven true, it means that there is no way to calculate the putative ancestral allele value of fast markers if the MRCA lived earlier than 1/μ generations ago, where μ is the mutation rate of the STR.
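To give a feel for the 1/μ threshold mentioned above, here is a back-of-the-envelope sketch. It is not the Wright-Fisher simulation described in the post: it only computes the chance that a single lineage has had no mutation at all after a given number of generations, and it ignores back mutations and drift. The function name and the example rate are assumptions chosen for illustration.

```python
def prob_unmutated(mu, generations):
    """Probability that one lineage shows no mutation at a locus.

    mu is the per-generation mutation rate; each generation is treated
    as an independent trial, so the result is (1 - mu) ** generations.
    """
    return (1.0 - mu) ** generations

mu = 0.002                  # hypothetical fast-marker rate
threshold = round(1 / mu)   # 1/mu generations, i.e. 500 here

for multiple in (1, 2, 5):
    t = multiple * threshold
    print(multiple, "x (1/mu):", prob_unmutated(mu, t))
```

At 1/μ generations a little over a third of lineages are still untouched under these assumptions; at five times that, fewer than one in a hundred are, which is in line with the post's remark that the effect shows up quickly at 5 times the time to majority.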


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 15, 2012, 02:07:00 PM
Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?

I don't know if there are any papers out there, but I have been studying the yfreq database at ancestry.com: www.freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm and have extracted some information which may apply here.  First, a question about the word diversity in the Y STR context: is it simply the reduction in the modal value, or is it the spread of the distribution?  I make that distinction since many mutations are about the modal, and back mutations are prevalent, possibly preserving the modal value.  In any case, I try to look at both criteria where possible.
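The two readings of "diversity" asked about above can be put side by side in a small sketch. The allele frequencies below are invented for illustration and do not come from the yfreq tables; gene diversity is computed in its simple form (one minus the sum of squared frequencies), without a sample-size correction.

```python
def modal_share(freqs):
    """Frequency of the most common allele."""
    return max(freqs.values())

def gene_diversity(freqs):
    """One minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def allele_variance(freqs):
    """Variance of the repeat count, i.e. the spread of the distribution."""
    mean = sum(a * p for a, p in freqs.items())
    return sum(p * (a - mean) ** 2 for a, p in freqs.items())

# Two hypothetical loci with the same modal value (12) at the same 70% share.
tight = {11: 0.15, 12: 0.70, 13: 0.15}   # off-modal alleles one step away
wide = {10: 0.15, 12: 0.70, 14: 0.15}    # off-modal alleles two steps away

for name, freqs in (("tight", tight), ("wide", wide)):
    print(name, modal_share(freqs), gene_diversity(freqs), allele_variance(freqs))
```

Both invented loci have the same modal share and the same gene diversity, yet the second has four times the variance, so the two criteria really can disagree.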
 
I've reached several conclusions: 1. Diversity is not a measure of age since mutation rate varies with modal value, independent of Hg, for a DYS locus.  2. This leads me to conclude that using the same mutation rates for all Hgs is in error since they don't all have the same modal values.  3. High off-modal values may be due to the fact that a mutation occurred near the time of the founder and fixation occurred.

data:  look at 390 in the data set, especially hg E3a with a modal of 21 and 91% value. It has the smallest diversity of all 7 hgs presented.  The next lowest is G with a modal at 22 and 75%.  All the remaining hgs have higher modals.  Given that E3a is older or at least as old as the other hgs, I conclude that the modal value is the issue.  391 is a similar example with R1b having the highest modal value 11 at a (67%) value.  Consider 426 in which the first 5 hgs have a modal value of 11 and all have about 99%+ values.  R1a and R1b have a modal of 12 and a very slightly lower value (99,98). Finally look at 388. Ea,b, R1a,b all have modals of 12 (note that R1a must have had a multistep to 10).  The oddballs in this set are G with 50% at 12 and 50% at 13 and then we have I and J2.  I has a modal at 14 (56%) and J has a modal at 15 (71%).  Applying R1b derived rates to these hgs would significantly affect the variance estimate for this locus.

To me the message is clear; there is a lot more complexity in this process than we are currently accounting for and diversity has limited application in describing this process.  JMHO

edit:  I have found another Leo Little web page, which expresses the above comments quite vividly. It is at the same site above and presents the genetic diversity as a function of DYS locus over the same 7 Hgs.  It clearly presents the random variability in gene diversity from locus to locus.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on July 19, 2012, 01:54:31 PM
I was hoping to generate some discussion on Leo Little's work on gene diversity.  His data shows clearly that diversity and age are not synonymous.  My impression is that there is a sensitivity of mutation rate to allele value, with lower values usually having lower rates.

I believe it is critical to our understanding of relative ages of Hg's, where there are small but important changes in the modal value.

Is there some other interpretation we can give to Leo's data?  I'd like to hear about it.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on September 14, 2012, 08:32:27 AM
I copied this from another thread so as not to distract from it.

....
And let me just say....  Y-DNA STRs are so old school.
...

What do you mean by that? The use of Y DNA STRs in understanding paternal lineages in both genetic genealogy and population genetics has been very helpful. They are no panacea for the world's ills but another source of data related to genetics that is sorely needed.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: acekon on September 21, 2012, 10:58:29 AM
Using str's as a supplement is one thing. Mapping variance in 9 str's and making broad assumptions, and then asking for samples of Z2105+/-  in the same breath is quite simply a paradox.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on September 21, 2012, 11:09:18 AM
Using str's as a supplement is one thing. Mapping variance in 9 str's and making broad assumptions, and then asking for samples of Z2105+/-  in the same breath is quite simply a paradox.

I agree, except for the part of Z2105 and a paradox. I'm not sure what you are talking about on that piece.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 01, 2012, 04:56:42 PM
Haplotype diversity does not equal origin. If you want proof of that then study any of the private SNPs and their modal haplotypes. Modal haplotype and SNP go hand in hand. If you make the assumption that we are all descended from a single SNP like U106 then you have to think that at one time U106 was a private one. So you will all find the answers that you are looking for by studying them. This is only my opinion.

I copied this over from the IE mapping thread because it could take that conversation off track.

I do not think haplotype diversity equals origin, but I think it can be indicator of direction of movement, and is more useful than frequency for indicating that.

I do not see proof how private SNPs and modal haplotypes prove (you use the word "proof") what you are saying. Please explain.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 02, 2012, 03:44:31 AM
A modal haplotype is formed from the majority of people having a certain set of Y-STR values. Majority always rules. The modal also shows which SNP the group belongs to and suggests it is ancestral for that group. I have studied thousands of haplotypes at ysearch over the last six years. I found the markers for the Clan Colla group and it took me months to convince some of them. Now I can identify the whole group with one marker. I can identify the specific markers for some other subclades. The L1 group can be identified with one marker as well.

A portion of this message was deleted.  Terry


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 02, 2012, 11:28:55 AM
A modal haplotype is formed from the majority of people having a certain set of Y-STR values. Majority always rules. The modal also shows which SNP the group belongs to and suggests it is ancestral for that group. I have studied thousands of haplotypes at ysearch over the last six years. I found the markers for the Clan Colla group and it took me months to convince some of them. Now I can identify the whole group with one marker. I can identify the specific markers for some other subclades. The L1 group can be identified with one marker as well.

I agree with you in that the mode is a statistical concept: the most common value (not necessarily a majority, though) of a particular characteristic (an STR in this case) within a population.

I agree with you that the modal haplotype only suggests or indicates an ancestral value. We don't really know, and can't know, the ancestral values without digging up the ancestor's bones.

Sometimes there are correlations of STR markers to SNPs but this is not solid, 100% accurate. For example, U106+ people typically have 492=13 and P312+ rarely do, but there are P312+ people that are 492=13 and one higher, even 492=14.

A portion of this message was deleted.  Terry

I know you said that diversity does not equal origin. I agree, they definitely don't equal, but I still think diversity is useful for ascertaining age and can be indicative for direction back to the origin. I don't see anything in your discussion where you explain why that isn't true.  Please explain.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 02, 2012, 01:42:34 PM
Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.
I have looked at a few private or family SNPs and haplotypes and I have a better understanding of things. That is how all of these major subclades started. You are the person that states often that we are all descended from one R1b-L11 man.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 02, 2012, 04:50:39 PM
I've reversed the order of your post so I can respond to the clearer stuff first.

I have looked at a few private or family SNPs and haplotypes and I have a better understanding of things. That is how all of these major subclades started. You are the person that states often that we are all descended from one R1b-L11 man.

Yes, we think that the mutation L11 (aka S127) is a UEP (Unique Event Polymorphism) type SNP. In other words, it happened only once in the lineage of all R1b people. Therefore, it follows that any subclade marked by an SNP that descends from the L11 family has a single common ancestor. Essentially, it's by definition of a phylogenetic tree that there was one man who was the most recent man from whom all existing P312, U106 and L11* descend.

The same can be said for U106. The U106 (aka S21) mutation occurred only once and is carried by all of the Y descendants of that man.  We also know that this U106 man can't be older than the most recent common ancestor for all of U106 and P312.

Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.

I agree that the age of the first U106 man is what it is.

You can say U106 originated in Ireland and here's my response. I don't know where the very first U106 man originated. I don't think we can ever know. So you see, my answer was not "diversity equals origin."

If you ask in what directions from where U106 expanded to its current positions, I can give you opinions on that based on diversity of STRs, SNPs (notice I didn't say just STRs), relatives of U106 (such as brothers like P312 and L11*, cousins like L51*, sons like Z381, etc.), and in the context of cultural and linguistic knowledge.  We probably need to look hard at very distant relatives that might have cohabitated as well, such as I1, R1a1, etc.  as well as correlations with mt DNA.

I'm not following your proof, though.
Haplotype diversity does not equal origin. If you want proof of that then study any of the private SNPs and their modal haplotypes

Are you saying gene diversity has no relationship to time, i.e., the number of generations back to a common ancestor? The difference between a modal haplotype and an ancestral haplotype doesn't change the concept of mutations accumulating over the generations.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jarman on October 02, 2012, 05:09:46 PM
Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.

The age of U106 does not change but their TMRCA certainly are different.  Neither calculation would tell us anything about the populations' diversity in Cornwall or Ireland or Antarctica.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 02, 2012, 05:50:13 PM
I tried to add this in the "personal attacks" thread but no replies are allowed there.

....
A portion of this message was deleted.  Terry

I see Terry deleted two sentences here. He certainly has the right to do so. I just want to be clear to everyone that I used no obscenities and did not personally attack anyone. I guess Terry just wants to make sure that element of that conversation has stopped, which I agree with, and I proactively support Terry in this effort.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 02, 2012, 05:58:00 PM
I didn't say that U106 originated in Ireland but I can say that some subclades or SNPs did. It's the same with all the haplogroups that are found here.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 03, 2012, 12:43:21 PM
Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.

The age of U106 does not change but their TMRCA certainly are different.  Neither calculation would tell us anything about the populations' diversity in Cornwall or Ireland or Antarctica.

How can one man have six mutations in 6000 years and two others have sixteen and twenty-five in the same time frame? Is this the wrong modal for U106 (HXTNR)? Shouldn't everyone have the same gd from a modal haplotype? How did you guys work out a TMRCA for U106?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 03, 2012, 01:58:27 PM
Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.

The age of U106 does not change but their TMRCA certainly are different.  Neither calculation would tell us anything about the populations' diversity in Cornwall or Ireland or Antarctica.

How can one man have six mutations in 6000 years and two others have sixteen and twenty-five in the same time frame? Is this the wrong modal for U106 (HXTNR)? Shouldn't everyone have the same gd from a modal haplotype? How did you guys work out a TMRCA for U106?

Isogg has a great line concerning your First and Third question.
http://www.isogg.org/wiki/Most_recent_common_ancestor

"Even though each living person receives genes (in original or mutated forms) in dramatically different proportions from these ancestors from the identical ancestors point,[5] from this point back all living people share exactly the same set of ancestors,..."

I just ran U106 all 111 markers from a list I had in August of 320 HTs .

YrsPerGen=30

IntraClade Coalescence Age
Generations   StdDevInGen   YBP       +-YBP   VARP     SD
105.8         21.2          3,173.4   636.3   24.873   4.987

Founder's Age
Generations   StdDevInGen   YBP       +-YBP   Max       VAR      SD
106.1         21.2          3,183.3   637.3   3,820.6   24.951   4.995

Using 111 markers has been showing 10-20 generations (300-600 YBP) less than what 67 markers will show.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 03, 2012, 03:49:26 PM
Let's assume that U106 is 6000 ybp. There is a U106 man from Cornwall who has a gd of 6 from the U106 modal and I have 16 from the modal. Between us there are 22. The age of U106 does not change. Another example: another man here in Ireland has a gd of 25 from the modal and I am 16 from the modal, and we have a gd of 41 between us, and the age of U106 stays the same. If I say to you that U106 originated in Ireland what will you say? Diversity equals origin.

The age of U106 does not change but their TMRCA certainly are different.  Neither calculation would tell us anything about the populations' diversity in Cornwall or Ireland or Antarctica.

How can one man have six mutations in 6000 years and two others have sixteen and twenty-five in the same time frame? Is this the wrong modal for U106 (HXTNR)? Shouldn't everyone have the same gd from a modal haplotype? How did you guys work out a TMRCA for U106?

Everyone should not have the same GD from the ancestral haplotype. It is by chance that we end up where we are.

I like to envision a branching tree.
(http://www.finegardening.com/CMS/uploadedImages/Images/Gardening/Issues_71-80/041071080-01_med.jpg)

Some branches grow almost straight up but end up with smaller branches and twigs spreading to the side. Other branches shoot out to the side initially but some of their descendant branches grow back up. I don't think the actual Y DNA tree is nice and round though. Some major limbs probably died and fell away. Other branches grew faster.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 04, 2012, 01:43:42 PM
I'm just cataloging this post here so we don't lose it.
Quote from:  Vincent Vizachero
I think that David's question is whether there is evidence that haplogroups have been shown to have different mutation rates. The number of samples that would be required to produce such evidence is huge: way beyond the number collected for any study to date.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-10/1349347745

My comment is about how we are planning to apply STR mutation rates. Since we are using them to estimate ages, etc., what we care about is the expected mutation rate, not necessarily the observed rates.

I don't know of any reason to think that Y STR mutations are biologically linked to Y SNP mutations.  Does anyone?  If not, we should not expect mutation rates to be different by SNP based haplogroup within our species.  You could think of it as we are more alike than we are different.

While it is true that different markers have different rates, and that even a single marker has a different probability of mutating depending on its original value, the number that matters is the sum of mutation rates for all markers tested.

Quote from:  Vincent Vizachero
When you are testing 37 or 67 markers, it may be that one haplogroup has a higher modal value for marker A than another haplogroup. But if it has a lower modal value for marker B, the effect of allele length cancels out to some degree.

Over even just 37 markers, haplogroups have an average allele length that varies by only 1-2%: not enough to produce a significant difference in mutation rate.

I think you can see why Ken Nordtvedt says each STR is like a separate experiment and the more experiments you conduct the more reliable your aggregate results will be.

Translation: Long haplotypes are much better than short/bikini haplotypes for analysis. (EDIT: I meant for statistical analysis and the like, the more STRs, the better. Of course, a key off-modal STR might be critical for identifying potential matches.)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 04, 2012, 03:26:35 PM
If a man is predicted R1b and he has a null value at 439 then I could find all of them with just that one marker at ysearch if I wanted. So Y-DNA is a powerful tool. One can tell a lot with bikini haplotypes.
Furthermore, there is a link between modal haplotype and SNP.
Ken N may be a genius, and others too, but we don't have to give them a blank check. We have the right to question everything, and there are lots of questions that I could ask. After all it is my DNA and my identity that he refers to when he mentions anything about U106.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 05, 2012, 12:21:28 PM
This is a quote from Anatole Klyosov in the same Rootsweb thread as the below.
Quote from: Anatole Klyosov
... The above question was repeated and answered here dozens of times. In fact, the same mutation rate constants are fully applicable to all and any haplogroups. Lately, this issue was analyzed in detail using thousands of haplotypes from various haplogroups. See the paper "Mutation rate constants in DNA genealogy (Y chromosome)". Adv. Anthropol., 2011, v. 1, No. 2, pp. 26-34.
http://www.scirp.org/journal/aa/
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-10/1349429460

----------------------------------------------------
I'm just cataloging this post here so we don't lose it.
Quote from:  Vincent Vizachero
I think that David's question is whether there is evidence that haplogroups have been shown to have different mutation rates. The number of samples that would be required to produce such evidence is huge: way beyond the number collected for any study to date.
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-10/1349347745

My comment is about how we are planning to apply STR mutation rates. Since we are using them to estimate ages, etc., what we care about is the expected mutation rate, not necessarily the observed rates.

I don't know of any reason to think that Y STR mutations are biologically linked to Y SNP mutations.  Does anyone?  If not, we should not expect mutation rates to be different by SNP based haplogroup within our species.  You could think of it as we are more alike than we are different.

While it is true that different markers have different rates, and that even a single marker has a different probability of mutating depending on its original value, the number that matters is the sum of mutation rates for all markers tested.

Quote from:  Vincent Vizachero
When you are testing 37 or 67 markers, it may be that one haplogroup has a higher modal value for marker A than another haplogroup. But if it has a lower modal value for marker B, the effect of allele length cancels out to some degree.

Over even just 37 markers, haplogroups have an average allele length that varies by only 1-2%: not enough to produce a significant difference in mutation rate.

I think you can see why Ken Nordtvedt says each STR is like a separate experiment and the more experiments you conduct the more reliable your aggregate results will be.

Translation: Long haplotypes are much better than short/bikini haplotypes for analysis. (EDIT: I meant for statistical analysis and the like, the more STRs, the better. Of course, a key off-modal STR might be critical for identifying potential matches.)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 06, 2012, 04:56:10 PM
Mike,

I assumed you used MarkoH's Up/downMutationRatesRatio paper and his range of accuracy of the linear and quadratic models sheet to remove those STRs 7K ybp and under, along with removing the Multi-copy markers? Did you make some substitutions for a specific reason?

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 08, 2012, 12:30:00 PM
I assumed you used MarkoH's Up/downMutationRatesRatio paper and his range of accuracy of the linear and quadratic models sheet to remove those STRs 7K ybp and under, along with removing the Multi-copy markers? Did you make some substitutions for a specific reason?

Reply #146 of this thread has the output of Marko Heinilla's linear duration assessment of FTDNA's first 67 STRs. http://www.worldfamilies.net/forum/index.php?topic=10513.msg130009#msg130009


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 08, 2012, 12:37:38 PM
I'm just cataloging this here. It's a comment/post from a person who is an actuary.

Quote from: Sandy Paterson
... A mode is not used to calculate a mutation rate. The number of mutations are observed (6) and divided by the number of meioses (14,553). That gives .0004123 which is what they report. It really doesn't bother me how many mutated off 14 compared to how many mutated off 13 and so on.

There is some evidence presented by Ballantyne (I think it was) that the number of repeats provided a partial explanation for variance in mutation rates. I checked their work and found that it added hardly anything to R-squared.
http://archiver.rootsweb.ancestry.com/th/read/DNA-R1B1C7/2012-10/1349710154

Sandy is also the guy who says we want at least 50 STRs in our calculations. My reaction is that we should not remove STRs from our calculations unless we have good reason to do so. Everyone seems to agree that multi-copy markers can cause problems. In addition, there is no doubt that some STRs (apparently the ones with high repeats) reach saturation levels but where is the line drawn?  I'm not sure, but I feel pretty good about using the 36 markers that have the longer linear durations according to Heinilla. They seem to work in terms of consistency in comparing one haplogroup to another.
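Sandy's arithmetic quoted above is easy to reproduce, and a rough interval around it shows how much room six observed mutations leave. This is only a sketch: the Wilson score interval is my choice of method, not something the quoted post or the underlying study uses, and the function names are made up.

```python
import math

def mutation_rate(mutations, meioses):
    """Observed mutations divided by observed meioses."""
    return mutations / meioses

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

rate = mutation_rate(6, 14553)        # the counts quoted in the post
low, high = wilson_interval(6, 14553)
print(rate, low, high)
```

With only six events the interval runs from roughly 0.0002 to 0.0009, so a single locus's point estimate should be read with that width in mind.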


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 08, 2012, 01:46:50 PM
I assumed you used MarkoH's Up/downMutationRatesRatio paper and his range of accuracy of the linear and quadratic models sheet to remove those STRs 7K ybp and under, along with removing the Multi-copy markers? Did you make some substitutions for a specific reason?

Reply #146 of this thread has the output of Marko Heinilla's linear duration assessment of FTDNA's first 67 STRs. http://www.worldfamilies.net/forum/index.php?topic=10513.msg130009#msg130009
Ah, I forgot this old post. It's a little different from the page Marko recently sent me, which I compared using his Linear YBP column.

https://dl.dropbox.com/u/50201824/old/timevaluesEtc/updownratio.html

and a link to his linear page for the above page.

"and the range of accuracy of the linear and quadratic models, the results are given here. "

https://dl.dropbox.com/u/50201824/old/timevaluesEtc/variance2.html

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 09, 2012, 12:21:30 AM
https://dl.dropbox.com/u/50201824/old/timevaluesEtc/variance2.html

Thanks, Mark. Do you have this document in a spreadsheet or CSV format?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 09, 2012, 09:05:19 AM
Here is a spreadsheet I just created of MarkoH's linear webpage.

https://docs.google.com/open?id=0By9Y3jb2fORNcXRTQUFiSEJBc3c

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 10, 2012, 02:18:56 PM
Mark
what would you have to do with your formula to have a TMRCA of 6000 years for U106?



Here is a spreadsheet I just created of MarkoH's linear webpage.

https://docs.google.com/open?id=0By9Y3jb2fORNcXRTQUFiSEJBc3c

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 10, 2012, 03:06:22 PM
Mark
what would you have to do with your formula to have a TMRCA of 6000 years for U106?

If this isn't what you meant, please let me know. There are several factors to consider.

TMRCA in years is the number of generations times the chosen years per generation. I have been using 30 as the standard, which I feel may be an average of, let's say, thirty years per generation from today back to 1 AD and twenty years per generation for the BC time frame. So if the 20-year figure were higher, the TMRCA would increase, of course.

Generations, as in the number of them, are based on the markers used: the sum of the individual marker variances divided by the sum of each marker's mutation rate.
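
As a rough illustration of those two steps, here is a minimal sketch. The function names and the three variance/rate pairs are made up for the example and are not taken from Ken's tool or from any real data set.

```python
def tmrca_generations(variances, rates):
    """Sum of per-marker variances divided by the sum of per-marker mutation rates."""
    if len(variances) != len(rates):
        raise ValueError("need one mutation rate per marker variance")
    return sum(variances) / sum(rates)


def tmrca_years(generations, years_per_generation=30):
    """Convert generations to years with a chosen generation length."""
    return generations * years_per_generation


# Hypothetical per-marker variances and mutation rates (illustration only).
variances = [0.30, 0.12, 0.45]
rates = [0.0030, 0.0010, 0.0040]

gens = tmrca_generations(variances, rates)
print(round(gens, 2), round(tmrca_years(gens), 1))  # prints 108.75 3262.5
```

Changing `years_per_generation` scales the age in years directly, which is why the choice of 30 versus some lower figure matters so much.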

Variance is affected by the quantity of haplotypes, all of which are assumed to be in the same subclade or under a common subclade farther down the tree trunk.

This assumes the haplotypes used are either tested to the above criteria or match others in the same clade, such as varieties with one or more members tested positive for a specified SNP.

With the extended panel (68-111) the confidence increases significantly. Also, it appears that as a male's age increases, so does the number of mutations that can be transmitted. This increases the variance and the mutation rate, which causes the MRCA in generations to decrease.

The experts may have different info.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 10, 2012, 04:31:35 PM
Mark
you are brilliant at maths, but if the TMRCA for U106 is 6000 then the formula is wrong. I think someone posted recently that M269 is 9000 ybp, so how would the TMRCA of U106 be between 3000 and 4000 ybp?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 10, 2012, 10:20:53 PM
Mark
you are brilliant at maths, but if the TMRCA for U106 is 6000 then the formula is wrong. I think someone posted recently that M269 is 9000 ybp, so how would the TMRCA of U106 be between 3000 and 4000 ybp?

By someone else's calculation you say that U106 is 6K years before present.

I am using 320 haplotypes of 111 markers, with the latest mutation rates by Marko via KenN's Generation111T engine. With the 17 multi-copy markers removed, that leaves a net of 94 markers and gets rid of any saturation effects.

U106's true sample (n-1) variance gives a TMRCA of 101.4 generations. At 30 years per generation that equals 3,042.0 +- 631.4 years before present with a 68.27 percent (1-sigma) spread. Even at the upper 2-sigma bound, it is only about 4.3K years old.
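
The conversion and the sigma bounds can be reproduced with a short sketch. The three input figures are the ones in this post; the variable names are illustrative.

```python
generations = 101.4          # TMRCA in generations, from the post
years_per_generation = 30    # chosen generation length
sd_years = 631.4             # 1-sigma spread in years, from the post

age = generations * years_per_generation
one_sigma = (age - sd_years, age + sd_years)
two_sigma_upper = age + 2 * sd_years

print(round(age, 1))                        # prints 3042.0
print(tuple(round(x, 1) for x in one_sigma))  # prints (2410.6, 3673.4)
print(round(two_sigma_upper, 1))            # prints 4304.8
```

So the 4.3K figure is the upper 2-sigma bound, not the central estimate.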

The 111-marker panel shows nearly the same TMRCA as the 67-marker panel on the same data set, and that TMRCA is younger than the one from the 37-marker set.

When I ran Busby's 15 (14 used) markers for M269 with n=1035 using the same tool, it gave an estimate of 143.0 +- 62.3 generations. Using 30 years per generation, the true founder's TMRCA was 4,289.8 +- 1,870.0 years before present. That puts about 1,200 years between M269 and U106.

And when I run L21 with Busby's 15 (14 used) markers and with the 111 (94 used) markers, I get nearly the same number of generations, 113.2 and 113.6 respectively. P312 comes out at 127.1 with Busby's markers.

I would say L21 is an older clade than U106 by about a dozen generations.


MJost







Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 10, 2012, 10:28:14 PM
I would say L21 is an older clade than U106 by about a dozen generations.

I'm not sure how precise we can get, but I've consistently gotten that P312 is older than U106 (STR diversity wise) and that U152 is as old as P312 with L21 quickly behind.....  so this actually makes sense if you are getting them roughly the same age.

Anyway, it just feels good to find someone else seeing the same thing.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 10, 2012, 10:59:43 PM
...so this actually makes sense if you are getting them roughly the same age.

Anyway, it just feels good to find someone else seeing the same thing.



With the 17 multi-copy markers excluded, the 67-marker haplotypes give P312xU152 at 121.1 +-33.0 generations and
U152 at 123.7 +-33.4 generations old.

So U152 is just barely older than P312. P312 and U152 were spawned about the same time.

Then L21 is at 113.6 generations ago and U106 at 101.4 generations, a difference of about 300 to 400 years.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: razyn on October 11, 2012, 12:59:02 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on October 11, 2012, 03:47:11 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.

There's always a margin of error in these calculations, and in the identity of parents occasionally :)


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 11, 2012, 03:54:48 AM
U106 is only 25% of R1b, yet it has as many SNPs downstream and is more widespread. That, for me, makes it older. Secondly, say a 50-year-old man takes a 37, 67, and 111 marker test. No matter which formula you use, his age will still be 50. M269 was supposed to be born in the Neolithic, 9000 ybp.



I would say L21 is an older clade than U106 by about a dozen generations.

I'm not sure how precise we can get, but I've consistently gotten that P312 is older than U106 (STR diversity wise) and that U152 is as old as P312 with L21 quickly behind.....  so this actually makes sense if you are getting them roughly the same age.

Anyway, it just feels good to find someone else seeing the same thing.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Richard Rocca on October 11, 2012, 07:53:40 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.

There's always a margin of error in these calculations, and in the identity of parents occasionally :)

Additionally, P312* could still contain many younger subclades that we don't yet know about therefore making it appear slightly younger than U152.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 11, 2012, 08:10:02 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.

That was for Stoneman :))) and is why I added the SDs.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 11, 2012, 08:34:41 AM

Additionally, P312* could still contain many younger subclades that we don't yet know about therefore making it appear slightly younger than U152.

You bring up a good point. Groups of large younger subclades do affect the variance results. Using 111 markers, L21 All (n=1020) had a variance of 25.99, but when I removed the large M222 subclade, leaving n=873, it gave a smaller variance of 25.65. The generation difference was only one.

MikeW was correct in calculating the interclades between subclades and reporting that age data.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 11, 2012, 08:46:27 AM
U106 is only 25% of R1b, yet it has as many SNPs downstream and is more widespread. That, for me, makes it older. Secondly, say a 50-year-old man takes a 37, 67, and 111 marker test. No matter which formula you use, his age will still be 50. M269 was supposed to be born in the Neolithic, 9000 ybp.


The problem with SNPs is that they have very low mutation rates. This makes them very useful for very deep SNP lineages (thousands of generations), but not very informative for very recent genealogies (tens to hundreds of generations). Hence, SNP markers are not very informative on the time scales we are discussing.

Mutation rates likely differ among microsatellites, and those with the highest mutation rates will provide the most information.

I don't think there are enough data points available to calculate ages from SNPs across short periods of time.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 11, 2012, 10:53:31 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.

There's always a margin of error in these calculations, and in the identity of parents occasionally :)

Anatole K often confuses people on this. I think he has even said U152 is older than P312.
Technically, we can't really say that, but this is even more than a margin of error thing.

This is why I use the words "U152 appears to be as old as P312." U152 had to have occurred after P312. It's younger from a first occurrence viewpoint. It has to be.

However, U152 could still have higher STR diversity than P312 and this just be a margin of error problem.

Another situation could be that U152's TMRCA (most recent common ancestor of all surviving U152 people) is older than the TMRCA for P312*. This is quite possible and not illogical.

Another situation which also relates to error margins is that the STR diversity for all of P312 might be lower than all of U152 because of bias in the data. Let's say L21 is significantly younger than U152 and there is several times more L21 in the data than U152. The data is not representative and L21 could be dragging P312's diversity down a bit.

This shows us why interclade calculations are better than intraclade.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on October 11, 2012, 11:04:06 AM
So U152 is just barely older than P312.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring?  I really don't get this reasoning.

There's always a margin of error in these calculations, and in the identity of parents occasionally :)

Anatole K often confuses people on this. I think he has even said U152 is older than P312.
Technically, we can't really say that, but this is even more than a margin of error thing.

This is why I use the words "U152 appears to be as old as P312." U152 had to have occurred after P312. It's younger from a first occurrence viewpoint. It has to be.

However, U152 could still have higher STR diversity than P312 and this just be a margin of error problem.

Another situation could be that U152's TMRCA (most recent common ancestor of all surviving U152 people) is older than the TMRCA for P312*. This is quite possible and not illogical.

Another situation which also relates to error margins is that the STR diversity for all of P312 might be lower than all of U152 because of bias in the data. Let's say L21 is significantly younger than U152 and there is several times more L21 in the data than U152. The data is not representative and L21 could be dragging P312's diversity down a bit.

This shows us why interclade calculations are better than intraclade.


Yep I agree with all of that.

Something a lot of people seem to have trouble with is just how tenuous Y lines can be. It's not at all unreasonable for even quite old and large families to disappear without a trace, or indeed to leave only one surviving line.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 11, 2012, 11:42:01 AM
This shows us why interclade calculations are better than intraclade.

Thanks for providing a deeper review.

The whole point of my posting ages of various clades using several different numbers of markers (Busby 15, 111 markers, etc.) is to show the ages are reasonably set using subclade comparisons.

I didn't create the variance calculation engine nor the mutation rates used. I just expanded the output to report standard deviation information and removed several more multi-copy markers that should not have been used. I also used internal Excel functions to guarantee better numeric performance and modified the interclade results using a pooled SD formula.

Using 111 markers appears to improve confidence over 67 markers after removing multi-copy markers.

Of course a son cannot be older than the parent, but overlapping confidence intervals (1-sigma at 68.27%) mean the difference in ages cannot be stated as fact, only reasonably assumed. If we take one of the two clades at +2 SD and the other at -2 SD (95.5% confidence) and there is still a gap between them, then we have very solid assurance, via statistical significance, that the ages are correct.

It is tempting to look at whether two error bars overlap or not, and try to reach a conclusion about whether the difference between means is statistically significant or not.

This statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance.

Useful rule of thumb: If two 95% CI error bars do not overlap, and the sample sizes are nearly equal, the difference is statistically significant with a P value much less than 0.05 (Payton 2003).

I was working to include a t-test in the interclade section, since the t-test takes sample size into account.
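
To make the overlap idea concrete, here is a minimal sketch applied to the U152 and P312xU152 figures posted earlier in the thread (123.7 +-33.4 and 121.1 +-33.0 generations). It is not the t-test mentioned above. It is a simpler z-style check that treats the two estimates as independent, which is an assumption, and the function names are made up for the example.

```python
import math


def z_for_difference(mean_a, sd_a, mean_b, sd_b):
    """z-score for the difference of two estimates, assuming independence."""
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)


def intervals_overlap(mean_a, sd_a, mean_b, sd_b, k=2.0):
    """True if the +-k SD intervals around the two estimates overlap."""
    return abs(mean_a - mean_b) <= k * (sd_a + sd_b)


# Generations and 1-sigma spreads as posted earlier in the thread.
u152 = (123.7, 33.4)
p312_x_u152 = (121.1, 33.0)

z = z_for_difference(*u152, *p312_x_u152)
print(round(z, 3))                             # prints 0.055
print(intervals_overlap(*u152, *p312_x_u152))  # prints True
```

A z-score that close to zero says the two ages cannot be told apart with these spreads, which fits the wording "about the same time".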

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 11, 2012, 02:01:08 PM
There was some question as to why Ken's Age Generator is producing younger ages. The mutation rates are from Marko Heinila that Ken utilized. 

Marko could explain his specific methodology in much more detail. But I did have a question or two, so I asked Marko to explain how and why his 67-marker study appears different from his newest 111-marker study. He explained in a recent email that:


"The 111T mutation rates are certain kind of averages for the about
 4,000 sample dataset of 111 STR data.  This is based on method
 (somewhat similar to Chandler's) that does not  put critical
 importance on tree construction accuracy except to reduce statistical
 double counting in some degree in case of the "weighted-pair" estimate
 set.  With the 111 rate estimates, it is assumed that per locus rates
 are approximately constant.  The method is indifferent to multisteps.
 
The webpage that considers linearity of the 67 set, uses much more
 complex mutation models than the 111 estimates. The difference is that
 the mutation rate is assumed to depend on repeat number unlike in the
 case of the 111 estimates.    Accurate tree constructions are needed
 in this case to find the related parameters. It is also necessary to
 model the multisteps and to be able to date the trees accurately
 enough. The 67 estimates there reflect the behavior in the estimated
 67 STR trees.
 
The comparison of these two sets of results obtained by very different
 methods gives a rough idea of the uncertainties coming from the
 detailed mutation modeling and also says something about tree
 constructions.  (The tree estimation problem is much more complex than
 the mutation rate estimation problem.)..."


Until someone else completes a much larger 111 marker mutation rate study with similar methods, I feel that Marko's estimation methods and results should be considered as a standard to be utilized.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 12, 2012, 12:42:43 PM
Generations, as in the number of them, are based on the markers used: the sum of the individual marker variances divided by the sum of each marker's mutation rate.
MJost

This is a fundamental flaw in the process. Assuming a reasonable estimate of the mutation rates for the individual markers is known, why would you lump them all together to produce a single constant?

The model uses statistical analysis to determine the variance of each marker then proceeds to use simple arithmetic for the final step. Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: razyn on October 12, 2012, 01:44:59 PM
Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.

To the extent that this is true (and I'm not technically competent to argue it) there's also an effect of whether a marker is moving up or down; it can mutate in either direction.  The "modal" numbers are based on whatever direction the markers took that were most successful in reproducing males -- and therefore have the present day appearance of having been the prehistoric norm.  And I've seen no scientific reason really to believe that.  The WAMH, etc. may or may not have been modal a few thousand years ago, whenever pappy L11 (as one example) had his sons.  It's now modal for a majority of their West Atlantic survivors.

This post doesn't mean to deny that pappy L11 had a haplotype -- but questions how sure we can be what that was, until we have had the opportunity to dig up a few more really old guys, and test their Y-DNA to a phylogenetically meaningful level.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 12, 2012, 02:24:48 PM
The basic idea of how to combine input variances and arrive at the variance in the output is that you add the variances. Then you take the square root of the resulting variance to compute the standard deviation. But this additive property relies on the linearity of the equation relating the distributions together. KenN removes multi-copy markers that do affect linearity, and MikeW and I agree. Removing fast mutators at the 67 and 111 marker level could further smooth out lines, but you would be removing data points that can provide additional history.

Considering linearity and the coefficients in the transfer function involves trigonometric or higher-order functions. Now we need a Six Sigma or stats person to explain more and devise more complete testing.

Kenneth Nordtvedt told me that 'Since we don’t know the internal structure of the general interclade tree, we can’t actually produce the SD; we can only give a most pessimistic case.'


MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 12, 2012, 03:14:37 PM
Generations, as in the number of them, are based on the markers used: the sum of the individual marker variances divided by the sum of each marker's mutation rate.
MJost

This is a fundamental flaw in the process. Assuming a reasonable estimate of the mutation rates for the individual markers is known, why would you lump them all together to produce a single constant?

The model uses statistical analysis to determine the variance of each marker then proceeds to use simple arithmetic for the final step. Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.

After you read my last post on summing variances then:

Yes, you could calculate the age from each marker by taking its variance and dividing by that marker's mutation rate. But then you have to add each marker's result and divide by the number of markers to get an average age. And you are still left with the old method of calculating the SD using the sum of variances.

Using 67 marker haplotypes, the number of generations for L21 (n=1020) and P312xL21 (n=1638) is 143.5 and 174.6 respectively using your suggested method. The basic way of summing the variances produces 114.7 and 122.3 generations.
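
The two approaches being compared can be sketched side by side. The three markers below are hypothetical values picked only to show the effect; they are not from Ken's tool or anyone's data. One fact worth noting is that dividing the sum of the variances by the sum of the rates is algebraically the same as averaging the per-marker ages with each marker weighted by its mutation rate, so faster markers count for more.

```python
def ratio_of_sums(variances, rates):
    """Pooled estimate: sum of variances divided by sum of mutation rates."""
    return sum(variances) / sum(rates)


def mean_of_ratios(variances, rates):
    """Per-marker estimate: plain average of variance / rate over markers."""
    ages = [v / r for v, r in zip(variances, rates)]
    return sum(ages) / len(ages)


def rate_weighted_mean(variances, rates):
    """Average of per-marker ages, each weighted by its mutation rate."""
    ages = [v / r for v, r in zip(variances, rates)]
    return sum(r * a for r, a in zip(rates, ages)) / sum(rates)


# Hypothetical fast, medium, and slow marker (illustration only).
variances = [0.50, 0.06, 0.02]
rates = [0.0050, 0.0005, 0.0001]

print(round(ratio_of_sums(variances, rates), 1))       # prints 103.6
print(round(mean_of_ratios(variances, rates), 1))      # prints 140.0
print(round(rate_weighted_mean(variances, rates), 1))  # prints 103.6
```

In this made-up case the per-marker ages are 100, 120, and 200 generations, and the pooled figure sits close to the fast marker's 100. Whether real data behave the same way depends on the actual variances and rates.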

Why do you feel yours is the correct method?


MJost




Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 12, 2012, 03:30:07 PM
The Model does not calculate the estimated age of a Haplogroup, i.e. an SNP or cluster. It calculates the estimated age of a group of Haplotypes - a sample of the Haplogroup in question. The result will only be representative of the Haplogroup if the sample is representative of the Haplogroup, and the assumed individual marker mutation rates are also representative. The question of whether or not the assumed mutation rates are representative is easily cross-checked by adding a line into the Calculation as follows.

For each Marker calculate:- Variance of marker/Assumed Mutation rate for the marker.

The result is the estimated age in generations predicted by each individual marker.

The accuracy of the assumed mutation rates can be determined by the consistency of the ages according to the individual predictions.

As I have previously posted, the process of dividing the sum of the Variances by the sum of the Mutation rates is fundamentally flawed.

   


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 12, 2012, 04:04:23 PM
... As I have previously posted, the process of dividing the sum of the Variances by the sum of the Mutation rates is fundamentally flawed.
 

I agree with you in that logically it seems that multiplying by an aggregate mutation rate, which most TMRCA programs do, is flawed.

However, when that objection is brought up, as it has been with Anatole K, you get a response that their simulations show this has a very insignificant impact.

I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 12, 2012, 04:52:13 PM
The Model does not calculate the estimated age of a Haplogroup, i.e. an SNP or cluster. It calculates the estimated age of a group of Haplotypes - a sample of the Haplogroup in question. The result will only be representative of the Haplogroup if the sample is representative of the Haplogroup, and the assumed individual marker mutation rates are also representative. The question of whether or not the assumed mutation rates are representative is easily cross-checked by adding a line into the Calculation as follows.

For each Marker calculate:- Variance of marker/Assumed Mutation rate for the marker.

The result is the estimated age in generations predicted by each individual marker.

The accuracy of the assumed mutation rates can be determined by the consistency of the ages according to the individual predictions.

As I have previously posted, the process of dividing the sum of the Variances by the sum of the Mutation rates is fundamentally flawed.
  
Ok, I am not a math guy. Here is Ken's paper on the subject to review.

http://www.jogg.info/42/files/nordtvedt.pdf

Probability is the main factor to consider.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 12, 2012, 04:59:45 PM
I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various Haplogroups and his data includes the contribution that each marker makes to his results. Perhaps he can advise us what contribution the fast markers make to his results, let's say the 10 fastest markers. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.
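
That check is simple to script. A minimal sketch follows, with a hypothetical list of six per-marker variances standing in for a real 67 or 111 marker set and `top_k_share` as a made-up helper name.

```python
def top_k_share(variances, k):
    """Fraction of the total variance contributed by the k largest values."""
    ordered = sorted(variances, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical per-marker variances (illustration only).
variances = [0.90, 0.60, 0.30, 0.10, 0.05, 0.05]

share = top_k_share(variances, 2)
print(f"{share:.0%}")  # prints 75%
```

With a real data set, calling `top_k_share(variances, 10)` would give the share contributed by the ten highest-variance markers.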

However, look at it this way. Would you expect to get an accurate prediction of the upcoming election by polling a sample of 100 voters from Salt Lake City and 1000 voters from Detroit and summing the resulting voting intentions? Excuse the politics.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 12, 2012, 06:20:57 PM
I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various Haplogroups and his data includes the contribution that each marker makes to his results. Perhaps he can advise us what contribution the fast markers make to his results, let's say the 10 fastest markers. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.

However, look at it this way. Would you expect to get an accurate prediction of the upcoming election by polling a sample of 100 voters from Salt Lake City and 1000 voters from Detroit and summing the resulting voting intentions? Excuse the politics.

What I'm saying is I've read posts where Nordtvedt has done simulations and concluded that aggregating the markers versus individually running them caused insignificant differences. I'm pretty sure I've read Klyosov's response to a similar objection and it was a similar answer. This is an insignificant issue. I'm pretty sure Vineviz said the same.

How are you showing that your objection causes a significant impact? That's my question to you. Voters and STRs on the Y chromosome are not similar in behavior as far as I know.

In my accounting 101 class, I still remember the first item taught. It was the concept of materiality. If it's not material, we don't need to report it.
http://www.dwmbeancounter.com/tutorial/theorybook.html

What is the impact of your objection? Can you demonstrate? Others say they have demonstrated it is not material.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Jdean on October 12, 2012, 06:25:56 PM
Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.

To the extent that this is true (and I'm not technically competent to argue it) there's also an effect of whether a marker is moving up or down; it can mutate in either direction.  The "modal" numbers are based on whatever direction the markers took that were most successful in reproducing males -- and therefore have the present day appearance of having been the prehistoric norm.  And I've seen no scientific reason really to believe that.  The WAMH, etc. may or may not have been modal a few thousand years ago, whenever pappy L11 (as one example) had his sons.  It's now modal for a majority of their West Atlantic survivors.

This post doesn't mean to deny that pappy L11 had a haplotype -- but questions how sure we can be what that was, until we have had the opportunity to dig up a few more really old guys, and test their Y-DNA to a phylogenetically meaningful level.

If you calculate the modal values for U106, U152 and L21 over 67 loci you will find the results are very similar, WAMH may not be the exact ancestral values but it can't be far off !!


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: alan trowel hands. on October 12, 2012, 06:26:12 PM
Saw this on Dienekes' blog

http://dienekes.blogspot.co.uk/2012/10/ann-gibbons-on-slower-mutation-rate.html



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 13, 2012, 05:53:57 PM
I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various Haplogroups and his data includes the contribution that each marker makes to his results. Perhaps he can advise us what contribution the fast markers make to his results, let's say the 10 fastest markers. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.

However, look at it this way. Would you expect to get an accurate prediction of the upcoming election by polling a sample of 100 voters from Salt Lake City and 1000 voters from Detroit and summing the resulting voting intentions? Excuse the politics.

What I'm saying is I've read posts where Nordtvedt has done simulations and concluded that aggregating the markers versus individually running them caused insignificant differences. I'm pretty sure I've read Klyosov's response to a similar objection and it was a similar answer. This is an insignificant issue. I'm pretty sure Vineviz said the same.

How are you showing that your objection causes a significant impact? That's my question to you. Voters and STRs on the Y chromosome are not similar in behavior as far as I know.

In my accounting 101 class, I still remember the first item taught. It was the concept of materiality. If it's not material, we don't need to report it.
http://www.dwmbeancounter.com/tutorial/theorybook.html

What is the impact of your objection? Can you demonstrate? Others say they have demonstrated it is not material.

I wasn't comparing the behaviour of Voters v STRs. I was trying to demonstrate the effects of bad processing and interpretation of the results.

Analysis of the R-L21 Haplogroup using a 5633 (67 marker) sample from the latest of Mike's spreadsheets and Ken N 67 marker TMRCA calculator with original mutation rates produces the following basic result.

1a) MRCA estimate = 138 generations = 4140 YBP with a 30 year generation interval.

Using the same data but taking the mean of the individual marker variance divided by the individual mutation rate, instead of dividing the sum of the variances by the sum of the mutation rates, produces the following result.

1b) MRCA estimate = 180 generations = 5400 YBP with a 30 year generation interval.

I consider this to be a significant difference.

Repeating the analysis using Marko Heinila mutation rates produces the following results.

2a) MRCA estimate = 163 generations = 4890 YBP with a 30 year generation interval.

2b) MRCA estimate = 181 generations = 5430 YBP with a 30 year generation interval.

The difference between the results using Marko H mutation rates has reduced but is still significant.

What is more significant is the contribution made to the results by fast and slow markers when using the method of dividing the sum of the variances by the sum of the mutation rates.

Using Marko H mutation rates, the fastest 7 markers contribute 50% of the total while the slowest 10 contribute just 1% of the total.

The fastest marker predicts an MRCA of 705 generations or 21,150 YBP, while the slowest marker predicts 27 generations or 810 YBP. Clearly the method being used is flawed and, furthermore, the mutation rates are not representative of the R-L21 Haplogroup sample.
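
A minimal Python sketch of the two calculations compared above, for anyone who wants to see why they can disagree. The function names and the three-marker data are made up for illustration; they are not taken from Ken Nordtvedt's calculator or from Mike's spreadsheet. Note that dividing the sum of the variances by the sum of the rates is algebraically a rate-weighted average of the per-marker ages, which is one way to see why the fast markers dominate it.

```python
from statistics import mean


def age_ratio_of_sums(variances, rates):
    """Method (a): sum of the variances divided by the sum of the mutation rates."""
    return sum(variances) / sum(rates)


def age_mean_of_ratios(variances, rates):
    """Method (b): mean of each marker's variance divided by its own mutation rate."""
    return mean(v / r for v, r in zip(variances, rates))


def age_rate_weighted(variances, rates):
    """Per-marker ages weighted by mutation rate; equals method (a) algebraically."""
    per_marker = [v / r for v, r in zip(variances, rates)]
    return sum(r * g for r, g in zip(rates, per_marker)) / sum(rates)


# Hypothetical data: a slow, a medium and a fast marker (rates per generation).
rates = [0.0005, 0.002, 0.008]
variances = [0.10, 0.30, 0.80]

print("ratio of sums :", round(age_ratio_of_sums(variances, rates), 1))
print("mean of ratios:", round(age_mean_of_ratios(variances, rates), 1))
print("rate weighted :", round(age_rate_weighted(variances, rates), 1))
```

With these invented numbers the per-marker ages are 200, 150 and 100 generations. Method (b) treats the three equally, while method (a) leans toward the fast marker's 100. Which weighting is more appropriate is exactly the point under dispute here; the sketch only shows that the two are not the same calculation.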


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 13, 2012, 08:20:28 PM
I'm not sure how Ken Nordtvedt's latest TMRCA does it.

But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various Haplogroups and his data includes the contribution that each marker makes to his results. Perhaps he can advise us what contribution the fast markers make to his results. Let's say the 10 fastest markers. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.

Thank you for the analysis. You have legitimate concerns. Please bring them up with the author of the tool, Ken Nordtvedt. He frequents this Hg I forum and he will respond.
http://archiver.rootsweb.ancestry.com/th/index/Y-DNA-HAPLOGROUP-I


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: stoneman on October 14, 2012, 05:28:23 AM
The 5400 YBP estimate for L21 looks more realistic, and that means they could have been involved in the building of Newgrange.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: alan trowel hands. on October 14, 2012, 07:40:42 AM
I am not remotely mathematical, but that sort of change to the method seems to make sense to me. Can anyone with a maths brain please comment on this?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 14, 2012, 03:20:21 PM
Just to compare results between Ken's various versions of his Gen engines and my mod.

G Coalescence Age = Variance of Whole Population (n)         
G Founder's Age = Variance from Modal (sample n-1)         

I used 25 HTs for all four.

Gen6         GCoal = 98.2   GFounders = 111.8  M = 0.12393  (11 unused markers) 385,389i,459,464,CDY removed

Gen7.1       GCoal = 97     GFounders = 110    M = 0.12635  (11 unused markers) 385,389i,459,464,CDY removed

Gen111T      Gcoal = 101.98 GFounders = 116.2  M = 0.119669 (11 unused markers) Marko MR  385,389i,459,464,CDY removed

Gen111TMod   Gcoal = 105.7  GFounders = 111.0  M = 0.111128 (17 unused Markers) Marko MR Chg rate for s/b 389b's rate. 385,389i,459,464,CDY,YCAII,395S1 & 413 Removed. Excel Functions used.



Gen111TMod using a set of 111 marker haplotypes then reduced to 67 markers. Notice the tighter StdDev In Generations spread at 111 vs 67.

67 (50) Markers   Sheet Mutation Rate: 0.11113

YrsPerGen*  Count   Founder's Age                  Generations  StdDevInGen
30          N=1048  L21 ALL (111Markers using 67)  114.6        32.1

YBP      +-YBP  YBPMax   VAR     SD
3,438.8  963.5  4,402.3  12.738  3.569


111 (94) Markers   Sheet Mutation Rate: 0.22894

YrsPerGen*  Count   Founder's Age         Generations  StdDevInGen
30          N=1048  L21 ALL (111Markers)  113.4        22.3

YBP      +-YBP  YBPMax   VAR     SD
3,401.3  667.6  4,068.9  25.956  5.095
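
For readers checking the tables, the figures appear to be tied together by simple arithmetic: Generations = VAR / mutation rate, SD = square root of VAR, StdDevInGen = SD / mutation rate, and the YBP columns are those values times 30 years. That reading is inferred from the numbers themselves, not from the formulas in the sheet, so treat the Python sketch below as an assumption; the function name is hypothetical.

```python
import math


def founders_age(total_variance, summed_rate, years_per_gen=30):
    """Rebuild the table columns from VAR and the sheet mutation rate.

    Assumed relations (inferred from the posted figures, not from the sheet):
      Generations = VAR / rate
      StdDevInGen = sqrt(VAR) / rate
    """
    generations = total_variance / summed_rate
    sd_generations = math.sqrt(total_variance) / summed_rate
    ybp = generations * years_per_gen
    plus_minus = sd_generations * years_per_gen
    return generations, sd_generations, ybp, plus_minus, ybp + plus_minus


for label, var, rate in [("67 markers", 12.738, 0.11113),
                         ("111 markers", 25.956, 0.22894)]:
    g, sd, ybp, pm, ybp_max = founders_age(var, rate)
    print(f"{label}: G={g:.1f} SDgen={sd:.1f} YBP={ybp:.1f} +-{pm:.1f} max={ybp_max:.1f}")
```

The results match the posted columns to within rounding of the inputs. Under this reading the tighter spread at 111 markers follows from the larger summed mutation rate: VAR roughly doubles, so its square root grows by only about 1.4 times, while the divisor doubles.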



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 15, 2012, 04:00:17 PM
Mark,
So that we are singing from the same sheet, can you run Ken N's Gen7.1 using 67 markers of Mike's latest 111 marker set (N=1134) and advise whether you agree with the following results?

MRCA  G=133
SigmaG = 10


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 15, 2012, 06:32:20 PM
MikeW's Age Estimator Gen7.1

Mike's latest spreadsheet list has 1048 HTs

GA = 132.3
SigmaGA = 10.244

Yep we are.

MJost



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 16, 2012, 12:00:24 AM
....
G Coalescence Age = Variance of Whole Population (n)         
G Founder's Age = Variance from Modal (sample n-1)         
...
Gen111TMod using a set of 111 marker haplotypes....
Notice the tighter StdDev In Generations spread at 111 vs 67.
....
111(94) Markers   Sheet  Mutation Rate: 0.22894
   
YrsPerGen*   Count   Founder's Age   Generations   StdDevInGen
30   N=1048   L21 ALL (111Markers)     113.4   22.3

YBP   +-YBP   YBPMax   VAR   SD
3,401.3   667.6   4,068.9   25.956   5.095

I take it you feel pretty good about the 3400 years before present?  ... so that gets us to 1500 BC and maybe 2000 BC.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 16, 2012, 08:47:23 AM
Unless someone comes up with another method, then yes, it is very likely, with a max of 4,069 years using the data's standard deviation. What I didn't show is that using a confidence level of 95.45% gives +-987 YBP, adding only about 300 years to the best calculated probability of +-668 years.

Just to test a question I had, I removed the two fastest markers in the 68-111 panel, STRs 712 and 710. This produced the numbers below and, effectively, did not change the number of generations; only the variance changed, causing the StdDev in Generations to increase. I was expecting this GenSD to decrease, not the opposite.

Generations   StdDevInGen   YBP   +-YBP
113.4   24.2   3,402.6   724.9


MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 16, 2012, 09:02:18 AM
Also here is a comment worth repeating.

"Finally, note that the resolution for t offered by using
n=5 markers is very poor, but rather fine precision is
offered by using 100 markers." (t = time)


"Estimating the Time to the Most Recent Common Ancestor for
the Y chromosome or Mitochondrial DNA for a Pair of Individuals"

Genetics Society of America
Bruce Walsh
March 22, 2001


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on October 16, 2012, 03:32:16 PM
Mark,
If we start by examining how the above result was derived, we have already agreed that the age in Generations is arrived at by dividing the sum of the Variances by the sum of the mutation rates. For this version of the model the sum of the mutation rates is fixed at m=0.12635.

In our example the sum of the Variances is ~ 16.7 (16.7/0.12635 = 132)
If we now look at how the sum of the Variances is made up in our example, we can examine the values in row 536 (Vara) of the KNCalcs sheet.
This shows that 60% of the Variance sum is contributed by just 10 of the fastest markers while just 1% is contributed by the slowest 10 markers.

Thus even though the analysis involves 55 separate markers, the result is dominated by a few fast moving markers, with slow movers contributing close to zero.
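
The share contributed by the fastest markers can be checked outside the spreadsheet with a few lines of Python. This is only a sketch: `share_of_fastest` is a hypothetical helper, and the five-marker data is invented for illustration rather than copied from row 536 of the KNCalcs sheet.

```python
def share_of_fastest(variances, rates, k):
    """Fraction of the variance sum contributed by the k fastest-mutating markers."""
    # Rank markers by mutation rate, fastest first, keeping each variance paired
    # with its rate.
    ranked = sorted(zip(rates, variances), reverse=True)
    top = sum(v for _, v in ranked[:k])
    return top / sum(variances)


# Hypothetical markers, listed slowest to fastest.
rates = [0.0005, 0.001, 0.002, 0.004, 0.008]
variances = [0.05, 0.10, 0.20, 0.40, 0.80]

print(f"fastest 2 of 5 markers: {share_of_fastest(variances, rates, 2):.0%} of the variance sum")
```

If each marker's variance is roughly proportional to its rate, as in this toy data, a large share from the fast markers is simply what a sum of variances will always show. Whether that makes the estimate unreliable is the question being argued here.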

I suggest that the procedure used does not provide equal representation to each of the markers and is therefore not a reliable model.

In order to provide equal representation to each marker we can modify the procedure to carry out a statistical analysis across the markers as well as on the individual markers.

We do this as follows.

1) Insert an additional function in cell C550 (=C536*1000/C5)
2) Copy the function to all markers on row 550 from D550 to BR550
3) Insert in cell BS550 the function =AVERAGE(C550:BR550)
Cell BS550 displays the MRCA in generations.

In our example, the MRCA estimate is Ga = 178 (5,340 YBP), a significant variation from the above result.

This procedure also allows us to examine how the individual markers in the Haplogroup have behaved in relation to the mutation rates. If the mutation rates are representative of the Haplogroup, the values in cells C550 to BR550 should be reasonably consistent.

In our example the values vary significantly between 38 generations and 701 generations (1,140YBP to 21,030YBP) suggesting that the mutation set used is not appropriate for R-L21.

The results using Marko H mutation rates do not differ significantly.



 


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on October 16, 2012, 04:58:01 PM
If we start by examining how the above result was derived, we have already agreed that the age in Generations is arrived at by dividing the sum of the Variances by the sum of the mutation rates, the sum of the mutation rate for this version of the model being fixed at m=0.12635.

In our example the sum of the Variances is ~ 16.7 (16.7/0.12635 = 132)
If we now look at how the sum of the Variances is made up in our example, we can examine the values in row 536 (Vara) of the KNCalcs sheet.
This shows that 60% of the Variance sum is contributed by just 10 of the fastest markers while just 1% is contributed by the slowest 10 markers.

Thus even though the analysis involves 55 separate markers, the result is dominated by a few fast moving markers, with slow movers contributing close to zero.

Autochthon, since Ken Nordtvedt is the author of this methodology please go on to the forum that he is on and explain the problem you are seeing.
http://archiver.rootsweb.ancestry.com/th/index/Y-DNA-HAPLOGROUP-I

I'm sure he will answer.

Alternatively, rather than couching your position in terms of Ken's tool, you could describe the problem generically on the forum below, and Anatole Klyosov will respond. If he doesn't, I'll ask him to. I think he generally does the same thing.
http://archiver.rootsweb.ancestry.com/th/index/GENEALOGY-DNA


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mark Jost on October 16, 2012, 05:54:12 PM
I assume Ken's Gen111T engine, which uses a variance and mutation rate formula for generations, has been tested against paper lineages.

The estimator sheet I modified did use a per-marker age formula, and the sum was much higher than with the existing method. It is in rows 30 and 31, called CladeAmarkerGen.


You will need to take your hypothesis and prove it against a known paper trail with plenty of documented haplotypes.

MJost


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: ironroad41 on October 18, 2012, 12:36:10 PM
I have been following this entry and would like to support your position. The present variance approach, in my judgment, underestimates TMRCAs. I've said this for quite a period of time, but it is difficult to prove because examples where known dates are available are rare.

Goldstein and Stumpf, in their review paper published in Science in March 2001, used a different approach. For each DYS locus they computed the TMRCA by dividing the ASD by the mutation rate for that locus. This weights each locus equally.

The problem has been identified, but not accepted, by this community. The data shows that very little variance is contributed by the slower mutators; they mutate around the modal. Faster mutators have a limited range of values they can assume, greater than +/- 1, so they contribute some ASD/variance. However, they also saturate after a while. In my opinion Variance/ASD does not model the mutational process.

What one needs to do is count mutations, but that is very difficult due to hidden mutations for fast and medium mutators.  For longer durations, slow mutators can be used, but this requires care.

I hope you continue this effort you have initiated.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Mike Walsh on November 07, 2012, 05:51:14 PM
I thought it was interesting so I'll just archive it here.

Quote from: Anatole Klyosov
Subject: Re: [DNA] DNA] Coincidental convergence (or lack of divergence)
Quote from: Pietrzakstan
> How it is possible to count mutations between the two older haplotypes,
> if in a period of about several thousand years in one rapidly mutating
> locus
> may be 5-10 mutations?
> How it is possible to count mutations, if they are parallel in the same
> locus in both haplotypes?
> It is puzzling me.

MY RESPONSE:

1. It does not make sense to count mutations between two (!) haplotypes "in
a period of about several thousand years". Who on Earth would want to do it
and for what purpose? Mutations are governed by statistics, and two
haplotypes do not provide any good statistics with their mutations.

2. If you suspect many back and forth mutations in some loci, just exclude
them. As simple as that. For example, for thousands years back I employ 22
marker haplotypes, in which one mutations happens in several thousand years.
This 22 marker panel is described in the Adv. Anthropology (2011) v. 1, No.
2, 26-34.

3. With the slowest 16 marker haplotypes I can calculate timespans to a
common ancestor of man and chimpanzee.

4. "How it is possible to count mutations, if they are parallel in the same
locus in both haplotypes?" - please elaborate. Your question is hard to
understand. However, please remember that you cannot work reliably with two
haplotypes. You cannot toss a coin two times only and hope to calculate
something out of this "statistics".
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-11/1352323225

I agree with Anatole's general point here that doing things like TMRCA estimates between just two people is not reliable. You need more data to apply statistical averages.

I don't agree or disagree on his point about the 16 slowest markers, but apparently he thinks they stay linear long enough to be usable over a few million years.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Maliclavelli on November 07, 2012, 10:25:41 PM
I have been writing about this for many years, but my principles were three (at least):

1)   mutations happen around the modal
2)   there is a convergence to the modal as time passes
3)   sometimes a mutation goes off on a tangent

DYS391 mutates mostly around the values 10 and 11
DYS439 mutates mostly around 11-12-13, etc.
Of course I am speaking of hg. R, but the same principle explains all the other haplogroups, which diverged only because they started from different values that had gone off on a tangent, but frequently from the same value, and these values are almost the same even in very distant haplogroups.

My theory of the antiquity of hg. R in Europe presupposes this and I think it will come out winning.



Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Autochthon on November 10, 2012, 06:26:15 AM
I thought it was interesting so I'll just archive it here.

Quote from: Anatole Klyosov
Subject: Re: [DNA] DNA] Coincidental convergence (or lack of divergence)
Quote from: Pietrzakstan
> How it is possible to count mutations between the two older haplotypes,
> if in a period of about several thousand years in one rapidly mutating
> locus
> may be 5-10 mutations?
> How it is possible to count mutations, if they are parallel in the same
> locus in both haplotypes?
> It is puzzling me.

MY RESPONSE:

1. It does not make sense to count mutations between two (!) haplotypes "in
a period of about several thousand years". Who on Earth would want to do it
and for what purpose? Mutations are governed by statistics, and two
haplotypes do not provide any good statistics with their mutations.

2. If you suspect many back and forth mutations in some loci, just exclude
them. As simple as that. For example, for thousands years back I employ 22
marker haplotypes, in which one mutations happens in several thousand years.
This 22 marker panel is described in the Adv. Anthropology (2011) v. 1, No.
2, 26-34.

3. With the slowest 16 marker haplotypes I can calculate timespans to a
common ancestor of man and chimpanzee.

4. "How it is possible to count mutations, if they are parallel in the same
locus in both haplotypes?" - please elaborate. Your question is hard to
understand. However, please remember that you cannot work reliably with two
haplotypes. You cannot toss a coin two times only and hope to calculate
something out of this "statistics".
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-11/1352323225

I agree with Anatole's general point here that doing things like TMRCA estimates between just two people is not reliable. You need more data to apply statistical averages.
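
To put a rough number on the "more data" point, here is a minimal sketch. It assumes mutations accumulate independently as a Poisson process, so the relative error of a mutation count is roughly 1/sqrt(expected count). The function name, the 25-marker panel and the 0.002 rate are illustrative assumptions of mine, not figures from Anatole's post, and shared ancestry between lineages is ignored.

```python
import math

def relative_error(n_haplotypes, n_markers, rate, generations):
    """Approximate relative standard error of a mutation count.

    Assumes (hypothetically) independent Poisson mutations at a single
    per-marker, per-generation rate. Ignores shared ancestry between
    lineages and back mutations.
    """
    expected_mutations = n_haplotypes * n_markers * rate * generations
    return 1.0 / math.sqrt(expected_mutations)

two = relative_error(2, 25, 0.002, 100)     # just two haplotypes
many = relative_error(200, 25, 0.002, 100)  # two hundred haplotypes
print(f"2 haplotypes: +/-{two:.0%}   200 haplotypes: +/-{many:.0%}")
```

Under these assumptions, going from 2 to 200 haplotypes shrinks the relative error tenfold, since it scales with the square root of the sample size.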

I neither agree nor disagree with his point about the 16 slowest markers, but apparently he thinks they mutate slowly enough to stay linear for a few million years.
His method is relatively simple and seems to go as follows.
A) Choose a set of markers appropriate to the approximate age of the haplogroup: fast markers for recent ages, slow markers for several thousand years, and very slow markers for longer periods.
B) Discard any markers where the mutations at an individual marker are suspected of having gone non-linear due to back (reverse) mutations.
C) Count the total number of mutations from the modal of the haplogroup across the chosen markers.
D) Apply a constant, derived from probability calculations, to the total number of mutations to correct for back (reverse) mutations, giving a new (increased) total.
E) Divide the new total of mutations by a single mutation rate derived from haplogroups of "known?" age/generations. The result is the TMRCA in generations.
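
Steps C to E above can be sketched roughly as follows. This is my reading of the outline, not Anatole's actual procedure. The function names, the toy haplotypes, the 0.002 rate and the `back_mutation_factor` parameter are all hypothetical, and I assume the total is normalised by the number of haplotypes and markers before dividing by a per-marker rate.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Most common allele value at each marker position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]

def tmrca_generations(haplotypes, rate_per_marker, back_mutation_factor=1.0):
    """Rough TMRCA in generations from mutation counts against the modal.

    rate_per_marker: assumed mutations per marker per generation.
    back_mutation_factor: placeholder multiplier (>= 1) standing in for
    the back-mutation correction of step D; not a published value.
    """
    modal = modal_haplotype(haplotypes)
    # Step C: count stepwise differences from the modal haplotype.
    total = sum(abs(allele - m)
                for hap in haplotypes
                for allele, m in zip(hap, modal))
    # Step D: inflate the count to allow for back mutations.
    corrected = total * back_mutation_factor
    # Step E: divide by the rate, normalised per haplotype and per marker.
    return corrected / (len(haplotypes) * len(modal) * rate_per_marker)

# Toy data: four 3-marker haplotypes, two one-step mutations in total.
sample = [[13, 24, 14], [13, 24, 14], [13, 25, 14], [14, 24, 14]]
generations = tmrca_generations(sample, rate_per_marker=0.002)
print(round(generations, 1))
```

Steps A and B (choosing and pruning markers) are left to whoever prepares the input, since they depend on judgment about which markers have stayed linear.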


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: seferhabahir on November 11, 2012, 02:00:40 PM

There is another interesting conversation (and some agreements) between Ray Banks and Anatole Klyosov re new data for using SNP counts to estimate time intervals posted today on rootsweb.


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: razyn on November 11, 2012, 02:20:25 PM

http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-11/1352614028


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: alan trowel hands. on November 11, 2012, 07:11:40 PM

That sounds very promising.  What would the implication of this be for R1b?


Title: Re: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?
Post by: Heber on November 11, 2012, 08:04:17 PM

Indeed, it looks like progress. I retained a basic formula of 50 years per SNP. As R1b has the highest density of SNPs on the phylogenetic tree, it is a matter of counting the SNPs between clades and multiplying by 50 to estimate the age. A nice simple rule of thumb. I like the fact that Anatole is broadly agreeing with the methodology, which gives me greater confidence re the checks and balances.
As we are currently experiencing a rapid expansion in the Phylogenetic tree and number of new SNPs discovered, this will be of great benefit in calculating rough migration routes and timelines.
We should get an updated tree (I hope) in the next few weeks with the release of Geno 2.0.
That should be a good opportunity to test the theory.
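
For what it's worth, the rule of thumb is trivial to put into code. A minimal sketch, where the function name and the count of 40 SNPs are made up for illustration and 50 years per SNP is just the working figure mentioned above, not a calibrated rate:

```python
YEARS_PER_SNP = 50  # rule-of-thumb figure from the post, not a calibrated rate

def clade_age_years(snp_count, years_per_snp=YEARS_PER_SNP):
    """Estimate the time between two clades from the SNPs separating them."""
    return snp_count * years_per_snp

# Hypothetical example: 40 SNPs between a parent clade and a subclade.
print(clade_age_years(40))  # 2000
```

Keeping the rate as a parameter makes it easy to rerun the numbers if the 50-year figure gets revised once the new tree is out.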