World Families Forums - STR Wars: Is diversity meaningful? more meaningful than Hg frequency?

Author Topic: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?  (Read 17287 times)
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #100 on: April 23, 2012, 11:29:43 AM »

.... We will never have enough data, but sufficient data has been collected, both germ-line and family data sets, to make observations about the "real" properties of the Y STR mutational process.

I agree we don't have enough data on the real properties of these Y STRs.

I do not believe your subject question can ever be rationally answered until these properties are understood and agreed upon and a model created using these properties.

I agree we can't prove anything beyond a reasonable doubt but I think there have been enough simulation runs, and the statistical modeling is improving so we are getting some useful results.  My thinking is that something is better than nothing, but we should have no illusions of final answers.
Logged

R1b-L21>L513(DF1)>L705.2
Mike Walsh
« Reply #101 on: April 23, 2012, 11:33:49 AM »

Quote from: Mikewww
Why? I agree there are back-mutations, but so what?  John Chandler has told us that mutation rate calculations and variance account for back mutations.
That comment may be true for a DYS locus like CDYa,b, where over a few thousand years many mutations will have occurred.  Most DYS loci have very few mutations relative to CDYa,b, and a mutation is a low-probability event.  But the point is that there is no range of values for most DYS loci, unlike CDYa,b.  If a mutation from the modal has occurred, then the most probable next event is a mutation back to the modal. Again, this can't be modelled by a random walk process, which I believe assumes that the process is unbounded?
STR diversity is used in study after study but I don't know of any study that says a back mutation towards the modal is more probable than another mutation away from it.   The exception is an STR that is at the high end of the full allele range from an absolute STR count.... which I interpret as STR counts in the 30's or approaching the 30's, primarily.

I do calculate variance using only the linear markers (according to Heinila) as well as mixed but I don't really see that it makes much difference, for R1b subclades anyway.  I don't want to misrepresent any of this, though. I don't think we have a lot of precision.
« Last Edit: April 23, 2012, 11:37:34 AM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #102 on: April 23, 2012, 12:01:30 PM »

I refer again to the only significant data set on Y STR frequencies at:  Http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.  Under the assumptions I stated previously about multi-steps, you will observe that most STRs are bounded by the modal +/-1.  The drunkard's walk model, which underlies the variance/ASD model, doesn't treat the case where the set of values obtained in the process is bounded.  This is an important point and I don't believe Nordtvedt considered it.  He needs increasing variance to make his model make sense.  That's why he insists on including the faster mutators, whose path through time may more closely resemble a random walk until they saturate (hit an upper or lower bound), at which time they bounce back and forth within the set of values permissible for these DYS loci.  Unless a multistep occurs, it appears that most (59 out of 67) act the same way.
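
A minimal simulation sketch of the distinction argued here, comparing an unbounded drunkard's walk with one confined to the modal +/-1. The mutation rate, the generation count and the rule that a step leaving the allowed range is simply rejected are all illustrative assumptions, not values or mechanisms taken from any of the cited data sets.

```python
import random
from statistics import pvariance


def simulate_locus(generations, mu, bound=None, rng=random):
    """Return the offset from the founder allele after `generations` transmissions.

    Each generation the allele steps +1 or -1 with total probability `mu`.
    If `bound` is given, a step that would leave [-bound, +bound] is rejected
    (an assumed way of modelling a hard allele range).
    """
    offset = 0
    for _ in range(generations):
        if rng.random() < mu:
            step = rng.choice((-1, 1))
            if bound is None or abs(offset + step) <= bound:
                offset += step
    return offset


def population_variance(n_lines, generations, mu, bound=None, seed=0):
    """Variance of the allele offsets across `n_lines` independent lineages."""
    rng = random.Random(seed)
    offsets = [simulate_locus(generations, mu, bound, rng) for _ in range(n_lines)]
    return pvariance(offsets)


if __name__ == "__main__":
    # Deliberately fast, hypothetical marker so that saturation is visible.
    for t in (50, 200, 400):
        free = population_variance(2000, t, 0.02)
        capped = population_variance(2000, t, 0.02, bound=1)
        print(t, round(free, 2), round(capped, 2))
```

In the unbounded case the expected variance grows as mu times the number of generations, which is what a variance-based clock relies on. In the bounded case the variance can never exceed 1, so once a marker saturates it stops carrying time information.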

In the table I reference, for DYS393 there are 23k samples with a range of values of 13 +/-1 for R1b.  The data are similar for all seven Hgs at 393 (95% or more of the values are modal +/-1).  So are 390, 19, 391, and so on.

I believe that TMRCA estimates and diversity are underestimated.

I showed how unique mutation events add mutations, i.e. variance/diversity.  So what I have to show is that there is something in the analysis process that decreases variance across a wide set of DYS loci. I believe that mutations to the modal, either forward or backward, occur much more frequently than mutations greater than +/-1.
« Last Edit: April 23, 2012, 12:05:14 PM by ironroad41 » Logged
Mike Walsh
« Reply #103 on: April 23, 2012, 12:21:52 PM »

...
I believe interclade calculations, like most of the Variance calculations only consider coalescent time, the time to the build-up of the population, it doesn't penetrate the "disaster" and give real TMRCA.  So, the value of the interclade estimate will be dependent on the population history under examination.  I can certainly stand correction if this observation is incorrect?

We may be looking for two different things.  I'm generally not that interested in who the first person born with a particular SNP was or where he was.  That is a very difficult thing to ever find out --- it would just about take the father without the SNP and the son with the new SNP buried at the same grave site.

I'm generally trying to understand population expansions and movements. In that case, I don't really care if there was a disaster bottleneck (although I think that is an interesting topic in and of itself) or if the people at the time of coalescence were the actual first people with the SNP.

What's important is the coalescence or "coming together backward in time" of the STR diversity to a GD=0. That's the approximate time of the most recent common ancestor. It's just an approximation though.  We know the SNP can't be any younger than this time.

The cool thing about interclade calculations is that they filter out the bias that intraclade calculations have towards the largest sub-population samples (since intraclades are just averages). If the two clades in an interclade calculation are of roughly the same age, the precision can be relatively good. Of course, we have to know they are two separate clades. The phylogenetic tree of SNPs provides the framework.
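
To make the intraclade/interclade distinction concrete, here is a simplified sketch under a symmetric single-step mutation model. It is not Nordtvedt's actual procedure; the function names, the use of plain per-marker variance and the average-squared-difference form of the interclade estimate are assumptions for illustration.

```python
from itertools import product
from statistics import pvariance


def intraclade_generations(haplotypes, rates):
    """Sum of per-marker variances within one clade, divided by the summed rates.

    `haplotypes` is a list of equal-length tuples of repeat counts and
    `rates` holds one per-generation mutation rate per marker.
    """
    per_marker = [pvariance([h[i] for h in haplotypes]) for i in range(len(rates))]
    return sum(per_marker) / sum(rates)


def interclade_generations(clade_a, clade_b, rates):
    """Average squared difference over all cross-clade pairs, over 2 * summed rates.

    Every pair spans the common ancestor of the two clades, so two lineages
    accumulate mutations; hence the factor of 2.
    """
    pairs = list(product(clade_a, clade_b))
    asd = sum(sum((x - y) ** 2 for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    return asd / (2 * sum(rates))
```

Because the interclade figure only uses pairs drawn from opposite sides of the tree, a large closely related cluster inside one clade contributes less distortion than it does to that clade's own average.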

If we look at a number of interclade calculations in context of each other, we are effectively cornering in the upper and lower bounds of the subclades. http://tech.groups.yahoo.com/group/R-P312Project/files/Haplogroup_Timeline_R-L11_Subclades.gif   The chart is based on Chandler/Little mutation rates so you could argue the years should be rescaled but the relative nature of L11 subclades won't change. The methodology is Nordtvedt's.
« Last Edit: April 23, 2012, 12:49:25 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Mike Walsh
« Reply #104 on: April 23, 2012, 12:45:54 PM »

...
I refer again to the only significant data set on Y STR frequencies at: Http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/yfreq.htm.  Under the assumptions I stated previously about multi-steps, you will observe that most STR's are bounded by the Modal, +/-1.

I don't know how old the data is for the frequency chart you cite, but I have all the detail on a large number of confirmed P312 people in files posted at the L21 and P312 Yahoo groups.  I just checked those files.  Out of the first 67 FTDNA STRs, only DYS472 was restricted to its mode +/-1.

That's just for the P312 deep clade tested people I can find in FTDNA projects. If you look at all the haplogroups and all of the data FTDNA has, I don't think what you are saying is true.  Besides, why would FTDNA bother to keep an STR that had no variance?  It would be a dead STR, useless.

I also think you are pointing out the importance of having more STRs (more individual experiments) in the calculations. Any one STR might be aberrant but if we look at populations and use statistics to take advantage of the law of large numbers, we can still find value.

Quote from: ironroad41
The drunkards walk model, which underlies the variance/ASD model doesn't treat the case where the set of values obtained in the process are bounded.  This is an important point and I don't believe Nordtvedt considered it.  He needs increasing variance to make his model make sense.  
I agree that it is important not to measure hours with a calendar, which is what you'd be doing if you rely on STRs that hardly move.

I don't think you can say Ken Nordtvedt hasn't considered this though. I can't find the posts but I know he has run simulations with different sets of markers and concluded that the loss of precision from excluding many of the faster markers was greater than the risks run by so-called saturation.

Quote from: ironroad41
I believe that TMRCA estimates and diversity are underestimated.

It could be, but the primary controversy there is the mutation rates: germ-line versus evolutionary.  This has no impact on the relative positioning of one haplogroup to another according to their STR variance.

Quote from: ironroad41
I believe that mutations to the modal, either forward or backwards occur much more frequently than mutations greater than +/-1.

Do you have any studies or analysis that this is true?  
« Last Edit: April 23, 2012, 12:48:15 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
Guru
*****
Offline Offline

Posts: 2146


« Reply #105 on: April 23, 2012, 12:55:48 PM »

Mikewww says: “Do you have any studies or analysis that this is true?”

You asked the same to me. And the same is the answer.

Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #106 on: April 23, 2012, 12:56:27 PM »

Mikewww says: “Do you have any studies or analysis that this is true?”

You asked the same to me. And the same is the answer.
Do you speak for Ironroad?

BTW, I'm sorry but I lost track of your answer.  At least on this thread.  I think you generally believe STR mutations revolve around the modal, but I don't believe it just because you do.

I interpret your belief as advocating the trashing of Y STRs as far as usefulness for the molecular clock concept. There is a large scientific community and commercial testing community that are not in agreement with you.   Even legally, I think what you advocate could mean that paternity tests are useless because of convergence back to the identical.
« Last Edit: April 23, 2012, 01:01:33 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
« Reply #107 on: April 23, 2012, 01:00:36 PM »

I don't know Ironroad, but it seems to me that what he says is what I have been saying for many years, including to sacred monsters like Nordtvedt, Klyosov and many "professionals" (if we are the amateurs), and the numerous peer-reviewed papers I have demolished in recent years demonstrate this.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #108 on: April 23, 2012, 01:03:54 PM »

I don't know Ironroad, but it seems to me that what he says is what I have been saying for many years, including to sacred monsters like Nordtvedt, Klyosov and many "professionals" (if we are the amateurs), and the numerous peer-reviewed papers I have demolished in recent years demonstrate this.

Who can argue with a giant slayer like yourself?  Particularly one who can interpret others' thoughts and speak on their behalf.
Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
« Reply #109 on: April 23, 2012, 01:06:41 PM »

I cannot keep up with you if you pile answer upon answer. I am not fluent in English like you, but you should know that my principles are not only mutations around the modal, but also convergence to the modal as time passes, and that sometimes a mutation goes off on a tangent. Then there are the outliers, like that R-Z253, which falsifies all your calculations.
« Last Edit: April 23, 2012, 01:07:33 PM by Maliclavelli » Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #110 on: April 23, 2012, 01:10:58 PM »

I cannot keep up with you if you pile answer upon answer. I am not fluent in English like you, but you should know that my principles are not only mutations around the modal, but also convergence to the modal as time passes, and that sometimes a mutation goes off on a tangent. Then there are the outliers, like that R-Z253, which falsifies all your calculations.
It looks like between convergence and tangents you've got all the bases covered.  That's a good plan on your part.  I accept that you have different perspectives on STR mutation stuff. That's fine. You could be right.

However, then you lost me.  Can you be more specific on what Z253 falsifies?
« Last Edit: April 23, 2012, 01:12:24 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
« Reply #111 on: April 23, 2012, 01:16:41 PM »

From another thread:

Quote from: ironroad41 on Today at 08:21:46 AM
I looked at your age estimate for Z253 Mike, and I know it is consistent with your other work, but as I've noted before, I have a great deal of difficulty reconciling the haplotype I have z5hg3 (Ysearch) with your age estimates of this subclade.  I know I don't fit the mold, but I have been tested positive for this SNP.

Of course, if you are R-Z253 and your values don’t fit the clade Mikewww has identified for this subclade, it means that that clade was only one of the clades of R-Z253, and yours is evidence that Z253 is more ancient than Mikewww thinks. The subclades of R-L21 have a random order in the haplotree and we don’t know which is more ancient. Your values are the classic “outlier”, and the outliers bear witness to the mutations that a haplogroup has had, beyond the extinct lines and the clades mutated around the modal. Every subclade, like every haplogroup, has a modal which is a fiction, until outliers like yours demolish it.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

ironroad41
« Reply #112 on: April 23, 2012, 01:19:26 PM »

I briefly looked at the P312 data for 393.  What is needed is a distribution, by count, for each DYS locus.  My observation is that 95% of the entries will be modal +/-1.  That leaves room for the 5% multisteps observed.  This would entail counting the number of entries at each value and plotting them as the reference I cited did.  Note that many DYS loci appear tighter than 5%.
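
The counting described above could be sketched as follows. The function name and the sample values are hypothetical; the numbers are invented purely to show the mechanics and are not taken from the P312 files or the yfreq table.

```python
from collections import Counter


def allele_distribution(values):
    """Return (modal, counts by allele, fraction of entries within modal +/-1)."""
    counts = Counter(values)
    # Most frequent allele; ties are broken in favour of the smaller repeat count.
    modal = max(counts, key=lambda allele: (counts[allele], -allele))
    within_one = sum(n for allele, n in counts.items() if abs(allele - modal) <= 1)
    return modal, dict(sorted(counts.items())), within_one / len(values)


# Invented DYS393-like sample, for illustration only.
sample = [13] * 90 + [12] * 4 + [14] * 3 + [15] * 2 + [11] * 1
modal, counts, share = allele_distribution(sample)
print(modal, counts, share)
```

Running this per locus gives the table of counts by value, plus the single share figure that can be compared against the 95% claim.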
Re: mutation rates.  I have used Chandler and a more recent set of 110 published online (Burgarella).  Since Chandler crossed boundaries of Hgs, I am suspicious of his value for 388, say.  I think the Burgarella rates are certainly valid for many of the applications we look at, such as the Clan Gregor founder.  I'm inclined to think at present that Zhivotovsky's fudge factor may be due to hidden mutations?

One other observation is that for the 7 Hgs studied in the data set I referenced, there is no obvious change in modal value with time for a DYS locus.  A change appears from time to time, but it appears to be due to a multistep mutation, and then the drift over time for the Hg is back to a common modal.  This may be due to the chemical kinetics Klyosov talks about?  The point is that we don't see, as we evolve from Hg E to the R's, any significant change in modals or in the dynamic range around the modals.  I think someone observed once on rootsweb that his 439 had the same value as a chimpanzee.
Logged
JeanL
Old Hand
****
Offline Offline

Posts: 425


« Reply #113 on: April 23, 2012, 01:34:08 PM »

The only reason why most mutations observed in FTDNA datasets are within +/-1 of the presumed modal is that most people in the FTDNA projects share a TMRCA that is fairly recent, at least in the clan projects. I think a more important thing to look at is not whether 95% of values at DYS393 are within +/-1 of the modal, but how many are +1 and how many are -1; this would indicate whether or not a large number of back mutations could have occurred. I don't think mutations converge to a modal. In fact, the one thing I would need to confirm is that the presumed modal of a set is in fact the ancestral haplotype of the set. For all intents and purposes, though, a set where the modal of a given microsatellite is 12 and most mutations are 14 would have the same variance as a set where the modal is 13 and most mutations are either 12 or 14, when the modal is calculated under the assumption of minimization of mutations. Also, there is an observed direct relationship between the number of repeats and the mutation rate. For example, a locus with 16 repeats is more likely to mutate to 17 than the same locus with 13 repeats is to mutate to 14.
« Last Edit: April 23, 2012, 01:41:13 PM by JeanL » Logged
Maliclavelli
« Reply #114 on: April 23, 2012, 01:42:50 PM »

I have spoken of this in connection with the hg. G aDNA found in France: without those data we wouldn't have understood what the modal was 7000 years ago for some loci.
« Last Edit: April 23, 2012, 01:45:27 PM by Maliclavelli » Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Jdean
Old Hand
****
Offline Offline

Posts: 678


« Reply #115 on: April 23, 2012, 01:49:04 PM »

Thinking about this mutating around the modal idea, I think it sounds quite exciting.

Clearly there is more than one modal, lots in fact !!

Presumably these modals consist of values that people's DNA gravitates to ?

Now if we can work out what special properties certain values have at specific loci to create this effect then we could maybe predict other values that have this strange property and use them to discover other as yet unidentified modals !!!!

BTW does this mean that people with no real connection could end up with extremely similar values as their DNA converges on a random modal ?
Logged

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

Maliclavelli
« Reply #116 on: April 23, 2012, 01:51:16 PM »

But does anybody answer this post of mine?

This posting of mine, posted here and also published by Dienekes, is awaiting some response, above all from Anatole Klyosov:


An interesting haplotype of R1a1a (M17) has been found in the paper of Gunjan Sharma et al., Genetic Affinities of the Central Indian Tribal Population, PLoS one, February 2012:
DYS19=18
DYS385=14-17
DYS389=15-30
DYS390=28
DYS391=12
DYS392=14
DYS393=13
DYS437=17
DYS439=13
DYS448=22
DYS456=17
DYS458=17

At first sight it could seem that we have found the R-M420 not found so far in India, with its DYS392=14, which presupposes a 13, whereas all the other R1a1a haplotypes have 11, or 10 and 12 derived from 11; but this haplotype has been tested for M17, so it isn’t an R-M420. Also, the extremely large variance of the other markers makes us think that this value 14 derives from a modal 11 (or whatever the modal was at the origin of this subclade). So again, all the talk about “modal” and “variance”, as I have argued many times, is worth nothing.
But I think there is something to say about the TMRCA of 10.97+/-1.86 kya (25 years per generation), even though it was calculated with the Zhivotovsky rate. It is clear that these R1a1a-s belong to different clades and that the massive presence of the most commonly found clade falsifies the calculation. It is clear that this haplotype is an outlier, but for that very reason it is more interesting, because it testifies to all the mutations that went mostly off on a tangent and not around the modal. If we calculate the intraclade between two of these haplotypes, for instance between this one and this one closer to the modal: 15, 11-14, 14-32, 24, 10, 11, 12, 14, 10, 20, 15, 16, we have 32 mutations. Using the usual mutation rate of 0.0022, we have:
(454x32)/28=518
518x25=12,950
and I have used a generation of 25 years and not 32 as I use usually, and I haven’t considered other mutations around the modal.

Conclusions? The haplogroups are much, much more ancient than is usually thought.
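
For anyone who wants to check the arithmetic above, here is a small sketch. It assumes that the 454 in the post is 1/0.0022 rounded down and that the 28 is two lineages times 14 marker values; both readings are assumptions, and the function name is hypothetical.

```python
def pairwise_tmrca(mutations, n_markers, rate, years_per_generation):
    """Generations and years back to the common ancestor of two haplotypes.

    Both lineages accumulate mutations independently, hence the factor of 2.
    """
    generations = mutations / (2 * n_markers * rate)
    return generations, generations * years_per_generation


# Inputs as stated in the post: 32 mutations, rate 0.0022, 25 years per generation.
gens, years = pairwise_tmrca(32, 14, 0.0022, 25)
print(round(gens, 1), round(years))
```

Without the intermediate rounding this comes out a little above the 518 generations and 12,950 years given in the post, a difference of well under one percent.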
« Last Edit: April 23, 2012, 01:56:09 PM by Maliclavelli » Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #117 on: April 23, 2012, 01:54:52 PM »

I have spoken of this in connection with the hg. G aDNA found in France: without those data we wouldn't have understood what the modal was 7000 years ago for some loci.
I won't argue with you on what mutation rates are right. That is a black hole of a discussion on its own, and the mutation rates are what bring in time (years), so they are critical for TMRCA estimations.  However, this thread is about STR diversity, not necessarily mutation rates.

Does this have something to do with what you said about R-Z253 (a subclade of R-L21) falsifying something?

Marko Heinila estimated the TMRCA for G-M201 as 27k ybp based on just over 2200 haplotypes. He uses STR diversity in a different way than Nordtvedt in what he calls a "maximum likelihood" method.
« Last Edit: April 23, 2012, 02:04:36 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
« Reply #118 on: April 23, 2012, 02:05:02 PM »

We discussed this with regard to hg. J on eng.molgen with the great geneticist Roy King. I had the impression that Roy was frustrated in his desire to find that his J was Jewish and not European. Over the short time span of his haplogroup, the method of Heinila wasn’t able to decide, because it calculates the variance without taking into consideration mutations around the modal, etc. It is possible that for more ancient times (27 kya is a lot) it fits.
« Last Edit: April 23, 2012, 02:05:47 PM by Maliclavelli » Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #119 on: April 23, 2012, 02:10:57 PM »

But does anybody answer this post of mine?....
Conclusions? The haplogroups are much, much more ancient than is usually thought.
Do you have these two haplotypes in Ysearch with 67 STRs?  What are the terminal SNPs for each haplotype? Comparing haplotypes on a limited number of STRs is not really something you can expect much precision from.

Comparing any two individual haplotypes can produce unusual results.  I pretty much ignore FTDNA's tip calculator when looking at my matches.  I think this is part of the reason they felt that 111 STRs are useful, but if you are only using 10 or 15 for just two people, I don't know if it is worth your time chasing down.

I agree with you that some people have values at some STRs that are far off the modals, if that is what you mean by a tangent.   I think this underscores the importance of the law of large numbers and using statistical tools for populations, not necessarily individuals.
« Last Edit: April 23, 2012, 02:13:37 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Mike Walsh
« Reply #120 on: April 23, 2012, 02:26:49 PM »

We discussed this with regard to hg. J on eng.molgen with the great geneticist Roy King. I had the impression that Roy was frustrated in his desire to find that his J was Jewish and not European. Over the short time span of his haplogroup, the method of Heinila wasn’t able to decide, because it calculates the variance without taking into consideration mutations around the modal, etc. It is possible that for more ancient times (27 kya is a lot) it fits.

Heinila has J-M304's TMRCA estimated at 20k ybp and he has the interclade for J-M304's and I-M170's common ancestor as 25k ybp.   What's out of whack or what's falsified? These are quite old haplogroups, so I don't think you can expect much precision, and at these ages the linear duration of the STRs does become an issue (per Vince Vizachero).

Nevertheless, the IJ interclade TMRCA is 25k ybp while using the same method Heinila gets the R1a-SRY10831.2 and R1b-M343 interclade TMRCA as 15k ybp.  

Again these estimates are not precise to 1000 years, but that R1a1 / R1b interclade age of 15k is not that far different than Karafet's estimate for the R1 TMRCA of 18.5 k ybp. Karafet used a completely different method not using STRs, but by counting SNP branch lengths.  The SNP branch length "molecular clock" seems to align with the STR variance "molecular clock."   ...  an amazing coincidence.

R1 could clearly be older than 15k ybp or 18.5k ybp, and there could always be an aberrant STR value or two, but we have these estimates (based on large numbers of haplotypes) for the most recent common ancestor of R1a and R1b in support of each other.

BTW, using the same method and scale, Heinila has the interclade TMRCA for R1b-L21 and R1b-U152 as 4.2k ybp. The TMRCA for R1b-Z253 (a son of L21) shouldn't be older than that if at all.
« Last Edit: April 23, 2012, 07:23:48 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
« Reply #121 on: April 23, 2012, 02:27:05 PM »

JeanL, you make some good points.  The dataset I referred to was not limited to clans, however.  We don't know what really affects what the modal is, other than the type of STR (di, tri, etc.); this property appears to affect the mutation rate (see the mutation table I referred to, Burgarella et al.).  The other issue might be chemical kinetics and what that entails (I am no expert in that field).  I do believe there have been small changes in the modal over time, but not much.  My argument has been that most mutations are around the modal (regardless of the modal value).  Unlike the drunkard's walk model, which shows an expanding range of states with time, the STR mutational process seems to be confined to a narrow band, except when a multistep occurs.  I don't think this is due to the age of haplotypes but is more inherent to the process.  I don't think your last statement, about which DYS locus is most likely to mutate to a higher value, agrees with the data of Burgarella.  Mutation rates, as I mentioned, seem more defined by the type of STR, i.e., two, three, four or more G, C, A, Ts in the increment.
« Last Edit: April 23, 2012, 02:32:02 PM by ironroad41 » Logged
Mark Jost
Old Hand
****
Offline Offline

Posts: 707


« Reply #122 on: April 23, 2012, 05:35:38 PM »

I checked, and in my haplotype cluster there are 20 (xCDY) off-modals, with four mutations that are less than the modal value, which is about 22%, or 3 to 1 mutating upwards. In slow-to-fast order, the 8th-10th and the 16th STRs are downward dogs.

Order   1130A1   L21 Modal
1   531=>12   11
2   497=15   14
3   511=11   10
4   19=>15   14
5   385a=12   11
6   441=14   13
7   552=25   24
8   447=24   25
9   513=11   12
10   557=<15   16
11   446=14   13
12   464d=18   17
13   456=18   16
14   534=16   15
15   449=31   30
16   576=17   18
17   710=36   35
18   712=>21   20
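The up/down tally can be checked directly from the rows listed above. This is a small sketch: the values are copied from the table, the ">" and "<" markers next to some values are simply dropped (an assumption about their meaning), and the count covers only the 18 rows shown.

```python
# (locus, observed value, L21 modal), copied from the table above.
# The ">" / "<" markers on some values are ignored here (assumption).
rows = [
    ("531", 12, 11), ("497", 15, 14), ("511", 11, 10), ("19", 15, 14),
    ("385a", 12, 11), ("441", 14, 13), ("552", 25, 24), ("447", 24, 25),
    ("513", 11, 12), ("557", 15, 16), ("446", 14, 13), ("464d", 18, 17),
    ("456", 18, 16), ("534", 16, 15), ("449", 31, 30), ("576", 17, 18),
    ("710", 36, 35), ("712", 21, 20),
]

up = [locus for locus, value, modal in rows if value > modal]
down = [locus for locus, value, modal in rows if value < modal]
down_fraction = len(down) / len(rows)

print("above modal:", len(up))
print("below modal:", len(down), down)
print("fraction below modal: {:.0%}".format(down_fraction))
```

The loci below the modal come out as rows 8, 9, 10 and 16 of the table, matching the "downward dogs" named in the post.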
Logged

148326
Pos: Z245 L459 L21 DF13**
Neg: DF23 L513 L96 L144 Z255 Z253 DF21 DF41 (Z254 P66 P314.2 M37 M222  L563 L526 L226 L195 L193 L192.1 L159.2 L130 DF63 DF5 DF49)
WTYNeg: L555 L371 (L9/L10 L370 L302/L319.1 L554 L564 L577 P69 L626 L627 L643 L679)
Mike Walsh
Guru
*****

Posts: 2963


WWW
« Reply #123 on: April 24, 2012, 01:09:51 PM »

I'm just reposting this here to help keep other threads on topic.  I'm opening up the subject of this thread to include TMRCAs and mutation rates.  I don't necessarily take firm positions on TMRCAs and mutation rates, but rather trust what I understand from others on these things...  however, we can discuss them if you want.

I'll respond to Ironroad later.

As Mike has said, even the experts can't decide on the appropriate model.  I spent a lot of time on the Busby paper this winter and also pondered again the Zhivotovsky conundrum.
There are several mathematical methodologies that produce similar results so I do want to be clear that there are alternative methods that seem to support each other in TMRCA calculations.

What is not agreed upon are the mutation rates, although the leading hobbyist-scientists seem to come down pretty much on the side of the germ-line rates...  this would include Chandler, Nordtvedt, Heinila and Klyosov. I don't know if Vizachero and Dienekes are scientists, but they are also against using the evolutionary rates instead of the germ-line rates.
I am responding to your previous post and this one.  We are all Homo sapiens, but what does that mean?  We have many differences due to environment and evolution.  The same is true in this area of study.  If you have accessed the table I referred to, you can look at the distribution of 388 for hgs I1, J2, and R1b.  I am emphasizing this DYS locus because its behavior can significantly affect the TMRCA estimate.  For this locus Chandler gives a value of 0.00022 per generation and Burgarella 0.00046.  From the table I note that R1b had approximately 221 mutations out of 22129 entries.  We have no idea how many are unique and how many are inherited.  J2 had 265 out of 915 and I had 2508 out of 5700!  Additionally, the data spread is across 5 to 6 values for I and J2 and across essentially 3 values for R1b.  In no way can one rate support these data.  Additionally, the variance calculation will show a large contribution to TMRCA due to the very low mutation rate and the concomitant long time period expected between mutations at this locus.  No wonder I and J appear older in Ken's work.
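To put those DYS388 counts side by side, here is a quick arithmetic sketch using only the figures quoted above. The variable names are just for illustration, and "generations between mutations" is taken as the simple reciprocal of the per-generation rate.

```python
# Off-modal counts and total entries for DYS388, as quoted in the post.
counts = {"R1b": (221, 22129), "J2": (265, 915), "I": (2508, 5700)}

# Per-generation rates quoted for this locus.
rates = {"Chandler": 0.00022, "Burgarella": 0.00046}

# Share of entries that are off the modal, per haplogroup.
fractions = {hg: off / total for hg, (off, total) in counts.items()}

# Average number of generations between mutations in one lineage (1 / rate).
gens_between = {name: 1 / mu for name, mu in rates.items()}

for hg, frac in fractions.items():
    print("{}: {:.1%} off-modal".format(hg, frac))
for name, gens in gens_between.items():
    print("{}: about {:.0f} generations between mutations".format(name, gens))
```

The off-modal share differs by a factor of roughly 30 to 40 between R1b and the other two haplogroups, while the two quoted rates differ only by about a factor of two.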

I really believe you have to get into this level of detail to understand the Y STR mutation process and its current problems.  Most DYS loci that mutate within the modal +/- 1 generate no appreciable variance, and certainly no increase occurs with time as the drunkard's walk model suggests.  Most of the variance is generated by multisteps, especially steps greater than 2, and by the faster mutators such as CDYa,b.

My conclusion is that the Variance/ASD model does not represent the data properties.
« Last Edit: April 24, 2012, 01:23:49 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
Old Hand
****

Posts: 219


« Reply #124 on: April 24, 2012, 02:13:11 PM »

I have tried to reply to your questions as honestly as I can.  I presented my comments in this format, because I wanted to explore the assumptions built into the calculations presented for Variance/diversity.

It is critical that the assumptions represent what we observe in the data generated to date, whether it has been published as a study or is just a dataset.

The ASD/Variance model was developed by Goldstein et al. and modified by Nordtvedt and others.  At the time of development, not much data existed to verify the assumptions.

We will never have enough data, but sufficient data has been collected, in both germ-line and family data sets, to make observations about the "real" properties of the Y STR mutational process.

I do not believe your subject question can ever be rationally answered until these properties are understood and agreed upon and a model created using these properties.

I would like to emphasize one other aspect of the Goldstein derivation, in which he states that each DYS locus can be used to infer the TMRCA, but in practice several are used and averaged.  Note:  I do not believe this calculation can be made using Ken's approach, since he uses averages of mutation rates?  What this approach permits is an estimate of the SD of the computation.  First you compute the TMRCA for each DYS locus of interest.  Let x_a be the average of the per-locus TMRCAs x_i; then SD = sqrt( sum((x_i - x_a)^2) / (N - 1) ), where N is the number of loci.  This also explains why using STRs of similar rates yields higher confidence.
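The SD calculation described above can be sketched as follows. The assumptions are mine: each locus gives its own TMRCA as variance divided by mutation rate (the linear-growth form of the stepwise model), and the per-locus variances and rates below are hypothetical numbers chosen only to show the mechanics, not data from any project.

```python
import math

def tmrca_per_locus(variances, rates):
    """One TMRCA estimate (in generations) per locus: variance / rate.
    Assumes variance grows linearly with time at each locus."""
    return [v / mu for v, mu in zip(variances, rates)]

def mean_and_sd(estimates):
    """Average of the per-locus estimates and their sample SD (N - 1)."""
    n = len(estimates)
    mean = sum(estimates) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in estimates) / (n - 1))
    return mean, sd

# Hypothetical per-locus variances and per-generation mutation rates.
variances = [0.18, 0.30, 0.11]
rates = [0.002, 0.003, 0.001]

estimates = tmrca_per_locus(variances, rates)
mean, sd = mean_and_sd(estimates)
print("per-locus TMRCAs:", estimates)
print("mean:", mean, "SD:", sd)
```

Because each locus is divided by its own rate before averaging, a single very slow locus with an unusual variance can dominate the spread, which fits the point that loci of similar rates give a tighter SD.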
Logged