World Families Forums - STR Wars: Is diversity meaningful? more meaningful than Hg frequency?

Author Topic: STR Wars: Is diversity meaningful? more meaningful than Hg frequency?  (Read 16891 times)
Mike Walsh
« Reply #250 on: June 15, 2012, 05:27:03 PM »

Quote from: JeanL
1) I believe that even the 36 most linear STRs you used (those with linearity longer than 7000 ybp, per Marko H.'s calculations) still show a wide variability in mutation rates. In fact, only 24 STRs in the 111-marker set provided by Marko H. have mutation rates lower than 10^-3, which is what would make them slow STRs.

Marko appeared to have a sound methodology, used a lot of data, and covered all the markers we are using, so I chose to follow his approach. About a year ago I was doing the same STR variance calculations with different sets of markers chosen solely on mutation rate. Tim Janzen has done similar work and posted it on RootsWeb. He is on the R1b-U106 S21 Yahoo Group if you'd like to discuss his findings with him.

My anecdotal observation from using only slow markers, or only very slow markers, was that the relative variance still jumped around between haplogroups when it shouldn't. Since then, I have found that the 49 mixed-speed STRs from the first 67 generally give a consistent relationship of STR variance between haplogroups, as do the 36 Marko linear (for 7K) markers. I was pleased to see these cross-checked relationships hold consistent, so I've gone this route.

I have no statistical analysis of this, but I've read Ken Nordtvedt's explanation that if you use only slow markers you lose precision. Essentially, this would be like measuring minutes with a calendar. The sample size you need for slow markers alone to average out consistently is enormous; at least that's my interpretation of what I've read.
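
A rough way to put numbers on the calendar analogy. This is only a sketch, not Ken Nordtvedt's analysis: it assumes a star-shaped genealogy (independent lineages), a simple Poisson counting argument in which the relative error is about 1/sqrt(expected number of mutations), and made-up mutation rates, marker counts, and sample sizes.

```python
import math

def relative_error(n_haplotypes, generations, rates):
    """Rough relative error of an age estimate based on mutation counts.

    Assumes independent lineages and Poisson-distributed mutation counts,
    so the relative error is about 1 / sqrt(expected number of mutations).
    """
    expected_mutations = n_haplotypes * generations * sum(rates)
    return 1.0 / math.sqrt(expected_mutations)

# Hypothetical per-generation mutation rates, for illustration only.
slow_markers = [0.0005] * 20
mixed_markers = [0.0005] * 20 + [0.003] * 20 + [0.008] * 9  # 49 markers

GENERATIONS = 140  # assumed age of the group, in generations

for n in (50, 500, 5000):
    slow = relative_error(n, GENERATIONS, slow_markers)
    mixed = relative_error(n, GENERATIONS, mixed_markers)
    print(f"n={n:5d}  slow only: {slow:.1%}  mixed: {mixed:.1%}")
```

With these assumed rates the slow-only set would need roughly 14 times as many haplotypes to reach the same counting precision as the mixed set. The sketch ignores saturation and back-mutation on fast markers, which is the opposite concern raised elsewhere in this thread.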

Quote from: JeanL
2) The calculations were performed on the R1b-L21+ dataset from the FTDNA projects, which is heavily populated by folks of British descent, so if the TMRCA of L21 in Britain is indeed 4000 ybp, then both the most linear and the mixed sets of STRs ought to give you the same result.

I'm not sure of the exact set of calculations you referenced, but I also did similar comparisons between and within the U152, U106 and Z196 haplogroups. Yes, I agree with you there is a heavy bias towards Americans and folks of Isles descent, but I just counted and over 30 European countries are included in these P312 and U106 data sets.  

Quote from: JeanL
3) In a nutshell, you can argue that, based on the calculations on R1b-L21+, the difference between using 36 STRs that have a linearity of 7000+ ybp and a mixed bag of 49 appears to have little effect on variance for a set that is mostly populated by British guys.

I actually think it is more important to understand the relationships between clades and subclades first, regardless of geography. We know people within subclades are related whereas we can't say all L21 in Britain is closely related.  Some may actually be historical migrations from France or Scandinavia...  and we have migrations going the other directions as well.

Quote from: JeanL
4) Now, can you extrapolate those conclusions to, say, P312+ (i.e. DF27, U152, etc.) folks from elsewhere in Europe? I for one wouldn't do it.

Are you saying we shouldn't attempt to calculate the age for P312 or some subclade unless we have a scientifically sampled set of data from everywhere P312 lives?
« Last Edit: June 15, 2012, 05:27:58 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
JeanL
« Reply #251 on: June 15, 2012, 06:14:02 PM »

Quote from: Mike Walsh
My anecdotal observation from using only slow markers, or only very slow markers, was that the relative variance still jumped around between haplogroups when it shouldn't. Since then, I have found that the 49 mixed-speed STRs from the first 67 generally give a consistent relationship of STR variance between haplogroups, as do the 36 Marko linear (for 7K) markers. I was pleased to see these cross-checked relationships hold consistent, so I've gone this route.

But your anecdotal observations come mostly from a set of R1b-L21 that is heavily dominated by one ethnic group. Also, you should indeed get consistent results if you used the 49 mixed speed set of STRs vs. the 36 most linear STRs, mainly because the fast mutating STRs would dominate in the variance calculation for either case, which explains why you get differences when you use the relative variance of slow markers.
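
The point about fast markers dominating the variance can be illustrated with a little arithmetic. This is a minimal sketch with hypothetical rates and marker counts (not Marko Heinila's values), assuming the linear regime in which a marker's expected variance is roughly its mutation rate times the number of generations.

```python
# Hypothetical per-generation mutation rates; not Marko Heinila's values.
rates = {
    "slow": [0.0005] * 12,
    "medium": [0.002] * 16,
    "fast": [0.008] * 8,
}

GENERATIONS = 140  # assumed age in generations; the shares do not depend on it

# In the linear regime, expected variance per marker is about rate * generations.
expected = {
    name: sum(rate * GENERATIONS for rate in group)
    for name, group in rates.items()
}
total = sum(expected.values())

for name, value in expected.items():
    share = value / total
    print(f"{name:6s} {len(rates[name]):2d} markers  share of summed variance: {share:.0%}")
```

In this made-up set the 8 fast markers contribute more than half of the summed variance even though they are fewer than a quarter of the 36 markers, so two marker sets that share the same fast markers would be expected to track each other closely.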

Quote from: Mike Walsh
I have no statistical analysis of this, but I've read Ken Nordtvedt's explanation that if you use only slow markers you lose precision. Essentially, this would be like measuring minutes with a calendar. The sample size you need for slow markers alone to average out consistently is enormous; at least that's my interpretation of what I've read.

How do you know that you aren't measuring miles with a 12-inch ruler instead? Ken Nordtvedt was probably referring to datasets with a recent (i.e. less than 1500 ybp) known TMRCA. Unless it has been tested, how can you assume that it is safe to use fast markers on all R1b-L21 folks? Do we know that their time frame falls within the range that fast markers can appropriately measure?

Quote from: Mike Walsh
I'm not sure of the exact set of calculations you referenced, but I also did similar comparisons between and within the U152, U106 and Z196 haplogroups. Yes, I agree with you there is a heavy bias towards Americans and folks of Isles descent, but I just counted and over 30 European countries are included in these P312 and U106 data sets.

The only relative variance comparison of 49 mixed vs. 36 linear markers you have shown was for R1b-L21+ subclades, but if you have done it for U152 and Z196, go ahead and share the results; it would be interesting to see them. Yes, there are over 30 European countries, but that doesn't change the fact that the majority of haplotypes come from folks of British Isles descent. Unless each of the other European ethnic groups has a considerable sample size, their presence would merely act as outliers, which would probably not affect the outcome by much given the total sample size. In fact, let's put some numbers to the question. I went ahead and downloaded your Haplotype_Data_L21_all Excel file, and this is what I got:

Total Haplotypes: 6119

England: 588
Ireland: 1987
Scotland: 1142
Wales: 187
Total British Isles: 3904 (63.80% of the total sample)

Unknown origin: 1909

Now let's look at other Europeans:

Denmark: 6
France: 84
Germany: 56
Italy: 11
Netherlands: 8
Norway: 28
Poland: 6
Portugal: 7
Russia: 7
Spain: 45
Sweden: 12
Switzerland: 6
All non-British Islands Europeans combined: 306

What effects do you think the 306 haplotypes from other European countries have against the 3904 haplotypes from the British Islands?
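
The counts above can be cross-checked with a few lines. This sketch uses only the figures quoted in the post; the variable names are mine.

```python
total = 6119
isles = {"England": 588, "Ireland": 1987, "Scotland": 1142, "Wales": 187}
unknown = 1909
other_listed = {
    "Denmark": 6, "France": 84, "Germany": 56, "Italy": 11,
    "Netherlands": 8, "Norway": 28, "Poland": 6, "Portugal": 7,
    "Russia": 7, "Spain": 45, "Sweden": 12, "Switzerland": 6,
}
other_total = 306  # combined figure as quoted in the post

isles_total = sum(isles.values())
listed_sum = sum(other_listed.values())

# The three groups should account for every haplotype in the file.
print("accounts for all rows:", isles_total + unknown + other_total == total)

print(f"Isles share:   {isles_total / total:.2%}")
print(f"Unknown share: {unknown / total:.2%}")
print(f"Other Europe:  {other_total / total:.2%}")
print("listed countries sum to", listed_sum, "of", other_total)
print(f"Isles to other-Europe ratio: {isles_total / other_total:.1f} to 1")
```

The three groups do add up to 6119. The twelve itemised countries sum to 276 rather than 306, which suggests the list leaves out countries with only a handful of entries.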



Quote from: Mike Walsh
I actually think it is more important to understand the relationships between clades and subclades first, regardless of geography. We know people within subclades are related whereas we can't say all L21 in Britain is closely related.  Some may actually be historical migrations from France or Scandinavia...  and we have migrations going the other directions as well.

Yes, you are right some L21 could be of historical migrations from France or Scandinavia, but a good percentage of it isn’t.

Quote from: Mike Walsh
Are you saying we shouldn't attempt to calculate the age for P312 or some subclade unless we have a scientifically sampled set of data from everywhere P312 lives?

I'm saying that if you haven't tried to calculate the age of P312 using different STR sets (i.e. 20 slow markers vs. 36 most linear markers vs. 49 mixed markers) on a truly representative sample of P312, there is no way of telling whether the age estimates you are getting for P312 from the current dataset, which is heavily dominated by L21 folks, are correct.
« Last Edit: June 15, 2012, 06:15:44 PM by JeanL » Logged
Mike Walsh
« Reply #252 on: June 29, 2012, 11:29:34 PM »

It is important to understand the difference between a true Most Recent Common Ancestor and a Coalescence age calculation. I'm just cataloging this conversation here. John Chandler is a scientist from IT and Anatole Klyosov is a scientist in the biochemistry field.

Quote from: Rootsweb post by John Chandler
From: john.chandler@alum.mit.edu (John Chandler)
Subject: Re: [DNA] Calculation of TMRCA
Date: Fri, 29 Jun 2012 13:32:41 -0400

Anatole wrote:

> Welcome to DNA genealogy. The question which you have addressed is not simple, and even a scientist of
> such a great caliber as John Chandler came to a wrong estimate.

That's a remarkable conclusion. The fact is that I carefully refrained from expressing an estimate in this case because any solitary estimate would be more misleading than helpful. The passage from me that you quoted is illustrative:

> >...if the other two differ by just one step at each of the three
> >discrepant markers, the 95% confidence interval for
> >their TMRCA would extend to about 45 generations. ...

Observe that I emphasized the breadth of the probability distribution of TMRCA estimate to the exclusion of the estimate itself. It would improve communication if you actually *read* the posts you respond to. The bottom line for the calculation is that any TMRCA up to 45 generations is statistically plausible. If the surname in question were less distinctive, it would be foolish to assume that the MRCA even had a surname at all.

> In that situation the most applicable is the "permutation method",
> which is practically not known among folks in the field, though I
> have published it first in 2009,

Another way to improve communication is to use the same terminology as everyone else. The thing you are calling the "permutation method" has been widely used since time immemorial and is known as the variance method. The thing it estimates is called the coalescence age, which is not the same as a TMRCA, except in the case of a sample of just two haplotypes. With three or more, as in this case, the coalescence age is biased on the low side, unless you take the variance with respect to the (unknowable) ancestral haplotype instead of the mean haplotype. In your illustration, needless to say, the variance is taken with respect to the mean haplotype.

Quote from: Rootsweb post by Anatole Klyosov
My response:
Dear John,

I have noticed that you were thinking for exactly a week before to respond. Frankly, I thought that you have realized that to give such an answer as you quoted yourself above is wrong indeed. It is of no use. You could have said that the 99% confidence interval would extend to about 150 generations. Or that 99.9% confidence interval would extend to about 1500 generations (or whatever, I did not waste time for those calculations). Who cares how it would extend at a certain confidence interval? Either you provide an estimate when you are asked, or honestly say that you do not know. Do not give elusive answers, nobody forces you to answer in the first place.

As you probably have noticed, I did not give only the answer, I explained HOW to calculate. You did not bother to do so.  Maybe because I am a teacher, and my duty is to explain things, and you are not. Well, it is your business, of course,  to explain or not.  However, it seems that you missed the main point of my comment.    
 
>The thing you are calling the "permutation method" has been widely used since time immemorial...

Well, maybe. Nothing is new under the moon. However, you did not use it and you did not come up with a specific answer to the question which was addressed. As (almost) always, you came up with a negative comment, and with nothing else. As I have informed you earlier, I do not buy negative comments if they do not give a direct answer to the question addressed .  

Regards,
Anatole Klyosov
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2012-06/1341014863
« Last Edit: June 29, 2012, 11:35:07 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
JeanL
« Reply #253 on: June 30, 2012, 12:16:00 AM »

Quote from: Rootsweb post by John Chandler response to A.Klyosov

…The thing it estimates is called the coalescence age, which is not the same as a TMRCA, except in the case of a sample of just two haplotypes. With three or more, as in this case, the coalescence age is biased on the low side, unless you take the variance with respect to the (unknowable) ancestral haplotype instead of the mean haplotype. In your illustration, needless to say, the variance is taken with respect to the mean haplotype.

Very true, the use of mean haplotypes does tend to underestimate the true variance of a dataset, but that is not the only thing that can do so; other biological factors could too. In fact, treating the mean haplotype as the ancestral one assumes that the TMRCA falls within a timeframe shorter than the fixation time for most of the STR markers used; in this case, however, that is a complete wild guess.
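
The low bias of the coalescence age can be shown with a three-haplotype toy example. This is a sketch under assumed numbers, not John Chandler's calculation: it assumes a symmetric one-step mutation model, a hypothetical rate, and a tree in which haplotypes B and C share part of their descent from the founder.

```python
from itertools import combinations

RATE = 0.003   # hypothetical mutation rate per marker per generation
TMRCA = 140    # assumed generations back to the founder of all three haplotypes
SPLIT = 100    # assumed generations since haplotypes B and C diverged

# Generations since each pair last shared an ancestor.
pair_split = {("A", "B"): TMRCA, ("A", "C"): TMRCA, ("B", "C"): SPLIT}

def age_from_mean(pair_split, rate):
    """Expected age when variance is taken about the mean haplotype."""
    names = sorted({name for pair in pair_split for name in pair})
    n = len(names)
    # Under a symmetric one-step model, E[(x_i - x_j)^2] = 2 * rate * T_ij.
    expected_sq_diff = sum(
        2 * rate * pair_split[pair] for pair in combinations(names, 2)
    )
    # Sample variance (n - 1 divisor) = sum of squared pair differences / (n * (n - 1)).
    expected_variance = expected_sq_diff / (n * (n - 1))
    return expected_variance / rate

def age_from_ancestral(tmrca, rate):
    """Expected age when variance is taken about the founder's haplotype."""
    # Each haplotype's expected squared distance from the founder is rate * tmrca.
    return (rate * tmrca) / rate

print("about the mean haplotype:     ", round(age_from_mean(pair_split, RATE), 1))
print("about the ancestral haplotype:", round(age_from_ancestral(TMRCA, RATE), 1))
print("two haplotypes only:          ", round(age_from_mean({("A", "B"): TMRCA}, RATE), 1))
```

In this toy tree the estimate about the mean falls short of the assumed TMRCA because B and C share 40 generations of common descent, while the two-haplotype case recovers the TMRCA, in line with the quoted remark.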

Now let’s analyze English Professor Anatole Klyosov and his wide usage of logical fallacies.(Actually I take it back, he barely used any fallacies :( )

Quote from: Rootsweb post by Anatole Klyosov
[…]
As you probably have noticed, I did not give only the answer, I explained HOW to calculate. You did not bother to do so.  Maybe because I am a teacher, and my duty is to explain things, and you are not. Well, it is your business, of course,  to explain or not.  However, it seems that you missed the main point of my comment. 

How this is relevant to the discussion, one might not know, but a good ole' ad hominem is never bad when arguing, so one shouldn't be surprised if Klyosov brings the "I have a PhD / I am a teacher / I'm an expert and you are not" argument.
 

« Last Edit: June 30, 2012, 11:41:40 AM by JeanL » Logged
Dubhthach
« Reply #254 on: June 30, 2012, 04:00:40 AM »



Quote from: JeanL
What effects do you think the 306 haplotypes from other European countries have against the 3904 haplotypes from the British Islands?


I only see 1917 samples from "British Islands" in that list tbh.
« Last Edit: June 30, 2012, 04:01:55 AM by Dubhthach » Logged
JeanL
« Reply #255 on: June 30, 2012, 11:49:46 AM »


Quote from: Dubhthach
I only see 1917 samples from "British Islands" in that list tbh.


Well, 3904 have their most distant known ancestor coming from the "British Islands". If one looks at column D, where it says "Old World Country", one would see that Wales, Scotland, Ireland, and England add up to 3904.
Logged
Dubhthach
« Reply #256 on: June 30, 2012, 01:54:00 PM »


Quote from: Dubhthach
I only see 1917 samples from "British Islands" in that list tbh.

Quote from: JeanL
Well, 3904 have their most distant known ancestor coming from the "British Islands". If one looks at column D, where it says "Old World Country", one would see that Wales, Scotland, Ireland, and England add up to 3904.


The problem with your logic is that Ireland isn't a "British island".
Logged
JeanL
« Reply #257 on: June 30, 2012, 02:10:07 PM »


Quote from: Dubhthach
The problem with your logic is Ireland isn't a "British island"

Well, I'm sorry then, how would you refer to the Ireland+UK combo then? I actually would gladly change my terminology if you tell me a more appropriate one.
Logged
alan trowel hands.
« Reply #258 on: June 30, 2012, 02:30:05 PM »

I don't see the fuss about the term British Isles. British Isles is a collective term of great age that is far older than Britain as a political entity. Ireland was one of the 'islands of the Pretani' (Cruithne in Gaelic), or at least it's proven that they were one of the elements in the prehistoric Irish population, so the name has a historical basis. Each to their own, but I feel the problem with alternative collective terms is that they sound contrived or have no historical basis. I think the problem is always going to be there because some people do not want a collective term anyway, so simply "Britain and Ireland" or "Ireland and Britain" is probably the safest way to go to avoid treading on sensitive toes.
Logged
ironroad41
« Reply #259 on: June 30, 2012, 04:36:55 PM »

Some comments on your and Mike's conversations:

1. I forwarded Mike a copy of a Science article by David Goldstein published in March 2001. In that article he derived the variance equation and stated that the calculation of TMRCA was performed for each DYS locus and then averaged. If the variance equation is used this way, and not by averaging the mutation rates of the DYS loci, then the sum of squares for each DYS locus can be calculated, and therefore the SD as well. It's a little cumbersome, but it is defined in the literature. Re: this discussion, estimates have higher SDs when the range of mutation rates used is greater.

2. Re: the use of slower mutators, such as was done by D. Janszen (sp) on RootsWeb, I still don't believe that a formula approach replaces a lot of work to understand the set of entries you are working with. Counting only unique mutational events, not inherited ones, is very important. On another thread, I have gone through in detail how I approached R-Z253 to estimate the TMRCA of this group. Even though there are 3 entries with 388 = 13, this represents only one mutation; similarly for 426, there are 7 entries but only one mutation. Unless this kind of care is used, the accuracy of the current coalescence ages of different haplogroups is highly questionable.

Finally, for the faster mutators it is impossible to know the time history of what has occurred, and I don't believe it is modelled by the random walk model. All I generally observe is a mutation pattern centered around the modal value. Note, however, that this pattern is broken when multistep mutations occur.
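
The per-locus approach in point 1 can be sketched as follows. This is a simplified illustration, not Goldstein's derivation: each locus gets its own estimate (sample variance divided by that locus's rate), the estimates are averaged, and their spread is reported as an SD. The repeat counts, locus names, and mutation rates are all hypothetical.

```python
import statistics

# Hypothetical repeat counts: one list per locus, one entry per haplotype.
repeats = {
    "locus_slow": [12, 12, 12, 13, 12, 12],
    "locus_medium": [14, 15, 14, 13, 14, 15],
    "locus_fast": [17, 19, 18, 16, 18, 20],
}
# Hypothetical mutation rates per generation for the same loci.
rates = {"locus_slow": 0.0005, "locus_medium": 0.002, "locus_fast": 0.006}

# One age estimate (in generations) per locus: variance / rate.
per_locus = {
    locus: statistics.variance(values) / rates[locus]
    for locus, values in repeats.items()
}
estimates = list(per_locus.values())
mean_age = statistics.mean(estimates)
sd_age = statistics.stdev(estimates)

# The alternative criticised in the post: pool the variance and divide by
# the summed (equivalently, averaged) rate across loci.
pooled_age = sum(statistics.variance(v) for v in repeats.values()) / sum(rates.values())

for locus, age in per_locus.items():
    print(f"{locus:13s} {age:7.1f} generations")
print(f"per-locus mean: {mean_age:.1f}  SD across loci: {sd_age:.1f}")
print(f"pooled estimate: {pooled_age:.1f}")
```

The per-locus route gives a spread across loci for free, which the pooled route hides. Note that the sketch takes the variance about the sample mean and does not attempt the de-duplication of inherited mutations described in point 2.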

Finally, re Klyosov, I think it is still an open issue whether he really has something different and correct to offer. Intrinsically, for general coalescence/TMRCA estimates, I believe that the use of slower mutators is necessary to make a rational estimate; that just makes good sense to me. Whether his permutation approach based on "chemical kinetics" is valid is TBD in my opinion. Final note: that doesn't make him a nice person to work with, however.
« Last Edit: June 30, 2012, 04:40:08 PM by ironroad41 » Logged
Mike Walsh
« Reply #260 on: July 06, 2012, 03:46:46 PM »

Are there any scientific papers out there that use or show different Y DNA STR mutation rates for different haplogroups?

I copied this over from another thread just to catalog it here. I interpret Hans' analysis as inferring that STR mutations and SNP mutations are not related cause and effect wise.  If so, I don't see evidence that STR mutations should gravitate around or back to an ancestral (possibly modal) value specific to an SNP defined haplogroup.

There has been an intensive discussion about the quality of TMRCA estimates. It has been examined through theoretical considerations and checked against historical information more or less backed up by empirical research. Carbon and isotopic dating of artefacts associated with past populations gave us timelines to hold against these TMRCA calculations.

In genetic genealogy we use SNPs as indicators of speciation events, while the diversity of the STR-based haplotypes is seen as anagenetic development within monophyletic clusters.

By excluding the multi and fast markers in the STR based haplotypes an effort is made to replace the SNP based synapomorphic characters with sets of STR markers and in doing so we come to a TMRCA estimate.

The mechanism of the variation in the VNTR is (mostly) DNA replication slippage while SNP is the consequence of a fixed mutation caused by external influences such as metabolic stress and/or ionising radiation. STR events are more or less internal; SNP novelties have external causes.

Radiation pressure causing ionising effects is reflected in the Delta 14C data as published by the International Carbon Calibration curves (http://www.radiocarbon.org/IntCal09.htm).

I constructed two graphs. In the first, the SNP count (irrespective of haplogroup) is set against the estimated TMRCA (data supplied by Marko Heinila).
https://dl.dropbox.com/u/74936451/SNP%20count%20%204-5.3.pdf
In the second graph the Delta 14C is plotted against the calibrated age (ybp).
https://dl.dropbox.com/u/74936451/delta%2014c%20versus%20ybp%204000-5200.pdf

I used the period from 4k till 5.3K for these graphs.
If there appears to be interest in this approach, I can make available the graphs for the whole period between the present and 20K years ago.

Hans
http://www.worldfamilies.net/forum/index.php?topic=10752.msg133905
Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
« Reply #261 on: July 07, 2012, 01:49:07 PM »

I haven't studied Hans' work, but I also believe that STR mutations and SNP mutations are not related cause-and-effect wise. What I have observed is that the STR mutational process doesn't follow a drunkard's walk model. In that model the drunk departs from the lamppost by a square-root law, i.e. after 25 steps he is typically about 5 steps from the lamppost. I replotted the table of Y-STR frequencies from RootsWeb and I do not see any consistent drift. A few of the faster mutators have a wider range, but in general the modal is constant across SNPs. This doesn't necessarily imply independence, but it suggests the modal value(s) is a preferred state for the DYS loci. In other words, the drunk migrates around the modal, not away from it.
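
The square-root law, and the contrast with a walker that is pulled back toward the lamppost, can be checked exactly without simulation. This is a sketch only: the `pull` parameter is a hypothetical restoring bias invented for the illustration and is not a claim about how real STRs behave.

```python
import math

def rms_after(steps, pull=0.0):
    """Exact root-mean-square distance from the start after `steps` moves.

    pull = 0 is the plain drunkard's walk (+1 or -1 with equal chance).
    pull > 0 is a hypothetical bias back toward the start: at distance d
    the walker steps toward 0 with probability min(1, 0.5 + pull * d).
    """
    dist = {0: 1.0}  # position -> probability
    for _ in range(steps):
        nxt = {}
        for pos, p in dist.items():
            if pos == 0:
                p_up = 0.5
            else:
                toward = min(1.0, 0.5 + pull * abs(pos))
                p_up = 1.0 - toward if pos > 0 else toward
            nxt[pos + 1] = nxt.get(pos + 1, 0.0) + p * p_up
            nxt[pos - 1] = nxt.get(pos - 1, 0.0) + p * (1.0 - p_up)
        dist = nxt
    return math.sqrt(sum(p * pos * pos for pos, p in dist.items()))

for n in (25, 100, 400):
    print(f"steps={n:3d}  plain walk: {rms_after(n):5.2f}  "
          f"with pull: {rms_after(n, pull=0.1):5.2f}")
```

For the plain walk the RMS distance is exactly the square root of the step count, so 25 steps give 5; that is an average over many walks, not the distance of any single one. With the assumed pull the RMS distance stops growing, which is the "migrates around the modal" picture. Telling the two apart from data depends on whether the observed lineages have been around long enough for the difference to show.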

There's a lot we don't understand about STR mutations. What is their real purpose? They don't appear to influence genes. Is their sole purpose to act as a clock? Then why do they have such a dynamic range? Food for thought.
« Last Edit: July 07, 2012, 01:50:38 PM by ironroad41 » Logged
Mike Walsh
« Reply #262 on: July 08, 2012, 03:24:55 AM »

Perhaps what you are seeing is not a drunkard walking 25 steps and ending up only 5 steps away, but a family growing over time, sometimes one step, sometimes stationary, sometimes a step the other direction. Given the period of time (# generations and transportation available) the family reached only about 5 steps from its ancestor.  Of course major parts of the family may have disappeared (gone extinct) so the branching out is the same in all directions.

Why does everything have to have a sole purpose that we understand? The sun always arose in the east, even if we did not understand why.  Still it was a good event to measure time with - the start of a day. The ancients knew this before we knew what the sun really was.
« Last Edit: July 08, 2012, 03:26:08 AM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
« Reply #263 on: July 08, 2012, 07:35:38 AM »

I'll repeat my assertion: across the haplogroups E1a to R1b (7 different hgs), there is no apparent drift in the modal value. As someone noted on RootsWeb once, he had the same value at 439 as a chimpanzee, and we parted ways millions of years ago.

I agree we don't have to understand everything in order to survive. That's not the point. Dienekes just published the results of 3+ studies debunking the out-of-Africa-60K-years-ago model. I believe we also have quite a few myths describing the properties and evolution of STRs that were postulated before sufficient data existed.

The current philosophy is that the process is random (whatever that means); that the 200:1 dynamic range of mutation rates doesn't matter (use the average of all DYS loci under consideration when making variance estimates); that the Gaussian model (called the great intellectual fraud by Taleb) describes the distributions we see; etc. We can survive with these myths, but how will they help us increase our understanding of the mutational process?
Logged
Mike Walsh
« Reply #264 on: July 09, 2012, 12:50:25 AM »

I'll repeat my assertion.  Over the haplogroups E1a to R1b (7 different hgs), there is no apparent drift in the modal value. As someone noted on rootsweb once, he had the same value at 439 as a chimpanzee, and we parted ways millions of years ago....
Are you saying E1a's modal is about the same as the Western Atlantic modal?
Are you talking about just DYS439 out of the 111 we can see from FTDNA haplotypes?
« Last Edit: July 09, 2012, 12:51:11 AM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #265 on: July 09, 2012, 07:01:57 AM »

No, I'm saying that the modal values don't change significantly across haplotypes. There are differences between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 dys loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?
« Last Edit: July 09, 2012, 07:02:53 AM by ironroad41 » Logged
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #266 on: July 10, 2012, 03:33:33 AM »

No, I'm saying that the modal values don't change significantly across haplotypes. There are differences between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 dys loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?

I am not saying that mutations are completely and ultimately random. I think that most probably are practically random. By that I mean we can only observe somewhat random patterns. We don't have any other piece of data that can be used in a cause-effect predictive manner.

We do know that most, and I mean almost all, Y DNA lineages have gone extinct.  Out of the millions, we have only relatively few that survive.

Given the few surviving branches of the human Y DNA (paternal lineage) tree and the practically random nature of most STR mutations, I don't see anything extremely unusual or profound about a large old tree with some big branches, then smaller and smaller branches that sometimes cross.  The tree may be lopsided with some branching (facing the sun perhaps) that have become bushy, full of twigs, compared to other parts of the tree.

If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.
Logged

R1b-L21>L513(DF1)>L705.2
Maliclavelli
Guru
*****
Offline Offline

Posts: 2146


« Reply #267 on: July 10, 2012, 06:09:37 AM »

If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.
But this is very meaningful instead, i.e. it demonstrates that some markers have had many mutations around the modal, whereas others have had mutations for the tangent, what I have always said and what falsifies all your theories.
I am confident that aDNA will demonstrate all my theories.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #268 on: July 10, 2012, 07:05:45 AM »

No, I'm saying that the modal values don't change significantly across haplotypes. There are differences between the two (E1a and WAMH) due to randomness.  I've mailed you a cc of the table I created which shows the modals for the first 67 dys loci and the seven hgs I mentioned.  Maybe you can infer more out of it than I did?

I am not saying that mutations are completely and ultimately random. I think that most probably are practically random. By that I mean we can only observe somewhat random patterns. We don't have any other piece of data that can be used in a cause-effect predictive manner.

We do know that most, and I mean almost all, Y DNA lineages have gone extinct.  Out of the millions, we have only relatively few that survive.

Given the few surviving branches of the human Y DNA (paternal lineage) tree and the practically random nature of most STR mutations, I don't see anything extremely unusual or profound about a large old tree with some big branches, then smaller and smaller branches that sometimes cross.  The tree may be lopsided with some branching (facing the sun perhaps) that have become bushy, full of twigs, compared to other parts of the tree.

If a branch of E1a and R1b cross, or in genetic terms, converge on some of the STRs, I don't see why that is meaningful. Fortunately we have the SNPs as branch markers to help us sort out the crossing branches.

I think the reality is that many of our trees have run out of branches/branchpoints and are dead ends.  I gave an example of the problem I had with my problem in the states.  I have finally found two similar haplotypes in R - Z253 but one has only 67 dys loci measured. How can I use my 632 = 10 in a comparison of two entries?  It contributes thousands of years of mutational time.  And further, after I converge us two, I still have two other very dissimilar modal haplotypes to converge.  It is very clear to me that R - Z253 is much older than 2k years.  I haven't figured out a good way to estimate its age though.  In addition to the 632 we have 426 and 388 and 393 and others to compare.  These are some of the slower mutators.  The only way I can see is to use the approach Nordtvedt advocates and essentially average out the mutational rate differences?  I'm not convinced that is correct.

I don't really know for sure that Y STR dys loci were designed to be a clock of male heredity.  However, if they were, then there is/was a purpose to the design, and having a range of 200 or so in mutational rate has some meaning?

In summary, I think we have at least 3 kinds of time estimates: a. TMRCA's b. Coalescence and c. Dead-ends.  The problem appears to be that we can't sort them out?
« Last Edit: July 10, 2012, 02:01:32 PM by ironroad41 » Logged
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #269 on: July 11, 2012, 02:21:25 AM »

I think the reality is that many of our trees have run out of branches/branchpoints and are dead ends.

Yes, absolutely, there are many extinct paternal lineages. This has been going on for a long time.

I gave an example of the problem I had with my problem in the states.  I have finally found two similar haplotypes in R - Z253 but one has only 67 dys loci measured. How can I use my 632 = 10 in a comparison of two entries?  It contributes thousands of years of mutational time.  

On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

 It is very clear to me that R - Z253 is much older than 2k years.

I agree. Who is saying that Z253 is 2K ybp? The initial TMRCA interclade estimates (with Nordtvedt's method) I did had Z253 back close to the time of L21, or I guess DF12 we should say.

I haven't figured out a good way to estimate its age though.  In addition to the 632 we have 426 and 388 and 393 and others to compare.  These are some of the slower mutators.  The only way I can see is to use the approach Nordtvedt advocates and essentially average out the mutational rate differences?  I'm not convinced that is correct.

Agreed, but I think that is the best we can do. We can throw out STRs that are clearly aberrations, like in the case where there are null DYS425s, but that is somewhat arbitrary. How do we know what's an aberration versus a signal of old age? This is where the light in my head comes on related to Nordtvedt's statements that more STRs means more experiments, and more experiments means better chances at finding the true patterns.

I don't really know for sure that Y STR dys loci were designed to be a clock of male heredity.  However, if they were, then there is/was a purpose to the design, and having a range of 200 or so in mutational rate has some meaning?

I'm pretty sure our designer didn't really have STR mutations so that we could use them as a pseudo clock. That's our contrivance - Emile Zuckerkandl and Linus Pauling. http://en.wikipedia.org/wiki/Molecular_clock

In summary, I think we have at least 3 kinds of time estimates: a. TMRCA's b. Coalescence and c. Dead-ends.  The problem appears to be that we can't sort them out?

Why do you say that? There are different calculations and I think we include intraclade and interclade as variations of TMRCA estimates. As far as "dead-ends", they are accounted for. The variance models have formulas for both cases - the entire population is known, or for a partial population. The partial population assumes there are missing lineages, which would include the dead-ends.
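If the two cases mentioned here are the usual population and sample variance formulas (an assumption, since the exact variance model isn't spelled out in this post), the difference is just the divisor: N when every lineage is known, N - 1 when only a sample is. A small Python sketch with made-up repeat counts at a single STR, using the standard library's `statistics` module:

```python
import statistics

# Hypothetical repeat counts at one STR for eight haplotypes (illustrative only).
alleles = [12, 12, 13, 12, 11, 12, 13, 12]

# Whole population known: divide the sum of squared deviations by N.
pop_var = statistics.pvariance(alleles)

# Partial sample of a larger population: divide by N - 1.
sample_var = statistics.variance(alleles)

print(pop_var, sample_var)
```

The sample figure is always the larger of the two, and the gap shrinks as the number of haplotypes grows.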
« Last Edit: July 11, 2012, 02:29:10 AM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
JeanL
Old Hand
****
Offline Offline

Posts: 425


« Reply #270 on: July 11, 2012, 08:17:04 AM »

On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than in a fast marker. So in a sense, using a single slow marker is a risk, because there is still a probability that it could have had a multi-step mutation, or that it mutated recently; in any case, any TMRCA calculated with a single STR would have very wide confidence intervals. Now if instead something like 8-10 slow markers were used, the likelihood of all of them having recent mutations, or multi-steps, etc., can be considered as zero compared to any set of STRs that includes a mixed set of slow and fast markers. Fast markers do not wash out aberrations; they are only good for measuring certain time frames. What is known for sure is that the effective time frame for fast markers is far shorter than for slow markers. There are still a lot of arguments as to the actual value of the time frames, but adding any marker to a calculation that is well outside the effective time frame of the marker doesn’t help refine the precision of the TMRCA calculation; if anything it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.
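To put a rough number on the 8-10 slow marker point, here is a back-of-envelope sketch in Python. It assumes the markers mutate independently, takes 10^-3 per generation (the slow-marker cutoff mentioned earlier in the thread) as the rate, and treats "recent" as the last 10 generations. The function name and the 10-generation window are illustrative assumptions, not established values.

```python
def prob_all_mutated(mu, generations, n_markers):
    """Chance that each of n_markers independent STRs mutates at least once."""
    # Probability that a single marker mutates at least once in the window.
    p_single = 1.0 - (1.0 - mu) ** generations
    # Independence assumption: multiply the single-marker probabilities.
    return p_single ** n_markers

mu_slow = 1e-3   # assumed per-generation rate, upper bound for a "slow" STR
recent = 10      # assumed meaning of "recent": the last 10 generations

print(prob_all_mutated(mu_slow, recent, 1))   # one slow marker: about 1 in 100
print(prob_all_mutated(mu_slow, recent, 8))   # all eight: vanishingly small
```

Under those assumptions a single slow marker has roughly a 1% chance of a recent mutation, which is why one marker alone is a risk, while all eight mutating recently is effectively impossible.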
Logged
ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #271 on: July 11, 2012, 08:18:52 AM »

Well written, Mike.  Good food for thought.  A couple of points.  Re: age of Z253, I was referring to your R-L11 "Big Picture" Timeline which shows R-Z253 as most probably 400 AD or so?

I'm still not sure about "washing out aberrations" produced by slow mutators, although in fairness, the Ian Cam have a mutation at 426 which, if included, adds several hundred years of mutational time to the estimate even with some 40+ entries.  Since we can't be precisely sure when the 11 to 10 mutation occurred, it might be right or wrong.

Re: molecular clock.  Most of the early work was with autosomal, not STR's.  That said, it's not clear why they should be different except for the fact that they appear "extra" in some sense and not related to our genetic picture as commonly thought of.  You have provided some fine references I haven't read before, so I should probably withhold any further comments at this time.  Edit:  weren't the STR's originally called "junk" DNA?

I'm still not sure that interclade handles the apparent major gap in R1b between P312 and prior SNP's. Look at Marko's estimates of TMRCA for R1b.

In sum; you've made some good points here and referenced some good sources (indirectly through wikipedia).  Professor Allan from NZ looks especially interesting. I'm still not sure that simply averaging out STR mutations is the right way to go ahead, but I can't disprove it either.
« Last Edit: July 11, 2012, 09:12:58 AM by ironroad41 » Logged
ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #272 on: July 11, 2012, 09:10:43 AM »

On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than in a fast marker. So in a sense, using a single slow marker is a risk, because there is still a probability that it could have had a multi-step mutation, or that it mutated recently; in any case, any TMRCA calculated with a single STR would have very wide confidence intervals. Now if instead something like 8-10 slow markers were used, the likelihood of all of them having recent mutations, or multi-steps, etc., can be considered as zero compared to any set of STRs that includes a mixed set of slow and fast markers. Fast markers do not wash out aberrations; they are only good for measuring certain time frames. What is known for sure is that the effective time frame for fast markers is far shorter than for slow markers. There are still a lot of arguments as to the actual value of the time frames, but adding any marker to a calculation that is well outside the effective time frame of the marker doesn’t help refine the precision of the TMRCA calculation; if anything it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.

You're right on, Jean.  In fact the earliest formulation of the TMRCA calculation included the comment that you compute the TMRCA dys locus by dys locus and then average.  The SD of the estimate is the square root of the sum of squared deviations from that average divided by N - 1.  This clearly shows that including a slow mutator with fast mutators will increase the SD.
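The locus-by-locus recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each single-locus TMRCA is taken as variance divided by mutation rate (a simple variance-based estimator, not necessarily the original formulation), and all variances and rates below are invented so the arithmetic is easy to follow. Whether a slow locus really inflates the SD depends on how far its own estimate sits from the others.

```python
import math

def per_locus_tmrca(variances, rates):
    """Single-locus TMRCA estimates in generations: variance / mutation rate."""
    return [v / mu for v, mu in zip(variances, rates)]

def mean_and_sd(estimates):
    """Average of the per-locus estimates and their sample SD (divisor N - 1)."""
    n = len(estimates)
    mean = sum(estimates) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in estimates) / (n - 1))
    return mean, sd

# Three hypothetical fast loci with the same assumed rate.
fast = per_locus_tmrca([0.40, 0.36, 0.44], [0.004, 0.004, 0.004])

# The same three plus one hypothetical slow locus whose estimate is an outlier.
mixed = fast + per_locus_tmrca([0.06], [0.0002])

print(mean_and_sd(fast))    # about 100 generations, SD about 10
print(mean_and_sd(mixed))   # about 150 generations, SD about 100
```

With these made-up numbers the single slow locus pulls the average up by half and raises the SD roughly tenfold.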
Logged
ironroad41
Old Hand
****
Offline Offline

Posts: 219


« Reply #273 on: July 11, 2012, 12:30:29 PM »

 quote: On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations. end of quote.

This is one para of your comments that does bother me.  Yet it may be fully correct to say that we have to average out the range of mutational rates. However, as you know, several times I have referred to Taleb's book on the Black Swan.  It deals with trying to understand the occurrence of rare events and their impact.  It's a little pompous, and reading through some of the negative reviews at Amazon.com is instructive for trying to put his ideas in perspective.

For those who are investors, he likens a black swan to what happened to Long Term Capital Management Corp., a hedge fund that was playing in the derivatives market.  Two of the founders were economists who got the Nobel Prize for their work applying economic theory to the market.  They basically applied the Gaussian model to their investment model and got hit by a black swan.  They went belly up in the late 90's.

I believe the range of mutational rates can cause events similar to a black swan in the mutational process and distort or significantly affect our estimates of time.  I don't have any answer to this issue yet. It may be as you say it is an aberration and has to be averaged out.  I'm just not sure?
Logged
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #274 on: July 11, 2012, 01:18:16 PM »


On average, it would contribute a great deal of time to a TMRCA estimate, but in any individual case (lineage) there might be an STR, no matter how slow, that mutated in successive generations, or had a multi-step jump. Estimates are only very useful for large groups of people... lots of data for the average rates to wash out the aberrations.

The slow markers are by far more stable than any of the fast markers, so the likelihood of a multi-step mutation occurring in a slow marker is far less than in a fast marker. So in a sense, using a single slow marker is a risk, because there is still a probability that it could have had a multi-step mutation, or that it mutated recently; in any case, any TMRCA calculated with a single STR would have very wide confidence intervals.

Yes, slow markers are more stable. That is essentially by definition.

Yes, calculating with a single STR has got to have very wide confidence intervals. I would guess that would make such an estimate not very useful.


Now if instead something like 8-10 slow markers were used, the likelihood of all of them having recent mutations, or multi-steps, etc., can be considered as zero compared to any set of STRs that includes a mixed set of slow and fast markers. Fast markers do not wash out aberrations; they are only good for measuring certain time frames. What is known for sure is that the effective time frame for fast markers is far shorter than for slow markers. There are still a lot of arguments as to the actual value of the time frames, but adding any marker to a calculation that is well outside the effective time frame of the marker doesn’t help refine the precision of the TMRCA calculation; if anything it would contribute greatly to the unaccounted error, and thus likely undermine the TMRCA.

By "washing out" I just mean that the more data you have, the less aberrations generally impact the final estimates.

Yes, I agree that we should deselect STRs that do not have a linear relationship with time for the duration in question.  The problem is figuring out which to deselect. It isn't necessarily the fast ones. There is some work suggesting that problems occur at STRs with higher allele values. The only deselections that I have a rationale for are multi-copy ones, null cases, and then those from the research that Marko Heinilla did.
Logged

R1b-L21>L513(DF1)>L705.2