World Families Forums - some basic statistics

Topic: some basic statistics (Read 2664 times)
Richard Rocca
Old Hand

Posts: 523


« Reply #25 on: June 01, 2012, 08:33:14 AM »

Well, it appears to be argumentative? Here is a post by Didier supporting Klyosov's work:
From: didier.vernade@Safe-mail.net
Subject: Re: [DNA] "Out of Africa" and R1b new papers published
Date: Thu, 31 May 2012 19:02:54 -0400

I read the two papers and here are my thoughts.

Reminder:
 1 - Re-Examining the "Out of Africa" Theory and the Origin of Europeoids (Caucasoids) in Light of DNA Genealogy
 http://www.scirp.org/journal/PaperInformation.aspx?paperID=19566
 
2 - Ancient History of the Arbins, Bearers of Haplogroup R1b, from Central Asia to Europe, 16,000 to 1500 Years before Present
 http://www.scirp.org/journal/PaperInformation.aspx?paperID=19567
 
Paper 1 is a very interesting paper. Figure 3, in my opinion, is a breakthrough, and probably the major breakthrough brought by A. Klyosov. I do have criticisms, but I first want to make sure that my later criticisms will not be taken as dismissing this paper.
My main "difficulty" with this paper is the search for an alternative geographical origin for "Adam" (or whatever name you give to this MRCA). It would be best not to propose any alternative, and only to point out that the shape of the tree in Figure 3 and the timing suggest that Alpha and Beta had different geographical locations. I also do NOT see the point of the (too) long discussion of the SNPs. I understand that many people still have the ancient M91 origin in mind, but it adds nothing to the paper. Last, there might be some discussion of how a base haplotype was "found" from the 4 A haplogroup clusters, which are acknowledged to be very different; a difference of 1 or 2 values might affect the timing, but I admit that it wouldn't change very much.
 
Paper 2 is very difficult to read, probably because A. Klyosov had to update a story presented many times and wanted to fit new data from several different sources, often a few haplotypes here and there, into the global picture.
Let me get to the point. I never accepted as established the migration of R1b-M269 by a North African route, and I really think that the data presented do NOT support this view. First, I don't think the map in Figure 10 is fair. As everyone knows, there are plenty of R-L23 and R-L51 in eastern Europe and in the Balkans, and this map more or less minimizes this fact. I understand that this is unintended, but the result is disturbing, as it favors the North African hypothesis. Why? Well, I would first like to remind people on this list that R-L23 and R-L51 were the clades reported early on (with the RFLP p49a,f assay) as "ht35". Several groups looked for "ht35" (as opposed to "ht15" for the western type) and located it in eastern Europe and the Balkans, and to some extent in the Middle East. The picture has changed but, roughly, it's clear that L51 (a SNP known to be upstream of L11) is rare in western Europe compared to eastern Europe. So (to go straight to the point) if the route through North Africa was taken by R1b-M269 (plus some R-L23 that can no longer be found), the geographical distribution of R-L51 is hard to explain.

I am not going to pull an alternative explanation out of my hat. Let's say that I have posted that R-M269 came up to Italy but was stopped there, it seems, and possibly switched there from moving overland to sailing. From Italy it's possible to reach North Africa near the Iberian coast. The difference is that the group reaching Iberia was probably a mix including R-M269, R-L51 and possibly R-L11.

Here are my two cents on this question. I thought the basic premise of the haplogroup subdivisions was a series of SNPs showing descent. If a person doesn't have an SNP, what does that mean? I would assume that it means you are not part of that lineage, but Mayka says otherwise. I'm not sure what is correct at this time.

P.S. The second paper is also reviewed and commented on, in a manner probably not appreciated by this board. However, it follows if paper one is correct: Out of Africa doesn't make sense if we are not descendants of haplogroups A and B. So the question then is where and when did M269 originate: Asia or Europe?

While I don't disagree with Didier's critique of Klyosov's fiction, L51* is not more common in Eastern Europe than in Western Europe. It has very low frequency in Western Europe and is non-existent in Eastern Europe.
« Last Edit: June 01, 2012, 08:33:54 AM by Richard Rocca »

Paternal: R1b-U152+L2*
Maternal: H
ironroad41
Old Hand

Posts: 219


« Reply #26 on: June 01, 2012, 01:17:48 PM »

I am not a fan of Klyosov's either. I think his approach to TMRCA calculations is questionable. Based on the comments of Didier and Mayka, both of whom I think usually make sense, I'm a little confused as to what the haplogroup organization represents. Further, which SNPs should each person have, and what history do they represent? Is there a description of what the ISOGG organization table means and how to interpret it?
ironroad41
Old Hand

Posts: 219


« Reply #27 on: June 03, 2012, 02:26:05 PM »

Sorry, I don't appear to be as good at games as razyn and yourself. I don't think Gioiello's rules are broken. What is important in answering your "subtle" question is also to ask what the mutation rate is of the DYS loci you are referring to.

Being off-modal at a fast mutator carries very little information. Being off-modal at a very slow mutator does. You can blithely use the word random and say it occurred by normal chance, or you can find a smaller subset with an odd set of values at the slower mutators. What I think Gioiello is getting at is that many of the slower mutators only show modal +/- 1 values. If they have had two mutations over the time of the haplotype, these can cancel out and no change is observed (a hidden mutation). This reduces the apparent diversity, which is your measure of time.
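To put a number on the hidden-mutation point, here is a minimal sketch assuming a symmetric single-step model (each mutation adds or removes exactly one repeat with equal probability, a simplification). After n mutations the locus shows no net change only when the ups and downs balance, so under this assumption two mutations are hidden half the time.

```python
from math import comb


def prob_no_net_change(n_mutations: int) -> float:
    """Probability that n independent +/-1 mutations at one locus cancel out,
    leaving the allele at its starting (modal) value.

    Assumes a symmetric single-step model: each mutation is +1 or -1 repeat
    with probability 1/2, independently of the others.
    """
    if n_mutations % 2:
        # An odd number of +/-1 steps can never sum to zero.
        return 0.0
    half = n_mutations // 2
    # Exactly half of the steps must go up and half down.
    return comb(n_mutations, half) / 2 ** n_mutations


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} mutations -> P(no visible change) = {prob_no_net_change(n):.4f}")
```

Multi-step mutations, which are also observed, would change these figures, so treat them as a lower-complexity illustration only.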

My point is: what is your diversity trying to show? Age? No way. Relative age, possibly, but until you understand the rules for mutations, you have to be careful.

I now better understand why Marko H. dropped off these forums. I think I'll join him. Good luck in your "studies"!!!

I'm not referring to one locus; that would be pointless.

Case in point:

In the Z18 project, the person closest to the extended WAMH modal at 111 loci is almost half as far from that modal (in GD) as the person furthest from it.

The closest is Z18+, Z14+, Z372- whilst the furthest is Z18+, Z14+, Z372+, L257+

Claiming that one of these has a younger or older haplotype is meaningless; in fact, they are just as rare as each other. What they are, however, is opposite ends of a normal statistical distribution, and most people are somewhere in the middle, as you should expect.

BTW, the GDs were measured using the hybrid mutation model, but almost all of the values were only one step from the modal value anyway.
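For readers unfamiliar with the two ways of counting genetic distance (GD), here is a minimal sketch comparing a pure stepwise count (sum of repeat differences) with an infinite-alleles count (number of markers that differ at all). The hybrid model mentioned in the post mixes the two by marker type; its exact per-marker rules are not reproduced here, and the marker values below are hypothetical. When every difference from the modal is a single step, the two counts agree, which is why the model choice matters little in the case described.

```python
def gd_stepwise(haplotype: dict[str, int], modal: dict[str, int]) -> int:
    """Genetic distance counting every repeat of difference (stepwise model)."""
    return sum(abs(haplotype[m] - modal[m]) for m in modal if m in haplotype)


def gd_infinite_alleles(haplotype: dict[str, int], modal: dict[str, int]) -> int:
    """Genetic distance counting each differing marker once (infinite-alleles model)."""
    return sum(1 for m in modal if m in haplotype and haplotype[m] != modal[m])


# Hypothetical modal and haplotypes, for illustration only.
modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
one_step_each = {"DYS393": 13, "DYS390": 23, "DYS19": 15, "DYS391": 11}
with_two_step = {"DYS393": 13, "DYS390": 22, "DYS19": 15, "DYS391": 11}

print(gd_stepwise(one_step_each, modal), gd_infinite_alleles(one_step_each, modal))  # 2 2
print(gd_stepwise(with_two_step, modal), gd_infinite_alleles(with_two_step, modal))  # 3 2
```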

I think if you did the math, you would find that the distribution of mutations vs. DYS loci could approximate a normal distribution. But you are not looking at that; you are looking at a set of entries that are mostly closely correlated and then, in most data sets, a group of outliers with 2, 3, 4 or even more rare mutations. That isn't a normal distribution in the sense I think of it. First, I have very little confidence in the mutation rate estimates for the slower DYS loci. The data sets from which counts are made are too highly correlated, and nowhere near enough father/son data has been collected to generate meaningful information about slow mutators.

I think the mutational process is better described by Nassim Nicholas Taleb in his book "The Black Swan", where a combination of rare mutations is more of a black swan than the tails of a normal distribution. (From Taleb's frontispiece: "A Black Swan is a highly improbable event with three principal characteristics: It is unpredictable; it carries a massive impact; and, after the fact, we concoct an explanation that makes it appear less random, and more predictable, than it was.")

Another issue with the rates of slow mutators is allele-value dependence. Take DYS388 and look at haplogroup I1a. Its value is 14/15, and the apparent number of mutations is much higher for that haplogroup than for, say, R1b with the same number of entries. That is one reason I think the variance estimates for haplogroup I are too high (Chandler's rates are dominated by R1b data).

I'm still a staunch believer that the data suggest not all mutations have equal weight (the GD hypothesis), and that the mutational process is not described by a simple random sequence, such as one obtained from flipping a coin.

re: Taleb:
Nassim Nicholas Taleb (Arabic: نسيم نيقولا نجيب طالب‎, alternatively Nessim or Nissim, born 1960) is a Lebanese American essayist whose work focuses on problems of randomness and probability.[3] His 2007 book The Black Swan was described in a review by Sunday Times as one of the twelve most influential books since World War II.[4]
 
He is a bestselling author,[5][6][7] and has been a professor at several universities, currently at Polytechnic Institute of New York University and Oxford University.[8][9] He has also been a practitioner of mathematical finance,[10] a hedge fund manager,[11][12][13] a Wall Street trader,[14][15][16] and is currently a scientific adviser at Universa Investments and the International Monetary Fund.[17][18]
 
He criticized the risk management methods used by the finance industry and warned about financial crises, subsequently making a fortune out of the late-2000s financial crisis.[19][20] He advocates what he calls a "black swan robust" society, meaning a society that can withstand difficult-to-predict events.[11] He proposes "antifragility" in systems, that is, an ability to benefit and grow from random events, errors, and volatility [21][22] as well as "stochastic tinkering" as a method of scientific discovery, by which he means experimentation and fact-collecting instead of top-down directed research.[23]
P.S. In chapter 12, on the bell curve, Taleb describes two conditions for mild randomness in a coin-flipping experiment: 1. The flips are independent. 2. There are no "wild" jumps, only a single step size. But we have multi-step mutations, with jumps of up to five steps observed in a single mutation. This process will not produce the bell curve. Welcome to the world of Mandelbrotian, scale-invariant randomness.
« Last Edit: June 03, 2012, 05:04:06 PM by ironroad41 »
Jdean
Old Hand

Posts: 678


« Reply #28 on: June 05, 2012, 06:30:06 AM »

At least you no longer seem to be saying the process isn't random :)

Clearly we aren't looking at a simple system here, and I'm sure it's possible to refine the modals to improve estimates, but I don't think it will make that much difference, and I definitely don't think we will get L11 back to the sort of events you frequently mention, such as great floods and Doggerland.

Y-DNA R-DF49*
MtDNA J1c2e
Kit No. 117897
Ysearch 3BMC9

ironroad41
Old Hand

Posts: 219


« Reply #29 on: June 05, 2012, 07:16:59 AM »

I'm rereading Taleb's book to see what else comes out of the Black Swan theory. I'm still exploring the concepts of probability here. In physics there is the problem of describing the behavior of an electron. The probability analysis splits according to whether an electron is treated as "an indistinguishable ball" or not. Depending on which you pick, you get two models: Bose-Einstein statistics and Fermi-Dirac. Two models for the same process. I'm opining here that there is a difference between "equally likely" inputs and inputs that are not equally likely. It seems to me there may be room for another model as well.
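To make the "equally likely" point concrete: the standard statistics differ in what they count as one equally likely arrangement. A minimal sketch counting the ways to place k particles in n single-particle states under the three textbook rules (distinguishable particles, Maxwell-Boltzmann; indistinguishable particles that may share a state, Bose-Einstein; indistinguishable particles with at most one per state, Fermi-Dirac):

```python
from math import comb


def arrangements(k: int, n: int) -> dict[str, int]:
    """Number of ways to put k particles into n states under each counting rule."""
    return {
        # Distinguishable particles: each particle independently picks a state.
        "Maxwell-Boltzmann": n ** k,
        # Indistinguishable, any number per state: multisets of size k from n states.
        "Bose-Einstein": comb(n + k - 1, k),
        # Indistinguishable, at most one per state: subsets of size k from n states.
        "Fermi-Dirac": comb(n, k),
    }


print(arrangements(2, 2))
# {'Maxwell-Boltzmann': 4, 'Bose-Einstein': 3, 'Fermi-Dirac': 1}
```

With two particles and two states, the chance that both land in the same state is 2/4 under Maxwell-Boltzmann counting, 2/3 under Bose-Einstein and 0 under Fermi-Dirac: the same physical setup, different answers, depending only on which arrangements are treated as equally likely.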

I also agree with Taleb that what we should be doing now is collecting data, not creating theories. Experiments are being performed to collect data, and those need to be better understood. Your argument is that variance is good enough to describe the process, and I don't agree with that. But I am analyzing data and other models to find out which are the most important modeling issues.

Re: the red flag, the Great Flood and the flooding of Doggerland: I believe the data suggest overwhelmingly that this was a real event. Who suffered the most is TBD, be it E, G, I, J or R. In the final analysis it's not what you or I or anyone else thinks; what the data tell us is what's important. In this case the scientific community doesn't have a consensus, and therefore I can't see how we can have one at this point in time. JMHO.
« Last Edit: June 05, 2012, 07:19:42 AM by ironroad41 »
Jdean
Old Hand

Posts: 678


« Reply #30 on: June 05, 2012, 08:17:39 AM »

We have enough data to compare P312 with U106, and though the confidence intervals are going to be quite wide, it's at least enough to give us some reasonable ideas.

Hopefully, as more people test for the newer SNPs, we will get a better idea of what happened in the downstream groups. At the moment we are probably comparing apples with oranges, but the big blind spot is Eastern Europe.

I thought I'd put your results into Robert Brooks' SNP predictor to illustrate how difficult it is to distinguish between groups within L21, let alone P312, but blow me, it came back with a 95% result for L226, which isn't bad really. Hats off to Robert!!
« Last Edit: June 05, 2012, 09:22:22 AM by Jdean »

ironroad41
Old Hand

Posts: 219


« Reply #31 on: June 05, 2012, 09:46:00 AM »

As you know, I am Z253+, L226-. I guess he is in the right ballpark. Re: P312 and U106, Mike and others have done some nice work on relative variance. But since variance depends on the statistics and properties of the data set, I'm not sure how much it is really telling us.

As long as Zhivotovsky isn't disproved, we have no absolute time scale based on mutations. I am pretty sure that current approaches and mutation data can give us good solutions for the TMRCA of the Ian Cam or Kerchner families. But once you start talking about numbers much greater than 2K BP, I have my doubts. I have mentioned all my concerns ad nauseam.
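As an illustration of why the absolute time scale hinges on the rate: in the simplest variance-based estimate (assuming a symmetric single-step model and a star-shaped genealogy), the average per-locus variance grows roughly as rate times generations, so the estimated age is inversely proportional to whatever per-locus rate is plugged in. The haplotypes below are hypothetical; 0.002 is the per-locus figure quoted later in this thread, and the slower rate is simply a third of it, chosen only for comparison.

```python
from statistics import pvariance


def mean_locus_variance(haplotypes: list[dict[str, int]]) -> float:
    """Average over loci of the population variance of repeat counts."""
    loci = list(haplotypes[0])
    return sum(pvariance([h[m] for h in haplotypes]) for m in loci) / len(loci)


def tmrca_generations(haplotypes: list[dict[str, int]], rate_per_locus: float) -> float:
    """Variance-based age estimate in generations: under a symmetric single-step
    model and a star-shaped genealogy, expected per-locus variance is about
    rate * generations."""
    return mean_locus_variance(haplotypes) / rate_per_locus


# Hypothetical haplotypes (two loci, four men), for illustration only.
haps = [
    {"DYS393": 13, "DYS390": 24},
    {"DYS393": 14, "DYS390": 24},
    {"DYS393": 13, "DYS390": 23},
    {"DYS393": 13, "DYS390": 24},
]
fast = tmrca_generations(haps, 0.002)      # about 94 generations
slow = tmrca_generations(haps, 0.002 / 3)  # about 281 generations
print(fast, slow, slow / fast)
```

Both estimates come from the same data; only the assumed rate differs, which is why the choice of rate, rather than the haplotypes, dominates the absolute dates.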

At this point I am simply arguing that we continue to collect data and worry less about the theory of the process. Pretty much everyone on this board has a fair feel for where others sit, and the argumentation is useless at this time. Again, JMHO.

P.S. Busby et al. will probably release a new paper within the next year, and it'll all be back to the drawing board.
Jdean
Old Hand

Posts: 678


« Reply #32 on: June 05, 2012, 10:08:31 AM »

People in the DNA community who haven't chucked Zhivotovsky into the bin are probably rarer than L743 :)

razyn
Old Hand

Posts: 406


« Reply #33 on: June 05, 2012, 10:14:22 AM »

I don't have any expertise in statistics, basic or otherwise, nor am I interested in learning the field in my dotage. But I suspect that parsimony analysis (which has been mentioned on other threads here, e.g. by Marko Heinila and Hans van Vliet) should be part of the deliberations of those of you who feel a need to deliberate on this. And in that vein, I think this professor at Memorial University of Newfoundland has given a good introduction to what it's all about:

http://www.mun.ca/biology/scarr/2900_Parsimony_Analysis.htm

The genetic question addressed by his practical example is even relevant to some of the post-LGM occupants of the former Doggerland (seals).
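A minimal sketch of one standard parsimony method, Fitch's small-parsimony algorithm, which counts the fewest state changes needed to explain one character on a given rooted binary tree (the linked page may present the idea differently). The sample names, tree shapes and STR values are hypothetical. Note that Fitch treats states as unordered, so a change from 13 to 15 costs the same as 13 to 14; that is a simplification for STRs.

```python
from typing import Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch small parsimony for one character.

    Returns the candidate ancestral states at this node and the minimum
    number of state changes needed within the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No common state: keep the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


values = {"a": 13, "b": 13, "c": 14, "d": 14}
print(fitch((("a", "b"), ("c", "d")), values)[1])  # 1 change
print(fitch((("a", "c"), ("b", "d")), values)[1])  # 2 changes
```

The tree that groups matching values together needs fewer changes, so parsimony prefers it; comparing scores across candidate trees is the core of the method.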
« Last Edit: June 05, 2012, 10:20:54 AM by razyn »

R1b Z196*
ironroad41
Old Hand

Posts: 219


« Reply #34 on: June 05, 2012, 10:17:58 AM »

Some folks, like myself, like to believe that learning is a lifelong process. I'm well into my dotage too. Re: Zhivotovsky, show me the proof.
Jdean
Old Hand

Posts: 678


« Reply #35 on: June 05, 2012, 10:24:03 AM »

As you said before, this could go on forever.

You seem to like asking for proofs; try applying them to I2 or, for that matter, A0.

razyn
Old Hand

Posts: 406


« Reply #36 on: June 05, 2012, 10:32:00 AM »

I agree about the lifelong process thing, but it's one of the few things about which I agree with you.  When one has a limited number of days left on earth, some forms of learning are a great waste of those days.  I also have no urge to advance my limited knowledge of household plumbing, dentistry and several other fields in which it's less painful to hire an expert.

With regard to Zhivotovsky -- or more specifically his fudge factor, that makes his dates for certain key mutations about 2-3 times older than the current consensus among those who did not study at his feet -- not being a statistician, I could not possibly prove his error (if there be one) to someone whose mind is already made up to prefer the old dates to the said consensus.

What's left of my mind prefers the consensus.  Carry on, I'll just watch.

ironroad41
Old Hand

Posts: 219


« Reply #37 on: June 10, 2012, 11:07:02 AM »

According to the pundits, the probability of a mutation at meiosis is .002. That is beyond the 3-sigma point of the bell curve, which is .0024. In Taleb's language, a mutation is itself a black swan: a fairly rare event. Given that a mutation has occurred, the probability that it is at one of the slower mutators is about .01 of that. Everything in the mutational process is outside the 3-sigma point of the Gaussian, and what goes on out there is what we're trying to describe.

It just seems odd to me that a random process has a 99.8% chance of no mutation. I'll bet on that!
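For reference when comparing these figures, the tail areas of a standard normal distribution beyond 3 sigma can be computed directly with the complementary error function. This sketch uses only the Python standard library; it gives roughly 0.00135 for one tail and 0.0027 for both tails combined, which can be set against the 0.002 per-meiosis figure quoted above.

```python
from math import erfc, sqrt


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2.0))


one_sided = normal_upper_tail(3.0)   # area beyond +3 sigma only
two_sided = 2.0 * one_sided          # area beyond +/-3 sigma
p_mutation = 0.002                   # per-meiosis figure quoted in the post
print(f"P(Z > 3)       = {one_sided:.5f}")
print(f"P(|Z| > 3)     = {two_sided:.5f}")
print(f"P(no mutation) = {1 - p_mutation:.3f}")
```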
« Last Edit: June 10, 2012, 11:15:28 AM by ironroad41 »
Mark Jost
Old Hand

Posts: 707


« Reply #38 on: June 10, 2012, 11:46:26 AM »

I posted this yesterday on the Yahoo L21 group, showing Marko's mutation summary table, if you're interested.

The second and third columns from the left are the cumulative mutation rate as a percentage and the number of transmissions that happen before a mutation occurs. I calculated what that mutation rate would be as the rates accumulate in FTDNA panel order, and Marko H provided me with this summary of his mutation-rate totals to confirm mine. Columns 5 and 6 are the number of years between mutations at 25- and 30-year generations. The two rightmost columns show the rate for each panel on its own and the number of transmissions that would occur between mutations at that rate. (My interest is in using specific panel results in a TMRCA calculation instead of an overall rate at a specific number of STRs.)

                     
MH's sum of mutation rates for 111 markers: 0.290653%

Markers  CumRates%  Trans./mutation  Per STR (BE)  Yrs @ 25-yr gen  Yrs @ 30-yr gen  Panel rate%  Panel trans./mutation
12       0.024239   4,126            343.8         8,596            10,315           0.024239     4,126
25       0.060527   1,653            66.1          1,653            1,984            0.036288     2,756
37       0.132304   756              20.4          511              613              0.071777     1,394
67       0.172844   579              8.6           216              259              0.040540     2,467
111      0.290653   345              3.1           78               93               0.117809     849

Default  0.2        500

Marko Heinila's rates
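The derived columns follow mechanically from the cumulative rates. A sketch reproducing them, assuming (as the printed figures imply) that transmissions per mutation is 100 divided by the percentage rate, rounded up; that the per-STR column divides that by the number of markers; that the year columns multiply the per-STR figure by the generation length; and that the panel columns use each panel's increment over the previous one. The function names are illustrative only.

```python
import math

# Cumulative summed mutation rates by panel size, as printed in the table
# (labelled there as percentages).
CUM_RATES_PCT = {12: 0.024239, 25: 0.060527, 37: 0.132304, 67: 0.172844, 111: 0.290653}


def transmissions_per_mutation(rate_pct: float) -> int:
    """100 / rate, rounded up, which reproduces the table's figures."""
    return math.ceil(100.0 / rate_pct)


def table_rows(cum_rates=CUM_RATES_PCT, generation_years=(25, 30)):
    """Recompute each row: markers, transmissions, per-STR figure, years for
    each generation length, panel-only rate and panel-only transmissions."""
    rows = []
    previous = 0.0
    for markers in sorted(cum_rates):
        cum = cum_rates[markers]
        trans = transmissions_per_mutation(cum)
        per_str = trans / markers
        years = tuple(round(per_str * g) for g in generation_years)
        panel_rate = cum - previous  # rate contributed by this panel alone
        rows.append((markers, trans, round(per_str, 1), years,
                     round(panel_rate, 6), transmissions_per_mutation(panel_rate)))
        previous = cum
    return rows


for row in table_rows():
    print(row)
```

Rounding up rather than to the nearest integer is what makes the 25-, 37- and 111-marker rows match (for example, 100 / 0.290653 is about 344.05, printed as 345).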


MJost

148326
Pos: Z245 L459 L21 DF13**
Neg: DF23 L513 L96 L144 Z255 Z253 DF21 DF41 (Z254 P66 P314.2 M37 M222  L563 L526 L226 L195 L193 L192.1 L159.2 L130 DF63 DF5 DF49)
WTYNeg: L555 L371 (L9/L10 L370 L302/L319.1 L554 L564 L577 P69 L626 L627 L643 L679)
ironroad41
Old Hand

Posts: 219


« Reply #39 on: June 10, 2012, 12:09:04 PM »

If I read your 111-marker data correctly, you are arguing the rate is .3%, not .2%? It would be about .24% at 25 years per generation.

My point, and I know it is well understood, is that this mutational process operates in the tail of the Gaussian distribution.
Mark Jost
Old Hand

Posts: 707


« Reply #40 on: June 10, 2012, 01:43:30 PM »

Marko H calculated the .3% rate at 111 markers.

He explained to me: "The idea in the estimation is that each haplotype pair is considered an independent random draw from a model distribution. The model distribution suggests what the ratio of mismatches to matches is at a given marker if pairs with a given number of matching markers in general are considered. The pair data are then used to solve for the mutation rates. This is the same idea as in Chandler's paper on mutation rate estimation."

Your normal distribution/sigma bell curve question is similar to something I told Marko I didn't quite understand, namely: "Chandler's expressions for the “mutation model curve” (MMC) of Hutchison et al. (2004) and outline a procedure for using the high-match end of the MMC for extracting mutation rates..." Marko replied:

"In the case of haplotype pairs with one mismatch, the relative frequencies of pairs with various mismatching loci are proportional to the relative mutation rates. This can be generalized to larger numbers of mismatching loci.

For a given time distance, characteristic of a given number of mismatches, I use an expression for the probability of a mismatch at a given locus derived from a symmetric up/down model. (It is actually still the same result as long as the up/down ratio and the up+down sum stay constant.) From this one can compute the probability of a mismatch at locus i if n loci do not match in total. Mutation rates are solved for by fitting these quantities against observations."
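The two ideas in Marko's description can be sketched as follows; this is an illustration under stated assumptions, not his code. `relative_rates_from_single_mismatches` tallies which locus differs in pairs that mismatch at exactly one locus (for simplicity it uses all pairs from one small sample, although such pairs are not truly independent draws). `mismatch_probability` assumes a symmetric single-step model in which mutations along the path separating two haplotypes arrive as a Poisson process; the chance of no net difference is then exp(-λ)·I0(λ), where I0 is the modified Bessel function of the first kind and λ is the expected number of mutations along that path. For two independent lineages this depends only on the up+down rate sum, which fits the remark in parentheses. Haplotype values are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import exp


def relative_rates_from_single_mismatches(haplotypes):
    """Relative mutation rates from pairs that differ at exactly one locus.

    How often each locus is the single mismatching one is taken as
    proportional to that locus's relative mutation rate.
    """
    counts = Counter()
    for a, b in combinations(haplotypes, 2):
        mismatched = [m for m in a if a[m] != b[m]]
        if len(mismatched) == 1:
            counts[mismatched[0]] += 1
    total = sum(counts.values())
    return {m: n / total for m, n in counts.items()} if total else {}


def bessel_i0(x):
    """Modified Bessel function I0(x) from its series sum((x/2)^(2k) / (k!)^2)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def mismatch_probability(rate, generations_on_path):
    """P(two haplotypes differ at a locus) under a symmetric +/-1 model;
    generations_on_path counts generations along both branches combined."""
    lam = rate * generations_on_path  # expected mutations separating the pair
    return 1.0 - exp(-lam) * bessel_i0(lam)


haps = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 14, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 14, "DYS390": 24, "DYS19": 14},
]
print(relative_rates_from_single_mismatches(haps))
print(mismatch_probability(0.002, 500))  # about 0.534
```

With one expected mutation on the path, the mismatch probability (about 0.53) is noticeably below the 0.63 chance of at least one mutation, because some mutations cancel out.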

MJost
