World Families Forums - R-P312+ et al tracking

Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« on: June 29, 2010, 12:32:49 PM »

I've finally figured out how to calculate gene diversity in a spreadsheet so I can now easily calculate diversity as well as variance by geography, etc.

I've taken all of the R-L21* confirmed haplotypes I could find as well as those from the R-L21 Plus project to create the spreadsheet I've got out on Yahoo Groups R-L21 Plus.  I do periodically "scrape" R-L21* off of surname project y-results screens.  I wish everyone (project admins) would actually display the MDKA data and not just the kit sponsor's name.  The most tedious part of what I do is to classify the MDKA geographical info into consistent (sort-able/select-able) format for countries and regions.  I've come to know Europe geographically much better so I guess that is a side benefit.

I've decided to download and reformat all of the P312 subclade haplotypes,  including P312* itself.  That way we can do diversity, etc. by region.  Vince V gets data like that produced for all of R1b1b2 but I'd like to see it at the P312 level as well as a little more granularly.   Maybe that will be helpful.

For P312* I'll only use that project and the correct categories, since I want to make sure I get only downstream-negative people in that category.

Below are some of the screens I'll be going to.  One thing I haven't found is an M37 project.  Is there such a thing?

http://www.isogg.org/tree/ISOGG_HapgrpR.html

http://www.familytreedna.com/public/atlantic-r1b1c/default.aspx?section=yresults
http://www.familytreedna.com/public/R-M153_The_Basque_Marker/default.aspx?section=yresults
http://www.familytreedna.com/public/R1b-U152/default.aspx?section=yresults
http://www.familytreedna.com/public/R1b1c6/default.aspx?section=yresults
http://www.familytreedna.com/public/R1b1c7/default.aspx?section=yresults
http://www.familytreedna.com/public/R-L226_Project/default.aspx?section=yresults
http://www.familytreedna.com/public/R-L21/default.aspx?section=yresults
http://www.familytreedna.com/public/R1b-L159.2/default.aspx?section=yresults

R1b-L21>L513(DF1)>L705.2
Mike Walsh
« Reply #1 on: June 29, 2010, 07:04:31 PM »

Quote from: Mikewww
I've finally figured out how to calculate gene diversity in a spreadsheet so I can now easily calculate diversity as well as variance by geography, etc.
.... I've decided to download and reformat all of the P312 subclade haplotypes,  including P312* itself.  That way we can do diversity, etc. by region.  ....

Any recommendation on how to tally a "total" diversity index for all 67 markers?  Can you just average them together?  Should you exclude multi-copy or infinite allele type STR's?

MHammers
Old Hand
****
Offline Offline

Posts: 347


« Reply #2 on: June 29, 2010, 08:16:30 PM »


Quote from: Mikewww
Any recommendation on how to tally a "total" diversity index for all 67 markers?  Can you just average them together?  Should you exclude multi-copy or infinite allele type STR's?

Mike,

I've used Simpson's index of diversity across x number of markers for each population.  D = sum of (n/N)^2, then diversity = 1-D.  A higher number between 0 and 1 means more diversity in the population on a given marker.

For example, 8 haplotypes with 391=10 out of 25 total haplotypes with 391:
8/25 = .32, .32 x .32 = .1024, then 1 - .1024 = .8976.  Use the modal allele value in the sample for n to get the best representation of diversity.  In this example 391=10.  Apply this to all markers, then average.

I would think that doing this for very fast markers, just like variance and TMRCA calculations, would skew what you're trying to observe.
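
A minimal Python sketch of the per-marker calculation described above, for anyone who would rather script it than use a spreadsheet. It shows two variants: the full Simpson index (summing over every allele value seen at the marker) and the modal-only shortcut used in the 391 example. The function names and the sample DYS391 column are assumptions made up for illustration; only the 8-of-25 modal count comes from the post.

```python
from collections import Counter

def simpson_diversity(alleles):
    """Full Simpson index of diversity for one marker: 1 - sum((n_i/N)^2)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def modal_only_diversity(alleles):
    """Shortcut from the worked example: 1 - (n_modal/N)^2, modal allele only."""
    counts = Counter(alleles)
    total = sum(counts.values())
    n_modal = max(counts.values())
    return 1.0 - (n_modal / total) ** 2

def average_diversity(haplotypes, markers, per_marker=simpson_diversity):
    """Average a per-marker diversity index over the chosen markers."""
    values = [per_marker([h[m] for h in haplotypes]) for m in markers]
    return sum(values) / len(values)

# Hypothetical DYS391 column: 8 of 25 haplotypes carry the modal value 10.
dys391 = [10] * 8 + [11] * 7 + [9] * 6 + [12] * 4

print(round(modal_only_diversity(dys391), 4))  # 0.8976, matches the example
print(round(simpson_diversity(dys391), 4))     # 0.736 with all alleles counted
```

The two variants give different numbers on the same data, so whichever one is used should be applied consistently across populations before averaging over markers.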
« Last Edit: June 29, 2010, 08:31:53 PM by MHammers »

Ydna: R1b-Z253**


Mtdna: T

Mike Walsh
« Reply #3 on: June 30, 2010, 12:55:51 AM »

Quote from: Mikewww
Any recommendation on how to tally a "total" diversity index for all 67 markers?  Can you just average them together?  Should you exclude multi-copy or infinite allele type STR's?

Quote from: MHammers
.... I would think that doing this for very fast markers, just like variance and TMRCA calculations, would skew what you're trying to observe.

What markers would you remove out of FTDNA's first 67?

The ones Janzen removes when he does his 67-marker TMRCA version of Nordtvedt's method are the multi-copy ones: DYS385, DYS395, DYS413, DYS425, DYS459, DYS464, YCAII and CDY.
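
As a rough sketch, dropping those markers from a panel before any diversity or variance calculation could look like this in Python. The prefix list is taken from the markers named above; the function name and the 12-marker example panel are assumptions for illustration, and real column headings in a spreadsheet export may be spelled differently.

```python
# Marker families named in the post as the ones to exclude.
EXCLUDED_PREFIXES = (
    "DYS385", "DYS395", "DYS413", "DYS425",
    "DYS459", "DYS464", "YCAII", "CDY",
)

def single_copy_markers(marker_names):
    """Return the markers whose names do not start with an excluded prefix."""
    return [
        name for name in marker_names
        if not name.upper().startswith(EXCLUDED_PREFIXES)
    ]

# Illustrative panel (assumed column names, not copied from any project page).
panel = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b",
    "DYS426", "DYS388", "DYS439", "DYS389i", "DYS392", "DYS389ii",
]

kept = single_copy_markers(panel)
print(len(kept), kept)
```

Matching on prefixes means copies such as 385a/385b or 464a-d are removed together without listing each one.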

MHammers
« Reply #4 on: June 30, 2010, 02:32:32 AM »

Those are probably good options, definitely CDY, 464 and the RecLOH markers.  I don't think it's quite as important as it is for an age estimate.  Still, I would keep it simple.

I would identify which markers to observe and find a modal.  Let's say you have 100 haplotypes in your sample population.  Generally speaking, it would be best to find the most modal haplotypes with the most markers from that sample in order to get a better representation of true diversity.  Slower markers would generally be more informative than faster ones with possible parallel mutations.  I think the population's haplotype diversity is going to loosely correlate, though not necessarily, with its variance.  For example, a population dominated by AMH's (let's say, over 75%) will have low variance and also a low diversity among the haplotypes.  However, this might be a slightly better way of looking at gene flow, particularly for cluster origins based on off-modal values.

It might be easier to look at individual markers first by mutation rate group (very slow, slow, etc.) and then break it down by region.  For example, if Scandinavia has a modal of 14 at 393 (off-modal from AMH on a slow marker), yet the highest diversity for 393 is in, let's say, France, perhaps this gives an indication of where these off-modals might be coming from, especially as a cluster and also given France's higher overall variance.

Something else to think about, one of the clusters (A-Sc? I think) has 531=12 for a couple of Scandinavians and many Scots, but where is 531 and the rest of the markers that make up that cluster most diverse?  We already know the sum of variance.  I haven't looked at the diversity myself.  It might be Scotland or other Isles, but those key markers could also show the most diversity among continentals without the continental members actually being part of that cluster.  If they don't then the origin is more than likely in the Isles.  531 is slow so high diversity is going to carry more weight than something faster in the same location.  Still, it's something to compare for a possible migration trail.
« Last Edit: June 30, 2010, 02:40:24 AM by MHammers »

Mike Walsh
« Reply #5 on: June 30, 2010, 09:11:56 AM »

Quote from: MHammers
....  We already know the sum of variance.  I haven't looked at the diversity myself.  It might be Scotland or other Isles, but those key markers could also show the most diversity among continentals without the continental members actually being part of that cluster.  ....

Um... never thought of looking at specific markers in that way.

Thanks for your input.

I'm a little bit confused about the usage of "variance" and "diversity".

"Genetic diversity" refers to the total number of genetic characteristics in the genetic makeup of a species.

"Genetic variability" describes the tendency of genetic characteristics to vary.

From a standpoint of age in a location, which is important and why?

MHammers
« Reply #6 on: June 30, 2010, 11:23:00 AM »

Variance is basically the movement or mutations from the modal on a marker and is useful for determining age.  Generally, higher variance indicates older age.  Someone more informed about statistics and mutations over time could explain this better.  There is a degree of randomness to consider.

Diversity shows the species proportion, in this case the haplotype proportion, in a given population.  Basically, how many distinct haplotypes there are and what fraction of the population each has.  A sample size of 100 with 10 distinct haplotypes would probably be more diverse than 100 with only 3 distinct haplotypes, depending on the frequency of each.  Finding a modal or most common haplotype helps determine that because it is the predominant haplotype and sets up the index of diversity equation.  Diversity doesn't really tell anything about age, but together with variance may provide more granularity for observing a population and considering its history.  It's about the evenness of the distribution of the haplotypes.  Take 2 populations of 10 different haplotypes each, one being the modal.  Population 1 has frequencies of 10% for each ht.  Population 2 has percentages of 50 (the modal), 20, 10, 10, 5, 3, and 2.  Population 1 is clearly more even, therefore more diverse.
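
The evenness comparison can be checked with a few lines of Python. This is only a sketch using the Simpson index of diversity (1 minus the sum of squared shares); the function name is an assumption, and the Population 2 list is entered exactly as given above, which is seven values.

```python
def simpson_from_percentages(percentages):
    """Simpson index of diversity, 1 - sum(p_i^2), from haplotype shares."""
    total = sum(percentages)
    return 1.0 - sum((p / total) ** 2 for p in percentages)

population_1 = [10] * 10                  # ten haplotypes at 10% each
population_2 = [50, 20, 10, 10, 5, 3, 2]  # shares as listed in the post

print(round(simpson_from_percentages(population_1), 4))  # 0.9
print(round(simpson_from_percentages(population_2), 4))  # 0.6862
```

The even population scores higher, and most of the drop in the second population comes from the 50% modal haplotype, whose squared share alone contributes 0.25.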
« Last Edit: June 30, 2010, 11:33:07 AM by MHammers »

Mike Walsh
« Reply #7 on: July 07, 2010, 08:42:27 AM »

Quote from: MHammers
I've used Simpson's index of diversity across x number of markers for each population.  ....  I would think that doing this for very fast markers, just like variance and TMRCA calculations, would skew what you're trying to observe.

George Chandler posted this on Rootsweb on July 3rd:
Quote from: Chandler
... The reason one might wish to exclude multi-copy markers is that, except in close relationships, they may add more uncertainty than they take away, since the various copies are not individually identified by the testing. That doesn't mean they bias the estimates if they are included.

At 67 markers, if I take out all of the multi-copy ones I end up with 50. I am leaving in 389ii-i because it can be considered on its own.

The next question: is it more valuable to have more markers measured in a population study or more people?  I've noticed that Anatole uses 25 in some of his work.  If I only use the first 25 and eliminate multi-copy markers (385, 459, 464) I end up with 17.

My count of P312* now has 235 with 67 markers, but I have 341 with at least 25 markers tested.  So which is better for variance and diversity -  341 people with 17 STR's counted or 235 with 50 STR's counted?   Is 17 sufficient?
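
One crude way to put the two options on the same scale is to count allele observations (people times markers). The Python sketch below does that with the counts quoted above and also shows how the usable subset for a given panel could be selected. The function names and the three toy haplotypes are assumptions for illustration, and a raw observation count says nothing about which markers are informative.

```python
def usable_haplotypes(haplotypes, markers):
    """Keep only haplotypes that have a value for every marker in the panel."""
    return [h for h in haplotypes if all(h.get(m) is not None for m in markers)]

def allele_observations(n_people, n_markers):
    """Total number of marker values entering the calculation."""
    return n_people * n_markers

print(allele_observations(341, 17))  # 5797
print(allele_observations(235, 50))  # 11750

# Toy data: the second haplotype lacks DYS531, so it drops out of that panel.
toy = [
    {"DYS393": 13, "DYS531": 11},
    {"DYS393": 13, "DYS531": None},
    {"DYS393": 14, "DYS531": 12},
]
print(len(usable_haplotypes(toy, ["DYS393"])))
print(len(usable_haplotypes(toy, ["DYS393", "DYS531"])))
```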
« Last Edit: July 07, 2010, 08:50:25 AM by Mikewww »

MHammers
« Reply #8 on: July 07, 2010, 03:39:06 PM »

Mike,

In general, more markers observed would be better for variance, therefore age.  However, you also have to observe where the variance is being generated.  If it's mainly from fast mutators then you might reconsider your marker selection.  Why not use both methods on both samples?  This way you can see what correlates and identify any false positives.  For diversity, I would try to at least get several ht's with as many markers as possible, thus avoiding the "bikini" haplotypes of various studies.  For variance, you won't be able to use just slow markers because you're less likely to see any separation, so it's best to find a balance while trying to avoid overloading with the faster ones.

Here's an example of a false positive: let's say Ireland P312* shows a higher variance (because of a handful of medium or fast mutators) at 50 markers than France, but when you measure them at 17 slow markers (which are more informative) it is much lower.  Couple that with a higher diversity of ht's in France and it looks more likely that P312 is older in France.  Of course, I don't think we can say anything absolute when dealing with an unknown amount of random events; we can only look for support for our interpretations.
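
The false-positive pattern can be reproduced with a toy Python example. Everything in it is invented for illustration: the repeat counts, the choice of one "slow" and one "fast" marker, and the use of plain population variance summed over markers. It is a sketch of the comparison, not of anyone's actual method or data.

```python
from statistics import pvariance

SLOW = ["DYS393"]  # treated as a slow marker for this illustration
FAST = ["CDYa"]    # treated as a fast marker for this illustration

def sum_of_variances(haplotypes, markers):
    """Sum the per-marker population variance of repeat counts."""
    return sum(pvariance([h[m] for h in haplotypes]) for m in markers)

# Invented repeat counts, chosen only to reproduce the pattern described.
ireland = [
    {"DYS393": 13, "CDYa": 36},
    {"DYS393": 13, "CDYa": 38},
    {"DYS393": 13, "CDYa": 34},
    {"DYS393": 13, "CDYa": 40},
]
france = [
    {"DYS393": 13, "CDYa": 37},
    {"DYS393": 14, "CDYa": 37},
    {"DYS393": 12, "CDYa": 38},
    {"DYS393": 13, "CDYa": 37},
]

for name, sample in (("Ireland", ireland), ("France", france)):
    print(name,
          "all markers:", sum_of_variances(sample, SLOW + FAST),
          "slow only:", sum_of_variances(sample, SLOW))
```

With these made-up numbers the ranking flips depending on the marker set: the first sample looks more variable when the fast marker is included and less variable on the slow marker alone.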



 
« Last Edit: July 07, 2010, 03:50:22 PM by MHammers »

Mike Walsh
« Reply #9 on: July 08, 2010, 09:13:29 PM »

Quote from: Mikewww
.... I've decided to download and reformat all of the P312 subclade haplotypes,  including P312* itself.  That way we can do diversity, etc. by region.  ....

I'm trudging through this now.  This just reiterates to me how much RMS2 has been doing to geographically categorize and map the L21* and P312+ project people.
 
I did this for R-M222 but, no offense, found it boring.  Their haplotypes are close together so it is hard to break into further clusters.  Also their geographies are pretty much exactly what they should be in the Isles.

However, P312* is a lot more fun.  It's fun looking up locations along the Mediterranean, "Prussia", etc.  I'm learning some more about Europe.  There is much more variety, including more folks in east England than I'm used to dealing with as far as L21+ types go.

Mike Walsh
« Reply #10 on: August 05, 2010, 04:32:47 PM »

I've just about got the spreadsheets and geographic classifications worked out. I am amazed at the number of U152 people in North Italy.  L21 is definitely the "big" branch of P312 and it is very northwestern heavy.

Quote from: Mikewww
.... I've taken all of the R-L21* confirmed haplotypes I could find as well as those from the R-L21 Plus project to create the spreadsheet I've got out on Yahoo Groups R-L21 Plus.  ....
.... I've decided to download and reformat all of the P312 subclade haplotypes,  including P312* itself.  That way we can do diversity, etc. by region.  ....

alan trowel hands.
Guru
*****
Offline Offline

Posts: 2012


« Reply #11 on: August 07, 2010, 06:07:34 AM »

Quote from: Mikewww
I've just about got the spreadsheets and geographic classifications worked out. I am amazed at the number of U152 people in North Italy.  L21 is definitely the "big" branch of P312 and it is very northwestern heavy.  ....

Well, as we all know, U152 is a Suevic Germanic marker and the Longobardi (Lombards) are said to be a branch of the Suevi in historic texts :0) I am only half joking.  If U152 is only found in decent numbers in the eastern edges of the main Celtic block where Germanic tribes overlaid, then there has to remain the suspicion that U152 could be Germanic.  It's certainly not an open-and-shut case that it is Gaulish Celtic, in my opinion.  If we had a better idea of its strength in France and it was fairly well represented there, then maybe the Gaulish theory would be substantiated. Until that happens I think the case remains open.  There seems to be a lack of it in many parts of the Celtic world: the Isles, Iberia, NW France, etc. L21 in contrast has turned up throughout the Celtic world to some degree and is impossible to explain using a Germanic origin.  Personally I am surprised that the U152=Gauls theory has not at least been questioned more.

Mike Walsh
« Reply #12 on: August 22, 2010, 03:55:22 PM »

I have the initial (draft) file ready of all deep clade tested R-P312 people that I can find.  Most are from the large haplogroup projects, like R1b, R-P312, R-L21Plus, R-M222, R-SRY2627 & L176, L226 and R-M153.  However, I also scraped from the Ireland, Scottish, British Isles, Scandinavian, Norwegian, Iberian, Germany and French Heritage projects along with a few others. The P312* list is probably short-changed. For P312* I just included people that are clearly marked R-P312*, like they are in the P312 project.  There are a lot of people out there that are R1b1b2a1b that may or may not have been tested for L21.  I left all of those people out.  They need to join the R-P312 project.

Here is a count by subclade.  I tried to follow Krahn's draft tree, but I did go out on a limb and called L159 with 2c2g a subclade.  The spreadsheet is set up so that I can easily include them with the R-L21* guys, so it is just there as an option.
http://ytree.ftdna.com/index.php?name=Draft&parent=root

P312_______2929
 L21_______1820
  L144___5
  L159____47 (**)
  L193____23
  L226____61
  M222____394
  P314____7
  P66___1
  L21*____859
  L21**___96 (****)
  L21?____327
 L238_____2
 L176_____202 (*)
  L165____9
  SRY2627_190
  L176*___3
 M153_____11
 U152_____493
  L2______218
   L20____45
   L2*____173
  L4______5
  U152*__94
  U152**_17
  U152?__159
 P312*___399 (***)
 P312**__2

If it'll upload, I'll put this out under the FILES section of the P312 Yahoo Group.

(*) - assumes all SRY2627 is L176+, but this is not proven; it is just about 10 for 10 so far
(**) - assumes L159+ is actually a subclade.  Much of it clearly is in tandem 464x=2c2g.
(***) - known to be P312+ L21- SRY2627- M153- but many are untested for L176, L238, etc.
(****) - have tested for the L21 downstream package but not necessarily P314
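
For anyone re-tallying the listing, here is a small Python sketch that stores the counts as a nested tree and sums each branch, including the U152 subtotal. The dictionary layout and the function name are assumptions for illustration; the numbers are copied from the listing above.

```python
counts = {
    "L21": {
        "L144": 5, "L159": 47, "L193": 23, "L226": 61, "M222": 394,
        "P314": 7, "P66": 1, "L21*": 859, "L21**": 96, "L21?": 327,
    },
    "L238": 2,
    "L176": {"L165": 9, "SRY2627": 190, "L176*": 3},
    "M153": 11,
    "U152": {
        "L2": {"L20": 45, "L2*": 173},
        "L4": 5, "U152*": 94, "U152**": 17, "U152?": 159,
    },
    "P312*": 399,
    "P312**": 2,
}

def total(node):
    """Sum a branch: a leaf is a count, a dict is the sum of its children."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

print(total(counts["L21"]))         # 1820
print(total(counts["L176"]))        # 202
print(total(counts["U152"]["L2"]))  # 218
print(total(counts["U152"]))        # 493
print(total(counts))                # 2929
```

The branch sums agree with the subtotals in the listing, and the leaves add up to the 2929 shown for P312 as a whole.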
« Last Edit: August 24, 2010, 12:34:10 PM by Mikewww »

Mike Walsh
« Reply #13 on: August 22, 2010, 04:06:58 PM »

The R-P312+ modal haplotype for all 67 markers is the Western Atlantic modal.  R-U106's modal is slightly off WAMH. The P312 average haplotype is slightly off WAMH at 449=30, 464c=16, CDYa=37, 413a=22, 481=23.

Mike Walsh
« Reply #14 on: August 22, 2010, 05:24:02 PM »

I only did these diversity calculations for sub-clades of P312 with more than just a handful of people.  Below are the % differences diversity-wise from R-P312 overall (all 1929 folks.)  A smaller number means less diversity.  A larger number means greater diversity.

We need statistically minded people to comment.  I believe that this is showing that U152+ is as old as P312 itself.  L21 is slightly younger than U152.  L176 and SRY2627 are a little bit younger.  

The youngest subclades are L21's (L193, M222, etc.) along with U152's L4 and R-M153 (the so-called Basque marker).  U152's L2 is almost as old as U152.

R-P312(all)____+0.0%

R-L21_________-1.9%
  R-L193______-64.2%
  R-L226______-56.7%
  R-M222______-43.7%
  R-L159______-50.5% (*)

R-L176________-10.6%
  R-SRY2627___-11.2%

R-M153________-43.0%

R-U152________+1.6% (***)
  R-L2________+3.2%
    R-L20_____+0.1%
  R-L4________-35.5%

(*) L159 may not be a subclade
(***) It may seem a little strange but the actual calculation would imply L2 is older than U152 and U152 is slightly older than P312.  That can't be true and I'm not sure why this shows this way other than statistical error.

EDIT: See reply #16. I don't have a solid methodology for applying these calculations so consider these as "draft" numbers right now.  I don't think the relative nature of the subclades will change other than slightly anyway, but I just want to throw this caveat in.
« Last Edit: August 23, 2010, 10:13:26 AM by Mikewww »

MHammers
« Reply #15 on: August 23, 2010, 01:31:45 AM »

Good work Mike!

The L2 is interesting. It's obviously younger than U152, so there are probably certain markers that are causing the diversity to spike.  Without knowing your methodology, I'm guessing fast mutators.  Does a simple variance calculation of that sample show something close to U152?  Maybe it's just a random behavior having to do with the SNP and some unknown relationship to the STR's in U152.

The gap between L21 and its downstream subclades makes me think there are many more snp's and not just private ones still undiscovered.  Perhaps some will turn up in some of the clusters you've noted.
« Last Edit: August 23, 2010, 01:33:09 AM by MHammers »

Mike Walsh
« Reply #16 on: August 23, 2010, 09:25:41 AM »

Quote from: MHammers
.... Does a simple variance calculation of that sample show something close to U152?  ....

I am set up to do both variance and diversity calculations, but I don't have a rationale for when to use which, how many markers to compare, and which to throw out.
Help!

If I do the 67 marker comparison, I've been throwing out all of the multi-copy STRs so I really am only doing 50 markers.   Unfortunately it is typical that the shorter haplotypes are the ones in the more exotic places of central, eastern or southern Europe so to cover the geographies better it might be smarter to use only the first 25 markers and have more data.  Recommendations?

I noticed that Anatole will do some work with 67 markers to evaluate the validity of the tree structure and then do some of the TMRCA calculations on 25 markers. (I'm not sure... that's just my interpretation.)

At a basic level, how do we interpret diversity and variance? From what I've read, it sounds like diversity may be an indicator of age.  I'm not sure what variance is telling us.
http://en.wikipedia.org/wiki/Genetic_diversity
http://en.wikipedia.org/wiki/Genetic_variability

Understanding how best to apply the statistics is important as we can do a lot of analysis from here. I have all of the subclades classified to the country level, and further.  For example, Ireland, Scotland and England are broken into provinces/regions.  France is broken into five regions, Germany three.  I made Aquitaine and the Pyrenees a region of its own... etc.

I also wonder about broad diversity (or variance) calculations.  For example, I'm not sure that the diversity across all subclades of P312 in a country (say England) is valid to look at for age.  The diversity may be high because most of the subclades may be present, but on the other hand each subclade's diversity, on its own, is not as high as in other countries.  I think we have to look at each sub-clade and then piece them together.
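
The pooling effect described here is easy to demonstrate with a toy Python example. The allele values below are invented: two subclades that are each fairly uniform at one marker but centred on different values. The function name is an assumption, and the index used is the Simpson index of diversity.

```python
from collections import Counter

def simpson_diversity(alleles):
    """Simpson index of diversity for one marker: 1 - sum((n_i/N)^2)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Invented values at a single marker for two hypothetical subclades.
subclade_a = [24] * 9 + [25]
subclade_b = [23] * 9 + [22]
pooled = subclade_a + subclade_b

print(round(simpson_diversity(subclade_a), 2))  # 0.18
print(round(simpson_diversity(subclade_b), 2))  # 0.18
print(round(simpson_diversity(pooled), 2))      # 0.59
```

The pooled figure is high only because two different modals are mixed together, which is why a per-subclade look seems safer than one country-wide number.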

Also, related to the validity of a statistic...  I'm not sure that a diversity # for R-P312* means much since it is not really a subclade but just a collection of unknown subclades.  The same thing would apply to R-L21*.  It may only be relevant to look at R-L21 (and include M222).




« Last Edit: August 23, 2010, 10:17:08 AM by Mikewww »

MHammers
« Reply #17 on: August 23, 2010, 12:25:32 PM »

Variance can be a good indicator of age, though with R1b it is like untangling a big knot sometimes when trying to make sense of it.  I think variance is better because you are able to include more markers and get a more precise estimate.  Looking at the haplotype diversity is better for seeing things like proportion, haplotype "hotspots", maybe founder effects, etc.

Basically variance is for age, though haplotype diversity can be used along with it to see if there is any correlation.  If L2 is showing a higher variance than P312 and U152, then it seems it is just a statistical anomaly.  It could be that L2's, and R1b in general, will require more markers past 67 to see any real separation into different lines.

« Last Edit: August 23, 2010, 12:27:51 PM by MHammers »

Mike Walsh
« Reply #18 on: August 23, 2010, 02:33:39 PM »

Quote from: MHammers
Variance can be a good indicator of age ....  Basically variance is for age, though haplotype diversity can be used along with it to see if there is any correlation.  ....

I'm going to present everything as a percentage of R-P312's total measurement. Hopefully, it'll be less confusing than showing negative numbers.

Here are the relative Sum of the Variances for each of the subclades.  I used only the full 67-marker haplotypes, but threw out the multi-copy markers and 425.

R-U152/L2 still looks quite old.  R-L21/P314 is fairly old, but there are only a few people with 67 markers on that one, so I showed the variance compared to P312 at 25 markers for it as well.

R-P312(all)___100.0%

R-L21_________97.1%
  R-L193______25.8%
  R-L226______30.2%
  R-M222______45.3%
  R-P314______82.5% (50.2% if using 1st 25 STR's)
  R-L159______37.1%

R-L176________85.6%
  R-SRY2627___83.1%
  R-L165______46.8%

R-M153________19.6%

R-U152________102.3%
  R-L2________99.8%
    R-L20______94.6%
  R-L4________26.1%
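
A rough Python sketch of how percentages like these could be produced: sum the per-marker variances for a subclade and divide by the same sum for the whole sample. The haplotypes, the two-marker panel, the use of population variance and the function names are all assumptions for illustration, not the spreadsheet's actual formulas.

```python
from statistics import pvariance

MARKERS = ["DYS390", "DYS391"]  # stand-in for the single-copy panel

def sum_of_variances(haplotypes, markers):
    """Sum the per-marker population variance of repeat counts."""
    return sum(pvariance([h[m] for h in haplotypes]) for m in markers)

def percent_of_baseline(subset, baseline, markers):
    """Express a subset's sum of variances as a percentage of the baseline's."""
    return 100.0 * sum_of_variances(subset, markers) / sum_of_variances(baseline, markers)

# Invented haplotypes for two hypothetical subclade samples.
sample = [
    {"subclade": "U152", "DYS390": 24, "DYS391": 10},
    {"subclade": "U152", "DYS390": 23, "DYS391": 11},
    {"subclade": "U152", "DYS390": 25, "DYS391": 10},
    {"subclade": "U152", "DYS390": 24, "DYS391": 11},
    {"subclade": "M153", "DYS390": 24, "DYS391": 10},
    {"subclade": "M153", "DYS390": 24, "DYS391": 10},
    {"subclade": "M153", "DYS390": 24, "DYS391": 11},
    {"subclade": "M153", "DYS390": 24, "DYS391": 10},
]

def members(subclade):
    """Select the haplotypes assigned to one subclade."""
    return [h for h in sample if h["subclade"] == subclade]

for name in ("U152", "M153"):
    print(name, round(percent_of_baseline(members(name), sample, MARKERS), 1))
```

Nothing in this arithmetic stops a subset from coming out above 100% of the pooled value, since the pooled sample is diluted by its less variable members, so a figure slightly over 100% for a subclade is not by itself a calculation error.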

Are you sure sum of the variance is a better indication of age?  When I compare R-L21 for England and France, France has about 10% more diversity but when it comes to Variance they are about the same.  I have a feeling that M222 impacts England's L21 variance but not so much the diversity.

Quote
Genetic diversity, the level of biodiversity, refers to the total number of genetic characteristics in the genetic makeup of a species. It is distinguished from genetic variability, which describes the tendency of genetic characteristics to vary.

To me it seems like it would take time for a maximum number of unique genetic characteristics (alleles) to occur (which is what diversity is) so that sounds like an age indicator.

On the other hand, variance seeks to measure the tendency to vary, so that might be more impacted by the "weight" of the sample size, or at least by an "underweighting" of too small a sample.

There are a lot more English samples, than French, of course.
« Last Edit: August 23, 2010, 02:49:40 PM by Mikewww »

MHammers
« Reply #19 on: August 23, 2010, 03:00:47 PM »

I think variance is usually a better indicator, but as you mentioned France has more diversity while the variance stays about the same.  I admit it is not certain, but it's all we have to work with outside of aDNA.

If we start with L21 and most other upstream Isles R1b (P312, U106, U152, etc.) originating on the continent, then I think the reason for similar variance in England could be due to multiple migrations carrying the same SNPs (Beakers, Celts, Belgae, Saxons, Vikings), yet-to-be-discovered downstream SNPs in the sample (which would be excluded if they were known), and maybe the overall sample size.  England seems to be an R1b "melting pot".  All of the R1b in those cultures ultimately comes from the same source areas when you go back in time.  I think that may be part of the reason for a variance close to France.  England has likely experienced migrations of R1b to varying degrees from the Bronze Age all the way until the Normans, which gives a misleadingly higher variance.  France being 10% more diverse in haplotypes with a much smaller sample strengthens the argument for a continental source.

As for M222, I wouldn't include it for either diversity or variance with other L21*.  This will definitely raise the variance especially when France and the continent have so little M222.  It's better to compare L21* France vs. L21* England and so on, then see where the subclades fit in by themselves underneath that to get an idea of population movement.
« Last Edit: August 23, 2010, 03:49:43 PM by MHammers »

Mike Walsh
« Reply #20 on: August 23, 2010, 03:42:30 PM »

Here is diversity by geography across the first 25 markers for all P312 subclades.

Germany has more diversity than I thought, or than I'm used to looking at for just R-L21.  I think that is the impact of R-U152.

All R-P312 __________ 100.0%

France _____________ 107.6%
East (of Germany) Europe & East Mediterranean _ 105.0%
Alpine & Cisalpine _ 103.2%
England ____________ 99.2%
Ireland ____________ 98.3%
Germany ____________ 97.2%
Iberia _____________ 96.0%
Scotland ___________ 93.7%
Scandinavia ________ 92.6%
Low Countries ______ 88.0%
Aquitaine & Pyrenees _ 86.6%
Wales ______________ 82.4%

I included Central and Southern Italy in "East Mediterranean".  Aquitaine & Pyrenees is literally just the Pyrenees (including Basque Country) on both sides of the border along with essentially Southwestern France, but not the Central Massif or Alpine France.  Alpine France was included with France and was not included in the Alpine countries.

I thought of something that may be part of the reason some of these numbers are higher than total P312.  Total P312 includes all of the New World haplotypes.
Logged

R1b-L21>L513(DF1)>L705.2
MHammers
Old Hand
****
Offline Offline

Posts: 347


« Reply #21 on: August 23, 2010, 03:55:06 PM »

I think this validates R1b coming from the east with an early expansion to the Isles maybe out of the Rhine delta.  Then it spreads to areas on the maritime fringe.
Logged

Ydna: R1b-Z253**


Mtdna: T

Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #22 on: August 23, 2010, 04:01:57 PM »

Remember R-U106 is another major clade of R-M269 not included above. Its project admins seem to indicate U106 has higher diversity as you go east across the Northern European plain... like into Poland.
Logged

R1b-L21>L513(DF1)>L705.2
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #23 on: August 23, 2010, 04:19:51 PM »

I added the relative diversity % for all of Italy so you can see it as a country as well as split into two regions.  I also did a calculation for East Europe & East Mediterranean but with Italy pulled out.

All R-P312 _________ 100.0%

East (of Germany) Europe & East Mediterranean (but no Italy) _ 107.4%
East (of Germany) Europe & East Mediterranean (incl S. Italy) _ 105.0%
Alpine & Cisalpine _ 103.2%
France _____________ 101.4% (EDIT)
Italy ______________ 100.9%
England ____________ 99.2%
Ireland ____________ 98.3%
Germany ____________ 97.2%
Iberia _____________ 96.0%
Scotland ___________ 93.7%
Scandinavia ________ 92.6%
Low Countries ______ 88.0%
Aquitaine & Pyrenees _ 86.6%
Wales ______________ 82.4%

I included Central and Southern Italy in "East Mediterranean".  Aquitaine & Pyrenees is literally just the Pyrenees (including Basque Country) on both sides of the border along with essentially Southwestern France, but not the Central Massif or Alpine France.  Alpine France was included with France and was not included in the Alpine countries.

EDIT: I had France in error; I used the wrong number. It should be just ahead of Italy but less diverse than the Alpine countries.
« Last Edit: August 24, 2010, 10:37:56 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2
Mike Walsh
Guru
*****
Offline Offline

Posts: 2963


WWW
« Reply #24 on: August 23, 2010, 04:23:05 PM »

Here are the locations of the people I put in the region "East (of Germany) Europe & East Mediterranean":

Algeria ______ 1
Belarus ______ 2
Bulgaria _____ 1
Croatia ______ 1
Czech Rep. ___ 10
Estonia ______ 1
Greece _______ 4
Hungary ______ 10
Italy (Southern) _ 27
Kazakhstan ___ 1
Latvia _______ 2
Lithuania ____ 7
Malta ________ 1
Poland _______ 17
Romania ______ 5
Russia _______ 4
Slovakia _____ 1
Turkey _______ 1
Ukraine ______ 10
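
A quick tally of the list above, as a sketch, to see how much of this region's sample is Southern Italy. The counts are copied from the list; the variable names are just for illustration.

```python
# Counts per location, copied from the list above.
counts = {
    "Algeria": 1, "Belarus": 2, "Bulgaria": 1, "Croatia": 1,
    "Czech Rep.": 10, "Estonia": 1, "Greece": 4, "Hungary": 10,
    "Italy (Southern)": 27, "Kazakhstan": 1, "Latvia": 2,
    "Lithuania": 7, "Malta": 1, "Poland": 17, "Romania": 5,
    "Russia": 4, "Slovakia": 1, "Turkey": 1, "Ukraine": 10,
}

total = sum(counts.values())
italy_share = 100.0 * counts["Italy (Southern)"] / total

print(f"total haplotypes: {total}")                 # total haplotypes: 106
print(f"Southern Italy share: {italy_share:.1f}%")  # Southern Italy share: 25.5%
```

So roughly a quarter of the haplotypes in this region are from Southern Italy, which is worth keeping in mind when comparing the figures with and without Italy.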
« Last Edit: August 24, 2010, 10:35:42 PM by Mikewww » Logged

R1b-L21>L513(DF1)>L705.2