World Families Forums - 1000 Genome Project: Y Chromosome SNPs

Maliclavelli
« on: September 13, 2010, 04:12:33 AM »

Vincent Vizachero crows over a victory, but it is a Pyrrhic one.
“New insight into recent human evolution can also be gained from branch lengths; for example, the short internal branch lengths within the haplogroup R1b relative to the other haplogroups suggest a recent expansion of this European haplogroup” (from the paper by Jostins et al., http://bit.ly/djsOaP, quoted by Vizachero on Rootsweb).

But if we compare R1 with I1, which according to Ken Nordtvedt is about 6,000 years old (and he is still looking for the earlier haplotypes that are missing from the 15,000 years since it separated from I2), R1b is at least twice as old. That takes us back to the Younger Dryas, and for me that is enough.

But whereas the other haplogroups are monophyletic, how is it possible that R1b1b2g and R1b1b2h are intermixed in the tree?
« Last Edit: September 13, 2010, 06:21:15 AM by Maliclavelli » Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Maliclavelli
« Reply #1 on: September 14, 2010, 12:41:31 AM »

Vincent Vizachero writes on Rootsweb:
“I was able to reconstruct their tree, using the 75 men and 2788 SNPs
in the file that Vince pointed me to.
http://tinyurl.com/ychrgenotypes
The more data I see, the less optimistic I am about full Y sequencing.
Of the 2788 Y-SNPs they found in this group, 848 (or 30%) mutate twice
or more in the tree. 5% mutate 5 or more times. That means one (or
both) of two things.
1. Y-SNPs are very much more mutative than many people may still think.
2. This sequencing project, even after all the "filters", still has an
awful lot of errors”.
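
To make "mutate twice or more in the tree" concrete, here is a minimal sketch of one way such a count can be made. It uses Fitch parsimony, which gives the minimum number of state changes a SNP needs on a given tree. This is an illustrative assumption, not necessarily the procedure Vizachero used, and the four-man tree and the three SNPs are made up. The last line only re-checks the quoted 848 of 2788 figure.

```python
def fitch_changes(tree, alleles):
    """Minimum number of state changes for one SNP on a rooted binary tree.

    tree    -- nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d"))
    alleles -- dict mapping leaf name to the observed allele (0 or 1)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):      # a leaf: its state set is its call
            return {alleles[node]}
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:                           # children agree: no change needed
            return common
        changes += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return changes


# Hypothetical four-man tree and three made-up SNPs (illustration only).
toy_tree = (("m1", "m2"), ("m3", "m4"))
toy_snps = {
    "snp_a": {"m1": 1, "m2": 1, "m3": 0, "m4": 0},  # fits the tree
    "snp_b": {"m1": 1, "m2": 0, "m3": 1, "m4": 0},  # conflicts with the tree
    "snp_c": {"m1": 0, "m2": 0, "m3": 0, "m4": 1},  # private to one man
}

# SNPs that need at least two changes on this tree count as "recurrent".
recurrent = [name for name, calls in toy_snps.items()
             if fitch_changes(toy_tree, calls) >= 2]

# Share of recurrent SNPs quoted in the post: 848 out of 2788.
share_quoted = 848 / 2788
```

In this toy case only snp_b needs two changes. Note that a count like this cannot tell a true parallel mutation from a miscalled genotype, which is exactly the ambiguity the quoted message raises.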

We should probably select SNPs that do not show this variability, as those used so far have been.
We should probably think of the Y tree more like the mtDNA tree, where back mutations are frequent.
These recurrently mutating SNPs should probably be used statistically, as we use autosomal ones.

What SNPs alone will not be able to give us is the broad knowledge that leads to theories like the Italian Refugium, at a time when everybody was thinking of the Cantabrian one.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mike Walsh
« Reply #2 on: September 14, 2010, 10:04:23 AM »

Quote from: Maliclavelli on September 14, 2010, 12:41:31 AM
Vincent Vizachero writes on Rootsweb:
“I was able to reconstruct their tree, using the 75 men and 2788 SNPs
in the file that Vince pointed me to.
http://tinyurl.com/ychrgenotypes
The more data I see, the less optimistic I am about full Y sequencing.
Of the 2788 Y-SNPs they found in this group, 848 (or 30%) mutate twice
or more in the tree. 5% mutate 5 or more times. That means one (or
both) of two things.
1. Y-SNPs are very much more mutative than many people may still think.
2. This sequencing project, even after all the "filters", still has an
awful lot of errors”.

We should probably select SNPs that do not show this variability, as those used so far have been.
We should probably think of the Y tree more like the mtDNA tree, where back mutations are frequent.
These recurrently mutating SNPs should probably be used statistically, as we use autosomal ones.

What SNPs alone will not be able to give us is the broad knowledge that leads to theories like the Italian Refugium, at a time when everybody was thinking of the Cantabrian one.

I think it is important to keep this information in context. There was never any guarantee that any SNP was a true UEP (Unique Event Polymorphism).
http://en.wikipedia.org/wiki/Unique-event_polymorphism

This is still all a statistical probability exercise. Given the hundreds of millions of Homo sapiens sapiens that have been born, it would be surprising, I think, if any one SNP occurred only once in all of time. Fortunately, for the sake of the statistics, most of the Y DNA lineages have died out.

If scientists evaluate SNPs properly and we apply them in the context of other SNPs within the known phylogenetic tree, then I think we can maintain a high probability of placing a set of SNP results in the correct subclade.

Please note there is a clear benefit to deep clade package testing, which keeps SNPs in context, and we certainly hope that testing companies are judicious in assuming derived "calls" for upstream SNPs when placing someone in a subclade.

Given this is all still a probability exercise, long STR haplotype testing is still important.  It is another diagnostic tool to place people (or validate them) in subclades in context of available SNP results.  We can see that Ken Nordtvedt is a master at this.




« Last Edit: September 14, 2010, 10:05:21 AM by Mikewww » Logged

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
Maliclavelli
« Reply #3 on: September 14, 2010, 10:11:05 AM »

I agree with you completely. The fact is that when some certainties collapse, one thinks that the truth itself has collapsed.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

vineviz
« Reply #4 on: September 14, 2010, 10:22:09 AM »

Quote from: Mike Walsh on September 14, 2010, 10:04:23 AM
This is still all a statistical probability exercise. Given the hundreds of millions of Homo sapiens sapiens that have been born, it would be surprising, I think, if any one SNP occurred only once in all of time. Fortunately, for the sake of the statistics, most of the Y DNA lineages have died out.

If scientists evaluate SNPs properly and we apply them in the context of other SNPs within the known phylogenetic tree, then I think we can maintain a high probability of placing a set of SNP results in the correct subclade.

Mike,

I am sympathetic to the point I think you are trying to make, but the point I was trying to make goes beyond this.  We've known for as long as we've thought about it that any given mutation will have happened more than once in human history.

But what we have here is a dataset that has purportedly been heavily screened for quality (i.e. with errors weeded out). For such a small number of men and such a small number of SNPs, the number of parallel mutations is much, much larger than I would have expected. Perhaps ten times as many as I'd expected to see based on working with other datasets (e.g. Adriano's collection of 23andMe haplotypes).

There was a paper last year that confirmed that the overall Y-SNP mutation rate was about what we expected.  One thing we don't know is whether the resultant mutations are spread more or less evenly over the Y-chromosome, or whether we have some hotspots:  some positions mutate a lot and others rarely mutate at all.
 
Theoretically, in a phylogenetic context it may not matter much.  But it matters a lot in two other important areas.

One is in error detection.  If some bp are mutating 20x or more in such a small sample, it will be hard to easily and inexpensively decide whether an apparent SNP is a true mutation or a sequencing error.

The second is in the usefulness of a Y-SNP molecular clock. If we have a bunch of recurrent Y-SNPs, it will be much harder to construct an accurate clock. If we look at two men with the same allele, neither may ever have had a mutation, or both may have had one, or one lineage may have had two, and so on. In other words, we've been hoping that Y-sequencing would produce a clock with many thousands of very, very slow markers. If that's not true, a lot of people will be greatly disappointed.
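
The ambiguity about two men sharing an allele can be put in numbers with a toy model. The sketch below is only an illustration under stated assumptions: the site flips between just two alleles, and the number of hits on each man's lineage since their common ancestor is independent and Poisson with the same mean. The function names and the example means are hypothetical, not taken from any dataset discussed here.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def same_allele_breakdown(mean_hits, max_hits=20):
    """Split P(two men show the same allele) by the hidden mutation history.

    Returns three probabilities:
      none      -- neither lineage mutated
      both_once -- each lineage mutated exactly once (a parallel mutation)
      other     -- any other history giving the same allele, e.g. one
                   lineage mutating twice (forward and back)
    """
    none = both_once = other = 0.0
    for i in range(max_hits + 1):
        for j in range(max_hits + 1):
            if (i - j) % 2:        # different parity means different alleles
                continue
            p = poisson_pmf(i, mean_hits) * poisson_pmf(j, mean_hits)
            if i == 0 and j == 0:
                none += p
            elif i == 1 and j == 1:
                both_once += p
            else:
                other += p
    return none, both_once, other


# A very slow marker versus a hypothetical hotspot (made-up means).
slow = same_allele_breakdown(0.001)
hotspot = same_allele_breakdown(0.5)
```

For the slow marker nearly all of the probability sits in the "no mutation" case, so a shared allele can be read at face value. For the hotspot a real share goes to the hidden histories, which is why recurrent SNPs make for a poor clock.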

VV
Logged
Mike Walsh
« Reply #5 on: September 14, 2010, 10:29:22 AM »

Quote from: vineviz on September 14, 2010, 10:22:09 AM
....
Theoretically, in a phylogenetic context it may not matter much. But it matters a lot in two other important areas.

One is in error detection. If some bp are mutating 20x or more in such a small sample, it will be hard to easily and inexpensively decide whether an apparent SNP is a true mutation or a sequencing error.

The second is in the usefulness of a Y-SNP molecular clock. If we have a bunch of recurrent Y-SNPs, it will be much harder to construct an accurate clock. If we look at two men with the same allele, neither may ever have had a mutation, or both may have had one, or one lineage may have had two, and so on. In other words, we've been hoping that Y-sequencing would produce a clock with many thousands of very, very slow markers. If that's not true, a lot of people will be greatly disappointed.

I agree those issues are problematic.

I have been made nervous about SNP testing errors in the past. I know of a man in the 11-13 cluster who tested U106+, while another 11-13 man with a close GD and the same surname was L21+. We went back and asked the lab to look at it again and, sure enough, they overturned their U106+ to U106- L21+. Again, that shows the value of using multiple kinds of tests, at least in terms of subclade identification.
« Last Edit: September 14, 2010, 10:32:36 AM by Mikewww » Logged

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
Mike Walsh
« Reply #6 on: September 14, 2010, 10:45:13 AM »

Quote from: vineviz on September 14, 2010, 10:22:09 AM
..... The second is in the usefulness of a Y-SNP molecular clock. If we have a bunch of recurrent Y-SNPs it will be much harder to construct an accurate clock. ...

Does this call into question Karafet et al.'s findings on haplogroup aging and R1 as 18.5K ybp?

Hopefully, there is enough stability in Y DNA SNPs that statistical methods can reduce the error, or at least keep the results correct in relative terms. In other words, if there is some underlying consistency in error rates across haplogroups, then we should still be able to draw relative conclusions, at least at the population group level.
Logged

R1b-L21>L513(DF1)>S6365>L705.2(&CTS11744,CTS6621)
vineviz
« Reply #7 on: September 14, 2010, 11:13:51 AM »

Quote from: Mike Walsh on September 14, 2010, 10:45:13 AM
..... The second is in the usefulness of a Y-SNP molecular clock. If we have a bunch of recurrent Y-SNPs it will be much harder to construct an accurate clock. ...
Does this call into question Karafet et al.'s findings on haplogroup aging and R1 as 18.5K ybp?

I don't think so, but it's hard to say for sure. Karafet et al. used the same samples but different sequences and procedures.

The biggest problem will be on the last branch, since any errors are likely to be biggest there. That's why Karafet got a good estimate for TMRCA-R1 but not for TMRCA-R1b1b2 (for example). This new data would yield the same result: the TMRCAs for R1 or IJK are likely to be pretty reasonable (and similar to Karafet's estimates), but the TMRCA for R1b1b2 here would be wildly variable and highly suspect.

In this sample, the average R1b1b2 man has 48 SNPs between him and the MRCA of R1b1b2.  That's nice to know, if it is accurate.

But the range is HUGE (one guy has just 10 SNPs, another guy has 117 SNPs), which tells us that the sequencing errors and weak coverage depth are playing havoc with the findings.
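
As a rough check on why a range of 10 to 117 around an average of 48 looks suspicious, here is a minimal sketch. It assumes, purely for illustration, a strict clock with equal coverage for every man, so that each man's SNP count since the MRCA would be Poisson with mean 48. Only the three numbers 48, 10 and 117 come from the post; the Poisson model and the function names are assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_lower_tail(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


def poisson_upper_tail(k, mean, extra_terms=500):
    """P(X >= k), summed directly so tiny values are not lost to rounding."""
    return sum(poisson_pmf(i, mean) for i in range(k, k + extra_terms))


mean_snps = 48                       # average SNPs to the R1b1b2 MRCA (from the post)
std_dev = math.sqrt(mean_snps)       # for a Poisson count, variance equals the mean

p_at_most_10 = poisson_lower_tail(10, mean_snps)      # the man with 10 SNPs
p_at_least_117 = poisson_upper_tail(117, mean_snps)   # the man with 117 SNPs

print(f"expected spread: about +/- {std_dev:.1f} SNPs")
print(f"P(count <= 10)  = {p_at_most_10:.2e}")
print(f"P(count >= 117) = {p_at_least_117:.2e}")
```

Under these assumptions the spread would be only about 7 SNPs either side of 48, and both extremes come out far below one in a million. That is consistent with the reading that something other than the mutation process, such as sequencing errors or uneven coverage, is driving the range.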
Logged
Maliclavelli
« Reply #8 on: September 14, 2010, 11:24:47 AM »

Unfortunately I cannot say anything, because I am not able to view (or print) Vincent's tree.
Logged

Maliclavelli


YDNA: R-S12460


MtDNA: K1a1b1e

Mark Jost
« Reply #9 on: September 14, 2010, 12:22:47 PM »


Quote from: Mike Walsh on September 14, 2010, 10:29:22 AM
I agree those issues are problematic.

I have been made nervous about SNP testing errors in the past. I know of a man in the 11-13 cluster who tested U106+, while another 11-13 man with a close GD and the same surname was L21+. We went back and asked the lab to look at it again and, sure enough, they overturned their U106+ to U106- L21+. Again, that shows the value of using multiple kinds of tests, at least in terms of subclade identification.

Yeah, that is a pretty unlikely occurrence, but situations like that do happen. I remember VinceT had a similar incident with his haplogroup.
Logged

148326
Pos: Z245 L459 L21 DF13**
Neg: DF23 L513 L96 L144 Z255 Z253 DF21 DF41 (Z254 P66 P314.2 M37 M222  L563 L526 L226 L195 L193 L192.1 L159.2 L130 DF63 DF5 DF49)
WTYNeg: L555 L371 (L9/L10 L370 L302/L319.1 L554 L564 L577 P69 L626 L627 L643 L679)