World Families Forums - Submitting X DNA data for research and comparisons

Seán MacGorman Powell
« on: January 20, 2009, 04:24:43 PM »

One of the goals of the X-chromosome Project website is to provide a centralized place for people to submit and compare X DNA test results.  There are three ways that you can participate in this research:

1)  For genome-scan tests (23andMe and DeCODEme), complete the following steps:

a)   Send me a private message providing me with your e-mail address.  I will then e-mail you an Excel spreadsheet for you to use to submit your results, which you can return to me.  If you do not have the capability to use an Excel spreadsheet, send me a PM anyway and I will explain to you how to submit your results via an alternate method.  I will not share your e-mail address with anyone without your permission.

b)   Log onto your personal page at www.23andme.com/you/

c)   Click the “Browse Raw Data” link near the top of the left-hand navigation bar

d)   Click on the icon of the X chromosome

e)   Scroll to the bottom of the page and click “Next>>”.  This will change the URL in your browser’s address bar to a form that you can edit with the start position of the block of interest.

f)   Backspace over the number after the = sign in the URL, and replace it with the start position (not the SNP number that starts with “rs,” but the position number).   On some browsers, you may be able to just double-click on this number to select just that portion of the URL, and then type the new position number in its place.

g)   Press your <enter> key, and this will bring up a page showing the sequence of SNPs that begin with the start number that you specified.

h)   Type the string of nucleotide base letters in the “<your name>’s Genotype” column into the spreadsheet that I have sent you.  Those of you who are good with spreadsheets may wish to first cut-and-paste the data into a text-only file, then import it into a new spreadsheet, and then simply paste the data from your spreadsheet into mine.

For privacy reasons, the haploblocks posted in this spreadsheet should not contain any genes (I will leave it up to each contributor to ensure that it doesn’t—if you are in doubt, make sure you look for the word “intergenic” before each SNP number that you are submitting from your raw data list), so there should be minimal privacy risks.  As a caveat, there are long  sequences of DNA in between the SNPs sampled by 23andMe, which could potentially turn out to contain genes after further research, so submit your data at your own discretion.  The spreadsheet that I will send you has the positions of genes marked with xx's, so leave those cells blank.
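The rule of leaving gene positions blank could be sketched as follows. This is only an illustration: the `blank_genic` function, the rsIDs, the positions, and the gene interval are all made up, and the real gene positions are the ones marked with xx's in the template spreadsheet.

```python
def blank_genic(block, gene_intervals):
    """Return a copy of block with genotypes blanked for SNPs inside a gene.

    block          -- list of (rsid, position, genotype) tuples
    gene_intervals -- list of (start, end) positions, inclusive (hypothetical here)
    """
    cleaned = []
    for rsid, pos, genotype in block:
        in_gene = any(start <= pos <= end for start, end in gene_intervals)
        # Leave the cell empty rather than dropping the row, so the
        # remaining SNPs stay aligned with the spreadsheet columns.
        cleaned.append((rsid, pos, "" if in_gene else genotype))
    return cleaned


# Made-up example data: three SNPs and one gene spanning 2500-3500.
block = [("rs0000002", 2000, "A"), ("rs0000003", 3000, "C"), ("rs0000004", 4000, "T")]
genes = [(2500, 3500)]
cleaned = blank_genic(block, genes)
print(cleaned)
```

Blanking instead of deleting keeps every SNP in its own cell, which matters when the result is pasted into a shared sheet.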

After you return the spreadsheet to me, I’ll add that data to the project results page.

If you have a new haploblock for which you want to solicit contributions, please post it as a new topic on this forum board (again, first making sure that it doesn’t include any genes) and I’ll add it to the spreadsheet along with people’s data contributions as they come in.

You may instead want to download your entire raw data file (using the “download raw data” link near the upper right corner of the Browse Raw Data window) and then import it into a spreadsheet, which makes it easier to find and extract SNP sequences.
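For those who would rather script it than use a spreadsheet, here is a minimal sketch of pulling a block of X-chromosome SNPs out of a downloaded raw data file, starting from a given position. It assumes the file is tab-separated with the columns rsid, chromosome, position, genotype and with comment lines starting with "#". The function names, rsIDs, and positions are made up for the example.

```python
import csv
import io


def read_x_snps(lines):
    """Yield (rsid, position, genotype) for the X-chromosome rows."""
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data, delimiter="\t"):
        if len(row) < 4:
            continue  # skip malformed rows
        rsid, chrom, pos, genotype = row[:4]
        if chrom == "X":
            yield rsid, int(pos), genotype


def extract_block(lines, start, count):
    """Return the first `count` X SNPs at or after position `start`."""
    snps = sorted(read_x_snps(lines), key=lambda s: s[1])
    return [s for s in snps if s[1] >= start][:count]


# Made-up sample standing in for a real raw data file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\tX\t2000\tA\n"
    "rs0000003\tX\t3000\tC\n"
    "rs0000004\tX\t4000\tT\n"
    "rs0000005\tY\t5000\tG\n"
)

block = extract_block(io.StringIO(SAMPLE), start=3000, count=2)
genotype_string = "".join(g for _, _, g in block)
print(genotype_string)
```

With a real file, `open("your_raw_data.txt")` (a placeholder name) would take the place of `io.StringIO(SAMPLE)`.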

2) Another way that you can contribute your genome scan results is by submitting your entire raw data file (edited to include only the X chromosome) to Ben Moscia, who is maintaining a spreadsheet of people’s X DNA data.  If you are a male and submit your data to Ben, you do not need to also submit it to the Project as described above, as I will automatically extract the relevant DNA blocks and copy them over, anonymously, to the project results chart. Please note that I only do this automatic data extraction for males, due to technical limitations with female DNA datasets.
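Editing the raw data file down to the X chromosome can be done by hand in a text editor or spreadsheet. As one possible shortcut, here is a rough sketch that keeps only the X rows. It assumes a tab-separated file with the columns rsid, chromosome, position, genotype and "#" comment lines. The function names, file paths, and sample rows are placeholders.

```python
import io

HEADER = "# rsid\tchromosome\tposition\tgenotype\n"


def x_only_lines(lines):
    """Return a column header line followed by the X-chromosome rows only."""
    kept = [HEADER]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # drop the original comment lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and fields[1] == "X":
            kept.append("\t".join(fields[:4]) + "\n")
    return kept


def write_x_only(src_path, dst_path):
    """Write an X-only .txt copy of src_path; return the number of SNP rows."""
    with open(src_path, encoding="utf-8") as src:
        kept = x_only_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
    return len(kept) - 1


# Made-up sample rows standing in for a real raw data file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\tX\t2000\tA\n"
    "rs0000003\tX\t3000\tC\n"
    "rs0000005\tY\t5000\tG\n"
)
x_rows = x_only_lines(io.StringIO(SAMPLE))
print("".join(x_rows))
```

The original comment lines are dropped on purpose, so that nothing beyond the SNP rows themselves ends up in the file you send.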

The following instructions are copied (slightly modified to fit the forum format) from Ben’s signature over at DNA-forums:

The following links are for people who want to contribute to or download the X-Chromosome 23andme file. To contribute, email me the x-portion of your raw 23andme file in .txt format. The data will be anonymous on the sheet. If you would like me to include your family origins, please include them in your email. To download the excel spreadsheet, click the link below. A file in .zip format is now included.

Email Ben here:
benmoscia AT hotmail.com

23andme X-chromosome spreadsheet can be found here:
http://cid-bb940b89da5692bf.skydrive.live.com/self.aspx/.Public/X-23andme.xls

23andme X-file in .zip format is here:
http://cid-bb940b89da5692bf.skydrive.live.com/self.aspx/.Public/X-23andme.zip


Please note that if you are submitting your entire X chromosome’s raw data to this spreadsheet, you are effectively submitting private medical information for public viewing (including any gene mutations that may make you susceptible to certain medical conditions), so do not submit your data there if this is a concern to you.  Ben will anonymize your data and replace your name with a code number though, so the risk should be minimal unless you choose to share your name non-anonymously.

When you submit your data, it is very important that you provide your ancestry percentages (assuming you know who at least some of your ancestors were), for your X-chromosome ancestors only. You can figure these percentages using one of the charts on the following website:

http://freepages.genealogy.rootsweb.ancestry.com/~hulseberg/DNA/xinheritance.html

The following chart may be easier to use if you are a male (and can only be used by males):

http://www.thegeneticgenealogist.com/2009/01/12/more-x-chromosome-charts/

You don't need to know every one of your ancestors to estimate the percentages--just fill in as much as you can, and then add up the percentages from the outermost boxes that you were able to fill in, and make sure they total 100% (e.g., 50% Swedish / 25% Irish / 25% French).
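For anyone who prefers to compute the percentages rather than read them off a chart, here is a small sketch. It assumes only the usual X inheritance pattern (a man's X comes from his mother; a woman gets one X from each parent), so the figures are expected averages and not exact amounts. The function names and the "F"/"M" path notation are invented for this example.

```python
from fractions import Fraction


def x_shares(sex, generations, share=Fraction(1), path=""):
    """Expected X-chromosome share of each ancestor `generations` back.

    `path` is a string of steps from the tested person: "F" = father,
    "M" = mother (so "MF" is the mother's father).
    """
    if generations == 0:
        return {path: share}
    shares = {}
    if sex == "male":
        # A man inherits his X from his mother only.
        shares.update(x_shares("female", generations - 1, share, path + "M"))
    else:
        # A woman inherits one X from each parent: half each on average.
        shares.update(x_shares("male", generations - 1, share / 2, path + "F"))
        shares.update(x_shares("female", generations - 1, share / 2, path + "M"))
    return shares


def origin_totals(shares, origins):
    """Add up the shares by origin label for the ancestors that are known."""
    totals = {}
    for path, label in origins.items():
        totals[label] = totals.get(label, Fraction(0)) + shares[path]
    return totals


# A male tester, three generations back (great-grandparents).
shares = x_shares("male", 3)
totals = origin_totals(shares, {"MFM": "Swedish", "MMF": "Irish", "MMM": "French"})
for label, share in totals.items():
    print(f"{float(share) * 100:g}% {label}")
```

For a male, only three of the eight great-grandparents can have contributed X DNA, which is why this example reproduces the 50% / 25% / 25% split mentioned above.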


If anybody would care to explain the equivalent data extraction procedure for DeCODEme, I will add a link to your post in this thread.

Also note that I will not routinely automatically extract your data from Ben’s file and add it to the haploblock datasets for people who are listed non-anonymously (i.e., using your actual name instead of an anonymous ID number), so you will still need to send me those separate sequences (using the instructions in part #1 above) if you wish to be included in those blocks. 

3) There is a separate results spreadsheet for submitting results from X-STR testing, which you can request from me by sending me a PM.  See here for additional information and a link to the results page:

http://www.worldfamilies.net/forum/index.php?topic=8448.0
« Last Edit: May 20, 2009, 11:03:10 AM by Seán MacGorman Powell »

a.k.a., GhostX
DKF
« Reply #1 on: January 20, 2009, 08:52:10 PM »

I will be submitting my 23andme data in due course, and my decodeme data if I can figure out a way to do it.

What I also have is the full analysis of the HGDP-CEPH panel of 52 populations using the program PLINK.  It consists of over 25,000 lines of data and allows folks to see the matches within this dataset of over 1000 individuals and about 5 blocks each (huge variation here) when the bar is set at 1 Mb and 100 SNP exact match.  Already this data has taught me that there are specific locations where blocks join that are found in some populations but not others.  I have yet to come up with a molecular biological explanation of these observations but the data is very clear (I will add a posting on the subject later).  This data was analyzed by Anders Palsen and will be submitted with his permission.  The question is, how do I go about doing it?  The data is presently in a Zip format - can this be uploaded to WF (it is too large to upload to my personal website and link from there)?  Thanks.

X-chromosome:  56.25% England; 12.5% Scotland; 12.5% Ireland; 12.5% Germany; 6.25% North America (Lower Mohawk, Six Nations)
Seán MacGorman Powell
« Reply #2 on: January 20, 2009, 09:11:43 PM »

David,

The PLINK results you are referring to seem like they could be a fantastic resource.  I am still trying to figure out myself what we can do with data uploads on the project website.  There is the ability to attach data in relatively small files (under 1024 KB) to any given forum message, but what you are talking about sounds far larger. 

I currently have the ability to post two large spreadsheets on the project website (just how large I don't know, but we can experiment with that), and this capacity might be able to be expanded in the near future (I have to talk to Terry Barton about that).  One of the available slots is the spreadsheet that I am currently already testing here (the one for which I've asked for feedback), but if people don't find that spreadsheet useful, then we could certainly replace it with something else.

The other slot is currently available for whatever spreadsheet people here feel would be the most useful (e.g., your PLINK data).  I think I'd have to know what the file size is, to see if I can get it to work in the frame that's available.  I'd also need to know what the ZIP file unzips into (e.g., an Excel file, PDF, etc.).

You're welcome to e-mail it to me if you want me to take a look at it, or else you can just describe it to me further.

tomcat
« Reply #3 on: January 20, 2009, 10:51:37 PM »

...
What I also have is the full analysis of the HGDP-CEPH panel of 52 populations using the program PLINK.  It consists of over 25,000 lines of data and allows folks to see the matches within this dataset of over 1000 individuals and about 5 blocks each (huge variation here) when the bar is set at 1 Mb and 100 SNP exact match.  ...

A couple of things here: Can PLINK parameters be set to tighter limits, say 0.5 Mb or 50 SNPs? Is there a reason for the broader limits? Can't haploblocks come in smaller sizes?
And, how about a PLINK tutorial?

Paternal X: 100% Ukrainian Ashkenazi. Maternal X: 50% Upper Midwest Native American, 50% European.
DKF
« Reply #4 on: January 20, 2009, 11:20:48 PM »

I will send it to you by e-mail, GhostX, assuming that I can find your address, since I don't think a PM would work.  It unzips into an Excel file.  My data is included in the mix, as if I were a member of "the panel".  I have 6 matches (about average - my Xibo match has 8).

Alas, tomcat, I am not the PLINK expert.  Actually, I haven't even read the literature on this program, let alone experimented with it and burned up days of computer time on my laptop doing the analyses.  Only Anders has done this.  I hope that he will join our group.

There are many different programs, each does something a little different, but after trying most of them Anders seems to have found PLINK to best meet our objectives.  All of these programs are available online for those willing to download them and experiment a bit.  I am not quite ready for this.  At the moment my focus is on collecting references and outlining specifics about the X as background to understanding the output.  Perhaps someone with a solid math - stats background would be willing to get into the act here.

Svaale
« Reply #5 on: January 21, 2009, 04:16:48 AM »

A couple of things here: Can PLINK parameters be set to tighter limits, say 0.5 Mb or 50 SNPs? Is there a reason for the broader limits? Can't haploblocks come in smaller sizes?
And, how about a PLINK tutorial?

Yes, you can, but the output file will be several hundred megabytes in size and almost unmanageable. In addition, you must be able to handle an enormous amount of usable and unusable block information, mostly the latter.

The tutorial is here: http://pngu.mgh.harvard.edu/~purcell/plink/tutorial.shtml
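To show why looser limits produce more (and mostly shorter) blocks, here is a toy sketch of thresholds like "1 Mb and 100 SNP exact match". This is not PLINK and not its algorithm. It simply scans two single-copy (male) X sequences typed at the same SNPs and reports runs of identical calls that meet both limits. The function name and the tiny made-up data are for illustration only.

```python
def shared_blocks(positions, seq_a, seq_b, min_snps=100, min_bp=1_000_000):
    """Return (start_pos, end_pos, n_snps) for each run of identical calls
    that spans at least min_snps SNPs and at least min_bp base pairs."""
    blocks = []
    run_start = None
    n = len(positions)
    for i in range(n + 1):
        same = i < n and seq_a[i] == seq_b[i]
        if same and run_start is None:
            run_start = i  # a new matching run begins here
        elif not same and run_start is not None:
            run_end = i - 1  # the run ended at the previous SNP
            n_snps = run_end - run_start + 1
            span = positions[run_end] - positions[run_start]
            if n_snps >= min_snps and span >= min_bp:
                blocks.append((positions[run_start], positions[run_end], n_snps))
            run_start = None
    return blocks


# Made-up, scaled-down data: 10 SNPs spaced 100 bp apart.
positions = list(range(0, 1000, 100))
seq_a = "AAAAAAAAAA"
seq_b = "AAAAACAACA"

strict = shared_blocks(positions, seq_a, seq_b, min_snps=4, min_bp=300)
loose = shared_blocks(positions, seq_a, seq_b, min_snps=2, min_bp=100)
print(strict)
print(loose)
```

Even in this tiny example the looser limits pick up an extra short run, and across a panel of over 1000 individuals that kind of growth adds up quickly.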
« Last Edit: January 21, 2009, 04:23:25 AM by Svaale »
DKF
« Reply #6 on: January 22, 2009, 10:17:26 PM »

I am a bit unclear here, GhostX (not an uncommon occurrence).  I submitted the complete 23andme dataset to Ben and he trimmed it and included the X part.  I have no concerns about privacy issues surrounding SNPs embedded in genes - my DNA is an open book.  Can you obtain the data directly from Ben's site (it would seem to be the simplest approach), since I have given here and now my permission to include my data in any analysis you or others here care to perform?

Secondly, it would appear that the vast majority of people here have either tested with 23andme or with both decodeme and 23andme.  Hence there does not seem to be a good reason to send the decodeme file - although you are welcome to it if it helps in any way.

Seán MacGorman Powell
« Reply #7 on: January 22, 2009, 11:51:16 PM »

David,

Yes, in your case I can get your data for the various haploblocks from Ben's spreadsheet, and I'll be happy to do that as soon as I'm done extracting the remaining haploblocks from the dna-forums discussions (just so I can do it all at once rather than having to keep going back to your data with each block that I post).  Feel free to remind me in a couple of days if I forget.

I didn't want to offer to automatically do that with everyone though, for various reasons (partly because it's just too much work for me to go back and do that for everybody just yet).  Once I'm done getting the project website all set up, then maybe I'll go back and try to reassign data from people who have de-anonymized themselves (if I can keep everybody straight--it's getting confusing with people listing their names in different ways, and with different instances of the same name--sometimes it's different family members, and sometimes it's just a different chromosome for the same person!).  Incidentally, people who are only listed by first name at this point (or by a common surname) are at risk of getting lost in the shuffle with all the new names that I keep adding to the results sheet, so people can let me know if they want to be listed in a more specific fashion.  In a couple of instances where two or more different family members are listed, I've had to guess which result went with which family member.

In the meantime, if anybody sends me SNP sequence(s) via PM or e-mail and tells me how they want to be listed (by name or otherwise), I'll add it to the results chart immediately, or move it from your anonymous listing to a listing by specific name.

Regarding your second question: No, I don't need anybody to send me their DeCODEme file.  What I meant in my earlier post is that if somebody wants to write out the procedure for extracting data from the DeCODEme raw data, then I'll post a link to that writeup (or just paste the procedure in my original post), so that other DeCODEme customers will know how to extract theirs.

Thanks for asking for the clarification--I probably could have been more clear in my original message.

Seán MacGorman Powell
« Reply #8 on: January 31, 2009, 04:14:39 PM »

I've changed the data-submission procedures, and have modified the original post above accordingly.  Results may now be sent to me on a template spreadsheet that I will send you.