New SNP Chip Validation Process Requirements
- Submit a completed CDCB SNP array validation form. NOTE: All the requested information MUST be provided.
- For SNP Array Full Name, please follow the format below:
[Technology (e.g. Illumina or Affymetrix)] - [Company/Collaborator name] - [SNP array name] - [description / version # (if applicable)]
- Pay the corresponding CDCB SNP array validation fees.
- The following SNP must be included on the new chip:
- Chips must include ALL 195 SNP in the attached list used for parentage verification and discovery (see ICAR documentation).
- Chips must include at least 3480 of the 3552 fast discovery SNP in the attached list, including all of the first 96, which are labeled critical.
- Chips must include at least 350 of the 550 SNP in the attached list used for the quick discovery service (QDisc).
- QDisc is based on the ICAR 550 SNP subset: of the 554 SNP in the full ICAR parentage discovery list, 4 are excluded as recommended by the ICAR guidelines.
- Chips must include at least 10 Y SNP for gender verification. See the attached file for examples of Y SNP commonly used for genotyping.
- Provide one of the following to assess the accuracy of the genotype calls:
- At least 50 genotypes from the new chip for animals already genotyped with one of the Chips Used in CDCB Evaluation that includes most of the SNP on the new chip.
- Genotypes from at least 50 animals for which at least one parent has been genotyped with one of the Chips Used in CDCB Evaluation that has substantial overlap with the new chip.
- SNP coordinates should be based on the ARS-UCD1.2 assembly (Rosen et al. 2018 WCGALP, vol. Molecular Genetics 3, p. 802 and https://www.ncbi.nlm.nih.gov/assembly/GCA_002263795.2/)
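The SNP content requirements listed above can be checked before submission with a few set operations. The following is a minimal sketch, not a CDCB tool: the function name `check_chip_content` and its parameters are hypothetical, the reference lists are assumed to have been loaded from the attached files by the caller, and the Y SNP on the chip are assumed to be identified by the caller as well.

```python
def check_chip_content(chip_snps, parentage, fast_discovery, critical, qdisc,
                       chip_y_snps, min_fast=3480, min_qdisc=350, min_y=10):
    """Return a dict mapping each content requirement to True/False.

    All arguments except the thresholds are iterables of SNP names.
    The default thresholds are the ones stated in the requirements;
    the function itself is a hypothetical helper for illustration.
    """
    chip = set(chip_snps)
    return {
        # every parentage SNP (195 in the attached list) must be on the chip
        "parentage_complete": set(parentage) <= chip,
        # at least 3480 of the 3552 fast discovery SNP
        "fast_discovery_enough": len(chip & set(fast_discovery)) >= min_fast,
        # every SNP labeled critical (the first 96) must be on the chip
        "critical_complete": set(critical) <= chip,
        # at least 350 of the 550 QDisc SNP
        "qdisc_enough": len(chip & set(qdisc)) >= min_qdisc,
        # at least 10 Y SNP for gender verification
        "y_snp_enough": len(set(chip_y_snps)) >= min_y,
    }


def failed_requirements(result):
    """List the names of the requirements that were not met."""
    return sorted(name for name, ok in result.items() if not ok)
```

Any name returned by `failed_requirements` points to a requirement to revisit before the validation form is submitted.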
The above data is to be submitted to CDCB in 3 files
1) Description of the SNP (SNP manifest/map file):
Name,Index,Chr,Position
SNP_Name_1,1,19,123456789
SNP_Name_2,2,30,11223344
SNP_Name_3,3,5,98765432
SNP_Name_4,4,1,123400000
- The above columns are REQUIRED, with the preferred order as shown
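A manifest in this layout can be sanity-checked with Python's standard `csv` module. This is a minimal sketch under the assumption that the file is comma-separated with a header row, as in the example; `read_manifest` is a hypothetical helper name, not part of any CDCB software.

```python
import csv
import io

REQUIRED_COLUMNS = ["Name", "Index", "Chr", "Position"]


def read_manifest(text):
    """Parse manifest text and return a list of row dicts.

    Raises ValueError if a required column is missing. Index and
    Position are converted to int; Chr is kept as a string because
    chromosome labels are not always numeric (for example X or Y).
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing required columns: {missing}")
    return [
        {
            "Name": row["Name"],
            "Index": int(row["Index"]),
            "Chr": row["Chr"],
            "Position": int(row["Position"]),
        }
        for row in reader
    ]


example_manifest = """Name,Index,Chr,Position
SNP_Name_1,1,19,123456789
SNP_Name_2,2,30,11223344
SNP_Name_3,3,5,98765432
SNP_Name_4,4,1,123400000
"""

manifest_rows = read_manifest(example_manifest)
```

Because columns are looked up by name, the check works even when the columns are not in the preferred order.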
2) Genotypes (Final Report file):
[Header]
Version 1.1.1
Processing Date 14-Mar-2023 03:44:44 PM
Content Test
Num SNPs 4
Total SNPs 4
Num Samples 3
Total Samples 3
[Data]
AB01234567 AB01234568 AB01234569
SNP_Name_1 BB AA AB
SNP_Name_2 AB AB AB
SNP_Name_3 AA AB AA
SNP_Name_4 BB BB BB
- Matrix format is preferred (rows are SNP, columns are samples)
- Standard format (all data in rows) can be submitted if matrix format is not available
- Sample_ID in the genotype file should match the Sample_ID in the sample sheet file
- SNP names may be at most 44 characters long
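The genotype file rules above (matrix layout, matching Sample_ID values, 44-character SNP names) can be checked with a short script. The sketch below assumes a tab-delimited matrix report whose first row after `[Data]` holds the sample IDs; the delimiter, the function names, and the shortened example are assumptions for illustration only.

```python
MAX_SNP_NAME_LENGTH = 44  # limit stated in the requirements


def parse_matrix_report(text, delimiter="\t"):
    """Parse the [Data] section of a matrix-format report.

    Returns (sample_ids, genotypes), where genotypes maps
    SNP name -> {sample_id: genotype call}.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    start = next((i for i, ln in enumerate(lines) if ln.strip() == "[Data]"), None)
    if start is None or start + 1 >= len(lines):
        raise ValueError("no [Data] section with a sample header row found")
    # Header row: sample IDs, possibly preceded by an empty first cell.
    sample_ids = [c.strip() for c in lines[start + 1].split(delimiter) if c.strip()]
    genotypes = {}
    for ln in lines[start + 2:]:
        cells = [c.strip() for c in ln.split(delimiter)]
        name, calls = cells[0], cells[1:]
        if len(name) > MAX_SNP_NAME_LENGTH:
            raise ValueError(f"SNP name longer than {MAX_SNP_NAME_LENGTH} characters: {name}")
        if len(calls) != len(sample_ids):
            raise ValueError(f"row {name} has {len(calls)} calls for {len(sample_ids)} samples")
        genotypes[name] = dict(zip(sample_ids, calls))
    return sample_ids, genotypes


def unmatched_samples(report_sample_ids, sheet_sample_ids):
    """Sample IDs present in the genotype file but absent from the sample sheet."""
    return sorted(set(report_sample_ids) - set(sheet_sample_ids))


example_report = "\n".join([
    "[Header]",
    "Num SNPs\t2",
    "Num Samples\t3",
    "[Data]",
    "\tAB01234567\tAB01234568\tAB01234569",
    "SNP_Name_1\tBB\tAA\tAB",
    "SNP_Name_2\tAB\tAB\tAB",
])

samples, genotypes = parse_matrix_report(example_report)
```

An empty list from `unmatched_samples` means every Sample_ID in the genotype file also appears in the sample sheet.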
3) Information about the submitted samples (Sample sheet file):
[Header],,,,,,,,,,,,,,,,,,,,,,,
Investigator Name,"Doe, John",,,,,,,,,,,,,,,,,,,,,,
Project Name,2023031411_ABC_AB,,,,,,,,,,,,,,,,,,,,,,
Experiment Name,,,,,,,,,,,,,,,,,,,,,,,
Date,44916,,,,,,,,,,,,,,,,,,,,,,
[Manifests],,,,,,,,,,,,,,,,,,,,,,,
A,A_Dairy_Chip,,,,,,,,,,,,,,,,,,,,,,
[Data],,,,,,,,,,,,,,,,,,,,,,,
Sample_ID,Sample_Plate,Sample_Name,Project,AMP_Plate,Sample_Well,SentrixBarcode_A,SentrixPosition_A,Scanner,Date_Scan,Replicate,Parent1,Parent2,Gender,Sample Type
AB01234567,123456,HOUSA000000000001,Project1,222222,E1,200000000000,R01C01,,,,,,,Tissue
AB01234568,123456,HOUSA000000000002,Project1,222222,B2,200000000001,R01C02,,,,,,,Tissue
AB01234569,123456,HOUSA000000000003,Project1,222222,C3,200000000002,R01C03,,,,,,,Tissue
- At minimum, the Sample sheet file must contain columns for Sample_ID, Sample_Name, Barcode, and Position
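The minimum-column rule for the sample sheet can be verified by reading only the `[Data]` section. In this sketch the barcode and position columns are assumed to be named `SentrixBarcode_A` and `SentrixPosition_A`, as in the example sheet; that mapping, the helper name `read_sample_sheet`, and the shortened example are assumptions, not something the requirements state.

```python
import csv
import io

# Assumed column names for the required barcode and position fields.
REQUIRED_SHEET_COLUMNS = ["Sample_ID", "Sample_Name", "SentrixBarcode_A", "SentrixPosition_A"]


def read_sample_sheet(text):
    """Return the rows of the [Data] section as a list of dicts.

    Raises ValueError if the section or a required column is missing.
    """
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("[Data]")), None)
    if start is None:
        raise ValueError("sample sheet has no [Data] section")
    reader = csv.DictReader(io.StringIO("\n".join(lines[start + 1:])))
    missing = [c for c in REQUIRED_SHEET_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing required columns: {missing}")
    # Skip rows without a Sample_ID (for example trailing lines of commas).
    return [row for row in reader if row.get("Sample_ID")]


example_sheet = """[Header],,,,,,,,,,,,,,
Investigator Name,"Doe, John",,,,,,,,,,,,,
[Manifests],,,,,,,,,,,,,,
A,A_Dairy_Chip,,,,,,,,,,,,,
[Data],,,,,,,,,,,,,,
Sample_ID,Sample_Plate,Sample_Name,Project,AMP_Plate,Sample_Well,SentrixBarcode_A,SentrixPosition_A,Scanner,Date_Scan,Replicate,Parent1,Parent2,Gender,Sample Type
AB01234567,123456,HOUSA000000000001,Project1,222222,E1,200000000000,R01C01,,,,,,,Tissue
AB01234568,123456,HOUSA000000000002,Project1,222222,B2,200000000001,R01C02,,,,,,,Tissue
"""

sheet_rows = read_sample_sheet(example_sheet)
sheet_sample_ids = [row["Sample_ID"] for row in sheet_rows]
```

The resulting `sheet_sample_ids` list is what the Sample_ID values in the genotype file would be compared against.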
For additional details on the file formats, please see:
https://redmine.uscdcb.com/projects/cdcb-customer-service/wiki/CDCB_Accepted_genotype_file_formats