CDCB SNP Array Validation Process
- Submit a completed SNP array validation form
- All requested information MUST be provided.
- Provide the SNP Array Full Name following this format:
[Technology (e.g. Illumina or Affymetrix)] - [Company/Collaborator name] - [SNP array name] - [description / version # (if applicable)]
- Submit payment of the CDCB SNP array validation fees.
- Submit SNP array data that meet the following requirements:
- SNP Content:
- The chip must include all 195 SNPs used for parentage verification and discovery (see ICAR documentation).
- For data to qualify for ICAR parentage certification, at least 95% of the ICAR 195 SNPs must have genotype calls (minimum 185/195 non-missing).
- The chip must contain at least 3480 of the 3552 "fast discovery" SNPs in the attached list, including all of the first 96, which are labeled as critical.
- The chip must contain at least 350 of the 550 SNPs used for the Quick Discovery Service (QDisc).
- Note: QDisc is based on a 550-SNP subset of the 554-SNP ICAR parentage discovery list; 4 SNPs are excluded per ICAR recommendations.
- The chip must include at least 10 Y SNPs for gender verification (examples provided in the attached file).
- Genotype Concordance (to assess genotype call accuracy, including both male and female samples)
Provide one of the following:
- New-chip genotypes for at least 51 animals already genotyped on a chip listed in Chips Used in CDCB Evaluation that includes most of the SNPs on the new array; or
- Genotypes from at least 51 animals for which at least one parent has been genotyped on a chip listed in Chips Used in CDCB Evaluation that has substantial SNP overlap with the new array.
- SNP Coordinates
- SNP genomic coordinates must be based on the ARS-UCD1.2 assembly (Rosen et al. 2018 WCGALP, vol. Molecular Genetics 3, p. 802 and https://www.ncbi.nlm.nih.gov/assembly/GCA_002263795.2/).
- Manifest File
- The SNP manifest file must include flanking sequences for all SNPs to support verification and alignment of SNP positions.
- At least 50 bp of sequence upstream and downstream of each SNP is preferred.
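The SNP Content thresholds above amount to counting how many SNPs from each required list are present on the candidate chip. A minimal Python sketch follows, assuming each required list has already been loaded as a Python list of SNP names (for example from the attached files); the function names are hypothetical and this is not a CDCB tool. For the per-sample ICAR rule it checks the explicit minimum of 185 non-missing calls, since 185 of 195 is about 94.9%. Treating "--" as the missing-call code is an assumption based on common Illumina Final Report output.

```python
# Minimal sketch, not CDCB's validation software. The required-list arguments
# are hypothetical inputs, assumed to be loaded from the attached SNP lists.


def check_content(chip_snps, icar_195, fast_discovery, qdisc_550, y_snps):
    """Return {rule: (found_on_chip, minimum_required, passed)}.

    fast_discovery must keep the order of the attached list, so that its
    first 96 entries are the SNPs labeled as critical.
    """
    on_chip = set(chip_snps)

    def found(required):
        # Number of SNPs from a required list that are present on the chip
        return sum(1 for snp in required if snp in on_chip)

    rules = {
        "ICAR parentage SNPs (all 195)": (found(icar_195), len(icar_195)),
        "fast discovery SNPs (>= 3480 of 3552)": (found(fast_discovery), 3480),
        "critical fast discovery SNPs (all first 96)": (found(fast_discovery[:96]), 96),
        "QDisc SNPs (>= 350 of 550)": (found(qdisc_550), 350),
        "Y SNPs (>= 10)": (found(y_snps), 10),
    }
    return {rule: (n, need, n >= need) for rule, (n, need) in rules.items()}


def icar_calls_ok(sample_calls, icar_195, min_calls=185, missing="--"):
    """Per-sample ICAR certification check: at least min_calls non-missing calls.

    sample_calls maps SNP name -> genotype call (e.g. "AB"); "--" as the
    missing code is an assumption based on common Illumina Final Report output.
    """
    called = sum(1 for snp in icar_195 if sample_calls.get(snp, missing) != missing)
    return called >= min_calls
```

A chip meets the content rules in this sketch when every value returned by `check_content` has `True` as its third element.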
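The Genotype Concordance requirement above can be summarized with two simple measures: the share of matching calls for animals genotyped on both chips (first option) and the number of opposing-homozygote conflicts between an animal and a genotyped parent (second option). The sketch below is illustrative only: it assumes genotypes stored as nested dicts with Illumina A/B calls coded identically on both chips, the function names are hypothetical, and no acceptance threshold is implied because none is stated here.

```python
# Minimal sketch of two concordance measures. Genotypes are assumed to be
# {animal_id: {snp_name: call}} with calls "AA", "AB", "BB", or "--" (missing),
# and A/B alleles are assumed to be coded the same way on both chips.

MISSING = "--"


def call_concordance(new_chip, reference):
    """First option: matching calls for animals genotyped on both chips.

    Only SNPs present on both chips and called in both are compared.
    Returns (matching_calls, compared_calls).
    """
    match = compared = 0
    for animal in new_chip.keys() & reference.keys():
        new_calls, ref_calls = new_chip[animal], reference[animal]
        for snp in new_calls.keys() & ref_calls.keys():
            a, b = new_calls[snp], ref_calls[snp]
            if MISSING in (a, b):
                continue
            compared += 1
            match += a == b
    return match, compared


def opposing_homozygotes(offspring, parent):
    """Second option: count SNPs where offspring and parent are AA vs BB.

    A true parent and offspring cannot be opposite homozygotes, so conflicts
    point to genotyping errors (or a wrong pedigree link).
    Returns (conflicts, compared_snps).
    """
    conflicts = compared = 0
    for snp in offspring.keys() & parent.keys():
        a, b = offspring[snp], parent[snp]
        if MISSING in (a, b):
            continue
        compared += 1
        conflicts += {a, b} == {"AA", "BB"}
    return conflicts, compared
```

Dividing the first element of each result by the second gives a concordance or conflict rate across the submitted samples.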
- The data above must be submitted to CDCB in three files:
- Description of the SNPs (SNP manifest/map file):
Name,Index,Chr,Position,FlankingSequence
SNP_Name_1,1,19,123456789,GCAGTGGCACCTGCTCCCTTCTTCCTAGGTGCGCTTCTGTACGCTTACTA[A/T]ATCTCGGCTACATCGGCTACAATTGCGTGTTATGCTCGAGGCTTACACCT
SNP_Name_2,2,30,11223344,CGAGTGGAAATTGCTCACTTATGGCTAGGTGAGATTCTCTAGCCTTAGTA[C/G]CGCCTGGCTAGACTGCATAACCGGTGCGTGTTACGGTCCATTCATAGACA
SNP_Name_3,3,5,98765432,CTTGAGCATGTCGCGAACCTCAGGCAATGTGTGACTCTTTAGTCTGTGTA[C/A]AATCTTACTAGAGGGCATAGTCGATGCGAGTCACTGTACATGCAGAGATT
- The above columns are REQUIRED, with the preferred order as shown
- Genotypes (Final Report file):
[Header]
Version 1.1.1
Processing Date 14-Mar-2023 03:44:44 PM
Content Test
Num SNPs 4
Total SNPs 4
Num Samples 3
Total Samples 3
[Data]
AB01234567 AB01234568 AB01234569
SNP_Name_1 BB AA AB
SNP_Name_2 AB AB AB
SNP_Name_3 AA AB AA
SNP_Name_4 BB BB BB
- Matrix format is preferred (rows are SNPs, columns are samples)
- Standard format (all data in rows) can be submitted if matrix format is not available
- Sample_IDs in the genotype file should match the Sample_IDs in the sample sheet file
- SNP names may be at most 44 characters long
- Information about the submitted samples (Sample sheet file):
[Header],,,,,,,,,,,,,,,,,,,,,,,
Investigator Name,"Doe, John",,,,,,,,,,,,,,,,,,,,,,
Project Name,2023031411_ABC_AB,,,,,,,,,,,,,,,,,,,,,,
Experiment Name,,,,,,,,,,,,,,,,,,,,,,,
Date,44916,,,,,,,,,,,,,,,,,,,,,,
[Manifests],,,,,,,,,,,,,,,,,,,,,,,
A,A_Dairy_Chip,,,,,,,,,,,,,,,,,,,,,,
[Data],,,,,,,,,,,,,,,,,,,,,,,
Sample_ID,Sample_Plate,Sample_Name,Project,AMP_Plate,Sample_Well,SentrixBarcode_A,SentrixPosition_A,Scanner,Date_Scan,Replicate,Parent1,Parent2,Gender,Sample Type
AB01234567,123456,HOUSA000000000001,Project1,222222,E1,200000000000,R01C01,,,,,,,Tissue
AB01234568,123456,HOUSA000000000002,Project1,222222,B2,200000000001,R01C02,,,,,,,Tissue
AB01234569,123456,HOUSA000000000003,Project1,222222,C3,200000000002,R01C03,,,,,,,Tissue
- At minimum, the sample sheet file must contain the Sample_ID, Sample_Name, Barcode, and Position columns (SentrixBarcode_A and SentrixPosition_A in the example above).
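The manifest/map file rules above (required columns, flanking sequence around each SNP, SNP names of at most 44 characters) can be checked with Python's standard `csv` and `re` modules. This is a sketch assuming comma-separated input with the column names shown in the example and single-base alleles written as `[X/Y]`; markers using any other allele notation would be reported as problems and need separate handling. The 50 bp check reports a shortfall against the stated preference rather than a hard requirement. `check_manifest` is a hypothetical name.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["Name", "Index", "Chr", "Position", "FlankingSequence"]
MAX_NAME_LENGTH = 44
PREFERRED_FLANK_BP = 50
# <upstream>[allele/allele]<downstream>; assumes single-base alleles such as [A/T]
FLANK_RE = re.compile(r"^([ACGTN]*)\[([ACGT])/([ACGT])\]([ACGTN]*)$", re.IGNORECASE)


def check_manifest(text):
    """Return a list of problems found in a manifest/map CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fields]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    for row in reader:
        name = (row["Name"] or "").strip()
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name}: name longer than {MAX_NAME_LENGTH} characters")
        match = FLANK_RE.match((row["FlankingSequence"] or "").strip())
        if match is None:
            problems.append(f"{name}: flanking sequence not in SEQ[X/Y]SEQ form")
        elif min(len(match.group(1)), len(match.group(4))) < PREFERRED_FLANK_BP:
            # Preference, not a hard rule: report it so it can be reviewed
            problems.append(f"{name}: less than the preferred {PREFERRED_FLANK_BP} bp on one side")
    return problems
```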
For additional details on the file formats, please see:
https://redmine.uscdcb.com/projects/cdcb-customer-service/wiki/CDCB_Accepted_genotype_file_formats
Lists of the required SNPs are provided in the attached files.
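Matching Sample_IDs between the genotype file and the sample sheet, and checking SNP name length, can be done by reading the [Data] section of each file. The sketch below handles only the matrix Final Report format and assumes it is tab-separated with a blank first header cell, as is common in Illumina output; the standard (row) format is not covered. It treats the Barcode and Position columns as SentrixBarcode_A and SentrixPosition_A, as in the example sample sheet. All function names are hypothetical.

```python
import csv

MAX_SNP_NAME = 44
# Assumed minimum columns; Barcode/Position taken as SentrixBarcode_A/SentrixPosition_A
SHEET_MIN_COLUMNS = ["Sample_ID", "Sample_Name", "SentrixBarcode_A", "SentrixPosition_A"]


def section_lines(text, section="[Data]"):
    """Return the lines after a section marker, up to the next "[...]" marker."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # First cell of the line, whether the file is comma- or tab-separated
        if line.split(",")[0].split("\t")[0].strip() == section:
            break
    else:
        raise ValueError(f"no {section} section found")
    body = []
    for line in lines[i + 1:]:
        if line.startswith("["):
            break
        body.append(line)
    return body


def read_matrix_report(text, sep="\t"):
    """Parse a matrix Final Report into (sample_ids, {snp_name: [calls]})."""
    rows = [r for r in csv.reader(section_lines(text), delimiter=sep) if r]
    if not rows:
        return [], {}
    sample_ids = [s.strip() for s in rows[0][1:]]  # first header cell is blank
    calls = {r[0].strip(): [c.strip() for c in r[1:]] for r in rows[1:]}
    return sample_ids, calls


def read_sample_sheet(text):
    """Parse the [Data] section of a CSV sample sheet into (header, records)."""
    rows = [r for r in csv.reader(section_lines(text)) if any(c.strip() for c in r)]
    if not rows:
        return [], []
    header = [h.strip() for h in rows[0]]
    return header, [dict(zip(header, r)) for r in rows[1:]]


def cross_check(report_text, sheet_text):
    """Return a list of consistency problems between the two files (empty if none)."""
    sample_ids, calls = read_matrix_report(report_text)
    header, records = read_sample_sheet(sheet_text)
    problems = [f"sample sheet lacks column {c}" for c in SHEET_MIN_COLUMNS if c not in header]
    sheet_ids = {rec.get("Sample_ID", "").strip() for rec in records} - {""}
    problems += [f"{s}: in genotype file but not in sample sheet" for s in sample_ids if s not in sheet_ids]
    problems += [f"{s}: in sample sheet but not in genotype file" for s in sorted(sheet_ids - set(sample_ids))]
    for snp, row in calls.items():
        if len(snp) > MAX_SNP_NAME:
            problems.append(f"{snp}: SNP name longer than {MAX_SNP_NAME} characters")
        if len(row) != len(sample_ids):
            problems.append(f"{snp}: {len(row)} calls for {len(sample_ids)} samples")
    return problems
```

An empty list from `cross_check` means every genotyped sample appears in the sample sheet and vice versa, under the assumptions above.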