CDCB SNP Array Validation Process

Submit a completed SNP array validation form
- All requested information MUST be provided.
Provide the SNP Array Full Name following this format:
[Technology (e.g. Illumina or Affymetrix)] - [Company/Collaborator name] - [SNP array name] - [description / version # (if applicable)]
Submit payment for the CDCB SNP array validation fees.
Submit SNP array data that meets the following requirements:
1. SNP Content:
  1. The chip must include all 195 SNPs used for parentage verification and discovery (See ICAR documentation).
  2. For data to qualify for ICAR parentage certification, at least 95% of the ICAR 195 SNPs must have genotype calls (minimum 185/195 non-missing).
  3. The chip must contain at least 3480 of the 3552 "fast discovery" SNPs in the attached list, including all of the first 96, which are labeled as critical.
  4. The chip must contain at least 350 of the 550 SNPs used for Quick Discovery Service (QDisc).
    - Note: QDisc is based on the ICAR 550 SNP subset. Of the 554 SNPs in the full ICAR list for parentage discovery, 4 are excluded as per ICAR recommendations.
  5. The chip must include at least 10 Y SNPs for gender verification (examples provided in the attached file).
2. Genotype Concordance (to assess genotype call accuracy, including both male and female samples)
  Provide one of the following:
  - At least 51 genotypes from the new chip for animals already genotyped with one of the Chips Used in CDCB Evaluation that includes most of the SNPs on the new array; or
  - Genotypes from at least 51 animals for which at least one parent has been genotyped with one of the Chips Used in CDCB Evaluation that has substantial SNP overlap with the new array.
3. SNP Coordinates
  - SNP genomic coordinates must be based on the ARS-UCD1.2 assembly (Rosen et al. 2018 WCGALP, vol. Molecular Genetics 3, p. 802 and https://www.ncbi.nlm.nih.gov/assembly/GCA_002263795.2/).
4. Manifest File
  - The SNP manifest file must include flanking sequences for all SNPs to support verification and alignment of SNP positions.
  - At least 50 bp of sequence both upstream and downstream of the SNP is preferred.
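
The SNP content thresholds in requirement 1 can be checked before submission with simple set arithmetic. The following is a minimal sketch, not a CDCB tool. The function name `check_snp_content` and its arguments are hypothetical. It assumes the required lists have already been loaded from the attached files as lists of SNP names, that the fast discovery list is in its original order (so its first 96 entries are the critical SNPs), and that `y_snps` is a list of Y SNP names accepted for gender verification.

```python
MIN_FAST_DISCOVERY = 3480  # of the 3552 "fast discovery" SNPs
N_CRITICAL = 96            # first 96 entries of the fast discovery list
MIN_QDISC = 350            # of the 550 QDisc SNPs
MIN_Y_SNPS = 10            # Y SNPs for gender verification


def check_snp_content(chip_snps, icar195, fast_discovery, qdisc, y_snps):
    """Return {requirement name: True/False} for the SNP content rules.

    All arguments are iterables of SNP names; fast_discovery must be an
    ordered list (assumption: critical SNPs come first).
    """
    chip = set(chip_snps)
    critical = set(fast_discovery[:N_CRITICAL])
    return {
        # 1.1: every ICAR parentage SNP must be on the chip
        "icar_parentage": set(icar195) <= chip,
        # 1.3: enough fast discovery SNPs, and every critical one
        "fast_discovery": len(chip & set(fast_discovery)) >= MIN_FAST_DISCOVERY,
        "fast_discovery_critical": critical <= chip,
        # 1.4: enough QDisc SNPs
        "qdisc": len(chip & set(qdisc)) >= MIN_QDISC,
        # 1.5: enough Y SNPs
        "y_snps": len(chip & set(y_snps)) >= MIN_Y_SNPS,
    }
```

Any entry that comes back `False` points to the requirement the chip does not yet meet. Matching is by exact SNP name, so names on the chip are assumed to be spelled as in the attached lists.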

The above data must be submitted to CDCB in three files:

Description of the SNP (SNP manifest/map file):

Name,Index,Chr,Position,FlankingSequence
SNP_Name_1,1,19,123456789,GCAGTGGCACCTGCTCCCTTCTTCCTAGGTGCGCTTCTGTACGCTTACTA[A/T]ATCTCGGCTACATCGGCTACAATTGCGTGTTATGCTCGAGGCTTACACCT
SNP_Name_2,2,30,11223344,CGAGTGGAAATTGCTCACTTATGGCTAGGTGAGATTCTCTAGCCTTAGTA[C/G]CGCCTGGCTAGACTGCATAACCGGTGCGTGTTACGGTCCATTCATAGACA
SNP_Name_3,3,5,98765432,CTTGAGCATGTCGCGAACCTCAGGCAATGTGTGACTCTTTAGTCTGTGTA[C/A]AATCTTACTAGAGGGCATAGTCGATGCGAGTCACTGTACATGCAGAGATT

The above columns are REQUIRED; the order shown is preferred.
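
A manifest can be screened for the required columns and for flanking sequence length before it is sent. The sketch below is illustrative only. The function name `check_manifest` is hypothetical, and it assumes the manifest is a comma-separated file whose flanking sequences are written as upstream bases, the two alleles in square brackets separated by a slash, then downstream bases, as in the example rows above.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["Name", "Index", "Chr", "Position", "FlankingSequence"]
# upstream[allele1/allele2]downstream, e.g. ...ACTA[A/T]ATCT...
FLANK_RE = re.compile(r"^([A-Za-z]*)\[([^\[\]/]+)/([^\[\]/]+)\]([A-Za-z]*)$")


def check_manifest(text, min_flank=50):
    """Return a list of problems found in a manifest given as CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        return ["missing required columns: " + ", ".join(missing)]

    problems = []
    for row in reader:
        name = row["Name"]
        if not row["Position"].strip().isdigit():
            problems.append(f"{name}: position is not a whole number")
        match = FLANK_RE.match(row["FlankingSequence"].strip())
        if match is None:
            problems.append(f"{name}: flanking sequence not in upstream[A/B]downstream form")
            continue
        upstream, downstream = match.group(1), match.group(4)
        if len(upstream) < min_flank or len(downstream) < min_flank:
            problems.append(
                f"{name}: flanks are {len(upstream)} and {len(downstream)} bp, "
                f"below the preferred {min_flank} bp"
            )
    return problems
```

An empty list means no problems were detected by these checks. The 50 bp threshold is a preference rather than a hard requirement, so a short flank is reported but is not necessarily a reason for rejection.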

Genotypes (Final Report file):

[Header]
Version    1.1.1
Processing Date 14-Mar-2023 03:44:44 PM
Content Test
Num SNPs        4
Total SNPs      4
Num Samples     3
Total Samples   3
[Data]
        AB01234567        AB01234568        AB01234569
SNP_Name_1        BB        AA        AB
SNP_Name_2        AB        AB        AB
SNP_Name_3        AA        AB        AA
SNP_Name_4        BB        BB        BB

Matrix format is preferred (rows are SNPs, columns are samples).
- Standard format (all data in rows) can be submitted if matrix format is not available.
Sample_ID values in the genotype file should match the Sample_ID values in the sample sheet file.
SNP names may be at most 44 characters long.
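
The matrix layout above is straightforward to read programmatically, which makes it possible to check the 44-character name limit and to count non-missing calls per sample for the ICAR 195 SNPs (minimum 185). The sketch below is an illustration under stated assumptions, not a CDCB parser. The function names are hypothetical, fields are assumed to be separated by whitespace with no spaces inside IDs, and a missing call is assumed to be written as `--`.

```python
MAX_SNP_NAME_LEN = 44


def read_matrix_report(text):
    """Parse a matrix-format Final Report into (sample_ids, {snp: [calls]})."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    start = stripped.index("[Data]")  # raises ValueError if the section is absent
    data = [line for line in lines[start + 1:] if line.strip()]
    samples = data[0].split()         # first data line holds the sample IDs
    genotypes = {}
    for line in data[1:]:
        fields = line.split()
        genotypes[fields[0]] = fields[1:]
    return samples, genotypes


def long_snp_names(genotypes):
    """Return SNP names that exceed the 44-character limit."""
    return [name for name in genotypes if len(name) > MAX_SNP_NAME_LEN]


def icar_call_counts(samples, genotypes, icar_snps, missing="--"):
    """Count non-missing calls per sample over the given ICAR SNP names."""
    counts = dict.fromkeys(samples, 0)
    for snp in icar_snps:
        calls = genotypes.get(snp)
        if calls is None:             # SNP not present in the report
            continue
        for sample, call in zip(samples, calls):
            if call != missing:
                counts[sample] += 1
    return counts
```

With the counts in hand, any sample whose value is below 185 would fall short of the ICAR parentage certification threshold described in the SNP content requirements.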

Information about the submitted samples (Sample sheet file):

[Header],,,,,,,,,,,,,,,,,,,,,,,
Investigator Name,"Doe, John",,,,,,,,,,,,,,,,,,,,,,
Project Name,2023031411_ABC_AB,,,,,,,,,,,,,,,,,,,,,,
Experiment Name,,,,,,,,,,,,,,,,,,,,,,,
Date,44916,,,,,,,,,,,,,,,,,,,,,,
[Manifests],,,,,,,,,,,,,,,,,,,,,,,
A,A_Dairy_Chip,,,,,,,,,,,,,,,,,,,,,,
[Data],,,,,,,,,,,,,,,,,,,,,,,
Sample_ID,Sample_Plate,Sample_Name,Project,AMP_Plate,Sample_Well,SentrixBarcode_A,SentrixPosition_A,Scanner,Date_Scan,Replicate,Parent1,Parent2,Gender,Sample Type
AB01234567,123456,HOUSA000000000001,Project1,222222,E1,200000000000,R01C01,,,,,,,Tissue
AB01234568,123456,HOUSA000000000002,Project1,222222,B2,200000000001,R01C02,,,,,,,Tissue
AB01234569,123456,HOUSA000000000003,Project1,222222,C3,200000000002,R01C03,,,,,,,Tissue

At minimum, the Sample sheet file must contain columns for Sample_ID, Sample_Name, Barcode, and Position (shown as SentrixBarcode_A and SentrixPosition_A in the example above).
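
Because Sample_ID values must agree between the genotype file and the sample sheet, a quick cross-check can catch mismatches early. The following is a rough sketch with hypothetical function names. It assumes the sample sheet is a CSV file with a `[Data]` section followed by a column header row, and it treats any column whose name contains "Barcode" or "Position" as satisfying those two requirements, which is an interpretation of the example rather than a documented rule.

```python
import csv
import io


def read_sample_sheet(text):
    """Return (header, records) from the [Data] section of a sample sheet."""
    rows = list(csv.reader(io.StringIO(text)))
    start = next(i for i, r in enumerate(rows) if r and r[0].strip() == "[Data]")
    header = [cell.strip() for cell in rows[start + 1]]
    records = [
        dict(zip(header, row))
        for row in rows[start + 2:]
        if any(cell.strip() for cell in row)  # skip blank lines
    ]
    return header, records


def check_sample_sheet(header, records, genotype_sample_ids):
    """Return a list of problems with required columns and Sample_ID matching."""
    problems = []
    for column in ("Sample_ID", "Sample_Name"):
        if column not in header:
            problems.append(f"missing column: {column}")
    for keyword in ("Barcode", "Position"):
        if not any(keyword in column for column in header):
            problems.append(f"no column containing: {keyword}")

    sheet_ids = {record.get("Sample_ID", "") for record in records}
    genotype_ids = set(genotype_sample_ids)
    for sample_id in sorted(genotype_ids - sheet_ids):
        problems.append(f"{sample_id}: in genotype file but not in sample sheet")
    for sample_id in sorted(sheet_ids - genotype_ids):
        problems.append(f"{sample_id}: in sample sheet but not in genotype file")
    return problems
```

The `genotype_sample_ids` argument is whatever list of sample IDs was read from the genotype file, for example the column headers of a matrix-format Final Report.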

For additional details on the file formats, please see:
https://redmine.uscdcb.com/projects/cdcb-customer-service/wiki/CDCB_Accepted_genotype_file_formats

For lists of required SNPs, see the "Files" section below.

Files (4)

4K_SNP_Names.csv (93.9 KB), jcarrillo, 10/06/2021 12:52 PM
QDisc_550_SNP_names.csv (12.7 KB), henzenauer, 03/02/2023 04:10 PM
Y_SNP_names.txt (450 Bytes), henzenauer, 05/10/2023 04:04 PM
195_SNP_names.txt (5.48 KB), henzenauer, 08/20/2024 10:24 AM
