CDCB file submission protocol for National Evaluation Centers¶
Important notice¶
Files shared with CDCB must NOT contain:
- special characters
- hexadecimal characters
- values in scientific notation
- characters from non-Latin alphabets (Chinese, Japanese, Cyrillic, etc.)
- letters with accents
Please check your files before submission to identify and correct such issues. Files containing such characters will be immediately rejected.
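A pre-submission check along these lines can be sketched in Python. This is a minimal illustration, not CDCB's own validation: the function name `find_problems` is hypothetical, "special characters" is approximated here as anything outside printable ASCII, and the scientific-notation pattern is a heuristic that may need tuning for your data.

```python
import re

# Heuristic (assumption): a token made of digits, an optional fraction,
# E/e, an optional sign, and digits, not embedded in a longer ID.
SCI_NOTATION = re.compile(
    r"(?<![A-Za-z0-9.])\d+(?:\.\d+)?[eE][+-]?\d+(?![A-Za-z0-9])"
)


def find_problems(line: str) -> list[str]:
    """Return human-readable problems found in one line of a file."""
    problems = []
    text = line.rstrip("\r\n")
    # Assumption: only printable ASCII (space through tilde) is acceptable,
    # which also catches accented letters and non-Latin alphabets.
    bad = sorted({ch for ch in text if not (" " <= ch <= "~")})
    if bad:
        shown = ", ".join(repr(ch) for ch in bad)
        problems.append("non-ASCII or control characters: " + shown)
    if SCI_NOTATION.search(text):
        problems.append("possible value in scientific notation")
    return problems


def check_file(path: str) -> list[tuple[int, str]]:
    """Scan a text file and return (line number, problem) pairs."""
    findings = []
    # latin-1 decodes every byte, so unexpected bytes are reported, not raised.
    with open(path, encoding="latin-1") as handle:
        for number, line in enumerate(handle, start=1):
            findings.extend((number, p) for p in find_problems(line))
    return findings
```

Running `check_file` on both the pedigree and the genotype file before zipping them gives a list of line numbers to inspect; an empty list means this particular heuristic found nothing to flag.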
Submission content and naming convention¶
Zip filename¶
- A single zipped file for each submission should be placed in the NEC's "in" folder of the CDCB sftp.
- NEC staff must create a Redmine ticket to inform CDCB staff of the successful submission. Please contact the CDCB genomics team if your NEC does not have CDCB Redmine access or needs to add or modify it.
- The zip file naming convention is YYYYMMDDXX_[CDDR/NEC].zip, where YYYYMMDD is the date in 8 bytes and XX is a sequential batch number (for multiple submissions in one day). E.g. 2020053001
- Example names for multiple submissions in one day:
- 2020053001_NEC.zip
- 2020053002_NEC.zip
- 2020053003_CDDR.zip
- 2020053004_NEC.zip
- NECs participating in the CDDR exchange program should use CDDR in the zip file name when submitting CDDR files. E.g. 2020053001_CDDR.zip
- All NECs, irrespective of their participation in the CDDR exchange program, should use NEC in the zip file name for any non-CDDR submission. E.g. 2020053001_NEC.zip
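The naming convention above can be checked mechanically before upload. The following Python sketch is illustrative only; the function name `parse_zip_name` is hypothetical, and it assumes XX is always exactly two digits, as in the examples.

```python
import re
from datetime import date

# YYYYMMDD + XX + "_" + CDDR or NEC + ".zip"
ZIP_NAME = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})_(CDDR|NEC)\.zip$")


def parse_zip_name(name: str):
    """Return (YYYYMMDDXX, kind) for a valid name, or None otherwise."""
    match = ZIP_NAME.match(name)
    if match is None:
        return None
    year, month, day, batch, kind = match.groups()
    try:
        # Reject impossible calendar dates such as February 30.
        date(int(year), int(month), int(day))
    except ValueError:
        return None
    return year + month + day + batch, kind
```

The returned YYYYMMDDXX prefix is the same value that the files inside the zip are expected to carry in their names.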
Zip content¶
- Every zip file should contain at least 2 files: a pedigree file and a genotype file.
- Pedigree file naming convention is: YYYYMMDDXX_pedigree.txt, where YYYYMMDDXX is the same value as in the .zip file name. Pedigree file content is in Interbull's Format200 (EXTERNAL LINK). If the information is available, TW (twin) or ET (embryo transfer) animals should be identified in positions 74-75 instead of the animal status.
An example record is:
200 HOL840M000000000000 HOLUSAM000000000001 HOLUSAF000000000002 20101228 ET 20130114 ANIMALNAMEHERE USA
- Genotype file naming convention is: YYYYMMDDXX_genotype.csv, where YYYYMMDDXX is the same value as in the .zip file name.
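As a sketch of how the zip content rules could be verified locally, the Python snippet below lists which of the two expected members are missing from an archive. The function name `missing_members` is hypothetical, and the check assumes the files sit at the top level of the zip rather than in a subfolder.

```python
import zipfile


def missing_members(zip_file, zip_name: str) -> list[str]:
    """Return the expected member names that are absent from the archive.

    zip_file: a path or a binary file-like object holding the archive.
    zip_name: the archive's file name, e.g. "2020053001_NEC.zip".
    """
    # The YYYYMMDDXX prefix is everything before the first underscore.
    prefix = zip_name.split("_", 1)[0]
    expected = {f"{prefix}_pedigree.txt", f"{prefix}_genotype.csv"}
    with zipfile.ZipFile(zip_file) as archive:
        present = set(archive.namelist())
    return sorted(expected - present)
```

An empty list means both the pedigree and the genotype file are present under the expected names; extra files in the archive are not flagged by this check.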
NEC Genomic File format¶
This is the format that CDCB uses to exchange (both input and output) with foreign National Evaluation Centers.
Field order | Num Bytes | Field Format | Ref Note(s) | Field Description |
---|---|---|---|---|
1 | 2 | CH | 4 | Evaluation Breed group of animal (alpha code only, no zeros). Required |
2 | 19 | CH | | 19-character Animal Identification in Interbull format: Breed (3) + Country of registration (3) + Sex (1) + ID number (12). Required |
3 | up to 20 | CH | | Sample ID : Identification of the sample when sent to the lab. Required |
4 | up to 12 | CH | 347 | Requester ID : AI organization, breed association, or lab requesting the genotyping. Required |
5 | 1 or 2 | CH | 348 | Laboratory submitting the genotype. Not Required |
6 | 1 | CH | 163 | Parentage only indicator: whether the genotype is for parentage verification only. Not Required |
7 | up to 8 | CH | 349 | Group Name : Location of animal or organization used to determine CDCB fee. Not Required |
8 | blank or 1 | CH | 354 | Tissue source that the DNA was extracted from. Optional |
9 | up to 12 | CH | | Chip Barcode: Uniquely identifies the chip, which typically contains 24 samples. Required |
10 | up to 6 | CH | | Position: Position of the sample on the chip (row and column), e.g., R02C12. Required |
11 | 8 | CH | | Load date : Date genotype was added to CDCB database (YYYYMMDD). Not Required |
12 | 1 | CH | 355 | Usability Code : Usability status of the genotype. CDCB use only. Not Required |
13 | 1 or 2 | CH | 162 | Array Number. Required |
14 | 1 byte per SNP, no commas between SNPs | INT | | Genotype string : variable length string of 0,1,2,5 indicating SNP genotypes of BB, AB, AA, --. Required |
- Data should be comma delimited.
- Files should not contain hex bytes, nor any of the following characters: ] [ @ _ ! # $ % ^ & ( ) < > ? / \ | } { ~ :
- Please contact the CDCB genomics team for the SNP order in this format. Please indicate the chip type when requesting this information.
- All 8 required fields must contain data. Fields that are not required can be left blank; any data provided in them will not be taken into consideration.
- If chip barcode and position are not available, consult with CDCB for a procedure to create unique values for these fields.
The following example applies to chips from ThermoFisher.
The barcode that comes from the array itself looks like this:
5510464413482022222659
- The first 6 digits (551046) are the Affy part number. Replace them with a 2-letter code. For Approved Partners, the 2-character country code is appropriate (e.g., CH for Switzerland).
- The next 7 digits (4413482) are the lot number of the array. Keep them.
- The next 6 digits (022222) are the expiration date. Delete them.
- The final 3 digits (659) are the unit number within the lot. Keep them.
This approach should create a unique number that will never repeat.
So the final chip barcode submitted would be:
CH4413482659
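The ThermoFisher example above can be expressed as a small Python helper. This is only a sketch of the steps as described: the function name `thermofisher_chip_barcode` is hypothetical, and it assumes the raw barcode is always 22 digits laid out exactly as in the example (6 + 7 + 6 + 3).

```python
def thermofisher_chip_barcode(raw: str, code: str) -> str:
    """Build the 12-character chip barcode from a raw 22-digit array barcode.

    raw:  barcode as printed on the array, e.g. "5510464413482022222659"
    code: 2-letter replacement for the part number, e.g. "CH"
    """
    if len(raw) != 22 or not (raw.isascii() and raw.isdigit()):
        raise ValueError("expected a 22-digit barcode")
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected a 2-letter code")
    lot_number = raw[6:13]    # kept: 7 digits after the 6-digit part number
    unit_number = raw[19:22]  # kept: final 3 digits
    # raw[0:6] (part number) is replaced by the code;
    # raw[13:19] (expiration date) is dropped.
    return code.upper() + lot_number + unit_number
```

Calling `thermofisher_chip_barcode("5510464413482022222659", "CH")` reproduces the barcode shown in the example, `CH4413482659`, which fits the 12-byte limit of field 9.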
- Nomination of the genotypes shared must be performed by an approved nominator PRIOR to CDCB processing the files, except for males from approved partners, which automatically receive fee code “P”. Genotypes without a complete nomination will not receive an evaluation.
An example record is:
HO,HOL840M000000000001,00000000001,Qualitas,,,,,CH4413482659,R02C01,,,1,0112501122202112220201021020221015021025201[...]
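A record like the one above can be checked against the field table with a few lines of Python. The sketch below is an assumption-laden illustration, not CDCB's validator: the function name `validate_record` is hypothetical, it only checks field count, required fields, maximum lengths, and the genotype alphabet, and it does not verify the reference-note code lists.

```python
# Field numbers (1-based) that the format table marks as Required.
REQUIRED = (1, 2, 3, 4, 9, 10, 13, 14)

# Maximum byte lengths from the table; field 14 has no fixed length.
MAX_LEN = {1: 2, 2: 19, 3: 20, 4: 12, 5: 2, 6: 1, 7: 8,
           8: 1, 9: 12, 10: 6, 11: 8, 12: 1, 13: 2}


def validate_record(line: str) -> list[str]:
    """Return a list of problems for one comma-delimited genotype record."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 14:
        return [f"expected 14 comma-delimited fields, found {len(fields)}"]
    errors = []
    for number in REQUIRED:
        if not fields[number - 1].strip():
            errors.append(f"field {number} is required but empty")
    for number, limit in MAX_LEN.items():
        if len(fields[number - 1]) > limit:
            errors.append(f"field {number} exceeds {limit} bytes")
    animal_id = fields[1]
    if animal_id and len(animal_id) != 19:
        errors.append("field 2 must be exactly 19 bytes")
    # Genotype codes: 0 = BB, 1 = AB, 2 = AA, 5 = missing (--).
    if not set(fields[13]) <= set("0125"):
        errors.append("field 14 may only contain the codes 0, 1, 2, 5")
    return errors
```

An empty list means the record passed these structural checks; it says nothing about nomination status or whether the SNP order matches the chip type.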
IMPORTANT NOTICE: When submitting files for testing during the certification process, please include in the test batch at least 50 genotypes with the sire, the dam, or both genotyped.