CDCB file submission protocol for National Evaluation Centers (old)¶
Important notice¶
It is important to note that the files shared with CDCB should NOT contain:
- special characters
- hexadecimal characters
- values in scientific notation
- characters from non-Latin alphabets (Chinese, Japanese, Cyrillic, etc.)
- letters with accents
Please check your files before submission to identify and correct such issues. Files containing such characters will be immediately rejected.
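A pre-submission check along these lines can be scripted. The following is a minimal sketch, not CDCB's actual validation: it assumes that "special characters" means anything outside printable ASCII (tabs excepted), and it uses a heuristic pattern for scientific notation. The function name `find_problems` is hypothetical.

```python
import re

# Heuristic: a standalone number such as 1.5E+10 or 3e-4.
# Assumption: tokens embedded in longer alphanumeric IDs are not flagged.
_SCI_NOTATION = re.compile(
    r"(?<![A-Za-z0-9])[+-]?\d+(?:\.\d+)?[eE][+-]?\d+(?![A-Za-z0-9])"
)


def find_problems(text):
    """Return a list of (line_number, message) for suspicious content."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col, ch in enumerate(line, start=1):
            # Assumption: only printable ASCII (32..126) and tabs are acceptable.
            if ch != "\t" and not (32 <= ord(ch) <= 126):
                problems.append(
                    (lineno, f"non-ASCII or control character {ch!r} at column {col}")
                )
                break  # one report per line is enough
        if _SCI_NOTATION.search(line):
            problems.append((lineno, "value in scientific notation"))
    return problems
```

Running `find_problems` on the decoded text of each file before zipping gives a list of line numbers to inspect; an empty list means this sketch found nothing, which is not a guarantee that CDCB will accept the file.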
Submission content and naming convention¶
Zip filename¶
- A single zipped file for each submission should be placed in the NEC's "in" folder of the CDCB sftp.
- NEC staff must create a Redmine ticket to inform CDCB staff of the successful submission. Please contact the CDCB genomics team if your NEC does not have CDCB Redmine access or needs to add or modify it.
- The zip file naming convention is: YYYYMMDDXX_[CDDR/NEC].zip, where
  - YYYYMMDD is the date in 8 bytes, and XX is the batch number (for multiple submissions in a day). E.g. 2020053001
  - NECs participating in the CDDR exchange program should use CDDR in the zip file name for CDDR file submissions. E.g. 2020053001_CDDR.zip
  - All NECs, irrespective of their participation in the CDDR exchange program, should use NEC in the zip file name for any non-CDDR file submissions. E.g. 2020053001_NEC.zip
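The naming convention above can be checked before upload. This is a minimal sketch that assumes XX is always two decimal digits, as in the examples; the function name `parse_zip_name` and the returned keys are hypothetical.

```python
import re
from datetime import datetime

# YYYYMMDD + XX + "_" + CDDR or NEC + ".zip"
# Assumption: the batch number XX is exactly two digits.
ZIP_NAME = re.compile(r"(?P<date>\d{8})(?P<batch>\d{2})_(?P<kind>CDDR|NEC)\.zip")


def parse_zip_name(name):
    """Split a submission zip name into its parts, or raise ValueError."""
    match = ZIP_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow YYYYMMDDXX_[CDDR/NEC].zip: {name!r}")
    # strptime raises ValueError for impossible dates such as month 13.
    datetime.strptime(match["date"], "%Y%m%d")
    return {
        "prefix": match["date"] + match["batch"],  # reused in the member file names
        "batch": int(match["batch"]),
        "kind": match["kind"],
    }
```

For example, `parse_zip_name("2020053001_CDDR.zip")` yields the prefix `2020053001`, batch `1`, and kind `CDDR`.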
Zip content¶
- Every zip file should contain at least 2 files: a pedigree file and a genotype file.
- Pedigree file naming convention is: YYYYMMDDXX_pedigree.txt, where YYYYMMDDXX is the same value as in the .zip file. Pedigree file content format is Interbull's Format200 (EXTERNAL LINK). If the information is available, TW (twin) or ET (embryo transfer) animals should be identified in positions 74-75 instead of the animal status.
  An example record is:
  200 HOL840M000000000000 HOLUSAM000000000001 HOLUSAF000000000002 20101228 ET 20130114 ANIMALNAMEHERE USA
- Genotype file naming convention is: YYYYMMDDXX_genotype.csv, where YYYYMMDDXX is the same value as in the .zip file. The genotype file is comma delimited, with the following content:

| Field order (1) | Num bytes | Field format | Ref | Field description | Note(s) |
|---|---|---|---|---|---|
| 1 | 2 | CH | 4 | Evaluation breed group of animal (alpha code only, no zeros) | |
| 2 | 19 | CH | | 19 digits. Animal identification in Interbull format: Breed (3) + Country of registration (3) + Sex (1) + ID number (12) | |
| 3 | up to 22 | CH | | Sample ID: identification of the sample when sent to the lab | |
| 4 | up to 12 | CH | 347 | Requester ID: AI organization, breed association, or lab requesting the genotyping | |
| 5 | 1 or 2 | CH | 348 | Laboratory where chip was prepared and scanned | Not required |
| 6 | 1 | CH | 163 | Parentage only indicator: whether the genotype is for parentage verification only or not | Not required |
| 7 | up to 8 | CH | 349 | Group name: location of animal or organization used to determine CDCB fee | Not required |
| 8 | blank or 1 | CH | 354 | Tissue source that the DNA was extracted from | Not required |
| 9 | up to 12 | CH | | Chip barcode: uniquely identifies the chip, which typically contains 24 samples | |
| 10 | up to 6 | CH | | Position of the sample on the chip (row and column) | |
| 11 | 8 | CH | | Load date: date genotype was added to CDCB database (YYYYMMDD) | CDCB use only |
| 12 | 1 | CH | | Usability code: usability status of the genotype | CDCB use only |
| 13 | 1 or 2 | CH | 162 | Array number | |
| 14 | 1 byte per SNP, no commas between SNPs | INT | | Genotype string (2): variable-length string of 0, 1, 2, 5 indicating SNP genotypes of BB, AB, AA, -- | |
NOTES:
(1) For non-required fields, please leave the field blank if data is not available.
(2) Please contact the CDCB genomics team for the SNP order in this format. Please indicate the chip type when requesting this information.
An example record is:
HO,HOL840M000000000000,003000247244,Acceler,2,,,,4294034076,R03C02,,,1,0112501122202112220201021020221015021025201 [...]
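To make the zip content rules and the 14-field genotype layout concrete, here is a minimal sketch of two checks a submitter might run locally. It is not CDCB's loader: the field labels in `FIELDS` and the function names are hypothetical, and only three rules from the description above are enforced (field count, 19-character animal ID, genotype codes limited to 0, 1, 2, 5).

```python
# Illustrative labels for the 14 comma-separated fields, in file order.
FIELDS = (
    "breed", "animal_id", "sample_id", "requester_id", "lab",
    "parentage_only", "group_name", "tissue_source", "chip_barcode",
    "chip_position", "load_date", "usability_code", "array_number",
    "genotypes",
)

# Genotype codes as listed in the field description.
GENOTYPE_CODES = {"0": "BB", "1": "AB", "2": "AA", "5": "--"}


def expected_members(zip_name):
    """Return the two file names every submission zip should contain."""
    prefix = zip_name.split("_", 1)[0]  # the YYYYMMDDXX part
    return {f"{prefix}_pedigree.txt", f"{prefix}_genotype.csv"}


def parse_genotype_record(line):
    """Split one genotype line into a dict, or raise ValueError."""
    values = line.rstrip("\r\n").split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    if len(record["animal_id"]) != 19:
        raise ValueError("animal ID must be 19 characters")
    unknown = set(record["genotypes"]) - set(GENOTYPE_CODES)
    if unknown:
        raise ValueError(f"unexpected genotype codes: {sorted(unknown)}")
    return record
```

The set returned by `expected_members` can be compared against `zipfile.ZipFile(path).namelist()` from the Python standard library. Non-required and CDCB-only fields come back as empty strings, which matches note (1).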