CDCB file submission protocol for National Evaluation Centers (old)

Important notice

Files shared with CDCB must NOT contain:
  • special characters
  • hexadecimal characters
  • values in scientific notation
  • characters from foreign alphabets (Chinese, Japanese, Cyrillic, etc.)
  • letters with accents

Please check your files before submission to identify and correct such issues. Files containing such characters will be immediately rejected.
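
A pre-submission check along these lines may help catch such issues. This is only a minimal sketch, not CDCB's own validation: it assumes that only printable ASCII is acceptable (which covers accented letters and foreign alphabets, and treats control characters as disallowed) and that scientific notation looks like 1.5E+10 or 2e-3. The function name find_problems is hypothetical.

```python
import re

# Assumption: only printable ASCII (codes 32-126) is acceptable.
# Assumption: "scientific notation" means tokens such as 1.5E+10 or 2e-3.
SCI_NOTATION = re.compile(
    r"(?<![A-Za-z0-9.])\d+(?:\.\d+)?[eE][+-]?\d+(?![A-Za-z0-9])"
)


def find_problems(text):
    """Return (line_number, reason) pairs for lines that look unsafe to submit."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Collect every character outside the printable ASCII range.
        bad = sorted({ch for ch in line if not 32 <= ord(ch) <= 126})
        if bad:
            problems.append((line_no, "disallowed characters: " + repr(bad)))
        # Heuristic: a number followed directly by an exponent part.
        if SCI_NOTATION.search(line):
            problems.append((line_no, "possible scientific notation"))
    return problems
```

The scientific-notation pattern is a heuristic and will also flag an identifier that happens to look like a number with an exponent (for example 12E5), so flagged lines should be reviewed by hand.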

Submission content and naming convention

Zip filename

  • A single zipped file for each submission should be placed in the NEC's "in" folder on the CDCB SFTP server.
  • NEC staff must create a Redmine ticket to inform CDCB staff of the successful submission. Please contact the CDCB genomics team if your NEC does not have CDCB Redmine access or needs its access modified.
  • The zip file naming convention is YYYYMMDDXX_[CDDR/NEC].zip, where
    • YYYYMMDD is the date in 8 bytes, and XX is the 2-digit batch number (for multiple submissions in a day). E.g. 2020053001
    • NECs participating in the CDDR exchange program should use CDDR in the zip file name for CDDR submissions. E.g. 2020053001_CDDR.zip
    • All NECs, irrespective of their participation in the CDDR exchange program, should use NEC in the zip file name for any non-CDDR submissions. E.g. 2020053001_NEC.zip
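
The naming convention above can be sketched in code as follows. The helper names make_zip_name and parse_zip_name are hypothetical, and the sketch assumes that any two-digit batch number is acceptable; CDCB may apply stricter rules.

```python
import re
from datetime import date, datetime

# YYYYMMDD + XX + "_" + CDDR or NEC + ".zip"
ZIP_NAME = re.compile(r"([0-9]{8})([0-9]{2})_(CDDR|NEC)\.zip")


def make_zip_name(day, batch, kind):
    """Build YYYYMMDDXX_[CDDR/NEC].zip from a date, a batch number, and a kind."""
    if kind not in ("CDDR", "NEC"):
        raise ValueError("kind must be 'CDDR' or 'NEC'")
    if not 0 <= batch <= 99:
        raise ValueError("batch number must fit in two digits")
    return f"{day:%Y%m%d}{batch:02d}_{kind}.zip"


def parse_zip_name(name):
    """Return (date, batch, kind), or None if the name does not fit the convention."""
    match = ZIP_NAME.fullmatch(name)
    if match is None:
        return None
    try:
        day = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:  # eight digits that are not a real calendar date
        return None
    return day, int(match.group(2)), match.group(3)
```

For example, make_zip_name(date(2020, 5, 30), 1, "CDDR") gives 2020053001_CDDR.zip, matching the example above.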

Zip content

  • Every zip file should contain at least 2 files: a pedigree file and a genotype file.
  1. The pedigree file naming convention is YYYYMMDDXX_pedigree.txt, where YYYYMMDDXX is the same value as in the .zip file name. The pedigree file content follows Interbull's Format200. If the information is available, TW (twin) or ET (embryo transfer) animals should be identified in positions 74-75 instead of the animal status.
    An example record is:
    200 HOL840M000000000000 HOLUSAM000000000001 HOLUSAF000000000002 20101228 ET 20130114 ANIMALNAMEHERE                                     USA
    
  2. The genotype file naming convention is YYYYMMDDXX_genotype.csv, where YYYYMMDDXX is the same value as in the .zip file name. The genotype file is comma delimited, with the following content:
    Field order (1) | Num bytes | Field format | Ref note(s) | Field description
    1 | 2 | CH | 4 | Evaluation breed group of animal (alpha code only, no zeros)
    2 | 19 | CH | | Animal identification (19 digits) in Interbull format: Breed (3) + Country of registration (3) + Sex (1) + ID number (12)
    3 | up to 22 | CH | | Sample ID: identification of the sample when sent to the lab
    4 | up to 12 | CH | 347 | Requester ID: AI organization, breed association, or lab requesting the genotyping
    5 | 1 or 2 | CH | 348 | Laboratory where chip was prepared and scanned. Not required.
    6 | 1 | CH | 163 | Parentage only indicator: whether the genotype is for parentage verification only or not only for parentage. Not required.
    7 | up to 8 | CH | 349 | Group name: location of animal or organization used to determine CDCB fee. Not required.
    8 | blank or 1 | CH | 354 | Tissue source that the DNA was extracted from. Not required.
    9 | up to 12 | CH | | Chip barcode: uniquely identifies the chip, which typically contains 24 samples
    10 | up to 6 | CH | | Position of the sample on the chip (row and column)
    11 | 8 | CH | | Load date: date genotype was added to CDCB database (YYYYMMDD). CDCB use only.
    12 | 1 | CH | | Usability code: usability status of the genotype. CDCB use only.
    13 | 1 or 2 | CH | 162 | Array number
    14 | 1 byte per SNP, no commas between SNPs | INT | | Genotype string (2): variable-length string of 0, 1, 2, 5 indicating SNP genotypes of BB, AB, AA, --

NOTES:
(1) For non-required fields, please leave the field blank if data is not available.
(2) Please contact the CDCB genomics team for the SNP order in this format, and indicate the chip type when requesting this information.

An example record is:

 HO,HOL840M000000000000,003000247244,Acceler,2,,,,4294034076,R03C02,,,1,0112501122202112220201021020221015021025201 [...]
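
Basic record-level checks for both files could be sketched as below. This is an illustration under stated assumptions, not CDCB's validation logic: the function names are hypothetical, the pedigree check assumes the layout of the example pedigree record shown earlier (so that positions 74-75 are counted from 1), and the genotype check assumes that no field contains an embedded comma.

```python
# Genotype codes as listed for field 14.
GENOTYPE_CODES = {"0": "BB", "1": "AB", "2": "AA", "5": "--"}


def pedigree_flag(record):
    """Return the characters at positions 74-75 (1-based) of a pedigree record."""
    return record[73:75]


def parse_genotype_record(line):
    """Split one genotype record into its 14 fields and run basic checks."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, found {len(fields)}")
    if len(fields[1]) != 19:
        raise ValueError("animal ID must be 19 characters long")
    unknown = set(fields[13]) - set(GENOTYPE_CODES)
    if unknown:
        raise ValueError(f"unexpected genotype codes: {sorted(unknown)}")
    return fields


def decode_genotypes(genotype_string):
    """Translate a genotype string into a list of SNP genotype calls."""
    return [GENOTYPE_CODES[code] for code in genotype_string]
```

With these helpers, a genotype string beginning 01125 decodes to BB, AB, AB, AA, --, and a pedigree record laid out like the example returns ET from pedigree_flag. Mapping each position of the genotype string to a specific SNP still requires the SNP order from the CDCB genomics team, as described in note (2).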
