CDCB Quick Discovery of close relatives (QDisc) service - User documentation
Rationale and general characteristics of the service
The CDCB Quick Discovery (QDisc) service is a close relative discovery procedure that uses the ICAR 550 SNP subset for genomic comparisons. Because it uses a smaller set of SNPs and less information (for example, no parent information or dates of birth), this service is less accurate than the existing CDCB genomic full service (FULL).
The QDisc service is a one-time discovery event that compares the genotypes provided by the user with all animals in the submitted file and with all genotypes present in the CDCB collaborator database as of the day before the program runs (i.e. the specific 550 SNP subset database is updated daily).
The CDCB policy related to this service, approved by the CDCB Board of Directors, can be found here. In general terms, users place their genotype files in a dedicated section of their SFTP area (named QDisc) and CDCB systems quickly return a set of discovered close relatives (if any are found). The benefit of this solution, compared to the FULL CDCB parentage service, is that no nomination or correct ID17 identification of the samples is needed. Conversely, the genotypes are not stored in the CDCB collaborator database, there is no validation of the information provided beyond the minimum requirements to perform the analysis, and the SNP subset used for the discovery is much smaller than the one used in the FULL CDCB service (i.e. lower accuracy).
General information about the service
- This service is activated upon request and requires the signing of a Quick Discovery service EUA.
- All new users, especially nominators that are not certified laboratories, are STRONGLY encouraged to test the service before implementation (please contact CDCB via Redmine ticket). Testing the service is optional and free of charge; it requires the nominator to send at least 10 genotyped animals already stored in the CDCB database (for cross-checking).
- The QDisc service includes a daily update of the specific database used for these comparisons (QDiscDB).
- There is no charge linked to the setup of this service. Further details about the costs of the service can be found in the Quick Discovery of close relatives policy.
- The discovery process is based on the ICAR 550 SNP subset: from the total ICAR 554 SNP list for parentage discovery, 4 SNPs are excluded as per the ICAR guidelines recommendation.
- The service is chip-agnostic. A full chip set or a subset of SNPs can be submitted: while the QDisc system accepts any number of SNPs in the files, only the aforementioned 550 SNPs will be retained. Please see the specific input section for further details.
- Since the ICAR 550 SNP subset does not include a valid way to assess the sex of samples, QDisc is available for both male and female animals. However, since the aim of this service is to give commercial herds easier access to pedigree information, the output is limited to 5 male/female close relatives, so the submission of animals with large families is strongly discouraged. Clones / identical genotypes of candidate animals are reported without any limit, to help detect potential misidentification of a sample. When close relatives are known clones, only the base animal is reported (labeled accordingly).
- At least 300 called SNPs are required to provide any kind of assessment (a message will be displayed for animals below this threshold). Animals with 1 or fewer conflicts are considered close relatives. Close relatives found in QDiscDB are reported only if their genotypes are usable (use_ind=Y). Animals with more than 95% identical genotypes are considered clones or identical genotypes, and a message will be displayed in their results file.
- No valid ID is necessary, since this service is not linked to any credit when FULL service is requested. Users may use sample_ID or submit a different ID (Sample_Name) in a SampleSheet file.
- Although the policy indicates that QDisc results will be delivered within 48 hours, the user can expect an almost immediate processing time. If files are not processed within 1 hour, please create a Redmine ticket asking for CDCB staff support. Note that this service uses the CDCB collaborator database to retrieve IDs and discovered animal information, so it will not run during production database maintenance events.
- No support will be provided on animals receiving QDisc results, other than for issues related to the CDCB service status and QDisc functionality. Investigations on specific animal evaluations are the responsibility of the user. CDCB will provide support to the user only when there is reasonable and proven evidence that the system is not performing as declared in the validation phases of the service.
- Invoicing will be done monthly, based on all the genotypes submitted in the previous month.
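The thresholds listed above (at least 300 called SNPs, 1 or fewer conflicts for a close relative, more than 95% identical genotypes for a clone) can be illustrated with a minimal Python sketch. This is not the CDCB implementation. It assumes, purely for illustration, that a genotype is a dictionary of AB-coded calls, that a "conflict" is a pair of opposite homozygotes (AA versus BB), and that identity is measured over the SNPs both animals have called; none of these details are stated in this documentation. The function names `normalise`, `compare` and `classify` are hypothetical.

```python
# Thresholds quoted in the documentation above.
MIN_CALLED_SNPS = 300   # minimum called SNPs needed for any assessment
MAX_CONFLICTS = 1       # "1 or fewer conflicts" -> close relative
CLONE_IDENTITY = 0.95   # "more than 95% identical" -> clone / identical genotype

VALID_CALLS = {"AA", "AB", "BB"}


def normalise(genotype, subset):
    """Keep only called SNPs from the subset, matching SNP names case-insensitively.

    ASSUMPTION: `genotype` maps SNP name -> "AA" / "AB" / "BB"; anything else
    is treated as a no-call and dropped.
    """
    wanted = {name.upper() for name in subset}
    return {
        snp.upper(): call
        for snp, call in genotype.items()
        if snp.upper() in wanted and call in VALID_CALLS
    }


def compare(candidate, other):
    """Return (snps_compared, conflicts, identical_fraction) for two genotypes."""
    shared = candidate.keys() & other.keys()
    # ASSUMPTION: a conflict is a pair of opposite homozygotes (AA vs BB).
    conflicts = sum(1 for s in shared if {candidate[s], other[s]} == {"AA", "BB"})
    identical = sum(1 for s in shared if candidate[s] == other[s])
    fraction = identical / len(shared) if shared else 0.0
    return len(shared), conflicts, fraction


def classify(candidate, other):
    """Apply the documented thresholds to one candidate/other pair."""
    if len(candidate) < MIN_CALLED_SNPS:
        return "too few called SNPs"
    compared, conflicts, fraction = compare(candidate, other)
    if compared == 0:
        return "not reported"
    if fraction > CLONE_IDENTITY:
        return "self or clone"
    if conflicts <= MAX_CONFLICTS:
        return "close relative"
    return "not reported"


# Toy data: 400 made-up SNP names, not the real ICAR list.
subset = [f"SNP{i}" for i in range(400)]
dam = normalise({f"snp{i}": "AA" for i in range(400)}, subset)
calf = normalise({f"SNP{i}": "AB" for i in range(400)}, subset)
print(classify(calf, dam))  # close relative
```

The sketch leaves out things the real service must handle, such as a minimum number of SNPs compared, the `use_ind` filter, and the limit of 5 reported relatives.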
Access to the service
All CDCB-approved nominators can access this service. Requests will be accepted only via Redmine ticket. CDCB staff will follow up with a series of instructions on how to finalize the setup.
Technical information: Folder setup
The SFTP infrastructure for QDisc is completely separate from the FULL service (and Quick Turnaround). The QDisc service is based on a simple system of folders. A folder named QDisc will be created by CDCB staff in your SFTP area. Two sub-folders will be present: QDisc/in (input, writeable by the user) and QDisc/out (output, read-only to the user).
Technical information: Input files
QDisc input genotype files (a.k.a. FinalReport and SampleSheet files):
- FILE PLACEMENT: FinalReport (and SampleSheet) files need to be placed in the QDisc/in folder. The placement of these files in this folder will be considered an implicit request for the service (and, consequently, acceptance to pay for it).
- FILES AND FILENAMES: A standard FinalReport is required. SampleSheet files are optional and can be used to provide a different ID for the animal (Sample_Name will be used). As in the FULL service, each file MUST be zipped individually. Although the system will recognize any filename containing 'FinalReport' (case sensitive), users are encouraged to use a naming convention similar to the one used on the FULL system or, as a minimum, the following naming convention to avoid any unexpected parsing of the file: [anything]_FinalReport.zip. Note that since the system does not depend on chip information, providing it is optional.
- FINALREPORT (AND SAMPLESHEET) FILE CONTENT: The formats are the same as those used for delivering genotypes and sample information to the FULL service, as detailed in the CDCB genotype file format documentation: CDCB Accepted genotype file formats. Note that, unlike what is reported in the FULL service documentation, none of the "header" information (i.e. all information typically under "[Header]") is used by QDisc. QDisc programs start reading from the "[Data]" row onwards, so everything before that line is skipped. All other FinalReport / SampleSheet format standards, and the coding of genotypes in AB format, are required. Please note that only official SNP names are accepted (QDisc is case-insensitive to SNP names), so please make sure the SNP names provided correspond to the ICAR official naming.
- OTHER NOTES (FEES): Since this is a one-time service, no information is stored in the CDCB database. It is the user's responsibility to ensure that no repeated submissions are placed in the 'in' folder. CDCB will charge the user for every submission successfully processed, even for animals that were previously processed.
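As an illustration of the input rules above, the following Python sketch zips a single FinalReport file on its own, checks that the filename contains 'FinalReport' (case sensitive), and shows how rows after the "[Data]" marker could be previewed before submission. The function names `package_final_report` and `rows_after_data_marker` are hypothetical, and the sketch assumes that the "[Data]" marker sits alone on its line; it is a user-side convenience, not part of the CDCB system.

```python
import zipfile
from pathlib import Path


def package_final_report(report_path, out_dir):
    """Zip one FinalReport file on its own and return the archive path."""
    report = Path(report_path)
    # The documentation states that the match on 'FinalReport' is case sensitive.
    if "FinalReport" not in report.name:
        raise ValueError(f"{report.name!r} does not contain 'FinalReport'")
    # e.g. 20200101_yourname_FinalReport.txt -> 20200101_yourname_FinalReport.zip
    archive = Path(out_dir) / f"{report.stem}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(report, arcname=report.name)  # exactly one member per archive
    return archive


def rows_after_data_marker(lines):
    """Yield the lines that follow the '[Data]' row, skipping everything before it.

    ASSUMPTION: the marker appears alone on its line.
    """
    seen_marker = False
    for line in lines:
        if seen_marker:
            yield line.rstrip("\r\n")
        elif line.strip() == "[Data]":
            seen_marker = True
```

A call such as `package_final_report("20200101_yourname_FinalReport.txt", ".")` would write `20200101_yourname_FinalReport.zip` in the current directory, ready to be uploaded to QDisc/in.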
Technical information: Output files
- FILE PLACEMENT: After the successful processing of a submission, a single .zip file will be placed in the QDisc/out folder. CDCB will hold a copy of this file for a limited amount of time with the objective of providing support. An email will advise the user's staff of the successful processing of the file and the name of the output file created.
- FILES AND FILENAMES: A single .zip file named after the input file will be generated. The output filename consists of the input filename up to 'FinalReport', followed by the trailing portion '_QDisc.zip'. For example, an input file named 20200101_yourname_FinalReport.zip will generate an output zip file named 20200101_yourname_QDisc.zip.
- OUTPUT ZIP FILE CONTENT: The output .zip file will contain 2 types of files:
qdstatus.csv: Contains all animals read by the system and provides basic statistics and information about the discovery process. This file, comma separated, includes 4 fields: ID provided (Input_ID), SNP calls read in the genotype provided (Input_SNP_calls), Animals with matches found (Disc_matches), a Message (if available).
Input_ID,Input_SNP_calls,Disc_matches,Message
SampleHO1,490,3,
SampleHO2,490,2,
SampleHO3,492,2,
qdisc.csv: Close relatives discovered by the system. This file, comma separated, includes 7 fields: ID provided (Input_ID), discovered animal ID (Disc_ID: ID17 if in QDiscDB, Input_ID if within the file), discovered animal date of birth (Disc_DOB, if the discovered close relative is in QDiscDB), discovered animal sex (Disc_SEX, if the discovered close relative is in QDiscDB), total count of SNPs compared (SNP_compared), total count of SNPs in conflict (SNP_conflict), and a Message (if available).
In the case of cloned candidate animals, all clones are reported. In the case of cloned close relatives, only the base animal of the clonal family is reported, labeled with the message "Disc_ID is head of clonal family ([NUM] genotyped)", where [NUM] is the number of clones genotyped.
Input_ID,Disc_ID,Disc_DOB,Disc_SEX,SNP_compared,SNP_conflict,Message
SampleHO1,HOUSA000000000001,20110314,F,464,0,
SampleHO1,HOUSA000000000002,20120626,M,464,0,Disc_ID is head of clonal family (3 genotyped).
SampleHO1,HOUSA000000000003,20200802,F,490,0,Self or clone
SampleHO2,HOUSA000000000004,20140407,M,458,0,
SampleHO2,HOUSA000000000005,20220222,F,490,0,Self or clone
SampleHO3,HOUSA000000000006,20120421,M,465,0,
SampleHO3,HOUSA000000000007,20220801,F,492,0,Self or clone
- OTHER OUTPUT FILES: On a monthly basis, a file named QDisc_InvoiceDetail_[requester_ID]_YYMM.csv will be placed in your QDisc/out folder. This file provides a list of files processed during the timeframe indicated in the first line of the file (typically, from the first to last day of the previous month). An email will be sent to the user announcing the creation of the file.
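The output conventions described in this section can be sketched in Python as follows: deriving the expected output archive name from an input archive name, and grouping the rows of qdisc.csv by Input_ID. The function names `expected_output_name` and `relatives_by_sample` are hypothetical. The sketch assumes that the archive member is named exactly `qdisc.csv`, that it is UTF-8 encoded, and that the naming rule generalizes from the single example given above (trailing underscore before 'FinalReport' dropped, then '_QDisc.zip' appended).

```python
import csv
import io
import zipfile
from collections import defaultdict


def expected_output_name(input_name):
    """Derive the output archive name from an input archive name."""
    idx = input_name.find("FinalReport")  # case-sensitive search
    if idx == -1:
        raise ValueError(f"{input_name!r} does not contain 'FinalReport'")
    # Everything before 'FinalReport', minus trailing underscores, plus '_QDisc.zip'.
    return input_name[:idx].rstrip("_") + "_QDisc.zip"


def relatives_by_sample(archive_path):
    """Read qdisc.csv from an output archive and group its rows by Input_ID."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(archive_path) as zf:
        # ASSUMPTION: the member is called 'qdisc.csv' and sits at the archive root.
        with zf.open("qdisc.csv") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                grouped[row["Input_ID"]].append(row)
    return dict(grouped)


print(expected_output_name("20200101_yourname_FinalReport.zip"))
# 20200101_yourname_QDisc.zip
```

Each row returned by `relatives_by_sample` is a dictionary keyed by the header fields (Input_ID, Disc_ID, Disc_DOB, Disc_SEX, SNP_compared, SNP_conflict, Message), so a row with an empty Message yields an empty string for that key.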
Information for Quick Turnaround users only
For Quick Turnaround users who request the QDisc service, the folder structure will include two folders:
- QDisc (with subfolders in/ and out/) -> files placed here will receive the QDisc service only
- QTurn_Qdisc (with subfolders in/ and out/) -> files placed here will receive both the Quick Turnaround evaluation and the Quick Discovery service. Both .zip files and content will follow their respective documentation.
Please note that while there is a credit service in place for the Quick Turnaround service, no fee credits are applied to samples sent to the Quick Discovery service.