Files Generated During the QC Process
Once the Sample Sheet and Final Report files that a lab placed in the "check" directory have been picked up and processed by the CDCB check process, a file called results.$10bytedatestring.$chip.zip is generated in the "out" directory.
To proceed with loading the batch into the CDCB database, the lab needs to review the Data QC file and return it to the "in" directory, with or without a comment.
(If any criteria failed, the lab needs to provide a reason why it is okay to process the submission regardless.)
We provide various supplemental files to help labs determine the reasons for failures in their submissions.
Each file that we may provide in results.$10bytedatestring.$chip.zip is described below:
-Describes the pass/fail status, count, and description for each result of the QC process
-Needs to be returned to the "in" directory after the lab adds comments regarding any failed categories
Example of a Data QC file (with comment):
70 GP4 Genotypes Processed from lab10digit
FAIL,47,Parent Progeny Conflict SNP >2%
PASS,112,Low Call Rate SNP >10%
PASS,0,Chips w/ >80 percent Conflicts
PASS,0,No Nomination %
PASS,0,Genotype Submitted with No Sample Sheet Row
PASS,0,Genotype assigned to a different animal
comment: Explanation on why it is okay to load this batch even with some fail(s) detected by the check process
Examples of comments:
eg1) Comment: SNPs with LowCall/PPC/HWE were reviewed, but clustering could not be improved
eg2) Comment: Nominations will be sent shortly
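As a rough illustration of the layout shown above, the following Python sketch reads a Data QC file and lists the failed categories together with any comment. The function name `parse_data_qc` and the `QCResult` type are hypothetical helpers written for this example; they are not part of any CDCB tool, and the sketch assumes every result line has the form STATUS,count,description.

```python
from typing import NamedTuple


class QCResult(NamedTuple):
    status: str       # "PASS" or "FAIL"
    count: int        # number of items in the category
    description: str  # category description


def parse_data_qc(text):
    """Return (results, comment) parsed from the text of a Data QC file."""
    results, comment = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The lab's explanation is on a line starting with "comment:".
        if line.lower().startswith("comment:"):
            comment = line.split(":", 1)[1].strip()
            continue
        # Result lines look like STATUS,count,description; the header line
        # (e.g. "70 GP4 Genotypes Processed from ...") does not match.
        parts = line.split(",", 2)
        if len(parts) == 3 and parts[0] in ("PASS", "FAIL") and parts[1].isdigit():
            results.append(QCResult(parts[0], int(parts[1]), parts[2]))
    return results, comment


sample = """70 GP4 Genotypes Processed from lab10digit
FAIL,47,Parent Progeny Conflict SNP >2%
PASS,112,Low Call Rate SNP >10%
PASS,0,Chips w/ >80 percent Conflicts
comment: Nominations will be sent shortly
"""
results, comment = parse_data_qc(sample)
failed = [r.description for r in results if r.status == "FAIL"]
print(failed, comment)
```

A lab could use a check like this to confirm that a comment is present whenever at least one category failed before returning the file.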
-Shows possible sample switches at the lab where the genotyping was done
-Lists cases where both parents conflict, along with the pedigree sire and dam
-Helps labs detect a possible switch or shift by comparing the sire and dam given in the pedigree with the sire and dam identified from the genotype
-Lists chips with high error counts
-A chip is flagged when 20 or more of the samples on a 24-sample chip format present conflicts
-Shows conflicts on each plate to help investigate a potential switch or shift (image of the file attached)
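The flagging rule above (20 or more conflicting samples on a 24-sample chip) can be sketched in Python as follows. The input format, a list of (chip barcode, has_conflict) pairs, and the function name `flag_chips` are assumptions made for this example only; the actual check process may work differently.

```python
from collections import Counter


def flag_chips(samples, threshold=20):
    """Return chips with at least `threshold` conflicting samples.

    `samples` is an iterable of (chip_barcode, has_conflict) pairs,
    one pair per sample (hypothetical input format).
    """
    conflicts = Counter(chip for chip, has_conflict in samples if has_conflict)
    return sorted(chip for chip, n in conflicts.items() if n >= threshold)


# Hypothetical data: chipA has 21 of 24 conflicting samples, chipB has 2 of 24.
samples = (
    [("chipA", True)] * 21 + [("chipA", False)] * 3
    + [("chipB", True)] * 2 + [("chipB", False)] * 22
)
flagged = flag_chips(samples)
print(flagged)
```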
-Generated when the lab submitted genotype files to assign a genotype to a new animal ID
-Nominators use the genotype mover application to do this, but the lab can move genotypes for multiple animals at once by sending a genotype file if the misassignment occurred at the lab
- This file contains the Animal ID, Sample ID, and requester for samples with no pedigree AND no nomination in CDCB's database (image of the file attached).
- Samples that are not associated with a pedigree are given a zero key.
- This file contains the Nominator ID, NAABcode, Animal ID, and sampleID.
- The pedigree exists in our database, yet no nomination was done, so the sample and ID are not associated and no fee code has been assigned
-Contains any Sample_IDs that were encountered in the FinalReport file but were not in the SampleSheet.
-When this file contains anything, processing is stopped and an email is sent to the lab listing the problem Sample_IDs and asking the lab to correct the problem and resubmit.
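Before submitting, a lab may want to run a similar comparison itself. The Python sketch below finds Sample_IDs that appear in a FinalReport but not in the SampleSheet; the function name `missing_from_sample_sheet` and the example IDs are hypothetical, and reading the IDs out of the actual files is left out.

```python
def missing_from_sample_sheet(final_report_ids, sample_sheet_ids):
    """Return Sample_IDs seen in the FinalReport but absent from the SampleSheet.

    Each missing ID is reported once, in order of first appearance.
    """
    known = set(sample_sheet_ids)
    seen = set()
    missing = []
    for sample_id in final_report_ids:
        if sample_id not in known and sample_id not in seen:
            seen.add(sample_id)
            missing.append(sample_id)
    return missing


# Hypothetical IDs; a FinalReport repeats each Sample_ID once per SNP row.
final_report_ids = ["S1", "S1", "S2", "S3", "S3"]
sample_sheet_ids = ["S1", "S2"]
problems = missing_from_sample_sheet(final_report_ids, sample_sheet_ids)
print(problems)
```

An empty result means every genotype in the FinalReport has a matching SampleSheet row.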
-This file shows the association between the animal ID and the sample.
-The number of genotypes submitted is given in the first row
-The number of new genotype (C) assignments is given in the second row
-The number of reassignments (D) is given in the third row
-The chip ID number is given in the 3rd column
-This file shows the number of conflicts per chip
-A high conflict count may indicate mixed samples within the chip
-Hardy-Weinberg equilibrium (HWE) statistics used to check homozygosity on the autosomes.
-Homozygosity that is too high or too low may indicate a clustering issue
- This file shows parent-progeny conflicts per SNP
- Only homozygous SNPs are counted (the number of homozygous genotypes is in the "count" column)
- percent (4th column) = #conflicts (2nd column) / count (3rd column)
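The percent column can be reproduced with a short Python sketch. It assumes the ratio is multiplied by 100 to give a percentage, which the column name suggests but the description above does not state; the function name `conflict_percent` is hypothetical.

```python
def conflict_percent(conflicts, count):
    """Percent of parent-progeny conflicts among the counted genotypes.

    Assumes percent = 100 * conflicts / count; returns 0.0 when count is 0.
    """
    if count == 0:
        return 0.0
    return 100.0 * conflicts / count


# Hypothetical values: 3 conflicts among 150 counted homozygous genotypes.
pct = conflict_percent(3, 150)
print(pct)
```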
- Parent-progeny conflicts by SNP
“0” = homozygous1
“1” = heterozygous
“2” = homozygous2
- Countall – total number of samples (genotypes)
- “gtype” – genotype call that the animal has on the specific SNP
- Count – number of PPC cases on the SNP for the given scenario
SNP_Name           countall  sex  gtype  sire  dam  COUNT
ARS-BFGL-NGS-3109  56        F    0      0     2    3
- There are 56 cases in total for this SNP (countall = 56).
- Out of the 56 cases, there are 3 samples where sire=0 (homo1), dam=2 (homo2), and the female progeny=0, whereas 1 (heterozygous) is expected in the progeny's genotype
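The reasoning in this example (sire=0 and dam=2 can only produce a heterozygous progeny) follows from Mendelian inheritance and can be sketched in Python. This is a minimal illustration for autosomal SNPs only, using the 0/1/2 coding above; it ignores sex-linked SNPs, and the function names are hypothetical rather than taken from the check process.

```python
def expected_progeny_genotypes(sire, dam):
    """Return the set of progeny genotypes consistent with both parents.

    Genotypes are coded 0 (homozygous1), 1 (heterozygous), 2 (homozygous2).
    Autosomal SNPs only.
    """
    # Number of copies of the second allele a parent can pass on (0 or 1).
    transmitted = {0: {0}, 1: {0, 1}, 2: {1}}
    return {s + d for s in transmitted[sire] for d in transmitted[dam]}


def is_conflict(progeny, sire, dam):
    """True when the progeny genotype cannot come from this sire and dam."""
    return progeny not in expected_progeny_genotypes(sire, dam)


# The scenario from the example row: gtype=0, sire=0, dam=2.
expected = expected_progeny_genotypes(0, 2)
conflict = is_conflict(0, 0, 2)
print(expected, conflict)
```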
- Indicates the call rate (%) of each SNP
- A low call rate may indicate poor SNP quality
- This file shows genomic conflict(s)
- Animal ID, Sample_ID, name, error, other_ID, other_name, and source are reported after the quality check is done
- Misassignment of genotypes or misidentification of pedigree needs to be investigated
- It is the nominator's responsibility to resolve conflicts
- Requesters also receive this report
-Reports genotyped animals that already have another genotype and >1000 progeny
-Processing of the animal is put on hold until a valid reason for the genotype submission is provided
-Includes the animal key, Sample ID, and progeny count (anim_key,Sample_ID,progeny_cnt)
-When more than one genotype with >1000 progeny is observed, the following message is sent via Redmine to the person in charge of the submission.
There are X record(s) in the 2022070122 dataset that have more than 1000 progeny. Please let us know if the following animals were purposely genotyped and the reason why you submitted them. The processing of the animal is pending on your explanation.
                  Sentrix      Sentrix
Animal ID         Bar Code     Pos    Sample ID
----------------- ------------ ------ ----------------
HO84000XXXXXXXXXX XXXXXXXXX    R0XC0X HO840F00XXXXXXXXXX
. . .
-Reports that the same barcode and position were used for an existing negative-key genotype.
-The sample should not be loaded until the negative-key genotype becomes positive.
-Contains the sample ID, barcode, and position that conflict with the negative-key genotype.
Excessive_Homozygous_LABYYYYMMDD.csv (upon loading)
-Reports the list of animals with excessive homozygosity, e.g. Excessive_Homozygous_[Lab_Code]2023041113.csv:
anim_key,Sample_ID,Homozygous_Percent
96032679,BRE_23MR23_18,94.5
96032548,MAM_23MR16_02,99.9
96090968,EVA_23MR23_01,99.8
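A file with these columns can be read with Python's standard csv module. The sketch below uses the example rows above as inline text and sorts the animals from highest to lowest homozygous percent; reading from an actual file path is left out, and the variable names are chosen only for illustration.

```python
import csv
import io

# Example content copied from the sample rows above.
sample = """anim_key,Sample_ID,Homozygous_Percent
96032679,BRE_23MR23_18,94.5
96032548,MAM_23MR16_02,99.9
96090968,EVA_23MR23_01,99.8
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Sort so the most homozygous samples are reviewed first.
rows.sort(key=lambda r: float(r["Homozygous_Percent"]), reverse=True)
ordered_ids = [r["Sample_ID"] for r in rows]
print(ordered_ids)
```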