Files Generated During QC Process
Once the Sample Sheet and Final Report files that a lab placed in the "check" directory are picked up and processed by the CDCB check process, a file called results.$10bytedatestring.$chip.zip is generated in the "out" directory.
To proceed with loading the batch into the CDCB database, the lab needs to review the Data QC file and return it to the "in" directory, with or without a comment.
(If the submission failed any criteria, the lab needs to provide a reason why it is okay to proceed regardless.)
We provide various supplemental files to help labs determine the reasons for failures in their submissions.
The files that we may provide in results.$10bytedatestring.$chip.zip are described below:
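As a minimal sketch of the first step of this workflow, the Python snippet below copies the DataQC_*.csv file(s) out of a results zip so they can be reviewed before being returned. The helper name `extract_dataqc` and the local "out" and "review" paths are hypothetical; the only assumptions about the archive are the file-name patterns given on this page.

```python
import fnmatch
import zipfile
from pathlib import Path


def extract_dataqc(results_zip: Path, dest_dir: Path) -> list[Path]:
    """Copy every DataQC_*.csv member of a results zip into dest_dir (hypothetical helper)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(results_zip) as zf:
        for member in zf.namelist():
            base_name = Path(member).name  # ignore any folder stored inside the archive
            if fnmatch.fnmatch(base_name, "DataQC_*.csv"):
                target = dest_dir / base_name
                target.write_bytes(zf.read(member))
                extracted.append(target)
    return extracted


if __name__ == "__main__":
    out_dir = Path("out")  # hypothetical local copy of the CDCB "out" directory
    if out_dir.is_dir():
        for zip_path in sorted(out_dir.glob("results.*.zip")):
            for qc_file in extract_dataqc(zip_path, Path("review") / zip_path.stem):
                print(f"{zip_path.name} -> {qc_file}")
```

The other supplemental files can be pulled out the same way by changing the pattern.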
DataQC_lab10-byte-date_*.csv
-Describes the pass/fail status, count, and description of each QC process result
-Must be returned to the "in" directory after the lab adds comments regarding any failed categories
Example of Data QC file (with comment):
#Number of Genotypes #Chip Name #Genotypes Processed from lab10-byte-date
PASS/FAIL,Count,Description
FAIL,143,Parent Progeny Conflict SNP >2%
FAIL,579,Low Call Rate (across animal) SNP >10%
PASS,45,HWE SNP
PASS,0,Chips w/ >80 percent Conflicts
INFO,0,Percentage of animal genotypes with No Nomination
comment: Explanation of why it is okay to load this batch even though the check process detected some failures
Examples of comments:
eg1) Comment: SNPs with LowCall/PPC/HWE were reviewed, but clustering could not be improved
eg2) Comment: Nominations will be sent shortly
Possible_Switch.csv
-Shows possible sample switches at the lab where the genotyping was done
-Lists cases where both parents conflict, along with the pedigree sire and dam
-Helps labs detect a possible switch or shift by comparing the sire and dam from the submitted pedigree with the sire and dam identified from the genotype
-Columns: SENTRIX_BARCODE,SENTRIX_POSITION,SAMPLE_ID,REQUESTER_ID,SAMPLE_PLATE,AMP_PLATE,SAMPLE_WELL,ID18,SIRE18,SIRE_WRONG,FOUND_SIRE,DAM18,DAM_WRONG,FOUND_DAM
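A minimal sketch of the pedigree-versus-genotype comparison using the column names listed above. It assumes that FOUND_SIRE and FOUND_DAM hold IDs directly comparable to SIRE18 and DAM18, and that an empty FOUND_* cell means no parent was identified; the values in SIRE_WRONG and DAM_WRONG are not interpreted because their coding is not described here.

```python
import csv
import io

# (label, pedigree column, genotype-identified column)
PARENT_COLUMNS = (
    ("sire", "SIRE18", "FOUND_SIRE"),
    ("dam", "DAM18", "FOUND_DAM"),
)


def parent_mismatches(possible_switch_csv: str) -> list[tuple[str, str, str, list[str]]]:
    """Return (SAMPLE_ID, SAMPLE_PLATE, SAMPLE_WELL, parents) for rows where the
    parent found from the genotype differs from the pedigree parent."""
    report = []
    for row in csv.DictReader(io.StringIO(possible_switch_csv)):
        differing = []
        for parent, pedigree_col, found_col in PARENT_COLUMNS:
            found = (row[found_col] or "").strip()
            # Assumption: an empty FOUND_* cell means no parent was identified.
            if found and found != (row[pedigree_col] or "").strip():
                differing.append(parent)
        if differing:
            report.append((row["SAMPLE_ID"], row["SAMPLE_PLATE"], row["SAMPLE_WELL"], differing))
    # Plate/well order puts neighbouring wells next to each other, which helps spot shifts.
    return sorted(report, key=lambda r: (r[1], r[2]))
```

Sorting by plate and well is a design choice: a shift usually affects consecutive wells, so listing them together makes the pattern easier to see.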
chip_sum_bar.csv
-Lists chips with high error counts
-A chip is flagged when 20 or more of the 24 samples on a 24-sample chip show conflicts
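The 20-of-24 rule can be sketched as a per-chip count. This assumes the lab already has one (Sentrix barcode, has_conflict) pair per sample; how CDCB decides that an individual sample "has a conflict" is not described here, so that flag is an input. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def flagged_chips(samples: Iterable[tuple[str, bool]], min_conflicting: int = 20) -> list[str]:
    """Return the barcodes of chips on which at least `min_conflicting` samples conflict.

    `samples` yields one (sentrix_barcode, has_conflict) pair per sample; the
    default of 20 follows the 20-of-24 rule described above.
    """
    conflicting = Counter(barcode for barcode, has_conflict in samples if has_conflict)
    return sorted(barcode for barcode, n in conflicting.items() if n >= min_conflicting)
```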
Conflicts_by_plate.csv
-Shows conflicts on each plate so the lab can investigate a potential switch or shift (image of the file attached)
Reassigned_genotypes.txt
-Generated when the lab submits genotype files to reassign genotypes to new animal IDs
-Nominators normally use the genotype mover application for this, but if the mis-assignment error occurred at the lab, the lab can move the genotypes of multiple animals at once by sending a genotype file
key_not_found.txt
- This file contains the Animal ID, Sample ID, and requester for samples with no pedigree AND no nomination in CDCB's database (image of the file attached).
- Samples that are not associated with a pedigree are given a zero key.
No_nomination.txt
- This file contains the Nominator ID, NAABcode, Animal ID, and sampleID.
- The pedigree exists in our database, but no nomination was made, so the sample and animal ID are not associated and no fee code has been assigned.
No_Match_Sample_ID_10-byte-date.txt
-Contains any Sample_IDs that appear in the FinalReport file but not in the SampleSheet.
-If this file is not empty, processing stops and an email is sent to the lab listing the problem Sample_IDs and asking the lab to correct the problem and resubmit.
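A lab could run the same check before submitting. The sketch below is a set difference between the two ID lists; reading the IDs out of the actual FinalReport and SampleSheet files is left out because the column layout can vary, so the hypothetical function simply takes the IDs.

```python
from typing import Iterable


def unmatched_sample_ids(final_report_ids: Iterable[str], sample_sheet_ids: Iterable[str]) -> list[str]:
    """Sample_IDs present in the FinalReport but missing from the SampleSheet."""
    sheet = {sample_id.strip() for sample_id in sample_sheet_ids}
    seen = set()
    missing = []
    for sample_id in final_report_ids:
        sample_id = sample_id.strip()
        # A FinalReport typically repeats each Sample_ID once per SNP, so report each ID
        # only once, in order of first appearance.
        if sample_id not in sheet and sample_id not in seen:
            seen.add(sample_id)
            missing.append(sample_id)
    return missing
```

An empty result corresponds to an empty No_Match_Sample_ID file, i.e. this check would not stop processing.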
Sample_ID10-byte-date.txt
-This file shows the association between each animal ID and sample.
count.gt
-The first row contains the number of genotypes submitted
-The second row contains the number of new genotype assignments (C)
-The third row contains the number of reassignments (D)
-The chip ID number is given in the third column
Lab_Conflicts10-byte-date.htm
-This file shows the number of conflicts per chip
-A high conflict count may indicate mixed samples within the chip
HWE10-byte-date.html
-Hardy-Weinberg equilibrium (HWE) statistics used to check homozygosity on the autosomes
-Too high or too low homozygosity may indicate a clustering issue
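The exact statistic in the HWE report is not documented here. As an illustration only, the textbook chi-square test for one SNP compares observed genotype counts with those expected under HWE, alongside observed versus expected homozygosity. Counts follow the 0/1/2 coding used in the PPC_bySNP report; the function name is hypothetical.

```python
def hwe_summary(n_hom1: int, n_het: int, n_hom2: int) -> dict[str, float]:
    """Chi-square HWE test (1 df) and homozygosity for one SNP.

    n_hom1, n_het and n_hom2 are the numbers of animals with genotype codes 0, 1 and 2.
    """
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no genotype calls for this SNP")
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {
        "observed_homozygosity": (n_hom1 + n_hom2) / n,
        "expected_homozygosity": p * p + q * q,
        "chi_square": chi_square,
    }
```

With 1 degree of freedom, a chi-square above about 3.84 corresponds to p < 0.05. A SNP called as 50/0/50 returns a chi-square of 100 with observed homozygosity 1.0 versus 0.5 expected, the kind of pattern a missing heterozygote cluster would produce.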
Parent_Progeny_Conflicts10-byte-date.htm
- This file shows parent-progeny conflicts per SNP
- Only homozygous SNPs are counted (the number of homozygous cases is given in the "count" column)
- percent (4th column) = #conflicts (2nd column) / count (3rd column)
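The percent column can be sketched for a single SNP as follows, using the 0/1/2 genotype codes described for PPC_bySNP below. Because the page only says that homozygous SNPs are counted, this sketch assumes a parent-progeny pair enters "count" only when both animals are homozygous (the only case where opposing homozygotes, 0 versus 2, can be seen), and it reports the ratio multiplied by 100. The function name is hypothetical.

```python
from typing import Iterable, Optional

HOMOZYGOUS = {0, 2}


def ppc_percent(pairs: Iterable[tuple[Optional[int], Optional[int]]]) -> tuple[int, int, float]:
    """Return (conflicts, count, percent) for one SNP.

    `pairs` holds (parent_genotype, progeny_genotype) codes 0/1/2, or None if not called.
    Assumption: a pair is counted only when both genotypes are homozygous.
    """
    conflicts = count = 0
    for parent, progeny in pairs:
        if parent in HOMOZYGOUS and progeny in HOMOZYGOUS:
            count += 1
            if parent != progeny:  # 0 vs 2: opposing homozygotes
                conflicts += 1
    percent = 100.0 * conflicts / count if count else 0.0
    return conflicts, count, percent
```

The "Parent Progeny Conflict SNP >2%" row of the Data QC file appears to use a threshold on this percentage; a SNP at exactly 2.0% would not exceed it.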
PPC_bySNP10-byte-date.htm
- Parent-progeny conflicts by SNP, with genotypes coded as:
“0” = homozygous1
“1” = heterozygous
“2” = homozygous2
- Countall – total number of samples (genotypes)
- “gtype” – genotype call on the specific SNP that the animal has
- Count – number of PPC cases on the SNP for that scenario (combination of sex, gtype, sire, and dam)
Example
SNP_Name           countall  sex  gtype  sire  dam  COUNT
ARS-BFGL-NGS-3109  56        F    0      0     2    3
- There are 56 cases in total (countall) for this SNP.
- Out of the 56 cases, there are 3 samples in which sire=0 (homozygous1), dam=2 (homozygous2), and the female progeny=0, whereas 1 (heterozygous) is expected in the progeny's genotype.
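The expectation in this example follows from Mendelian inheritance. A minimal sketch, counting copies of the second allele so that the codes 0/1/2 above add up directly: each parent transmits one allele, so the progeny code is the sum of the two transmitted alleles. The function names are hypothetical; CDCB's own implementation is not shown here.

```python
# Alleles each parent can transmit, counted as copies of the second allele.
TRANSMITTED = {0: (0,), 1: (0, 1), 2: (1,)}


def possible_progeny(sire: int, dam: int) -> set[int]:
    """Genotype codes (0/1/2) a progeny can have given its sire and dam codes."""
    return {s + d for s in TRANSMITTED[sire] for d in TRANSMITTED[dam]}


def is_ppc(sire: int, dam: int, progeny: int) -> bool:
    """True when the progeny genotype is impossible under Mendelian inheritance."""
    return progeny not in possible_progeny(sire, dam)
```

For the row above, `possible_progeny(0, 2)` is `{1}`, so a progeny coded 0 is a conflict, which is what the 3 cases in COUNT represent.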
LowCallSNP10-byte-date.htm
- Indicates the call rate (%) for each SNP
- A low call rate may indicate poor SNP quality
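A minimal sketch of a per-SNP call rate, assuming missing calls have already been converted to None (how no-calls are coded in a lab's FinalReport may differ). The 10% default reads the "Low Call Rate (across animal) SNP >10%" row of the Data QC file as "more than 10% of animals missing"; that interpretation is an assumption, as are the function names.

```python
from typing import Optional, Sequence


def call_rate(calls: Sequence[Optional[int]]) -> float:
    """Percentage of samples with a genotype call (None = no call) at one SNP."""
    if not calls:
        raise ValueError("no samples for this SNP")
    called = sum(1 for c in calls if c is not None)
    return 100.0 * called / len(calls)


def low_call_snps(
    calls_by_snp: dict[str, Sequence[Optional[int]]], max_missing_pct: float = 10.0
) -> list[str]:
    """SNP names whose missing-call percentage exceeds max_missing_pct (assumed threshold)."""
    return sorted(
        snp for snp, calls in calls_by_snp.items() if 100.0 - call_rate(calls) > max_missing_pct
    )
```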
Genomic_conflicts10-byte-date.htm (Check_Errors_20200911.csv)
- This file shows genomic conflict(s)
- Animal ID, Sample_ID, name, error, other_ID, other_name, and source are reported after the quality check is done
- Mis-assignment of genotypes or misidentification of pedigree needs to be investigated
- It is the nominator's responsibility to resolve conflicts
- Requesters also receive this report
GE1k_progeny.csv
-Reports genotyped animals that already have another genotype and >1000 progeny
-Processing of the animal is put on hold until a valid reason for the genotype submission is provided
-Includes the animal key, Sample ID, and progeny count (anim_key,Sample_ID,progeny_cnt)
-When more than one genotype with >1000 progeny is observed, the following message is sent via Redmine to the person in charge of the submission:
There are X record(s) in the 2022070122 dataset that have more than 1000 progeny. Please let us know if the following animals were purposely genotyped and the reason why you submitted them. The processing of the animal is pending on your explanation.
Animal ID          Sentrix Bar Code  Sentrix Pos  Sample ID
-----------------  ----------------  -----------  ------------------
HO84000XXXXXXXXXX  XXXXXXXXX         R0XC0X       HO840F00XXXXXXXXXX
. . .
Existing_genotype_negative_key.csv
-Reports samples whose barcode and position were already used for an existing negative-key genotype.
-The sample should not be loaded until the negative-key genotype becomes positive.
-Contains the sample ID, barcode, and position that conflict with the negative-key genotype.
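The check behind this file amounts to a lookup of each new sample's (barcode, position) pair among positions already held by negative-key genotypes. A minimal sketch, assuming both lists are available as plain tuples; the function name and input shapes are hypothetical.

```python
from typing import Iterable


def negative_key_collisions(
    new_samples: Iterable[tuple[str, str, str]],
    negative_key_positions: Iterable[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (sample_id, barcode, position) for each new sample whose barcode and
    position are already used by an existing negative-key genotype."""
    used = {(barcode.strip(), position.strip()) for barcode, position in negative_key_positions}
    return [
        (sample_id, barcode, position)
        for sample_id, barcode, position in new_samples
        if (barcode.strip(), position.strip()) in used
    ]
```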
Excessive_Homozygous_LABYYYYMMDD.csv (upon loading)
-Reports a list of animals with excessive homozygosity, for example Excessive_Homozygous_[Lab_Code]2023041113.csv:
anim_key,Sample_ID,Homozygous_Percent
96032679,BRE_23MR23_18,94.5
96032548,MAM_23MR16_02,99.9
96090968,EVA_23MR23_01,99.8
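A minimal sketch of how a Homozygous_Percent value like those above could be computed for one animal: the share of called SNPs with a homozygous code (0 or 2), rounded to one decimal as in the example. Missing calls are assumed to be None, and the cutoff CDCB uses to call homozygosity "excessive" is not stated on this page, so the sketch only computes the percentage. The function name is hypothetical.

```python
from typing import Optional, Sequence


def homozygous_percent(genotypes: Sequence[Optional[int]]) -> float:
    """Percentage of called SNPs that are homozygous (code 0 or 2) for one animal."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("animal has no called SNPs")
    homozygous = sum(1 for g in called if g in (0, 2))
    return round(100.0 * homozygous / len(called), 1)
```

Values near 100%, as in the example rows, mean almost no heterozygous calls, which is unusual for a real animal and often points to a sample or clustering problem.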