Sample Relatedness Analysis
Gencove supports generating sample relatedness estimates from VCF files using the KING relatedness software. The analysis optionally subsets the VCF files to a set of target sites and then runs KING to estimate relatedness between samples.
Once the analysis is complete, the kinship matrix and related files will be available for download via the Analysis object.
Please keep in mind:
- The analysis requires a list of per-chromosome VCF files as input.
- Subsetting to a set of target sites is optional. Provide both the subset and subset_tbi parameters to subset to a set of target sites, or provide neither. Subsetting is useful when the input VCFs are too large to be processed in full.
Note that output from the jointcalling shortcut can be used as input to this shortcut. For more information, see the jointcalling shortcut documentation.
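The both-or-neither rule for subset and subset_tbi can be checked before an analysis is submitted. The following is a minimal sketch, assuming the two values are passed around as optional paths; check_subset_args is a hypothetical helper written for illustration and is not part of the Gencove library.

```python
from typing import Optional


def check_subset_args(subset: Optional[str], subset_tbi: Optional[str]) -> bool:
    """Return True if subsetting is requested, False if it is not.

    Hypothetical helper (not part of the Gencove library). Raises
    ValueError when only one of the two values is supplied.
    """
    # Exactly one value being None means the pair is incomplete.
    if (subset is None) != (subset_tbi is None):
        raise ValueError("Provide both subset and subset_tbi, or neither.")
    return subset is not None
```

Failing early on an incomplete pair avoids submitting an analysis that cannot subset correctly.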
Example Usage
Example without subsetting:
from gencove_explorer_library.shortcuts.sample_relatedness import SampleRelatedness
from gencove_explorer.file import File, NamedFile

# Without subsetting (use full VCF files)
analysis = SampleRelatedness(
    vcfs=[
        File(remote=NamedFile(path="chr1.vcf.gz")),
        File(remote=NamedFile(path="chr2.vcf.gz")),
        File(remote=NamedFile(path="chr3.vcf.gz")),
        # ... more chromosome VCFs
    ],
    maf=0.05
).run()

# Once analysis is complete, retrieve the kinship matrix
relatedness_results = analysis.result()
print(relatedness_results)
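Because the input is one VCF per chromosome, the list of paths can be generated rather than typed by hand. This is a minimal sketch that only builds path strings; the chr{chrom}.vcf.gz naming pattern and the default of 22 autosomes are assumptions taken from the example file names, and chromosome_vcf_paths is a hypothetical helper, not part of the Gencove library.

```python
from typing import List


def chromosome_vcf_paths(
    template: str = "chr{chrom}.vcf.gz",  # assumed naming pattern
    n_autosomes: int = 22,                # assumed chromosome count
) -> List[str]:
    """Build one VCF path per chromosome, numbered 1..n_autosomes."""
    return [template.format(chrom=i) for i in range(1, n_autosomes + 1)]


paths = chromosome_vcf_paths()
print(len(paths), paths[0], paths[-1])
```

Each generated path can then be wrapped in File(remote=NamedFile(path=...)) in the same way as the hand-written entries in the example.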
Example with subsetting (requires both subset VCF and subset TBI files):
analysis = SampleRelatedness(
    vcfs=[
        File(remote=NamedFile(path="chr1.vcf.gz")),
        File(remote=NamedFile(path="chr2.vcf.gz")),
        File(remote=NamedFile(path="chr3.vcf.gz")),
        # ... more chromosome VCFs
    ],
    subset=File(remote=NamedFile(path="gsa_sites.vcf.gz")),
    subset_tbi=File(remote=NamedFile(path="gsa_sites.vcf.gz.tbi")),
    maf=0.05
).run()

relatedness_results = analysis.result()
Accessing Results
The primary result (kinship matrix) can be accessed using the .result() method, which returns the .kin0 file, among other deliverables. To access other deliverables, one can list outputs as follows:
Output:
{
    "kin0": File(
        local=PosixPath("/home/explorer/tmp/tmp15a_9czu/concatenated.kin0"),
        remote=EFile(path="e://users/me/tmp/266ddf2a8f034d029b93a679480f9cac"),
    ),
    "genome": File(
        local=PosixPath("/home/explorer/tmp/tmps08_8leh/concatenated.genome"),
        remote=EFile(path="e://users/me/tmp/97042be1edfd43b2b0b9cfce8f3a7add"),
    ),
    "pairwise_allsegs": File(
        local=PosixPath(
            "/home/explorer/tmp/tmpjuc8okgq/concatenated_pairwise-relallsegs.txt"
        ),
        remote=EFile(path="e://users/me/tmp/19b88f40c63448b9af483c9020dac1e5"),
    )
}
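Once the .kin0 file is available locally, it can be read with the standard library. The sketch below assumes the file is tab-delimited with the ID1, ID2, and Kinship columns that KING typically writes, and it uses the kinship cutoffs commonly cited for KING relationship inference; both should be checked against the actual file and the KING documentation. The sample rows are made up for illustration, and read_kin0 and classify_kinship are hypothetical helpers.

```python
import csv
import io


def classify_kinship(kinship: float) -> str:
    """Map a kinship coefficient to a coarse relationship label.

    Cutoffs are the commonly cited KING inference ranges (assumption).
    """
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship > 0.177:
        return "1st-degree"
    if kinship > 0.0884:
        return "2nd-degree"
    if kinship > 0.0442:
        return "3rd-degree"
    return "unrelated"


def read_kin0(handle):
    """Return (ID1, ID2, kinship, label) tuples from an open .kin0 handle."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        kinship = float(row["Kinship"])
        rows.append((row["ID1"], row["ID2"], kinship, classify_kinship(kinship)))
    return rows


# Made-up sample rows in the assumed column layout, for illustration only.
SAMPLE_KIN0 = (
    "FID1\tID1\tFID2\tID2\tN_SNP\tHetHet\tIBS0\tKinship\n"
    "f1\ts1\tf1\ts2\t10000\t0.15\t0.001\t0.2500\n"
    "f1\ts1\tf2\ts3\t10000\t0.10\t0.050\t0.0100\n"
)

for pair in read_kin0(io.StringIO(SAMPLE_KIN0)):
    print(pair)
```

For a real run, replace the in-memory sample with open() on the local path reported for the kin0 entry in the output.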