Relatedness

Sample Relatedness Analysis

Gencove supports estimating sample relatedness from VCF files using the KING relatedness software. The analysis optionally subsets the VCF files to a set of target sites and then runs KING to determine sample relatedness.

Once the analysis is complete, the kinship matrix and related files will be available for download via the Analysis object.

Please keep in mind:

  • The analysis requires a list of per-chromosome VCF files as input.
  • Subsetting to a set of target sites is optional. Provide both the subset and subset_tbi parameters to subset to target sites, or provide neither. Subsetting is useful when the input VCFs are too large to be processed in full.
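The "both or neither" rule for subset and subset_tbi can be checked before an analysis is submitted. The sketch below is a minimal illustration; check_subset_args is a hypothetical helper, not part of the Gencove library, and it assumes an omitted parameter is represented as None.

```python
from typing import Optional


def check_subset_args(subset: Optional[object], subset_tbi: Optional[object]) -> bool:
    """Return True if subsetting is requested, False if it is not.

    Hypothetical helper: raises ValueError when only one of the two
    parameters is supplied, mirroring the "both or neither" rule.
    """
    # Exactly one of the two being None means the pair is incomplete.
    if (subset is None) != (subset_tbi is None):
        raise ValueError("Provide both subset and subset_tbi, or neither.")
    return subset is not None
```

Calling check_subset_args(None, None) reports that no subsetting is requested, while passing only one of the two arguments raises an error early, before any analysis is started.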

Note that output from the jointcalling shortcut can be used as input to this shortcut. For more information, see the jointcalling shortcut documentation.

Example Usage

Example without subsetting:

from gencove_explorer_library.shortcuts.sample_relatedness import SampleRelatedness
from gencove_explorer.file import File, NamedFile

# Without subsetting (use full VCF files)
analysis = SampleRelatedness(
    vcfs=[
        File(remote=NamedFile(path="chr1.vcf.gz")),
        File(remote=NamedFile(path="chr2.vcf.gz")),
        File(remote=NamedFile(path="chr3.vcf.gz")),
        # ... more chromosome VCFs
    ],
    maf=0.05
).run()


# Once the analysis is complete, retrieve the kinship matrix
relatedness_results = analysis.result()
print(relatedness_results)
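The per-chromosome list in the example above can be generated rather than typed out. The sketch below assumes the files follow the chrN.vcf.gz naming used in the example and cover autosomes 1 to 22; chromosome_vcf_paths and its prefix parameter are hypothetical names introduced for illustration.

```python
from typing import Iterable, List


def chromosome_vcf_paths(
    prefix: str = "", chromosomes: Iterable[object] = range(1, 23)
) -> List[str]:
    """Build one VCF path per chromosome (hypothetical helper).

    Assumes files are named "<prefix>chr<N>.vcf.gz", as in the example.
    """
    return [f"{prefix}chr{c}.vcf.gz" for c in chromosomes]


# Default: chr1.vcf.gz through chr22.vcf.gz
paths = chromosome_vcf_paths()
```

Each path would then be wrapped as File(remote=NamedFile(path=p)) and passed in the vcfs list, as in the example above.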

Example with subsetting (requires both subset VCF and subset TBI files):

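from gencove_explorer_library.shortcuts.sample_relatedness import SampleRelatedness
from gencove_explorer.file import File, NamedFile

# With subsetting (restrict the analysis to target sites)
analysis = SampleRelatedness(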
    vcfs=[
        File(remote=NamedFile(path="chr1.vcf.gz")),
        File(remote=NamedFile(path="chr2.vcf.gz")),
        File(remote=NamedFile(path="chr3.vcf.gz")),
        # ... more chromosome VCFs
    ],
    subset=File(remote=NamedFile(path="gsa_sites.vcf.gz")),
    subset_tbi=File(remote=NamedFile(path="gsa_sites.vcf.gz.tbi")),
    maf=0.05
).run()

relatedness_results = analysis.result()

Accessing Results

The primary result, the kinship matrix, can be accessed using the .result() method, which returns the .kin0 file among other deliverables. To access the other deliverables, list the outputs as follows:

relatedness_results.analyses()[0].get_output().list()

Output:

{
    "kin0": File(
        local=PosixPath("/home/explorer/tmp/tmp15a_9czu/concatenated.kin0"),
        remote=EFile(path="e://users/me/tmp/266ddf2a8f034d029b93a679480f9cac"),
    ),
    "genome": File(
        local=PosixPath("/home/explorer/tmp/tmps08_8leh/concatenated.genome"),
        remote=EFile(path="e://users/me/tmp/97042be1edfd43b2b0b9cfce8f3a7add"),
    ),
    "pairwise_allsegs": File(
        local=PosixPath(
            "/home/explorer/tmp/tmpjuc8okgq/concatenated_pairwise-relallsegs.txt"
        ),
        remote=EFile(path="e://users/me/tmp/19b88f40c63448b9af483c9020dac1e5"),
    )
}
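Once the .kin0 file has been downloaded, the pairwise kinship coefficients can be summarized as relationship categories. The sketch below is an illustration under stated assumptions: it assumes the file is tab-separated with a header containing ID1, ID2 and Kinship columns (the usual KING layout; the concatenated file produced here may differ), and it uses the inference ranges given in the KING documentation. The helpers classify_kinship and read_kin0 are hypothetical, and the two data rows are made up.

```python
import csv
import io


def classify_kinship(kinship: float) -> str:
    """Map a KING kinship coefficient to a relationship category.

    Cut-offs follow the ranges in the KING documentation:
    >0.354, [0.177, 0.354], [0.0884, 0.177], [0.0442, 0.0884].
    """
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship >= 0.177:
        return "1st-degree"
    if kinship >= 0.0884:
        return "2nd-degree"
    if kinship >= 0.0442:
        return "3rd-degree"
    return "unrelated or more distant"


def read_kin0(handle):
    """Return (ID1, ID2, kinship, category) tuples from an open .kin0 file.

    Assumes a tab-separated file with ID1, ID2 and Kinship header columns.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        kinship = float(row["Kinship"])
        rows.append((row["ID1"], row["ID2"], kinship, classify_kinship(kinship)))
    return rows


# Made-up rows for illustration only; real files come from the analysis output.
example = io.StringIO(
    "FID1\tID1\tFID2\tID2\tN_SNP\tHetHet\tIBS0\tKinship\n"
    "F1\tS1\tF1\tS2\t50000\t0.15\t0.001\t0.25\n"
    "F1\tS1\tF2\tS3\t50000\t0.08\t0.03\t0.01\n"
)
pairs = read_kin0(example)
```

With a real result, the in-memory example would be replaced by an open handle on the downloaded .kin0 file, for instance the local path shown in the output listing, assuming it is populated.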