netcoloc package

netcoloc.netcoloc_utils module

Utility functions useful across multiple modules.

netcoloc.netcoloc_utils.get_degree_binning(node_to_degree_dict, min_bin_size, lengths=None)

Groups nodes by degree into similarly sized bins. This function comes from network_utilities.py of emreg00/toolbox.

Returns a tuple with the following two values:

  • list of bins where each bin contains a list of nodes of similar degree

  • mapping of degree to index of bin: a dict mapping each degree to the index of the bin in the bins list that contains nodes of that degree

Parameters:
  • node_to_degree_dict (dict) – Map of nodes to their degrees

  • min_bin_size (int) – minimum number of nodes each bin should contain.

  • lengths (list) – List of nodes to bin. If lengths is None, then all nodes will be binned.

Returns:

(list of bins, mapping of degree to index of bin)

Return type:

tuple
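The greedy idea behind degree binning can be illustrated with a small, self-contained sketch. This is not netcoloc's code: degree_binning_sketch is a hypothetical name, and the sketch assumes consecutive degree values are merged from low to high until a bin holds at least min_bin_size nodes, with any leftover high-degree nodes folded into the last bin. The bin boundaries produced by get_degree_binning() may differ.

```python
from collections import defaultdict


def degree_binning_sketch(node_to_degree, min_bin_size, nodes=None):
    """Hypothetical greedy degree binning.

    Returns (bins, degree_to_bin_index): bins is a list of node lists, and
    degree_to_bin_index maps each degree to the index of the bin holding it.
    """
    # Group the nodes to be binned by their degree.
    degree_to_nodes = defaultdict(list)
    for node, degree in node_to_degree.items():
        if nodes is not None and node not in nodes:
            continue
        degree_to_nodes[degree].append(node)

    bins, bin_degrees = [], []
    current_nodes, current_degrees = [], []
    # Walk degrees from low to high, closing a bin once it is large enough.
    for degree in sorted(degree_to_nodes):
        current_nodes.extend(degree_to_nodes[degree])
        current_degrees.append(degree)
        if len(current_nodes) >= min_bin_size:
            bins.append(current_nodes)
            bin_degrees.append(current_degrees)
            current_nodes, current_degrees = [], []

    # Leftover (highest-degree) nodes that never filled a bin join the last bin.
    if current_nodes:
        if bins:
            bins[-1].extend(current_nodes)
            bin_degrees[-1].extend(current_degrees)
        else:
            bins.append(current_nodes)
            bin_degrees.append(current_degrees)

    degree_to_bin_index = {d: i for i, degs in enumerate(bin_degrees) for d in degs}
    return bins, degree_to_bin_index


degrees = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 3, "f": 10}
bins, degree_to_bin_index = degree_binning_sketch(degrees, min_bin_size=2)
print(bins)                 # [['a', 'b'], ['c', 'd', 'e', 'f']]
print(degree_to_bin_index)  # {1: 0, 2: 1, 3: 1, 10: 1}
```

With a real interactome, the first argument would typically be dict(G.degree()) for a networkx.Graph G.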

netcoloc.netprop module

Functions for performing network propagation.

netcoloc.netprop.get_individual_heats_matrix(nam_or_graph, alpha=0.5, conserve_heat=True, weighted=False)

Returns the pre-calculated contributions of each individual gene in the interactome to the final heat of each other gene in the interactome after propagation.

Changed in version 0.1.6: In addition to a normalized adjacency matrix, this function now also accepts a networkx.Graph as input.

If a networkx.Graph is passed in as the nam_or_graph parameter, get_normalized_adjacency_matrix() is called to generate the normalized adjacency matrix using the conserve_heat and weighted parameters.

Note

The resulting matrix can be saved to a file with numpy.save() and loaded later with numpy.load(), but the file can be several gigabytes and may take a minute or more to save or load.

import numpy
numpy.save('heats_matrix.npy', w_double_prime)
w_double_prime = numpy.load('heats_matrix.npy')

Parameters:
  • nam_or_graph (numpy.ndarray or networkx.Graph) – square normalized adjacency matrix or network

  • alpha (float) – Heat dissipation coefficient between 0 and 1: the contribution of heat propagated from adjacent nodes in determining the final heat of a node, as opposed to the contribution from being part of the initial gene set

  • conserve_heat (bool) – If True, heat will be conserved (i.e., the sum of the heat vector will be equal to 1), and the normalized adjacency matrix will be asymmetric. Otherwise, heat will not be conserved, and the matrix will be symmetric. NOTE: Only applies if nam_or_graph is a networkx.Graph.

  • weighted (bool) – If True, then the graph’s edge weights will be taken into account. Otherwise, all edge weights will be set to 1. NOTE: Only applies if nam_or_graph is a networkx.Graph.

Returns:

square individual heats matrix

Return type:

numpy.ndarray
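In the formulation of Vanunu et al., the final heat F satisfies F = αW′F + (1 − α)Y, where Y marks the seed genes, which gives the closed form F = (1 − α)(I − αW′)⁻¹Y. The sketch below assumes the individual heats matrix is the (1 − α)(I − αW′)⁻¹ term (the helper names are hypothetical) and checks it against the iterative update on a toy column-normalized matrix.

```python
import numpy as np


def individual_heats_sketch(nam, alpha=0.5):
    """Closed-form propagation operator (1 - alpha) * inv(I - alpha * W')."""
    n = nam.shape[0]
    return (1 - alpha) * np.linalg.inv(np.identity(n) - alpha * nam)


def iterative_propagation(nam, seed_vector, alpha=0.5, tol=1e-12, max_iter=10_000):
    """Repeat F <- alpha * W' F + (1 - alpha) * Y until F stops changing."""
    heat = seed_vector.copy()
    for _ in range(max_iter):
        updated = alpha * nam @ heat + (1 - alpha) * seed_vector
        if np.abs(updated - heat).max() < tol:
            return updated
        heat = updated
    return heat


# Toy column-normalized adjacency matrix of the 3-node path graph a-b-c.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
nam = adjacency / adjacency.sum(axis=0)
seeds = np.array([1.0, 0.0, 0.0])  # the only seed gene is a

w_double_prime = individual_heats_sketch(nam, alpha=0.5)
closed_form = w_double_prime @ seeds
print(np.allclose(closed_form, iterative_propagation(nam, seeds)))  # True
```

Because the toy W′ is column-normalized, the total heat stays equal to 1, which is the behaviour the conserve_heat description refers to. Inverting an n × n matrix is also why this step is costly for a genome-scale interactome.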

netcoloc.netprop.get_normalized_adjacency_matrix(graph, conserve_heat=True, weighted=False)

Returns normalized adjacency matrix (W’), as detailed in:

Vanunu, Oron, et al. ‘Associating genes and protein complexes with disease via network propagation.’

Since version 0.1.6, a networkx.Graph can be passed directly to get_individual_heats_matrix(), which invokes this function to create the normalized adjacency matrix.

Note

The resulting matrix can be saved to a file with numpy.save() and loaded later with numpy.load(), but the file can be several gigabytes and may take a minute or more to save or load.

import numpy
numpy.save('nam.npy', adjacency_matrix)
adjacency_matrix = numpy.load('nam.npy')

Parameters:
  • graph (networkx.Graph) – Interactome from which to calculate normalized adjacency matrix.

  • conserve_heat (bool) – If True, heat will be conserved (i.e., the sum of the heat vector will be equal to 1), and the normalized adjacency matrix will be asymmetric. Otherwise, heat will not be conserved, and the matrix will be symmetric.

  • weighted (bool) – If True, then the graph’s edge weights will be taken into account. Otherwise, all edge weights will be set to 1.

Returns:

Square normalized adjacency matrix

Return type:

numpy.ndarray
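The two conserve_heat options can be sketched as follows, assuming the usual normalizations from the network-propagation literature: column normalization W′ = AD⁻¹ (each column sums to 1, asymmetric) and symmetric normalization W′ = D^(−1/2)AD^(−1/2), where A is the adjacency matrix and D the diagonal degree matrix. The function name is hypothetical, the graph is assumed to have no isolated nodes, and netcoloc's exact implementation may differ.

```python
import networkx as nx
import numpy as np


def normalized_adjacency_sketch(graph, conserve_heat=True, weighted=False):
    """Hypothetical sketch of W' for a networkx graph with no isolated nodes."""
    nodes = list(graph.nodes)
    # weight=None treats every edge as weight 1; "weight" reads the edge attribute.
    adjacency = nx.to_numpy_array(graph, nodelist=nodes,
                                  weight="weight" if weighted else None)
    degrees = adjacency.sum(axis=0)
    if conserve_heat:
        # Column-normalize: each column sums to 1, so total heat is conserved.
        return adjacency / degrees, nodes
    # Symmetric normalization D^(-1/2) A D^(-1/2).
    inv_sqrt = 1.0 / np.sqrt(degrees)
    return adjacency * inv_sqrt[:, None] * inv_sqrt[None, :], nodes


G = nx.path_graph(["a", "b", "c"])
conserving, nodes = normalized_adjacency_sketch(G, conserve_heat=True)
symmetric, _ = normalized_adjacency_sketch(G, conserve_heat=False)
print(conserving.sum(axis=0))               # [1. 1. 1.]
print(np.allclose(symmetric, symmetric.T))  # True
```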

netcoloc.netprop.network_propagation(individual_heats_matrix, nodes, seed_genes)

Implements network propagation, as detailed in:

Vanunu, Oron, et al. ‘Associating genes and protein complexes with disease via network propagation.’

Using this function, the final heat of the network is calculated directly, instead of iteratively. This method is faster when many different propagations need to be performed on the same network (with different seed gene sets). It is slower than iterative_network_propagation() for a single propagation.

Parameters:
  • individual_heats_matrix (numpy.ndarray) – Square matrix that is the output of get_individual_heats_matrix()

  • nodes (list) – List of nodes in the network represented by the individual_heats_matrix, in the same order in which they were supplied to get_individual_heats_matrix()

  • seed_genes (list) – Input list of genes/nodes for initializing the heat in network propagation. Any items in seed_genes that are not present in nodes will be ignored.

Returns:

Final heat of each node after propagation, with the name of the nodes as the index

Return type:

pandas.Series
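Once the individual heats matrix W″ is available, each propagation reduces to one matrix-vector product with an indicator vector y of the seed genes, F = W″y. The sketch below assumes that closed form (the function name and toy matrix are hypothetical) and, as described above, drops seeds that are not in nodes.

```python
import numpy as np
import pandas as pd


def network_propagation_sketch(individual_heats_matrix, nodes, seed_genes):
    """Final heat F = W'' @ y, where y is 1 for seed genes and 0 elsewhere."""
    node_index = {node: i for i, node in enumerate(nodes)}
    seed_vector = np.zeros(len(nodes))
    for gene in set(seed_genes):
        if gene in node_index:  # seeds absent from the network are ignored
            seed_vector[node_index[gene]] = 1.0
    return pd.Series(individual_heats_matrix @ seed_vector, index=nodes)


nodes = ["a", "b", "c"]
w_double_prime = np.array([[0.6, 0.2, 0.1],
                           [0.3, 0.6, 0.3],
                           [0.1, 0.2, 0.6]])
heat = network_propagation_sketch(w_double_prime, nodes, ["a", "not_in_network"])
print(heat.tolist())  # [0.6, 0.3, 0.1]
```

Reusing one precomputed W″ for many seed sets is what makes this approach faster than iterating when many propagations are needed.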

netcoloc.netprop_zscore module

Functions for getting z-scores from network propagation.

netcoloc.netprop_zscore.calculate_heat_zscores(individual_heats_matrix, nodes, degrees, seed_genes, num_reps=10, alpha=0.5, minimum_bin_size=10, random_seed=1)

Helper function to perform network heat propagation using the given individual heats matrix with the given seed genes and return the z-scores of the final heat values of each node.

The z-scores are calculated based on a null model, which is built by running the network propagation multiple times using randomly selected seed genes with similar degree distributions to the original seed gene set.

The returned tuple contains the following:

  • pandas.Series containing z-scores for each gene. Gene names comprise the index column.

  • pandas.Series containing the final heat scores for each gene. Gene names comprise the index column.

  • numpy.ndarray containing a matrix in which each row contains the final heat scores for each gene from one network propagation from random seed genes (one row per repetition).

Parameters:
  • individual_heats_matrix (numpy.ndarray) – Output of netprop.get_individual_heats_matrix(). A square matrix containing the final heat contributions of each gene

  • nodes (list) – nodes, in the order in which they were supplied to the get_normalized_adjacency_matrix() method which returns the precursor to the individual_heats_matrix

  • degrees (dict) – Mapping of node names to node degrees

  • seed_genes (list) – list of genes to use for network propagation. The results of this network propagation will be compared to a set of random results in order to obtain z-scores

  • num_reps (int) – Number of times the network propagation algorithm should be run using random seed genes in order to build the null model

  • alpha (float) – Number between 0 and 1. Denotes the importance of the propagation step in the network propagation, as opposed to the step where heat is added to seed genes only. Recommended to be 0.5 or greater

  • minimum_bin_size (int) – minimum number of genes that should be in each degree matching bin

  • random_seed (int) – Seed for the random number generator used to select random seed genes

Returns:

(pandas.Series, pandas.Series, numpy.ndarray)

Return type:

tuple
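The null model can be sketched as follows. Everything here is an assumption for illustration: the function name is hypothetical, propagation is taken to be the product F = W″y, each random seed set is drawn by replacing every real seed with a not-yet-chosen gene from the same degree bin (bins as returned by get_degree_binning()), and the z-score of each gene is (observed heat − mean null heat) / standard deviation of null heat.

```python
import numpy as np
import pandas as pd


def heat_zscores_sketch(heats_matrix, nodes, degrees, seed_genes, bins,
                        degree_to_bin_index, num_reps=10, random_seed=1):
    """Hypothetical z-scores of propagated heat against a degree-matched null model."""
    rng = np.random.default_rng(random_seed)
    node_index = {node: i for i, node in enumerate(nodes)}
    # Keep seeds that are in the network, dropping duplicates but preserving order.
    seeds = [g for g in dict.fromkeys(seed_genes) if g in node_index]

    def propagate(genes):
        seed_vector = np.zeros(len(nodes))
        for gene in genes:
            seed_vector[node_index[gene]] = 1.0
        return heats_matrix @ seed_vector

    final_heat = propagate(seeds)
    random_final_heats = np.empty((num_reps, len(nodes)))
    for rep in range(num_reps):
        chosen = []
        for gene in seeds:
            # Candidates: genes in the same degree bin that were not drawn yet.
            pool = [n for n in bins[degree_to_bin_index[degrees[gene]]] if n not in chosen]
            if pool:
                chosen.append(pool[rng.integers(len(pool))])
        random_final_heats[rep] = propagate(chosen)

    mean = random_final_heats.mean(axis=0)
    std = random_final_heats.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        z_scores = (final_heat - mean) / std  # nan or inf where the null has no spread
    return (pd.Series(z_scores, index=nodes),
            pd.Series(final_heat, index=nodes),
            random_final_heats)


nodes = ["a", "b", "c", "d"]
degrees = {n: 1 for n in nodes}
z, final_heat, random_final_heats = heat_zscores_sketch(
    np.eye(4), nodes, degrees, ["a"], bins=[nodes], degree_to_bin_index={1: 0}, num_reps=20)
print(random_final_heats.shape)  # (20, 4)
```

Degree matching keeps hub genes from being compared against a null built mostly from low-degree genes, which would otherwise inflate their z-scores.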

netcoloc.netprop_zscore.netprop_zscore(seed_gene_file, seed_gene_file_delimiter=None, num_reps=10, alpha=0.5, minimum_bin_size=10, interactome_file=None, interactome_uuid='f93f402c-86d4-11e7-a10d-0ac135e8bacf', ndex_server='public.ndexbio.org', ndex_user=None, ndex_password=None, out_name='out', save_z_scores=False, save_final_heat=False, save_random_final_heats=False, verbose=True)

Performs network heat propagation on the given interactome with the given seed genes, then returns the z-scores of the final heat values of each node in the interactome.

The z-scores are calculated based on a null model, which is built by running the network propagation multiple times using randomly selected seed genes with similar degree distributions to the original seed gene set.

This method returns a tuple containing the following:

  • pandas.Series containing z-scores for each gene. Gene names comprise the index column

  • numpy.ndarray containing a matrix in which each row contains the final heat scores for each gene from one network propagation from random seed genes (one row per repetition)

Parameters:
  • seed_gene_file (str) – Location of file containing a delimited list of seed genes

  • seed_gene_file_delimiter (str) – Delimiter used to separate genes in the seed gene file. Defaults to any whitespace.

  • num_reps (int) – Number of times the network propagation algorithm should be run using random seed genes in order to build the null model

  • alpha (float) – Number between 0 and 1. Denotes the importance of the propagation step in the network propagation, as opposed to the step where heat is added to seed genes only. Recommended to be 0.5 or greater

  • minimum_bin_size (int) – minimum number of genes that should be in each degree matching bin.

  • interactome_file (str) – Location of file containing the interactome in NetworkX gpickle format. Either the interactome_file argument or the interactome_uuid argument must be defined.

  • interactome_uuid (str) – UUID of the interactome on NDEx. Either the interactome_file argument or the interactome_uuid argument must be defined. (Default: The UUID of PCNet, the Parsimonious Composite Network: f93f402c-86d4-11e7-a10d-0ac135e8bacf)

  • ndex_server (str) – NDEx server on which the interactome is stored. Only needs to be defined if interactome_uuid is defined

  • ndex_user (str) – NDEx user that the interactome belongs to. Only needs to be defined if interactome_uuid is defined, and the interactome is private

  • ndex_password (str) – password of the NDEx user’s account. Only needs to be defined if interactome_uuid is defined, and the interactome is private

  • out_name (str) – Prefix for saving output files

  • save_z_scores

  • save_final_heat (bool) – If True, then the raw network propagation heat scores for the original seed gene set will be saved in the form of a tsv file in the current directory

  • save_random_final_heats (bool) – If True, then the raw network propagation heat scores for every repetition of the algorithm using random seed genes will be saved in the form of a tsv file in the current directory. (Beware: This can be a large file if num_reps is large.)

  • verbose (bool) – If True, then progress information will be logged. Otherwise, nothing will be printed.

Returns:

(pandas.Series, numpy.ndarray)

Return type:

tuple

Raises:

TypeError – If neither interactome_file nor interactome_uuid is provided, or if num_reps is not an int
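The default delimiter behaviour (any whitespace) matches Python's str.split() called without a separator. The sketch below assumes the seed gene file is read that way; read_seed_genes_sketch is a hypothetical helper, and netprop_zscore() may additionally deduplicate or validate genes.

```python
import tempfile
from pathlib import Path


def read_seed_genes_sketch(seed_gene_file, seed_gene_file_delimiter=None):
    """Split the file on the delimiter; None means any run of whitespace (str.split)."""
    text = Path(seed_gene_file).read_text()
    genes = [g.strip() for g in text.split(seed_gene_file_delimiter)]
    return [g for g in genes if g]  # drop empty strings, e.g. from a trailing newline


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "seeds.txt"
    path.write_text("BRCA1\nBRCA2  TP53\n")
    whitespace_genes = read_seed_genes_sketch(path)
    path.write_text("BRCA1,BRCA2,TP53\n")
    comma_genes = read_seed_genes_sketch(path, ",")

print(whitespace_genes)  # ['BRCA1', 'BRCA2', 'TP53']
print(comma_genes)       # ['BRCA1', 'BRCA2', 'TP53']
```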

netcoloc.network_colocalization module

Functions for performing network colocalization.

netcoloc.network_colocalization.calculate_expected_overlap(z_scores_1, z_scores_2, seed1=None, seed2=None, z_score_threshold=3, z1_threshold=1.5, z2_threshold=1.5, overlap_control=None, num_reps=1000, plot=False)

Determines the expected size of the network overlap by randomly shuffling gene names.

Parameters:
  • z_scores_1 (pandas.Series) – Result from netprop_zscore() or calculate_heat_zscores() containing the z-scores of each gene following network propagation. The index consists of gene names

  • z_scores_2 (pandas.Series) – Similar to z_scores_1. This and z_scores_1 must contain the same genes (i.e., come from the same interactome network)

  • z_score_threshold (float) – threshold to determine whether a gene is a part of the network overlap or not. Genes with combined z-scores below this threshold will be discarded

  • z1_threshold (float) – individual z1-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z1-scores below this threshold will be discarded

  • z2_threshold (float) – individual z2-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z2-scores below this threshold will be discarded

  • num_reps (int) – Number of repetitions of randomly shuffling input z_score vectors to generate the null distribution

  • plot (bool) – If True, distribution will be plotted

Returns:

Observed overlap size, and vector of randomized overlap sizes from permuted z_scores

Return type:

float, np.array of floats
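The permutation idea can be sketched as follows, assuming the combined score is z1 * z2 (as in calculate_mean_z_score_distribution()) and that each threshold is applied with ≥. Shuffling the z-score values relative to the gene labels breaks any real co-localization while keeping both score distributions intact. The helper names are hypothetical, and the seed-gene overlap control and plotting of the real function are omitted.

```python
import numpy as np
import pandas as pd


def overlap_size(z1, z2, z_score_threshold=3, z1_threshold=1.5, z2_threshold=1.5):
    """Count genes whose combined score z1*z2 and individual scores pass the thresholds."""
    combined = z1 * z2
    passes = (combined >= z_score_threshold) & (z1 >= z1_threshold) & (z2 >= z2_threshold)
    return int(passes.sum())


def expected_overlap_sketch(z_scores_1, z_scores_2, num_reps=1000, random_seed=0,
                            **thresholds):
    """Observed overlap size plus overlap sizes after shuffling gene labels."""
    z2 = z_scores_2.loc[z_scores_1.index]  # align both vectors on the same genes
    z1_values, z2_values = z_scores_1.to_numpy(), z2.to_numpy()
    observed = overlap_size(z1_values, z2_values, **thresholds)
    rng = np.random.default_rng(random_seed)
    permuted = np.array([
        overlap_size(rng.permutation(z1_values), rng.permutation(z2_values), **thresholds)
        for _ in range(num_reps)
    ])
    return observed, permuted


genes = list("abcde")
z_scores_1 = pd.Series([4.0, 3.0, 0.0, -1.0, 0.5], index=genes)
z_scores_2 = pd.Series([2.0, 2.0, 0.0, -1.0, 0.2], index=genes)
observed, permuted = expected_overlap_sketch(z_scores_1, z_scores_2, num_reps=200)
print(observed)  # 2
```

Comparing observed with the spread of permuted gives a sense of whether the overlap is larger than expected by chance.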

netcoloc.network_colocalization.calculate_mean_z_score_distribution(z1, z2, num_reps=1000, zero_double_negatives=True, overlap_control='remove', seed1=[], seed2=[])

Determines the expected mean combined z-score (z = z1*z2) by randomly shuffling gene names.

Args:
  • z1 (pd.Series, pd.DataFrame): Vector of z-scores from network propagation of trait 1

  • z2 (pd.Series, pd.DataFrame): Vector of z-scores from network propagation of trait 2

  • num_reps (int): Number of permutation analyses to perform. Defaults to 1000.

  • zero_double_negatives (bool, optional): Should genes that have a negative score in both z1 and z2 be ignored? Defaults to True.

  • overlap_control (str, optional): ‘bin’ to permute overlapping seed genes separately, ‘remove’ to not consider overlapping seed genes. Any other value will do nothing. Defaults to “remove”.

  • seed1 (list, optional): List of seed genes used to generate z1. Required if overlap_control!=None. Defaults to [].

  • seed2 (list, optional): List of seed genes used to generate z2. Required if overlap_control!=None. Defaults to [].

Returns:

  • float: The observed mean combined z-score from network colocalization

  • list: List of permuted mean combined z-scores
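A sketch of the permutation null for the mean combined score, assuming z = z1 * z2 is computed per gene and that shuffling the z2 values across genes is equivalent to shuffling gene names. The sketch reads "ignored" in zero_double_negatives as setting those genes' products to 0 (excluding them from the mean entirely would be another reading), omits overlap_control, and uses hypothetical names.

```python
import numpy as np
import pandas as pd


def mean_combined_z(z1, z2, zero_double_negatives=True):
    combined = z1 * z2
    if zero_double_negatives:
        # Two negative z-scores multiply to a positive score; zero those genes out.
        combined = np.where((z1 < 0) & (z2 < 0), 0.0, combined)
    return float(np.mean(combined))


def mean_z_distribution_sketch(z1, z2, num_reps=1000, zero_double_negatives=True,
                               random_seed=0):
    z2 = z2.loc[z1.index]  # align on gene names before shuffling
    z1_values, z2_values = z1.to_numpy(dtype=float), z2.to_numpy(dtype=float)
    observed = mean_combined_z(z1_values, z2_values, zero_double_negatives)
    rng = np.random.default_rng(random_seed)
    permuted = [mean_combined_z(z1_values, rng.permutation(z2_values), zero_double_negatives)
                for _ in range(num_reps)]
    return observed, permuted


genes = ["a", "b", "c"]
z1 = pd.Series([2.0, -1.0, 1.0], index=genes)
z2 = pd.Series([1.0, -3.0, 0.0], index=genes)
observed, permuted = mean_z_distribution_sketch(z1, z2, num_reps=100)
print(round(observed, 4))  # 0.6667
```

Without zero_double_negatives, gene b would contribute (−1) × (−3) = 3 and raise the mean to 5/3, even though it is far from both gene sets.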

netcoloc.network_colocalization.calculate_network_enrichment(z_D1, z_D2, zthresh_list=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], z12thresh_list=[1, 1.5, 2], verbose=True)

Evaluate NetColoc enrichment for a range of thresholds on network proximity z-scores.

Parameters:
  • z_D1 (pandas.DataFrame) – DataFrame containing gene names and network proximity z-scores for the first gene set

  • z_D2 (pandas.DataFrame) – DataFrame containing gene names and network proximity z-scores for the second gene set

  • zthresh_list (list) – list of combined z-score thresholds to iterate over

  • z12thresh_list (list) – list of individual z-score thresholds to iterate over

  • verbose (Boolean) – if True, print out some diagnostics

Returns:

netcoloc_enrichment_df: DataFrame containing NetColoc enrichment results

Return type:

pandas.DataFrame

netcoloc.network_colocalization.calculate_network_overlap(z_scores_1, z_scores_2, z_score_threshold=3, z1_threshold=1.5, z2_threshold=1.5)

Function to determine which genes are in the network overlap. Returns a list of the overlapping genes.

Parameters:
  • z_scores_1 (pandas.Series, pandas.DataFrame, numpy.ndarray) – Result from netprop_zscore() or calculate_heat_zscores() containing the z-scores of each gene following network propagation. The index consists of gene names

  • z_scores_2 (pandas.Series, pandas.DataFrame, numpy.ndarray) – Similar to z_scores_1. This and z_scores_1 must contain the same genes (i.e., come from the same interactome network)

  • z_score_threshold (float) – threshold to determine whether a gene is a part of the network overlap or not. Genes with combined z-scores below this threshold will be discarded

  • z1_threshold (float) – individual z1-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z1-scores below this threshold will be discarded

  • z2_threshold (float) – individual z2-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z2-scores below this threshold will be discarded

Returns:

genes in the network overlap (genes with high combined z-scores)

Return type:

list
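The overlap rule can be sketched with pandas, assuming the combined score is the product z1 * z2 and that all three thresholds use ≥. The function name is hypothetical. The individual thresholds matter because two negative z-scores also produce a large positive product: gene C in the example passes the combined threshold but is excluded by z1_threshold.

```python
import pandas as pd


def network_overlap_sketch(z_scores_1, z_scores_2, z_score_threshold=3,
                           z1_threshold=1.5, z2_threshold=1.5):
    """Genes passing the combined (z1*z2) and individual z-score thresholds."""
    joined = pd.concat([z_scores_1, z_scores_2], axis=1, join="inner")
    z1, z2 = joined.iloc[:, 0], joined.iloc[:, 1]
    combined = z1 * z2
    mask = (combined >= z_score_threshold) & (z1 >= z1_threshold) & (z2 >= z2_threshold)
    return combined[mask].index.tolist()


z1 = pd.Series({"A": 4.0, "B": 1.0, "C": -2.0, "D": 2.0})
z2 = pd.Series({"A": 2.0, "B": 5.0, "C": -2.5, "D": 1.6})
print(network_overlap_sketch(z1, z2))  # ['A', 'D']
```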

netcoloc.network_colocalization.calculate_network_overlap_subgraph(interactome, z_scores_1, z_scores_2, z_score_threshold=3, z1_threshold=1.5, z2_threshold=1.5)

Function to return subgraph of network intersection.

Code to create the subgraph is from the NetworkX subgraph documentation.

Parameters:
  • interactome (networkx.Graph) – network whose subgraph will be returned

  • z_scores_1 (pandas.Series) – Result from netprop_zscore() or calculate_heat_zscores() containing the z-scores of each gene following network propagation. The index consists of gene names

  • z_scores_2 (pandas.Series) – Similar to z_scores_1. This and z_scores_1 must contain the same genes (i.e., come from the same interactome network)

  • z_score_threshold (float) – threshold to determine whether a gene is a part of the network overlap or not. Genes with combined z-scores below this threshold will be discarded

  • z1_threshold (float) – individual z1-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z1-scores below this threshold will be discarded

  • z2_threshold (float) – individual z2-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z2-scores below this threshold will be discarded

Returns:

Subgraph of the interactome containing only genes that are in the network intersection (genes with high combined z-scores)

Return type:

networkx.Graph
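NetworkX's Graph.subgraph() returns a read-only view induced on the given nodes, and calling .copy() turns it into an independent graph. The sketch below assumes overlap_genes is a gene list such as the one returned by calculate_network_overlap(); the helper name is hypothetical.

```python
import networkx as nx


def overlap_subgraph_sketch(interactome, overlap_genes):
    """Induced subgraph on the overlap genes, detached from the interactome."""
    # subgraph() returns a frozen view; copy() makes an independent, editable graph.
    return interactome.subgraph(overlap_genes).copy()


interactome = nx.Graph([("A", "D"), ("D", "E"), ("A", "B")])
sub = overlap_subgraph_sketch(interactome, ["A", "D"])
print(sub.number_of_nodes(), sub.number_of_edges())  # 2 1
```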

netcoloc.network_colocalization.get_p_from_permutation_results(observed, permuted, alternative='greater')

Calculates the significance of the observed mean relative to the empirical normal distribution of permuted means, using a one-sided or two-sided test.

Args:
  • observed (float): The observed value to be tested

  • permuted (list): List of values that make up the expected distribution

  • alternative (str, optional): The alternative hypothesis to test against. Can be ‘greater’, ‘less’, or ‘two-sided’. Defaults to ‘greater’.

Returns:

float: p-value from z-test of observed value versus the permuted distribution
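A normal-approximation version of this test can be sketched with scipy.stats.norm: standardize the observed value against the mean and standard deviation of the permuted values, then read the tail probability. The function name is hypothetical, and the choice of population standard deviation (ddof=0) is an assumption.

```python
import numpy as np
from scipy.stats import norm


def p_from_permutation_sketch(observed, permuted, alternative="greater"):
    permuted = np.asarray(permuted, dtype=float)
    z = (observed - permuted.mean()) / permuted.std()
    if alternative == "greater":
        return float(norm.sf(z))    # P(Z >= z)
    if alternative == "less":
        return float(norm.cdf(z))   # P(Z <= z)
    if alternative == "two-sided":
        return float(2 * norm.sf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


print(p_from_permutation_sketch(2.0, [1.0, 2.0, 3.0]))  # 0.5
```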

netcoloc.network_colocalization.sweep_input_pvals(D1_df, D2_df, individual_heats_matrix, nodes, degrees, cutoff_list=[0.01, 0.02, 0.03, 0.04, 0.05, 0.1], gene_column='gene', score_column='pval', cutoff_max=True, num_reps=100, verbose=True, z_score_threshold=3, z12_threshold=1.5)

Evaluate NetColoc enrichment for a range of thresholds on input gene lists.

Parameters:
  • D1_df (pandas.DataFrame) – DataFrame containing gene names and scores for the first gene set

  • D2_df (pandas.DataFrame) – DataFrame containing gene names and scores for the second gene set

  • individual_heats_matrix (numpy.ndarray) – output of the netprop.get_individual_heats_matrix. A square matrix containing the final heat contributions of each gene

  • nodes (list) – nodes, in the order in which they were supplied to the get_normalized_adjacency_matrix() method which returns the precursor to the individual_heats_matrix

  • degrees (dict) – Mapping of node names to node degrees

  • cutoff_list (list) – list of values to threshold the input gene sets by

  • gene_column (string) – name of column containing genes in D1_df and D2_df

  • score_column (string) – name of column containing scores (usually p-value or log fold change) in D1_df and D2_df

  • cutoff_max (Boolean) – If True, genes will be selected which have scores less than the cutoff value. If False, genes will be selected which have scores greater than the cutoff value.

  • num_reps (int) – Number of times the network propagation algorithm should be run using random seed genes in order to build the null model

  • verbose (Boolean) – if True, print out some diagnostics

  • z_score_threshold (float) – threshold to determine whether a gene is a part of the network overlap or not. Genes with combined z-scores below this threshold will be discarded

  • z12_threshold (float) – individual z1/z2-score threshold to determine whether a gene is a part of the network overlap or not. Genes with z1/z2-scores below this threshold will be discarded

Returns:

netcoloc_pval_df: DataFrame containing NetColoc enrichment results

Return type:

pandas.DataFrame
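For each value in cutoff_list, the input table is first thresholded to produce a seed gene list. The sketch below covers only that selection step, assuming strict comparisons as described for cutoff_max; the helper name is hypothetical, and the propagation and enrichment steps are not shown.

```python
import pandas as pd


def select_genes_by_cutoff(df, cutoff, gene_column="gene", score_column="pval",
                           cutoff_max=True):
    """cutoff_max=True keeps scores below the cutoff (e.g. p-values); False keeps scores above it."""
    if cutoff_max:
        selected = df[df[score_column] < cutoff]
    else:
        selected = df[df[score_column] > cutoff]
    return selected[gene_column].tolist()


D1_df = pd.DataFrame({"gene": ["A", "B", "C", "D"], "pval": [0.001, 0.02, 0.04, 0.2]})
for cutoff in [0.01, 0.02, 0.05]:
    print(cutoff, select_genes_by_cutoff(D1_df, cutoff))
# 0.01 ['A']
# 0.02 ['A']
# 0.05 ['A', 'B', 'C']
```

Note that gene B (p = 0.02) is not selected at the 0.02 cutoff because the comparison is strict.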

netcoloc.network_colocalization.transform_edges(G, method='cosine_sim', edge_weight_threshold=0.95)

Transforms binary edges using the selected method (currently only cosine similarity is implemented). Cosine similarity measures the similarity between the neighborhoods of node pairs in the input network.

Parameters:
  • G (networkx.Graph) – network whose edges will be transformed

  • method (str) – Method to use; only ‘cosine_sim’ is supported. Any other value will cause this method to output a warning and immediately return.

  • edge_weight_threshold (float) – Only transformed edges with values greater than this threshold will be returned

Returns:

Graph with nodes identical to input G, but with transformed edges (values > edge_weight_threshold)

Return type:

networkx.Graph
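Cosine similarity between two nodes can be computed from rows of the adjacency matrix: each row is the node's neighbor indicator vector, and the similarity is their dot product divided by the product of their norms. The sketch below assumes unweighted input edges and stores the similarity in a "weight" edge attribute; the function name and the attribute name are hypothetical.

```python
import networkx as nx
import numpy as np


def cosine_sim_edges_sketch(G, edge_weight_threshold=0.95):
    """Connect node pairs whose neighbor vectors have cosine similarity above the threshold."""
    nodes = list(G.nodes)
    adjacency = nx.to_numpy_array(G, nodelist=nodes, weight=None)
    norms = np.linalg.norm(adjacency, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = (adjacency @ adjacency.T) / np.outer(norms, norms)
    similarity = np.nan_to_num(similarity)  # isolated nodes have no neighbors
    G_transformed = nx.Graph()
    G_transformed.add_nodes_from(nodes)  # same node set as the input graph
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if similarity[i, j] > edge_weight_threshold:
                G_transformed.add_edge(nodes[i], nodes[j], weight=float(similarity[i, j]))
    return G_transformed


# a and b share exactly the same neighbors (c and d), and vice versa.
G = nx.Graph([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")])
H = cosine_sim_edges_sketch(G)
print(sorted(tuple(sorted(e)) for e in H.edges()))  # [('a', 'b'), ('c', 'd')]
```

In the example, a and b become connected even though they are not adjacent in G, while none of the original edges survive.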

netcoloc.network_colocalization.view_G_hier(G_hier, layout='cose')

In-notebook visualization of NetColoc hierarchy, using ipycytoscape.

Parameters:
  • G_hier – network to visualize. Expects output of cdapsutil.CommunityDetection(), transformed to networkx format. ‘CD_MemberList_LogSize’ is a required field of the network to map to the node size.

  • layout (str) – Layout method to use; any layout supported by cytoscape.js can be used. Suggested: ‘cose’ or ‘breadthfirst’.

Returns:

Nothing

netcoloc.network_localization module

netcoloc.validation module

Functions for performing validation of NetColoc subgraphs.

netcoloc.validation.MPO_enrichment_full(hier_df, MPO, mgi_df, MP_focal_list, G_int, use_ddot=False, min_genes=10, max_genes=2000, verbose=False)

Function to test every NetColoc system (not just the root) for enrichment of genes whose knockout results in the selected phenotypes.

The returned pandas.DataFrame will have these columns:

  • log(OR_p) - -log10(odds ratio p-value)

  • log_OR - natural log odds ratio

  • num_genes - number of genes in MPO term overlapping with focal system

  • gene_ids - list of overlapping genes between MPO term and focal system

Parameters:
  • hier_df (pandas.DataFrame) – NetColoc systems map (processed output from cdapsutil)

  • MPO (ddot.Ontology) – DDOT ontology containing the parsed mammalian phenotype ontology

  • mgi_df (pandas.DataFrame) – parsed MGI knockout dataframe

  • MP_focal_list (list) – List of MPO phenotypes to check for enrichment against

  • G_int (networkx.Graph) – Background interactome

  • use_ddot (boolean) – Use the deprecated DDOT package to load the MGI ontology. Default False to use obonet

Returns:

Dataframe containing enrichment results

Return type:

pandas.DataFrame

netcoloc.validation.MPO_enrichment_root(hier_df, MPO, mgi_df, MP_focal_list, G_int, verbose=False, use_ddot=False, min_genes=10, max_genes=2000)

Function to test the root node of the NetColoc hierarchy for enrichment of genes whose knockout results in the selected phenotypes.

The returned pandas.DataFrame will have the following columns:

  • OR_p - Odds ratio p-value

  • log_OR - natural log odds ratio

  • log_OR_CI_lower - lower 95% confidence interval on log_OR

  • log_OR_CI_upper - upper 95% confidence interval on log_OR

  • num_genes_in_term - number of genes in MPO term

  • MP_description - description of MPO phenotype

Parameters:
  • hier_df (pandas.DataFrame) – NetColoc systems map (processed output from cdapsutil)

  • MPO (ddot.Ontology) – DDOT ontology containing the parsed mammalian phenotype ontology

  • mgi_df (pandas.DataFrame) – parsed MGI knockout dataframe

  • MP_focal_list (list) – List of MPO phenotypes to check for enrichment against

  • G_int (networkx.Graph) – Background interactome

  • verbose (bool) – If True, print out some progress

  • use_ddot (boolean) – Use the deprecated DDOT package to load the MGI ontology. Default False to use obonet

Returns:

Dataframe containing enrichment results

Return type:

pandas.DataFrame
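The log_OR and 95% confidence interval columns can be illustrated with a 2 × 2 table of gene counts. The sketch below assumes Woolf's standard error on the log scale and a one-sided z-test for enrichment; the function name and table layout are hypothetical, tables with a zero cell would need a continuity correction that is not shown, and netcoloc's own calculation may differ.

```python
import math

from scipy.stats import norm


def log_odds_ratio_sketch(a, b, c, d):
    """2x2 counts: a = in system & in term, b = in system & not in term,
    c = not in system & in term, d = in neither (all assumed > 0)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    z_95 = 1.959963984540054  # two-sided 95% normal quantile
    p = norm.sf(log_or / se)  # one-sided test for enrichment (OR > 1)
    return {"log_OR": log_or,
            "log_OR_CI_lower": log_or - z_95 * se,
            "log_OR_CI_upper": log_or + z_95 * se,
            "OR_p": float(p)}


result = log_odds_ratio_sketch(10, 5, 5, 10)
print(round(result["log_OR"], 4))  # 1.3863
```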

netcoloc.validation.check_keywords(keywords, description)

Function to check if any of the keywords are present in the description.

Parameters:
  • keywords (list) – List of keywords to search for

  • description (str) – Description to search in

Returns:

True if any keyword is found, False otherwise

Return type:

bool

Function to find terms in the MPO that match a list of keywords.

Parameters:
  • MPO (networkx.MultiDiGraph) – DDOT ontology containing the parsed mammalian phenotype ontology

  • keywords (list) – List of keywords to search for in the term descriptions

  • use_ddot (boolean) – Use the deprecated DDOT package to load the MGI ontology. Default False to use obonet

Returns:

List of terms that match the keywords

Return type:

list

netcoloc.validation.focus_ontology(ont, start_node, include_children=True, include_parents=False)

Focus the ontology on a specific node and its neighbors.

Parameters:
  • ont (networkx.MultiDiGraph) – The ontology to focus on

  • start_node (str) – The node to focus on

  • include_children (bool) – Whether to include children of the start node

  • include_parents (bool) – Whether to include parents of the start node. Note that this will include the root node.

Returns:

List of nodes in the focused ontology

Return type:

list
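The sketch below illustrates one way to collect the focused node set, assuming the obonet convention that edges point from a child term to its parent term. Under that convention networkx.ancestors() returns subterms and networkx.descendants() returns superterms, which is easy to get backwards. The function name and toy ontology are hypothetical.

```python
import networkx as nx


def focus_ontology_sketch(ont, start_node, include_children=True, include_parents=False):
    """Assumes obonet's convention: edges point from a child term to its parent term."""
    focused = {start_node}
    if include_children:
        # Every term with a path *to* start_node is a subterm (child, grandchild, ...).
        focused |= nx.ancestors(ont, start_node)
    if include_parents:
        # Every term reachable *from* start_node is a superterm, up to the root.
        focused |= nx.descendants(ont, start_node)
    return sorted(focused)


mpo = nx.MultiDiGraph()
mpo.add_edge("child", "focal", key="is_a")
mpo.add_edge("focal", "root", key="is_a")
print(focus_ontology_sketch(mpo, "focal"))                        # ['child', 'focal']
print(focus_ontology_sketch(mpo, "focal", include_parents=True))  # ['child', 'focal', 'root']
```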

netcoloc.validation.format_mapping(mapping, gene_col='human_ortholog', term_col='MP')

netcoloc.validation.get_MP_description(term_id, ontology, use_ddot=False, include_definition=False)

Function to get the description of a given MPO term.

Parameters:
  • term_id (str) – ID of the MPO term

  • ontology (ddot.Ontology or networkx.MultiDiGraph) – The ontology to get the term description from

  • use_ddot (boolean) – Use the deprecated DDOT package to load the MGI ontology. Default False to use obonet

  • include_definition (bool) – If True, include the definition in the description

Returns:

Description of the MPO term

Return type:

str

netcoloc.validation.get_focal_terms(MPO, MP_focal, use_ddot=False, include_children=True, include_parents=False)

Function to get the focal terms for a given MPO term.

Parameters:
  • MPO – DDOT ontology containing the parsed mammalian phenotype ontology

  • MP_focal – Focal MPO term to get the description for

  • use_ddot – Whether to use the DDOT package

  • include_children – Whether to include child terms

  • include_parents – Whether to include parent terms

Returns:

Tuple containing the focal terms and their description

netcoloc.validation.load_MGI_mouseKO_data(url='http://www.informatics.jax.org/downloads/reports/MGI_PhenoGenoMP.rpt', update=False, data_loc=None, map_using='mygeneinfo', verbose=False)

Function to parse and load mouse knockout data from MGI.

Parameters:
  • url (str) – location of MGI knockout data

  • data_loc (str) – location to save the downloaded file, if None, saves in current directory

  • update (bool) – whether to update the data if it already exists

Returns:

parsed MGI knockout dataframe, including column for human orthologs

Return type:

pandas.DataFrame

netcoloc.validation.load_MPO(url='http://www.informatics.jax.org/downloads/reports/MPheno_OBO.ontology', use_ddot=False, update=False, data_loc=None)

Function to parse and load the mammalian phenotype ontology, using obonet by default or DDOT’s ontology module if use_ddot is True.

Parameters:
  • url (str) – URL containing MPO ontology file

  • use_ddot (boolean) – Use the deprecated DDOT package to load the MGI ontology. Default False to use obonet

Returns:

MPO parsed using obonet (default) or DDOT (if use_ddot is True)

Return type:

networkx.MultiDiGraph or ddot.Ontology

Raises:

ImportError – If DDOT package is not found

netcoloc.validation.map_genes_to_MPO(MPO, mapping, restrict_to=None, map_col='human_ortholog', MP_col='MP')

netcoloc.validation.map_mgi_to_human_orthologs(mgi_df, map_using='mygeneinfo', verbose=False, data_loc=None, update=False)

netcoloc.validation.perform_hypergeometric_test(hier_df, hier_node, G_int, mgi_genes)

netcoloc.validation.perform_log_odds_z_test(mgi_genes, focal_genes, G_int_nodes, verbose=False, name='')

netcoloc.validation.test_single_MPO_term(focal_terms, hier_df, hier_node, mgi_df, G_int_nodes, verbose=False, min_genes=11, max_genes=2000, name='')

Function to test for enrichment of genes resulting in selected phenotypes.

Parameters:
  • focal_terms – List of MPO phenotypes to check for enrichment against

  • hier_df – Hierarchical DataFrame containing gene information

  • hier_node – Node in the hierarchy to test

  • mgi_df – DataFrame containing MGI gene information

  • G_int_nodes – List of nodes in the interaction graph

  • verbose – Whether to print detailed output

  • min_genes – Minimum number of genes required for testing

  • max_genes – Maximum number of genes allowed for testing

  • name – Name of the test (for output purposes)

Returns:

Tuple containing p-value, confidence interval, log odds ratio, and list of MGI genes

Module contents