% Compute functional coherence score and statistical significance of a protein set (given a reference set)
%
% Author: monica.chagoyen@cnb.csic.es
%
% Reference: Chagoyen et al. "Assessment of protein set coherence using functional annotations"
%            [Full citation here]
%
% Additional info: http://www.cnb.csic.es/~monica/coherence/
%
% Requires the Statistics and Machine Learning Toolbox (squareform, hygecdf).
%
% Input:
%   Msim - protein x protein similarity matrix of reference set
%          (IMPORTANT NOTE: diagonal should contain 0 values)
%   set  - indices of Msim corresponding to proteins in set
%   ng   - number of proteins in reference
%
% Output:
%   score - overall functional coherence of the protein set
%   pval1 - p-value corresponding to neighborhood num. 1 (see reference for details)
%   pval2 - p-value corresponding to neighborhood num. 2 (see reference for details)
%   pval3 - p-value corresponding to neighborhood num. 3 (see reference for details)
%   N     - number of proteins in set
%   X     - number of set proteins in neigh. 1 (= core)
%   K     - number of reference proteins in neigh. 1 not in set
%   Nc    - number of set proteins in neigh. 2
%   Er    - number of reference proteins in neigh. 2 not in set
%   Ns    - number of set proteins in neigh. 3
%   E     - number of reference proteins in neigh. 3 not in set
%
function [score, pval1, pval2, pval3, N, X, K, Nc, Er, Ns, E]=sim_score(Msim, set, ng)

% Reference proteins that are not in the set
nset=setdiff((1:ng),set);

% Score: mean pairwise similarity within the set (diagonal excluded)
Yset=squareform(Msim(set,set));
score=mean(Yset);

% Core: set proteins whose mean similarity to the other set proteins reaches the score
vmset=sum(Msim(set,set))/(length(set)-1);
core=find(vmset>=score);

% Column-wise statistics. The dimension is given explicitly so that a
% single-row selection (e.g. a core of one protein) is still reduced per
% column instead of being collapsed to a scalar.
vm=mean(Msim(set,nset),1);            % mean similarity of each non-set protein to the set
mx=max(Msim(set,nset),[],1);          % max similarity of each non-set protein to the set
vmres=mean(Msim(set(core),nset),1);   % mean similarity to the core (not used below)
mxres=max(Msim(set(core),nset),[],1); % max similarity of each non-set protein to the core
mxse=max(Msim(set,set),[],1);         % max similarity of each set protein to the set
mxco=max(Msim(set(core),set),[],1);   % max similarity of each set protein to the core

N=length(set);

% Neighborhood 1: hypergeometric upper tail P(at least X)
X=length(core);
K=length(find(vm>=score));
pval1=1-hygecdf(X-1,ng,K+X,N);

% Neighborhood 2
Nc=length(find(mxco>=score));
Er=length(find(mxres>=score));
pval2=1-hygecdf(Nc-1,ng,Er+Nc,N);

% Neighborhood 3
E=length(find(mx>=score));
Ns=length(find(mxse>=score));
pval3=1-hygecdf(Ns-1,ng,E+Ns,N);
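
% Usage example (a minimal sketch, not part of the original distribution).
% The toy similarity matrix and the choice of set indices below are
% hypothetical and only illustrate the expected input shapes: a symmetric
% matrix with a zero diagonal, and set indices that refer to its rows.
%
%   ng   = 6;                                  % size of the toy reference
%   Msim = rand(ng);
%   Msim = (Msim + Msim')/2;                   % make the matrix symmetric
%   Msim(1:ng+1:end) = 0;                      % zero diagonal, as required above
%   set  = [1 2 3];                            % proteins forming the set
%   [score, pval1, pval2, pval3] = sim_score(Msim, set, ng);
%
% The set should contain at least two proteins, since the score is the mean
% of the pairwise similarities within the set.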