
Human symptoms–disease network

Article · June 2014 · DOI: 10.1038/ncomms5212 · Source: PubMed

Abstract

In the post-genomic era, the elucidation of the relationship between the molecular origins of diseases and their resulting phenotypes is a crucial task for medical research. Here, we use a large-scale biomedical literature database to construct a symptom-based human disease network and investigate the connection between clinical manifestations of diseases and their underlying molecular interactions. We find that the symptom-based similarity of two diseases correlates strongly with the number of shared genetic associations and the extent to which their associated proteins interact. Moreover, the diversity of the clinical manifestations of a disease can be related to the connectivity patterns of the underlying protein interaction network. The comprehensive, high-quality map of disease–symptom relations can further be used as a resource helping to address important questions in the field of systems medicine, for example, the identification of unexpected associations between diseases, disease etiology research or drug design.


ARTICLE
Received 7 Nov 2013 | Accepted 27 May 2014 | Published 26 Jun 2014

XueZhong Zhou 1,2,3,*, Jörg Menche 2,3,4,*, Albert-László Barabási 2,3,4,5,6 & Amitabh Sharma 2,3,6

1 School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China.
2 Center for Complex Network Research, Northeastern University Physics Department, 111 DA/Physics Dept., 110 Forsyth Street, Boston, Massachusetts 02115, USA.
3 Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Smith Bldg., Rm. 858A, 450 Brookline Ave, Boston, Massachusetts 02215, USA.
4 Department of Theoretical Physics, Budapest University of Technology and Economics, Budafoki út. 8, 1111 Budapest, Hungary.
5 Center for Network Science, Central European University, Nádor út. 9, 1051 Budapest, Hungary.
6 Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, Massachusetts 02115, USA.
* These authors contributed equally to this work. Correspondence and requests for materials should be addressed to X.Z. or to A.S.

NATURE COMMUNICATIONS | 5:4212 | DOI: 10.1038/ncomms5212 | © 2014 Macmillan Publishers Limited. All rights reserved.

The past decades have brought remarkable advances in our understanding of human disease [1]. While progress on the genetic and proteomic aspects has been impressive [2], most aspects of the relation between genotype and phenotype still remain unclear, especially for complex diseases [1]. Heterogeneity, polygenicity and pleiotropism are major factors that are hampering the progress [3,4], as well as diffuse boundaries between diseases [5], as they can have multiple causes and be related through several dimensions [6–13]. A number of resources have been constructed aiming to understand the entangled relationship between diseases, often in the form of networks [6]. For example, Rzhetsky et al. [12] inferred the comorbidity links between 161 disorders from the disease history of 1.5 million patients and proposed models to estimate the genetic overlap between diseases. Hidalgo et al. [9] constructed a disease phenotypic network using comorbidity patterns from more than 30 million Medicare patients, capturing disease progression patterns, such as that patients tend to develop diseases in the network vicinity of diseases that they already have and that patients with highly interconnected diseases show higher mortality. In model organisms, for example, physical protein interactions point to genes that are related to similar phenotypes when knocked out [14–17].
Furthermore, a number of studies indicated that similarity between phenotypes reflects biological modules of interacting functionally related genes. Likewise, phenotypic similarities between monogenic syndromes in human have been shown to reflect shared biological mechanisms and can be exploited to predict gene function [18–20]. Interestingly, the inclusion of disease phenotype similarities can substantially improve the performance of candidate gene prediction methods [21–24]. Resources like the Human Phenotype Ontology [25] (HPO) and the Mammalian Phenotype Ontology [26] provide a standardized vocabulary of phenotypic information that can also be used to transfer detailed knowledge of model organisms to interpret and predict associated phenomena in human [27,28].

An important available resource that has been overlooked so far is the highest level clinical phenotypes, that is, symptoms and signs (called symptoms in brief in the following). Symptoms are crucial in clinical diagnosis and treatment. For example, the major symptoms of a heart attack are pain or discomfort in the chest, arms or shoulder, jaw, neck, or back, feeling weak, light-headed or faint and shortness of breath [29]. The wide range of symptoms illustrates the interdependence of the homeostatic mechanisms, whose perturbations lead to the manifestation of a disease. Community health professionals and general practitioners derive most of their knowledge of the symptoms of individual diseases from hospital-based observation [30]. Indeed, symptoms are the most directly observable characteristics of a disease and the very basis of clinical disease classification.
The elucidation of the connection between shared symptoms and shared genes or protein–protein interactions of two diseases could therefore help bridge the gap between bench-based biological discovery and bedside clinical solutions.

In this paper, we use large-scale medical bibliographic records and the related Medical Subject Headings (MeSH) metadata [31] from PubMed [32] to generate a symptom-based network of human diseases (Human Symptoms Disease Network, HSDN), where the link weight between two diseases quantifies the similarity of their respective symptoms. By integrating disease–gene association and protein–protein interaction (PPI) data, we investigate the correlations between the symptom similarity of diseases and their degree of shared genes or PPIs (Fig. 1 and Supplementary Fig. 1).

Results

Construction of the HSDN. We extracted 7,109,429 (about 35.5% of over twenty million records) PubMed bibliographic records with one or more disease/symptom terms in the MeSH metadata field (see Methods), yielding a total of 4,442 disease terms and 322 symptom terms (Supplementary Data 1 and 2). After filtering for the co-occurrence of at least one disease and one symptom term, 849,103 (4.2%) PubMed records were left. From these records, we extracted the symptom–disease relationships, resulting in 147,978 connections between 322 symptoms and 4,219 diseases (Fig. 2, Supplementary Data 3), which represent 98.5% of all symptoms and 95.0% of all diseases contained in the MeSH vocabulary. To quantify the relation between a symptom and a disease, we then used the term frequency-inverse document frequency (see Methods). After measuring the symptom similarities for all disease pairs, we obtained the HSDN with 7,488,851 links with positive similarity between 4,219 diseases. The HSDN covers all MeSH disease categories, from broad categories like cancer to specific conditions like cerebral cavernous hemangioma.
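The weighting and similarity steps described above can be illustrated with a short sketch. It assumes that each disease is represented by a vector of symptom co-occurrence counts, that the inverse document frequency is log(N / n_s) with N the number of diseases and n_s the number of diseases in which symptom s occurs, and that similarity is measured as the cosine between two weighted vectors. The exact formulas are given in the Methods, which are not reproduced here, so the function names, the toy counts and the cosine measure are all assumptions of this sketch.

```python
import math
from collections import defaultdict


def tfidf_vectors(cooccurrence):
    """Map {disease: {symptom: count}} to {disease: {symptom: tf-idf weight}}.

    Assumed weighting: count * log(N / n_s), where N is the number of
    diseases and n_s the number of diseases in which symptom s occurs.
    """
    n_diseases = len(cooccurrence)
    diseases_per_symptom = defaultdict(int)
    for symptoms in cooccurrence.values():
        for symptom in symptoms:
            diseases_per_symptom[symptom] += 1
    return {
        disease: {
            symptom: count * math.log(n_diseases / diseases_per_symptom[symptom])
            for symptom, count in symptoms.items()
        }
        for disease, symptoms in cooccurrence.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[symptom] for symptom, weight in u.items() if symptom in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical co-occurrence counts, for illustration only.
toy = {
    "disease_a": {"fever": 10, "headache": 4},
    "disease_b": {"fever": 8, "headache": 5, "nausea": 1},
    "disease_c": {"nausea": 7, "rash": 3},
}
vectors = tfidf_vectors(toy)
sim_ab = cosine(vectors["disease_a"], vectors["disease_b"])
sim_ac = cosine(vectors["disease_a"], vectors["disease_c"])
print(f"a-b: {sim_ab:.3f}, a-c: {sim_ac:.3f}")
```

In this toy example diseases a and b share two symptoms and obtain a high score, while a and c share none and obtain zero, so only pairs with positive similarity would become links in the network.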
The twenty most frequent diseases and symptoms are depicted in Fig. 2a,b. The two most frequent diseases in the PubMed database are breast cancer and hypertension. Note that this reflects the cumulative focus of research in the biomedical field rather than the epidemiological prevalence of diseases. The HSDN constitutes a single giant component, that is, all diseases directly or indirectly connect to all others. The network is very dense, with 94% of the nodes being connected to more than 50% of all other nodes (Fig. 2d). The most highly connected disease is Hyponatremia (4,214 disease neighbours), an electrolyte disorder associated with a number of common symptoms that occur in many diseases, such as headache, nausea and fatigue. The disease with the fewest connections is Odontoma (eight disease neighbours), a tumour originating from teeth.

Performance evaluation of the HSDN. In order to validate our approach, we did an extensive manual quality check of the core data. We randomly selected 1,000 PubMed records and manually evaluated the extracted symptom–disease relations with the aid of medical experts (see Supplementary Methods, Supplementary Data 5). We find that (i) the vast majority of the relations are medically meaningful and direct. The only notable (5.5% of the random records) confounding factors were symptoms related to drug treatment instead of the immediate disease. (ii) The disease relations in the HSDN are very specific: 57% of the random records contain only a single disease, 28.5% contain two and only 14.5% more than two. (iii) The automated process yields very few false positives: only 0.8% of the cases contained a negation as in ‘disease X is NOT related to symptom Y’ that our text mining approach could not capture.

To further test the reliability of the obtained disease similarity score, we create a benchmark disease network using the manually curated HPO [25] data (Supplementary Methods), in which two diseases are connected if they share at least one symptom. The benchmark network includes 940 MeSH diseases (corresponding to 2,111 OMIM disease identifiers) and 121,945 links. It is much smaller than the HSDN, but arguably of high quality. Comparing the HSDN with the HPO network, we find that higher symptom similarity in the HSDN is related to higher edge overlap with the HPO network (Fig. 3a). The Pearson correlation coefficient (PCC) between the ratio of shared disease links and disease similarity is very high (PCC = 0.96, P = 1.4 × 10⁻⁵), indicating that the proposed disease similarity is a reliable measure for shared symptoms. For comparison with random expectation, we reshuffled (10 random permutations) the symptom features of each disease using the Fisher–Yates method [33], finding significantly less overlap and fewer high similarity values (Fig. 3b). In randomized networks, most disease similarities are low (< 0.1), and their distribution is significantly different from the one in the real HSDN, where the count of disease links declines much more slowly with increasing disease similarity.

To further examine the completeness of the HSDN, we calculate the number of the common nodes and links with the HPO disease network (Fig. 3c).
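The randomization control used for Fig. 3 relies on the Fisher–Yates shuffle, which can be sketched in a few lines. The sketch assumes that the symptom features of a disease are stored as a list of weights in a fixed symptom order and that each permutation reassigns those weights to random symptom positions. That data layout, the function names and the toy weights are hypothetical; only the shuffle itself and the ten permutations come from the text.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a uniformly random permutation of items, leaving the input untouched."""
    shuffled = list(items)
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def permute_symptom_features(features, n_permutations=10, seed=0):
    """Build randomized copies of {disease: [weight per symptom]}."""
    rng = random.Random(seed)
    return [
        {disease: fisher_yates_shuffle(weights, rng) for disease, weights in features.items()}
        for _ in range(n_permutations)
    ]


# Hypothetical symptom weights, for illustration only.
features = {
    "disease_a": [0.9, 0.0, 0.3, 0.0, 0.1],
    "disease_b": [0.0, 0.7, 0.0, 0.2, 0.0],
}
permutations = permute_symptom_features(features)
print(len(permutations), "randomized feature sets")
```

Each randomized feature set keeps the number and magnitude of a disease's symptom weights but breaks their assignment to specific symptoms, so similarities recomputed on it give a baseline for what overlap to expect by chance.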
The results show that the benchmark network from HPO is almost a complete subset of the HSDN, which captures 898 of its nodes (95.5%) and 107,098 of its links (87.8% of the whole HPO network, 95.7% of the subnetwork of the 898 common nodes). The number of overlapping links is significantly higher (P = 2.2 × 10⁻¹⁶, binomial test, see Supplementary Methods) than random expectation, again indicating that the HSDN offers reliable relationships.

Figure 1 | Construction of the HSDN. (a) Extracting the disease–symptom relationships from the PubMed bibliographic literature database. The associations between symptoms and diseases are based on their co-occurrence in the MeSH metadata fields of PubMed. (b) A disease network is constructed, in which nodes represent diseases and links represent symptom similarities between diseases. (c) Integrating both disease–gene associations and PPI databases to obtain shared genes/PPIs between diseases. We consider shared PPIs of 1st order (directly connected proteins) and of 2nd order (proteins are connected by a path of length two). (d) Resulting disease network in which links represent shared genes/PPIs. (e) The backbone of the HSDN with shared genes/PPIs. We observe highly clustered regions of diseases that belong to the same broad disease category.

Shared symptoms indicate shared genes between diseases. We integrated three genotype–phenotype databases, yielding 28,336 disease–gene associations (Supplementary Methods, Supplementary Data 6) and constructed a Human Disease Network as described in Goh et al. [13], in which two diseases are connected if they share an associated gene. The resulting network consists of 1,741 diseases and 47,410 links. Comparing the link overlap between the HSDN and Human Disease Network, we find a total of 41,880 overlapping links (20,182 overlapping disease links with similarity score ≥ 0.2, a 1.8-fold increase compared with random expectation, P = 2.2 × 10⁻¹⁶, binomial test; Fig. 4b). The overlapping link ratio (fraction of disease pairs with both shared symptoms and shared genes of all disease pairs with shared symptoms) shows strong positive correlation with disease similarity (PCC = 0.92 and P = 1.8 × 10⁻⁴; Fig. 4a), that is, diseases with more similar symptoms are more likely to have common gene associations. Disease pairs with well-established similar clinical manifestations and known common genes include, for example, hypoalphalipoproteinemia and metabolic syndrome (similarity score 0.97), insulin resistance and metabolic syndrome (0.99), insulin resistance and diabetes mellitus (0.97), fatty liver and diabetes mellitus (0.93) and duodenal ulcer and stomach ulcer (0.93). High similarity scores can also suggest yet unknown common genetic associations.
For example, a recent study [34] established similar patterns of genomic alteration in the two cancer types colonic neoplasm and rectal neoplasm. In the HSDN, they also have very similar clinical manifestations (similarity score 0.64); even higher values are obtained between the related terms rectal neoplasms and colorectal neoplasms (0.92) or colonic neoplasms and colorectal neoplasms (0.73).

Shared symptoms indicate shared protein interactions. To further assess whether shared symptoms indicate not only shared genetic associations, but also close interaction of the corresponding proteins, we integrated five publicly available PPI databases (Supplementary Methods) and constructed disease networks in which two diseases are linked if they have shared 1st and 2nd order PPI interactions, respectively: shared 1st order PPI means that two diseases have associated proteins that directly interact within the PPI network, while shared 2nd order PPI means that they are connected by a path of length two (Fig. 1c,d). In both cases, we find strong positive correlations between symptom similarity and shared PPIs. The ratio of diseases with shared PPIs increases significantly with higher symptom similarity (PCC = 0.89, P = 5.4 × 10⁻⁴ for 1st order interactions, Fig. 5a; PCC = 0.84, P = 0.002 for 2nd order interactions, Fig. 5b). It is well established that proteins associated with the same human disease/disease category or phenotype tend to interact with each other [13,20,35]. In contrast to previous phenotype maps [19], the HSDN strictly considers only symptom features (excluding in particular disease terms themselves, anatomical features, congenital abnormalities, and so on) and is not focused on monogenic diseases, but includes all disease categories. Our results therefore provide robust evidence that interacting proteins between diseases are also connected to similar high-level manifestations.

This broader scope enables us to extend previous approaches to uncover novel disease associations.
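The three kinds of molecular links between diseases used in this section (shared gene, shared 1st order PPI, shared 2nd order PPI) can be made concrete with a small sketch. The protein and disease names, the toy edge list and the helper functions are hypothetical. Whether a pair with a 1st order link also counts as a 2nd order pair is not specified in the text, so the sketch evaluates the two conditions independently.

```python
from collections import defaultdict


def build_adjacency(ppi_edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in ppi_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def shares_gene(genes_a, genes_b):
    """Two diseases share a gene if their gene sets intersect."""
    return bool(genes_a & genes_b)


def shares_first_order(proteins_a, proteins_b, adjacency):
    """True if some protein of A directly interacts with some protein of B."""
    return any(adjacency.get(p, set()) & proteins_b for p in proteins_a)


def shares_second_order(proteins_a, proteins_b, adjacency):
    """True if a path p - x - q of length two links a protein p of A to a protein q of B."""
    for p in proteins_a:
        for x in adjacency.get(p, set()):
            if (adjacency.get(x, set()) - {p}) & proteins_b:
                return True
    return False


# Hypothetical interaction data, for illustration only.
adjacency = build_adjacency([("P1", "P2"), ("P2", "P3"), ("P4", "P5")])
proteins = {
    "disease_a": {"P1"},
    "disease_b": {"P2"},
    "disease_c": {"P3"},
    "disease_d": {"P5"},
}
print(shares_first_order(proteins["disease_a"], proteins["disease_b"], adjacency))
print(shares_second_order(proteins["disease_a"], proteins["disease_c"], adjacency))
```

Applying these predicates to every disease pair yields the gene-based and PPI-based disease networks whose links are then compared with the symptom-based links.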
For example, it is considered that both genetic and environmental factors play a role in the pathogenesis of Parkinson’s disease (PD) [36], which is characterized by resting tremor, akinesia and rigidity. In the HSDN, we found that PD has highly similar symptoms with substance-related diseases like mercury poisoning (0.60), MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine, a toxin) poisoning (0.58) and manganese poisoning (0.52). MPTP is an established disease model for PD [37], and manganese poisoning has also been proposed recently [38]. Similarly, it has been suggested that the molecular response to mercury exposure may increase dopamine neuron vulnerability and the propensity to develop PD [39].

Figure 2 | Basic statistics of the HSDN. (a) The twenty most frequent disease terms in the MeSH fields of PubMed records, containing eight types of cancers (for example, breast neoplasms, lung neoplasms), four types of vascular diseases (for example, hypertension, myocardial infarction and coronary diseases), HIV infections, asthma, obesity, pain, rheumatoid arthritis, type 2 diabetes and two mental diseases. Breast neoplasms have more than 120,000 PubMed occurrences. (b) The top twenty symptom terms include five body weight–related symptom terms. Note that in MeSH, pain is also considered as a symptom, occurring more than 100,000 times in the PubMed database. (c) Symptom and disease co-occurrence distribution. (d) Distribution for the number of connections (degrees) of nodes in the HSDN.

The results above indicate that high symptom similarity strongly correlates with shared genes, as well as with 1st- and 2nd-order protein interactions. This suggests that there is a general relationship between phenotypic similarity on one hand, and path lengths on the PPI network on the other hand. To test this hypothesis, we calculate the minimum shortest path length (MSPL) of proteins within the PPI network for each disease pair (see Methods). Indeed, we find strong negative correlation between the MSPL and symptom similarities (PCC = -0.93 and P = 7.7 × 10⁻⁵; Fig. 6a,b), that is, the higher the symptom similarity, the shorter the PPI network distance between diseases. The MSPL decreases from 2.88 to 1.98 when disease similarity bins increase from 0.1 to 1.0. This indicates that the network parsimony principle [6], according to which causal molecular pathways tend to coincide with shortest network paths, can be used to quantify the correlation between manifestations of diseases and their related protein interactions.

Diversity of disease manifestations and molecular mechanisms. In genetic nosology, it has been recognized that due to pleiotropism and genetic heterogeneity there is a large discrepancy between the diversity of the clinical manifestations of diseases and their underlying cellular mechanisms [4]. For example, sickle cell disease has rather diverse clinical manifestations, such as mild anaemia, painful crises, bony infarcts and acute chest syndrome, despite being a classical monogenic disease.
Familial hypertrophic cardiomyopathy, on the other hand, is caused by mutations of a number of different genes, yet its pathophysiology largely manifests itself in a specific portion of the heart muscle (which in turn may lead to several clinical phenotypes). To fully unravel these complex relations, comprehensive and complete maps are needed that combine genome or proteome components with intermediate phenotype components, environmental factors and pathophenotypes [5].

Figure 3 | Reliability evaluation of symptom similarity in the HSDN. (a) The percentage of HPO network disease links in the HSDN for different similarity bins. In the real data, stronger symptom similarity is related to higher edge overlap with the HPO network. For high similarity values, the overlap is much bigger than expected by chance as in 10 random permutation cases. (b) The overlapping edge count distributions for real data and random permutation. Error bars in a and b denote s.d. (c) Number of overlapping disease links (observed overlapping links, 107,098, versus random expectation).

Figure 4 | Correlation between symptom similarity and shared genes. (a) The link overlap between the disease network based on shared symptoms and the disease network based on shared genes. Random expectation is derived from 10 random permutations, error bars denote s.d. (b) The observed overlap (blue arrow) and the distribution of the expected overlap for the random control for two cases of (i) all disease links with positive symptom similarity (observed: 41,880) and (ii) disease links with symptom similarity bins ≥ 0.2 (inset; observed: 20,182). In both cases the overlap is statistically highly significant (P = 2.2 × 10⁻¹⁶, binomial test).

In a first attempt to analyse the relation between molecular and phenotypic diversity of diseases, we construct an integrated disease network that combines phenotypic relations based on symptom similarity with shared molecular mechanisms based on protein interactions. First, we filter the HSDN for significant links with similarity scores > 0.1 (1,121,899 links remain). Second, we identify all disease links that are supported by either shared genes, or 1st/2nd order protein interactions. The resulting shared symptoms and shared genes/PPIs network (SGPDN) contains 133,106 interactions between 1,596 distinct diseases (Supplementary Data 4). We used two quantities to measure disease diversity in this network: betweenness and node diversity (see Methods). In the HSDN, we assume that a disease has a high capability to accommodate different manifestations when it has a high network betweenness, that is, a high number of shortest paths pass through it. We calculated the disease diversity in the SGPDN and the corresponding maximum diversities of disease-related genes in the PPI network, finding strong positive correlations between the two (node diversity correlation: PCC = 0.84, P = 2.5 × 10⁻¹⁰, Fig. 7a; betweenness correlation: PCC = 0.59, P = 9.5 × 10⁻⁷, Fig. 7b).
These results demonstrate that a disease with diverse clinical manifestations will typically also have more diverse underlying cellular network mechanisms.

Disease groups. The HSDN approach can further be used to study interrelationships between groups or classes of diseases. In order to obtain a more global view, we extracted the backbone of the SGPDN disease network using the multi-scale backbone algorithm [40] (Supplementary Methods). The resulting subnetwork includes 2,159 disease links with significant associations of shared symptoms, shared genes and (1st or 2nd order) PPIs (Fig. 1e and Supplementary Fig. 2). We find that diseases within the same category form clear, highly interconnected communities, such as metabolic diseases, respiratory tract diseases, digestive system diseases, cardiovascular diseases, neoplasms and mental disorders. Exceptions include bacterial infectious diseases, virus diseases and parasite diseases, which appear to be spread among other disease categories. Besides the links within the same category, there are also many links connecting diseases of different categories, for example, between neoplasms and other disease categories. In particular, we find that the three main disease risks, namely infectious diseases, chronic inflammation diseases and neoplasms, are highly interconnected. A detailed analysis of these connections may yield novel insights into the increasingly recognized pathological and aetiological associations between inflammatory diseases and neoplasms [41] and the human genetic susceptibility to infectious diseases [42].

Discussion

Despite the known limitations in completeness and quality of currently available data on clinical manifestations and cellular mechanisms of disease, our results indicate strong associations between symptom similarity of diseases and shared genes and PPIs, as well as a clear correspondence between the diversity of the clinical manifestations of diseases and the underlying diversity in their cellular mechanisms.
This demonstrates that individual-level disease phenotypes (for example, symptoms) and molecular-level disease components (for example, genes and PPIs) show robust correlations, even though their direct associations are influenced by complicated intermediate factors [43]. This finding opens up promising avenues to use the presented symptom-based network as a rich resource to quantitatively address diverse questions in the field of systems medicine.

Figure 5 | Correlation between symptom similarity and shared PPIs. Percentage of overlapping disease links between the network of shared symptoms and the network of shared 1st order PPIs (a) and shared 2nd order PPIs (b). Random expectations are derived from 10 random permutations, error bars denote s.d.

Figure 6 | Correlation between symptom similarity and shortest path length of the associated proteins in the PPI network. (a) MSPL between disease modules. (b) MSPL distributions for different disease similarities.

The observed correlations between clinical manifestations and molecular mechanisms of diseases can be highly valuable for functional annotations of genomics [11] and reveal regularities between different disease categories. Inflammatory bowel diseases (IBD), for example, are a group of diseases of increasing global prominence, generally described by chronic relapsing inflammatory conditions of the gastrointestinal tract. There are two major types, ulcerative colitis (UC) and Crohn’s disease (CD) [44]. Despite their very different pathological characteristics, they may present with common symptoms like abdominal pain, vomiting, diarrhoea, rectal bleeding and weight loss. In total, UC and CD share 78 symptoms in the HSDN (similarity score ~0.89). In agreement with the clinical recognition of UC and CD, eight out of their respective 10 symptoms with the highest bibliographic co-occurrence are in common (Table 1). Also at the molecular level many shared genetic risk loci/genes have been identified, for example, IL23R, JAK2, IL12B, STAT3, PTPN2, TNFSF15 and CARD9 [45]. A recent study found 71 new genome-wide significant associations for a total of 163 IBD loci, most of which contribute to both UC and CD phenotypes [46]. We have further investigated the correlation between IBD and all the 27 disease categories in MeSH (Supplementary Methods and Supplementary Table 1). In addition to the expected relation to other digestive system diseases, we found positive correlations with bacterial infections, virus diseases, parasitic diseases and immune system diseases (Supplementary Fig. 7, Supplementary Tables 2 and 3). This finding is also consistent with genome-wide association study results [46], showing that genetic loci identified for IBD have a strong overlap with genes tied to the immune response to mycobacterial infections and to other immune-related disorders such as ankylosing spondylitis and psoriasis.

A second promising example for the use of our broad data across disease categories is a comparison between genetic and infectious diseases.
By analysing integrated data (virus targets,related PPIs and disease–gene associations) of the Epstein–Barrvirus (EBV) and the human papillomavirus, a recent study 47 showed that these viruses perturb the host network in a highlylocalized fashion, indicating that primarily the proteins directlyconnected to viral targets play a mechanistic role in theimplicated diseases. We examined the HSDN network fordiseases with similar symptoms as EBV infections. The 20 moststrongly associated diseases include several EBV-implicateddiseases, such as infectious mononucleosis (similarity score0.63), T-cell lymphoma (0.59), Hodgkin disease (0.59), diffuselarge B-cell lymphoma (0.58) and non-Hodgkin lymphoma(0.58). These examples show that diseases associated with geneslocated in the close neighbourhood of EBV targets in the PPInetwork also exhibit high symptom similarity with EBVinfections. Symptom similarity scores could therefore provide apromising venue for gene prioritization and target identificationof viral/bacterial infections.Another important area in which symptoms play a crucial roleis drug-related research. Most drugs approved by the US Foodand Drug Administration are merely palliative 48 , that is, they only treat symptoms rather than targeting disease-specific genes orpathways. A detailed understanding of how symptoms relate tounderlying molecular processes is therefore central for our effortstowards more effective and individualized treatments. Firstattempts in this direction have been proposed recently in drugdesign, using for example phenotype screening or the similaritiesof side-effects 49 , which are also most often observed and reported as clinical symptoms 50 . Our comprehensive symptom-based disease relationships may provide valuable input for suchapproaches. 
Figure 7 | Disease node diversity and betweenness. Disease node diversity (a) and betweenness (b) in the disease network compared to the node diversity/betweenness of the related genes within the PPI network. The values of node diversity/betweenness are normalized by z-score. The red points in b represent data points that have been removed in order to test for the sensitivity of our results towards possible outliers. The results remained the same. Error bars denote s.d. of the data in the respective bins. [Plot not reproduced; axes: disease-related gene maximum diversity and disease node diversity (a) or disease node betweenness (b).]

Table 1 | The ten symptoms with the highest co-occurrence with Crohn’s disease and ulcerative colitis.

Ulcerative colitis                        Crohn’s disease
Symptom                       Occurrence  Symptom                       Occurrence
Diarrhea*                     214         Diarrhea*                     228
Psychophysiologic disorders*  123         Body weight*                  141
Body weight*                  62          Abdominal pain*               101
Abdominal pain*               34          Pain*                         63
Pain*                         31          Psychophysiologic disorders*  62
Fever*                        20          Fever*                        44
Constipation                  18          Weight loss*                  43
Nausea*                       17          Oedema                        39
Headache                      17          Abdomen, acute                26
Weight loss*                  15          Nausea*                       24

Symptoms associated with both diseases (shown in red in the original) are marked with an asterisk.

For example, the similar treatment of the two diseases with high symptom similarity discussed above, UC and CD, is well established in clinical practice. In both cases, steroids are used to relieve symptoms, as well as common drugs, for example, azathioprine, infliximab and olsalazine. We speculate that the HSDN could help in systematically generating hypotheses for such disease pairs. Alzheimer’s disease (AD), for example, is still lacking an effective therapy to reverse the progressive loss of memory and other cognitive functions.
In the HSDN, AD shows high symptom similarity with epilepsy and several of its variants, like temporal lobe epilepsy (0.63). The two diseases also exhibit significant comorbidity [51]. An antiepileptic drug (levetiracetam) was recently found to reverse deficits in learning and memory in AD mice and might also help ameliorate related abnormalities in humans [52].

Symptoms represent the high-level manifestations of a disease that are actually observed by patients and physicians. Ultimately, it is due to certain symptoms that an individual will seek professional help, and they are crucial for accurate clinical diagnosis and designing the appropriate treatment. However, the objective validation of the patients’ experience of major classes of symptoms still remains a pressing challenge in clinical practice. Currently, the MeSH metadata do not include more accurate, quantitative descriptions of symptom features (for example, severity, frequency or prevalence rate). Promising routes to further increase the accuracy of symptom-based disease relations would therefore be the integration of medical terminologies and clinical data. Clinical terminology systems like SNOMED-CT [53] hold millions of relationships between medical entities (for example, diseases, body locations and clinical findings), yet currently they only contain relatively few symptom–disease associations as considered in this study (Supplementary Methods). A second source containing vast amounts of relevant information is electronic health records and their related personal laboratory results. These data probably constitute the richest and most promising resource towards a quantitative, personalized description of symptom–disease relationships. To date, however, clinical documentation is still highly variable and rife with errors and imprecision [54,55]. Symptoms are typically described in narrative notes, therefore requiring complex full-text analysis.
In addition, large-scale data integration aiming at comprehensive disease and population coverage will also meet difficulties pertaining to privacy issues and semantic interoperability across institutions or countries [56]. Notwithstanding these challenges, we are convinced that advances in the field of automated text mining [57] will eventually enable us to substantially expand the data presented in this manuscript.

Methods

Basic datasets. The construction of a symptom-based disease network requires (i) a basic taxonomy for diseases and symptoms and (ii) a corpus of data from which to extract their relations. After evaluating several possible options (see Supplementary Methods, Supplementary Data 6 and 7 for a comparison with SNOMED-CT, ICD9/10 and HPO), we chose the combination of the MeSH vocabulary and the PubMed literature database. The MeSH classification is defined by experts and offers a comprehensive vocabulary across all disease categories (in contrast to, for example, OMIM, which focuses on monogenic diseases), systematically organized in a hierarchical tree (in contrast to, for example, ICD9/10, which has only two levels). The most important advantage for our purposes is that MeSH is used directly to index all articles in the massive PubMed database. The indexing is done manually by trained experts and according to standardized procedures, thereby ensuring highly accurate assignments [58]. In addition, this process alleviates a core challenge in medical text mining, the ambiguity and multiple conventions in nomenclature, since the MeSH nomenclature includes synonymous aliases for any given term.

The basic data used in our study also bear certain limitations. The MeSH vocabulary is relatively old and rigid with only annual updates. This may limit the extent to which the identified associations capture the latest research results of the rapidly evolving field of medicine. On the other hand, stable and well-established terms may also lead to more robust associations for our purposes.
Other important shortcomings are that MeSH has relatively few disease terms (compared with, for example, ICD9/10) and that our associations are not derived directly from clinical diagnosis, but from research articles. In the future, it would be highly desirable to develop techniques that enable us to automatically extract information from clinical records. Currently available methods for this very challenging problem of automated full-text analysis in large-scale data do not yield results with comparable accuracy [55]. A challenge inherent to all disease taxonomies is that the distinction between symptoms and diseases is not always clear, for example in the case of obesity. According to the expert-based MeSH classification, obesity belongs to four different broad categories, namely ‘Nutritional and Metabolic Diseases’, ‘Diagnosis’, ‘Physiological Phenomena’ and ‘Pathological Conditions, Signs and Symptoms’. Considering its MeSH definition as ‘a status with body weight that is grossly above the acceptable or desirable weight, usually due to accumulation of excess fats in the body [...]’ it is apparent that a precise and unique classification into a single category is difficult and obesity may indeed be regarded as a disease, a symptom, a diagnosis and a physiological phenomenon at the same time. Since the multihierarchical structure of MeSH explicitly allows for multiple categories for a single term, the data we generated can be used to explore both interpretations, for example, the relationships of obesity as a symptom or as a disease.

Acquisition of symptom and disease relationships. Each article listed in PubMed is associated with metadata that include a list of manually assigned keywords describing the major topics of the article. We developed a Java programme (Supplementary Fig. 4) utilizing the NCBI E-utility web services to acquire all PubMed identifiers whose keywords include any of the disease or symptom terms defined by MeSH (2011 ASCII version, see Supplementary Methods).
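The acquisition step can be illustrated with a small Python sketch. It is not the authors' Java programme; it only builds an ESearch request URL for a single MeSH term and does not contact the server. The helper name `build_esearch_url`, the `[MeSH Major Topic]` field tag and the `retmax` value are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(mesh_term, retmax=1000):
    """Return an ESearch URL for PubMed records indexed with `mesh_term`.

    Hypothetical helper: the field tag is an assumption that restricts the
    query to manually assigned MeSH metadata instead of full text.
    """
    params = {
        "db": "pubmed",
        "term": f"{mesh_term}[MeSH Major Topic]",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Build the request for one disease term and decode it again for inspection.
url = build_esearch_url("Crohn Disease")
query = parse_qs(urlsplit(url).query)
print(query["term"][0])
```

Repeating such a query for every disease and symptom term, and intersecting the returned identifier lists, would give the pairwise counts that the co-occurrence analysis starts from.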
Note that we do not use a full-text search of the articles or their abstracts, but only the manually curated metadata. The associations between symptoms and diseases were then quantified using term co-occurrence (number of PubMed identifiers in which two terms appear together; see Supplementary Methods and Supplementary Fig. 5). Similar methods have been widely used as a reliable approach to identify associations between different medical entities [59]. Note that this pairwise term co-occurrence does not take possible interactions between symptoms into account, but considers different symptoms of a given disease to be independent of each other. Prevalent combinations of symptoms can be extracted from the weighted symptom vectors described below. However, these combinations only account for positive interactions between symptoms. Cases in which certain symptoms of the same disease are mutually exclusive cannot be detected with this simple method.

Symptom-based diseases similarity. In the field of information retrieval, text documents or concepts are commonly represented by feature vectors [60]. Here, we describe every disease j by a vector of symptoms $d_j$:

$$d_j = (w_{1,j}, w_{2,j}, \ldots, w_{n,j}) \tag{1}$$

where $w_{i,j}$ quantifies the strength of the association between symptom i and disease j. The prevalence of the different symptoms and diseases is very heterogeneous; for example, there are highly abundant symptoms like pain, and publication biases towards certain diseases like breast cancer. To account for this heterogeneity, we therefore do not use the absolute co-occurrence $W_{i,j}$ to measure the strength of an association between symptom i and disease j, but the term frequency-inverse document frequency [60] $w_{i,j}$:

$$w_{i,j} = W_{i,j} \log \frac{N}{n_i} \tag{2}$$

where N denotes the number of all diseases in the dataset and $n_i$ the number of diseases where symptom i appears.
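As a rough illustration of equation (2), the following Python sketch turns raw co-occurrence counts into weights. The input layout (a dictionary keyed by symptom and disease), the function name `tfidf_weights`, the toy counts and the use of the natural logarithm are all assumptions; the text does not state the logarithm base.

```python
import math
from collections import defaultdict


def tfidf_weights(cooccurrence):
    """Map (symptom, disease) -> W_ij * log(N / n_i) for all positive counts."""
    # N: number of diseases that have at least one associated symptom.
    diseases = {d for (_, d), count in cooccurrence.items() if count > 0}
    n_diseases = len(diseases)

    # n_i: number of diseases in which symptom i appears.
    diseases_per_symptom = defaultdict(set)
    for (symptom, disease), count in cooccurrence.items():
        if count > 0:
            diseases_per_symptom[symptom].add(disease)

    weights = {}
    for (symptom, disease), count in cooccurrence.items():
        if count > 0:
            n_i = len(diseases_per_symptom[symptom])
            # Natural log is an assumption of this sketch.
            weights[(symptom, disease)] = count * math.log(n_diseases / n_i)
    return weights


# Invented toy counts: "pain" occurs with every disease, "fever" with one.
counts = {
    ("pain", "disease A"): 10,
    ("pain", "disease B"): 5,
    ("pain", "disease C"): 1,
    ("fever", "disease A"): 2,
}
weights = tfidf_weights(counts)
```

In this toy input, a symptom shared by all diseases receives weight zero regardless of its raw count, while a symptom specific to one disease is up-weighted, which is the intended correction for highly abundant symptoms.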
Since all symptoms in our data have at least one associated disease, the potential problem of dividing by zero does not arise.

A widely used measure in both text mining and the biomedical literature to quantify the similarity between two concepts is the cosine similarity of the respective vectors. The similarity between the vectors $d_x$ and $d_y$ of two diseases x and y is calculated as follows:

$$\cos(d_x, d_y) = \frac{\sum_i d_{x,i}\, d_{y,i}}{\sqrt{\sum_i d_{x,i}^2}\, \sqrt{\sum_i d_{y,i}^2}} \tag{3}$$

The cosine similarity ranges from 0 (no shared symptoms) to 1 (identical symptoms).

Filtering significant symptom–disease associations. The full HSDN is very dense with over 84% of all possible pairwise disease links being present. In addition to the absolute value of a pairwise symptom similarity, we therefore also determined its statistical significance, for instance for a more accurate inference of phenotype–genotype associations. A widely used statistic to filter significant associations between medical entities from co-occurrence literature data is the $\chi^2$-test that compares observed frequencies with the frequencies expected for independence. A priori we do not know how many true associations to expect, even though it is reasonable to assume that many co-occurrences are indeed meaningful, given the manual curation process of the MeSH metadata. In order to rationalize the choice of a significance threshold, we use a method specifically developed for a similar application [61] that combines $\chi^2$-tests with P-value plots [62] (see Supplementary Methods and Supplementary Fig.
3 for more details). Comparable to previously reported values, we find a threshold of P-value = 0.13, indicating that there are indeed relatively many false null hypotheses, that is, true associations. For our subsequent analysis, we have nevertheless chosen to proceed with the more conservative and commonly used threshold of P-value = 0.05. We provide the full dataset in order to enable the research community to adapt these choices to their particular needs, for example to employ stricter criteria for a more targeted investigation on few diseases of interest. In our case, we obtain 62,820 filtered significant connections between 3,973 diseases and 322 symptoms. The average number of diseases per symptom is about 196; some general symptoms like abnormal body weight and pain have more than 1,000 associated diseases.

Shortest paths and single linkage between disease modules. Shortest paths are an important topological quantity for the analysis of social and biological networks [63]; the most prominent example of their use is probably the well-known small-world property of many complex networks [64]. We use Dijkstra’s algorithm [65] to find all shortest paths in the PPI network. In order to quantify the PPI distance between disease pairs, we use the single linkage distance $D_{SL}$, that is, the minimum of all shortest paths between related proteins. For two diseases x and y with the corresponding related protein sets $P_x$ and $P_y$, the single linkage distance is given by

$$D_{SL}(x, y) = \min_{p_i \in P_x,\; p_j \in P_y} D(p_i, p_j) \tag{4}$$

where $D(p_i, p_j)$ is the shortest path length between the two proteins $p_i$ and $p_j$.

Disease diversity. In order to characterize the connectedness of a node within a network, we use betweenness [66] and node diversity [67]. Betweenness is a centrality measure quantifying how many shortest paths run through a given node and can be used, for example, to quantify the influence of individuals in social networks [68].
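Two of the quantities defined in the Methods, the cosine similarity of equation (3) and the single linkage distance of equation (4), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: disease vectors are sparse dictionaries, the PPI network is an unweighted adjacency dictionary, and a breadth-first search replaces Dijkstra's algorithm (the two agree on unweighted graphs). All function and variable names are hypothetical.

```python
import math
from collections import deque


def cosine_similarity(d_x, d_y):
    """Cosine of two sparse symptom vectors given as {symptom: weight}."""
    dot = sum(w * d_y[s] for s, w in d_x.items() if s in d_y)
    norm_x = math.sqrt(sum(w * w for w in d_x.values()))
    norm_y = math.sqrt(sum(w * w for w in d_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention of this sketch for empty vectors
    return dot / (norm_x * norm_y)


def single_linkage_distance(adjacency, proteins_x, proteins_y):
    """Minimum shortest-path length between any protein pair (p_i, p_j)."""
    targets = set(proteins_y)
    # Multi-source BFS: start from all proteins of disease x at distance 0.
    queue = deque((p, 0) for p in proteins_x)
    seen = set(proteins_x)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist  # first target reached is the closest one
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return math.inf  # the two protein sets are not connected


# Invented toy network: a chain a - b - c - d plus an isolated protein e.
ppi = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
distance = single_linkage_distance(ppi, {"a"}, {"d"})
similarity = cosine_similarity({"pain": 1.0, "fever": 1.0}, {"pain": 1.0})
```

Starting the search from all proteins of one disease at once avoids computing every pairwise shortest path when only the minimum is needed.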
The diversity $\varphi$ of node j is based on the node bridging coefficient [69] and defined by

$$\varphi(j) = \sum_{i \in N(j)} \frac{\delta(i)}{k(i) - 1} \tag{5}$$

where $k(i)$ is the degree of node i, $N(j)$ denotes the neighbourhood of node j, that is, the set of all its direct neighbours, and $\delta(i)$ is the total number of links leaving that neighbourhood. The diversity $\varphi$ is large for nodes with many neighbours that have many out-going links themselves.

For the disease diversity within the HSDN, both betweenness and node diversity can be measured directly for each disease. For the diversity of disease-related genes within the PPI, we use the maximum of all respective betweenness or node diversity values to represent the diversity of the disease in the PPI context. Furthermore, we normalized the diversity values of each disease by using the z-score before calculating the correlation between its two related diversity values.

References

1. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33 (Suppl), 228–237 (2003).
2. Vidal, M., Cusick, M. E. & Barabasi, A. L. Interactome networks and human disease. Cell 144, 986–998 (2011).
3. McKusick, V. A. The growth and development of human genetics as a clinical discipline. Am. J. Hum. Genet. 27, 261–273 (1975).
4. McKusick, V. A. On lumpers and splitters, or the nosology of genetic disease. Perspect. Biol. Med. 12, 298–312 (1969).
5. Loscalzo, J., Kohane, I. & Barabasi, A. L. Human disease classification in the postgenomic era: a complex systems approach to human pathobiology. Mol. Syst. Biol. 3, 124 (2007).
6. Barabasi, A. L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
7. Wang, Q. et al. Community of protein complexes impacts disease association. Eur. J. Hum. Genet. 20, 1162–1167 (2012).
8. Park, J., Lee, D. S., Christakis, N. A. & Barabasi, A. L.
The impact of cellular networks on disease comorbidity. Mol. Syst. Biol. 5, 262 (2009).
9. Hidalgo, C. A., Blumm, N., Barabasi, A. L. & Christakis, N. A. A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, e1000353 (2009).
10. Lee, D. S. et al. The implications of human metabolic network topology for disease comorbidity. Proc. Natl Acad. Sci. USA 105, 9880–9885 (2008).
11. Brunner, H. G. & van Driel, M. A. From syndrome families to functional genomics. Nat. Rev. Genet. 5, 545–551 (2004).
12. Rzhetsky, A., Wajngurt, D., Park, N. & Zheng, T. Probing genetic overlap among complex human phenotypes. Proc. Natl Acad. Sci. USA 104, 11694–11699 (2007).
13. Goh, K. I. et al. The human disease network. Proc. Natl Acad. Sci. USA 104, 8685–8690 (2007).
14. Giot, L. et al. A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736 (2003).
15. Li, S. et al. A map of the interactome network of the metazoan C. elegans. Science 303, 540–543 (2004).
16. Dudley, A. M., Janse, D. M., Tanay, A., Shamir, R. & Church, G. M. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol. Syst. Biol. 1, 2005.0001 (2005).
17. Gavin, A. C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).
18. Freudenberg, J. & Propping, P. A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics 18 (Suppl 2), S110–S115 (2002).
19. van Driel, M. A., Bruggeman, J., Vriend, G., Brunner, H. G. & Leunissen, J. A. A text-mining analysis of the human phenome. Eur. J. Hum. Genet. 14, 535–542 (2006).
20. Gandhi, T. K. et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat. Genet. 38, 285–293 (2006).
21. Wu, X., Jiang, R., Zhang, M. Q. & Li, S. Network-based global inference of human disease genes. Mol. Syst. Biol. 4, 189 (2008).
22. Vanunu, O., Magger, O., Ruppin, E., Shlomi, T. & Sharan, R.
Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6, e1000641 (2010).
23. Wang, X., Gulbahce, N. & Yu, H. Network-based methods for human disease gene prediction. Brief. Funct. Genomics 10, 280–293 (2011).
24. Moreau, Y. & Tranchevent, L. C. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat. Rev. Genet. 13, 523–536 (2012).
25. Robinson, P. N. et al. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 83, 610–615 (2008).
26. Smith, C. L., Goldsmith, C. A. & Eppig, J. T. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol. 6, R7 (2005).
27. Doelken, S. C. et al. Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish. Dis. Model. Mech. 6, 358–372 (2013).
28. Robinson, P. N. et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 24, 340–348 (2014).
29. Little, R. A. et al. Plasma catecholamines in the acute phase of the response to myocardial infarction. Arch. Emerg. Med. 3, 20–27 (1986).
30. Knottnerus, J. A. The effects of disease verification and referral on the relationship between symptoms and diseases. Med. Decis. Making 7, 139–148 (1987).
31. Lowe, H. J. & Barnett, G. O. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA 271, 1103–1108 (1994).
32. Wheeler, D. L. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35, D5–D12 (2007).
33. Fisher, R. A. S. & Yates, F. Statistical Tables for Biological, Agricultural and Medical Research 2nd edn revised and enlarged (Oliver & Boyd, 1943).
34. The Cancer Genome Atlas Network.
Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
35. Oti, M., Snel, B., Huynen, M. A. & Brunner, H. G. Predicting disease genes using protein-protein interactions. J. Med. Genet. 43, 691–698 (2006).
36. Warner, T. T. & Schapira, A. H. Genetic and environmental factors in the cause of Parkinson’s disease. Ann. Neurol. 53 (Suppl 3), S16–S23 (2003).
37. Duty, S. & Jenner, P. Animal models of Parkinson’s disease: a source of novel treatments and clues to the cause of the disease. Br. J. Pharmacol. 164, 1357–1391 (2011).
38. Sanchez-Betancourt, J. et al. Manganese mixture inhalation is a reliable Parkinson disease model in rats. Neurotoxicology 33, 1346–1355 (2012).
39. Vanduyn, N., Settivari, R., Wong, G. & Nass, R. SKN-1/Nrf2 inhibits dopamine neuron degeneration in a Caenorhabditis elegans model of methylmercury toxicity. Toxicol. Sci. 118, 613–624 (2010).
40. Serrano, M. A., Boguna, M. & Vespignani, A. Extracting the multiscale backbone of complex weighted networks. Proc. Natl Acad. Sci. USA 106, 6483–6488 (2009).
41. Grivennikov, S. I., Greten, F. R. & Karin, M. Immunity, inflammation, and cancer. Cell 140, 883–899 (2010).
42. Chapman, S. J. & Hill, A. V. Human genetic susceptibility to infectious disease. Nat. Rev. Genet. 13, 175–188 (2012).
43. Loscalzo, J. & Barabasi, A. L. Systems biology and the future of medicine. Wiley Interdiscip. Rev. Syst. Biol. Med. 3, 619–627 (2011).
44. Molodecky, N. A. et al. Increasing incidence and prevalence of the inflammatory bowel diseases with time, based on systematic review. Gastroenterology 142, 46–54 (2012).
45. Khor, B., Gardet, A. & Xavier, R. J. Genetics and pathogenesis of inflammatory bowel disease. Nature 474, 307–317 (2011).
46. Jostins, L. et al.
Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
47. Gulbahce, N. et al. Viral perturbations of host networks reflect disease etiology. PLoS Comput. Biol. 8, e1002531 (2012).
48. Yildirim, M. A., Goh, K. I., Cusick, M. E., Barabasi, A. L. & Vidal, M. Drug-target network. Nat. Biotechnol. 25, 1119–1126 (2007).
49. Campillos, M., Kuhn, M., Gavin, A. C., Jensen, L. J. & Bork, P. Drug target identification using side-effect similarity. Science 321, 263–266 (2008).
50. Kuhn, M., Campillos, M., Letunic, I., Jensen, L. J. & Bork, P. A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol. 6, 343 (2010).
51. Imfeld, P., Bodmer, M., Schuerch, M., Jick, S. S. & Meier, C. R. Seizures in patients with Alzheimer’s disease or vascular dementia: a population-based nested case-control analysis. Epilepsia 54, 700–707 (2013).
52. Sanchez, P. E. et al. Levetiracetam suppresses neuronal network dysfunction and reverses synaptic and cognitive deficits in an Alzheimer’s disease model. Proc. Natl Acad. Sci. USA 109, E2895–E2903 (2012).
53. Cote, R. A. & Robboy, S. Progress in medical information management. Systematized nomenclature of medicine (SNOMED). JAMA 243, 756–762 (1980).
54. Kohane, I. S. Using electronic health records to drive discovery in disease genomics. Nat. Rev. Genet. 12, 417–428 (2011).
55. Hripcsak, G. & Albers, D. J. Next-generation phenotyping of electronic health records. J. Am. Med. Inform. Assoc. 20, 117–121 (2013).
56. Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).
57. Pathak, J., Kho, A. N. & Denny, J. C. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J. Am. Med. Inform. Assoc. 20, e206–e211 (2013).
58. Coletti, M. H. & Bleich, H. L. Medical subject headings used to search the biomedical literature. J. Am. Med.
Inform. Assoc. 8, 317–323 (2001).
59. Jensen, L. J., Saric, J. & Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nat. Rev. Genet. 7, 119–129 (2006).
60. Salton, G., Wong, A. & Yang, C. S. A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975).
61. Cao, H., Hripcsak, G. & Markatou, M. A statistical methodology for analyzing co-occurrence data from a large sample. J. Biomed. Inform. 40, 343–352 (2007).
62. Schweder, T. & Spjotvoll, E. Plots of p-values to evaluate many tests simultaneously. Biometrika 69, 493–502 (1982).
63. Girvan, M. & Newman, M. E. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99, 7821–7826 (2002).
64. Watts, D. J. & Strogatz, S. H. Collective dynamics of ’small-world’ networks. Nature 393, 440–442 (1998).
65. Cormen, T. H. et al. Introduction to Algorithms (MIT Press, 2001).
66. Newman, M. E. J. Networks: an introduction (Oxford University Press, 2010).
67. Liu, L. et al. Mining Diversity on Networks. Database Systems for Advanced Applications 5981, 384–398 (2010).
68. Freeman, L. C. A set of measures of centrality based on betweenness. Sociometry 40, 35–41 (1977).
69. Hwang, W., Kim, T., Ramanathan, M. & Zhang, A. in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 336–344 (ACM, 2008).

Acknowledgements

We thank Baoyan Liu, Chaoming Song, Dashun Wang and Andrew Michaelson for useful discussions and suggestions, and especially Lili Xu, Guangli Song, Haixun Qi, Minghui Lv, Yiwei Wang, Xiaofeng Zhou and Hongwei Chu for the manual validation of the selected PubMed records. X.Z. was supported by the National Science Foundation of China (61105055, 81230086), National Basic Research Program of China (2014CB542903), National Key Technology R&D Program (2013BAI02B01, 2013BAI13B04), National S&T Major Special Project (2012ZX09503-001-003) and Beijing Municipal S&T Program of China (Z131110002813118).
This work was supported by MapGen grant (1U01HL108630-01) and by the EC-FP7 Program, Synergy-COPD, GA no. 270086. Additional support was provided by HL066289 and HL105339 grants from the U.S. National Institutes of Health.

Author contributions

X.Z. and A.-L.B. conceived and designed the experiments; X.Z. performed the experiments; X.Z., A.S. and J.M. analysed the data; X.Z., A.S., J.M. and A.-L.B. wrote the paper.

Additional information

Supplementary Information accompanies this paper at naturecommunications

Competing financial interests: The authors declare that they do not have any competing financial interests.

Reprints and permission information is available online at reprintsandpermissions/

How to cite this article: Zhou, X.Z. et al. Human symptoms-disease network. Nat. Commun. 5:4212 doi: 10.1038/ncomms5212 (2014).

Program Manager (Finance, Risk & Market Data Technology)

Job ID: 17000YVR
Publication date: 20-10-2017
Start date
Professional field
Information Technology
Location
Employment type


SG CIB is the Corporate and Investment Banking arm of the Société Générale Group. Present in over 50 countries across Europe, the Americas and Asia, SG CIB provides corporate, financial institution, investor and public sector clients with value-added integrated financial solutions.

To strengthen our delivery capacity, we are opening a permanent position based in Hong Kong. The successful candidate will join the Finance, Risk & Market Data technology team as program manager. FRM is a regional team with over 80 people based in Hong Kong and Bangalore, covering the Asia Pacific region (Japan, Korea, Taiwan, Hong Kong, Singapore, China, India and Australia).


Your Role:
  • Program Manager leading Information System transformation for Finance across Asia Pacific locations, primarily focusing on Japan, where significant investments are made in the Securities and Futures / Options perimeter
  • You will be part of a dynamic regional project team, dealing with major stakeholders including Business Lines, Finance / Risks teams
Job Description:
  • Lead and execute projects for the Finance function across Asia Pacific
  • Organize and dynamically manage a portfolio of projects
  • Manage a team of BAs working within the program
  • Chair project steering and operational committee meetings
  • Manage program internal and external dependencies
  • Develop strong partnerships with the business, IT counterparts, offshore teams in Bangalore and central teams in Paris
  • Manage program dependencies and priorities, and secure the effort of the different teams involved
  • Define project roadmaps, planning and budget
  • Coordinate and handle the user acceptance tests and migration strategy for the region
  • Assess and design the change management involved and get buy-in from key stakeholders. Define training plans, organize training needed and assist in creating robust documentation of the processes, including detailed operational procedures
  • Develop team members in terms of both functional skills and soft skills


Your Profile:
  • Project / Program management skills: ability to see the global picture and identify dependencies and risks, stakeholder management (internal & external) and communication, ability to structure necessary changes. Project and resource planning skills are mandatory
  • Business analysis skills: strong analytical mindset, attention to detail, critical thinking, solution oriented
  • Functional Knowledge: Sound knowledge of accounting, investment banking products, finance and regulatory processes. Accounting schema
Soft Skills: 
  • Excellent inter-personal skills. Able to build rapport with different project counterparts from different departments. Highly motivated and team oriented
  • Dynamic person with a strong analytical mindset. Adaptive and fast to absorb domain knowledge. Attentive to detail and capable of maintaining a global view of the situation
  • Courage to propose innovative ideas. Pragmatic and solution driven. Strong drive to succeed and ability to structure necessary changes
  • Fast learner, team player, independent, able to handle multiple tasks and functional topics simultaneously
  • Strong command of English
  • PC skills: Excel, PowerPoint, Word
  • Minimum 8-10 years of successful experience as a Project Manager and Business Analyst delivering projects for the Finance department (Accounting information system)
  • Excellent academic record
  • Project / Program management certification is a plus
  • Functional knowledge of Finance. Industry knowledge of regulatory reporting is a plus


Big premiere

Нюргуяна Сыроватская/НЮРГУША/

Lyrics and melody: Иван Аргунов /// Arrangement: Dj Aiex /// Recording, mixing, mastering: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 716 times): Нюргуяна Сыроватская/НЮРГУША/ - Эн баар буолаҥҥын

Элина Иннокентьева

Lyrics and melody: Николай Протасов /// Arrangement, recording: Алексей Батюшкин /// Mixing, mastering: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 593 times): Элина Иннокентьева - Доҕорбор


Lyrics: Дьулуурҕа; melody: Толлуман /// Arrangement: Александр Константинов /// Recording, mixing, mastering: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 294 times): Толлуман - Түүл минньигэһэ

Варя Ларионова

Lyrics and melody: Александр Самсонов (Айыы уола) /// Arrangement: Иван Наумов /// Recording, mixing, mastering: Степан Афанасьев /// Yakutsk 2017
Download (downloaded 1106 times): Варя Ларионова - Ыллаа гитарам

Юлиан Семенов

Lyrics and melody: Юлиан Семенов /// Arrangement: Александр Константинов /// Recording, mixing, mastering: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 925 times): Юлиан Семенов - Арахсыы ардаҕа

Анастасия Готовцева

Lyrics: Наталья Михалева Сайа; melody: Александр Дмитриев Таммах /// Arrangement: Дмитрий Готовцев /// Flute: Ульяна Другина /// Recording, mixing: Dobun music /// Yakutsk 2017
Download (downloaded 337 times): Анастасия Готовцева - Күһүҥҥү долгураҥ

Гаврил Шепелёв

Lyrics: Уйгулаана Саха сирэ and Гаврил Шепелев; melody: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 371 times): Гаврил Шепелёв - Сибэккибэр

Варя Аманатова

Lyrics: Күн Сиккиэрэ; melody: Дьол Сандаара /// Arrangement, recording, mixing, mastering: Николай Михеев /// Yakutsk 2017
Download (downloaded 210 times): Варя Аманатова - Таптыыбын

Санита Ай

Lyrics and melody: Мэхээс Сэмэнэп /// Arrangement, recording, mixing, mastering: Дмитрий Готовцев /// Yakutsk 2017
Download (downloaded 765 times): Санита Ай - Алтан күһүн

Надина Эльпис

Lyrics and melody: Надина Эльпис /// Arrangement: Любовь Лопатина /// Recording: Антон Иванов /// Dobun music /// Yakutsk 2017
Download (downloaded 621 times): Надина Эльпис - Бесконечнось

