Gerard Manning, Sugen, Inc., South San Francisco, CA, USA
Protein kinases are key regulators of cell function and constitute one of the largest and most functionally diverse gene families. By adding phosphate groups to substrate proteins, they direct the activity, localization and overall function of many proteins, and serve to orchestrate the activity of almost all cellular processes. Kinases are particularly prominent in signal transduction and in the coordination of complex functions such as the cell cycle. The diversity of essential functions mediated by kinases is shown by the conservation of some 50 distinct kinase families between the yeast, invertebrate and mammalian kinomes. Of the 518 human protein kinases, 478 belong to a single superfamily whose catalytic domains are related in sequence. These can be clustered into groups, families and sub-families of increasing sequence similarity and shared biochemical function. The kinase dendrograms (above) show the sequence similarity between these catalytic domains: the distance along the branches between two kinases is proportional to the divergence between their sequences. Seven major groups are labeled and colored distinctly. For instance, the tyrosine kinases form a distinct group whose members phosphorylate proteins on tyrosine residues, whereas enzymes in all other groups phosphorylate primarily serine and threonine residues. The relationships shown on the tree can help predict substrates and biological functions for many of the more than 100 uncharacterized kinases presented here. A further 40 ‘atypical’ kinases have no sequence similarity to typical kinases but are known or predicted to have enzymatic activity, and some are predicted to adopt a structural fold similar to that of typical kinases.
The main dendrogram (above right) shows the sequence similarity between protein kinase domains, derived from public sequences and gene prediction methods detailed in Manning et al. (Science, 298, 1912-1934, 2002). Domains were defined by hidden Markov model profile analysis and multiple sequence alignment. The initial branching pattern was built from a neighbor-joining tree derived from a ClustalW protein sequence alignment of the domains. This was extensively modified by reference to other alignment and tree-building methods (hmmalign and parsimony trees) and to extensive pairwise sequence alignment of kinase domains. The curved layout was created manually. Many branch lengths are only semi-quantitative, but the branching pattern is more informative than that produced by any single automatic method. The more detailed trees on subsequent pages were generated automatically by ClustalW alignment of full-length protein sequences followed by neighbor-joining tree building. Unpublished kinases are named, where possible, according to family nomenclature. Some divergent kinases retain a numerical SgK (SuGen Kinase) accession number. The second domains of dual-domain kinases are named with a "~b" suffix.
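The neighbor-joining step mentioned above can be illustrated with a minimal sketch of the standard textbook algorithm: repeatedly join the pair of nodes that minimizes Q(i, j) = (n - 2)·d(i, j) - r_i - r_j, where r_i is the sum of node i's distances, until three nodes remain. This is not ClustalW's implementation, which may differ in details such as tie-breaking or handling of negative branch lengths (this sketch does not clamp them). The function name `neighbor_joining` is hypothetical, and the five-taxon distance matrix is a small textbook-style example, not kinase data.

```python
def neighbor_joining(names, dist):
    """Join taxa by neighbor joining and return an unrooted Newick tree.

    names: leaf labels; dist: symmetric matrix (list of lists) of
    pairwise distances in the same order as names.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Choose the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        # Rebuild the matrix: remaining nodes first, then u as the last row/column.
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Three nodes left: join them at a single central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Textbook-style toy matrix (not kinase data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))
```

With this matrix the sketch first joins a and b, then joins c with that pair, and prints `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. The branch lengths it reports are what the text calls distance along the branches; as noted above, the published tree treats many of them as only semi-quantitative.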