The colon-specific proteome

The most distal part of the small intestine (ileum) opens into the caecum, which is the most proximal part of the colon. Anatomically, the colon is divided into the caecum, colon ascendens, transversum, descendens, and sigmoideum. The main function of the colon is the reabsorption of fluid, electrolytes, and vitamins. The transcriptome analysis shows that 69% of all human protein-coding genes (n=19628) are expressed in the colon, and 165 of these genes show elevated expression in colon compared to other tissue types. An analysis of the genes with elevated expression in the colon with regard to subcellular localization reveals that the corresponding proteins are predominantly located in the cytoplasm, membrane, or brush border.

  • 84 genes defined as group enriched in colon
  • Most group enriched genes share expression with the rectum
  • 165 genes defined as elevated in the colon

Figure 1. The distribution of all genes across the five categories based on transcript abundance in colon as well as in all other tissues.

165 genes show some level of elevated expression in the colon compared to other tissues. The three categories of genes with elevated expression in colon compared to other organs are shown in Table 1.

Table 1. The genes with elevated expression in colon

Category          Number of genes   Description
Tissue enriched   1                 At least five-fold higher mRNA levels in a particular tissue as compared to all other tissues
Group enriched    84                At least five-fold higher mRNA levels in a group of 2-7 tissues
Tissue enhanced   80                At least five-fold higher mRNA levels in a particular tissue as compared to average levels in all tissues
Total             165               Total number of elevated genes in colon
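The three categories in Table 1 can be read as a small decision procedure applied to the mRNA levels of one gene across all tissues. The Python sketch below applies the five-fold thresholds as stated in the table. It is an illustration under stated assumptions, not the Human Protein Atlas pipeline: the function name `classify` and the toy tissue names are hypothetical, no minimum expression cutoff is applied, and the candidate group of size k is taken to be the k highest-expressing tissues, with the colon required to lie strictly above every tissue outside the group.

```python
from statistics import mean

FOLD = 5.0                  # five-fold threshold used in Table 1
GROUP_SIZES = range(2, 8)   # group enriched: groups of 2-7 tissues


def classify(levels: dict[str, float], target: str = "colon") -> str:
    """Assign one gene to an elevated-expression category for `target`.

    `levels` maps tissue name -> mRNA level for a single gene.
    Simplifying assumptions (not taken from the source): no minimum
    expression cutoff, and the candidate group of size k is the k
    highest-expressing tissues.
    """
    x = levels[target]
    others = [v for t, v in levels.items() if t != target]

    # Tissue enriched: >= 5x the highest level in any other tissue.
    if x > 0 and x >= FOLD * max(others):
        return "tissue enriched"

    # Group enriched: the mean of a 2-7 tissue group containing `target`
    # is >= 5x the highest level outside the group. The `x > rest_max`
    # guard is an extra assumption so that ties cannot pull in a
    # weakly expressed target.
    ranked = sorted(levels, key=levels.get, reverse=True)
    for k in GROUP_SIZES:
        group, rest = ranked[:k], ranked[k:]
        if not rest:
            break
        rest_max = max(levels[t] for t in rest)
        group_mean = mean(levels[t] for t in group)
        if target in group and x > rest_max and group_mean >= FOLD * rest_max:
            return "group enriched"

    # Tissue enhanced: >= 5x the average level across all tissues.
    if x > 0 and x >= FOLD * mean(levels.values()):
        return "tissue enhanced"
    return "not elevated"


# Toy levels for one hypothetical gene (tissue names and values invented).
gene = {"colon": 40, "rectum": 30, **{f"tissue{i}": 2 for i in range(6)}}
print(classify(gene))  # -> group enriched
```

The categories are tested in the order of Table 1, so a gene that meets the tissue enriched criterion is not also counted as group enriched or enhanced. This keeps the three groups disjoint, consistent with 1 + 84 + 80 adding up to the total of 165.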

There is only one gene in the category of genes with tissue enriched expression in the colon. This is not unexpected, as the cell types, function, and morphological features of the colon are highly similar to those of the rectum. Genes that specifically signify colon and rectum will thus be categorized as group enriched genes (see Figure 2).

The colon transcriptome

An analysis of the expression levels of each gene makes it possible to calculate the relative mRNA pool for each of the categories. The analysis shows that 86% of the mRNA molecules derived from colon correspond to housekeeping genes and that 4% of the mRNA pool corresponds to genes categorized as colon enriched, group enriched, or colon enhanced. Thus, most of the transcriptional activity in the colon relates to proteins with presumed housekeeping functions as they are found in all tissues and cells analyzed.
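The relative mRNA pool of a category is the sum of colon expression levels over all genes in that category, divided by the sum over all genes. A minimal Python sketch of that calculation, assuming expression values in a unit proportional to transcript abundance (the source does not state the unit) and using a hypothetical function name, gene names, labels, and levels:

```python
from collections import defaultdict


def mrna_pool_shares(colon_levels: dict[str, float],
                     category: dict[str, str]) -> dict[str, float]:
    """Percentage of the colon mRNA pool contributed by each gene category.

    colon_levels: gene -> mRNA level in colon, assumed to be in a unit
                  proportional to transcript abundance (e.g. TPM).
    category:     gene -> category label (labels here are hypothetical).
    """
    totals: dict[str, float] = defaultdict(float)
    for gene, level in colon_levels.items():
        totals[category[gene]] += level          # sum levels per category
    pool = sum(totals.values())                  # total colon mRNA pool
    return {cat: 100 * s / pool for cat, s in totals.items()}


# Hypothetical toy data: four genes, levels and labels invented.
levels = {"geneA": 500, "geneB": 300, "geneC": 100, "geneD": 100}
labels = {"geneA": "expressed in all", "geneB": "expressed in all",
          "geneC": "elevated", "geneD": "mixed"}
print(mrna_pool_shares(levels, labels))
# -> {'expressed in all': 80.0, 'elevated': 10.0, 'mixed': 10.0}
```

Because the shares are weighted by expression level rather than by gene count, a small number of very highly expressed genes can dominate a category's share of the pool.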

Protein expression of genes elevated in colon

In-depth analysis of the elevated genes in colon using antibody-based protein profiling allowed us to visualize the protein expression patterns in the colon with respect to cellular compartments. Examples of protein profiles of elevated genes in colon include staining of the brush border (MS4A12), endocrine cells (INSL5), nuclei in glandular cells (CDX2), and the cell membrane (GPA33).

Genes shared between colon and other tissues

There are 84 group enriched genes expressed in the colon. Group enriched genes are defined as genes showing at least five-fold higher average mRNA levels in a group of 2-7 tissues, including colon, compared to all other tissues.

To illustrate the relation of colon tissue to other tissue types, a network plot was generated that displays the number of enriched genes shared between different tissue types.

Figure 2. An interactive network plot of the colon enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of colon enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.

The network plot reveals that most group enriched genes (75 of 84) are shared with the rectum. A Gene Ontology (GO)-based analysis of these shared genes shows that they are associated with metabolic processes such as glycosylation.
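Counts such as the number of group enriched genes shared with the rectum come down to tallying, for every group enriched gene whose tissue group includes colon, which other tissues belong to the same group. The Python sketch below illustrates that tally on invented toy data; the function `partner_counts` is hypothetical, and it does not model the figure's restriction to combinations of up to three tissues.

```python
from collections import Counter


def partner_counts(groups: dict[str, set[str]], target: str = "colon") -> Counter:
    """For group enriched genes whose tissue group includes `target`,
    count how often each other tissue appears in the same group."""
    counts: Counter = Counter()
    for tissues in groups.values():
        if target in tissues:
            counts.update(tissues - {target})   # each partner counted once per gene
    return counts


# Hypothetical toy input: gene -> tissues in its enriched group.
groups = {
    "geneA": {"colon", "rectum"},
    "geneB": {"colon", "rectum", "small intestine"},
    "geneC": {"colon", "small intestine"},
    "geneD": {"stomach", "duodenum"},   # does not include colon, so ignored
}
counts = partner_counts(groups)
print(counts["rectum"], counts["small intestine"])  # -> 2 2
```

A gene enriched in colon, rectum, and a third tissue contributes to the count of both partners, so the per-tissue counts can add up to more than the number of colon group enriched genes.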

Colon histology

The colon is divided into four parts, the ascending, transverse, descending and sigmoid colon, and is on average 1.5 meters long. Its main function is reabsorption of fluid, electrolytes, and vitamins. Since the colon has no villi or plicae circulares, the mucosa is smooth. Simple tubular intestinal glands (crypts of Lieberkühn) extend through the entire thickness of the mucosa. The surface columnar epithelium and the cells lining the crypts are enterocytes, with an oval basal nucleus and an apical brush border, the light-microscopic representation of microvilli. There are also numerous mucus-secreting goblet cells, recognized by their large mucus globule. The lamina propria, with connective tissue and inflammatory cells, surrounds the crypts. A thin smooth muscle layer, the lamina muscularis mucosae, marks the border between the mucosa and the submucosa. The submucosa consists of loose connective tissue with vessels and nerves; some solitary lymph follicles are also seen. The muscular layer (muscularis externa) consists of an inner circular smooth muscle layer and an outer longitudinal muscle layer. Unlike in the rest of the gastrointestinal tract, the outer longitudinal layer is not continuous but is divided into three thickened muscular bands called teniae coli.

The histology of the human colon, including detailed images and information about the different cell types, can be viewed in the Protein Atlas Histology Dictionary.


Here, the protein-coding genes expressed in the colon are described and characterized, together with examples of immunohistochemically stained tissue sections that visualize the expression patterns of proteins corresponding to genes with elevated expression in the colon.

Transcript profiling and RNA-data analyses based on normal human tissues have been described previously (Fagerberg et al., 2014). Analyses of mRNA expression covering over 99% of all human protein-coding genes were performed using deep RNA sequencing of 172 individual samples corresponding to 37 different normal human tissue types. To determine genes with elevated expression in colon, RNA sequencing results from 13 fresh-frozen tissue samples representing normal colon were compared to 159 other tissue samples corresponding to 36 tissue types. A tissue-specific score, defined as the ratio of mRNA levels in colon to mRNA levels in all other tissues, was used to divide the genes into different categories of expression. These categories include: genes with elevated expression in colon, genes expressed in all tissues, genes with a mixed expression pattern, genes not expressed in colon, and genes not expressed in any tissue. Genes with elevated expression in colon were further sub-categorized as i) genes with enriched expression in colon, ii) genes with group enriched expression including colon, and iii) genes with enhanced expression in colon.

Human tissue samples used for protein and mRNA expression analyses were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden, as part of the sample collection governed by the Uppsala Biobank. All human tissue samples used in the present study were anonymized in accordance with approval and advisory report from the Uppsala Ethical Review Board.

Relevant links and publications

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Yu NY et al, 2015. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res.
PubMed: 26117540 DOI: 10.1093/nar/gkv608

Fagerberg L et al, 2014. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics.
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600

Gremel G et al, 2014. The human gastrointestinal tract-specific transcriptome and proteome as defined by RNA sequencing and antibody-based profiling. J Gastroenterol.
PubMed: 24789573 DOI: 10.1007/s00535-014-0958-7

Histology dictionary - the colon