EDUCATIONAL REVIEW
From the Departments of Surgery and Interdisciplinary Oncology, H. Lee Moffitt Cancer Center at the University of South Florida, Tampa, Florida.
Correspondence: Address correspondence and reprint requests to: Timothy J. Yeatman, MD, FACS, Departments of Surgery and Interdisciplinary Oncology, H. Lee Moffitt Cancer Center, 12902 Magnolia Dr., Tampa, FL 33612; Fax: 813-979-3893; E-mail: yeatman{at}moffitt.usf.edu
ABSTRACT
Abstract: Predicting who will develop cancer and how the cancer will behave and respond to therapy after diagnosis are some of the potential benefits of the ongoing genetic revolution that can be envisioned within the next decade. Translational applications of genomic-based research efforts may actually precede the development of effective therapeutic agents that can exploit the vast amounts of data derived from these efforts. In the future, understanding the wealth of information generated by high-throughput molecular efforts and how it can be applied to clinical problems will likely be critical to the surgeon who guides the multidisciplinary care of the cancer patient. This review will discuss the advances in our understanding of the human genome (DNA), its derived transcriptome (RNA), and its translated proteome (proteins) and will focus on the translation of this information into routine clinical practice. In particular, we will focus on the potential for clinical application of microarray-based gene-expression profiling to the diagnosis, prognosis, and therapy of malignancies.
Key Words: Genome, Transcriptome, Proteome, Microarray
We have entered a new world. There are new ideas, new terms and definitions, and many new genes. Whereas human genetics has a long history marked by events with historical significance, sequencing the genome is likely just the beginning of another renaissance. Multiple Nobel prizes have already been awarded in the discipline of genetics, including awards for the discovery of restriction enzymes, DNA-sequencing techniques, the polymerase chain reaction, and the structure of DNA itself. Only 16 years ago, there was significant debate about the feasibility of sequencing the human genome.1 Recent articles have now put the controversy to rest by using two different approaches to develop draft sequences of the human genome. The international Human Genome Project2 used the hierarchical shotgun approach, whereas Celera Genomics adopted the whole-genome shotgun3 approach. The end result of these two efforts is a first draft of the entire human genome sequence. Clinical researchers, practicing physicians, patients, and the general public now have access to the 2.9 billion nucleotide codes of the human genome that are available as a resource for scientific investigation (Web sites: http://www.ensembl.org and http://public.celera.com/cds/login.cfm). The human genome sequence provides a record of who we are and how we have evolved. It holds promise in the understanding of all inherited traits. It is the key to understanding human disease and its predispositions. It is the blueprint for our destiny.
Structure of the Genome
Surprisingly, it has been reported that only approximately 1% of the human genome is actually composed of exons that code for protein structure. Twenty-five percent of the genome represents intronic sequences or regions of DNA between exons that are spliced out; 75% of the genome is composed of intergenic DNA whose function, and whose role in the transcription of genes into RNA or their translation into protein, we do not yet understand. Despite the large phenotypic differences between individuals, the similarity in the structure of human DNA from one individual to another is striking. It has been estimated that we share at least 99.9% of our nucleotide code, suggesting that any two individuals differ by only 0.1%.3 Because the genome has evolved with time, it can be considered a historical record. This concept is reinforced by the large degree of homology between species, allowing us to trace our origins backward in time throughout the evolutionary process. In this regard, the genome carries significant information as to how our ancestors evolved and were affected by their environment through natural selective processes. For example, it is believed that proto-oncogenes, such as c-Src, were incorporated into our genome long ago via retroviral infections to assume roles in cellular growth and replication. Now we know that these proto-oncogenes may be activated through mutational events to perform the abnormal functions that permit cellular transformation (i.e., oncogenesis). Throughout the genome, there are numerous nonrandom repeat elements that have been identified. These include long interspersed repetitive elements and short interspersed repetitive elements or Alu sequences that have no known function to date. They do, however, comprise up to 10% of the human genome and colocalize with the gene-rich regions. The genome also has a great amount of apparent duplication for the presumed purpose of generating related gene superfamilies. This gene duplication, occurring at a rate 10- to 100-fold greater than in the fruit fly or worm, may represent a distinguishing structural event that separates humans from lower species.
Annotation of the Genome
With the recent announcement of the completion of the first draft of the human genome sequence, to be followed soon by the submission of a final draft, we are now faced with the challenging task of applying this vast wealth of information to the study of human disease. In this regard, sequencing the human genome is just one of a large number of tasks that will have to be completed before real progress is realized. For example, although the sequence is nearly complete, the genome is not yet completely annotated.4 Annotation involves the prediction of where a gene's structure begins and ends and, in essence, is the source of our predictions of the number of individual genes in our genome. Annotation provides a predicted map of the genes that will ultimately require careful validation. One approach to validation involves the prediction of the existence of a gene on the basis of evidence derived from multiple databases from multiple species. Identifying numerous orthologs for genes across several species may validate the existence of a particular human gene. It is interesting that relatively few human genes (roughly 30,000 to 50,000) have been predicted on the basis of sequence analyses of the human genome. This is a surprise when 14,000 genes in the fruit fly,5 19,000 genes in the roundworm,6 and 26,000 genes in the mustard plant7 have been identified. Homology analyses suggest a strong conservation of genetic information throughout evolution. Identification of homologs and orthologs will ultimately assist in the discovery of the function of individual genes. Comparative genomics offers the promise of understanding the function of genes through evaluation of homologous genes in lower species. There are now software tools available to the public that are capable of performing homology searches that permit the comparison of genes across numerous species. One such site is provided by The Institute for Genomic Research (http://www.tigr.org). This site is also quite useful for researchers performing gene-expression profiling experiments.
Integration of the Human Genome
Major events, such as landing on the moon, spawned significant technological breakthroughs that had effects decades after the initial event. These effects were widespread, sparing few, if any, scientific disciplines. The completion of the Human Genome Project will likely herald a similar explosion of events in this age of information. Although it is impossible to predict all of the fallout from the completion of the genome, there are emerging concepts that suggest great potential for translational applications of this large body of information. One example is the development of large databases to assess the variation that exists in our DNA code. We have long known that DNA polymorphism exists but are now just beginning to catalog what are likely to be millions of alterations in genetic sequence between any two individuals. These alterations are termed single nucleotide polymorphisms (SNPs) and represent the single base pair substitution of one nucleotide for another in a DNA strand. SNPs are thought to be distinct from disease-causing mutations because they cause no discernible phenotype. They can, however, affect gene function and may predispose to disease even without altering protein-coding structure. In fact, <1% of SNPs actually occur within exons. These alterations can occur on average in 1 out of 1250 nucleotides and may affect the regions that control the expression of genes, which in turn can affect their function. The promise of SNPs in clinical medicine relates to their potential capacity to predict our predisposition to disease and our response to therapy. A new field of study, termed pharmacogenomics, has been developed to determine which patients will experience toxicity when exposed to a drug versus those who are seemingly unaffected by therapy. 
SNPs may play a key role in making these critical clinical predictions.8-10 Another example of how knowledge of the genome sequence may become useful relates to the promoter regions of genes that control their transcription. There is an emerging body of data suggesting that the CpG islands associated with promoter regions are susceptible to hypermethylation, which seems to be associated with the suppression of gene transcription.11,12 These alterations in the DNA are considered epigenetic alterations because the actual genetic code is not altered. Recent advances have supported the use of high-throughput technology to assess the presence of hypermethylated CpG islands across thousands of genes, and these data may provide considerable insight as to the future biological behavior of a tumor.11-13
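The SNP figures quoted above can be combined into a rough order-of-magnitude estimate. The short sketch below is only back-of-the-envelope arithmetic: it assumes that the quoted values (2.9 billion nucleotides, 1 difference per 1250 nucleotides, fewer than 1% of SNPs in exons) can be treated as exact, and the function and constant names are illustrative rather than taken from any published tool.

```python
# Figures quoted in the text; treating them as exact values is an assumption.
GENOME_LENGTH_NT = 2_900_000_000   # "2.9 billion nucleotide codes"
NT_PER_SNP = 1250                  # "1 out of 1250 nucleotides"
EXONIC_FRACTION_MAX = 0.01         # "<1% of SNPs actually occur within exons"


def expected_snp_count(genome_length_nt, nt_per_snp):
    """Expected number of single-nucleotide differences between two genomes."""
    return genome_length_nt / nt_per_snp


total_snps = expected_snp_count(GENOME_LENGTH_NT, NT_PER_SNP)
exonic_snps_max = total_snps * EXONIC_FRACTION_MAX  # upper bound only

print(f"Expected differences between two individuals: about {total_snps:,.0f}")
print(f"Of these, within exons: fewer than about {exonic_snps_max:,.0f}")
```

Under these assumptions the estimate is on the order of a few million differences between any two individuals, consistent with the "millions of alterations" mentioned in the text, with only a small minority expected to fall within protein-coding sequence.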
THE HUMAN TRANSCRIPTOME
The Challenge of Interrogating Gene Expression
Although the prediction of a gene on the basis of annotation criteria may strongly suggest the existence of a particular gene, this does not guarantee the expression of a functional messenger RNA transcript. Characterizing the population of transcribed genes has led to the creation of a new term, the transcriptome.14 This concept attempts to define the large number of transcripts that can result from both unspliced and spliced gene products (resulting from skipped exons). Interestingly, whereas recent estimates suggest upwards of 50,000 genes in the human genome, many more transcripts (and, potentially, protein products of those transcripts) may exist. Thus, the complexity of the human relative to other species may be derived in part from differences in RNA splicing. The transcriptome, therefore, represents the universe of RNA messages that may code for proteins when properly instructed to do so. Because the technology has recently been developed to interrogate the diversity of the transcriptome, gene-expression profiling has become a mainstay of modern molecular biologic research.
Gene-expression analysis was tedious and possible only on a gene-by-gene basis until recently, when large-scale gene-profiling technologies were developed. Before these technological advances, differential gene-expression analysis was relegated to Northern blots, reverse-transcriptase polymerase chain reaction, differential display, and subtractive hybridization. Currently, expression profiling is accomplished by two or more different platforms that include primarily complementary DNA (cDNA) spotted arrays and oligonucleotide-based arrays (Fig. 1, A and B).15 These arrays can be home-grown or may be commercially available. Oligonucleotide arrays generally use small DNA sequences from 20 to 70 mers in length to recognize and distinguish individual genes. Spotted cDNA arrays are constructed by spotting down thousands of longer portions of DNA, each representing an individual gene. The oligonucleotide arrays have the potential benefit of identifying splice products, whereas the cDNA arrays generally cannot. Both platforms have shown promise in producing large quantities of reproducible data that can subsequently be mined by a number of techniques. Typically, RNA is derived from a tumor and a control specimen or panel of cell lines and is converted to cDNA that is fluorescently labeled and hybridized to the target chip (Fig. 2). On a typical high-density array, there may be 12,000 to 60,000 genes represented, most of which are unnamed genes with no known functions. Expression profiling is a very data-rich process that produces large lists of named genes that may have direct or indirect causal links to the perturbation examined; however, many of the alterations may simply be casual rather than causal associations. Furthermore, gene-expression profiling can produce large lists of unnamed genes often designated by gene clone numbers or expressed sequence tag numbers that have no functional meaning or significance to the casual observer.
To interrogate and translate these complex data requires a new set of analysis tools with which many clinicians are not familiar.15 The generation of large sums of data by gene-expression profiling requires extensive computational analysis. More importantly, it requires the development of new mathematical algorithms and software to handle the data. New databases are required to hold the data adjacent to invaluable clinical information that will need to be queried and updated on a frequent, almost real-time, basis. These requirements have led to the development of at least three new scientific disciplines: computational biology, bioinformatics, and biostatistics.
There has been somewhat of a paradigm shift in the evaluation of basic science proposals. What would have been considered a massive "fishing expedition" 5 years ago is now considered scientifically valuable and has been termed discovery science. Discovery science has gained a respectable reputation, not only for the vast amounts of potentially invaluable data that it produces, but also for the hypotheses that it generates.
A Call for Interdisciplinary Science: Systems Biology
The surgeon's role in the development of functional genomics efforts is paramount. The surgeon provides a link between the patient and the science through his or her capacity to extirpate information-rich normal and neoplastic tissues. The practicing surgeon, with a daily exposure to the battlefields of cancer, can play a significant role in focusing science on the clinically relevant problems. In this regard, the skilled clinician can provide a wealth of intuition to science that may lead it to a successful outcome. The translation of clinical acumen to science, however, cannot occur unless the clinician is well versed in the doctrines of science. This is true for a number of reasons. For example, the surgeon without a minimal understanding of genetic principles cannot effectively communicate with the basic scientist counterpart.
The role of the physician-scientist thus becomes more important if this paradigm is to be successful. Many clinical departments boast of large sums of National Institutes of Health funding, but it is not uncommon that the principal investigators within these departments are not actually the physicians themselves but rather the employees of the department. Although this may be a successful model in numerous instances, it does once again remove the surgeon another step away from the scientific discovery process.
By no means can the surgeon or clinician accomplish translational research alone or in a vacuum. Much as cancer centers specializing in cancer care have determined that multidisciplinary groups are necessary to deliver the best care to the patient, we are now becoming aware of the same requirements for the effective advancement of translational science. Just as it is ideal to have the surgeon, radiotherapist, pathologist, radiologist, and medical oncologist in the same room to discuss the real-time management of a patient's disease, it is also ideal to bring together the necessary disciplines in science to meet the task at hand. This thought process has birthed the concept of systems biology, a new multidisciplinary approach to scientific discovery. The challenges of translating the benefits of sequencing the human genome to clinical medicine will require large concerted interdisciplinary efforts to be successful. In this regard, there is a need for multiple scientific disciplines to join together to solve problems, and there is a need for computer scientists to develop and write software to house and query the large datasets that are rapidly accruing. Mathematicians are needed to develop statistically based algorithms to analyze the data. Molecular biologists are indispensable to provide insight as to the significance of gene clusters and identifications. Pathologists are important for their capacity to identify suitable tissues and to perform microdissection to ensure that what is being examined in microarray analysis is what is desired. Chemists are necessary to assist in the drug-discovery process, which is yet another discovery science in itself. Molecular targets are now defined, and chemical agents addressing these novel targets are defined. 
The surgeon's role goes beyond that of harvesting tissue (although this is a critical role): the surgeon can also provide important insight into the clinically relevant problems that need to be addressed by science through proper experimental design. A case in point is the development of institutes dedicated to the new field of systems biology.16,17 This interdisciplinary model goes beyond the term functional genomics in that it has directly implicated multiple scientific disciplines, working side by side, to address the presumed complexity of intermolecular relationships and networks in the cell. This is the future of scientific endeavor.
The Effect of Creative Experimental Design
The challenge of gene-expression profiling is translating the acquired data into something that may ultimately have a clinical effect. What seems to be a simple idea of comparing cell line A with cell line B, each with a definable genotype and phenotype, may turn into a large, complex list of expressed genes with no clear associations. Perhaps more confusing is the potential for clustering programs to permit the development of apparent molecular relationships that may not exist. For example, if two sets of microarray data are clustered by a hierarchical clustering algorithm, there will always be a positive and enticing result. In other words, there will always be gene sets that cluster on the basis of their direction and degree of over- or underexpression. However, just because gene A clusters with gene B does not imply that they are functionally related. How do we make sense of all of this information? The answer probably lies in how we design our experiments. Thoughtful experimental design can help weed out much confounding genetic noise that might inhibit progress toward understanding the function of genes. Gene-expression profiling can be used for many different types of applications that may be simple or complex in design. For example, whereas gene-expression profiling can be used to compare cell line A with cell line B, a more complex and perhaps informative design would be a comparison of cell line A with or without a drug versus cell line B with or without a drug.
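The warning that a clustering algorithm will always return an enticing result can be illustrated with a small simulation. The sketch below is not the analysis used in any of the cited studies: it assumes an average-linkage agglomerative scheme with a distance of 1 minus the Pearson correlation, implemented with the Python standard library only, and it feeds the algorithm purely random "expression" values. The function names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between two profiles is 1 - Pearson r; distance between two
    clusters is the mean of all pairwise profile distances (average linkage).
    """
    indices = range(len(profiles))
    dist = {(i, j): 1 - pearson(profiles[i], profiles[j])
            for i, j in combinations(indices, 2)}

    def linkage(a, b):
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in indices]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Twenty simulated "genes" measured in six simulated "samples": pure noise.
random.seed(0)
noise_genes = [[random.gauss(0, 1) for _ in range(6)] for _ in range(20)]

clusters = average_linkage(noise_genes, n_clusters=4)
for number, members in enumerate(clusters, start=1):
    print(f"cluster {number}: genes {sorted(members)}")
```

Every simulated gene ends up assigned to one of four clusters even though the data contain no biological signal at all, which is why cluster membership alone should not be read as evidence of a functional relationship.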
We recently explored the potential for gene-expression profiling to identify new tumor markers and new tumor-progression markers for human colorectal cancer. Although a significant number of markers have been identified, few have been developed to the point that they are widely used in clinical practice. In fact, CEA is perhaps the only marker in widespread use. Because we sought to examine the expressed tumor-specific genes in a population of patients, we were interested in identifying the expressed genes common to most people in a selected group. For this reason, we rationalized that the gene expression of pooled patient samples might be similar to that derived from examination of large numbers of individual samples, yet would be significantly more efficient. RNA derived from tumor samples in groups of 5 to 10 was then pooled in equimolar amounts to produce a mixture that could be assessed by microarray. Before large numbers of human tumor tissues were examined, the pooling concept was first validated by experiments designed to compare a physical pool with the calculated mathematical pool by means of individual sample analysis. These experiments strongly suggested that pooling was a valid technique when information regarding the behavior of a population was sought. The pooling process was then applied to six sets of normal and tumor tissues derived from different clinical stages (benign mucosa, n = 10; benign adenoma, n = 10; liver metastases, n = 10; and carcinomas: Astler-Collier B1, n = 10; C2, n = 10; and D, n = 10). Interestingly, these experiments led to the identification of >300 tumor markers (distinguishing cancer from normal) and >100 tumor-progression markers (distinguishing one stage of cancer from another). Of the tumor-progression markers, the lead marker was identified to be osteopontin, a secreted glycoprotein with numerous functions that have been related to the progression and metastasis of cancer.
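The comparison of a physical pool with a calculated mathematical pool can be sketched as follows. This is a minimal illustration, not the validation procedure actually used: it assumes that the mathematical pool is the per-gene arithmetic mean of the individual samples, that agreement is summarized by a Pearson correlation, and all expression values shown are invented for the example.

```python
def mathematical_pool(individual_samples):
    """Per-gene arithmetic mean across individually profiled samples."""
    return [sum(values) / len(values) for values in zip(*individual_samples)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical intensities for four genes in three individually profiled tumors.
individual_samples = [
    [10, 200, 50, 5],
    [14, 180, 70, 7],
    [12, 220, 60, 3],
]
# Hypothetical intensities from one array hybridized with the pooled RNA.
physical_pool = [11.5, 205, 58, 5.5]

calculated_pool = mathematical_pool(individual_samples)
agreement = pearson(calculated_pool, physical_pool)

print("calculated pool:", calculated_pool)
print(f"correlation with physical pool: {agreement:.4f}")
```

A correlation close to 1 between the two profiles would support using the pooled array as a stand-in for the population average; it says nothing about variation between individual patients, which pooling deliberately discards.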
The Pervasive Effects of Unraveling the Transcriptome
Twenty years ago, medical students were faced with what once seemed to be a daunting task: memorizing the elements of the Krebs cycle or other metabolic pathways. The students of the future, however, will soon be faced with the challenge of not only learning the classic biochemical pathways, but also learning many new intermolecular relationships that will emerge as novel pathways over time. For example, over the past decade we have witnessed the development of numerous signal transduction pathways. By using colon cancer as the paradigm, once the initial genetic elements were defined, a new neoplastic pathway was elucidated. The Wnt pathway, involving Apc, ß-catenin, and other related molecules, is now an established molecular pathway that contributes to colon cancer development and progression.18 In this regard, a thorough understanding of molecular biology and all of its basic tenets is a must for the contemporary medical student. Moreover, the teachers of these medical students must be facile with this body of information to communicate it properly. The immediate translational benefits of the Human Genome Project are quite expansive and include the capacity to identify and understand the effects of specific mutations. One example is the capacity to screen for and act on familial forms of cancer linked to specific inherited genetic alterations, such as those linked to familial polyposis, hereditary nonpolyposis colorectal carcinoma, and hereditary breast cancer. The fallout from this vast wealth of data will not be realized for many years to come.
THE HUMAN PROTEOME
Now that the Human Genome Project is nearing completion, focus may be realigned to the development of human protein indices that will ultimately identify the structure and function of all human proteins, which may ultimately be more informative than understanding the sometimes evanescent messages associated with the transcriptome. The human proteome is the universe of human proteins and their isoforms that are constructed with the aid of messenger RNA and its splice products.19 Although this proteomics project was initiated nearly 40 years ago, it has not been pursued with the vigor of the Human Genome Project. With the development of protein indices comes the promise for the in vitro synthesis of these proteins for use in functional studies as well as for the production of antibodies that may have diagnostic, prognostic, and therapeutic applications. The challenge of identifying all human proteins is, however, enormous. Moreover, proteins may be posttranslationally modified by complex processes such as glycosylation and phosphorylation, which may be difficult to experimentally replicate. Human proteins, unlike DNA, are composed of up to 20 amino acids, rather than 4 nucleotide bases, and proteins have a final processed 3-dimensional form that is significantly more complex and cannot be predicted from the blueprints of genes.
Simply identifying all of the human proteins in the proteome is a very significant challenge, but certainly deciphering protein structure and function for these proteins will consume the time of scientists for years to come. The current technology for modern proteomics has its roots in the development of isoelectric focusing in the first dimension and sodium dodecyl sulfate electrophoresis in the second dimension, techniques first reported in 1975, almost simultaneously, by Klose,20 O'Farrell,21 and Scheele.22 The introduction of advances in mass spectrometry to the proteomics field has permitted the use of two-dimensional gel electrophoresis on a much larger scale, making this technology a viable tool for protein discovery and analysis.19
One very interesting statistic that has emerged from large-scale analyses of data derived from gene-expression profiling and synchronous two-dimensional gel analyses is the concept that there is very little correlation (r = 0.48) between the presence or absence of an expressed message and that of the cognate protein.23 These sorts of studies suggest that although development of the proteome has taken a back seat to that of the genome and transcriptome, the analysis of protein expression may ultimately be a gold standard for the interpretation of gene expression. In recent years, the practice of proteomics research has experienced a dramatic shift within the pharmaceutical and biotechnology industries, with the widespread implementation of novel applications. The areas of interest extend all the way from discovery of novel drug, vaccine, and diagnostic targets, characterization of protein-based products, toxicology, and identification of surrogate markers of activity in clinical research to the ability to provide information on the mechanisms of drug action. The power of two-dimensional gel electrophoresis and advances in mass spectrometric techniques, combined with sequence database correlation, have enabled speed and accuracy in the identification of proteins in complex mixtures.24 The science of protein discovery is likely to be the next growth area in human biology.
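To put the reported message-to-protein correlation in perspective, it can help to square it. The short calculation below assumes that the reported r is a Pearson correlation coefficient from a simple linear comparison, which the text does not state explicitly.

```latex
r \approx 0.48 \quad\Longrightarrow\quad r^{2} \approx 0.48^{2} \approx 0.23
```

Under that assumption, only about 23% of the variation in protein abundance would be accounted for by the corresponding message level, leaving roughly three quarters unexplained.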
Chip-Based Cancer Management: One Tumor, One Chip
The future of functional genomics is bright and holds great promise for the discovery of new genes, new messages, and new proteins. The term discovery science has been coined to describe the great potential for gene-expression profiling technologies to generate new hypotheses based on enormous datasets not previously available. With the capacity to evaluate thousands of genes and/or proteins in a single experiment by using microarray technology, the potential for clinical translation of these data to human cancer is also enormous. For example, we25 and others26-29 have also begun to develop sophisticated molecular classifiers for a broad range of human cancers that may soon have clinical application. Currently, chip-based technologies are being used to derive gene expression patterns that predict the accurate tissue origin of a particular tumor on the basis of the simultaneous analysis of informative genes. For the first time, it is becoming possible to make the diagnosis of a particular cancer, as well as cancer subsets, without even examining the histology. This application of molecular profiling not only may eliminate the diagnostic category of the unknown primary cancer, but may also improve the diagnostic accuracy of current approaches by using classic histopathologic techniques combined with gene-by-gene immunohistochemical analyses. Moreover, it is now feasible to predict clinical outcome on the basis of gene-expression signatures,30-33 making it feasible to direct therapy to patients who will actually derive benefit.
It is easy to envision that the future of clinical cancer management will be based on the development of functional genomics and related disciplines, with specific emphasis on chip-based analyses of tumors. We are entering a chip-based era in medicine that will positively affect multiple scientific and medical disciplines. Information-rich gene-expression datasets will be applied clinically to predict accurate diagnosis, prognosis, and possibly even therapeutic options. These predictions may result in a significant therapeutic paradigm shift by assisting in the selective administration of adjuvant chemotherapy and radiotherapy to the patient subsets who are actually at risk rather than treating the majority of patients to help a few. It may even be possible to predict which patients will actually benefit from extirpative surgical procedures, such as the Whipple procedure for pancreas cancer (decision based on the survival benefit) or mastectomy for the patient with an axillary metastasis but no primary lesion detected in the breast (decision based on accurate diagnosis of occult breast cancer). Finally, gene-expression profiles may be used to predict the clinical response to both conventional and targeted therapeutics. It is equally likely that these technologies will provide us with more information than we can currently exploit to the patient's advantage if selective and effective chemotherapeutic agents are not yet developed at a similar pace. The future of cancer management will likely change for the better with the incorporation of microarray gene chips but will still provide us with significant challenges that must be addressed by physicians and scientists in an interdisciplinary fashion. The surgeon's role in this process will be instrumental.
APPENDIX: GLOSSARY OF TERMS
Acknowledgments
The appendix and acknowledgments are available online at www.annalssurgicaloncology.org.
The author thanks John Quackenbush, PhD, at The Institute for Genomic Research, Rockville, MD, for his helpful discussions in the preparation of this manuscript. Supported by The Director's Challenge Grant UO-1 CA85052-01A1 and CA85429-01.
Footnotes
Molecular biology will soon meet clinical medicine head on. This review describes the potential for gene-expression profiling to assess diagnosis and prognosis and even predict therapy on the basis of the analysis of every tumor by a single microarray chip.
Received for publication May 29, 2002. Accepted for publication September 30, 2002.
REFERENCES