Full text of "The human genome project : how private sector developments affect the government program : hearing before the Subcommittee on Energy and Environment of the Committee on Science, U.S. House of Representatives, One Hundred Fifth Congress, second session, June 17, 1998"




The Human Genome Project: How Private Sector
Developments Affect the Government Program








JUNE 17, 1998 

[No. 66] 

Printed for the use of the Committee on Science 










51-217CC WASHINGTON : 1998 

For sale by the U.S. Government Printing Office 

Superintendent of Documents, Congressional Sales Office, Washington, DC 20402 

ISBN 0-16-057661-X 


F. JAMES SENSENBRENNER, Jr., Wisconsin, Chairman

CURT WELDON, Pennsylvania 
KEN CALVERT, California 
THOMAS M. DAVIS, Virginia 
MARK FOLEY, Florida 
THOMAS W. EWING, Illinois 

PHIL ENGLISH, Pennsylvania 

TOM A. COBURN, Oklahoma 

Todd R. Schultz, Chief of Staff 

Barry C. Beringer, Chief Counsel 

Patricia S. Schwartz, Chief Clerk /Administrator 

Vivian A. Tessieri, Legislative Clerk 

Robert E. Palmer, Democratic Staff Director 

GEORGE E. BROWN, Jr., California RMM* 


BART GORDON, Tennessee 


TIM ROEMER, Indiana 

ROBERT E. "BUD" CRAMER, Jr., Alabama 

JAMES A. BARCIA, Michigan 

PAUL McHALE, Pennsylvania



LYNN N. RIVERS, Michigan 

ZOE LOFGREN, California 


MICHAEL F. DOYLE, Pennsylvania 


BILL LUTHER, Minnesota 

WALTER H. CAPPS, California 


BOB ETHERIDGE, North Carolina 



Subcommittee on Energy and Environment 


CURT WELDON, Pennsylvania 
PHIL ENGLISH, Pennsylvania 
TOM A. COBURN, Oklahoma 

KEN CALVERT, California, Chairman
TIM ROEMER, Indiana 
PAUL McHALE, Pennsylvania 
MICHAEL F. DOYLE, Pennsylvania 
ZOE LOFGREN, California 

*Ranking Minority Member 
**Vice Chairman 




June 17, 1998 — The Human Genome Project: How Private Sector De- 
velopments Affect the Government Program 

Opening Statement by Representative Ken Calvert (CA-43), Chairman, Sub- 
committee on Energy and Environment, Committee on Science, U.S. House 
of Representatives 1 

Opening Statement by Representative Tim Roemer (IN-3), Ranking Minority 
Member, Subcommittee on Energy and Environment, Committee on 
Science, U.S. House of Representatives 2 


Dr. Aristides A. Patrinos, Associate Director of Energy Research for 
Biological and Environmental Research, U.S. Department of En- 
ergy, Washington, DC: 

Oral Testimony 5 

Prepared Testimony 8 

Biography 14 

Dr. Francis S. Collins, Director, National Human Genome Research 
Institute, National Institutes of Health, U.S. Department of Health 
and Human Services, Bethesda, MD: 

Oral Testimony 15 

Prepared Testimony 18 

Biography 25 

Dr. J. Craig Venter, President and Director, The Institute for 
Genomic Research, Rockville, MD:

Oral Testimony 26 

Prepared Testimony 28 

Biography 36 

Financial Disclosure 37 

Dr. David J. Galas, President and Chief Scientific Officer, 
Chiroscience R&D Inc., Bothell, WA: 

Oral Testimony 42 

Prepared Testimony 46 

Biography 53 

Financial Disclosure 54 

Dr. Maynard V. Olson, Professor of Medical Genetics and Genetics, 
Department of Molecular Biotechnology, and Director, Genome 
Center, University of Washington, Seattle, WA: 

Oral Testimony 55 

Prepared Testimony 58 

Biography 64 

Financial Disclosure 71 


Reasons for Federal Government To Complete Human Genome Sequenc- 
ing 72 

Refocusing of Federal Human Genome Project 73 

Federal Program's Use of Latest Technologies 74 

Federal Budget for the Human Genome Project 74 

Dr. Olson's Criticisms of Private-Sector Venture 75 

Ethical, Legal and Social Concerns 77 

Patentability of Human Genome 77 

Difference Between Federal Human Genome Project and Private-Sector 

Venture 78 



Recapturing Private Investment 79 

Tension Between Free Market and Information Dissemination 80 

Concerns About Public Access to Information 81 

Consequences of Intellectual Property/Patient/Privacy Rights 83 

Consequences of Private-Sector Venture for Federal Human Genome 

Project 84 

Efficiency of Federal Human Genome Project 84 

Appendix 1: Answers to Post-Hearing Questions Submitted by Members 
of the Subcommittee on Energy and Environment 

Dr. Aristides A. Patrinos, Associate Director of Energy Research for 
Biological and Environmental Research, U.S. Department of Energy:

Republican Member Questions: 

Scientific Justification for Completing Government-Funded Sequencing 

of Entire Human Genome 87 

Efficiencies of DOE's Joint Genome Initiative vs. Three Different DOE 
Laboratory Programs 88 

Democratic Member Questions: 

Difference Between the DOE-NIH and "Shotgun" Human DNA Se- 
quencing Approaches 89 

Role of DOE and NIH in Collaboration with Private-Sector Venture 89 

Concerns of International Collaborators About Intellectual Property 
Rights and Patenting 90 

Dr. Francis S. Collins, Director, National Human Genome Research 
Institute, National Institutes of Health, U.S. Department of Health 
and Human Services: 

Republican Member Question: 

Scientific Justification for Completing Government-Funded Sequencing 

of Entire Human Genome 92 

Democratic Member Questions: 

Difference Between the DOE-NIH and "Shotgun" Human DNA Se- 
quencing Approaches 93 

Role of DOE and NIH in Collaboration with Private-Sector Venture 94 

Concerns of International Collaborators About Intellectual Property 
Rights and Patenting 94 

Federal Government's Cost to Completely Sequence the Human Ge- 
nome 96 

Dr. J. Craig Venter, President and Director, The Institute for 
Genomic Research: 

Republican Member Questions: 

Will the Private Initiative Duplicate the Federal Human Genome 

Project? 97 

Concern About Release of Data to the Public 98 

Recommendations for Restructuring the Federal Human Genome 
Project 98 

Democratic Member Questions: 

Availability of Genomic Information to the Scientific Community 99 

Timeliness of Release of and Compensation for Human DNA Sequence 

Data 99 

Plans to Patent Genomic Sequences 100 

Uniqueness of Expressed Sequence Tags 100 

Role of DOE and NIH in Collaboration with Private-Sector Venture 101 

Restrictions on Researchers' Ability to Obtain Human DNA Sequence 
Information 101 



Relation of New Venture to the Federally-Funded Human Genome 
Sequencing Effort 102 

Dr. David J. Galas, President and Chief Scientific Officer, 
Chiroscience R&D Inc: 

Republican Member Questions: 

Practical Value of Federal Completion of Entire Human Genome Se- 
quencing Process 103 

Democratic Member Questions: 

Impact on Current Efforts 104 

Importance of Genomic Data That May Be Withheld 104 

Reasonable Fees and Conditions to Private-Controlled Genetic Informa- 
tion 104 

Rights of Individuals' Privacy and Compensation Issues 105 

Dr. Maynard V. Olson, Professor of Medical Genetics and Genetics,
Department of Molecular Biotechnology, and Director, Genome 
Center, University of Washington: 

Democratic Member Questions: 

Concerns About Ability to Access Genomic Information 106 

Impact on Current Efforts 107 

Importance of Genomic Data That May Be Withheld 107 

Reasonable Fees and Conditions to Private-Controlled Genetic Informa- 
tion 107 

Rights of Individuals' Privacy and Compensation Issues 108 

Appendix 2: Additional Materials for the Record 

J. Craig Venter, et al., "Shotgun Sequencing of the Human Genome," Science 

280, 1540 (June 5, 1998) 110 

Nicholas Wade, "Scientist's Plan: Map All DNA Within 3 Years," The New 

York Times, May 10, 1998, p. A1 113

Bill Richards, "Perkin-Elmer Jumps Into Race to Decode Genes," The Wall 
Street Journal, May 11, 1998, p. B6 115 

Nicholas Wade, "Beyond Sequencing of Human DNA," The New York Times, 

May 12, 1998, p. C3 116 

Justin Gillis and Rick Weiss, "Private Firm Aims to Beat Government in 

Gene Plan," The Washington Post, May 12, 1998, p. A1 118

Clive Cookson, "Genetic mapping triggers contest: Academics race private 
enterprise," The New York Times, May 12, 1998, p. C16 120 

Nicholas Wade, "International Gene Project Gets Lift: Wellcome Trust Doubles
Commitment to Public-Sector Effort," The New York Times, May 12,
1998, p. A20 121

William A. Haseltine, "Gene-Mapping, Without Tax Money," The New York 

Times, May 21, 1998, p. A37 123 

John Carey, "The Duo Jolting the Gene Business: Craig Venter and Perkin- 
Elmer target the human genome," Business Week, May 25, 1998, pp. 70- 
71 124 

Steven E. Koonin, "An Independent Perspective on the Human Genome 
Project," Science 279, 36 (January 2, 1998) 126 

Human Genome Program Report, Part 1, Overview and Progress, Prepared 
by the Human Genome Management Information System, Oak Ridge Na- 
tional Laboratory for the U.S. Department of Energy, Office of Energy 
Research, Office of Biological and Environmental Research, DOE/ER-0713 
(Part 1), November 1997 128 

Human Genome Program Report, Part 2, 1996 Research Abstracts, Prepared 
by the Human Genome Management Information System, Oak Ridge Na- 
tional Laboratory for the U.S. Department of Energy, Office of Energy 
Research, Office of Biological and Environmental Research, DOE/ER-0713 
(Part 2), November 1997 240 



William A. Haseltine, "Discovering Genes for New Medicines," Scientific 
American 276, No. 3, March 1997, pp. 2-7 338 

To Know Ourselves: The U.S. Department of Energy and the Human Genome 
Project, Prepared by the Lawrence Berkeley National Laboratory for the 
U.S. Department of Energy, Office of Energy Research, Office of Health 
and Environmental Research, July 1996 345 

Francis Collins and David Galas, "A New Five-Year Plan for the U.S. Human 
Genome Program," Science 262, 43 (1993) 380 

DOE Human Genome Program Primer on Molecular Genetics, Prepared by 
the Human Genome Management Information System, Oak Ridge National 
Laboratory for the U.S. Department of Energy, Office of Energy Research, 
Office of Health and Environmental Research, June 1992 390 




House of Representatives, 

Committee on Science, 
Subcommittee on Energy and Environment, 

Washington, DC. 

The Subcommittee met, pursuant to notice, at 1:05 p.m., in room 
2318, Rayburn House Office Building, Hon. Ken Calvert, Chairman 
of the Subcommittee, presiding. 

Chairman Calvert. This hearing of the Energy and Environ- 
ment Subcommittee will come to order. 

Today we will review a program whose success will have pro- 
found importance for medical science for the 21st Century. Some of 
our witnesses today have used some strong language in describing 
the value of the human genome project, but it's hard to exaggerate 
the importance of a program that could lead to prevention, and 
even cures, to some of the most serious diseases that afflict us. The 
sequencing of the human genome began in the mid-1980's as an ef- 
fort by the Department of Energy (DOE) to study the effects of ra- 
diation on the survivors of Hiroshima and Nagasaki. However, it 
became an international program with much broader implications 
and our federal program is jointly run by DOE and the National 
Institutes of Health. As the 15-year, $3 billion federal program 
reached its halfway point this year, the scientific world was 
stunned on May 9th when one of the country's foremost genetic scientists,
Dr. Craig Venter, and the Perkin-Elmer Corporation announced
they would form a new venture to, as they put it, "substantially
complete the sequencing of the human genome" in 3
years at one-tenth the cost of the federal program. 

Just how this should affect the government program is the focus 
of this hearing today. Press reports and some back and forth be- 
tween critics and supporters of the federal program have raised as 
many questions as it has produced answers. For example, are the 
goals of the initiative realistic or just an optimistic vision? Will this 
private sector initiative duplicate the federal program and make it 
redundant or is it another approach that can complement the federal
program and make it stronger? Is the pace and the cost of the
federal program increased by the bureaucratic nature of any federal
program or does the timetable and cost reflect what is necessary
to do a thorough job? And will the federal program utilize
the latest technology described in the private sector announcement?

Our witnesses today, a cross-section of distinguished scientists 
from the government and from the private sectors, should be able 
to supply, I hope, some of the answers to those questions. 

One of the witnesses today warns that Congress is the wrong 
forum in which to debate the relative merits of different scientific 
approaches to sequencing the human genome. Let me say I couldn't 
agree more. We're not, as my friend George Brown might say, set
up to be a science court. 

However, we are given the responsibility of overseeing a federal 
program that has spent about $1.9 billion to date. The purpose of 
this hearing is to get the best advice possible on how to — how addi- 
tional moneys should be spent. 

I would also like to take a moment to thank our witnesses for 
being here today. Some of you traveled long distances at your own 
expense; others had to rearrange their personal schedules to fit 
ours, and we certainly appreciate it. 

Before I introduce our panel, let me turn to my good friend from 
Indiana, the distinguished Ranking Minority Member, Mr. Roemer, 
for his opening remarks. 

Mr. Roemer. I thank our distinguished Chairman and want to 
applaud him and salute him for this timely hearing on such a com- 
plicated, yet fascinating, subject. I would ask unanimous consent 
that my entire statement be entered into the record, Mr. Chairman.

Chairman Calvert. Without objection, so ordered. 

Mr. Roemer. And I will just talk for a few seconds and then yield 
back the balance of my time to this expert panel. Certainly we 
have heard the mantra in this Congress of faster, cheaper, better. 
We have heard promises at times from the public sector, and prom- 
ises at times from the private sector, that appeared too good to be 
true. Here we have the possibility, a golden possibility, of a private- 
public partnership that could result in phenomenal return for 
science and in phenomenal return for the taxpayer. We want to see 
if these promises, and if this potential, is in fact true and if, in fact, 
we can do this partnership between the public and private sector 
that some have talked about. We want to look at the question of 
privacy and patent issues. We want to look at many other serious 
questions when it results in cutting the costs as has been talked 
about in the press by such a significant degree, yet yielding the 
science that we have been talking about for the last decade. So I'm 
anxious to hear from our expert witnesses. I'm very, very inter- 
ested in this topic and we look forward to our expert panel giving 
us the insight and the advice to fulfill the mantra of faster, cheap- 
er, better, not just with political rhetoric but with real promise for 
a private sector, public sector partnership. And with that, I yield 
back the balance of my time. 

[The prepared statement of Mr. Roemer follows:] 








JUNE 17, 1998 

I would like to thank the Subcommittee Chairman for his foresight and timely action in 
calling this hearing. This development is a complicated one, not just in terms of what it 
will mean for our federal programs, although that is the most prominent question, but in 
terms of what it will mean for our citizens and our international relationships. 

In these times of balanced budgets, tobacco settlements, and huge international projects, 
the 105th Congress has readily embraced the "faster, better, cheaper" mantra. Often, but
not always, for very good reasons. This pattern seems to be holding as we address the
decision made by Craig Venter and the Perkin-Elmer Corporation to form a new company
that claims it will complete the sequence of the entire genome in 3 years at about 1/10 the
cost of the Federal Human Genome Project.

This development has raised the question of whether or not we in Congress should scale 
back our federal programs based simply on the promise of respected and experienced 
scientists and an equally respected and established private corporation. The purpose of this
hearing is to determine if that line of thinking is premature. 

At this point, I am more concerned with the inevitable changes that will occur as the
mission shifts from public interest to private profit. While I do not discount the sentiment
and motive behind the search for this life-saving knowledge, I think that it is only right to
address the possible pitfalls of private-sector control of this genetic information.
Commercialization can promote the availability of new treatments, but it can also stifle
discovery and innovation. Also, issues of privacy have never been fully addressed. The
complexity of these issues should not be underestimated and an appropriate balance must 
be struck. 

So I thank you again Mr. Calvert and I welcome our witnesses. I hope that they will be 
able to shed some light on how the involved parties might form a symbiotic relationship 
between the Federal Human Genome Project and the proposed private-sector project,
and how they plan to ensure that the rights of the American people are not violated or 
their needs exploited. 

Chairman Calvert. I thank the gentleman. 

Our first witness is Dr. Ari Patrinos, Associate Director of Energy
Research for the Department of Energy, who oversees the
human genome project for DOE. Dr. Francis Collins is Director of
the National Human Genome Research Institute for the National
Institutes of Health; Dr. Craig Venter is President of the Institute
for Genomic Research in Rockville, Maryland, and is one of the
partners in the private sector initiative announced on May 9th; Dr.
David Galas is President and Chief Executive Officer of Chiroscience
R&D Inc. of Washington State. Dr. Galas at one time
served as Director for Health and Environmental Research at the
Department of Energy; and Dr. Maynard Olson is Professor of Medicine
for the Division of Medical Genetics at the University of
Washington.

Gentlemen, it's our policy to swear in all witnesses. So I would 
ask you to rise for me please. 

Do you solemnly swear to tell the truth, the whole truth, and 
nothing but the truth? 

Mr. Patrinos. I do. 

Dr. Collins. I do. 

Mr. Venter. I do. 

Mr. Galas. I do. 

Mr. Olson. I do. 

Chairman Calvert. You're sworn in. Let the record show that all 
answered in the affirmative. 

You may be seated. 

Without objection, the full written testimony for each of you will 
be included in the record. I would ask that each of you summarize 
your remarks in approximately 5 minutes so we'll have sufficient 
time for questions. 

Dr. Patrinos, you may begin your opening statement. 


Mr. Patrinos. Thank you, Mr. Chairman, Mr. Roemer. I am 
pleased to testify before the Subcommittee on the future of the 
human genome project and, specifically, how the new private sector 
venture will help shape our program. I'm honored to testify along
with such a distinguished set of scientists, the gentlemen to my 
left. The Department of Energy takes great pride in its pioneering 
in the human genome project that will essentially revolutionize bi- 
ology and help usher in a new millennium of wonderful applica- 
tions in medicine, environmental bioremediation, and sustainable 

Back in 1986, the Biological and Environmental Research pro- 
gram that I have the privilege of directing presently, while seeking 
a molecular level understanding of the effects of ionizing radiation 
on human biology, proposed to sequence the 3 billion base pairs of 
human DNA and identify the important genes on the 23 pairs of chromosomes.

It was a proposal that at the time was considered with, or at
least was met with considerable skepticism and, I might add, some
hostility as well. However, the rest is history, as you know, and in
1990, along with our colleagues at the National Institutes of
Health, we formally launched the Human Genome Program, along
with a common 5-year plan that we updated in 1993 because of
faster-than-expected progress. As you mentioned, Dr. Galas, who
was my predecessor in this job, was, in fact, in charge of the DOE 
element of the program at that time. Last month representatives 
of our two agencies from the NIH and the Department of Energy 
met with key members of the scientific community to work out the 
details of the next 5-year plan that we expect to issue in October, 
officially October of this year, and I expect, we expect that this 
plan will be coordinated with our international partners such as 
the Sanger Center in the United Kingdom, as well as with private 
sector ventures such as the initiative that you made reference to, the
initiative launched by Dr. Craig Venter of the Institute for 
Genomic Research and Perkin-Elmer. 

At the midpoint of its projected 15-year lifetime, the human ge- 
nome program is embarking on its high-volume DNA sequencing 
phase. This has been made possible because of advances in se- 
quencing technologies, because of advances in informatics and also 
because of enhanced access to cloned resources. The Department of 
Energy has met this challenge by creating the Joint Genome Insti- 
tute and merging the resources and capabilities and talents of our 
three genome centers at our laboratories at Berkeley, Los Alamos, 
and Livermore. The DOE expects to do its fair share of high-vol- 
ume DNA sequencing at the sequencing factory that we are estab- 
lishing at Walnut Creek, California. 

From the very beginning the human genome program has fo- 
cused on developing technologies and resources that would advance 
the utility and science of the information contained in the human 
genome and it is in that vein that we welcome the private sector 
initiatives such as the one announced by Dr. Venter and Perkin- 
Elmer. That effort is particularly noteworthy because it is our un- 
derstanding that they will share their data with us promptly, and 
it also comes at a time when we all collectively recognize that our 
nation needs enhanced sequencing capacity so that we can all reap 
the benefits of the human genome project in terms of public health 
and medicine. 

Some of the basic research that the Human Genome Program 
has nurtured, both at The Institute of Genomic Research and else- 
where, laid the foundation for the sequencing approach that's been 
proposed by the private sector venture. Such intellectual partner- 
ships between the public and private programs, we believe, will 
speed the completion of the human genome project goals and sig- 
nificantly enrich the scientific community that's involved in the 
project. As we speed up the exploitation of the genomic informa- 
tion, however, we should be ever vigilant about the ethical, legal, 
and social implications that we may have to deal with. During the 
next few months we will be unveiling the specifics of our new 5- 
year plan that will definitely incorporate the new private sector 
venture. The scientific community that is involved in our project is 
on the cutting edge of technology development and scientific discovery,
and I have every confidence that many more surprises await
us on the road ahead. 

I believe that these discoveries will happen at the interfaces be- 
tween the agencies that are involved in the human genome project 
such as biology, information science, and engineering, and I think 
that our program and, from the parochial point of view, our labora- 
tories, the DOE National Laboratories, are ideally suited to con- 
tribute to the discoveries for the benefit of our Nation. 

This completes my prepared remarks and I'll be ready to answer 
any questions. Thank you. 

[The prepared statement and attachments of Mr. Patrinos follow:]












JUNE 17, 1998 

Mr. Chairman and Members of the Subcommittee: 

I am pleased to testify before the Subcommittee on the future of the Human Genome Project 
(HGP). The Department of Energy (DOE) takes great pride in its role in this important research 
endeavor that will revolutionize the field of biology and help usher in a new millennium of 
wonderful applications in the fields of medicine, environmental remediation, and sustainable 

The DOE Biological and Environmental Research (BER) program launched a pilot project in 
1986 to examine the feasibility of sequencing the three billion pairs of human DNA and to 
identify all the genes on our twenty-three pairs of chromosomes. One of the initial objectives of 
the BER project was to seek a molecular-level understanding of the effects of ionizing radiation 
on human biology, a goal that continues today. The National Institutes of Health (NIH), having 
started its own program in 1988, joined DOE in the formal launch of the HGP in 1990 and 
together the two agencies issued a five-year research plan. In 1993, that plan was updated two 
years ahead of schedule, due to faster than expected progress; most notably, rapid progress came 
from advances in physical mapping and in technology, and simultaneously from the unexpected
pace of disease gene discovery that dramatically demonstrated the value of genome-scale
research. Last month, representatives from the two agencies met with key members of the
scientific community to agree on the details of the next five-year plan that will be released in
October 1998. The plan will be coordinated with those of our international partners (e.g., with
the United Kingdom's Sanger Center) as well as with parallel private sector initiatives such as the
recently announced venture by Perkin-Elmer and Dr. Craig Venter of The Institute for Genomic
Research (PE-TIGR). 

At the midpoint of its projected 15-year lifetime, following achievement of every milestone of 
the 1993 plan on or ahead of schedule, the HGP is embarking on the task of high volume human
DNA sequencing in order to deliver the highly accurate sequence of an entire generic human
genome by 2005; the task has been made possible by advances in sequencing and information
technologies and in enhanced access to clone resources. The DOE has responded to the new
challenges of this phase of the HGP by creating the DOE Joint Genome Institute (JGI), the
combination of the DOE genome research centers at Los Alamos, Lawrence Berkeley, and 
Lawrence Livermore National Laboratories. The Institute will undertake the DOE's share of high 
volume sequencing at its new production sequencing facility in Walnut Creek, California. 

The new five-year plan will describe the details of the public sector sequencing strategy as well 
as the other elements of the HGP. In addition to the pursuit of a complete map of the human
genome, these elements include: the further development of sequencing technologies that will be 
needed to use information being generated in the HGP long after the first human sequence is 
completed in 2005; the creation of the data bases that will accept and process the large amounts 
of data generated by sequencing; the sequencing of genomes of model organisms to help us 
understand, most efficiently and cost effectively, the human genome; the ethical, legal, and social 
implications (ELSI) of the HGP; and the pursuit of some of the biological applications that will 
be enabled by the completion of the first reference or generic genome sequence, a sequence
comprised of DNA from ten women and ten men who will be rigorously anonymous and whose
informed consent will have been fully assured. 

Progress in the HGP itself, together with scientific contributions from the many HGP spinoffs in
both the public and private sector, will enable us to include new program goals that could not 
have been anticipated only a few years ago. These unexpected new goals are consistent with the 
history of the HGP making bigger payoffs and providing even greater value than anticipated, 
both scientific and economic. Advances in technology will enable the efficient characterization 
of the biological functional units in every cell, the gene transcripts and their protein products. 
Moreover, rapid progress in determining the genomic sequences of model organisms such as 
yeast (the first yeast genome was completed in 1996), the worm C. elegans (scheduled for
completion in 1998), and a rapidly increasing number of microbes is enabling more rapid 
characterization and discovery of human genes than previously expected. Progress in meeting 
the sequencing and biological goals of the HGP will also challenge the ELSI component of the 
HGP to address, more quickly, the critical issues arising from the unexpectedly rapid availability 
and use of human genome information. 

From the beginning, the HGP has been focused on developing technologies and resources that 
would advance the science and utility of the information contained in the human genome. Thus, 
DOE welcomes private sector initiatives such as the PE-TIGR venture that will add value to the 
public sector effort. This private sector effort is particularly noteworthy since it is our 
understanding that PE-TIGR intends to share its data promptly with the HGP, and since it comes
at a time when there is an increased need for sequencing capacity if the Nation is to realize fully
the public health and medical benefits of the genome project as quickly as possible. 

It is notable that NIH- and DOE-funded basic research (at TIGR and elsewhere) laid the 
foundation for the sequencing approach being proposed by PE-TIGR. We do believe that such 
emerging public-private intellectual partnerships will speed completion of some HGP goals and 
enrich the scientific community involved in the HGP. However, at the same time, it is important 
that we work to guarantee that HGP data acquired with public funds continue to be made 
available to the scientific community at large and that the data is of a quality that provides the 
greatest scientific information and utility. The product of the PE-TIGR venture will contain 
many gaps, whereas the HGP has always been committed to a contiguous, high quality, highly 
accurate, complete sequence. Moreover, there is a critical need for increased sequencing 
capacity within our academic and national laboratories to meet the many public sector 
sequencing demands that will follow the HGP. This information will be revealed by sequencing 
the genomes of model organisms, such as mice, rats, and primates for which we have a rapidly 
growing wealth of biological information that provides insight into how human genes function. 
In addition, sequence information from portions of the genomes of hundreds of individuals will
be needed to understand human genetic variation and will serve as the basis for developing 
individual-specific diagnosis and therapy, a potential focus of 21st Century medicine. 

The scientific community involved in the HGP is truly on the cutting edge of technology 
development and scientific discovery; and as a result, surprising new discoveries and advances
can be expected over the next few years. Many of these discoveries will occur at the interfaces
of the sciences that are involved in the HGP such as biology, information science, and
engineering. The multidisciplinary capabilities of our national laboratories are ideally suited to
contribute to these discoveries. Together with our NIH partners we strive to facilitate these 
discoveries and advances for the benefit of the Nation. 

This completes my prepared testimony. I would be happy to answer your questions. 


Ari Patrinos 

Dr. Patrinos received a diploma in mechanical and electrical engineering from the
National Technical University of Athens and a PhD in mechanical engineering and 
astronautical sciences from Northwestern University. His research included 
atmospheric turbulence, computational fluid dynamics, and hydrodynamic stability. 
After a year on the faculty of the University of Rochester he joined Oak Ridge National 
Laboratory in 1976 to conduct research on energy-related weather and climate 
modification and to develop numerical codes for loss-of-coolant (LOC) nuclear accident 
simulations as well as for river flows and lake circulations. 


In 1980, he joined Brookhaven National Laboratory to develop atmospheric chemistry 
models and to lead field programs on wetfall chemistry. In 1984, he was detailed to 
EPA and to the National Acid Deposition Assessment Program (NAPAP) staff in 
Washington, DC. He joined DOE in 1986, restructuring the Department's atmospheric 
sciences program, and in 1988 led the expansion of DOE's research effort in global 
environmental change. He was the director of the Atmospheric and Climate Research 
Division (ACRD) of DOE's Office of Biological and Environmental Research (OBER) 
until 1990. When ACRD was merged with OBER's Ecological Research Division, he 
became director of the combined Environmental Sciences Division. 

From August 1993 until March 1995, Dr. Patrinos was acting as the Associate Director 
for Biological and Environmental Research in the Office of Energy Research; since 
March 1995 he has been the Associate Director, who oversees the research activities 
including the DOE human and microbial genome programs, structural biology, nuclear 
medicine and health effects, global environmental change, and basic research 
underpinning DOE's environmental restoration effort. Dr. Patrinos represents DOE on 
several subcommittees of the Committee on Environment and Natural Resources of the 
National Science and Technology Council. He is a member of the American Society of 
Mechanical Engineers, the American Geophysical Union, the American Meteorological 
Society, and the Greek Technical Society. 


Chairman Calvert. Dr. Collins. 


Dr. Collins. Thank you very much, Mr. Chairman. I am honored 
to appear before this Committee, especially with the distinguished 
folks sitting at the table with me. I am Director of the National 
Human Genome Research Institute which is the part of the Na- 
tional Institutes of Health which is devoted to the human genome 
project, one of 22 such institutes and centers of the NIH. 

In case you are not familiar with the NIH's means of funding 
science, let me just quickly point out that the funding that we give 
to the Human Genome Project is derived from grant applications 
which we get from investigators at universities, institutes and 
some companies around the country. They send in their grant pro- 
posals to us. Those are peer reviewed and then we select the ones 
that we think are the most meritorious for funding. Regrettably at 
the present time, only about one in four approved applications is 
funded but that is where the work of the NIH component of the ge- 
nome project is done, out there in academia, in small companies, 
and in institutes. 

I wanted to make four points in my brief opening statement 
which are taken from the written remarks which are more exten- 
sive. First of all, Mr. Chairman, you pointed out that there have 
been bold words spoken about the genome project. Let me speak a 
couple of them myself. As a physician and a scientist, I do believe 
that genetics has become the core science of medicine. Whatever 
disease you're interested in understanding, genetics is now the 
most powerful tool you have to get at the mysteries that still re- 
main unlocked. I also believe that the genome project has become 
the center of genetics, this effort to map and sequence all the DNA 
of the human and other model organisms is very much the focal 
point of the modern revolution. So what we are talking about today 
is the core of the core. Its importance can hardly be overstated. I 
do believe historians will look at this as the most ambitious and 
important organized scientific effort that humankind has mounted, 
including splitting the atom or going to the moon, because this is 
an investigation into ourselves. 

Second point: The genome project has been characterized by a 
complex, but carefully planned, agenda since the outset. There has 
been some misunderstanding I believe, and perhaps recently espe- 
cially in the press, about what the genome project aims to do. This 
is not just a project to sequence human DNA. In its first several 
years, many of the goals of the project related to developing maps, 
genetic maps and physical maps, as well as improving the tech- 
nologies in order to be able to afford to do the human sequencing 
at the pace that was needed to complete the job at the cost that 
was estimated to be available. So up until now, in fact, only a 
minor fraction of the budget of the human genome project has been 
devoted to the actual human sequencing, the part that is now 
ramping up in a major way with 10 percent of that now available 
in public database in assembled or partially assembled form. 


There is also an emphasis on model organisms which has taught 
us much about how genetics predicts a particular kind of pheno- 
type and which will serve us well in trying to understand what the 
human DNA sequence means. And there is our ELSI program 
which Dr. Patrinos has already mentioned, looking at the ethical, 
legal, and social implications of this research. So the genome 
project is much broader than just the human sequence. When we 
look at cost comparisons, for instance, of this approach versus that 
approach, it would be important to be sure we are talking about 
the same activities. 

Third point: The genome project up until now is arguably one of 
the more impressive success stories of the federal investment in 
science of all time. Every milestone that has been put forward by 
carefully chosen advisers outside the government have been 
achieved or exceeded. The cost that has gone into this project is 
roughly 25 percent less in its first half than was expected by the 
original planners, so it is fair to say the project has been faster, 
better, and cheaper up until now and we aim to maintain that 

As a physician I can tell you the consequences of this project are 
all around us. Back in the 1980's, when I was on the faculty at the 
University of Michigan, I spent almost 10 years finally identifying 
the cystic fibrosis gene and another roughly 10 years participating 
in a group that found the Huntington's disease gene. That was the 
best you could do in the 1980's. Nowadays, it's a matter of months. 
Just a few months ago, a gene for Parkinson's disease was found, 
using the tools of the genome project, in 9 months, and breaking 
open research in that field which has really been frustrating for 30 
years. So this is a success already. You don't have to wait until the 
sequence is in hand to see it happen. 

Fourth point: Partnership with the private sector is both nec- 
essary and desirable and we welcome this new initiative which is 
being discussed today by Dr. Venter. In fact, such public/private 
partnerships have characterized the genome project from the out- 
set. There are many other examples of that sort, though perhaps 
none as bold as this one. Again, we need to look carefully at the 
ways in which this private initiative and the publicly-funded effort 
can be complementary and we also need to consider scientifically 
the ways that the strategy is different, which actually adds to the 
complementarity. And I know Dr. Olson will particularly comment 
upon that in his remarks. 

Let me assure you, we will work together. If you doubt that, no- 
tice that Dr. Venter and I seem to have worn the same clothes 
today without intending to. We are intending to be partners in this 
in every possible way, so let this be a symbol thereof. 

This is not a race. We will work together, we believe in the value 
of that, we believe we have complimentary strategies. The federal 
effort is fully prepared to adjust their strategy. As we move for- 
ward we have a vigorous advisory process to do that, constituted 
by some of the world's best scientists. We have adjusted our strat- 
egy on a regular basis, based on technological developments, but I 
would argue that it's a little soon to know exactly what that adjust- 
ment should be. As Dr. Venter will tell you, the proposal which has 
been put forward is bold, but is yet untried, and the quality of the 
product, a very serious question because we do believe we want the 
whole genome sequence with as few gaps as possible, as few mis- 
takes as possible, the quality is so important that one must not, I 
think, deviate from that goal or from the strategy to get there until 
we have the data in front of us to see how this new approach will 

In that regard, we welcome a proposal by Dr. Venter to try out, 
as a pilot effort, the genome sequence of the fruitfly Drosophila. 
This effort, which will get under way in about 6 months, focuses 
on an organism whose genome is 30 times smaller, and much more 
tractable and I believe we will learn a lot from that pilot effort 
about the ways in which this strategy can be applied to the human. 
At that point it will be easier, perhaps, for the federal effort to 
make some predictions about ways that we might adjust our strategy. 

But to summarize, we welcome this development, we believe that 
we have a good track record of working together with the private 
sector, and I look forward to seeing these two complementary ef- 
forts get us there soon, which is my goal, and should be yours. 

[The prepared statement and attachments of Dr. Collins follow:] 


National Institutes of Health 

Statement of 

Francis S. Collins, M.D., Ph.D. 

Director, National Human Genome Research Institute 


The Human Genome Project: 

How Private Sector Developments Affect the Government Program 

before the 

Subcommittee on Energy and the Environment 

Committee on Science 

United States House of Representatives 

June 17, 1998 


I am Dr. Francis Collins, Director of the National Human Genome Research Institute 
(NHGRI) of the National Institutes of Health. I appreciate the opportunity to appear before the 
Subcommittee today to discuss the Human Genome Project and the implications of the recent 
announcement by a private company of their intentions to carry out large-scale sequencing of the 
human genome. 

The NHGRI is one of the 22 Institutes and Centers that comprise the federation of federal 
research entities known as the National Institutes of Health (NIH). The vast majority of research 
dollars appropriated to the NIH flow out to the scientific community across the Nation, primarily 
in the form of peer-reviewed research grants. Today, that community numbers more than 50,000 
investigators affiliated with nearly 2,000 universities, hospitals, and other research facilities 
located in all 50 states, the District of Columbia, Puerto Rico, Guam, the Virgin Islands, and 
certain points abroad. 

The NHGRI is the lead Institute at the NIH with responsibility for The Human Genome 
Project (HGP). The HGP officially began in October of 1990 as a 15-year program to 
characterize in detail the complete set of human genetic instructions (the "genome"). The central 
aim of the project, which the federal government funds through programs at the NIH's National 
Human Genome Research Institute and the Department of Energy, is to arm health researchers 
with powerful gene-finding and DNA analysis tools to unravel and understand the myriad human 
diseases that have their roots in DNA. Now at its half-way mark, genome project tools have 
underpinned virtually all gene discoveries of this decade. 

The Human Genome Project's success stems largely from a unique and rigorous planning 
process that sets ambitious research goals, time lines and budgets. The first joint NIH/DOE plan, 
which covered years 1991-1995, included goals for: 

► physical and genetic maps; 

► experimental DNA sequencing of the fruit fly, a round worm, yeast, and the bacterium 

► computer management of research data; and 

► studies of the ethical, legal, and social implications (ELSI) of these new abilities to read 
genetic information 

Because of the rapid pace of genome research and technology development, scientists met 
many of those initial goals ahead of schedule and under budget. So the research plan was 
updated again in 1993 to establish new NIH-DOE goals through 1998. All of these goals have 
now been met or exceeded. Original expectations were that the NIH cost of these activities from 
FY'91-97 would exceed $1 billion in 1991 dollars. I am pleased to report that the cost has been 
about 25 percent less than that projection. 


Gene Discovery 

Today, with Human Genome Project tools, it is possible to track down a disease-related 
gene even when nothing is known about the biochemical problems of the disease or how the gene 
works. This technique, based on identifying the position of a gene in the chromosome and then 
isolating it, is commonly referred to as positional cloning and was successfully used for the first 
time in 1986. Now, the increasing detail and quality of genome maps have reduced the time it 
takes to find a disease gene from years, to months, to weeks, to sometimes just days, and 
scientists are using the tools to discover dozens of disease genes each year. 

An Example - Parkinson's Disease 

The isolation of a gene for Parkinson's disease (PD) last year demonstrated the power of 
this new discovery method and showed conclusively that changes in DNA can cause PD in some 
families. Only two years ago, the National Institute of Neurological Disorders and Stroke held a 
workshop to explore using genetic approaches to understand PD. A team led by scientists in 
NHGRI's Division of Intramural Research (DIR) began large-scale genetic analysis of DNA from 
members of a large Italian family containing almost 600 people, more than 60 of whom have been 
diagnosed with Parkinson's. In nine days, NHGRI gene hunters mapped the gene to a region of 
chromosome 4, which contained approximately 100 genes. One of the several genes in that 
interval had already been identified on the gene map and was known to encode a protein called 
alpha-synuclein. 

In just a few months, the researchers showed conclusively that an altered alpha-synuclein 
gene caused Parkinson's disease in the study families. Many have hailed this as the most 
significant advance in Parkinson's disease research in 30 years. Just last month, a Japanese 
research team used genome mapping tools to isolate another gene, this time on chromosome 6, that 
also appears to contain a gene that, when altered, predisposes the individual to a rare juvenile form 
of Parkinson's disease. 

Ethical, Legal, and Social Implications 

NHGRI has established productive partnerships among consumers, scientists, and policy 
makers to help reduce the possibility that genetic information will be used to harm an individual or 
family members and ensure that it will be of benefit to both patients and providers. As an integral 
part of the Human Genome Project, the NHGRI and the DOE have each set aside a portion of their 
funding to anticipate, analyze, and address the ethical, legal, and social implications (ELSI) of the 
Project's new advances in human genetics. The current goals of the ELSI program are to improve 
the understanding of these issues through research and education, to stimulate informed public 
discussion, and to develop policy options intended to ensure that genetic information is used for 
the benefit of individuals and society. Because genetic information is personal, powerful, and 
potentially predictive, it can be used to stigmatize and discriminate against people. Genetic 
information must be private. 


DNA Sequencing 

If the letters representing the 3 billion bases in the human genome were printed out in 
books, and the books were stacked one on top of the other, they would reach as high as the 
Washington Monument. The current major goal of the Human Genome Project is to read the order, 
letter by letter, of those 3 billion bases. 

Sequencing was once done by hand as a series of chemical reactions, a slow and costly 
method. In 1990, when the HGP began, the sequencing cost was $10/base. Now, because of 
public investment and collaboration with the private sector, machines read the sequence fragments 
quickly and efficiently. As a result, the sequencing cost has been dramatically reduced to roughly 
$.50/base for high-quality "finished" sequence. 

Using a strategy referred to as "shotgun" sequencing, an investigator takes each page of 
those books stacked as tall as the Washington Monument, and randomly cuts the text into small 
fragments. These fragments are small enough for sequencing machines to read. To get long 
stretches of contiguous DNA, investigators must then reassemble these sequenced fragments back 
into sentences, paragraphs, chapters, and books. The reassembly of this puzzle is carried out 
largely by sophisticated computer programs. 
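
As a rough illustration of the cut-and-reassemble idea described above, the sketch below cuts a short sentence into overlapping fragments, shuffles them, and rebuilds the text by repeatedly joining the two pieces that share the longest overlap. It is a toy under stated assumptions, not the software used by any sequencing center: the function names (make_reads, overlap, assemble), the fragment length, and the sample sentence are all hypothetical, and the sentence was chosen so that no run of eight or more characters repeats.

```python
import random


def make_reads(text, read_len=12, step=4, seed=0):
    """Cut text into overlapping fixed-length fragments and shuffle them."""
    starts = list(range(0, len(text) - read_len + 1, step))
    if starts[-1] != len(text) - read_len:
        starts.append(len(text) - read_len)  # make sure the end is covered
    reads = [text[s:s + read_len] for s in starts]
    random.Random(seed).shuffle(reads)  # the original order is thrown away
    return reads


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads):
    """Greedy assembly: keep merging the pair with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; the remaining pieces stay apart
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    text = "the order of the bases spells out the genetic instructions"
    reads = make_reads(text)
    print(len(reads), "fragments")
    print(assemble(reads))
```

The greedy rule works here only because the sample text has no long repeats. When the same stretch of text occurs in several places, a largest-overlap rule can join fragments that do not belong together, so a real genome needs far more careful handling than this sketch provides.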

The sequencing strategy the public genome project uses employs shotgun sequencing of 
DNA fragments that already have been carefully mapped and catalogued. This process makes 
reassembling the sequenced fragments into contiguous sequence easier because you know where 
the fragment came from. In addition, scientists periodically encounter DNA fragments that are 
particularly difficult to sequence. To return to the analogy, it is much easier, takes less time, and 
is less costly to assemble the text in "finished" form if all the fragments are known to have come 
from the same chapter. 

In 1996, NHGRI began pilot projects to test strategies and technologies for full-scale 
sequencing of the human genome. We now have undertaken human sequencing in earnest. As a 
result, investigators have deposited almost 150 million bases of "finished" high-quality human 
DNA sequence in GenBank, the publicly funded database supported by the National Library of 
Medicine. In accordance with the agreed-upon standards of the international genomic community, 
all NIH-DOE funded sequencers have agreed to a rapid data release policy, such that new 
sequence data is submitted to publicly accessible data banks within 24 hours. If one includes 
"finished" and "close-to-finished" sequence, over 300 million bases, or 10 percent, of the human 
DNA sequence has been deposited in GenBank. 
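
The figures quoted in this section can be checked with simple arithmetic. The sketch below uses only numbers stated in the testimony, rounded as given (3 billion bases, roughly 150 million and 300 million bases deposited, $10 and $.50 per base). The constant and function names are illustrative assumptions, and the whole-genome cost is a naive multiplication that assumes every base costs the same, which the testimony does not claim.

```python
GENOME_BASES = 3_000_000_000        # approximate size of the human genome
FINISHED_BASES = 150_000_000        # "almost 150 million" finished bases
DEPOSITED_BASES = 300_000_000       # "over 300 million" finished or close-to-finished
COST_1990_CENTS_PER_BASE = 1000     # $10 per base when the HGP began
COST_1998_CENTS_PER_BASE = 50       # roughly $.50 per base for finished sequence


def percent_of_genome(bases):
    """Share of the genome represented by a number of bases, in percent."""
    return 100 * bases / GENOME_BASES


def naive_genome_cost_dollars(cents_per_base):
    """Whole-genome cost if every base cost the same (a simplification)."""
    return GENOME_BASES * cents_per_base // 100


if __name__ == "__main__":
    print("finished:", percent_of_genome(FINISHED_BASES), "percent")
    print("finished or close:", percent_of_genome(DEPOSITED_BASES), "percent")
    print("cost reduction:",
          COST_1990_CENTS_PER_BASE // COST_1998_CENTS_PER_BASE, "fold")
    print("naive cost at 1990 rate: $",
          naive_genome_cost_dollars(COST_1990_CENTS_PER_BASE))
    print("naive cost at 1998 rate: $",
          naive_genome_cost_dollars(COST_1998_CENTS_PER_BASE))
```

Working in whole cents keeps the cost arithmetic exact; the percentages are consistent with the 10 percent figure given above.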

In order to meet the standards adopted by the international genomic community, the 
sequence produced must have four characteristics, the "4 A's" of the Human Genome Project: 

1) the sequence must be accurate, that is, the DNA spellings must be correct. The publicly 
funded genome effort will ensure accuracy of 99.99 percent or better. 


2) the sequence must be assembled. Large-scale sequencing relies on the accurate 
assembly of smaller lengths of sequenced DNA into longer, genomic-scale pieces, so DNA 
will be assembled into long pieces that reflect the original genomic DNA. 

3) Because human DNA sequence must also be affordable, a portion of our research 
funds focuses on technology development to reduce the cost as much as possible. 

4) Finally, high-quality, finished human DNA sequence must be accessible. In order to be 
useful, sequence data needs to be rapidly available to the entire research community. 

Research Planning 

Informed by a series of workshops over the past year that reviewed research progress and 
identified genome research opportunities, Human Genome Project leaders recently met with more 
than 100 representatives from a range of scientific disciplines to develop the next 5-year plan, 
scheduled to begin in the fall of 1998. With both the physical and genetic maps complete, and 
human DNA sequencing pilot projects underway, goals of the 1998-2003 draft plan considered at 
that meeting focused on: 

completing a full, highly accurate and contiguous human genome DNA sequence; 

further development of technologies for steadily increasing sequencing capacity and 

reducing costs; 

studies of variations in human DNA; 

studies of how large sets of genes function; 

studies of the similarities and differences between the human genome and those of 

important laboratory animals; 

improved computer methods for data management; and 

studies regarding the ethical, legal and social implications of the HGP. 

Private Sector Developments 

Just prior to the HGP planning meeting, industry researchers from The Institute for 
Genomic Research (TIGR) and Perkin Elmer, Inc. announced a plan to apply a DNA sequencing 
strategy they had used on micro-organisms to produce a "rough draft" of the human genome 
sequence. The sequencing strategy recently proposed by Perkin-Elmer, Inc. and TIGR differs 
from the public effort in two significant ways: quality and access. 

First, that strategy, called "whole-genome shotgun sequencing", employs fragments that 
have not been previously mapped or catalogued prior to sequencing. Because scientists will not 
know where in the long chain of 3 billion base pairs the fragment might belong, the task of 
reassembling the fragments becomes far more difficult. This difficulty in reassembly inevitably 
will lead to gaps and misassemblies in the sequence. Some of these may occur in DNA regions 
with great biological significance. The private sector approach does not propose to fill in all the 
gaps left by these unsequenced fragments, thereby creating a product that will be incomplete for 
many research uses. 

Secondly, release of sequence data from the Perkin-Elmer-TIGR effort will occur 
quarterly, rather than daily. The policy of daily release of DNA sequence data by publicly-funded 
efforts was arrived at because of the great interest in the scientific community in gaining access to 
this highly valuable information. Any delay can result in wasted effort in research. 

Deliberations on Five-Year Research Plan 

Because the industry plan seemed to parallel some aspects of the federal Human Genome 
Project, planners and advisors to the NIH-DOE program have been debating extensively how the 
two proposals could be matched up. The scientists, at the recent planning meeting on the draft 
HGP 5-Year Plan, concluded that while the two projects should complement one another, the 
federal project should continue its plans to provide high-quality human DNA sequence as soon as 
possible and that all data should be freely accessible. 

Those conclusions rested on a few key factors: 

► The industry effort may not deliver the product in the time and manner proposed. The 
industry approach to sequencing has not been tried on large and complex genomes, such as 
the human, and depends on newly developed and unproven machines. Data to evaluate the 
"whole genome" shotgun approach will initially come from a trial project on the fruitfly, 
Drosophila, but is not expected on the human for at least 12 to 18 months; 

► The industry plan will produce a large amount of highly useful sequence data, but this plan 
will yield a qualitatively different product that will likely contain tens of thousands of 

► The industry plan calls for release of sequence data on a quarterly basis, and patenting of 
100-300 "gene systems." While quarterly data release is commendable, the plan is not as 
strong as the standards established by the international sequencing community which 
require release of data within 24 hours and discourage patenting. Further, some concerns 
were expressed that the private effort's commitment to data release might diminish over 
time, if business pressures came to the forefront. 

In view of those concerns, advisors at the planning meeting enthusiastically made several 
unanimous recommendations: 

► The publicly funded genome project should continue with plans to provide a complete, 
high-quality human DNA sequence by the year 2005, and sooner if at all possible; 

► All possible steps must be taken to ensure that all sequence data remain in the public 


► The publicly funded effort should take advantage of technology advances to increase 
sequencing capacity as much as possible as soon as possible to meet research needs, both 
for sequencing of the human and model organisms; and 

► The sequencing of DNA regions of high utility and research interest should be emphasized. 

Now, Human Genome Project leaders at the NIH and DOE are considering that advice as 
they put the final touches on the new research plan, which will be published in the fall of 1998. 
The complete plan will contain details for all of the Human Genome Project's goals, including 
sequencing, gene function, human variation, technology development, and Ethical Legal and 
Social Implications. 

The private and public genome sequencing efforts should not be seen as engaged in a 
race. In fact, scientists at TIGR and Perkin-Elmer have expressed their enthusiasm for a continued 
vigorous public effort on the HGP, and have conveyed their willingness to collaborate with NIH 
and DOE on the production of the complete human sequence. The NIH and DOE welcome this 
collaborative approach, as the whole should be greater than the sum of the parts. 


Mr. Chairman, I commend you, and the Members of this Subcommittee, for convening this 
hearing today. The impact on the future of biology of knowing the order of all 3 billion human 
DNA bases has been compared to Mendeleev's establishment of the Periodic Table of the 
Elements in the 19th century and the advances in chemistry that followed. The complete set of 
human genes, the biologic periodic table, will make it possible to begin to understand how they 
function and interact. Rapidly evolving technologies, comparable to those used in the semi- 
conductor industry, will allow scientists to build detectors that analyze tens of thousands of genes 
in a single experiment. Scientists will use the powerful new tools to reveal the secrets of disease 
susceptibility. This knowledge will in turn allow researchers to create broad new opportunities for 
preventive medicine, lay the foundation needed to develop and better target effective therapeutics, 
and provide unprecedented information about the origin and migration of human populations. 

The investment of substantial funds by the private sector in human sequencing reaffirms 
the enormous value of Human Genome Project products and is a testament to the success and 
value of the tools already developed by the publicly supported project. For the reasons outlined 
above, it is not yet known what role this new endeavor will play over the long term in providing 
the publicly available, detailed "A-to-Z" instruction book ultimately promised by the Human 
Genome Project. Project leaders at the National Institutes of Health and the Department of Energy 
look forward to close cooperation with Perkin-Elmer and TIGR as the new initiative unfolds over 
the next few years. 

This concludes my remarks. I would be pleased to answer any questions. 


Francis S. Collins, M.D., Ph.D. 

Dr. Francis Collins was appointed Director of the National Human Genome 
Research Institute in April 1993. NHGRI oversees the role of the National Institutes of Health in the U.S. Human 
Genome Project. 

Dr. Collins pioneered the development of a powerful gene-finding method known as "positional cloning," which 
utilizes the inheritance pattern of a disease within families to pinpoint the location of the gene associated with the 
disease. Positional cloning is now commonly used to isolate genes even when no information about the gene's 
function or biochemistry is known. Dr. Collins is perhaps best known for using positional cloning techniques to 
isolate the genes for cystic fibrosis, neurofibromatosis type 1, Huntington's disease, and ataxia telangiectasia. 

He was formerly a Howard Hughes Medical Institute investigator and professor in the Departments of Internal 
Medicine and Human Genetics at the University of Michigan School of Medicine in Ann Arbor. He was also 
director of the NCHGR-supported human genome center at Michigan. 

Current active research projects in the Collins laboratory include the development of better methods for analyzing 
mutations in disease genes, especially for the BRCA1 gene on chromosome 17. The laboratory is also involved in an 
ambitious effort to map the major genes contributing to adult-onset diabetes, by carrying out extensive linkage 
analysis on affected siblings, largely collected in Finland. Positional cloning of the genes for familial Mediterranean 
fever and multiple endocrine neoplasia are also underway, in collaboration with other investigators. 

Born in Staunton, Virginia, in 1950, Dr. Collins received his bachelor of science degree with highest honors from 
the University of Virginia. He received both his M.S. and Ph.D. degrees in physical chemistry from Yale University 
and an M.D. degree from the University of North Carolina School of Medicine. He completed his internship and 
residency in internal medicine at the North Carolina Memorial Hospital. From 1981 to 1984, he was a fellow in 
human genetics and pediatrics at Yale. He joined the Departments of Internal Medicine and Human Genetics at 
Michigan in 1984, becoming professor in 1991. He became a Howard Hughes Medical Institute assistant 
investigator in 1987 and full investigator in 1991. Collins is a diplomate of the American Board of Internal 
Medicine, the American Board of Medical Genetics, and the American College of Medical Genetics. 

Dr. Collins was elected to the Institute of Medicine in 1991 and the National Academy of Sciences in 1993. He is 
also a member of the American Federation for Medical Research, the American Society for Clinical Investigation, 
the Association of American Physicians, and the international Human Genome Organization. He serves as an 
associate editor for several publications, including Genomics; Genes, Chromosomes and Cancer; Human Molecular 
Genetics; Somatic Cell and Molecular Genetics; and Human Mutation. 

Among his most recent awards and honors, Dr. Collins has received the Gairdner Foundation International Award, 
the Young Investigator Award of the American Federation for Clinical Research, the Doris Tulcin Award for Cystic 
Fibrosis Research, University of Michigan's Distinguished Faculty Achievement Award, the National Medical 
Research Award, and the University of Pittsburgh Dickson Prize. He holds honorary degrees from several academic 


Chairman Calvert. Thank you, Doctor. 
Dr. Venter. 


Mr. Venter. Thank you very much, Mr. Chairman. I appreciate 
the opportunity to testify before your Subcommittee about the im- 
pact of our new developments on the federally-funded human genome 
effort. I also appreciate the comments of Dr. Patrinos and Dr. Collins. 

I'm the founder and President of The Institute for Genomic Re- 
search, often known as TIGR, in Rockville, Maryland, and I'm the 
to-be President of the new company we're forming, I'm a co-founder 
of that company along with Tony White and Mike Hunkapiller of 
the Perkin-Elmer Corporation. Recent publicity about our new ven- 
ture to sequence the human genome in 3 years has led to specula- 
tion that funding for the human genome effort should be reduced 
or eliminated. Nothing could be further from the truth. Upon com- 
pletion of today's hearing, I hope it's clear that this new private 
venture, and the federally-funded project are, in fact, complemen- 
tary efforts that can work together to make unprecedented impact 
on improving research on human health. 

One goal of our new to-be-named company is to sequence the 
human genome over 3 years, using dramatic new technology devel- 
oped by Mike Hunkapiller's team at the Perkin-Elmer Corporation 
in strategies that have been developed by myself and my colleagues 
at The Institute for Genomic Research for sequencing whole 
genomes. I agree with the comments of Dr. Collins that the focus 
has been lost in the purpose of obtaining the human genome se- 
quence. And it was concentrating on what was perceived to be an 
absolutely monumental task of obtaining that sequence, due to the 
limits in technologies and procedures that we've had in the past. 
Analogies to the Manhattan Project and Apollo Project are often 
used. Billions of dollars from the U.S. Government and Europe and 
Japan, decades of work from thousands of scientists around the 
world, were thought to be required to obtain that sequence. New 
technologies and strategies now change and replace some of these 
assumptions. The human genome will be accurately and completely 
covered in one facility by a new company in Rockville, Maryland, 
with a few hundred workers using new technology. 

Our effort has been described by some as a rough draft, or worse, 
of the human genome, but I've heard these comments before, in 1994, 
when Nobel Laureate Ham Smith and I proposed the new strategy 
for sequencing genomes. In fact, the first genome in history, which we 
published in Science in 1995, was done with this approach. The genome 
review panel involving NIH funding rejected our grant as being 
impossible, saying that we'd have a large number of noncloseable gaps 
and misassembled pieces of the genome and that at best the sequence 
would be incomplete and full of holes. They were clearly wrong. 

TIGR is the only organization in the world to have completely 
sequenced more than one genome. In fact, we've completed seven, 
including the first three, and those seven represent half of the entire 
world's complement of completed genomes. All seven, plus five 
more to be finished this year by us, were done by the whole genome 
shotgun approach. Our sequences are some of the highest-quality 
sequences ever completed and published. More than a dozen pathogen 
genome projects are now under way at TIGR, including the 
malaria genome with funding by the National Institutes of Health. 
I should point out that the Department of Energy, using slightly 
different review processes, funded TIGR to sequence two out of 
three of the first genomes completed in history, and that funding 
was obtained prior to the completion of the Haemophilus influenzae 
sequence in 1995. 

The DOE has also funded TIGR to sequence more than a dozen 
key environmental genomes, using the whole genome shotgun 
method, and the Department of Energy has also funded the bacterial 
artificial chromosome end sequencing strategy that is providing 
the scaffolding for assembling the entire human genome sequence. 
I'm here to urge you not only to not cut the DOE or other 
genome budgets because of our announcement and effort, but to 
actually consider increasing it. 

Having the complete genome moves forward all the issues associ- 
ated with genomics. The sequence is the beginning of the genome 
project. It is absolutely not the end of anything, except, perhaps, 
the end of ignorance. A private/public partnership will not only en- 
sure completion of the genome sequence sooner, it will provide the 
basis for beginning the key aspects of the genome project, for exam- 
ple, understanding what the sequence means. 

Because our effort is moving forward substantially the timetable 
for completing the genome sequence, the resources for understand- 
ing the genomic code become even more important. With compara- 
tive genomes, we've learned this in microbial genome sequences, 
having one genome was fantastic, having two or three was phe- 
nomenal and aided our understanding. That's the situation with 
human and that's part of the existing plan to do the mouse and 
other genomes. We need those genomes to understand and inter- 
pret the human genome. By working together, DOE, NIH, and 
other public and private institutions can help meet the goal of hav- 
ing a complete map and sequence of the human genome within 
three years. I see that as an announcement that everybody can be 
proud of. 

I hope that after this hearing you will view our announcement 
and the federal program, for which you are responsible, not as an 
either/or proposition, but instead will focus on how these two 
activities, working in tandem, can ultimately improve our lives and 
those of generations to come. 

This concludes my remarks and I'm pleased to answer any ques- 
tions you may have. 

[The prepared statement and attachments of Mr. Venter follow:] 

9712 Medical Center Drive, Rockville, Maryland 20850 

June 17, 1998 


Mr. Chairman, I appreciate the opportunity to testify today before your subcommittee 
about the impact of private sector developments on the federally-funded Human Genome 
Project. Recent publicity surrounding the intent announced by Perkin-Elmer and me to 
sequence the human genome has led some to speculate that federal funding for the human 
genome is no longer needed. Nothing could be further from the truth. The Human 
Genome Project is truly a success that both the scientific community and the federal 
government can look upon with pride, and one that will continue to generate important 
information. I am pleased to be here to put in context the role that I have played up until 
now, and the role that I hope to play in the future. I hope after today you will recognize 
the success of the program that you have funded, and also recognize the vast potential to 
improve human health that lies just around the corner by linking both the 
federally-funded initiative and our new private sector venture. 

I am J. Craig Venter, President and Director of The Institute for Genomic Research 
(TIGR), an independent, not-for-profit research institute in Rockville, MD that I founded 
in 1992 after leaving the National Institutes of Health (NIH). On May 11, The Perkin- 
Elmer Corporation, the largest producer of DNA sequencing technologies in the U.S., and 
I announced a new venture to create a company that will sequence, as part of its initial 
projects, the Drosophila (fruit fly) genome and the human genome within the next three 
years. These two sequencing projects will be undertaken using breakthrough DNA 
sequencing technology developed by Perkin-Elmer, and a DNA sequencing strategy that 
was pioneered by my colleagues and me at TIGR, known as the whole-genome shotgun 
sequencing method. 

This announcement is very exciting for both the public and private scientific communities 
throughout the world, but it is of particular significance to the United States because it is 
the validation of the scientific claims of the Human Genome Project, which was first 
discussed over 14 years ago and funded for the last ten years by U.S. taxpayers. 
However, I believe that in order for me to explain this comment and adequately answer 
the question that is the reason for today's hearing, it is important to discuss the events that 
made our announcement possible. 


When I was at NIH, I was a Section Chief at the National Institute of Neurological 
Disorders and Stroke (NINDS). My lab was involved in a large scale chromosome 
sequencing effort to discover genes associated with neurological functioning and disease. 
During this research, my colleagues and I developed a new strategy for identifying genes 
more rapidly and at much less expense than previously had been possible. Prior to the 
development of this new strategy we had labored for many years using "traditional" 
sequencing methods to identify a few genes. In my own case, I spent ten years on the 
gene for the adrenalin receptor. With the new strategy we greatly exceeded the work of 
many previous years of effort in just a few months. This new strategy known as 
Expressed Sequence Tags (ESTs) was published in the journal Science in June 1991 
(Complementary DNA Sequencing: "Expressed Sequence Tags" and the Human Genome 
Project. Science 252, 1651-1656 (1991)). At the time of this publication, fewer than 
2,000 of the 60,000 to 80,000 human genes were known. 

It is important to note that this new strategy was more than just creative thinking on the 
part of the federally-funded scientists in my lab. It also included a significant role played 
by a new technology company with which we had begun to collaborate. In the late 
1980's, Applied Biosystems manufactured a new DNA sequencing technology that 
greatly improved the speed with which a DNA sequence could be obtained. My NIH lab 
entered into a CRADA with this firm and worked with them to improve their technology. 
In fact, this was the first CRADA entered into by NIH with a commercial organization. 

By linking my lab's new EST strategy with Applied Biosystems' sequencing technology 
it became possible to greatly improve the speed with which new genes and DNA 
sequences in general could be identified. While our new strategy was not yet widely 
accepted, I learned that orders for the Applied Biosystems DNA sequencers that we used 
in our experiments had skyrocketed. So there was clearly significant movement on the 
part of both academic and commercial institutions to adopt this new technique detailed in 
the Science publication. 

About a year earlier, Congress had provided the initial funding to the Department of 
Energy (DOE) and NIH for the Human Genome Project (HGP). From its inception, 
major technical innovations were considered essential to the success of the project and 
our new strategy was a significant step forward. In fact, the gene discovery phase of the 
project could be shortened to almost one-tenth of the originally anticipated timeframe. 
However, there were many other hurdles to clear. 

Obviously with this exciting new strategy I was eager to scale up our research program at 
NIH in order to implement a successful, large-scale genome sequencing and gene 
discovery program. However, the extramural genome community did not want genome 
funding being used on intramural programs. In addition, there was growing controversy 
surrounding the issue of the U.S. government patenting ESTs that I discovered. I was 
frustrated that I would be unable to participate in the revolution in biology that we had 
helped start. I did not want to leave NIH, but after much soul-searching I felt it was the 
most appropriate option. 

In 1992, with funding from the venture capital community, I formed TIGR as an 
independent, not-for-profit research institute to implement the programs that I had 
envisioned for my lab at NIH. In short order, we utilized the EST strategy to identify 
more than half of the genes in the human genome and published this information in the 
Human Genome Directory in the journal Nature in 1995 (Initial Assessment of Human 
Gene Diversity and Expression Patterns Based Upon 52 Million Basepairs of cDNA 
Sequence. Nature 377 suppl., 3-174 (1995)). Also in 1995, using a new strategy for DNA 
sequencing that we pioneered, known as the whole-genome shotgun approach, TIGR 
published the first complete sequence of a self-replicating, living organism, Haemophilus 
influenzae, a bacterium that causes ear infections in children (Whole-Genome Random 
Sequencing and Assembly of Haemophilus influenzae Rd. Science 269, 496-512 (1995)). 

In the time since then, TIGR has become one of the leading genomics institutions in the 
world by determining the complete DNA sequence for six other organisms. Most 
recently, we published the sequences for the pathogen that causes Lyme disease, Borrelia 
burgdorferi, and the bacterium that causes stomach ulcers, Helicobacter pylori (Genome 
Sequence of the Lyme Disease Spirochaete, Borrelia burgdorferi. Nature 390, 580-586 
(1997); The Complete Genome Sequence of the Gastric Pathogen Helicobacter pylori. 
Nature 388, 539-547 (1997)). We have also published the DNA sequence for 
Methanococcus jannaschii, the first archaeal genome to be sequenced, funded by the 
Department of Energy (DOE), and we will soon be publishing the third DOE-funded 
genome, Deinococcus radiodurans (The Complete Genome Sequence of the 
Methanogenic Archaeon, Methanococcus jannaschii. Science 273, 1058-1073 (1996)). No 
other institution in the world has completed more than one genome. 

TIGR has also been funded to sequence human chromosome 16 by the NIH as one of the 
genome sequencing centers funded through the National Human Genome Research 
Institute (NHGRI). In support of this effort, DOE has funded TIGR to generate sequence 
from the ends of 600,000 BACs (bacterial artificial chromosomes) that will form a 
scaffold linking the human genome sequence together. 


During this same timeframe Applied Biosystems had grown as well. The continued 
expansion of the Human Genome Project, and the use of genomics for research in other 
areas of biology created huge demand for DNA sequencers. Between 1987 and 1997, 
more than 6,000 ABI sequencing systems had been sold, giving them the largest installed 
base of automated sequencers in the world. 

In 1993, Perkin-Elmer, a U.S.-based scientific instrument manufacturer, acquired Applied 
Biosystems and renamed it PE Applied Biosystems. Perkin-Elmer made a significant 
investment in the life sciences with its acquisition of Applied Biosystems and it has 
continued to enhance this investment by, for example, investing over $100 million in the 
last year for research and development to ensure that it continues to develop new, cutting 
edge technologies. It is one of these new technologies, the ABI Prism 3700, that will 
be used for this new venture. 


As I'm sure you are all aware, the Human Genome Project has continued to be funded 
through the DOE and NIH and is now entering its ninth year. This project was officially 
launched in 1990 as a $3 billion, 15-year federal initiative to map and sequence the 
complete set of human chromosomes and those of several model organisms. This project 
was a huge boost to the scientific community and represents a project that, when 
completed, could have much greater significance to our society than landing on the moon. 
As a result of this commitment made by the U.S. government, our biotechnology 
industry, which is holding its annual meeting in New York City this week, leads the 
world in the science it undertakes, the jobs it creates, and the products it delivers to 
improve human health. 

Last month a working group completed a review of the draft for the next five-year plan of 
the Human Genome Project. The program continues to move forward and has made great 
strides. When it was conceived, very few other organizations, either public or private, 
recognized the value that this activity would have in the scientific and broader 
communities. Now, largely through the success of this relatively small federal program, 
whole pharmaceutical companies are restructuring their drug discovery and development 
process based on genomics. 

Unfortunately, when the Human Genome Project was initially explained to the Congress 
and other organizations a misunderstanding occurred, and the NIH Director, Dr. Harold 
Varmus, pointed this out at the press briefing we held last month to announce our new 
venture. The scientists who helped organize this program indicated that sequencing the 
human genome was the key to improving our knowledge of human biology. This 
statement has led many to believe that obtaining the complete human DNA sequence 
would mark the end of the project. In fact, the acquisition of the sequence is only the 
beginning. The sequence information provides a starting point from which the real 
research into the thousands of diseases that have a genetic basis can begin. So, the sooner 
we can get to this starting point, the sooner we can begin to see a payoff in ultimately 
improving human health. 


As I earlier indicated, our announcement last month to sequence the human genome 
within the next three years has been widely reported in both the scientific and popular 
press. Like the federally-funded project, it captures the imagination. Like the federally- 
funded project, our goal is not to obtain the sequence for its own sake, but to obtain it to 
serve as a foundation of data upon which new research into human health can be built. 
The goal is to develop the definitive resource of genomic and associated medical 
information that will be used by scientists, in both the public and private sectors, to 
develop a better understanding of the biological processes in humans and to deliver 
improved health care in the future. 

In addition, this new company intends to build the scientific expertise and informatics 
tools necessary to extract valuable biological knowledge from this data. This will include 
discovering new genes, developing polymorphism assay systems, and developing a 
variety of databases. 

There is value in obtaining the sequence of the human genome as quickly as possible—not 
for the sequences themselves, but for the new research opportunities it will create. There 
is a significant infrastructure already in place in public sector research institutions that 
will greatly benefit from this data. Meanwhile, the pharmaceutical and biotechnology 
industries recognize that the human genome will be the significant resource for future 
drug discovery and development. Most important, we believe that access to this 
information is valuable because it will ultimately transform the fundamentals of 
healthcare delivery and medical practice and improve the lives of millions of people. 

The development of a new, fully-automated sequencer by Perkin-Elmer, coupled with the 
whole-genome shotgun strategy will reduce the costs of operating labor and reagents, 
while it increases the speed with which sequences can be generated. By building on the 
resources that have already been developed, such as the significant resource funded by the 
DOE to sequence the ends of BACs, we have a framework for linking the human genome 
together, the mechanism for verifying the alignments of sequences on individual 
chromosomes and internal controls for ensuring the quality of the information that this 
venture will generate. 

The aim of our project is to produce a highly accurate, ordered sequence that spans more 
than 99.9% of the human genome. The accuracy of this sequence will be comparable to 
the standard now used in the genome sequencing community of fewer than one error in 
10,000 base pairs. We look forward to working with other genome centers to ensure that 
the sequence meets the requirements of the scientific community for accuracy and 


A fact that has often been overlooked or questioned in the press accounts of this venture 
is that an essential feature of the new company's business plan is to provide public 
availability of the sequence data. A major consequence of the analysis of data generated 
by this project will be the creation of a comprehensive human genomic database. 
Because of the importance of this information to the entire biomedical research 
community, key elements of this database, including primary sequence data, will be made 
available. In this regard we will work closely with national DNA repositories like the 
National Center for Biotechnology Information. 

It is our plan to release data into the public domain at least every 3 months including the 
complete human genome sequence at the end of the project. We also anticipate providing 
a connect fee for online access to these data and many of the informatics tools that 
researchers can use to interpret them. We will also market the database system to 
commercial companies engaged in pharmaceutical and biotechnology research. 

A concern that has been raised in many publications is how the intellectual property 
issues associated with generating the entire human genome sequence will be handled. 
First, let me just say that I have been associated with intellectual property issues related to 
DNA sequences from the beginning and have great appreciation for the sensitivities of 
this concept. Making the sequence of the entire human genome available makes it 
virtually impossible for any single organization to own its entire intellectual property. It 
eliminates the speculative nature that is currently associated with patenting DNA 
sequence information and requires that researchers understand the biology of a sequence 
before they file a patent application. 


Our actions will make the human genome unpatentable. We expect that this primary 
data will be used by us and others as a starting point for additional biological studies that 
could identify and define new pharmaceutical and diagnostic targets. Once we have fully 
characterized important structures (including, for example, defining biological function), 
we expect to seek patent protection as appropriate. Given the complexity and scope of 
the information found in the human genome sequence, we expect our efforts to be 
focused on 100 to 300 targets from among the thousands of potential targets. 


Another question that I have been asked frequently is, can the whole-genome shotgun 
strategy even work with a genome the size of the human genome? It is our hypothesis 
that this approach will be successful. In fact, we plan to test the effectiveness of this 
strategy by collaborating with Gerald Rubin of the Howard Hughes Medical Institute and 
the University of California at Berkeley and the Berkeley Drosophila Genome Project to 
sequence Drosophila, another large and complex genome, while we establish the 
infrastructure for the larger human effort. In addition, this genome will provide us 
significant insights into the biology of another model organism. 


Finally, there is the concern that has brought us before you today. How will this new 
private venture impact the federally-funded Human Genome Project? It is our sincere 
hope that this program complements the broader scientific efforts to define and 
understand the information contained in our genome. We recognize that our effort would 
not even be possible if not for the efforts of those in academia and government who 
conceived and initiated the Human Genome Project. In fact, the knowledge gained from 
this effort will provide the key to deciphering the genetic contribution to thousands of 
human conditions and substantiates and underscores the need to increase the government 
investment in further understanding of the human genome. 

I have heard from different sources that our new venture indicates that the federally- 
funded program has been a waste of money. I cannot state emphatically enough that our 
announcement should not be the basis for this claim. Let me explain this by way of an 
example. Recently, the genome of yeast, S. cerevisiae, was completed. This genome was 
begun before the whole genome shotgun strategy was developed and as a result it took 
many years to complete. Literally thousands of scientists worked on this project. Does 
the fact that a faster way now exists to obtain the sequence of the organism they were 
working on render their work meaningless? Likewise, this new technology and strategy we have 
announced would have allowed us to sequence the first genome, H. influenzae, much 
more quickly. This fact does not diminish the importance of obtaining the sequence of 
this organism. 

By increasing the speed with which the sequence of the human genome will be obtained, 
we have not brought any program to completion. We have only helped get everyone to 
the starting line a little bit sooner. The real race is the one that confronts us each and 
every day, and that is the one to develop treatments that will help end human suffering 
brought on by the thousands of diseases that plague humanity. 

The impact that our new venture will have on the federally-funded Human Genome 
Project should be to re-orient it sooner to move beyond DNA sequencing into the 
research that will help us better understand and treat these diseases. 

It is not appropriate to judge the relevance of the Human Genome Project on the basis of 
our announcement in a retrospective fashion. Without the past we could not be here 
today. However, it is appropriate to judge the program's relevance in light of our 
announcement, and others that may come, by its ability to adapt and work with new 
initiatives rather than compete against them. 

In effect, this new venture is the private sector recognition of the importance of the 
Human Genome Project. By working closely together, NIH, DOE and other public and 
private institutions can help meet the goal of having a complete map and sequence of the 
human genome sooner than anyone ever imagined. 

There are many other issues that completing the sequence of the human genome, as well 
as other genomes, will raise in the very near future. This increased knowledge of 
evolution, and ultimately ourselves, will likely prompt many questions that society has 
never even considered. If anything, this new information will require us to strengthen our 
scientific infrastructure and improve scientific education. We must work to ensure that 
the science is of the highest quality, appropriately interpreted and peer reviewed. If these 
areas are addressed, I believe we can appropriately assimilate the wealth of new 
knowledge and technology that genomics will provide. 


As I said at the outset, I see the announcement of this new venture as one of which 
everyone can be proud. It includes the federal government taking the initiative to begin a 
significant program which is then made more successful by individual creativity and 
ingenuity, and ultimately is validated by support from the private sector. I hope that after 
this hearing you view both our announcement and the federal program for which you are 
responsible as not an "either/or" proposition, but instead focus on how these two 
activities working in tandem can ultimately improve our lives and those of the 
generations to come. Thank you. 


J. Craig Venter, Ph.D. 

The Institute for Genomic Research 

9712 Medical Center Drive 

Rockville, MD 20850 


J. Craig Venter, Ph.D., is the Founder, President and Director of The Institute for Genomic Research 
(TIGR), a not-for-profit, tax exempt basic research institute in Rockville, Maryland. Between 1984 and 
the formation of TIGR in 1992, Dr. Venter was a Section Chief, and a Lab Chief, in the National Institute 
of Neurological Disorders and Stroke at the National Institutes of Health (NIH). In 1990, Dr. Venter 
developed a new strategy for gene discovery. This strategy, called expressed sequence tags (ESTs), has 
revolutionized the biological sciences. Over 72% of all accessions in the public database GenBank are 
ESTs from a wide range of species including humans, plants and microbes. Using the EST method Dr. 
Venter and the scientists at TIGR have discovered and published over one half of all human genes. Out 
of new algorithms developed to deal with hundreds of thousands of sequences, TIGR developed the whole genome 
shotgun method that led to TIGR completing the first three genomes in history. 

Dr. Venter recently announced that he signed a letter of intent with Perkin-Elmer for the formation of a 
new genomics company. The strategy of this company will be centered on a plan to substantially 
complete the sequencing of the human genome in three years. 

Dr. Venter has published more than 150 research articles and is currently tied with Dr. Adams of TIGR 
as the most cited scientist in biology and medicine. Dr. Venter has received numerous awards and 
honorary degrees for his pioneering work and has been elected a Fellow of the American Association 
for Microbiology and the AAAS. Dr. Venter received his Ph.D. in Physiology and Pharmacology from 
the University of California, San Diego in 1975. 

Scientific papers published include: 

Complementary DNA Sequencing: "Expressed Sequence Tags" and the Human Genome Project. Science 252, 
1651-1656 (1991) 

Potential Virulence Determinants in Terminal Regions of Variola Smallpox Virus Genome. Nature 366, 
748-751 (1993) 

Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 269, 
496-512 (1995) 

Initial Assessment of Human Gene Diversity and Expression Patterns Based Upon 52 Million 
Basepairs of cDNA Sequence. Nature 377 suppl., 3-174 (1995) 

The Minimal Gene Complement of Mycoplasma genitalium. Science 270, 397-403 (1995) 

Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii. Science 273, 
1058-1073 (1996) 

The Complete Genome Sequence of the Gastric Pathogen Helicobacter pylori. Nature 388, 539-547 (1997) 

The Complete Genome Sequence of the Hyperthermophilic, Sulphate-Reducing Archaeon Archaeoglobus 
fulgidus. Nature 390, 364-370 (1997) 

Genome Sequence of the Lyme Disease Spirochaete, Borrelia burgdorferi. Nature 390, 580-586 (1997) 

Complete Genome Sequence of Treponema pallidum, the Syphilis Spirochete. Science (submitted) 



Lilt (l»lMd: ttnSI— 






Dapanrrwru of En«rgy (Adanie) 
* £y ao J S«qLMnc» Ti^ trwn Human 
Brain lor Qenoma Mapptnsi* 



Oopartmanl ol EAargy (ReMc) 
Idwrtdjeatlon tf OertM in Anctfiymoua 



AmFAF) (Hoera) 
'Ezamlnatbn d CaOular - Viral 
Praletn Aaaocblloni* 


Jchnt Hopkdm Uruv«nUy (tracer] 
"SPOHE In GMtrolnLaftlwtal Canear' 



S P60 CAS>B24.<ie 

National &etwiD* FoundaUw) fnalds) 
Inlogrsiton at Mol«eutar Soquenc^ 
wiiti Speoiman and Tszonomlc Dare In 
a Putlkfy-Accasalbla Dalabaae* 



DeparVn»nt erf Energy (VanteO 
IHHlh-Throuphpui DNA Saquwidng 
and CharacierlZktkjn ol Dhwrsa 
MicnJbal Gttmntmt' - REVISED 
No-ctMi ailancbn ihrsugn 2/1 4/M 

1 1/15/04- DE-FC02.»SEMiae2JUXn 3.362. 52S 

2/1 a/OS 

08part>T^on1 ol &>vgf (VA"1sr) 
"Mycoplaama 9«nJiafium* 

11/16/04- DC-FCOS-OSEReteeZJUMl 100.0DO 


D*pBrtni*nl of EnofOy (Vaniar) 
*Hl9h-Thraughput DMA Sequencing 
and Ct<«ractarfxatlon of OiverM 
MIcrabial QoncMr<et* - Supplemeni to 
Y#tr 3 from abova 

11/15/04. D&PC02-96ER610eZJUK>4 1,000.000 


Def>vtm«ni al Energy (Vamw) 
'Whole Ga^wftie S«qu«nclng of 
Painoooccua raiflodurane' 

SMC/B6- OE'FCD8-«5EASieeaA0O3 2.265.476 


SmkMCIirM BMohain Pharrnacaulkuto 


■AlZhaimar% Ofaoata RasMich' 

9/21 /•$- 

DriHSlNIH (Klitiian) 
'CharactarlzBlkm of a nova) 
GABA-A raoafior utwiti; pf 


I n29 N834702-01 

'Ganaik Or^Antzaikm ol 
Trvponama PafldWh* 


1 R01 A1 403004)1 




t RavlMtf: e«nS/M 



SUNY (Rmw) 

*Phycic«l Mapping of ScNMcsoim Mvuont 





0«pa«vnem of Navy (Tgmb) 
'Genetic Regutalion ir itis 
AipiMia polNa SymbiosiK* 



N0001 4-96-1 -0604 

Tha a. HsfoM and Lata Y Malhort 
Char1labl« Foundation (Fraaor) 



DHHSMIH (Adanw) 
*S«|uanoing ot Oiromoaatrie 
iep'04/1 1/06-03^31/99 


3/11 /ag 

1 noi HG0146«^1 

DHHSjNIH (Adarr*] 
'Saquendns ol ChronMaom« 
I6p SupplsmanI 

4/01/AS- 3 R01 HG01«e*4t28l 1.44a.30S 


OHHS>N(M (Adam) 
'Saquanctng ot ChromowNno 
i6p Supplemoni 

0HHS/N1H {Floiachmann) 
*Comp4flia 0«noma S«qu«nea 
ol Uyeotuciartum luDarculotk* 
REVISED BUOQET irc. UsUru st^p 

e/10/08- 3 ROl HGai«6«-0262 452 


eM5/96* 5 ROl A1«013S'02 3.267,763 


OHHS/NIH (FMscKmann) 
*Compksf« Qanoina Saquanca 
of UycotMOarlum tubarcubsls* 
(Ann On&rool Supplam«nl) 

6/1/97- 6 ROl Al 40125-Qe 


N9F (Vemw) 

'Arabdepatt Qonorrifl eequancing 
Oslnc) Randan Shotgun Saquanclng 
ol &AC Gona'-REVISEO 

09/01/96- DBI-Oe320aS 3.669.391 


OOE rVamivr) 

'A/abidopBtc Q8nerr>« Saquancinfl 
UaIng Random Stwigun Saqiisnevig 
or BAG Oonat'-REVISED 

09/0 1 /96- 0E-FQ02-96ER20249AOO1 

Da^anmanl ol Enar^ SuboonLraot (Adamc) 
*ConainjcTlon or a ganome-wkle, 
rwghV cftB(act0rlc«d dona r«aoura« 
lor 9»noine aaquorMino.* 



OHMS/NIH (Clayton) 

'Whole G*rx>me SvquancKig erf 

VUwiQ choterM' - REVISED 


1 ROl AiaOSaVOI 

DMH8MH - (Kelchuml 
'Complata Ganoma Sequenoa ol 
E/iieraecKcua (aeaaUs ' C^OA 93.666 


1 ROl A1408e341 




[Grant listing continues; this page is illegible in the source.]



Department of Defense (Venter)
'Malaria Genome Sequencing Project'

'Computational Techniques for Genomic Analysis'

Merck Genome Research Institute ([name illegible])
'Genome Analysis of Staphylococcus aureus'
(Revised Budget)

Merck Genome Research Institute (Smith)
'Computational Analysis of Intergenic Regions in Microbial Genomes'
(Revised Budget)

'An Integrated Program in Microbial Genome Sequencing and Analysis'

DHHS/NIH (Adams)
'African Trypanosome Genome ...'
1 R01 AI43062-01    3,306,637

'Sequence Analysis of Plasmodium falciparum chromosomes ...'
2,954,034

DHHS/NIH ([name illegible])
'Genome Analysis of Chlamydial species'
700,711

Total Funded Grants to Date

DOE (Venter)




Chairman Calvert. Dr. Galas. 


Mr. Galas. Mr. Chairman and Mr. Roemer, I certainly welcome 
the opportunity to testify before the Committee concerning the fu- 
ture of a project so central to the future of, not only the biological 
sciences but the biotechnology and health care industries of the 
United States, and it is a pleasure to be here with such a distin- 
guished group. 

This is, as is evident, a critical time for this historic project and 
the attention of Congress, the private sector, and the public sector, 
and all of the scientific community, is certainly called for to ensure 
that we make the most of our opportunity here, the opportunity to 
advance the scientific foundations of these areas that are so important
to the health of the nation.

Now having worked in academia, as well as the private sector, 
I have witnessed firsthand the effect it has already had on research 
in the public and private sectors and several of the previous wit- 
nesses have cited these. It's become a cliche to call these effects 
revolutionary and I'm not going to add to any of these cliches, but 
let me just point out that in this case, almost all of these cliches 
have been quite accurate. 

So why is the Human Genome Project so important and when 
one summarizes this, what is this revolution about? Well, I'd say it's
simply about scientists, wherever they are in the life sciences, hav- 
ing the fundamental data close at hand about the information in 
the human genome, the genes and regulatory elements, so that
they can enable their research into fundamental disease mecha- 
nisms, diagnostics, therapeutics, and other fundamental biological 
mechanisms to an extent never seen before. 

Now this genetic information is particularly important to the pri- 
vate sector which is devoted to discovering and developing new 
therapeutic drugs, among other things. A great deal of money and 
time is now spent in publicly-supported laboratories and in private 
companies across the world acquiring genomic information, 
genomic sequence information piecemeal as it is needed. For exam- 
ple, the availability of the full sequence of the human genome, even 
a rough version thereof, this past year would have saved our small 
biotechnology company about, I estimate, about $1.5 million in di- 
rect costs and countless months of time on each of several projects. 
Our work in discovering therapeutics for autoimmune disease, 
osteoporosis, and other diseases is still a small corner of the bio- 
medical research spectrum, and so these costs to us need to be mul- 
tiplied by the relative size and number of all involved biotechnology 
and pharmaceutical companies in this country to see what the di- 
rect cost impact on biomedical research would be. Now the indirect 
costs are also great, as will be the impact on publicly funded re- 
search of all kinds. It all adds up to a very large potential savings 
and some very rough calculations that I made suggest that, per- 
haps, a year advance in the availability of this information, say in 
the next year, for purposes of argument, would probably save some- 
thing like $2 billion in funding in the private sector and, I think, 
that's quite a conservative estimate. 


So the discovery of therapeutics, of course, is not only about 
money. The savings that arise from better, more effective therapies, 
and diagnostics that come sooner to the public, and I emphasize 
the word sooner, must also be a major consideration. The need for 
a widely-available public data resource containing the full complement
of human sequence information has never been greater.
The announcement by Dr. Venter and his colleagues that they are 
forming this new enterprise to generate vast amounts of human sequence
brings us here today and this project, I'd like to just make
a few comments on. This is a most ambitious project, of course, re- 
quiring a large number of new things, new automated machines, 
new computational methods, new significant data production orga- 
nization, but a relatively small group. It's a difficult undertaking, 
but as you see, and as you have responded, it is galvanizing, a gal- 
vanizing prospect to the entire community. 

Now while I cannot directly assess the new technical advances 
that are cited in their announcement, to me the claims are quite 
credible and most welcome. And judging from my familiarity with 
the field, are probably within reach. The scientists involved are ex- 
perienced, serious, and careful and the prospect of doing what is 
planned is certainly within what I view as technically feasible and 
certainly not fanciful. While there will always be debates about 
how new approaches will work and about the technical details, and 
these will change, there's no question, from month to month as we 
go forward, I would say in summary that their proposal seems to 
be well-founded and plausible. 

Now, obviously, the first judgment on their success or failure is 
going to depend on, on their resolve, their resource commitment, 
and, finally, on awaiting real results, but it seems to me they have 
an excellent chance of succeeding and achieving their most impor- 
tant goals. So it is notable and very welcome in addition that the 
community effort is going to be treated to the availability of the 
vast amounts of this information as the project goes forward, ac- 
cording to their announcement. 

In reaction to that I'd say it's essential that the community and 
the leadership of the genome project take these prospects very seri- 
ously and work both to reform or restrategize about the human ge- 
nome project strategy, anticipating access to this new data, and to 
forge close links to the private sector, both sentiments have already 
been described by the leadership of the project. 

So let me just say in emphasis, I do not believe that it is sen- 
sible, however, for the federally-supported program either to con- 
tinue absolutely unchanged with the strategy currently in effect, 
nor to reduce the level of their efforts. Both of those are very im- 
portant and I think it's clear from the response so far that at least 
this general view is shared by both the DOE and the NIH. It seems 
that the prospect of the private sector sequencing effort has served 
as quite a useful stimulus to refocusing the Federal effort or at 
least having a look at the strategy. And I'm sure Dr. Olson will 
comment on some of these. In my view the, changing the strategy 
slightly will be very effective and now let me explain what I mean 
by that in very, in just a few, a few words. 

Initially, what's most important in the genome is the location 
and structure of the functional components, the genes and the control
elements. Next most important is the variations that occur in
these components, in these component parts, and how they occur 
in the human population, and the fundamental biological effect on 
the, on individuals that carry those variations. 

Now it is going to be the research, the research work of many 
decades to understand the basic biological and health effects of 
these variations. But in achieving the initial goal, the first of these, 
getting the fundamental understanding information about the 
genes and their control elements, I would argue that it
should be the first new goal of the human genome project to focus 
its attention on getting the first characterization of the genome se- 
quence as quickly as possible. It's been characterized as a first 
draft, that may be considered to be a pejorative, but I think what 
we really need is to get that information out as soon as possible 
and I think plans are under way that could well put this together. 

Now reaching this goal in conjunction with the private effort 
would enable the human genome project to succeed more rapidly 
than ever, but I think even without that, it's the right thing to do, 
to reorient towards getting a rapid release of something that some, 
some call a first draft or an intermediate draft. So this strategy, 
I think, makes a great deal of sense and let me just summarize the 
arguments that I'm putting forward for that. 

No. 1 is speed. Speed is absolutely critical to the private sector 
and the public sector. The second one is that it is a major benefit, 
every piece of new information is a major benefit to the biomedical 
research community. Third, an effective and positive response to 
the private sector proposal is also gained by adopting this sort of 
a strategy. And, finally, future technical effectiveness, I think there 
are many technical aspects of the revised strategy that stand to 
provide significant advantages for future sequencing effort once the 
details were worked through as they will be in the next few years. 

Reaching the first goal, however, should be seamless with a fol- 
low-on effort to completely fill in the sequence draft, if you will, by 
producing a very accurate, high quality, and complete reference sequence
of the genome. This final product of the human genome program
will then become the single most important database of
human biology, the complete sequence of our genetic heritage. 

Rather than being redundant, the federal program is more rel- 
evant than ever, since federal support should now be able to 
achieve more per dollar spent, and produce a product quite different
from what can be expected from the private effort, if the private 
effort succeeds. I would suggest that more resources should be de- 
voted to the sequencing effort now because the project offers re- 
turns soon and the impact of early acquisition of the information 
will be well worth it. 

The prospect before us of a highly-cooperative effort between 
public and private sectors is one that I think we should seize en- 
thusiastically. Now the federal program appears to be already re- 
sponding with renewed resolve to this opportunity by rethinking 
the strategies and there's been a lot of effort, I know, expended on 
discussing plans for sequencing programs. I applaud this resolve 
and I expect the genome community at large, both public and private,
will recognize the critical nature of this moment and seize the
opportunity to make the most of it. 


This completes my prepared remarks and I'd be happy to answer 
any questions. 
[The prepared statement and attachments of Mr. Galas follow:] 


Dr. David J. Galas 

President and Chief Scientific Officer 
Chiroscience R&D Inc.

1631 220th Street SE
Bothell, Washington 98021





17 JUNE 1998 


Galas, 10 June 1998 
Mr. Chairman and Members of the Committee: 

I welcome the opportunity to testify before the committee concerning the future of a project 
so central to the future of medicine, the biological sciences and the biotechnology and 
health care industries of the United States, the Human Genome Project ("HGP"). This is a 
critical time in the progress of this historic project and the attention of the Congress, the
private sector and all the scientific community is called for to ensure that we make the most
of this opportunity to advance the fundamental scientific foundations of these areas so 
important to the health of our nation. 

I will present here my views on the strategic issues confronting the broader community 
directly concerned with the project and explain why the impact on the public and private 
sectors will be so fundamental. I am the President and Chief Scientific Officer of a small 
biotechnology company in Seattle, Washington. Having worked in academia, as well as 
the private sector, I have participated in the revolutionary changes in the biomedical 
sciences engendered by the explosive accumulation of genetic data and of DNA sequence 
information, and have witnessed, first hand, the effect it has already had on the conduct of
research in the public and private sectors. It has become almost a cliché to call these
effects revolutionary, but in this case the cliché is accurate. I have served in government,
and I am proud to have been in the position of responsibility in DOE now occupied by Dr.
Patrinos at the official launch of the Human Genome Project in 1990 by DOE and NIH. 

Why is the HGP so important and what is this revolution about? It is simply about 
scientists and researchers having close at hand the fundamental data about the layout and 
information content of all the human genome, genes and regulatory elements. This enables 
research into fundamental disease mechanisms, diagnostics and therapeutics to an extent 
never seen before. Therefore, this genetic information is particularly important to the
private sector devoted to discovering and developing new therapeutic drugs. A great deal
of money and time is now spent in private companies across the world acquiring genomic 
information piecemeal, as it is needed. For example, the availability of the full sequence of
the human genome this past year would have saved our small biotechnology company $1.5 
million alone in research costs directly expended on sequencing new regions of the genome 
and countless months of time on each of several projects. Our work towards discovering 
therapeutics for autoimmune disease, osteoporosis and other diseases is still a small corner
of the biomedical research spectrum. These costs to us need to be multiplied by the relative 
size and number of all the involved biotechnology and pharmaceutical companies in this 
country to see the direct cost impact on biomedical research - the indirect effects will also 
be numerous and impressive. It adds up to a very large potential savings, and all of these 
needs will continue to increase as research advances. In addition, the biomedical research 
funded by the federal government will also be enabled and accelerated by this information. 
Therefore, the cost savings to the public and private sectors, in time and money alone, will 
be enormous. However, the discovery of new therapeutics is not only about money. 
Savings of another kind, that which arises from better, more effective therapies and 
diagnostics coming sooner to the public, must also be a major consideration. The need for 
a widely available, public data resource containing the full complement of human sequence
information has never been greater. 

What brings us here today is the announcement by Dr. Venter and his colleagues (PE- 
TIGR) that they are forming a new enterprise to generate vast amounts of sequence data on 
the human genome in a few short years. This is a most ambitious project, requiring a large 
number of new automated machines, new computational methods, a significant data 
production organization and new infrastructure. It is a galvanizing prospect to the entire 
community. While I am not in a position directly to assess the new technical advances that 
are cited in their announcement, the claims are both credible in detail and most welcome and
judging from my familiarity within the field, are probably well within reach. The scientists
involved are experienced, serious and careful and the prospect of doing what is planned is 
certainly within what I view as technically feasible and certainly not fanciful. While there 
will always be debates about whether and how new approaches will work and about the 
technical details, their proposal appears to be well founded and plausible. Final judgment 
on their success or failure will depend on the resolve and resource commitment of the 
principals and must, of course, await the first real results, but it seems likely to me that they 
stand a good chance of succeeding in achieving their most important stated goals. It is 
notable and very welcome to the entire community that the PE-TIGR effort has made a
commitment to sharing sequence data with the public HGP.

It is essential that the community and the leadership of the genome project take these 
prospects very seriously and work both to reform the HGP's strategy anticipating access to 
this new data and to forge close links to the private sector effort. As I will argue below, I 
do not believe that it is sensible for the federally supported project either to continue 
unchanged with the strategy currently in effect, or to reduce the level of their efforts. I 
think it is clear from the response thus far that this general view is shared by the DOE and 
NIH alike. They appear to be responding with an eminently sensible attempt at revision of 
the strategy for sequencing and a commitment to take advantage of whatever new 
sequencing capacity and data release comes from the private effort. It seems that the
prospect of the private sector sequencing effort has served as a beneficial stimulus to 
refocus the federal effort on a strategy that will, in my view, maximize the effectiveness of 
the project whether or not the private effort reaches their stated goals. If they do reach 
these goals the strategy will greatly advance the rate of accumulation of useful data and 
hasten the day of the first completion of the sequence of the human genome. 


Initially, what is most important in the genome is the location and structure of the functional
components - the genes and their control elements. Next important is the variations that 
occur in these component parts in the human population and the fundamental biological 
effects on the individuals that carry these variations. It is these variations that make each 
of us distinct in our good health and strengths, and our susceptibility to disease and ill 
health. It will be the research work of many decades to understand the extent and the 
basic biological and health effects of these variations - this work will be a large part of the 
future of medical research. 

The initial goal of the HGP sequencing effort is to provide the initial blueprint, the basic 
sequence, not the myriad of sequence variations. While many basic researchers and 
companies alike, us included, are focused on detecting and understanding consequences of 
these many small variations in the human genome, called single nucleotide polymorphisms 
or SNPs, we all need the initial sequence to progress this next wave of biomedical
research. Therefore, I argue that it should be the essential primary goal of the HGP to 
focus its attention on how to arrive at the first initial characterization of the genome 
sequence as quickly as possible, whether or not the private effort contributes in the long 
run to reaching this goal. Reaching this goal in conjunction with the private effort, 
however, would enable the HGP to succeed more rapidly than ever, but even without the 
impetus of the prospect of the private effort the HGP should be re-oriented to this primary 
goal - to obtain an initial "first draft" of the human genome as soon as possible. Even a 
rough "first draft" would be absolutely invaluable to the broad biomedical community. It 
appears that the prospect that brings us here today has galvanized the HGP into considering 
a strategy like this in any case and one that could, with public-private cooperation, lead to a 
much more rapid achievement of this initial goal. This strategy makes sense. 

To summarize the arguments for a refocused HGP strategy: 


1. Speed. The critical information will be available sooner, probably 95% within 3

2. A major benefit to biomedical research. The benefits of locating genes and control
elements sooner will substantially advance all biomedical research sectors.

3. An effective and positive response to the PE-TIGR proposal. Refocus of the HGP
strategy takes advantage of the opportunity to leverage the private sector investment into a
valuable public resource.

4. Future technical effectiveness. There are many technical arguments for the revised
strategy that stand to provide advantages for future sequencing efforts once the details are
worked through.

The achievement of the initial goal of a "first draft" should in no way mark the end of the 
project. It is important that the reaching of the first goal be seamless with a continuing, 
follow-on effort to complete the sequence "draft" by producing a very accurate, high- 
quality, complete reference sequence of the genome. Finishing this final product is just as 
important as the initial goal and will be easier and less expensive than it is now. This final 
product of the HGP will then become the single most important database of human 
biology, the complete sequence of our genetic heritage. 

Rather than being redundant, the federal HGP is more relevant than ever, since federal 
support should now be able to achieve more per dollar spent, and produce a product quite 
different from what can be expected from the private effort. I suggest that the early 
prospect of completion that arises from the private proposal should be met with increased 
funding for the federal project, subject to successful completion of the new planning effort 
that is underway. The changes should not, however, end there. The prospect before us of 
a strong, highly cooperative effort between the public and private sectors is one that we 
should seize enthusiastically. Public-private sector cooperation too often is afflicted with
bureaucratic viscosity, management difficulties and basic problems in reaching the stated
goals. To my view, this opportunity appears to be one that will lend itself well to avoiding 
these pitfalls. The benefits to both sides and to the public at large, of a successful endeavor
are indeed great and the commitments and progress will be visible and accountable in large 
measure by both sides. 

The federal program appears to be already responding with renewed resolve to this 
opportunity by rethinking the strategy and replanning the sequencing programs and I expect 
the genome community at large, both public and private, will recognize the critical nature of 
this moment and seize the opportunity to make the most of it. 

This completes my prepared testimony. I would be happy to answer any questions. 


David J. Galas, Ph.D. 

David J. Galas, Ph.D. is currently Executive Director of Darwin Discovery, 
Chiroscience Group plc. and is also President and Chief Scientific Officer at
Chiroscience R&D, Inc., formerly Darwin Molecular Corporation. Before
joining Darwin in August of 1993, Dr. Galas served as Director for Health and 
Environmental Research, U.S. Department of Energy, where he had 
responsibility for the Human Genome Project and all biological and 
environmental research. He assumed his position with the Department of 
Energy in 1990 after nine years on the faculty of the University of Southern 
California, Department of Biological Sciences as Professor of Molecular 
Biology. Dr. Galas also held positions at the University of California - 
Lawrence Livermore National Laboratory and the University of Geneva 
(Switzerland) before joining USC in 1981. 

Dr. Galas has been a member of many federal and academic advisory groups 
including the National Biotechnology Policy Board, the National Cancer 
Advisory Board, and the National Academy of Science Research Council 
Board on Biology. He chaired the biotechnology Research Subcommittee of 
the Federal Coordinating Council on Science and Technology. 

His research interests have included the study of the transposition of genetic 
elements and their consequences, and the study of DNA-protein interactions. 
He has developed several techniques used in molecular biology research, 
including the widely used DNA "footprinting" technique, a method often 
used to define DNA sequence-specific sites for DNA-binding proteins 
controlling gene expression for DNA replication and recombination. He has a 
long-term interest in the development of interdisciplinary research in the 
biological sciences and the applications of diverse technologies to biological 

Dr. Galas received his B.A. in Physics at the University of California at 
Berkeley in 1967 followed by a Master's degree and a Ph.D. in Physics from the 
University of California, Davis-Livermore in 1968 and 1972 respectively.



David J. Galas, Ph.D.


Chief Scientific Officer

June 15, 1998 


Mr. Ken Calvert 


Subcommittee on Energy and Environment

Suite 2320 Rayburn House Office Building

Washington, DC 20515 

Dear Mr. Calvert 

In reference to your letter dated June 10, 1998, please be advised that Chiroscience
R&D, Inc. has not received federal funding during the current and two preceding
fiscal years relating to this testimony.

David J. Galas 

Chiroscience R&D, Inc., 1631 220th Street SE, Bothell, WA 98021, USA


Chairman Calvert. Thank you, Doctor. 
Doctor Olson. 


Mr. Olson. Thank you, Mr. Chairman. I'm here to provide the 
perspective of an academic researcher who has been involved in 
what is now called genome analysis for over 20 years. Indeed, my 
involvement dates to a time when the term genome was rarely 
used, even in scientific circles, and had yet to have any impact 
whatsoever on public discourse. Since then, of course, the times 
have changed as this hearing and the intensive press coverage of 
the Perkin-Elmer announcement indicate. They've changed, per- 
haps, foremost because the singular historical opportunity that we 
now face to unravel the molecular details of how the information 
is stored and what the information is that guides the transformation
of a fertilized egg into a fully-developed human being has
caught both the popular and the scientific imagination. 

More practically, and, perhaps, more forcefully in the short run, 
times have changed as the immediate value of the data produced 
by genome analysis has become evident, particularly the value of 
DNA sequence data. These data have a high scientific value and 
also a high value in dollars, yen, and Euros. Thus, entering a major 
participation of the commercial, injecting a major participation of 
the commercial sector into what had previously been predomi- 
nantly a basic science initiative. 

Congress now faces a new challenge of understanding and re- 
sponding to a scientific environment in the human genome project 
that has all of the chaos that comes with scientific and policy success.
My basic message in this turbulent environment is quite simple
and that is that the system is working. It is important to keep in 
mind that biomedical research in the United States derives its for- 
midable strength from the synergy between three sectors, the bio- 
technology industry, the more traditional pharmaceutical industry, 
and academic and publicly-supported research. All of these sectors 
are scrambling in their own ways to adjust to our sudden ability 
to produce DNA sequence on a large scale. In this context the 
Perkin-Elmer announcement is a bold example of the response of 
the biotech sector to these opportunities. 

Perkin-Elmer is adopting here an overtly biotech style of oper- 
ation despite its roots as a manufacturer of scientific instruments 
and reagents. It's a hallmark of the biotech style that time is of the 
essence and publicity is a key tool for influencing events. Those of
us who are watching this spectacle from the sidelines should certainly
wish Perkin-Elmer well. The company's investment will surely lead 
to faster testing of new reagents and instrumentation and also will 
produce much data that will be of both commercial value and basic 
scientific interest. 

However, the excitement generated by the well-orchestrated pub- 
lic relations campaign surrounding this announcement should not 
disguise that what we have at the moment is neither new technology
nor even new scientific activity. What we have is a press release.
And I believe I speak for many academic spectators
when I say I look forward to a transition from plans to reality. In 
short, show me the data. 

I cannot emphasize too strongly that science by press release 
and, worse yet, science policy by press release is not a path that 
the United States Congress or the federal agencies want to walk
down. I believe that the overwhelming risk for the publicly-funded 
program is one of overreaction. What the Perkin-Elmer initiative 
offers with the greatest probability is that the immediate needs of the
biological community during a period of a few years, roughly in the 
interval 2000 to 2003 may be better met than would otherwise 
have been the case. And I hope that the project is successful and 
that the data are sufficiently accessible to the scientific community 
that this promise is met. 

However, in the larger scheme of the Human Genome Project, we 
would all be unwise to focus on so transient a contribution. The
case for the transience of these data's value lies in one's assessment,
in advance of any real basis to make such a judgment, of the
likely quality of the final product, as has been mentioned repeatedly
others at the table and will be a subject of intensive technical dis- 
cussion for some years to come. 

I, frankly, am a skeptic that the approaches as publicly described 
will lead to a product of sufficient quality to meet the long-term 
needs of the scientific community. I'm prepared to be proven wrong, 
as any scientist must be, but I am comfortable predicting that this 
approach, as the downside of its efficiency, will encounter reasonably
catastrophic problems at the stage at which the tens of millions
of independent sequencing tracks need to be melded together
to produce a composite view of the human genome. 

To be specific, I'm comfortable predicting that there will be over 
100,000 serious gaps in the final product and in this context, I define
a serious gap as one in which there is uncertainty even as to how
one should orient and align the islands of assembled sequence between
the gaps. Furthermore, I'll predict that a substantial fraction,
particularly of the smaller islands of sequence produced, will be
misassembled, that is, they will not actually correspond to the organization
of the human genome, and I say these things being thoroughly
familiar with, and admiring of, TIGR's success in sequencing bacterial
genomes by what superficially would appear to be a similar

I want to emphasize that even such data will certainly have con- 
siderable biological utility and it may prove to be a major help in 
the final push toward a high quality human sequence, although I 
would also emphasize that this prospect is somewhat less certain. 
Experience has tended to show that large amounts of low-quality 
sequence data are a poor substitute for smaller amounts of high- 
quality data collected for the specific purpose of assembling a con- 
tiguous, accurate sequence which I believe should continue to be, 
with a minimum of distractions, the focus of the publicly-funded effort.

Clearly, as time develops, if data from this private initiative
prove to be of clear utility in achieving that publicly-financed goal,
other strategies should, and will, adapt. I want to emphasize that 
there are two reasons to aim high in terms of the quality of the 


final human sequence. And, frankly, I am much more concerned 
about the force of these arguments than I am about the oppor- 
tunity costs, although I acknowledge there will be opportunity 
costs, associated with relatively transient delays in the availability 
of the final product. 

The two reasons have to do, first, with deferred costs as a prac- 
tical reason. A human sequence that has many deficiencies will 
defer for decades to come, throughout the biomedical research en- 
terprise, the need to fix small problems as they are encountered by 
individual investigators. The other argument is perhaps even more 
important in taking the broad view of public policy in this matter. 
And that is that all of us, as we build the total package of activity 
in the public sector, the private sector, throughout biomedical and 
agricultural product research, we need, collectively, to achieve an
extremely high standard in human genetics. We should start with 
an extremely high scientific standard and not waver in our commit- 
ment to that goal. 

The human genome sequence is part of that commitment. A more 
important part, built upon it, will be our study of human variation 
and the biological consequences of that variation. 

So, I have some additional comments in my written records, but 
I hope, for the purposes of this hearing, that the Congressional 
message to the federal agencies responsible in this area will be that 
you are proud of your institution's role in initiating this project and 
look forward, as I do, to the production of a sequence that is freely 
accessible to all scientists, delivered on schedule, and of impeccable quality.

Thank you. 

[The prepared statement and attachments of Mr. Olson follow:] 


Testimony of Maynard V. Olson before the House Committee on Science, Subcommittee
on Energy and Environment, scheduled for June 17, 1998

The Human Genome Project has come a long way since its fragile beginnings a decade
ago. In its early years, the proposal to develop a complete DNA sequence of the human
genetic material often seemed an idea ahead of its time: the project's feasibility could
reasonably be questioned, there was little support amongst rank-and-file biologists, and
the pharmaceutical and agricultural-products industries were disengaged. Now, residual
technical arguments involve minor squabbles between experts, basic and applied 
biological research is reorganizing itself around the assumption that complete genome 
sequences will soon be available for all intensively studied organisms, and the 
commercial sector has emerged as a major player in large-scale genome analysis. Indeed, 
we not only now have a vigorous biotech industry — in which the United States is the 
undisputed world leader — but a whole tier of "genomics" companies created to meet the 
insatiable demand for specialized data about genomes that has arisen throughout the 
biotechnology, pharmaceutical and agricultural-products industries. 

It is worth reflecting briefly on the reasons for this success. First, there are the scientific 
fundamentals. We have only known for a few decades that all life is based on digital 
information — the "base-four" code of DNA sequence that is now featured even on movie 
marquees (as in the movie title "GATTACA," which is simply a short bit of DNA 
sequence expressed with the four standard symbols G, A, T, and C). The information 
present in a human sperm or egg cell is encoded in 3 billion G's, A's, T's, and C's. Thus, 
the total information content of the human genome is only 750 Megabytes — about the 
capacity of a compact disc — an awe-inspiring level of data compression. 
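
The 750-Megabyte figure can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the testimony: it assumes that each of the four symbols is stored in exactly 2 bits, that a megabyte is one million bytes, and that the symbol-to-bit assignment in CODE is arbitrary. The function names genome_megabytes and pack are hypothetical.

```python
import math

# Assumption: the four symbols map to 2-bit codes in this arbitrary order.
CODE = {"G": 0, "A": 1, "T": 2, "C": 3}


def genome_megabytes(n_bases: int, alphabet_size: int = 4) -> float:
    """Size of n_bases symbols in megabytes (1 MB = 10**6 bytes)."""
    bits_per_base = math.log2(alphabet_size)  # 2 bits for a four-letter code
    return n_bases * bits_per_base / 8 / 1_000_000


def pack(seq: str) -> bytes:
    """Pack a string of G/A/T/C into 2 bits per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


if __name__ == "__main__":
    print(genome_megabytes(3_000_000_000))  # 750.0
    print(len(pack("GATTACA")))             # 2
```

Under these assumptions, 3 billion bases at 2 bits each come to 6 billion bits, or 750 million bytes, and the seven-letter string "GATTACA" fits in two bytes.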

Although the challenge of interpreting the human sequence will remain a central 
preoccupation of science for centuries to come, available sequence data already yield rich 
dividends. Most profoundly, computer-based methods of sequence comparison 
frequently allow detection of functionally informative similarities between genes 
discovered in different organisms. This feature of DNA sequences allows biologists 
studying human diseases to infer important lessons about the molecular basis of these 
pathological processes through gene-to-gene comparisons with the richly informative 
data already available about the genes of "model" organisms such as yeast and fruit flies. 

A former member of this institution, Rep. Claude Pepper, deserves great credit for having
recognized that biological research needed to be led aggressively into the information 
age. His support for establishment of the National Center for Biotechnology Information 
at the National Library of Medicine is one of the great success stories of proactive 
involvement by the Congress in the building of research infrastructure. The World Wide
Web site of the NCBI, on which DNA-sequence comparisons are the central activity, has 
become a major epicenter of biological research. 

As the NCBI story illustrates, the present success of genome analysis has roots in policy 
as well as science. In the Human Genome Project, Congress was actually ahead of the 
majority of scientists in recognizing that it was time to move boldly to create an 


information-based future for biomedical research. The establishment of the Human
Genome Project, which led in a few years to the creation of one of the NIH's most dynamic
and forward-looking Institutes, the National Human Genome Research Institute, was the
work of a relatively small group of committed scientists and federal officials, who
brought a strong case to Congress and received an equally strong response. This
response was all the more impressive given the draconian budgetary constraints that had 
to be overcome to bring the Human Genome Project into existence. 

The Congress now faces the new challenge of understanding and responding to a 
scientific environment in the Human Genome Project that has all the roiling chaos that 
comes with scientific and policy success. My basic message in this turbulent
environment is quite simple: the system is working. 

Biomedical research in the United States derives its formidable strength from the synergy 
between three dynamic sectors: academic research, the biotechnology industry, and the 
pharmaceutical industry. Academic research, with its reliance on federal funding and the
stewardship of a highly evolved resource-allocation system administered by the NIH and 
other federal agencies, is clearly "the goose that laid the golden egg." The 
pharmaceutical industry provides a powerful engine for translating new research into 
safe-and-effective products. As the pace of biological research has accelerated following
the development of recombinant-DNA techniques and the introduction of other new 
research tools, a whole industry — the increasingly important biotech sector — has arisen 
to respond rapidly to new commercial opportunities. This sector is characteristically 
quicker on its feet and more willing to take large business risks than the pharmaceutical 
industry. Time will tell whether the pharmaceutical and biotech sectors ultimately merge 
or retain their currently distinct identities. 

The present landscape in the Human Genome Project illustrates well the operation of all 
three sectors. The academic sector is focused on the creation of a high-quality reference 
sequence of the human genome, presently targeted for completion in 2005. This still- 
ambitious goal is defined in terms of rigorous quality-control standards enforced through 
a vigorous process of peer-reviewed scientific performance and peer-assessment of data 
quality. The academic sector is also responsible for the critical task of training a growing 
cohort of young scientists who can lead genome analysis into its open-ended future.
Similarly, academic research is the incubator in which new technical approaches and new 
applications of genome analysis to biology are under development. 

Increasingly, the pharmaceutical industry is redirecting long-term drug-discovery 
programs to exploit the new opportunities provided by an avalanche of sequence data, 
data that are leading daily to the discovery of new genes, new proteins, and new 
functional dimensions to life processes. In addition to its primary reliance on public-
domain sequence data, the pharmaceutical industry is building in-house data-collection
capabilities — and even more dramatically — pursuing such data through a host of 
contracts, partnerships, and other relationships with biotech and genomics companies. 


It is against this background that the recently announced Perkin Elmer initiative to 
accumulate a large database of DNA sequences sampled directly from the human genome 
should be viewed. Although traditionally a manufacturer of scientific instruments and 
research reagents, Perkin Elmer is adopting, in this venture, an overtly "biotech" style of 
operation. The business risks are considerable since it remains unclear how the company 
will recover its substantial investment. Furthermore, as is a hallmark of biotech research, 
time is of the essence and publicity is a key tool for influencing events. Those of us who 
are watching this spectacle from the sidelines (i.e., as neither participants nor
competitors) should wish Perkin Elmer well. The company's investment will surely 
stimulate rapid reduction-to-practice of new reagents and instrumentation and will also 
produce much data that will be both of commercial value and basic scientific interest. 
However, the excitement generated by the well-orchestrated public-relations campaign 
surrounding the Perkin Elmer announcement should not disguise that what we have at the 
moment is neither new technology nor even new scientific activity: what we have is a 
press release. I believe that I speak for many academic spectators when I say that I look
forward to a transition from plans to reality. In short, "Show me the data." 

The risk here for the publicly funded program is one of overreaction. What the Perkin 
Elmer initiative offers is the possibility that the immediate needs of the biological 
community during a period of 2-3 years, roughly in the interval 2000-2003, may be 
better met than would otherwise have been the case. I hope that the project is successful
and that the data are sufficiently accessible to the scientific community that this promise 
is met. However, in the larger scheme of the Human Genome Project, we would all be 
unwise to focus on so transient a contribution. 

The case for the transience of these data's value lies in the likelihood that they will be of 
poor quality. While I am prepared to be proven wrong, as any scientist must be, I am 
equally prepared to put my reputation as a scientific prognosticator on the line in 
predicting that the Perkin Elmer initiative will fail to produce a sequence of the human 
genome that will meet the long-term needs of the scientific community. Specifically, I 
predict that the proposed technical strategy for sampling human DNA sequences will 
encounter catastrophic problems at the stage at which the tens of millions of individual 
tracts of DNA sequence must be assembled into a composite view of the human genome. 
Based on extensive experience with the assembly of composite human DNA sequences in 
our genome center and other laboratories, I predict that there will be over 100,000
"serious" gaps in the assembled sequence: a "serious" gap, in this context, is one in
which there is uncertainty even as to how to orient and align the islands of assembled 
sequence between the gaps. Furthermore, I predict that a significant fraction of the small 
islands between serious gaps will be misassembled (i.e., they will not actually correspond 
to the organization of the human genome). 

Even such fragmentary data will certainly have considerable biological utility.
Furthermore, it may prove to be a substantial help in the final push toward a high-quality 
human sequence, although this prospect is less certain. Experience has tended to show 
that large amounts of low-quality sequence data are a poor substitute for smaller amounts 


of high-quality data collected for the specific purpose of assembling a contiguous, 
accurate sequence. 

It is of the utmost importance that a vigorous public effort be maintained that is directed 
toward the development of a sequence that will meet the test of time. There are two 
compelling rationales for aiming high in terms of the quality of this sequence. In 
practical terms, any other approach will defer large costs, diffusing them across the 
biomedical research enterprise for decades to come as individual investigators are left to 
complete and correct the reference sequence in regions of the genome where the data are 
inadequate to meet their particular needs. Perhaps still more important is the need to set a 
high standard in all aspects of human genetics, starting with an unwavering commitment 
to quality in the Human Genome Project's flagship mission. Although I have confidence 
that the spectacular advances we are currently witnessing in human genetics will lead to 
great public benefit, I do not share the view, expressed in some quarters, that the speed
of generating data must take precedence over all other considerations. An element of 
caution in developing this first comprehensive view of the human genetic material is 
advisable. High scientific standards tend to be infectious. I would like the legacy of my
involvement in the Human Genome Project to be a product that will not only facilitate the 
research of future scientists but will also inspire them to set a similarly high scientific 
standard as they interpret the sequence and study its variation from one human to another 
and the effects of that variation on human biology. 

For its part in bringing about this future, I would advise Congress to wait and watch
rather than to attempt to provide detailed guidance to the involved agencies. At root, 
many of the issues are deeply technical and Congress is the wrong forum in which to 
debate the relative merits of capillary-gel electrophoresis vs. slab-gel electrophoresis, 
whole-genome "shotgun" sampling vs. a clone-by-clone approach, and so forth. The 
agencies need a more general sense of how Congress views the public benefit associated 
with the Human Genome Project. I hope that the Congressional message will be that you 
are proud of your institution's role in initiating this project and look forward, as I do, to 
the production of a sequence that is freely accessible to all scientists, delivered on 
schedule, and of impeccable quality. 

I would like to close by identifying three areas of concern that I do think bear further
scrutiny by appropriate Congressional processes. First, I think there is a strong case for
increased funding for the National Human Genome Research Institute, although my 
argument for increased funding would differ from that of many of my colleagues. I 
believe that the current NHGRI budget is actually adequate, in combination with funding 
through other channels, to produce a quality human sequence by 2005. Given the large 
technical uncertainties, I think the National Research Council Committee on the Mapping 
and Sequencing of the Human Genome, on which I had the honor of serving, did a good 
job of projecting the cost of the Human Genome Project. Indeed, it also did a good job of 
estimating the time required to complete the project. I doubt that the current schedule 
could be much accelerated without encountering human-resource bottlenecks that would 
be difficult to overcome. 


However, I am concerned that without expanded funding, the peak phase of data 
production for human sequencing will drain other valuable activities at the NHGRI. The
NRC Committee did not fully envision the rapidity with which genome analysis would
open up new opportunities in biological research. Indeed, the Perkin Elmer proposal is 
but one symptom of the magnitude and immediacy of these opportunities. While moving 
ahead toward its flagship goal of producing a quality human sequence, the NHGRI also 
faces increasing responsibilities to identify and stimulate research avenues opened by the 
early successes of the Human Genome Project. These opportunities include development
of new technology, improved computational methods for analyzing DNA sequence, 
approaches to the comprehensive functional analysis of genomes, and — perhaps most 
profoundly — characterization of natural variation in human DNA. In my view, the 
strongest case for increased NHGRI funding lies in its excellent track record and the 
continuing expansion of research opportunities in areas that go beyond the Institute's core 
mission but which provide critical links between the emerging human sequence and the 
rest of biological research. 

Two other issues, which are illustrated by, but not narrowly related to, the Perkin Elmer 
initiative bear Congressional attention. The most important concerns the influence of 
intellectual-property law on the research enterprise. Particularly in areas where the 
interests of the three major sectors of biomedical research (academe, the pharmaceutical
industry, and the biotechnology industry) diverge, there are increasing signs of trouble.
The pharmaceutical industry has legitimate concerns that it has become too easy for 
biotechnology companies to acquire valuable intellectual-property rights through cream- 
skimming research investments. Continuation of the current system risks the 
accumulation of disincentives for drug development in certain areas or, alternately, 
diversion of the attention of pharmaceutical companies into purely defensive acquisition 
of their own tenuous intellectual-property claims. Academic research faces other concerns.
Foremost amongst these are situations in which the conduct of basic research in the non- 
profit sector — the very research on which our current success rests — is distorted by 
conflicts over intellectual property and access to data. In the worst cases, commercial 
owners of intellectual property are using their property to attempt to impede research in 
the non-profit sector when they do not see that research as compatible with their short- 
term interests. 

A more direct warning posed by the Perkin Elmer initiative is that academic researchers 
risk losing equal access to critical research tools. These tools, such as advanced 
instrumentation for DNA analysis, are increasingly seen as a means through which their 
developers can acquire intellectual property rather than as products in their own right.
Perhaps if the microscope were a contemporary invention, we would find optical 
companies competing to sell images rather than microscopes. Basic scientists need 
access to state-of-the-art research tools, not just to the output of these tools. However,
the tools themselves are now universally refined, manufactured, and marketed by private 
companies rather than by basic researchers themselves. Hence, tool-making companies 
are in a powerful position to influence the directions that basic research takes and the 
distribution of that research between the non-profit and for-profit sectors. 
Instrumentation provides one simple illustration of this dynamic; however, even more


problematic situations arise in areas such as reagents, analytical processes, and reference 
databases. There are no simple answers to the resultant dilemmas, but the public interest
in keeping basic researchers well equipped to do their work is clear. The United States is 
the world leader in an area that is central to the human future — biomedical and 
agricultural research — and it has gained this enviable position by coupling the world's 
strongest system of research universities to an aggressive commercial sector. Effort 
expended fine-tuning the relationship between these parties will be effort well spent. 







California Institute of Technology, Pasadena, CA; June, 1965; major field, 


Stanford University, Stanford, CA; January, 1970; major field, inorganic 

chemistry; thesis advisor, Henry Taube; thesis title: I. Studies with maleate

as a ligand; II. 17O magnetic resonance of aqueous solutions of V(II) and


Awards and Honors 

Undergraduate: Graduated from Caltech with Honors (1965)

Graduate: National Science Foundation Graduate Fellowship 

Postdoctoral: National Institutes of Health Individual Postdoctoral 

Fellowship (1977-1979)
Professional: Genetics Society of America Medal (1992)

Fellow of the American Association for the Advancement of 

Science (1993) 
National Academy of Sciences (1994) 

Positions Held 

9/69 - 1/76 Instructor and Assistant Professor, Department of Chemistry, 

Dartmouth College, Hanover, New Hampshire 
9/74 - 8/75 Visiting Scholar, Department of Genetics, University of 

Washington, Seattle, Washington 
2/76 - 7/79 Research Associate, Department of Genetics, University of 

Washington, Seattle, Washington 
8/79 - 8/92 Assistant Professor, Associate Professor, and Professor of Genetics, 

Washington University School of Medicine, St. Louis, Missouri 
10/89 - 8/92 Investigator, Howard Hughes Medical Institute at Washington 

University, St. Louis, Missouri 
9/92 - 9/97 Professor of Molecular Biotechnology, University of

Washington, Seattle, Washington 
9/92- Professor of Medicine (Division of Medical Genetics), University of 

Washington, Seattle, Washington 
8/96- Adjunct Professor of Computer Science, University of Washington, 

Seattle, Washington 
7/97- Professor of Genetics, University of Washington, Seattle, Washington 


Professional Service 

(1987 - 1988): National Research Council Committee on Mapping and Sequencing 

the Human Genome 
(1987 - 1988): Genetics Study Section of the National Institutes of Health 
(1989 - 1992): Program Advisory Committee on the Human Genome of the 

National Institutes of Health 
(1994- ): National Research Council Government-University-Industry

Research Roundtable Council 
(1994- ): Chairman, Genome Research Review Committee of the National 

Human Genome Research Institute, National Institutes of Health 

Society Memberships 

American Association for the Advancement of Science 
Genetics Society of America 

Editorial Boards 

Genome Research 
Human Genetics 

Publications (Research Papers) 

Olson, M.V., Kanazawa, Y., and Taube, H. (1969). 17O magnetic resonance of aqueous
solutions of vanadium(II) and chromium(III). J. Chem. Phys. 51: 289-296.

Olson, M.V. and Taube, H. (1970). Hydration and isomerization of coordinated maleate. J.
Amer. Chem. Soc. 92: 3236-3237.

Olson, M.V. and Taube, H. (1970). The chromium(II) reduction of maleatopentaammine-
cobalt(III). Inorg. Chem. 9: 2072-2081.

Olson, M.V. (1973). Reaction between ethylenediaminetetraacetic acid and carboxylato-
pentaaquochromium(III) complexes. Inorg. Chem. 12: 1416-1423.

Olson, M.V. and Behnke, C.E. (1974). Kinetics of the spontaneous ring-closing and aquation
reactions of malonatopentaaquochromium(III). Inorg. Chem. 13: 1329-1334.

Goodman, H.M., Olson, M.V., and Hall, B.D. (1977). Nucleotide sequence of a mutant
eukaryotic gene: the yeast tyrosine-inserting ochre suppressor SUP4-o. Proc. Natl. Acad. Sci.
USA 74: 5453-5457.

Olson, M.V., Montgomery, D.L., Hopper, A.K., Page, G.S., Horodyski, F., and Hall, B.D.
(1977). Molecular characterisation of the tyrosine tRNA genes of yeast. Nature 267: 639-641.

Olson, M.V., Hall, B.D., Cameron, J.R., and Davis, R.W. (1979). Cloning of the yeast tyrosine
transfer RNA genes in bacteriophage lambda. J. Mol. Biol. 127: 285-295.

De Robertis, E.M. and Olson, M.V. (1979). Transcription and processing of cloned yeast
tyrosine tRNA genes microinjected into frog oocytes. Nature 278: 137-143.

Olson, M.V., Loughney, K., and Hall, B.D. (1979). Identification of the yeast DNA sequences 
that correspond to specific tyrosine-inserting nonsense suppressor loci. J. Mol. Biol. 132: 

Olson, M.V., Page, G.S., Sentenac, A., Piper, P.W., Worthington, M., Weiss, R.B., and Hall,
B.D. (1981). Only one of two closely related yeast suppressor tRNA genes contains an
intervening sequence. Nature 291: 464-469.

Shalit, P., Loughney, K., Olson, M.V., and Hall, B.D. (1981). Physical analysis of the
CYC1-sup4 interval in Saccharomyces cerevisiae. Mol. Cell Biol. 1: llZ-llib.

Sandmeyer, S.B. and Olson, M.V. (1982). Insertion of a repetitive element at the same position
in the 5'-flanking regions of two dissimilar yeast tRNA genes. Proc. Natl. Acad. Sci. USA 79:

Brodeur, G.M., Sandmeyer, S.B., and Olson, M.V. (1983). Consistent association between
Sigma elements and tRNA genes in yeast. Proc. Natl. Acad. Sci. USA 80: 3292-3296.

Carle, G.F. and Olson, M.V. (1984). Separation of chromosomal DNA molecules from yeast
by orthogonal-field-alternation gel electrophoresis. Nucleic Acids Res. 12: 5647-5664.

Fischhoff, D.A., Waterston, R.H., and Olson, M.V. (1984). The yeast cloning vector YEp13
contains a tRNAs'^ gene that can mutate to an amber suppressor. Gene 11: 239-251.

Gray, A.J., Beecher, D.E., and Olson, M.V. (1984). Computer-based image analysis of one-
dimensional electrophoretic gels used for the separation of DNA restriction fragments. Nucleic
Acids Res. 12: 473-491.

Shaw, K.J., and Olson, M.V. (1984). Effects of altered 5'-flanking sequences on the in vivo
expression of a Saccharomyces cerevisiae tRNA^'^ gene. Mol. Cell Biol. 4: 657-65.

Carle, G.F. and Olson, M.V. (1985). An electrophoretic karyotype for yeast. Proc. Natl. Acad.
Sci. USA 82: 3756-3760.

Helms, C., Graham, M.Y., Dutchik, J.E., and Olson, M.V. (1985). A new method for purifying
lambda DNA from phage lysates. DNA 4: 39-49.


Burke, D.T. and Olson, M.V. (1986). Oligodeoxynucleotide-directed mutagenesis of
Escherichia coli and yeast by simple cotransformation of the primer and template. DNA 5:

Carle, G.F., Frank, M., and Olson, M.V. (1986). Electrophoretic separations of large DNA 
molecules by periodic inversion of the electric field. Science 232: 65-68. 

Olson, M.V., Dutchik, J.E., Graham, M.Y., Brodeur, G.M., Helms, C., Frank, M., MacCollin,
M., Scheinman, R., and Frank, T. (1986). Random-clone strategy for genomic restriction
mapping in yeast. Proc. Natl. Acad. Sci. USA 83: 7826-7830.

Burke, D.T., Carle, G.F., and Olson, M.V. (1987). Cloning of large segments of exogenous
DNA into yeast by means of artificial chromosome vectors. Science 236: 806-812.

Graham, M.Y., Otani, T., Boime, I., Olson, M.V., Carle, G.F., and Chaplin, D.D. (1987).
Cosmid mapping of the human chorionic gonadotropin beta subunit genes by field-inversion gel
electrophoresis. Nucleic Acids Res. 15: 4437-4448.

Johnson, D.I., Jacobs, C.W., Pringle, J.R., Robinson, L.C., Carle, G.F., and Olson, M.V.
(1987). Mapping of the Saccharomyces cerevisiae CDC3, CDC25, and CDC42 genes to
chromosome XII by chromosome blotting and tetrad analysis. Yeast 3: 243-253.

Riles, L. and Olson, M.V. (1988). Nonsense mutations in essential genes of Saccharomyces
cerevisiae. Genetics 118: 601-607.

Brownstein, B.H., Silverman, G.A., Little, R.D., Burke, D.T., Korsmeyer, S.J., Schlessinger,
D., and Olson, M.V. (1989). Isolation of single-copy human genes from a library of yeast
artificial-chromosome clones. Science 244: 1348-1351.

Riethman, H.C., Moyzis, R.K., Meyne, J., Burke, D.T., and Olson, M.V. (1989). Cloning
human telomeric DNA fragments into Saccharomyces cerevisiae using a yeast
artificial-chromosome vector. Proc. Natl. Acad. Sci. USA 86: 6240-6244.

Green, E.D. and Olson, M.V. (1990). Systematic screening of yeast artificial-chromosome
libraries by use of the polymerase chain reaction. Proc. Natl. Acad. Sci. USA 87: 1213-1217.

Green, E.D. and Olson, M.V. (1990). Chromosomal region of the cystic fibrosis gene in yeast
artificial chromosomes: A model for human genome mapping. Science 250: 94-98.

Drury, H., Green, P., McCauley, B., Olson, M.V., Politte, D.G., and Thomas, L.J. Jr. (1990).
Spatial normalization of one-dimensional electrophoretic gel images. Genomics 8: 119-126.

Imai, T. and Olson, M.V. (1990). Second-generation approach to the construction of yeast
artificial-chromosome libraries. Genomics 8: 297-303.


Link, A.J. and Olson, M.V. (1991). Physical map of the Saccharomyces cerevisiae genome at
110-kb resolution. Genetics 127: 681-698.

Huxley, C., Hagino, Y., Schlessinger, D., and Olson, M.V. (1991). The human HPRT gene on
a yeast artificial chromosome is functional when transferred to mouse cells by cell fusion.
Genomics 9: 742-750.

Gnirke, A., Barnes, T.S., Patterson, D., Schild, D., Featherstone, T., and Olson, M.V. (1991).
Cloning and in vivo expression of the human GART gene using yeast artificial chromosomes.
The EMBO Journal 10: 1629-1634.

Green, E.D., Riethman, H.C., Dutchik, J.E., and Olson, M.V. (1991). Detection and
characterization of chimeric yeast artificial-chromosome clones. Genomics 11: 658-669.

Green, E.D., Mohr, R.M., Idol, J.R., Jones, M., Buckingham, J.M., Deaven, L.R., Moyzis,
R.K., and Olson, M.V. (1991). Systematic generation of sequence-tagged sites for physical
mapping of human chromosomes: Application to the mapping of human chromosome 7 using
yeast artificial chromosomes. Genomics 11: 548-564.

Kwok, P.-Y., Gremaud, M.F., Nickerson, D.A., Hood, L., and Olson, M.V. (1992).
Automatable screening of yeast artificial-chromosome libraries based on the oligonucleotide-
ligation assay. Genomics 13: 935-941.

Riles, L., Dutchik, J.E., Baktha, A., McCauley, B.K., Thayer, E.C., Leckie, M.P., Braden,
V.V., Depke, J.E., and Olson, M.V. (1993). Physical maps of the six smallest chromosomes of
Saccharomyces cerevisiae at a resolution of 2.6 kilobase pairs. Genetics 134: 81-150.

Olson, M.V. and Green, P. (1993). Criterion for the completeness of large-scale physical
maps of DNA. Cold Spring Harb. Symp. Quant. Biol. Vol. 58: 349-355.

Gnirke, A., Iadonato, S.P., Kwok, P.-Y., and Olson, M.V. (1994). Physical calibration of
yeast-artificial-chromosome-contig maps by RecA-assisted restriction endonuclease
(RARE) cleavage. Genomics 14: 199-210.

Gillett, W., Hanks, L., Wong, G.K.-S., Yu, J., Lim, R., and Olson, M.V. (1996).
Assembly of high-resolution maps based on multiple complete digests of a redundant set
of overlapping clones. Genomics 33: 389-408.

Wong, G.K.-S., Yu, J., Thayer, E.C., and Olson, M.V. (1997). Multiple-Complete-Digest
(MCD) Restriction-Fragment Mapping: Generating Sequence-Ready Maps for Large-
Scale DNA Sequencing. Proc. Natl. Acad. Sci. USA 94: 5225-5230.


Publications (Review articles, Book chapters, Essays)

Smith, R.P. and Olson, M.V. (1973). Drug-induced methemoglobinemia. Seminars in
Hematology 10: 253-268.

Olson, M.V. and Crawford, J.M. (1975). Semi-micro ion exchange in the freshman laboratory.
J. Chem. Educ. 52: 546-549.

Olson, M.V., Page, G.S., Sentenac, A., Loughney, K., Kurjan, J., Benditt, J., and Hall, B.D.
(1980). Yeast suppressor tRNA genes. Transfer RNA: Biological Aspects (Söll, D., Abelson,
J.N., and Schimmel, P.R., Eds), pp. 267-279, Cold Spring Harbor Laboratory, Cold Spring
Harbor, N.Y.

Olson, M.V. (1981). Applications of molecular cloning to Saccharomyces. Genetic
Engineering, Vol. 3 (Setlow, J.K. and Hollaender, A., Eds), pp. 57-88, Plenum Press, New
York, NY.

Carle, G.F. and Olson, M.V. (1987). Orthogonal-field-alternation gel electrophoresis. Methods
in Enzymology, Vol. 155 (Wu, R., Ed), pp. 468-482, Academic Press, San Diego, CA.

Helms, C., Dutchik, J.E., and Olson, M.V. (1987). A lambda DNA protocol based on
purification of phage on DEAE-cellulose. Methods in Enzymology, Vol. 153 (Wu, R., and
Grossman, L., Eds), pp. 69-82, Academic Press, San Diego, CA.

Olson, M.V. (1989). Separation of large DNA molecules by pulsed-field gel electrophoresis. J. 
Chromat 470: 377-383. 

Olson, M.V. (1989) Pulsed field gel electrophoresis. Genetic Engineering . Vol. 11 (Setlow, 
J K., Ed), pp 183-227, Plenum Press, New York, NY. 

Olson, M , Hood, L., Cantor, C , and Botstein, D. (1989) A common language for physical 
mapping of the human genome. Science 245: 1434-1435. 

Burke, D T. and Olson, M V. (1991). Preparation of clone libraries in yeast artificial- 
chromosome vectors Methods in Enzymology . Vol. 194 (Guthrie, C, and Fink, G.R., Eds), 
pp. 251-270, Academic Press, New York, NY. 

Olson, M.V. (1991). The Human Genome Project and analytical chemistry: a tale of two cities. 
Analyt. Chem. Vol 63: 416A-420A. 

Olson, M.V. (1991). Genome structure and organization in Saccharomyces cerevisiae. The 
Molecular Biology of the Yeast Saccharomyces: Genome Dynamics, Protein Synthesis, and
Energetics (Broach, J.R., Pringle, J.R., and Jones, E.W., Eds), pp. 1-39, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.

Olson, M.V. (1993). The Human Genome Project. Proc. Natl. Acad. Sci. USA 90: 4338-4344.

Olson, M.V. (1995). A time to sequence. Science 270: 394-396.


Funding received by Dr. Maynard V. Olson 

Agency and Grant Number: NIH/NHGRI 5 R01 HG01475
Grant Title: UW Genome Center-NIH
Dates of Entire Project Period:
Total Costs for Project:

Agency and Grant Number: NIH/NHGRI 5 R01 HG01475
Grant Title: UW Genome Center-NIH98
Dates of Entire Project Period:
Total Costs for Project:

Agency and Grant Number: Cystic Fibrosis Foundation
Grant Title: CF - P. aeruginosa
Dates of Entire Project Period: 12/31/96 - 12/31/98
Direct Costs for Project:

Agency and Grant Number:
Grant Title: Pathogenesis - P. a.
Dates of Entire Project: 12/31/96 - 12/31/98
Direct Costs for Project:

Agency and Grant Number:
Grant Title: Discovering and Scoring
Dates of Entire Project Period:
Direct Costs for Project:



Chairman Calvert. Thank you, Doctor. 



Chairman CALVERT. This question is first for Dr. Patrinos and 
Dr. Collins. Doctors, in a guest column in The New York Times, Dr.
William Haseltine, a former Harvard Medical School professor and
CEO of his own genomics company said the following, "It makes lit- 
tle sense for the Federal Government to go to the trouble of decod- 
ing the junk DNA. The $3 billion of federal money now devoted to 
the entire human genome should be spent instead on university- 
based research initiated by individual medical investigators. The 
era of government-sponsored big science in which a few labora- 
tories receive as much as $10 million a year to analyze mostly junk 
DNA, while scientists doing disease-related research beg for financ- 
ing should end." 

At this point, if there is no objection, I would ask unanimous con- 
sent to insert the entire column at this point in this record and, 
hearing no objection, so ordered. 

[The information referred to follows:] 

Chairman CALVERT. And with that, I assume that each of you 
disagree and could you tell us why? 

Dr. Collins. Every new development in science or in public pol- 
icy tends to bring out of the woodwork individuals with fringe opin- 
ions who seek to take advantage of that new development to pro- 
mote their own agenda. In this instance, the comments you quote 
are those of an individual who has a transparent financial conflict 
of interest in making such assertions, given that the future of his 
particular business enterprise would be best served by genome 
projects of all sorts, public or private, ceasing to exist. In addition, 
there are statements in those remarks which I think the vast ma- 
jority, I would say greater than 99 percent of the scientific commu- 
nity, would profoundly disagree with. What Dr. Haseltine refers to 
as junk DNA includes sequences that play profound roles in juve- 
nile onset diabetes, in cancer, in osteoporosis, and many other dis- 
eases and that has been scientifically demonstrated. 

So, I would ask you not to consider that particular point of view 
as representative of the mainstream of scientific thought, either 
public or private. 

Chairman Calvert. Thank you for your clear answer. Dr.
Patrinos. [Laughter.] 

Dr. Patrinos. I certainly couldn't have said it better myself. 


Chairman Calvert. Dr. Galas, in his testimony, and this, again, 
is for Dr. Patrinos and Dr. Collins, says the Federal Government
approach should continue but should refocus its goals to produce a 
first draft, and that was indicated also by other witnesses, of the 
human genome as soon as possible. Will the government program 
consider this approach in collaboration with the private effort by 
Dr. Venter? Would you like to respond to that as well, Doctor? But,
go ahead. 

Mr. Patrinos. As I mentioned in my oral remarks, this is, in 
fact, our intention. I agree wholeheartedly with what Dr. Galas has
said about the value of providing this intermediate product as soon
as possible and we certainly plan to deliver that intermediate prod- 
uct in coordination and in full cooperation with private-sector ini- 
tiatives such as the initiative that Dr. Venter described. 

Dr. Collins. It is, actually, worthy of note that there is a plan- 
ning process under way right now for the NIH and the DOE ge- 
nome programs. Ari and I work together on all of these planning 
processes and there was a meeting just three weeks ago involving 
more than a hundred scientists from various fields, most of them 
not genome scientists, to look at the next 5 years of the genome 
program. This subject of whether or not the publicly-funded effort 
should revise its strategy in light of the new developments was in- 
tensely discussed. 

I think it's fair to say there is not complete unanimity on the an- 
swer to that question, in part, because of the uncertainty until that 
new initiative has moved forward a bit about exactly what it will 
look like. But I can certainly reassure you, this is being looked at 
with great intensity and I'm sure Ari would agree with me that as 
that data begins to become available we will be doing everything
possible to adjust the strategy to make the most of that and to get 
to the goal as quickly as possible. 


Chairman CALVERT. Let's briefly discuss new technologies. There 
was discussion about that today also. Is the federal program using 
the latest technologies, for example, the new robotics advances in 
the last several years in our endeavor on our — answer the question. 

Mr. Patrinos. There are indeed. Certainly among both our lab- 
oratory and academic performers in the human genome project 
there are many examples of cutting-edge technologies in robotics, 
sequencing technologies in general. This is a field, of course, that 
is rapidly changing. Advances are expected, as I mentioned earlier, 
and probably will be the norm rather than the exception, the sur- 
prising new developments, that is. 

Dr. Collins. I would agree with that. In fact, I would add to it 
the federally-funded effort is not only using the new technology, 
we're developing a lot of it. The NIH component of the Human Ge- 
nome Project spends $20 million a year on technology development. 
One of our successes is the DNA chip which was founded on the 
basis of a company that got going with an NIH grant about 4 or 
5 years ago. So we are intensely interested in technology develop- 
ment. Many of our grantees are engineers, they are not necessarily 
all biologists, computer scientists, robotics experts, and the like. 
This is part of our goal. 

Mr. Patrinos. Let me add also one thing. At least the Department
of Energy is investing some modest amount of funding in
some of the cutting-edge technologies that we expect will be in 
place not in the next few years, but maybe 10-20 years from now, 
ones that are sort of blue sky right now with respect to their fea- 
sibility because we know that technology changes very, very quick- 
ly and how we sequence 20 years from now will probably be en- 
tirely different than how we are sequencing today. 

Chairman Calvert. Thank you. Mr. Roemer. 



Mr. ROEMER. Thank you, Mr. Chairman. I first of all want to
thank the panel once again for your very helpful testimony on a 
very complicated subject. Certainly in my background in political 
science and in other areas that prepared me maybe better for run- 
ning for Congress than it did contemplating many of these very 
complicated questions that you experts deal with, we're very appre- 
ciative for your, not only your expert testimony but, I think the 
way that you've also presented your testimony today as well too, 
in a very helpful, very persuasive, and very collaborative sense. We 
haven't had complete unanimity from the panel today and I want 
to get to that point. But first of all, Dr. Venter, I want to ask, to 
make sure that I heard your remark and clarify on it. You said 
that in this collaborative effort, you would not encourage Congress 
to cut the budget. In fact, you would encourage the Congress to in- 
crease the budget for this particular project, even though we're see- 
ing this collaborative public/private partnership. Is that correct? 

Mr. Venter. That's absolutely correct, but not just for sequenc- 
ing of humans. It is because we're going to have the sequence so 
much faster that we can now move to the phase that all of us hope 
to in the envisioning of the human genome project in the first place 
is starting to interpret and understand that genetic code. It will not 
be interpretable without having mouse and other genome se- 
quences so the fact that human's going to be there faster, we need 
mouse even faster. Of the 60,000-80,000 human genes, there's only 
around 5,000 of those genes that have full-length cDNA sequences 
available to the worldwide community. Stepping up the effort so 
that every one of those human genes has a full-length cDNA se- 
quence, which can be done on a very broadly-distributed effort in 
America's universities, we'll move forward to make sure we have 
the tools on a broadest possible sense for everybody to use. There's 
more reasons to fund more genomic research now than there ever 
has been. 

Mr. ROEMER. So your testimony, which is very, you know, persuasive
and compelling testimony, you say that in this collaborative
effort, you are not replacing something that is being done in
the publicly-funded research. In fact, in this collaborative effort, 
you are working together in a partnership and that does not mean 
that slices should be taken out of the existing budget. 

Mr. Venter. Well, we're certainly trying our best to work together
and I don't think anything should be taken out of the budget.
I've heard from some of my colleagues here that they've been
criticized for wasting federal dollars based on this new announce- 
ment. I think that's a very unfair and unfortunate use of our an- 
nouncement for people who have the agenda to attack the pro- 
grams. I think it's a very different situation 3 years from now, per- 
haps looking back, if we are successful, and we would not have 
made this announcement if we didn't intend to be, but I think we, 
we want to be judged on our accomplishments, not by our press re- 
leases or announcements, and our accomplishments, hopefully, will 
show that it's wise to change the directions currently under way to 
work with us in a collaborative fashion to move this important re- 
search forward faster for everybody. 



Mr. ROEMER. Thank you. Dr. Collins, you said in your testimony, 
I believe you said in your testimony, that you had worked at the 
University of Michigan and you had worked on the cystic fibrosis 
and Huntington disease structuring or the DNA researching and
that that had taken close to a decade. You got some pretty strong
criticism from Dr. Olson, even though you have some practical experience
in academic life, he used pretty strong words such as this
is science by press release, this is public policy by press release. He 
predicted there are going to be 100,000 gaps in the final product 
and misassembled data and so forth. How do you, as somebody that 
has been in his shoes in academic life at the University of Michi- 
gan, respond to this rather strong criticism and, well, let me leave 
it at that. And, I would just say that you certainly were not shy 
when it came to your remarks about Dr. William Haseltine's re- 
marks as well too. 

Dr. Collins. Mr. Roemer, I think there's a little confusion in the 
nature of Dr. Olson's remarks. Again, I'm the person who is respon- 
sible for overseeing the federally-funded effort at the National In- 
stitutes of Health on the genome project. I believe his comments 
about difficulties in assembling the structure were related to the 
announcement by Perkin-Elmer and Dr. Venter and not directed at 
the publicly-funded effort. 

As a researcher who worked on cystic fibrosis and was fortunate 
to lead one of the two teams that worked together to find that 
gene, I can tell you that the 10 years that went by during that en- 
terprise where I, as a physician, had to keep explaining to families 
whose children were increasingly getting sicker that we hadn't 
found the gene yet because it was just too hard, were among the 
more frustrating years of my life and I don't wish that on anybody 
in the future. And that is one of the major motivators to do this 
project and to do it right. Actually, Dr. Olson and I are pretty much 
in sync on this. I do believe that until the Perkin-Elmer effort has 
produced, over the course of the next 2 or 3 years, the data that 
will be required to evaluate this strategy, that exactly what kind 
of a product comes out of it is not knowable. It's not that we're just 
not doing our homework to know it, it's not knowable. It is a prob- 
lem that hasn't been tried before and, therefore, I agree with Dr. 
Olson that the publicly-funded effort, which Dr. Patrinos and I are 
responsible for, should not drastically alter our strategy which is 
targeted toward having this final complete, highly-accurate product 
until we have some more data. 

Mr. ROEMER. But I'm asking you objectively as a scientist to com- 
ment on Dr. Olson's remarks about Dr. Venter, that's what the 
question was about, not a confusion as to where the criticism was 
coming from — or where it was directed. 

Dr. Collins. I think as, as I tried to say, that this approach to 
putting together the human genome sequence is bold. It is of uncer- 
tain success value. It could be that 2 or 3 years from now, as Dr. 
Olson is predicting, we end up with a rough draft which is actually 
rough enough that it is very difficult to work with. The publicly-funded
effort is probably the only part of this enterprise that's absolutely
dedicated to obtaining the completely contiguous, highly-accurate,
close-all-the-gaps enterprise and I think we need to take
that responsibility and take it seriously and will continue to do so. 
But I welcome this new initiative and look forward to seeing what's 
going to happen. It's a scientific experiment; we like that. Sci- 
entists are energized by the opportunity to see a new approach 
tried out. It will take a while to find out, but that's what science 
is all about. 

Mr. ROEMER. So, you are consistent in your initial enthusiasm 
with your testimony for Dr. Venter's efforts; however, you do have 
concerns as a scientist as to what it may produce. You may not 
agree with some of Dr. Olson's conclusions, but you are saying that 
first of all, this effort should go forward; secondly, you are excited 
about the potential; thirdly, you do have questions as Dr. Olson 
does about what the outcome may be? 

Dr. Collins. I think every scientist has to agree with Dr. Olson 
when he says show me the data; then I will make up my mind. 


Mr. ROEMER. Dr. Patrinos, you said in your initial testimony as 
well, that you're excited, you support this collaborative effort. You 
also said that you have some ethical and legal and social concerns. 
Can you be a little bit more specific as to what those might be and 
do they come back to some of Dr. Olson's concerns about access, 
privacy, or any of those other issues? 

Mr. Patrinos. Of course, as you know the Human Genome
Project from the very beginning identified the ethical, legal, and so- 
cial implications of this project as very important, in fact the HGP 
carved a significant piece of the budget from the very beginning to 
deal with those issues and that's something we've been doing, Dr.
Collins and I, for quite some time. My comment was mostly made 
in the context of the faster delivery of the product. In a sense the 
faster delivery of the product will confront us with many of the eth- 
ical and legal and social implications of the project that have been 
articulated by many of the scientists and the science managers in- 

Mr. Roemer. Please give me some examples of what that con- 

Mr. Patrinos. Issues of privacy and confidentiality of genetic in- 
formation, issues of insurance and employment discrimination, the 
multitude of issues in forensics. You know the list is endless, we 
can have an entire hearing solely devoted to this as I'm sure Dr. 
Collins would be delighted to have such a hearing because this is 
one of his very important private concerns. So I was making ref- 
erence to the issue of having that information faster than perhaps 
we had expected a few years back and, thus, forcing us to confront 
some of these issues sooner rather than later. 

Mr. Roemer. I would hope that our Chairman might be ame- 
nable to having another hearing on that and learning of some of 
those potential problems and gleaning maybe some of the potential 
answers to those problems and maybe having an ethicist as well
to discuss what those might be. With that, I understand my col- 
league from Michigan has to leave the hearing and I'd be happy to 
yield back the time, although I'm sure the Chairman has been very 
patient with me and I don't have any time left, so. 


Chairman Calvert. Well, we certainly can come back for an- 
other round, so, that's not a problem. Mr. Ehlers. 

Mr. Ehlers. Thank you, Mr. Chairman. It is a very interesting 
hearing. I apologize for being a little bit late, but it's been one of 
those days again. It all sounds terribly complicated to me, then 
maybe because I'm a physicist I am used to dealing with simple 
problems, just electrons and nuclei and quarks, and so forth.


Mr. Ehlers. Dr. Venter, I think I understand the difference be- 
tween your approach and what we may call the standard approach 
but I'm interested in your comment that you, in your written testi- 
mony you say that this will, your actions will basically make the 
human genome unpatentable. Can you explain that to me? Are you 
saying that you are going to wipe out so much of it that, and you're 
not planning to patent it, that no one will be able to, or what? Just 
what do you mean by that? 

Dr. Venter. Well, our plan, as we've announced in our so-called 
press barrage was that we do plan to make the sequence data we 
generate over the next couple of years on the complete human ge- 
nome accessible to the public. We do not plan to patent that human 
genome sequence, the human chromosomes, or the complete ge- 
nome. In fact, by putting it in the public domain as the individuals 
who sequence that information, if we do not patent it, we will be 
making it and rendering it unpatentable by others. However, we 
will be using that sequence as the beginning for discoveries, as all 
others will be able to, once we release it to discover new genes that 
are key for pharmaceutical development, new hormones that could 
become pharmaceutics themselves and the key to understanding 
key human diseases. 

Some of those genes, such as the gene for human insulin when 
Genentech patented it, that allowed the process to begin for human 
insulin to be available to diabetics as a drug because someone was 
willing to produce it. We will be patenting cDNA's in a limited 
number for new, exciting discoveries that we make with the ge- 
nome. The human chromosome sequence itself and the human ge- 
nome will be unpatented by us and because we will be doing this 
so quickly, we are going to render it unpatentable by others. 

difference between federal human GENOME PROJECT AND 


Mr. Ehlers. Let me ask another question. I've done some experiments
which demand extreme precision, parts in 10 to the 9th,
and very, very careful work over some time. I've also done some 
which are called quick and dirty where you are just trying to out- 
line the parameters of something to decide whether or not there is 
something worth investigating there. Is that, in a sense, the dif- 
ference between the so-called human genome project and your 

Dr. Venter. Absolutely not. In fact, I appreciate you asking that 
question. Quick does not mean dirty. Quick means better tech- 
nology, better approaches, new strategies. We're going to be se- 
quencing the human genome 10 times. The sequences that we've 
done in the past are some of the most accurate sequences ever put
in the public domain by any scientist and we're going to have the
same standard for the sequences that we do with the human ge- 
nome. It's a completely different strategy; in fact, we think it's a 
scientifically more justifiable strategy than relying on clones that
have been processed several times, coming from limited parts of the
genome, not necessarily reflecting the entire genome. We're starting
with the entire set of human DNA, the entire set of chromosomes
and using that and going right into the sequencing machines
to generate the data. We're relying on new algorithms we've
developed, new strategies we've developed, and the very forefront 
of computing to be able to reassemble all these pieces into the ge- 

Mr. Ehlers. So your statement would be that your method is 
going to yield results with the same completeness and the same ac- 
curacy as the Human Genome Project? 

Mr. Venter. We actually feel that our approach is going to yield
more completeness and at least the same level of accuracy as done 
by the best groups, including our own that have now been sequenc- 
ing the human genome by the existing strategy. It is unknown, you 
know, my colleagues are correct in characterizing this as an experi- 
ment. But some of these same individuals are the same ones that 
criticized our approach to sequence the Haemophilus influenzae genome.
In fact, one of the questions I get asked most often is why
didn't we just apply to the Federal Government for funds to do this
new strategy. 

Well, I think it's clear, Maynard Olson is the Chairman of that 
review committee and I think you've heard the comments. I think 
if we went and asked for $300 million to do this new project, that 
they might get some good chuckles out of it, but it's not the way 
new initiatives can be made. 

Mr. Ehlers. So, basically, what I hear you saying is it's not the 
contrast between the precise, complete experiment and the quick-and-dirty
experiment but rather the contrast between a bureaucratic
risk-free approach and a more thoughtful modern approach.

Mr. Venter. I think that would characterize my view quite well. 


Mr. Ehlers. All right. Next question. You mentioned $300 mil- 
lion. If you are putting $300 million in, obviously, you hope to get 
a return on that, or at least your investors do. How will you recap- 
ture your investment? 

Mr. Venter. Well, the goal, in fact, the strategy that we're tak- 
ing proves our philosophy that getting the sequence is only the first 
step. And while we feel morally compelled to release that genome 
sequence to the entire public, and the companies that have pro- 
ceeded on the basis of secrecy are taking things very much in the 
wrong direction, the business strategy is going to be building the 
ultimate genome database relating every bit that we can of the
human genome information out to individuals, to physicians, to 
biotech companies and pharmaceutical companies. On the other 
side, and one of the things that comes out of this whole genome 
strategy that hasn't been discussed, is we get the sequence from 
both chromosomes, both alleles, and we're, in the first three 
months of operation, going to have over 3 million polymorphic variations
that we're going to use as the basis for setting up high
throughput screening of patients, of individuals, in part for the 
pharmaceutical industry as a basis for the new clinical trials strati- 
fying patients. This is going to be the basis of the future of individ- 
ualized medicine and we feel we can build a very major business 
without relying on secrecy and allowing other people to use the
same sequence, discoveries, for their businesses and for their own 
scientific discoveries. 

Mr. Ehlers. Thank you. I find this very interesting and, as Dr. 
Collins observed, this is an experiment. I will be very interested in 
seeing the results of the experiment and it will be fun to get you 
back in about 3 or 4 years and read your prepared testimony and 
your answers back to you at that point. 

Mr. Venter. Thank you. I appreciate that. 

Mr. Ehlers. And find out who really was out on this one. Thank 
you very much. 

Mr. Venter. Thank you. 

Chairman Calvert. Mr. Ehlers. Mr. Bartlett. 

Mr. Bartlett. Thank you very much, and I apologize for not 
being able to be here for the testimony. 



Mr. Bartlett. We obviously, as a society, have two objectives 
that are in tension here. One is the objective to make knowledge 
of the genome widely available so it will benefit the maximum 
number of people. The other is to use competition which, wherever 
it's used in our free market society makes the product or the serv- 
ice better and it makes it cheaper. And, obviously these two things 
tend to be in tension here. How do we proceed so that we maximize 
the contributions that competition will make and, yet, be assured 
that we are going to have as wide a possible dissemination of this 
information so that there will be the maximum benefit from it? 

Mr. Venter. I assume that question is for me? 

Mr. Bartlett. Well, for whoever. 

Mr. Venter. Okay. Well, we're going to be disseminating our in- 
formation, first in terms of the raw sequence itself will be provided 
to the world for free and also the world will have access to this new 
database that we're building. We're not here to try to persuade 
NIH or DOE or anybody else not to do what they are doing. We're 
not concerned with competition. I would hate to see the federal 
budget cut because of the basis of what we're doing. I think we can 
proceed much better if we work together. There's clear complementary
approaches taken with both strategies that will yield a much
more complete, faster product, even sooner than we could possibly 
anticipate. We would like to be judged, as I said earlier, on what 
we accomplish. We're not concerned with competition, other than 
my concern is as a scientist who first spent 10 years at the NIH 
and before that 10 years trying to get NIH grants, my institution
is totally funded by NIH, DOE, NSF, and Department of Defense 
grants. I have as much concern for the public funding of science as 
I do for the private funding of science and if it goes in the wrong 
direction, we all lose from that proposition. 


Dr. Collins. Could I add a comment? I think you asked a very 
appropriate question about how to balance these two forces, but I 
think this is a very good example where those two forces actually 
are synergistic on both counts. Having a public/private partnership
of this sort should speed up getting the final product, that's the na- 
ture of a synergism, a collaboration, if it works, and we are deter- 
mined to see that it does work. But I believe having the public ef- 
fort continue to be vigorously involved in this as much or more so 
than they have been, is also the best insurance that the data is 
made publicly accessible. I do not question for a moment Dr. 
Venter's sincerity in his statement that this data will be made 
available on a quarterly basis in a database that anybody can look 
at. I know that that is what he is committed to doing. But, after 
all, the sequence of the human genome is of such profound impor- 
tance, that I think a scenario where large quantities of it were only 
available within the database of a single private entity might be a 
rather unstable situation. If business demands were to change or 
personnel were to change or the stockholders were to decide it's not 
such a good thing to be giving this all away anymore, one would
not want to see a circumstance where the publicly-funded effort 
was suddenly found to have dropped the ball. We don't intend to 
drop the ball. 

Mr. Bartlett. Thank you. I am very supportive of private-sector 
funds in this kind of scientific endeavor. Our federally-funded sci- 
entific organizations have done an exemplary job through the year, 
through the years, but in spite of that, I have a growing concern 
that when you have put all of your eggs in this basket which is 
controlled by a Congress which can, which can change course very 
quickly, that we put the future of science at risk. And so I am very 
supportive of any mechanism which attracts more private-sector 
funds and more competition. I think that whenever you have all of 
the direction of a program under the control of a single entity, in 
this case, ultimately the Congress, I think that you, that you buy 
some risk that you don't need to buy, if the ventures are broadly 
supported through competitive infusion of private-sector funds. So, 
thank you very much for your answers. 

Chairman Calvert. Thank you, Mr. Bartlett. When you say 
things change rapidly, everything except this Congress. Mr. Roe- 
mer, do you have any concluding questions? 


Mr. ROEMER. Yes, Mr. Chairman, just one or two, and I appre- 
ciate getting into a second round here. I'm reading from a Washing- 
ton Post article, Tuesday, May 12, 1998, and in it, I quote Dr. 
Olson saying, "Even though there are promising public access," and 
I guess you mean Dr. Venter's group? 

Mr. Olson. I haven't read that article 

Mr. ROEMER. "They control the terms and there is a history of
terms being more onerous than is acceptable to most scientists." Is 
that your quote? 

Mr. Olson. I haven't seen the article in question, but 

Mr. ROEMER. Does that sound like your quote?

Mr. Olson. Sounds like me. [Laughter.] 


Mr. ROEMER. Can you clarify what you meant by that quote and 
maybe we can get Dr. Venter to respond to that? 

Mr. Olson. Well, as I say, it would help if I had a little more 
context, but, at the close of my written 

Mr. ROEMER. Let me try to help you there, Dr. Olson, because 
I'm not sure if, you know, in a newspaper article, they're limited 
by space and I'm not sure how they can provide in terms of the 
lead-in. The previous paragraph says, "These companies have been 
granted scores of patents on their genetic discoveries raising fears 
among some critics that a handful of companies will control the 
commercialization of a vast and potentially lucrative biological re- 
source. Those fears arose again yesterday when Venter announced 
his new project," then your quote. 

Mr. Olson. I see, well, at the close of my written testimony, I 
actually encouraged the Congress to keep careful track of the im- 
pact of intellectual property issues, particularly on basic research 
which is my interest. And I do encourage you to do so. I share Con- 
gressman Bartlett's view that this dynamic involvement of multiple 
sectors is critical to the health of contemporary science. 

My own interest happens to be in, my most vital interest hap- 
pens to be in the public sector, and I think what I was referring 
to there, in the short history of proprietary databases, and these 
databases, which are privately funded are at their inception propri- 
etary and should be proprietary, they're paid for by private funds, 
that there is a history of the data being made available to academic 
investigators only in return for what are sometimes called reach- 
through agreements in which subsequent discoveries made by aca- 
demic investigators using those data will be, the intellectual prop- 
erty status of these subsequent discoveries will be influenced by 
the agreement that must be signed at the time that the data are 
made available. And I think I was simply trying to make the point 
in this context that there are different degrees of accessibility and 
I think most scientists are comfortable, particularly with genome 
sequence data, that it be absolutely unimpeded by hidden costs. 

Mr. ROEMER. So your reference of onerous, terms more onerous 
than is acceptable to most scientists, would refer to these reach- 
back provisions 

Mr. Olson. Yes. 

Mr. ROEMER. That are sometimes used. Dr. Venter, I want to 
give you time to respond to that. You say in the next paragraph 
that with the exception of perhaps 100 to 300 genetic sequences 
that you expect will show special commercial promise, the company 
will make all the genetic information available free to the world's 
scientists. You say, I quote, excuse me, you said, and I quote, it 
would be morally wrong to hold the data hostage and keep it se- 
cret, unquote. Is it morally wrong to keep the 100 to 300 genetic 
sequences from this same kind of scrutiny or providing this to the 
scientific community? 

Mr. Venter. Well, as Dr. Olson knows from his own work on the 
Pseudomonas aeruginosa genome with private companies, there is a 
big difference between secrecy and accessibility. One hundred percent 
of the sequence that we will generate will be publicly available. 
We will be putting it in the public domain. Having intellectual 
property rights on specific genes have no impact on Dr. Olson 
or anybody else. They allow whatever company has those rights the 
ability to commercially produce that product, whether it be insulin 
or erythropoietin, whether key drugs that have a tremendous impact 
on human health. 

I agree with Dr. Olson's concerns about reach-through rights and 
we've made that a key tenet of our philosophy. In fact, putting the 
human sequence in the public domain guarantees that there are no 
rights, reach-through or otherwise, that come with this. Any licens- 
ing that we do will not have reach-through rights. We're basing 
this company and the commercial aspects on this on building the 
best database ever. If it's not, nobody will pay to have access to it 
because they won't want it. If we can't measure polymorphisms 
faster and better and more meaningfully than anybody else, we 
won't make money. If the genes we discover don't have an impact 
on medicine, nobody will want to license those. None of those have 
any impact whatsoever on whether the fundamental data is widely 
and freely available to others. 



Mr. ROEMER. Finally, Dr. Collins, let me just end with this final 
question and I'm not sure that I will phrase it the way that I want 
so bear with me. Is there, then, a difference here that we're speak- 
ing about in this collaborative effort that if Dr. Venter's group se- 
quences the DNA, does the DNA sequencing for some form of can- 
cer, or Parkinson's, or Alzheimer's and has a patent or privacy on 
that, is there different access, then, for that particular scientific 
knowledge than there would be under the research that the NIH 
and DOE are doing? And what are the consequences of that? 

Mr. Collins. These are subtle and difficult questions, but let me 
do the best I can. The way that the publicly-funded effort is going 
forward is that we insist that our grantees, who are working at 
universities all over the country and also at the DOE labs (and this 
also applies on the international scene to the large-scale genome 
sequencing efforts that are going on in other countries) deposit 
their sequence data within 24 hours of the time it reaches an as- 
sembly of 2,000 letters in a row or more. 

We are not, at the NIH, allowed to deny our grantees the oppor- 
tunity to file for intellectual property rights on things they discover 
with NIH funds, because of the Bayh-Dole Act. So, we cannot tell 
them not to do that, but by insisting upon this early deposit of the 
data, the net outcome of that seems to be that that filing is not 
going on. 

To our knowledge, none of the genome centers are filing for intellectual 
property protection. They just don't have time and their 
goal is, really, to get the data out there so that other scientists can 
figure out what's there. So, they are pouring out data every day of 
this sort for the rest of the scientific community to use, to analyze, 
to try to figure out. Is there a cancer gene in yesterday's output 
from the St. Louis center? Is there a diabetes gene in the day-before-yesterday's 
output from Maynard Olson's Center at the University 
of Washington? It takes another set of steps to figure that out. 


The sequence itself is publicly accessible. It is truly in the public 
domain. "Public domain" is usually reserved to say there has been 
no intellectual property placed upon this, so the sequence is both 
publicly accessible and it is in the public domain. Now future inves- 
tigators, who figure out the value of a particular gene sequence, 
may learn that it causes a particular disease or learn that it can 
be turned into a pharmaceutical, and then may decide that they 
have added enough value to that to meet the criteria of novelty, 
nonobviousness, and utility and file a patent on it. Those investiga- 
tors might be in academia or they might be in companies, and the 
Patent and Trademark Office then decides whether they've made 
a convincing case or not. 

Mr. ROEMER. Thank you. I think each time you ask a question, 
it begs some more questions. It's been a fascinating panel and 
you've been very helpful and I hope we can do another panel like 
this and add to some more questions. And I appreciate the Chair- 
man, your foresight in having this hearing today. 

Chairman Calvert. Thank you, Mr. Roemer. 



Chairman Calvert. I have just a quick question for Dr. Olson. 
Obviously you are a skeptic when it comes to the private sector ini- 
tiative described here today. If this project is likely to fail, in your 
estimation, should we just ignore it and continue the federal pro- 
gram that we have today unchanged? 

Mr. Olson. Well, I want to make clear that failure is a relative 
term. I have emphasized that I believe it will produce a huge 
amount of extremely useful data. I don't believe that it will meet 
the quality standards which have been outlined. And I think that 
the federal program would be well advised over the next 2 or 3 
years to concentrate on defining the cost-benefit tradeoffs associ- 
ated with the high-quality sequence product. No known approach 
is going to produce a perfect product. Indeed, perfect is not well- 
defined in the context of intrinsically variable structure like the 
human genome, but I believe that the federal, the unique niche for 
the federal program over the next few years is to refine the meth- 
ods that are required to produce the best available product that can 
be achieved at a reasonable cost, and I would define a reasonable 
cost as roughly current levels of funding. 

One of the difficulties in this highly-collaborative model, which is 
certainly correct in principle, but a technical point about the pro- 
posed Perkin-Elmer strategy is that it is heavily back loaded in 
terms of answering my concerns. Even a simple theoretical analysis 
of this approach to sequencing the genome, indicates that particu- 
larly this issue of gaps, will only be addressable relatively late in 
the project. One simply can't tell from the early indicators how that 
issue is going to go. 

So, I believe the federal project should focus on the high quality 
and the definition of high quality, the exploration of the cost/bene- 
fit issues, the demonstration that by fail-safe methods we can 
produce such data over the next few years and when this rather 
back loaded information comes to us from this initiative or other 
initiatives, all I can really say is that we will look at it very closely 
and I'm certainly pleased to hear these renewed strong assurances 
that we'll be able to look at it. That is the data that will be there. 
Chairman Calvert. Thank you, Doctor. 


Chairman Calvert. Dr. Galas, you've got some experience in 
government, now in the private sector. How would you evaluate the 
efficiency of the government program and their ability to make 
changes as technology improves? 

Mr. Galas. I, actually I think that the human genome program, 
perhaps because of the fact that, unlike most federally-supported 
programs there's internal competition of a friendly type within the 
program having two agencies running it actually has been very responsive 
in being able to take advantage of new technologies. With 
the DOE and the NIH looking over each other's shoulders, I think 
actually the human genome program has done reasonably well in 
that regard. I'm sure it could be improved and I'm sure they are 
constantly looking at how to do so, but I think they can take ad- 
vantage of that. 

I would say that, if I might address some of the comments that 
Dr. Olson just made, I think that in fact there probably does exist 
a strategy that would be a different strategy from what is being 
right now in the program. Maybe only slightly different, but dif- 
ferent nonetheless, that does not, on the one hand, depend entirely 
on the success or the back loaded success of the private-sector pro- 
gram but can take advantage of data as it's released from this pro- 
gram and enhance the federal effort, but not depend on the success 
of the private program, but merely be accelerated by it if it does 
succeed. And I think that's what the federal program should focus 
on, rather than focusing on the downstream, final product which I 
think, quite frankly, that Dr. Olson makes when he talks about se- 
quence quality on the one hand and scientific standards on the 
other, they are not equivalent at all. Those are really not, that's an 
inequality that can't be made I think. 

I think there's a rational strategy in there which does have a 
continually improving quality of sequence, or a staged quality of se- 
quence that would get some of the fundamental, really important 
biological data out sooner and benefit us, be able to take advantage 
of what data is released by the private sector without making any 
assumptions about either the quality or whether or not they'll succeed. 

Chairman Calvert. Thank you. 

Mrs. Lee, no questions? Any other questions from the panel? 

I want to thank our witnesses for very interesting testimony and 
answers to our questions. I think you can rest assured, I doubt 
very much if Congress will cut funding on the Human Genome 
Project and we look forward to a successful conclusion and certainly, 
Doctor, we wish you well in your new venture. Thank you. 

[Whereupon, at 2:40 p.m., the hearing was adjourned.] 

[The following material was received for the record.] 


APPENDIX 1: Answers to Post-Hearing Questions Submitted by Members of 
the Subcommittee on Energy and Environment 






The Human Genome Project: 
How Private Sector Developments Affect the Government Program 

June 17, 1998 

Post-Hearing Questions Submitted to 

Dr. Aristides A. Patrinos 

Associate Director of Energy Research for 

Biological and Environmental Research 

U.S. Department of Energy 

Washington, DC 

Post-Hearing Questions Submitted by Chairman Calvert 

Scientific Justification for Completing Government-Funded Sequencing of Entire Human Genome 

Q1. Critics of the government program say that sequencing the entire human genome is 
a waste of the taxpayer's money. Please explain why it is scientifically necessary to 
complete the entire process. 

A1. We estimate that the human genome, approximately 3 billion bases of DNA, contains 
about 80,000 genes. It has been estimated that the DNA sequence (cDNAs) containing 
the specific instructions for making these 80,000 protein products may occupy only about 
3% of the total genome. While the specific role for the remaining 97% of the genomic 
sequence is unknown at this time there is no way at present to reliably recognize in 
advance those components that we need to sequence. Even if we could physically 
recognize the important sequences there is no method to select out in an economical way, 
those parts that are biologically significant for sequencing. Merely sequencing the 
expressed cDNAs certainly won't deliver the needed information to understand human 
biology; on this there is very strong agreement from the research community. For 
example, essentially all of the information that is critical for the proper regulation of genes, 
information vital to the proper "turning on" and "turning off" of genes so that they 
become operational at the right times and in the right cells is not recovered in the 
expressed cDNAs. Damage in these regulatory regions has been shown to be an 
important cause of genetic disease in humans. 
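
The proportions quoted in this answer can be checked with a few lines of arithmetic. The sketch below uses only the estimates given in the answer (3 billion bases, about 80,000 genes, about 3% of the genome in cDNA sequence); the variable names are illustrative, and the per-gene average is a derived figure, not one stated in the testimony.

```python
# Estimates taken from the answer above; all are approximate.
GENOME_BASES = 3_000_000_000   # "approximately 3 billion bases of DNA"
GENE_COUNT = 80_000            # "about 80,000 genes"
CODING_PERCENT = 3             # "only about 3% of the total genome"

# Bases covered by expressed (cDNA) sequence versus everything else.
coding_bases = GENOME_BASES * CODING_PERCENT // 100
noncoding_bases = GENOME_BASES - coding_bases

# Derived figure (not stated in the testimony): average cDNA bases per gene.
avg_coding_per_gene = coding_bases / GENE_COUNT

print(f"cDNA sequence:      {coding_bases:,} bases")
print(f"remaining sequence: {noncoding_bases:,} bases")
print(f"average per gene:   {avg_coding_per_gene:,.0f} bases")
```

Under these estimates, sequencing only expressed cDNAs would leave roughly 2.9 billion bases unread, including the regulatory regions the answer describes.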


We can and must do the best job we can to prioritize what we sequence so that, in our 
estimation, we are getting the best value for the money. However, we need to know the 
entire sequence to fully explore the complexity of human biology and fully exploit the 
information in the human genome. 

Efficiencies of DOE's Joint Genome Initiative vs. Three Different DOE Laboratory Programs 

Q2. In your testimony, you describe the Joint Genome Initiative, which allows for joint 
management and oversight of three different laboratory programs, those at 
Lawrence Berkeley, Lawrence Livermore and Los Alamos. The JGI was 
implemented seven years into the program. Were there inefficiencies and higher 
costs as a result of separate management of the three labs' programs and, in 
hindsight, would it have been better if joint management existed from the beginning 
of the program? 

A2. The first phase of the Human Genome Program (HGP), closely coordinated between the 
DOE and the NIH, was the phase of exploration requiring many independent pursuits. 
Also, it was necessarily devoted to laying the groundwork for the intensive sequencing 
effort that has begun in the last couple of years. In 1990, at the start of the HGP, 
sequencing technologies were not advanced enough, nor efficient enough, to accomplish 
the task of sequencing 3 billion base pairs at the expected funding levels and in the 
expected time frame. Additionally, large scale chromosomal mapping efforts were 
undertaken to provide the detailed physical maps that it was thought would be critically 
necessary to achieving the complete genome sequence. Each of the three DOE Lab 
genome centers carried out parallel and non-overlapping research efforts to map different 
chromosomes and to explore technologies that would accelerate the sequencing. Not until 
the genome project was ready to switch directions to full scale production sequencing, 
was the nature of the task such that issues of critical mass, economies of scale, and 
sharpness of focus together made central management the correct paradigm. 


Post-Hearing Questions Submitted by Democratic Members 

Difference Between the DOE-NIH and "Shotgun" Human DNA Sequencing Approaches 

Q1. How does the DOE-NIH approach, projected to be completed by the year 2005, 
differ from the Venter-Perkin-Elmer plan to use the "shotgun" method to sequence 
the human genome in three years? 

A1. The DOE/NIH commitment is to produce a complete and accurate image of the human 
genome by 2005. In the first 2 years (FY 1997 and FY 1998) of the production effort, the 
approach taken insisted on full sequencing accuracy, high continuity, and detailed mapping 
(location) knowledge every step of the way, in part to ensure that these meritorious 
standards could be achieved at affordable cost. This assurance now being in hand, DOE is 
considering an approach that we produce an intermediate draft version of the genome 
based on a "mapped clone shotgun method" — in contrast to the "whole genome shotgun 
method" being followed by Venter-Perkin-Elmer. In the mapped clone shotgun, in which 
we shotgun sequence, but only within already mapped clones that are about 1/20,000 the 
size of the genome, we can have a much higher assurance of positional and sequence 
assembly validity than the Venter-Perkin-Elmer method. In practice, the two approaches 
will complement each other and be extremely useful to the scientific community. 
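
A toy illustration may help show why confining the shotgun step to mapped clones gives higher positional assurance. The sketch below first works out the clone size implied by the answer (about 1/20,000 of a 3 billion base genome), then uses two invented 24-base "clones" that share an 8-base repeat. The sequences and the helper `find_all` are hypothetical and exist only for this example; real clones and reads are far longer.

```python
GENOME_BASES = 3_000_000_000
CLONE_DENOMINATOR = 20_000     # clones are "about 1/20,000 the size of the genome"
clone_bases = GENOME_BASES // CLONE_DENOMINATOR   # implied clone size in bases


def find_all(text, pattern):
    """Return every start position at which pattern occurs in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


# Two invented clones that happen to share the same repeated stretch.
REPEAT = "GGGATCCC"
clone_a = "ACGTTGCA" + REPEAT + "TTAACCGG"
clone_b = "CATGCATA" + REPEAT + "AGAGTCTC"
toy_genome = clone_a + clone_b

read = REPEAT  # a read that falls entirely inside the repeat

# Whole-genome shotgun: the read could belong at either copy of the repeat.
hits_in_genome = find_all(toy_genome, read)
# Mapped clone shotgun: the read is known to come from clone_a, so it has one home.
hits_in_clone = find_all(clone_a, read)

print(f"implied clone size: {clone_bases:,} bases")
print("candidate positions, whole genome:", hits_in_genome)
print("candidate positions, within clone:", hits_in_clone)
```

Knowing which clone a read came from removes the ambiguity in this toy case; it does not help when a repeat occurs more than once inside a single clone.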

Role of DOE and NIH in Collaboration with Private-Sector Venture 

Q2. Do you see a role for DOE and NIH to collaborate with Venter and Perkin-Elmer to 
complete sequencing of the human genome? 

A2. Yes, a very significant opportunity exists. In practical and scientific terms, the two 
approaches can strongly and synergistically complement each other. In fact, the clone 
resources that Venter-Perkin-Elmer will utilize have been developed and made available to 
the public by DOE and NIH; and the DOE is funding projects that will provide the 
sequence information from the ends of 600,000 BACs (bacterial artificial chromosomes) 
that will form the scaffold needed for linking the human genome sequence together in the 
Venter-Perkin-Elmer Plan. The DOE and NIH will help both private and the public 
sequencing efforts by aggressively completing the BAC-end sequence set, as well as 
developing a high resolution radiation hybrid map of BAC ends and other sequence 
markers, and the mapping of all cDNA ESTs (Expressed Sequence Tags) against the BAC 
libraries being sequenced. 

Q2.1. How would this be done? 

A2.1. On the Venter-Perkin-Elmer side, prompt and complete sharing of their raw data 
with the public is the core requisite of making the two efforts mutually 
complementary. On the public side, it is necessary that DOE and NIH 
simultaneously produce a high quality, fully mapped, draft ('scaffold') intermediate 
version of the genome, on top of which the Venter-Perkin-Elmer sequence could 
most usefully be assembled (adding depth for improved accuracy and coverage). 
The public effort would then proceed to complete this jointly constructed draft 
version to full coverage and accuracy sooner than originally planned and at a lower cost. 

Q2.2. At what stage would it be done? 

A2.2. The Venter-Perkin-Elmer venture has projected a completion date of two to three 
years; thus, to be effective, any collaborative elements need to be in place quickly 
and ongoing during the course of the project. As mentioned above, some of the 
needed efforts are already underway and it is anticipated that the remaining 
components will be initiated before January 1999. 

Concerns of International Collaborators About Intellectual Property Rights and Patenting 

Q3. The international Human Genome Organization (HUGO) has been fairly vocal 
about their feelings concerning intellectual property rights and patenting. 

Q3.1. How have the international collaborators responded to this proposed 

A3.1. With very serious concern. These concerns derive from the immense and 
essentially unrestrained possibilities that exist for intellectual property rights 
control when extremely high rate, highly automated data generation techniques are 
used by a privately owned company to produce and combine both "composition of 
matter" information (sequence data) with "utility" information (e.g., mapping and 
gene expression data), to form the basis of patent applications en masse. Thus, the 
response to this venture by the Wellcome Trust in Great Britain, the principal 
public funder of human genome sequencing efforts at the Sanger Center in Britain, 
was to announce that they would double the budget in support of human genome 
sequencing at the Sanger Center. The Sanger Center, like its US counterparts has 
a policy of daily release of sequence. 

Q3.2. How do you plan to allay their concerns that the race for patenting will (1) 
hinder information exchange and (2) result in unnecessary and costly 

A3.2. (1) The DOE and NIH must not deviate from their clearly stated policy, elaborated 
at a series of meetings of the heads of sequencing programs and large sequencing 
labs in the US and other countries, of nightly electronic release of newly 
determined human sequence, without any restrictions on availability. 

The Venter-Perkin-Elmer group has publicly stated that the vast majority of 
sequence information that they determine will be deposited in public databases 
within a few months of sequencing. The several hundred genes that they say they 
will focus on represents much less than one percent of all human genes. Thus 
information exchange for the vast majority of human genes should not, 
theoretically, be compromised by this private sequencing effort. Similarly, there 
should not be a costly race for patenting for >99% of the human genes simply as a 
result of this one private effort. It should not be surprising, however, that "use 
patents" for human genes may become a significant issue when large numbers of 
human genes are finally identified, whether by private or public methods. 

(2) As mentioned earlier, the Venter-Perkin-Elmer genome sequencing efforts are 
seen by DOE as complementary and not duplicative of the public efforts by the US 
public Human Genome Program. With regard to the public efforts, the Human 
Genome Organization (HUGO) is coordinating, through a Web site, a current view 
of which centers/labs are sequencing which human chromosomes or chromosome 
fragments. This site is accessible to anyone via the Web. The purpose of this 
HUGO effort is to minimize duplication among publicly funded sequencing efforts. 






The Human Genome Project: 
How Private Sector Developments Affect the Government Program 

June 17, 1998 

Post-Hearing Questions Submitted to 

Dr. Francis S. Collins 

Director, National Human Genome Research Institute 

National Institutes of Health 

U.S. Department of Health and Human Services 

Bethesda, MD 

Post-Hearing Questions Submitted from Chairman Ken Calvert 

Scientific Justification for Completing Government-Funded Sequencing of Entire Human Genome 

Q1. Critics of the government program say that sequencing the entire human genome is 
a waste of the taxpayer's money. Please explain why it is scientifically necessary to 
complete the entire process. 

A1. The more we study DNA, the more we understand how it carries out its amazing work. 
Genes affect almost all important biological processes, at least in part. This includes 
those processes that lead to or are involved in disease. By identifying the gene(s) 
associated with a disease, we will gain important understanding that can help us develop 
therapies or preventive strategies. The Human Genome Project, including sequencing the 
entire human genome, is designed to speed up the process of gene identification and make 
it much more cost-efficient. Genes, we have learned, are made up of several parts that 
control their activity. Sometimes all the parts are clustered in the same DNA 
neighborhood, but other times, the parts may be scattered far apart from each other. Also, 
at times mistakes in DNA spelling in regions thought to be of no importance turn out to 
contribute to disease risk. We already have found such examples for cancer, diabetes, and 
osteoporosis. Some important parts are very easy to spot and some aren't. Knowing all of 
the parts of a gene is critical to understanding how it works. Many of the other 
approaches to gene identification that have been used so far cannot find all of the parts of 
every gene (that is one reason why these other approaches tend to be somewhat faster and 
appear to be less expensive). Having a complete genome sequence is the only way to find 
all of the parts of all of the genes that may affect human health. The Human Genome 
Project will provide a truly complete genome sequence containing no gaps. That level of 
completeness we believe is necessary to provide researchers with the best possible tool for 
understanding the function of genes and their role in human health and disease. 


Post-Hearing Questions Submitted by Democratic Members 

Difference Between the DOE-NIH and "Shotgun" Human DNA Sequencing Approaches 

Q1. How does the DOE-NIH approach, projected to be completed by the year 2005, 
differ from the Venter-Perkin Elmer plan to use the "shotgun" method to sequence 
the human genome in three years? 

A1. Sequencing was once done by hand as a series of chemical reactions, a slow and costly 
method. Now, machines can read the sequence quickly, but current instruments can only 
read short DNA fragments at a time. So, using a strategy referred to as "shotgun" 
sequencing, an investigator randomly cuts DNA into small fragments. These fragments 
are small enough for sequencing machines to read. Then, the scientist must correctly 
reassemble all of these sequenced fragments in order to properly reconstruct the full- 
length DNA sequence. The reassembly of this giant puzzle is carried out largely by highly 
skilled scientists using sophisticated computer programs. 
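
The cut-and-reassemble idea described above can be sketched in a few lines. This is a toy greedy overlap merge on an invented 21-base sequence, not the software used by any sequencing center; the function names, the fixed fragment boundaries, and the minimum overlap of 3 bases are all assumptions made for illustration. Production assemblers must also cope with read errors, both DNA strands, and repeated sequence.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: the result stays in separate pieces (gaps)
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# An invented sequence, cut into overlapping fragments and presented out of order.
genome = "ATGGCGTACCTTAGACTGAAC"
fragments = [genome[10:19], genome[0:9], genome[14:21], genome[5:14]]

assembled = greedy_assemble(fragments)
print(assembled)
```

In this toy case the four fragments merge back into the original sequence. If two fragments had no overlap of at least `min_len` bases, the function would return more than one piece, which corresponds to a gap in the assembly.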

The sequencing strategy the public genome project uses employs shotgun sequencing of 
DNA fragments that have been carefully mapped and catalogued. This strategy is designed 
to maximize the accuracy of reassembling the sequenced fragments, because the scientist 
knows where the fragments belong. Even so, the scientists periodically encounter DNA 
regions that are particularly difficult to sequence, and which therefore require special 
attention. Because all the fragments have been catalogued, a scientist can return to these 
difficult spots after most of the genome has been sequenced and assembled to work on 
closing the gaps and strengthening the weak areas so that the entire sequence will, in the 
end, be finished to very high quality. The international sequencing community, whose goal 
is to complete the human DNA sequence by 2005, has agreed to a policy of releasing 
completed sequence every 24 hours into a free, publicly-accessible database. More than 10 
percent of the human sequence is now available in a public database, and about half of that 
is already "finished." 

The sequencing strategy proposed by scientists at Perkin-Elmer, Inc. and Dr. Venter also 
employs shotgun sequencing, but differs from the public effort in several significant ways. 
First, that strategy, called "whole-genome shotgun sequencing", employs fragments that 
have not been previously mapped or catalogued. Because the scientist does not know 
where in the morass of 3 billion base pairs the fragment might belong, the task of 
reassembling the fragments becomes far more difficult. Many believe this difficulty in 
reassembly will inevitably lead to many gaps and misassembled regions in the sequence. 
These scientists believe that, on its own, the quality of the "whole genome shotgun 
sequence" will not be as high as that planned for the publicly-funded sequence. For 
example, when a scientist encounters a fragment that is particularly difficult to sequence, 
he or she will not be able to return to the fragment later because it has not been 
catalogued. The Perkin-Elmer-Venter approach does not propose to fill in all the gaps left 
by these unsequenced fragments, thereby creating a product that may be incomplete for 
many research uses. Not having a sequence of the highest quality will be a serious problem 
when the gaps and errors occur in DNA regions with biological significance. 
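
One way to get a feel for the gap question is the standard idealized coverage model, usually attributed to Lander and Waterman, in which reads land uniformly at random. Under that model a base is missed with probability about e^(-c), where c is the average coverage, and the expected number of gaps is about N·e^(-c) for N reads. The sketch below is not the analysis used by any witness; the 500-base read length and the 60 million reads are assumptions chosen only to illustrate the formula, and the model ignores repeats and hard-to-sequence regions, which are the main concerns raised in the answer above.

```python
import math


def coverage_stats(genome_bases, read_bases, reads):
    """Idealized random-read model: returns (coverage, uncovered fraction, expected gaps)."""
    coverage = reads * read_bases / genome_bases
    uncovered_fraction = math.exp(-coverage)     # chance that a given base is never read
    expected_gaps = reads * math.exp(-coverage)  # approximate number of uncovered stretches
    return coverage, uncovered_fraction, expected_gaps


GENOME_BASES = 3_000_000_000  # from the testimony
READ_BASES = 500              # assumed read length, for illustration only
READS = 60_000_000            # assumed number of reads, for illustration only

coverage, uncovered, gaps = coverage_stats(GENOME_BASES, READ_BASES, READS)
print(f"average coverage:   {coverage:.1f}x")
print(f"uncovered fraction: {uncovered:.2e}")
print(f"expected gaps:      {gaps:,.0f}")
```

Even in this best case the expected number of gaps is not zero, and the count only becomes small once coverage is high, which is consistent with the point made at the hearing that gaps can be assessed only late in a whole-genome shotgun project.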


In addition, release of sequence data from the Perkin-Elmer-Venter effort will occur 
quarterly, rather than daily. Although the company states that sequence will be made 
public, release will be significantly slower than data release from the publicly-funded 
effort. As a result, the larger research community's access to this valuable data will be 
slowed down. Furthermore, the new company maintains the right to patent the most 
biologically important gene data. 

Role of DOE and NIH in Collaboration with Private-Sector Venture 

Q2. Do you see a role for DOE and NIH to collaborate with Venter and Perkin-Elmer to 
complete sequencing of the human genome? 

Q2.1 How would this be done? 
Q2.2 At what stage would it be done? 

A2. Partnership with the private sector is both necessary and desirable and we welcome this 
new initiative by Perkin-Elmer and Dr. Venter. In the year ahead, we will look carefully at 
the ways in which this private initiative and the publicly-funded effort can be 
complementary. If need be, the federal effort is fully prepared to adjust its strategy. In 
fact, in late May, just weeks after the private sector announcement, there was a meeting 
involving more than 100 scientists from various fields and from both the public and private 
sectors, to look at the next five years of the genome project. The subject of how 
collaboration might occur and whether or not the publicly-funded effort should revise its 
strategy was intensely discussed. I think it is fair to say there is not yet complete 
unanimity on the answer to those questions. The Perkin-Elmer/Venter proposal is a 
scientific experiment; we like that. Scientists are energized by the opportunity to see a 
new approach tried out. It will take time, at least 12 to 18 months, to develop enough 
data to allow the usefulness of the approach to be evaluated, and to assess the quality of 
the product, but that is what science is all about. 

Concerns of International Collaborators About Intellectual Property Rights and Patenting 

Q3. The international Human Genome Organization (HUGO) has been fairly vocal 
about their feelings concerning intellectual property rights and patenting. 

Q3.1 How have the international collaborators responded to this proposed 

Q3.2 How do you plan to allay their concerns that the race for patenting will (1) 

hinder information exchange and (2) result in unnecessary and costly 


A3. On May 13, 1998, the Wellcome Trust announced their intent to increase its support of 
British science in the sequencing of the human genome. Previously, the Wellcome Trust 
had committed to funding the sequencing of one sixth of the human genome at the Sanger 
Centre in the United Kingdom. The May 13 announcement doubled that commitment to 
one third of the genome and expressed concern with regard to a number of aspects of the 
private sector initiative. In the press release accompanying the announcement, the 
Wellcome Trust stated: 


"The Wellcome Trust has today announced a major increase in its flagship 
investment in British science in the sequencing of the human genome... 
The Trust is concerned that commercial entities might file opportunistic 
patents on DNA sequence. The Trust is conducting an urgent review of 
the credibility and scope of patents based solely on DNA sequence... 
This week a commercial venture announced its intention to produce 
partial sequence of the human genome, to delay release of this 
information and to have exclusive rights to patent some of these 
sequences... The Wellcome Trust believes that the human genome 
should be sequenced, through an international collaboration, as speedily 
and accurately as possible, with the results being placed immediately in 
the public domain." 

The Wellcome Trust is the leading European funder of human genome sequencing. Its 
early support of work in the field has enabled Dr. John Sulston, Director of the Sanger 
Centre, and his colleagues, to generate one third of all the human sequence which had 
been produced at the time of the May 13 announcement. 

With regard to patenting, this is a difficult area that does not lend itself to simple answers.
The way the publicly-funded effort in the United States, which includes HGP grantees 
from universities all over the country and also at the DOE labs, is going forward is that we 
have agreed with our international sequencing collaborators to deposit sequence data 
within 24 hours of the time it reaches at least an assembly of 2,000 bases, or letters, in a 
row. Absent a finding of exceptional circumstances, we at the NIH are not allowed to
deny our grantees the opportunity to file for intellectual property rights on things they
discover with NIH funds, because of the Bayh-Dole Act. As a practical matter, however,
the publicly supported sequencing community has agreed to a 24 hour data release policy,
and we are not aware that there have been any patent filings. 

Therefore, the sequence itself is publicly accessible. It is truly in the public domain, a term
usually reserved for data on which no intellectual property restrictions have been placed.
So, future investigators, who figure out the function of a particular gene
sequence and/or turn that sequence information into a pharmaceutical or a new diagnostic, 
may decide they have added enough value to meet the patent criteria of novelty, 
nonobviousness, and utility, and file for a patent. Those investigators may be in academia, 
here in the United States or abroad, or they might be in private industry. But all seeking 
patent protection must make a case sufficient to convince the Patent and Trademark 
Office that their discovery deserves protection under the law. 


Federal Government's Cost to Completely Sequence the Human Genome 

Q4. Dr. Collins, you have indicated that to date, the Federal Government has spent 
about $100 million on human genome sequencing. How much more do you think it 
will cost the Federal Government to completely sequence the human genome using 
the federal sequencing approach? 

A4. The original projection was that the entire Human Genome Project, including mapping, 
sequencing, technology development, model organisms, informatics, and ELSI would cost 
$200 million a year for 15 years, for a total of $3 billion in 1990 equivalent dollars. If you
include the FY'99 budget request, a total of $1.5 billion in 1990 dollars will have been
spent over a 9 year period. This is approximately $300 million below the $1.8 billion
originally projected for the Project over the first 9 years. So we are significantly under the
projected cost of the Project.
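
The budget figures in this answer can be checked with a short calculation. The sketch below is illustrative only: the variable names are ours, and every dollar amount is one of the 1990-equivalent figures quoted in the answer (the original $200 million per year projection, the 15-year plan, the 9 years through the FY'99 request, and the stated $300 million shortfall).

```python
# Budget arithmetic from the answer above, in millions of 1990-equivalent dollars.
# All inputs are figures quoted in the answer; the names are illustrative.
ANNUAL_PROJECTION_M = 200   # original projection: $200 million per year
PROJECT_YEARS = 15          # planned length of the Project
YEARS_ELAPSED = 9           # period covered through the FY'99 budget request
UNDERSPEND_M = 300          # stated amount below the original projection

total_projection_m = ANNUAL_PROJECTION_M * PROJECT_YEARS          # whole Project
projected_first_9_years_m = ANNUAL_PROJECTION_M * YEARS_ELAPSED   # first 9 years
implied_spent_m = projected_first_9_years_m - UNDERSPEND_M        # actual spending

print(f"Total projection:         ${total_projection_m / 1000:.1f} billion")
print(f"Projected, first 9 years: ${projected_first_9_years_m / 1000:.1f} billion")
print(f"Implied spending to date: ${implied_spent_m / 1000:.1f} billion")
```

The three results are mutually consistent: a $3 billion total, $1.8 billion projected for the first 9 years, and spending of $300 million less than that.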

Up to this point, the Project has only spent about $100 million on human production
sequencing. Now it is a very critical question, what will it cost the government to
completely sequence the human genome? The difference between 50 cents per finished
base and 49 cents per finished base is $30 million worth of cost. Greater reductions in the
per finished base cost will yield more significant reductions in cost.
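
A one-cent change per finished base translating into $30 million implies a genome of roughly 3 billion bases. The answer does not state that size, so the sketch below derives it from the two quoted figures rather than assuming it. The function name and the extra price points are hypothetical, chosen only to illustrate how total cost scales with the per-base cost.

```python
# Per-base cost arithmetic, kept in integer cents to avoid rounding error.
# Quoted in the answer: 50 vs. 49 cents per finished base differs by $30 million.
SAVING_DOLLARS = 30_000_000
HIGH_CENTS, LOW_CENTS = 50, 49

# Genome size implied by the quoted figures (not stated in the answer itself).
implied_bases = SAVING_DOLLARS * 100 // (HIGH_CENTS - LOW_CENTS)

def sequencing_cost_dollars(bases: int, cents_per_base: int) -> int:
    """Total cost in whole dollars of finishing `bases` at `cents_per_base`."""
    return bases * cents_per_base // 100

print(f"Implied genome size: {implied_bases:,} bases")
for cents in (50, 49, 25, 10):  # 25 and 10 are hypothetical price points
    cost = sequencing_cost_dollars(implied_bases, cents)
    print(f"{cents:>2} cents per base -> ${cost:,}")
```

Under these figures each cent of per-base cost corresponds to $30 million in total, which is why further cost reductions matter so much.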

The NHGRI has instituted a new method of bringing together our genome sequencing
centers. They have agreed to cooperate to share their technology ideas and to figure out
who is saving money and at what step or steps in the process. The NHGRI also will 
continue to support research to improve sequencing technology and reduce costs. 

I think it is a little hard to predict how things will go in the next 6 or 7 years, particularly
with regard to the impact on costs of further developments in technology and activity in
the private sector. But I am very optimistic that the sequencing component of the Project
can be accomplished within the projected budget. To date, we have met our goals on
time, and under budget. I would hope the Human Genome Project in the future will be
judged by the total budget that was required to provide a highly accurate, publicly
accessible, contiguous, finished sequence as soon as possible.






The Human Genome Project: 
How Private Sector Developments Affect the Government Program 

June 17, 1998 

Post-Hearing Questions Submitted to 

Dr. J. Craig Venter 

President and Director 

The Institute for Genomic Research 

Rockville, MD 

Post-Hearing Questions Submitted by Republican Members 

Will the Private Initiative Duplicate the Federal Human Genome Project? 

Q1. Please tell us, should your initiative be successful, will you in fact have
duplicated the federal program, or, as some have said, given us a "synopsis" 
of the human genome? 

A1. By obtaining the complete DNA sequence of the human genome by the year 2000,
our new venture will make the science of genomics directly applicable to
combating human disease in the broadest way possible. We won't duplicate the
federal program because we'll actually obtain the complete sequence and make it
available before that effort is complete. We will, however, be building our
program on resources and strategies that have been developed as a result of the
federally-funded initiative. As I indicated in my testimony, obtaining the complete
sequence of the human genome is not an end in itself, but represents a beginning
for the real research that will allow us to better understand the disorders that afflict
humankind. The federally-funded program needs to be positioned to ensure this
new research takes place, whether in the year 2005 as previously planned or in the 
year 2000. 


Concern About Release of Data to the Public

Q2. In his testimony, Dr. Francis Collins expressed concern that your plans to
release data to the public on a quarterly basis are not sufficient. Please tell us
your response to that.

A2. As a requirement for receiving a grant from either the Department of Energy or the
National Human Genome Research Institute for DNA sequencing, the recipients
are required to release sequence data as soon after it is generated as possible. This
is a requirement for publicly-funded activities. As I indicated in my testimony, we
don't presume to be able to understand the biological significance of all the data
that we will generate in completing the sequence of the human genome. As 
scientists, we also understand the importance of sharing data. The current model 
that is employed by most commercial organizations in this field is to keep human 
DNA sequence data private. We intend to share the data that we generate on a 
quarterly basis. There are obviously people and organizations, especially in the 
public sector, who don't feel this frequency is adequate. However, we are not
required to meet the objectives of the publicly-funded project and, given the current
commercial alternative, we believe our approach is very appropriate.

Recommendations for Restructuring the Federal Human Genome Project

Q3. In your testimony, you say the impact your venture will have on the federal 
program will be to re-orient it to focus on research into the genetic impact of 
disease on a broad basis. Could you please elaborate on that and tell us any 
specific recommendations you have on how the federal program should be 

A3. The Human Genome Project is about much more than just obtaining the complete 
human DNA sequence. The sequencing is just the biggest initial hurdle that needs 
to be cleared. Once the human sequence is complete, the information will exist to 
begin in-depth research into the actual functioning of the genetic code. One 
critical resource that will be required to undertake this task will be providing 
researchers access to full-length cDNA clones. This will allow researchers to 
study specific genes in great detail and at this time there is no resource for this 
material. Only a small percentage of the genome is actually made up of genes, but 
these regions will attract a significant amount of the initial research activity from 
both private and public entities. However, there will be real value in understanding 
all aspects of the human genome, and NHGRI is a logical place to undertake this


Post-Hearing Questions Submitted by Democratic Members 

Availability of Genomic Information to the Scientific Community 

Q1. Although details of your business venture with Perkin-Elmer Corporation
may not be finalized, you and Tony White, Chair, President, and Chief 
Executive Officer of Perkin-Elmer, have indicated that you intend to make 
genomic information from this venture available to the scientific community. 
How can we be assured that this will happen? 

A1. On June 20, 1997, The Institute for Genomic Research (TIGR) and Human
Genome Sciences (HGS) ended a collaborative arrangement that required TIGR to
forego payments totalling $38 million. The primary reason for my choosing to end
this relationship and access to significant financial resources was a philosophical
disagreement about the public release of DNA sequence data. The day after this
relationship was terminated, TIGR made the largest deposit of DNA sequence data
into the public domain in history. When I entered negotiations with the
Perkin-Elmer Corporation to undertake this new venture, the first point of agreement was
the requirement that human genome data would be made publicly available. If
agreement had not been reached on this point, we would not be discussing this
new venture. I don't know of many organizations that would forego $38 million
to ensure that DNA sequence data would be made publicly available, and this act
should provide a high level of comfort to you and others that this data will be
made available to the public. 

Timeliness of Release of and Compensation for Human DNA Sequence Data 

Q2. Once obtained, how soon and for what economic compensation will this 
information be released by your new company? 

A2. As previously indicated, the human DNA sequence data will be made publicly
available at no charge on a quarterly basis for the scientific community. The
details and pricing models for the new venture's products are still being determined 
at this time. 


Plans to Patent Genomic Sequences 

Q3. Obviously, you and the Perkin-Elmer Corporation plan to patent a number 
of genomic sequences. 

Q3.1 Since the patenting criteria include utility, in addition to novelty and 
unobviousness to peers, will the sequences you plan to patent 
correspond to particular biological functions or genetic traits? 

Q3.2 Your past patenting attempts involved these expressed sequence tags 
(ESTs) you discussed in your testimony. To the best of my knowledge, 
these requests were denied. Could you explain to me (1) why that was 
and (2) what in your current EST strategy will allow for the patenting 
of these tags. 

A3. As you correctly noted, the NIH chose to file patents for the ESTs identified by my
lab. This initial application was rejected and NIH chose not to appeal the ruling.
We are not planning to seek patents on broad sets of ESTs similar to what was
done at NIH. Instead, we plan to fully characterize a small subset of key genes
whose biological significance we will seek to identify and understand. In an
article published in the May 1, 1998 issue of Science, John Doll, Director of
Biotechnology Examination at the U.S. Patent and Trademark Office (PTO),
indicated that the same patentability analysis which is conducted for any other
application will be conducted in the area of genomics. It is our intent to satisfy the
PTO standards for those discoveries on which we seek to file for patents. I have 
attached a copy of that article for your information. 

Uniqueness of Expressed Sequence Tags 

Q4. How unique are these tags in terms of their ability to identify an expressed
gene or locate a gene on a larger map of the genome? Is it a 1:1
correspondence in terms of ONE tag corresponding to ONE part of the
genome? What does that tell us about the functional purpose of that gene?

A4. There is generally a 1:1 correspondence between an EST and its location on the
genome. With regard to functionality, it depends upon what else we know about
the EST as to whether it indicates any specific function. For example, if a human
EST matches a sequence from another organism and there is some function 
associated with it, then it is likely the sequence will have a similar function in 


Role of DOE and NIH in Collaboration with Private-Sector Venture

Q5. What do you see as DOE and NIH's role in collaboration with yourself and

Q5.1 How would this collaboration be done?
Q5.2 At what stage would it be done? 

A5. NIH, DOE and the new venture could establish the basis for collaboration nearly 
immediately, and to some degree we already have. As I indicated in my testimony, 
certain resources that have been publicly-funded like bacterial artificial 
chromosomes (BACs), will provide the framework for assembling the genome data
that we will generate. As we publicly release DNA sequence data, this data will be
available for all DOE and NIH grantees to use in their research.

There are more specific areas of collaboration that could be undertaken that have 
been discussed on a preliminary basis. One area of particular significance that I
have spoken about with Dr. Varmus is that of the ethical, legal, and social 
implications of the genomic research. A number of concerns have been raised in 
the past few years about issues relating to genetic testing, discrimination in 
insurance, and privacy of individual genetic information. These issues and other 
issues will only become more important in the coming years, especially as we 
speed up completion of the sequence of the human genome. NIH has set aside a 
portion of its annual funding to address these issues, and this is an important and
logical area for collaboration. I intend to follow-up on my conversation with Dr. 
Varmus to identify specific activities which we can jointly undertake. 

Restrictions on Researchers' Ability to Obtain Human DNA Sequence Information 

Q6. What restrictions will be placed on researchers' ability to obtain this 

A6. The human DNA sequence information will be made publicly available to 
researchers on a quarterly basis. There will be no restrictions placed on this data 
by the new venture. 


Relation of New Venture to the Federally-Funded Human Genome Sequencing 

Q7. How will you and Perkin-Elmer executives relate your program to the 
federally funded human genome sequencing effort? To the efforts of other 
biotechnology companies? 

A7. The new venture that we are undertaking, if successful, will advance the efforts of
all human genome research activities. All programs, whether publicly or privately
funded, will gain some advantage by utilizing the information encoded in the entire
human genome. We hope to work with all researchers to improve understanding
of the genetic basis of disease and to one day assist in the creation of
therapeutics that will improve human health.






The Human Genome Project: 
How Private Sector Developments Affect the Government Program 

June 17, 1998 

Post-Hearing Questions Submitted to 

Dr. David J. Galas 

President and Chief Scientific Officer 

Chiroscience R&D Inc. 

Bothell, WA 

Post-Hearing Questions Submitted by Chairman Calvert 

Practical Value of Federal Completion of Entire Human Genome Sequencing 

Q1. In your testimony, you say that, even if the Federal program agrees to the
"first draft" approach you recommend, it should then go on to complete the 
entire sequencing process. Please tell us the practical value this will have. 

A1. The "first draft" approach will make available valuable information that can be
used to locate genes and to perform certain other important tasks for projects currently being
pursued in the public and private sectors. It is important, as I testified, that this
information be available as soon as possible to help advance a wide range of
present and planned research work - thus the value of the "first draft".
Researchers will use this information to provide clues to enable them to do further
work, including more detailed sequencing, in specific places in the genome of
direct interest. In no way, however, should this "first draft" be viewed as the final
result of the genome project. The complete sequence information is needed in any
case to provide a complete picture of the biological function of the genome. When
the final product is available in the databases any further sequencing by researchers
will not be necessary, and even more time and resources will be saved than with
their use of the "first draft" data.


Post-Hearing Questions Submitted by Democratic Members

Impact on Current Efforts 

Q1. How would your current efforts be affected by the joint venture?

A1. If the joint venture succeeds as planned, we would welcome the new data that will
be available in the databases, and use it as soon as it is available. Our efforts will
thus be enhanced by the joint venture. 

Importance of Genomic Data That May Be Withheld 

Q2. How important do you feel the 100 to 300 sequences that would be withheld 
are to the broad assemblage of knowledge? 

A2. Since many companies now withhold the results of their own proprietary work on
genes, including their identity and function, I doubt if this will change the
landscape to a significant degree. I am confident that any withheld genes will be
discovered in short order in the course of normal efforts by the federal program or
by other academic or industry researchers. I would expect that any gene withheld
in this way would result only in a short delay in its availability to the rest of the 

Reasonable Fees and Conditions to Private-Controlled Genetic Information 

Q3. Could you share with the committee what you feel are reasonable fees and 
conditions to the genetic information Perkin-Elmer will control. 

A3. Unfortunately it is too early for me to make reasonable estimates of this. It 
depends on the specific information (which is highly variable in its value to the 
commercial sector) and the context of the state of knowledge at the time when it 
would actually be made available. 


Rights of Individuals — Privacy and Compensation Issues 

Q4. Please discuss rights of individuals whose specific genomic sequences could 
lead to a commercially successful drug? Are there privacy issues? Are there
fair compensation issues? 

A4. Use of an individual's DNA should only be done under fully informed consent, which
should include the use of genetic information for research purposes. While there
are strong privacy issues that, in my view, must be dealt with clearly and carefully,
in my opinion, individuals should have no rights to research information that is
gained by using a biological sample as part of a research program. Any future
claims to completely unknowable future results that their sample may be used to
produce should be explicitly renounced ahead of time in the informed consent
process by the individual. The advance of medical science helps all of us and our
future descendants. This is part of the fair compensation for cooperation in a
research study of any kind, including one that involves genetics. 






The Human Genome Project: 
How Private Sector Developments Affect the Government Program

June 17, 1998 

Post-Hearing Questions Submitted to 

Dr. Maynard V. Olson

Professor of Medical Genetics and Genetics 

Department of Molecular Biotechnology 


Director, Genome Center 

University of Washington 

Seattle, WA 

Post-Hearing Questions Submitted by Democratic Members 

Concerns About Ability to Access Genomic Information 

Q1. Do you have concerns about your ability to obtain access to genomic
information that may come out of this new venture? If so, what are they? 
Are you aware of any past or current problems in this area? 

A1. I have concerns in two areas. First, current promises about data release cannot be
regarded as binding commitments. The public position taken by Perkin Elmer is
that there will be excellent access to all the data. However, the business interests
of the firm will be constantly re-evaluated in the years ahead. Perkin Elmer is free,
as it should be, to change its position. Secondly, much of the utility of the data to
experts will depend on access not just to processed data, but also to the raw
output from the instruments. The amount of raw data will be vast and it will
require pro-active effort on Perkin Elmer's part to ensure that these data are
accessible in a readily analyzed form. Since it is difficult to see why Perkin Elmer
will have any incentive to make the needed effort, accessibility is likely to become 
bogged down in haggling with federal agencies about who will pay for and take 
responsibility for the data handling and whether or not the cost is justified. 


Impact on Current Efforts 

Q2. How would your current efforts be affected by the joint venture? 

A2. The answer to this question depends on how it goes. Right now the only effect is
that it has generated inordinate amounts of discussion for which there is not much
basis. If the effort actually results in quick delivery of a high-quality human
sequence, it would have a major effect on my activities: I could move on a few
years earlier than planned to other research goals. However, I will only
contemplate such a move once I see that the venture is really fulfilling the strong
claims that have been made for it. My expectation is that the venture will end up
having only a minor effect on my activities. Scientists are always making minor 
adjustments to rapidly changing external developments. It will have more impact 
on scientists who are in the thick of analyzing particular problems in human 
genetics (as opposed to engaging in large-scale genome analysis). These 
scientists will benefit from earlier access to valuable data than would
otherwise have been the case.

Importance of Genomic Data That May Be Withheld 

Q3. How important do you feel the 100 to 300 sequences that would be withheld 
are to the broad assemblage of knowledge? 

A3. As long as all the data are released, as promised, and there is no effort to deter 
academic researchers from using these data in follow-up studies, I am 
unconcerned about whether Perkin Elmer attempts to patent 100 genes or 100,000 
genes. It is not up to scientists to write or interpret the patent law. I only become 
concerned when intellectual-property issues become an obstacle to the free pursuit 
of new knowledge. 

Reasonable Fees and Conditions to Private-Controlled Genetic Information

Q4. Could you share with the committee what you feel are reasonable fees and 
conditions to the genetic information Perkin-Elmer will control. 

A4. I assume that this question concerns licensing fees to commercial firms who want 
to use information that is protected through patents or copyrights. I have no 
expertise in this area. My opinion, expressed as that of a scientist rather than an
expert in the commercial aspects of biotechnology, is that it does not serve the 
public interest for pharmaceutical companies to confront a tangle of expensive 
licensing issues whenever they choose to pursue a new product-development 
program. Most of the real costs and real difficulties associated with drug
development lie far downstream from DNA sequencing, and the rewards of 
successful drug-development efforts should be kept well aligned with the steps in 
the process that involve the highest risk and require the largest investment. 


Rights of Individuals — Privacy and Compensation Issues 

Q5. Please discuss rights of individuals whose specific genomic sequences could 
lead to a commercially successful drug? Are there privacy issues? Are there
fair compensation issues? 

A5. This area bears watching. Certainly, there are privacy issues whenever DNA 
sequences go into databases. I believe that all such data should meet a high 
standard of anonymity, and we should also avoid drifting toward, just as a matter 
of convenience, obtaining a high proportion of human sequence from the DNA of
a small number of individuals. In general, the tradition of obtaining research 
samples from individuals who are largely motivated by altruism—with 
compensation that is only related to the time and effort that they must expend in 
providing the samples—serves the public interest well. 

Biomedical research depends on ready availability of enormous numbers of 
research samples acquired from patients and volunteers, under conditions of 
informed consent, every day. It would not serve the public interest to inject legal 
contracts and commercial agreements into the relationship between research 
subjects and researchers. We also do not want to turn the process into a lottery. 
Any particular commercially important discovery can be traced to a particular
sample or small number of samples; however, in most cases, the individuals who
provided those samples are no more deserving of special rewards than the
thousands of other people who also allowed their samples to be used for similar
research purposes.

In short, we should insist on high standards of privacy, anonymity, and informed 
consent but should not start a system in which donors of research samples have an 
ongoing legal and commercial interest in the research projects that employ their 
samples. However, sticky issues will still arise, particularly when the special 
commercial potential of a particular sample can be recognized in advance of 
extensive scientific analysis or when samples are collected in cultural settings 
where the research subjects have had little exposure to modern medicine or do not
feel they benefit from advances in medical knowledge. Nonetheless, the more 
closely we can stick to a system in which well informed research subjects volunteer 
to provide research samples out of altruistic motives, the better the public interest 
will be served. 


APPENDIX 2: Additional Materials for the Record 



policy: Genomics 

Shotgun Sequencing of the 
Human Genome 

J. Craig Venter, Mark D. Adams, Granger G. Sutton,
Anthony R. Kerlavage, Hamilton O. Smith, Michael Hunkapiller

The Human Genome Project (HGP) was officially launched in the United States on 1
October 1990 as a 15-year program to map and sequence the complete set of human
chromosomes and those of several model organisms. The HGP is laying the groundwork
for a revolution in medicine and biology. Its importance is underscored by the level of
funding from the National Institutes of Health, the Department of Energy (DOE),
the Wellcome Trust, and other governments and foundations around the world.

From the inception of the HGP, major technical innovations that would affect its
timetable and cost were considered essential to success. The development of bacterial
artificial chromosomes (BACs) (1) provided a key advance. BACs are propagated in
Escherichia coli and carry large [~150-kilobase pairs (kbp)] inserts stably. In contrast,
ordered cosmid clones that served as the basis of yeast (2) and Caenorhabditis
elegans (3) genome sequencing projects are less stable and much shorter (~35 kbp).
Fluorescent labeling of DNA fragments generated by the Sanger dideoxy chain
termination method has been the mainstay of almost all large-scale sequencing projects
since the introduction of the first semi-automated sequencer by Applied Biosystems in
1987 and the development of Taq cycle sequencing in 1990. New models of the
sequencer that can process more samples, Taq polymerase engineered especially for
sequencing, and higher sensitivity dyes have improved throughput, accuracy, and
operating costs. Publication of the first genome from a self-replicating organism,
Haemophilus influenzae, was based on a whole-genome shotgun (random sequencing)
method (4). A set of algorithms called the TIGR Assembler (5) together with scaffolding
sequences from both ends of 18-kbp inserts in bacteriophage lambda clones were critical
for determination of correct order and assembly. Eight additional genomes have
since been completed by these methods (4, 6, 7), and several others are nearing
completion, including genomes with high GC (~65%) and high AT (~82%) composition,
which present special problems for sequencing and assembly.

J. C. Venter, M. D. Adams, G. G. Sutton, A. R. Kerlavage, and H. O. Smith are at The
Institute for Genomic Research (TIGR), Rockville, MD 20850, USA. M. Hunkapiller is
at Perkin-Elmer Applied Biosystems, Foster City, CA 94404-1128, USA.

Current approaches to human genomic sequencing rely on building sequence-ready
maps over regions ranging in size from hundreds of kilobase pairs to whole chromosomes
and then sequencing individual BACs spanning these regions through a
combination of shotgun and directed approaches. This method can produce highly
accurate sequence with few gaps, although most sequencing centers have encountered
regions that appear to be unsequenceable by current technology. The up-front steps of
building and validating the sequence-ready map and subclone library construction and
the downstream steps of directed gap filling are generally considered to be rate limiting.
About 120 Mbp of human genomic sequence were completed through 1997, and
another 200 Mbp are planned for 1998.

[Figure: Covering the genome. A 100-kbp portion of the genome showing expected clone
coverage. Labels: BAC ends; 10-kbp clones; 100 per 100 kbp.]

The recent announcement by Perkin-Elmer of a new, fully automated sequencer
(ABI PRISM 3700) permits a reevaluation of strategies for completing the human
genome sequence. This instrument is a capillary-based sequencer that can process ~1000
samples per day with minimal hands-on operator time (~15 min compared with ~8
hours for the same number of samples on ABI PRISM 377s). This reduction in
operating labor, coupled with automation of sample purification and sequencing chemistry
enabled by the sequencer's improved detection sensitivity, suggests that the tens of
millions of sequencing reactions necessary to complete the human genome can be
performed more quickly and at lower cost than previously anticipated. The Institute for
Genomic Research (TIGR) and Perkin-Elmer have started a program to complete this task
within 3 years using this new technology and a whole-genome shotgun strategy that
obviates the need for a sequence-ready map before sequencing. We intend to form a new
company to carry out this venture and develop a commercial business based on these
efforts. The cost of the project is estimated to be between $200 million and $250 million,
including the complete computational and laboratory infrastructure to develop the
finished sequence and informatics tools to support access to it.

The whole-genome shotgun strategy involves randomly breaking DNA into
segments of various sizes and cloning these fragments into vectors. The
presence of repeat elements, regions that are unclonable in a
particular vector, and the benefit of having more DNA available in
clones than is actually sequenced (see figure and table) require that
multiple vector libraries be used. A library of pUC18-based plasmids
containing ~2-kbp inserts will provide most of the sequencing
templates. These clones will be sequenced from both ends to produce
pairs of linked sequences representing ~500 bp at the ends of each
insert. End sequences from a library of low-copy number plasmid clones
containing ~10-kbp inserts will provide medium-range linking, including
spanning the common Line-1 and THE repeat elements. Use of multiple
cloning systems should help to reduce the effect of sequences that are
unclonable or otherwise not present in one of the libraries. The goal
is to generate 70 million high-quality DNA sequences totaling ~35
billion bp (10x coverage) of raw human sequence.
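The coverage figure quoted here can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the 70 million reads from the text and assumes the 500-bp average read length and 3.5-Gbp genome size given in the table caption; the function name is hypothetical.

```python
def fold_coverage(num_reads, read_length_bp, genome_size_bp):
    """Raw sequence coverage: total bases read divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Figures from the text; read length and genome size are the stated assumptions.
reads = 70_000_000
read_length = 500
genome_size = 3_500_000_000

total_bp = reads * read_length
coverage = fold_coverage(reads, read_length, genome_size)
print(f"{total_bp / 1e9:.0f} Gbp of raw sequence, {coverage:.0f}x coverage")
```

Under these assumptions the totals agree with the ~35 billion bp and 10x coverage stated in the text.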

An argument for whole-genome shotgun sequencing of the human genome was
made (8) and rebutted (9) in 1997. A year later, we see developments in
technology and a new resource for this project consisting of a large
database of end sequences of BAC clones. This will provide a framework
for linking contigs over larger regions. Currently, the DOE is funding
a program at TIGR and the University of Washington to sequence both
ends (~500 bp from each end) of 300,000 human BAC clones. This BAC-end
sequencing strategy was originally proposed to accelerate genome
sequencing by providing markers every 5 kbp throughout the genome (10).

The new human genome sequencing facility will be located on the TIGR
campus in Rockville, Maryland, and will consist of 230 ABI PRISM 3700
DNA sequencers with a combined daily capacity of ~100 Mbp of raw
sequence. The facility will also have the infrastructure to produce
~100,000 template preps and ~200,000 sequencing reactions daily. This
includes both custom and off-the-shelf robotic devices for picking
colonies, pipetting, and thermal cycling. Quality control and
assessment procedures will be implemented at each stage of the process.

SCIENCE • VOL. 280 • 5 JUNE 1998
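A rough throughput estimate follows from these facility figures. The sketch below is a back-of-the-envelope calculation, not a schedule: it assumes every sequencing reaction yields a usable 500-bp read and that the facility runs every day without downtime, neither of which is claimed in the text. All names are hypothetical.

```python
import math

READ_LENGTH_BP = 500               # average read length assumed in the table caption
REACTIONS_PER_DAY = 200_000        # sequencing reactions per day (from the text)
DAILY_CAPACITY_BP = 100_000_000    # ~100 Mbp of raw sequence per day (from the text)
TARGET_RAW_BP = 35_000_000_000     # ~35 billion bp of raw sequence (from the text)
SEQUENCERS = 230


def days_to_target(target_bp, bp_per_day):
    """Whole days of uninterrupted operation needed to reach the target."""
    return math.ceil(target_bp / bp_per_day)


bp_from_reactions = REACTIONS_PER_DAY * READ_LENGTH_BP
days = days_to_target(TARGET_RAW_BP, DAILY_CAPACITY_BP)
bp_per_sequencer = DAILY_CAPACITY_BP / SEQUENCERS

print(f"reactions imply {bp_from_reactions / 1e6:.0f} Mbp/day")
print(f"{days} days at full capacity")
print(f"about {bp_per_sequencer / 1e3:.0f} kbp per sequencer per day")
```

Under these idealized assumptions the 200,000 daily reactions match the ~100 Mbp daily capacity, and the raw sequencing would take roughly a year of continuous operation.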

Accompanying the challenge of obtaining the primary sequence data in a
rapid and cost-effective way is the major challenge of assembling raw
data into contiguous blocks (contigs) and assigning those to the
correct location in the genome. Complete contiguity of the clone map
should theoretically be achieved by about 9x coverage, so the 46x
coverage (see table) allows for substantial deviation from the
statistical model. The pairs of end sequences from each template are
constrained by the assembly algorithms to be directed toward one
another in the final assembly and located at a given distance apart
depending on the insert size of the originating library. Although the
BAC end sequences will be the primary scaffold onto which the end
sequences from the smaller clones will be assembled, other available
resources will be used to verify the alignments and place contigs on
individual chromosomes. The most important of these resources is the
large number of sequence tagged site (STS) markers that constitute the
physical maps that have been produced by many laboratories during the
first phase of the HGP. There currently are about 45,000 STS sequences,
including about 30,000 that are well ordered along the chromosomes and
provide a defined marker approximately every 100 kbp (11). Expressed
sequence tags (ESTs) that tag 50 to 80% of human genes (12) and
full-length cDNA sequences spanning up to 5 Mbp of genomic sequence
will be used to verify the final assemblies. There are likely to be
contigs that are misassembled or incorrectly linked together because of
the presence of long, duplicated segments of the genome. We expect to
recognize and correct ambiguous or conflicting assembly structures
using a combination of manual inspection and directed experimental
effort.
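The paired-end constraint described in this paragraph (the two end sequences of a clone must face one another and lie about one insert length apart) can be sketched as a simple consistency check. This is an illustrative sketch only, not the assembly software used by the project; the `PlacedRead` type, the function name, and the 25% tolerance are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    start: int     # leftmost coordinate of the read in the assembly
    end: int       # rightmost coordinate (exclusive)
    forward: bool  # True if the read lies on the forward strand


def mate_pair_consistent(a, b, insert_size, tolerance=0.25):
    """Check that two end reads of one clone face each other and span
    roughly one insert length (tolerance is an assumed fraction)."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    # The left read must point rightward and the right read leftward.
    if not (left.forward and not right.forward):
        return False
    span = right.end - left.start
    return abs(span - insert_size) <= tolerance * insert_size


# Two reads from a ~2-kbp plasmid insert, placed 2,100 bp apart end to end.
good = mate_pair_consistent(
    PlacedRead(1000, 1500, True), PlacedRead(2600, 3100, False), 2000)
# Same positions, but both reads point the same way: inconsistent.
bad_orientation = mate_pair_consistent(
    PlacedRead(1000, 1500, True), PlacedRead(2600, 3100, True), 2000)
# Correct orientation, but far too distant for a 2-kbp insert.
bad_distance = mate_pair_consistent(
    PlacedRead(1000, 1500, True), PlacedRead(9500, 10000, False), 2000)
print(good, bad_orientation, bad_distance)
```

A pair that fails such a check flags a candidate misassembly, for example two copies of a repeat collapsed into one contig.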

The aim of this project is to produce highly accurate, ordered sequence
that spans more than 99.9% of the human genome (13). The 10x sequence
coverage means that the accuracy of the sequence will be comparable to
the standard now prevalent in the genome sequencing community of fewer
than one error in 10,000 bp. It is likely that several thousand gaps
will remain, although we cannot predict with confidence how many
unclonable or unsequenceable regions may be encountered. We look
forward to working with other genome centers to ensure that the
sequence meets the requirements of the scientific community for
accuracy and completeness; this will include making clones and
electropherograms available.
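The order of magnitude of the gap estimate can be illustrated with the idealized Lander-Waterman model of random shotgun sequencing, in which the expected number of contigs is N·exp(-cσ) for N reads at coverage c, with σ the fraction of a read not needed to detect an overlap. The text does not say which statistical model the authors used, so this is an assumption; the model also ignores repeats and cloning bias, and the function name is hypothetical.

```python
import math


def expected_contigs(num_reads, coverage, min_overlap_fraction=0.0):
    """Lander-Waterman expected number of contigs: N * exp(-c * sigma),
    where sigma = 1 - (minimum detectable overlap / read length)."""
    sigma = 1.0 - min_overlap_fraction
    return num_reads * math.exp(-coverage * sigma)


# 70 million reads at 10x coverage, with any overlap assumed detectable.
gaps = expected_contigs(70_000_000, 10.0)
print(f"about {gaps:,.0f} contigs, hence a similar number of gaps")
```

With these inputs the model gives a few thousand contigs, the same order of magnitude as the "several thousand gaps" anticipated in the text.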

An essential feature of the business plan is that it relies on complete
public availability of the sequence data. The four primary business
areas are high-throughput contract sequencing, gene discovery, database
services, and high-throughput polymorphism screening. A major
consequence of the analysis of data generated by this project will be
the creation of a comprehensive human genomic database. It will contain
an extensive set of DNA and protein features derived from the primary
sequence. DNA features will include identified genes and their
regulators, repeats, links with genetic and physical mapping data,
synteny with other species, and polymorphisms. Because of the
importance of this information to the entire biomedical research
community, key elements of this database, including primary sequence
data, will be made available without use restrictions. In this regard,
we will work closely with national DNA repositories such as National
Center for Biotechnology Information. We plan to release contig data
into the public domain at least every 3 months and the complete human
genome sequence at the end of the project. We also envision providing
at a minimum connect fee online access to these data and many of the
informatics tools to interpret them. We will also market the database
system to commercial companies engaged in pharmaceutical and
biotechnology research.

Because the whole-genome shotgun approach will contain data from
multiple individuals (the exact number has not yet been determined), we
will generate a large number of precisely located single-nucleotide
polymorphic (SNP) sites spanning the genome. Using technology being
developed at Perkin-Elmer, we will generate assay systems to validate
these markers and select a highly informative set of at least 100,000
SNPs. We plan to work with commercial partners to screen DNA samples
associated with diseases or other conditions in an effort to link them
with particular genetic loci. The assay systems will also be marketed
by Perkin-Elmer to third parties for in-house research. Although we do
not plan to seek patent protection for the randomly selected SNPs, we
may seek patents on diagnostic tests based on the association of
particular SNPs with important phenotypic traits.

We also do not plan to seek patents on primary human genome sequences.
However, we expect that we and others will be able to use these primary
data as a starting point for additional biological studies that could
identify and define new pharmaceutical and diagnostic targets. Once we
have fully characterized important structures (including, for example,
defining biological function), we expect to seek patent protection as
appropriate. Given both the complexity and scope of the information
contained in human genome sequence, as well as its public availability,
we would expect to focus our own biological research efforts on 100 to
300 novel gene systems from among the thousands of potential targets.
If we are successful in these efforts, the patents would be available
for licensing to interested parties.

                     Insert       Number of                 Coverage (x)
Vector               size (kbp)   Clones       Sequences    Sequences  Clones
High-copy plasmid      2          30,000,000   60,000,000
Low-copy plasmid      10           5,000,000   10,000,000              14
BAC                  150             300,000      600,000
Total                             35,300,000   70,600,000

Analysis of coverage. As each clone is not completely sequenced, there
is a greater coverage of clones than sequences in the assembly. We
assume a 500-bp average read length and 3.5-Gbp genome size.
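The clone and sequence counts in the coverage table can be cross-checked against the caption's stated assumptions (500-bp average read length, 3.5-Gbp genome). The sketch below recomputes sequence and clone coverage for each library; it is a reader's check under those assumptions, not the authors' own calculation, and the names are hypothetical.

```python
GENOME_BP = 3_500_000_000   # genome size assumed in the table caption
READ_BP = 500               # average read length assumed in the table caption

# library name -> (insert size in bp, number of clones), from the table
LIBRARIES = {
    "high-copy plasmid": (2_000, 30_000_000),
    "low-copy plasmid": (10_000, 5_000_000),
    "BAC": (150_000, 300_000),
}


def coverages(insert_bp, clones):
    """Return (sequences, sequence coverage, clone coverage) for one library.
    Each clone is read once from each end, giving two sequences per clone."""
    sequences = 2 * clones
    sequence_cov = sequences * READ_BP / GENOME_BP
    clone_cov = clones * insert_bp / GENOME_BP
    return sequences, sequence_cov, clone_cov


rows = {name: coverages(*spec) for name, spec in LIBRARIES.items()}
total_sequences = sum(r[0] for r in rows.values())
total_seq_cov = sum(r[1] for r in rows.values())
total_clone_cov = sum(r[2] for r in rows.values())

for name, (seqs, seq_cov, clone_cov) in rows.items():
    print(f"{name:18s} {seqs:>11,d} {seq_cov:6.2f}x {clone_cov:6.1f}x")
print(f"{'total':18s} {total_sequences:>11,d} "
      f"{total_seq_cov:6.2f}x {total_clone_cov:6.1f}x")
```

Under these assumptions the sequence totals reproduce the 70,600,000 reads and roughly 10x sequence coverage, and the low-copy library works out to about 14x clone coverage. The summed clone coverage comes to about 44x, close to but not identical with the 46x quoted in the text; the difference may reflect rounding or assumptions not recoverable from this copy.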
Although it is clear that shotgun sequencing at this scale has never
been attempted, it is our hypothesis that the desired result is
achievable. While building the human genome sequencing infrastructure
we plan to attempt to demonstrate the effectiveness of the shotgun
strategy on a large and complex genome, in collaboration with Gerald
Rubin (Howard Hughes Medical Institute/University of California
Berkeley) and the Berkeley Drosophila Genome Project (BDGP). Drosophila
melanogaster represents a good system for testing the whole-genome
shotgun strategy because of the extensive physical and genetic maps
that exist, the presence of about 12% of the genome as high-quality
finished sequence with which to compare shotgun assembly results, and
its importance as a model organism. We will work fully with the BDGP to
facilitate the final closure process (which includes making clones and
electropherograms available), with the expected result being a highly
accurate and contiguous set of chromosome sequences. The Drosophila
genome sequence will be deposited in GenBank both while in progress and
at completion. An international workshop is being organized for
September 1998 to develop a plan for completing the Drosophila genome
that encourages participation of all groups currently working on this
project.

It is our hope that this program is complementary to the broader
scientific efforts to define and understand the information contained
in our genome. It owes much to the efforts of the pioneers both in
academia and government who conceived and initiated the HGP with the
goal of providing this information as rapidly as possible to the
international scientific community. The knowledge gained will be key to
deciphering the genetic contribution to important human conditions and
justifies expanded government investment in further understanding of
the genome. We look forward to a mutually rewarding partnership between
public and private institutions, which each have an important role in
using the marvels of molecular biology for the benefit of all.

References and Notes 

1. H. Shizuya et al., Proc. Natl. Acad. Sci. U.S.A. 89, 8794 (1992).

2. A. Goffeau et al., Nature 387 (suppl.), 5 (1997).

3. J. Sulston et al., ibid. 356, 37 (1992).

4. R. Fleischmann et al., Science 269, 496 (1995).

5. G. G. Sutton, O. White, M. D. Adams, A. R. Kerlavage, Genome Sci.
   Technol. 1, 9 (1995).

6. C. M. Fraser et al., Science 270, 397 (1995); C. J. Bult et al.,
   ibid. 273, 1058 (1996); J.-F. Tomb et al., Nature 388, 539 (1997);
   H.-P. Klenk et al., ibid. 390, 364 (1997); C. M. Fraser et al.,
   ibid., p. 590; C. M. Fraser et al., Science, in press.

7. D. R. Smith et al., J. Bacteriol. 179, 7135 (1997); G. Deckert et
   al., Nature 392, 353 (1998).

8. J. Weber and E. W. Myers, Genome Res. 7, 401

9. E. Green, ibid., p. 410.

10. J. C. Venter, H. O. Smith, L. Hood, Nature 381.

11. T. Hudson et al., Science 270, 1945 (1995); C. Dib et al., Nature
    380, 152 (1996); G. D. Schuler et al., Science 274, 540 (1996).

12. M. Adams et al., Science 252, 1651 (1991); M. Adams et al., Nature
    377 (suppl.), 3 (1995); L. Hillier et al., Genome Res. 6, 607
    (1996).

13. Considerable sequence will be generated from regions of
    heterochromatin including centromeres, telomeres, and ribosomal DNA
    arrays, which are not targeted by HGP sequencing laboratories. We
    will make unique assemblies where possible in these regions.

books: paleobiology 


Paul Copper 

Life. A Natural History of the First Four Billion
Years of Life on Earth. RICHARD FORTEY.
Knopf, New York, 1998. xiv, 347 pp., + plates.
$30 or C$42. ISBN 0-375-40119-9.

A portentous book title as bold as this, Life, is bound to raise a few
eyebrows. It is also almost certain to catch the eye of the book
browser. In a drama bolder and more sweeping than Gone with the Wind,
Richard Fortey sketches the full story of life on Earth, the stage and
the actors, over more than four billion years. Originally published in
Britain as Life: An Unauthorized Biography (Harper Collins, 1997), this
bright brown volume, plastered with the imprint of Archaeopteryx (the
oldest known bird), is as encompassing as its title suggests. Fortey,
senior palaeontologist at the Natural History Museum, London, takes us
on a roller coaster from the spawning of the simplest unicellular
organisms during violent infancy of the Earth; through monumental
crustal upheavals, voyages of continents, and mass extinctions; to an
ending at the dawn of human-recorded history.

The key to this book, a layperson's guide to the secrets of fossils and
environments most ancient, is the way the author has magically
transposed and integrated his academic biography and intellectual
growth into the natural history of life. I know of no other
"autobiography" (if the book can be called one) quite like this, where
the author's life is stitched into such an immense stretch of time.
Neatly and adroitly, Fortey weaves his personal observations, his
encounters with scientists (famous and less well known), and his
introductions to controversies (century-old and contemporary) into a
chronological tapestry of life on Earth.

The text literally begins with Salterella, the vessel that in 1967
carried Fortey, then a young Cambridge undergraduate, to his first
field season in Spitsbergen. Salterella is also one of the oldest
shelly fossils, a curious Early Cambrian genus named after the
pioneering trilobite specialist John W. Salter. First described in 1861
from the shores of Labrador (where I have collected thousands of the
little conical shells around some of the earliest metazoan reefs), its
affinities can only be guessed: is it a worm, a coral, a mollusk?

Ordovician "sea beetle." Guaranteed an excellent fossil record by their
calcite carapaces, trilobites are the characteristic creatures of the
Early Paleozoic. (Ceraurus pleurexanthemus, from Ontario.)

The author is at the Department of Earth Sciences, Laurentian
University, Sudbury, Ontario, Canada P3E 2C6. E-mail:
pcopper@nickel.laurentian.ca

Coincidence, circumstance, and chance, and their effects on the global
gene pool through time, are pervasive themes articulated throughout the
book. At the personal level, Fortey explores how one chooses a career
path, who happens to win the prizes and scholarships, and who loses out
to disappear from sight. In the fossil record we learn about the luck
of the gene draw, evolution through the trials of mass extinctions, the
consequences of changing climates, continental drift, and cosmic
impacts.

The book has many strengths. Fortey lyrically raises fossils from the
dead, re-creating vibrant, vivid organisms that absorb light, breathe,
eat, function, and interact with their ecosystems. Read his
descriptions of the Middle Cambrian Burgess Shale from Canada ("on the
dark shales there was a fishmonger's slabful of arthropods"), a
Carboniferous rainforest ("the air is so humid that the moisture
congeals upon your shoulders"), and the Eocene Messel Grube from
Germany ("imagine a delicate bat, Palaeochiropteryx, as fragile as a
paper kite, with every bone laid out upon a dark slab, as if it had
been waiting its turn as an extra in a Dracula movie"). The author
presents bites of life's story sequentially, from oldest to newest, as
if to suggest (probably rightly so) that the past is the key to
understanding the present and the future. He moves continents about
like cardboard cutouts to explain migration paths of continental
tetrapods and plants. He lucidly spells out the "rules of the
evolutionary game" (which organisms needed to follow to succeed,
compete, and survive over millennia), and how these are displayed in
the fossil record. Fortey provides a bird's eye view of the science of
paleontology, and an insider's perspective of the "psycho-cultural"
shenanigans that often come with the paleo-priesthood: the cladist
cult, the mass extinction dichotomy of catastrophists and
uniformitarians, the taxonomic schism of splitters and lumpers, the
heretic leaders, and the hermits who wait in isolation to reach



The New York Times    MAY 10 1998

Scientist's Plan:
Within 3 Years


A pioneer in genetic sequencing and a private company are joining
forces with the aim of deciphering the entire DNA, or genome, of humans
within three years, far faster and cheaper than the Federal Government
is planning.

If successful, the venture would outstrip and to some extent make
redundant the Government's $3 billion program to sequence the human
genome by 2005.

Despite a host of new questions, the charting of the full human genome
would offer enormous medical and scientific benefits.

The principals have high credibility in the world of genome sequencing.
They are Dr. J. Craig Venter,

for Genomic Sciences in Rockville, Md., and Michael W. Hunkapiller,
president and technical maestro of the Applied Biosystems division of
the Perkin-Elmer Corporation of Norwalk, Conn.

The director of the Federal human genome project at the National
Institutes of Health, Dr. Francis Collins, first heard of the new
company's plan on Friday, as did the director of the N.I.H., Dr. Harold
Varmus. Both said that the plan, if successful, would enable them to
reach a desired goal sooner. Dr. Collins said he planned to integrate
his program with the new company's initiative. The Government would
adjust by focusing on the many projects that are needed to interpret
the human DNA sequence, such as sequencing the genomes of mice and
other ani-

Both Dr. Varmus and Dr. Collins expressed confidence that they could
persuade Congress to accept the need for this change in focus, noting
that the sequencing of mouse and other genomes has always been included
as a necessary part of the human genome project.

Circ: 1,767,836 

Mr. Hunkapiller's unit is a principal manufacturer of the machines used
to sequence DNA, or determine the order of chemical units. The venture
will be financed by Perkin-Elmer, a longtime scientific instrument
maker that has recently branched into the genome field under the
leadership of its new chief executive, Tony L. White.

A plan to form a new company for the venture was approved by
Perkin-Elmer's board on Friday afternoon. The project could have wide
ramifications for industry, academia and the public because it would
make possible almost overnight many developments that had been expected
to unfold over the next decade.

One such development is individualized medicine, the tailoring of drugs
and other treatments to patients depending on specific variations in
their DNA sequence. The wide availability of individual DNA sequences
would raise more urgently the longstanding but unresolved issues of
privacy and control of genetic information.

The possible possession or control of the entire human genome by a
single private company could also become an issue of public concern.

The new venture was conceived only a few months ago. Mr. Hunkapiller
believed that a new generation of sequencing machines coming on line
would be so fast that the whole human genome could be completed far
sooner and 10 times more cheaply than envisaged by the National
Institutes of Health.

He approached Dr. Venter, who had developed the idea for a new
sequencing strategy but lacked the means to execute it. The two men
concluded in January that it would be possible to sequence the three
billion letters of human DNA within three years, at a cost of $150
million to $200 million.

The $3-billion Federal program, by contrast, is now at the halfway
point of its 15-year course, and only 3 percent of the genome has been
sequenced. The strategy has been to divide the task and assign parts to
various universities. Although the program has had many successes in
pioneering a daunting task, serious doubts have emerged as to whether
the universities can meet the target date of 2005.

The human genome contains all the instructions (some 60,000 or so
genes) needed to design and operate the human organism. Deciphering the
script in which the instructions are written, the chemical units of
DNA, would yield a trove of knowledge about human physiology and
disease, as well as the power, in principle, to correct the errors in
DNA programming that cause genetic disease. The genome, once
deciphered, is likely to be seen as the foundation of human biology,
and hence is the object of intense scientific and commercial interest.

The proposal to substantially complete the human genome in three years
would seem extreme hubris coming from almost anyone but Dr. Venter. But
other experts deemed his approach technically feasible.

"It's not impossible at all that he could succeed," said Dr. William A.
Haseltine, chief executive of Human Genome Sciences of Rockville, Md.
"He has demonstrated a fine track record of innovation and organiza-
Dr. Haseltine's company was for several years in uneasy partnership
with Dr. Venter's institute.

If successful, the new venture seems likely to impose adjustments on
all the others involved in genome research, and to offer new
opportunities. Congress, for instance, might ask why it should continue
to finance the human genome project through the National Institutes of
Health and the Department of Energy if the new company is going to
finish first.

The sponsors of the new venture insist that there will be more work for
the human genome project participants to do, not less, because
obtaining the DNA sequence is only the first step toward understanding
what the genetic instructions mean and how they operate.

A new private
venture has lofty
goals but also
much credibility.



"There is a strong case for Congress to increase funding for this
work," said Mr. White of Perkin-Elmer. "The post-genomic world will be
much more exciting."

With the new company, Perkin-Elmer would seem for the first time to be
stepping into direct competition with the customers who buy its
sequencing machines and other genome-analysis equipment. Mr. White,
however, has no evident ambitions to become the Bill Gates of the
genome world.

"We are anxious to talk to anyone who might feel threatened by this to
make very sure that we are doing something compatible," Mr. White

Even Dr. Venter, who is known for his direct approach, said, "We are
trying to do this not with an in-your-face kind of attitude." He added
that he intended to work closely with the National Institutes of
Health.

Dr. Venter forecast that the possession of the human genome sequence
would stimulate new directions in medicine and biology, just as his
sequencing of the first bacterial genome has led to a wave of other
microbes being spun through sequencing machines. He said he intended to
build a network of collaborators around the world to work on human
genetic diseases.

Dr. Venter and his new colleagues plan not just to sequence the human
genome but to construct a "definitive" data base that will integrate
medical and other information with the basic DNA sequence. An important
component of the new data base will be human polymorphisms, the
geneticists' term for commonly found variations in DNA. Though all
people and ethnic groups are thought to have an overwhelmingly similar
sequence of DNA letters in their genome, there are many minor
variations at certain sites on the genome, and these variations make
each individual unique.

The new company's data base seems likely to rival or supersede Genbank,
the data bank operated by the National Institutes of Health.

Having so much information in the control of one company is also likely
to be a matter of some public con-

"The question is, can the moral and legal questions be addressed if the
largest scientific revolution of the next century is going to be done
under private auspices?" said Dr. Arthur Caplan, an ethicist at the
University of Pennsylvania with whom Dr. Venter has discussed the new
company's goals.

The issues of genetic counseling and insurance have been around for
some time, Dr. Caplan noted, but the new company's plans "accentuate
the need to improve statutes governing the control of genetic informa-

Perkin-Elmer intends to be sparing in laying claim to intellectual
property rights over the genome, believing the company will create more
demand for its machines if it allows its sequences to be widely
accessible. Mr. White said his company had a track record of liberally
licensing its inventions so as to improve the chances of their becoming
the industry standard.

Whether the new company could gain a significant lock on the human
genome in terms of patents is not at all clear. Human Genome Sciences,
for example, has already obtained the full-length sequence of 80
percent of human genes, Dr. Haseltine said, and has presumably filed
patent applications. The new company may therefore find that others
have beaten it to the treasure trove.

Even though many have now been sequenced, genes constitute only 3
percent of the total genome. Dr. Haseltine suggested that the long
regions of DNA in between the genes were like cosmology, fascinating to
know about but of little commercial

The new company will be 80 percent owned by Perkin-Elmer, with Dr.
Venter and others owning the balance. Dr. Venter said he would resign
as president of the Institute for Genomic Sciences, his place being
taken by Dr. Claire Fraser, his



Circ: 1,852,863

Perkin-Elmer Jumps Into Race to Decode Genes 

By Bill Richards

Staff Reporter of The Wall Street Journal

Scientific-instrument maker Perkin-Elmer Corp. said it will join one of
the nation's leading genetic researchers in a bold venture to speed up
the decoding of human genes.

Perkin-Elmer, a Norwalk, Conn., company that recently moved into the
genetic-sequencing field, said Saturday it signed letters of intent
with J. Craig Venter and Dr. Venter's Institute for Genomic Research to
form the project. They said they expect state-of-the-art sequencers
from Perkin-Elmer's Applied Biosystems Division to give Dr. Venter's
new project greater genetic-sequencing capacity than the entire current
world genetic-sequencing output.

The announcement brings a new competitor to a race already being run by
a host of companies, including Incyte Pharmaceuticals and Human Genome
Sciences, with which Dr. Venter was affiliated. Researchers are
continually improving the speed and accuracy of decoding techniques,
and it remains to be seen whether the new project represents a major
advance or simply an incremental step, analysts say.

Sequencing the human genome, the sum of DNA, which contains the
inherited instructions for development, is the process of identifying
the precise order of the genetic letters that make up DNA. With this
sequence in hand, scientists expect to be able to more easily identify
the estimated 80,000 or so genes that make up the entire genetic map.
Scientists hope to pinpoint all the genes sometime around the year
2010, but it will still take years after that to figure out what the
genes actually


The stepped-up capability, the project's leaders have told federal
officials, could cut as much as three or four years off the
complete-decoding timetable for the human genome. The National
Institutes of Health's human-genome project has sequenced only about 3%
of the three billion base pairs of DNA that make up the human

"This will help us to get to our goal a little sooner, and that is good
news," said Dr. Francis Collins, director of the NIH's National Genetic
Research Institute, which is conducting the human-genome

But Dr. Collins and NIH Director Dr. Harold Varmus said yesterday that
researchers at the dozen genome centers now working on the federal
project still will have plenty to do. "If the complete genome is like
an instruction book, what Dr. Venter's group will have when they are
done would be like a group of paragraphs that still need to be tied
together," said Dr.

Drs. Collins and Varmus said they only learned of the new venture at a
briefing on Friday. They said the project's senior officials assured
them that whatever information is developed will remain in the public
domain. For example, drug companies working on developing new
genetically engineered pharmaceuticals would be able to go to Dr.
Venter's group and license information for a fee.

In New York Stock Exchange composite trading Friday, before the news,
Perkin-Elmer closed at $68.50, up 43.75 cents.

Some researchers have voiced concern that the first private company to
decode the human genome would be able to completely control future
genetic engineering, as software giant Microsoft Corp. has been able to
control the development of computer software. "We were given assurances
they don't plan to lock it up," said Dr. Collins. The new company said
it "plans to make sequencing data publicly available to ensure that as
many researchers as possible are examining it."

While there have been rumors in the scientific community that a private
company might step up to the challenge of deciphering the entire human
genome, Perkin-Elmer's venture is the first to take that step. The
company said yesterday that it has developed "a breakthrough
DNA-analysis technology" that will vastly speed up the sequencing
process. Perkin-Elmer said its new analyzers will cost about $300,000
each and will be ready for the commercial market early next year.

The NIH's Dr. Varmus called the company's technological advance "a
stepping stone" to hastening the decoding of the human genome. "They
appear to have pushed technology to the next notch," Dr. Collins added.

Dr. Venter's participation in the new sequencing company gives it
unusual legitimacy in a field where optimism has sometimes outstripped
reality. In the past few years, Dr. Venter and his Rockville, Md.,
institute have pioneered methods for quickly deciphering the entire
genetic sequence of bacteria. The institute recently identified the
genetic sequences for microbes that cause Lyme disease, syphilis and
stomach ulcers.

Under the agreement, Perkin-Elmer will own 80% of the new company, to
be based in Rockville.


The New York Times

Circ: 1,187, 91.0 

MAY 12 1998

Beyond Sequencing of Human DNA 


THE sequencing of the human genome, a historic goal in biomedical
research, was snatched away last Friday from its Government sponsor,
the National Institutes of Health, by a private venture that says it
can get the job done faster. Now Government officials are scrambling to
adjust to the stunning turn of events, saying that the task of
interpreting the genome may begin much sooner now, and that there is
every reason for Congress to continue to fund the project.

Having the human DNA sequence in hand much earlier than anticipated
will significantly accelerate the pace of biomedical research. "People
will sign on to the concept that genome sequences are the underpinning
of biology," said Dr. Richard Roberts, a Nobel prize winner who is the
research director of New England Biolabs. "I think we are entering the
most exciting era of biology. Finally we might understand what life is
and how it works. The genome is just a start."

Adjusting to a bold
new entry in the
genome race.

The takeover of the human genome project is a venture of unusual
audacity. Almost equally remarkable is that other genome experts seem
to accept with little reservation that the abductors have a reasonable
chance of making good on their claim to substantially complete the
human genome, starting from scratch, in three years. The National
Institutes of Health had planned to complete the sequence by the year
2005, after a 15-year program costing $3 billion.

The new venture will be financed 
by Perkin-Elmer, the scientific in- 
strument maker, at an estimated 
cost of only $200 million. The idea 
was conceived by Michael W. Hunka- 
piller, head of Perkin-Elmer's Ap- 
plied Biosystems division. "I won't 
say Mike is a genius because he'd hit 
me up for a raise," Tony L. White, 
the chief executive of Perkin-Elmer, 
said last week. An aide added, "Let's 
just say he is smart." 

Dr. Hunkapiller is one of the co- 
inventors, along with Dr. Leroy Hood 
of the University of Washington, of 
the DNA sequencing machines that 
determine the order of the chemical 
units in the genetic material. His 
division recently developed a new 
model of their standard sequencing 
machine, one that is more highly 
automated and allows the machines 
to work round the clock with very 
little attendance. Dr. Hunkapiller re- 
alized the new machines were so 
much more efficient than their pred- 
ecessors that a roomful of 200 or so 
might be able to complete the whole 
human genome in just a few years. 

The human genome, with 3 billion 
units of DNA altogether, is distribut- 
ed over 23 chromosomes, each of 
which is a single DNA molecule 
about 100 million units long. Dr. Hun- 
kapiller's machines can determine 
the order of units in fragments of 
DNA, which are about 500 units in 
length. Some 60 million of these over- 
lapping, 500-unit pieces of DNA must 
then be reassembled to give the se- 
quence of the full-length chromo- 
somes from which they are derived. 
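
The round numbers in the paragraph above imply that each unit of the genome would be read about ten times over, which is what lets overlapping pieces be matched up. A minimal Python sketch of that arithmetic follows; the constants are the article's figures, while the function names are illustrative and not taken from the article.

```python
# Figures quoted in the article (round numbers).
GENOME_UNITS = 3_000_000_000   # "3 billion units of DNA altogether"
READ_LENGTH = 500              # fragments "about 500 units in length"
NUM_READS = 60_000_000         # "Some 60 million of these" pieces


def coverage(num_reads: int, read_length: int, genome_units: int) -> float:
    """Average number of times each unit is read (hypothetical helper)."""
    return num_reads * read_length / genome_units


def reads_for_single_pass(genome_units: int, read_length: int) -> int:
    """Fewest non-overlapping reads that could span the genome once."""
    return -(-genome_units // read_length)  # ceiling division


if __name__ == "__main__":
    print("average coverage:", coverage(NUM_READS, READ_LENGTH, GENOME_UNITS))
    print("reads for one pass:", reads_for_single_pass(GENOME_UNITS, READ_LENGTH))
```

On these figures the 60 million reads amount to ten times the six million that a single non-overlapping pass would need.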

The reassembly process is far 
from straightforward, and Dr. Hun- 
kapiller turned to Dr. J. Craig Ven- 
ter, a leading DNA sequencer who 
heads the Institute for Genomic Re- 
search in Rockville, Md. He invited 
Dr. Venter to a meeting and told him 
he thought it might be possible to 
sequence the whole genome. "Craig 
said, 'You've got to be crazy,' " Dr. 
Hunkapiller said. "We spent a few 
days working through the math and 
came away thinking maybe it's do- 
able. They went back and redid the 
calculations and so did we." 

The idea of a single organization 
cracking the genome in a single pro- 
cedure, known as a shotgun experi- 
ment, is extremely bold. Under the 
approach adopted by the National 
Institutes of Health, half a dozen 
university laboratories are working 
on the sequence, each tackling a dif- 
ferent chromosome. 

Dr. Francis Collins, the N.I.H. di- 
rector of the human genome project, 
is proud of their progress, noting that 
4 percent of the genome has already 
been sequenced, whereas the initial 
plan called for only 1 percent to be 
completed by this stage. But some 
scientists in the biotechnology indus- 
try say N.I.H.'s management of this 
industrial-scale project has been 
flawed from the start. 

"There have been serious prob- 
lems of organization and manage- 
ment both at the Department of En- 
ergy and at N.I.H.," together with 
internal dissension among the senior 
scientists involved, said Dr. William 
A. Haseltine, chief of Human Ge- 
nome Sciences, a genome sequencing 
company in Rockville, Md. 

That issue will be moot if the se- 
quencing of human DNA is assumed 
by the new private venture. Howev- 
er, it is hard to see how the new 
venture could have started without 
the substantial groundwork laid by 
N.I.H. and by the university pro- 
grams it funded, particularly the 
team at Washington University in St. 
Louis, led by Dr. Robert Waterston. 

Recognizing the credibility of the 
new venture by Dr. Venter and Per- 
kin-Elmer, N.I.H. officials are pre- 
paring to persuade Congress to con- 
tinue funding the genome project but 
to switch the focus from getting the 
sequence to the enormous task of 
interpreting it. Dr. Venter plans to 
enter his findings in a public data- 

One essential aid to understanding 
the human genome is to sequence the 
surprisingly similar genome of the 
mouse. Though all biologists recog- 
nize the need for such a project, it 
may not be immediately clear to 
members of Congress that, having 
forfeited the grand prize of human 
genome sequence, they should now 
be equally happy with the glory of 
paying for similar research on mice. 

The new venture accentuates the 
emerging importance of genomics as 
the central framework of biology and 
medicine. "There is a real treasure 
trove to be found in the total genome 
and its evolutionary history, particu- 
larly as other genomes, those of 
chimpanzees, new and old world 
monkeys and mice, become se- 
quenced," said Dr. Haseltine. "Once 
that picture is put together we'll 
have a very good idea of our evolu- 
tionary history." 


Private Firm 
Aims to Beat 
Government 
To Gene Map 

The Washington Post 

Circ: 352, i62 

and Rick Weiss 

Scientists yesterday said they 
would form a new company in Rock- 
ville that aims to unravel the entire 
human genetic code by the year 
2001, four years sooner than the 
federal government expects to com- 
plete a similar project. 

The privately funded enterprise, 
which backers said could be complet- 
ed for perhaps one-tenth the cost of 
the government program, raised im- 
mediate questions about the rele- 
vance and future of the $3 billion, 
15-year federal effort. It also raised 
fresh concerns about the prospect of 
the human genetic code being expro- 
priated by entrepreneurs who plan to 
patent and sell access to the most 
medically valuable parts. 

Some biotechnology experts not 
involved in the new company raved 
about the venture, saying it promises 
to generate enormous amounts of 
genetic data that may quickly be 
translated into better diagnostic tests 
and treatments for diseases. 

But other experts expressed skep- 
ticism that the company could 
achieve its ambitious goals, saying 
the new technology remains unprov- 
en and the novel analytical approach 
to be used may generate less useful 
information than other methods. 

Federal officials said the accelerating govern- 
ment effort to find and decode all 60,000 or more 
genes in the human body would remain on its 
current course for the next 12 to 18 months, by 
which time it will be clearer whether the project 
should change its approach to accommodate the 
new players in the field. 

"It would be vastly premature to go out and ... 
change the plan of our genome centers," said 
Francis Collins, head of the National Human 
Genome Research Institute, the branch of the 
National Institutes of Health that codirects the 
federal effort with the Department of Energy. 

The new company — not yet named — will be led 
by J. Craig Venter, a pioneer in finding fast, cheap 
ways to decode genetic information. It will be 
backed by Perkin-Elmer Corp. of Norwalk, Conn., 
a major supplier of equipment for genetic analysis, 
and will depend on machines developed by Perkin- 

The new company will lease space near Shady 
Grove Adventist Hospital just off Interstate 270 in 
Montgomery County's booming biotechnology 
corridor, Venter said. The new venture, which 
expects to go into operation early in 1999, will be 
80 percent owned by Perkin-Elmer. 

The company will employ between 400 and 800 
people to run 230 specialized new machines — each 
about the size of a minibar — that will operate 24 
hours a day decoding information from human 
genes that have been isolated from sperm and 
other cells, Venter said. The electric bill alone is 
expected to hit $5,000 a day. 

Venter helped found Human Genome Sciences 
Inc. of Rockville, the first private company in the 
nation to amass large amounts of genetic data, and 
now heads the nonprofit Institute for Genomic 
Research, also in Rockville. 

Several biotechnology companies, including Hu- 
man Genome Sciences, are in the business of 
decoding genetic information and selling it to 
pharmaceutical companies and others who hope to 
profit. Most of these biotech companies claim to 
have decoded more than 80 percent of human 
genes already, although the functions of most 
remain a mystery. 

These companies have been granted scores of 
patents on their genetic discoveries, raising fears 
among some critics that a handful of companies 
will control the commercialization of a vast and 
potentially lucrative biological resource. Those 
fears arose again yesterday with Venter's an- 
nouncement of his new project. 

"Even though they are promising public access, 
they control the terms and there is a history of 
terms being more onerous than is acceptable to 
most scientists," said Maynard Olson, a medical 
geneticist at the University of Washington. 

Venter said that with the exception of perhaps 
100 to 300 genetic sequences that he expects will 
show special commercial promise, the company 
will make all the genetic information available free 
to the world's scientists. "It would be morally 
wrong to hold the data hostage and keep it secret," 
he said. 

Perkin-Elmer senior vice president Michael W. 
Hunkapiller said the company will make money by 
analyzing the genetic information and then selling 
the results to pharmaceutical companies. The 
company also plans to analyze the tiny genetic 
differences between individuals, as opposed to 
getting a "generic" genetic sequence for the 
average human being. That new level of informa- 
tion, also being sought by federal laboratories, may 
help drug companies customize medicines for 
individuals or small groups of people. 

Venter's technique will differ markedly from 
that being used by biotech companies. Those 
companies use a shortcut that deliberately omits 
large amounts of information whose role in the 
body is unclear. 

By contrast, Venter's project aims to unravel 
every bit of genetic information, regardless of 
whether it's suspected to be useful, and to organize 
the resulting database into a massive and readily 
consulted blueprint of human life. 

To do so, the Perkin-Elmer machines will use a 
controversial approach called "shotgun whole ge- 
nome sequencing." Instead of focusing on large 
pieces of DNA, this process decodes tiny pieces 
that later must be assembled like interlocking 
pieces of a jigsaw puzzle. Because of the added 
difficulty of dealing with so many small pieces, the 
resulting picture of the human genome is likely to 
be peppered with more and larger holes than that 
produced by the federal program, Collins said. 

The government considered switching to the 
approach that Venter will use a few years ago, 
Collins said, and "roundly rejected" it as too 
problematic. But Venter and others said recent 
technical improvements make the approach superi- 

Executives of biotechnology companies in- 
volved in genetic research have long argued that 
they could do the work of the federal genome 
project faster and more cheaply. William Haseltine, 
head of Human Genome Sciences, yesterday called 
the government's program a "gravy train" and 
faulted its leaders for what he described as a failure 
to enlist private industry. 

While expressing some doubt that Venter and 
Perkin-Elmer would find ways to make money on 
their new endeavor, he said he had little doubt they 
would succeed in decoding the entire human 
genome in three years. 

"This has to feel like a bomb dropped on the 
head of the Human Genome Project," Haseltine 
said by telephone from Frankfurt. "All of a sudden 
somebody is going to pull a $3 billion rug out from 
under you? They must be deeply shocked." 


The Washington Times 

Circ: 66,662 


Academics race 
private enterprise 

MAY 1 7 1998 

By Clive Cookson 


LONDON — The race between 
academic and commercial interests 
to unravel the entire human genet- 
ic code took another twist Wednes- 
day when the British-based Well- 
come Trust, the world's largest 
charity, announced that it would 
spend an extra $184 million on the 
project over the next seven years. 

The trust's commitment, on 
behalf of the public sector, is a chal- 
lenge to the commercial genomics 
venture announced in the United 
States last weekend. 

Perkin-Elmer, the scientific 
instrumentation company, said it 
would set up a new company with 
Craig Venter, president of the Insti- 
tute for Genomic Research, "to sub- 
stantially complete the sequencing 
of the human genome [all human 
DNA] within three years." 

Wellcome said in a statement 
Wednesday: "The Trust is con- 
cerned that commercial entities 
might file opportunistic patents on 
DNA sequences." 

The trust is conducting an urgent 
review of the credibility and scope 
of gene patents. In a clear threat to 
Perkin-Elmer and other commer- 
cial organizations, Wellcome said it 
"is prepared to challenge such 

The Human Genome Project — a 
$3 billion, 15-year effort to spell out 
all 3 billion chemical "letters" in 
human DNA — was started in 1990 
in the public sector, with funding 
mainly from the U.S. government. 
But during the 1990s the private 
sector moved in, led by Human 
Genome Sciences, a U.S. biotech- 
nology company. 

Now there's intense competition 
— not only between gene-hunting 
companies but also between the pri- 
vate and academic sectors as a 

The private sector says the prof- 
it motive is accelerating the medical 
application of genetic information, 
while the academics, led by the 
Wellcome Trust, claim that compa- 
nies are delaying progress by pre- 
venting the open release of infor- 

The trust's new commitment will 
bring its total spending on the 
Human Genome Project to $328 
million. The work is based at Well- 
come's new Genome Campus in 
Cambridge, England, where DNA 
sequences are released freely on 
the Internet as they are produced. 
In the United States, Venter plans 
to use ultrafast DNA sequencing 
machines developed by Perkin- 
Elmer, together with a new scientific 
strategy, to move ahead faster than 
the public-sector genome project. 
The new company is expected to 
have a research budget of about 
$200 million. 

Although the data will be made 
publicly available after a delay, the 
company plans to build up a com- 
mercial database and to patent some 

Michael Morgan, who runs Well- 
come's genomics program, said Ven- 
ter's shotgun approach remained 
speculative and had not been proved 
to work. "At best it will give a quick 
and dirty version of the genome," he 

Distributed by Scripps Howard 


The New York Times MAY 17 1998 

Circ: 1,767,836 

International Gene Project Gets Lift 

Wellcome Trust Doubles Commitment to Public-Sector Effort 



The politics of the human genome 
project, the plan to sequence or ana- 
lyze the entire DNA of human cells, 
has become suddenly more compli- 
cated, on both a personal and inter- 
national level. 

The project, a glittering scientific 
prize expected to form the underpin- 
ning of biology and medicine in the 
next century, is a $3 billion Federal 
effort, bolstered with a significant 
British contribution, that aims to de- 
code the three billion chemical let- 
ters of human DNA by 2005. 

This program, now half way 
through its 15-year course, was up- 
staged by the announcement on May 
10 that a private company would 
start and aim to complete the human 
DNA sequence in three years at a 
fraction of the cost. 

Now the Wellcome Trust of Lon- 
don, the world's largest medical phi- 
lanthropy, has stepped into the fray 
in an effort to maintain the impetus 
of the publicly financed program and 
to prevent the human genome se- 
quence from falling under the control 
of a private company. 

The trust said this week that it 
would double the money it gives to 
the Sanger Centre near Cambridge, 
England, enabling biologists there to 
sequence one-third of the genome, up 
from their previous goal of one-sixth. 
In addition, the trust said it stood 
ready to pay for half of the entire 
human genome, or DNA sequence. 

"To leave this to a private compa- 
ny, which has to make money, seems 
to me completely and utterly stu- 
pid," said Dr. Michael J. Morgan, 
program director for the Wellcome 

Asked if the trust was prepared to 
finance the sequencing of the entire 
human genome, Dr. Morgan said, "If 
we had to and if we wanted to, we 
could do it." The Wellcome Trust, he 
noted, has assets of $19 billion. 

The Wellcome Trust's firm sup- 
port of the existing program seems 
to have had a bracing effect on its 
American partner, the National In- 
stitutes of Health. Officials there 
were talking last week of how to 
"integrate" their program with the 
commercial venture, as if there were 
no point in the Government continu- 
ing its sequencing efforts, and of 
switching their program from se- 
quencing to understanding how the 
genome works. But as the rival com- 
mercial venture has come under 
criticism from academic scientists, 
the officials no longer assume it is a 
probable fait accompli. The new 
company will produce only a "rough 
draft" of the DNA sequence, which 
may not meet scientific needs. Dr. 
Harold Varmus, director of the 
N.I.H., wrote in a recent letter to The 
New York Times. 

Dr. John E. Sulston, director of the 
Sanger Centre, criticized Dr. J. Craig 
Venter, the head of the new venture, 
for opting out of the international 
collaboration among academic cen- 
ters, and for his plan to leave gaps in 
parts of the sequence. "I really don't 
see this as being any great advance 
whatever," he said. "We are going to 
provide the complete archival prod- 
uct and not an intermediate, transito- 
ry version of it." 

The Sanger Centre has sequenced 
a third of the human DNA now in the 
data banks, a larger contribution 
than that of any other institution. 

Politics swirls 
about a glittering 
scientific prize. 

The fighting words from the N.I.H. 
and the Wellcome Trust suggest that 
these two agencies are not about to 
fold their hands and will continue to 
sequence the human genome in com- 
petition with the new company. This 
venture, which has yet to be named, 
is being financed by the scientific 
instrument maker Perkin-Elmer, 
under the direction of Dr. Venter, a 
leading DNA sequencer and presi- 
dent of the Institute for Genomic 
Research in Rockville, Md. 

Congress will presumably face the 
decision of whether to continue pay- 
ing for N.I.H. to sequence the ge- 
nome, possibly both lagging and du- 
plicating Dr. Venter's effort, or to 
have the N.I.H. switch the emphasis 
of its program to interpreting the 
genome. Sequencing the genomes of 
much-studied laboratory animals 
like the mouse and the Drosophila 
fruitfly would be a major part of an 
interpretive, post-genomic program 
but doubtless less glamorous, in Con- 
gress's eyes, than obtaining the hu- 
man genome sequence. 

Dr. Venter, a scientist who prizes 
his independence and has seldom 
been averse to criticizing the scien- 
tific establishment, says his critics 
are reacting from emotion and an 
incomplete understanding of what he 
proposes to do. Despite the commer- 
cial basis of his new venture, he says 
he will attain the same accuracy — 
no more than one error in 10,000 units 
of DNA — as the academic centers. 
On the issue of completeness, Dr. 
Venter acknowledges he will leave 
certain gaps in the genome sequence 
but he and his critics differ on the 
significance. Dr. Robert Waterston, a 
leading DNA sequencer at Washing- 
ton University in St. Louis, 
said the quality of Dr. Venter's se- 
quence will be "very significantly 
compromised," with the final prod- 
uct being similar to "an encyclope- 
dia ripped to shreds and scattered on 
the floor." 

Dr. Venter said he planned to leave 
no gaps in the genes themselves or in 
any important region between the 
genes. "These arguments and debate 
are over less than 100th of 1 percent 
of the genome," he said. 
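
To put the accuracy and completeness figures quoted above on the same scale, the following Python sketch converts both into counts of DNA units. It assumes the three-billion-letter genome size given earlier in this article; the function names are illustrative only.

```python
GENOME_UNITS = 3_000_000_000  # "three billion chemical letters of human DNA"


def errors_at_rate(genome_units: int, units_per_error: int = 10_000) -> int:
    """Upper bound on errors at 'no more than one error in 10,000 units'."""
    return genome_units // units_per_error


def disputed_units(genome_units: int) -> int:
    """Units covered by 'less than 100th of 1 percent of the genome'."""
    # One hundredth of one percent is 1/100 * 1/100 = 1/10,000.
    return genome_units // (100 * 100)


if __name__ == "__main__":
    print("errors allowed by the accuracy target:", errors_at_rate(GENOME_UNITS))
    print("units in the disputed gaps (upper bound):", disputed_units(GENOME_UNITS))
```

Both figures come to the same ceiling of 300,000 units, which gives a sense of the scale Dr. Venter is describing.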




Dr. Venter knows that if his 
project succeeds, he will force a ma- 
jor adjustment on his academic com- 
petitors. He alternates between of- 
fering balm and salt for his rivals' 
wounds. He says he seeks to cooper- 
ate with other centers and will share 
his raw data, the chromatographic 
traces from the DNA sequencing ma- 
chines, on request. But he also says 
he plans to sequence the genome of 
the Drosophila fruitfly, an important 
laboratory organism, as a trial run 
for the human sequence, and adds, 
"We are going to do the Drosophila 
genome in one-tenth the time of the 
C. elegans sequence and more accu- 

This is a jibe at Dr. Sulston and Dr. 
Waterston, who expect to complete 
the DNA sequence of the C. elegans 
nematode worm, another important 
laboratory organism, by the end of 
this year. This spectacular achieve- 
ment will mark the first animal ge- 
nome to be sequenced. 

Dr. Sulston and Dr. Waterston 
have collaborated for many years in 
a friendship that began in Cam- 
bridge. They chose the worm ge- 
nome as the pilot project for their 
assault on the human genome. 

They and Dr. Venter are well 
known as pioneers in the field of 
genomics, the study of an organism's 
full set of genes. Dr. Sulston and Dr. 
Waterston have been influential in 
setting the technical standards of the 
human genome project and the ethi- 
cal standards for making data im- 
mediately available to other re- 
searchers. Dr. Venter has pioneered 
the sequencing of bacterial genomes, 
a flourishing new field that is likely 
to have a broad impact on medicine. 


MAY 21 1998 

Gene-Mapping, Without Tax Money 

By William A. Haseltine 



Sometimes, it's smart not 
to compete. The Ener- 
gy Department and the 
National Institutes of 
Health are spending $3 
billion to decode the en- 
tire human genetic structure by 2005. 
But this effort has recently been up- 
staged by a new private company 
founded by Dr. J. Craig Venter, pres- 
ident of the nonprofit Institute for 
Genomic Research, and the Perkin- 
Elmer Corporation. This venture, 
which will spend about $200 million, 
promises to complete the job in a 
mere three years. In response, the 
Wellcome Trust, a British founda- 
tion, pledged to double its $185 mil- 
lion grant to a nonprofit laboratory 
for similar work. 

Decoding the entire genome would 
surely be a glittering scientific 
achievement and may lead to some 
scientific breakthroughs. And know- 
ing how individual genes work and 
how they fail is the key to discover- 
ing new ways to predict, detect, treat 
and cure many, if not most, diseases. 

But there is a good reason that the 
Federal Government should end its 
effort: decoding the entire genome 
doesn't add significantly to the infor- 
mation we already possess. 

Imagine that the genome is an 
encyclopedia with about three billion 
letters. Buried within this text are 
about 100,000 sentences (the genes) 
that tell the body what essential pro- 
teins to make. 

The sentences are separated from 
one another by page after page of 
random letters — what scientists call 
junk DNA. To make matters even 
more complicated, the sentences 
themselves are also fragmented and 
interrupted by pages and pages of 
random letters — more junk DNA. In 
fact, less than 3 percent of our DNA 
contains real information. The other 
97 percent has no genetic meaning. 

How do we know this is really 
true? We've already decoded 3 per- 
cent of the entire genome. And this is 
the picture we get. 

Each of the human genome 
projects, however, seeks to read the 
entire text from beginning to end — 
regardless of whether the informa- 
tion is useful. And regardless of the 
fact that we've already decoded the 
useful DNA. 

About eight years ago, a new 
means to discover genes using com- 
puterized robots was developed. This 
method takes advantage of the fact 
that the human body is an excellent 
editor, that it can splice together the 
gene fragments to form a coherent 

Instead of searching for relevant 
gene fragments within junk DNA, 
the new robotic method ignores the 
junk DNA and isolates only the 
body's edited text. This new method 
has been used to discover about 
100,000 useful genes — almost a com- 
plete set. (My company has filed 
patents on more than 500 of these 
genes.) This information is now 
available for medical research; 
much of it is even on the World Wide 

[illegible] 
Federal Government to go to the 
trouble of decoding the junk DNA. 
Today's task is to discover the medi- 
cal uses of each gene and to find 
gene-based cures for cancer, heart 
disease, Alzheimer's, osteoporosis 
and other diseases. The $3 billion of 
Federal money now devoted to the 
entire human genome should be 
spent instead on university-based re- 
search, initiated by individual medi- 
cal investigators. 

The era of government-sponsored 
big science, in which a few laborato- 
ries receive as much as $10 million a 
year to analyze mostly junk DNA, 
while scientists doing disease-relat- 
ed research beg for funding, should 

Let private companies and chari- 
table foundations finish the job of 
sequencing the human genome. Na- 
tional pride should come from con- 
quest of disease, not winning a race 
that is not worth winning. 

William A. Haseltine, a professor at 
Harvard Medical School from 1976 to 
1993, is chief executive officer of 
Human Genome Sciences, which 
does gene research. From 1992 to 
1996, his company helped finance the 
Institute for Genomic Research. 


Science & Technology 



Craig Venter and Perkin-Elmer target the human genome 

In late 1997, an ambitious idea oc- 
curred to technology guru Michael 
W. Hunkapiller of Perkin-Elmer 
Corp. Hunkapiller's team was devel- 
oping a robotic machine that promised to 
decipher human genes far faster and 
more cheaply than any previous system. 
Why not use the new device, 
Hunkapiller wondered, to tackle one of 
the biggest prizes in all of biology — suc- 
cessfully deciphering the entire human 
genetic code? He brought his idea to 
gene sleuth extraordinaire J. Craig Ven- 
ter, president of the nonprofit Institute 

for Genomic Research in Rockville, Md. 
The result, announced on May 9, is a 
still unnamed company that will deci- 
pher what one "might describe as the 
full Monty — the entire genome," says 
Venter. With some 230 of the new 
$300,000 Perkin-Elmer machines run- 
ning around the clock, Venter and col- 
league Mark Adams figure they can 
break the 3 billion individual units of 
human DNA — the genome — into pieces 
and decode a staggering 100 million in- 
dividual units a day. They plan to finish 
the genetic code in three years, at a 
total cost of about $200 million — 
with Perkin-Elmer picking up the 
tab. That is a fraction of what the 
federal government is spending to 
complete the task — and Venter 
vows to finish four years sooner. 

Venter plans to finish the 
genetic code in three 
years—with Perkin-Elmer 
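
The throughput figures quoted in this passage can be checked with simple arithmetic. The Python sketch below uses only the article's round numbers (230 machines, $300,000 each, 100 million units a day, a 3-billion-unit genome). The tenfold redundancy in the second calculation is an assumption made here for illustration, not a figure given in this article, and all names are hypothetical.

```python
GENOME_UNITS = 3_000_000_000     # "3 billion individual units of human DNA"
UNITS_PER_DAY = 100_000_000      # "100 million individual units a day"
MACHINES = 230                   # "some 230 of the new ... machines"
MACHINE_PRICE_USD = 300_000      # "$300,000 Perkin-Elmer machines"


def days_for_passes(passes: float) -> float:
    """Days of sequencing needed to read the genome `passes` times over."""
    return passes * GENOME_UNITS / UNITS_PER_DAY


def units_per_machine_per_day() -> float:
    """Average daily output each machine must sustain."""
    return UNITS_PER_DAY / MACHINES


def machine_outlay_usd() -> int:
    """List-price cost of the machines alone."""
    return MACHINES * MACHINE_PRICE_USD


if __name__ == "__main__":
    print("days for one pass:", days_for_passes(1))
    # ASSUMPTION: 10 passes, chosen only to illustrate redundant reading.
    print("days for ten passes:", days_for_passes(10))
    print("units per machine per day:", round(units_per_machine_per_day()))
    print("machine outlay (USD):", machine_outlay_usd())
```

At the quoted rate a single pass would take about a month of machine time, so even heavily redundant reading fits well inside the three-year target; the machines themselves would account for roughly a third of the $200 million budget at list price.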

What's more, Venter and 
Perkin-Elmer will give away the 
entire human DNA sequence, just 
as the government plans to do. 
"We agreed it would be morally 
wrong to hold the data hostage," 
says Venter. The gamble for 
Perkin-Elmer — a pioneer in gene 
sequencing — is that it can make 
money by selling information about 
what the sequence means, as well 
as finding new genes for develop- 
ing medical therapies. 
"BLACK EYE." The announcement 
sent shock waves through the red- 
hot field of gene-mining. This dis- 
cipline, called genomics, is already 
populated by dozens of companies 
(table, page 72) and academic labs 
seeking to understand and profit 
from DNA's secrets. Companies 
such as Human Genome Sciences 
Inc. (HGS) and Incyte Pharmaceu- 
ticals Inc. have already made millions 
selling access to their private stashes 
of gene sequences. But the new compa- 
ny is a formidable competitor— "a 1,000- 
pound gorilla," says analyst Elizabeth 
Silverman of BancAmerica Robertson 
Stevens. Adds Randal W. Scott, presi- 
dent of Incyte in Palo Alto, Calif.: "This 
puts a new competitor into play." And 
the idea that a private company can 
soundly beat the existing taxpayer-fund- 
ed effort to the prize "is a tremendous 
black eye for the government," says 
William A. Haseltine, CEO of HGS. "They 
will lose the race to the genome." 

But the venture also raises a host of 
questions. Does the massive private ef- 
fort mean that the government's Hu- 
man Genome Project should redirect its 
efforts? And will Perkin-Elmer actually 
be able to make money from its radi- 
cally different business plan? 

On the science, few are betting 
against Venter. "There's no question 
that the person who can put together an 
operation like this and make more head- 
way than anyone else is Craig Venter," 
says Stanford University biochemist and 
Nobel laureate Paul Berg. Back in the 

70 BUSINESS WEEK / MAY 25, 1998 


mid-1990s, Venter pioneered a "shot- 
gun" approach to deciphering entire 
genomes. The idea was to chop the DNA 
of an organism into pieces, decipher 
each of them, and then use computers 
to compare and assemble them in the 
right order. Using the technique, Venter 
astounded the scientific world by de- 
coding the first complete genetic se- 
quence of a living organism — a bacteri- 
um called Haemophilus influenzae. 
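
The "chop, decipher, compare and assemble" idea described above can be illustrated with a toy example. The Python sketch below is a deliberately simple greedy overlap merger, not the method actually used by Venter's group, whose software the article does not describe. The function names, the minimum overlap of three letters and the example sequence are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` letters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Three overlapping reads of a made-up 18-letter sequence, out of order.
    reads = ["GTTAGCCTA", "ATGGCGTA", "CGTACGTT"]
    print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTA']
```

When overlaps are missing or ambiguous the function returns several separate pieces rather than one sequence, which is a small-scale picture of the "holes" that critics quoted elsewhere in these clippings expect in a shotgun assembly of the human genome.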

Perkin-Elmer's new machines will 
speed up the process. Its Applied 
Biosystems Division sold $650 million 
worth of DNA sequencers and related 
instruments and services in fiscal 1997. 
The new tool, available next year, "is an 
evolution of our cur- 
rent system," says 
Hunkapiller. Its im- 
proved sensitivity and 
automation will dra- 
matically boost pro- 

is hinting that the 
government's genome 

cal. "The human genome project has 
never been a commerdal venture," he 
says. "This is more in the tradition of 
the Mellons and Camegies" — funding a 
project that promises mainly to push 
back the bounds of knowledge. 

Perkin-Elmer execs insist that their 
proposal has been misunderstood. "People 
still don't see how, if we give away the 
data, we will make money," sighs ceo 
Tbny L. White, as he patiently explains 
the plan. Stanford's Berg says that "the 
big game is how to make use of the in- 
formation," and that's the information 
White plans to sell Rival Incyte is al- 
ready an old hand at this. In fid, one of 
its products is a repackaging of publicly 


Craig Vaiter's new venture is entering a crowded field: Here are 
some key players that want to unlock the secrets of genes: 

CET^ET Un^Step^r^nch^mole^arbiologistipaniel Cohen, it offers ;' 
gS^icjnform^ionifcfielp^rugnMkers tailof^cugs to.Jndividuals. : '. 

_^ ^. -HUM/W GENC^sicScES A pioneer in tfi!?^ HGS has built a, Vast, 

project should shift -aclfel^ of' g^^4p8%^iV(tii(SHttSs using tpw^^ r?'' ■-• 

ita focus to, perhaps, '"iiYSEq' has awelope^jVchnol^-for rapid swfuenclng. Coll^ 

sequencuig tne dna ^^jfh p|rkin-amef, andjs Lrejn^.i1s;bwn'tools™r-dru_g discovery. : 

INCYTE OwT^^liuge-cl^^jse otgenes ani£gene fragments, and sells 
iMtti^jts'rseqi^^es and~iB%ed biqlogical infonriation. Collaborates with 

of animals instead of 
people. That's not 
likely. Dr. Francis 
Collins, head of the 
National Institutes of 
Health's genome cen- 

'micTKfilp firiife.tb d^^pid^ienff'analysis. 

genetic variations. And companies such 
as Affymetrix Inc. will benefit, analysts 
predict. Affymetrix makes gene chips, 
which can almost instantly spot the 
presence of thousands of different genes 
or gene variations. 

should also benefit The $1.4 bQiion com- 
pany has moved aggressively to acquire 
companies and new technology, trans- 
forming Perkin-Elmer from an instru- 
ment maker to one that provides ser- 
vices and information as well. Since 
White took over in 1995, the company 
has acquired Tropix, a leader in screening
drug candidates, and GenScope, developer
of gene expression technology,
and forged partnerships with other
players. For instance,
it teamed up last
June with gene-chip
developer Hyseq Inc.,
whose products can 
be used to search for 
gene variations. 

Venter's and 

Perkin-Elmer's ven- 
ture may also profit 
from new genes that 
Venter finds. The 
main current ap- 
proach for finding 
genes involves fishing 
out those that are actually turned on in
cells. Venter argues that this tack, which,
ironically, he pioneered, misses some
of the genome's real gold. That's because

MERCK Funded a [illegible] gene-hunting project at Washington University,
St. Louis. All of its findings have been deposited in public databases.

MYRIAD Discovered the breast cancer gene by studying the genes of
affected families. Now searching for more genes and developing
diagnostic tests.

AXYS Finding genes for diseases such as asthma, then searching for
drugs to tackle the diseases.

ter, wants more proof that the new company
will live up to its promises before he
alters his course. And even if Venter succeeds,
making sense
of the flood of information
won't be easy. Only about 3%
of human genetic material is actual 
genes. Some of the remaining 97% of 
the DNA turns genes on and off, and 
scientists think that much of the rest is 
meaningless junk. Part of Venter's job 
will be to figure out what's what, and 
that could be tough. "The genes jump
right out at you in microbial sequences,"
says Richard K. Wilson of Washington
University's gene-sequencing center. "In
humans, it's much more difficult."

Many are confident of Venter's sci- 
entific claims, but the business end of 
this venture is another story. Perkin- 
Elmer faces an uphill battle convincing 
the biotech world that this is a money- 
making idea. "What they're describing is 
not a commercial venture," says Incyte's 
Scott. "It's really Craig Venter going 
after the Nobel prize for sequencing the
genome." HGS's Haseltine is also skepti-

available data in more usable form, says 
analyst Mike G. King of Vector Securities
International. Haseltine wonders how
Perkin-Elmer can do this "better than
the rest of the world combined."

Venter and Perkin-Elmer execs re- 
tort that the new company will have 
enough experience and smarts to be a 
leader in this toughly competitive field. 
They envision signing up hundreds of 
thousands of subscribers — both compa- 
nies and academics — for a database that 
offers such vital information as which 
sequences are genes, what the genes 
do, and how genes can vary from person 
to person. Such variations, called "polymorphisms,"
determine whether individuals
are susceptible to certain diseases
or how well drugs will work.
Doctors and pharmaceutical companies 
can use the information to better diag- 
nose and treat people based on their 

some genes may turn 
on too rarely to be 
discovered. He estimates that by sequencing
the entire genome, he'll find
10,000 to 20,000 new genes. Many will
be genes for vital signaling pathways
in the body and brain — ideal candidates 
or targets for drugs. As a result, "these 
genes will have tremendous value on 
their own," he says. He expects the 
new company to pluck out a few hun- 
dred of the most promising to patent 
and use for drug development.

The risks, of course, are high. Hasel- 
tine and others think the new company 
may very well succeed at deciphering 
the entire human genome. Making mon- 
ey, however, will be harder. Venter 
knows that, but thinks he'll prove the
skeptics wrong within a year. By then,
he and his supporters believe, the new
tools will prove their worth, and vindicate
Venter's hunches once again.

By John Carey in Washington

72 BUSINESS WEEK / MAY 25, 1998



policy: Biomedicine 

An Independent Perspective on 
the Human Genome Project 

Steven E. Koonin 

The U.S. Human Genome Project (HGP)
is a joint effort of the Department of Energy
and the National Institutes of Health,
formally initiated in 1990. Its stated goal is 
". . . to characterize all the human genetic 
material — the genome — by improving ex- 
isting human genetic maps, constructing 
physical maps of entire chromosomes, and 
ultimately determining the complete se- 
quence ... to discover all of the more than 
50,000 human genes and render them ac- 
cessible for further biological study." The 
original 5-year plan was updated and modified
in 1993 (1, 2).

DOE's Office of Biological
and Environmental Sciences recently
chartered the JASON
group to review the DOE component
of the HGP. This group,
mainly consisting of physical and
information scientists, was asked
to consider three areas: technology,
quality assurance and quality
control, and informatics. This article
summarizes the group's findings
and recommendations (3).

Technology. The present state 
of the art for determining the se- 
quence of DNA is defined by 
Sanger sequencing, in which 
DNA fragments are labeled by 
fluorescent dyes and separated 
according to length with poly- 
acrylamide gel electrophoresis 
(PAGE) (4). The base at the end of each 
fragment can then be visualized and identified
by the dye with which it reacts. Although
more than 95% of the genome remains
to be sequenced, roughly 55
megabases (Mb) have been completed in
the past year (see the figure). The world's
large-scale sequencing capacity (not all of
which is applied to the human genome) is
estimated to be roughly 100 Mb per year. It
is sobering to contemplate that an average
production of 400 Mb will be required each
year to complete the human sequence by
the target date of 2005.
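
The arithmetic behind the 400 Mb figure can be sketched in a few lines. The genome size of roughly 3,000 Mb, the 3% already finished, and the seven-year window to 2005 are assumptions chosen here for illustration; the article itself states only the fraction remaining, the current capacity, and the target date.

```python
def required_mb_per_year(genome_mb, fraction_done, years_left):
    """Average yearly output needed to finish the remaining sequence."""
    remaining_mb = genome_mb * (1.0 - fraction_done)
    return remaining_mb / years_left


# Assumed inputs: ~3,000 Mb genome, ~3% finished, 7 years to the 2005 target.
rate = required_mb_per_year(genome_mb=3000, fraction_done=0.03, years_left=7)
print(f"about {rate:.0f} Mb per year")
```

Under these assumptions the result lands close to the article's 400 Mb, roughly four times the estimated world capacity of 100 Mb per year.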

The author is professor of theoretical physics and vice
president and provost at the California Institute of Technology.
He led the JASON study reported on in this
article. E-mail: koonin@caltech.edu

The present technology has only a limited
read-length capability (the number of
contiguous bases that can be identified
from each fragment); the best current practice
can read 700 to 800 bases, with perhaps
1000 bases as the ultimate limit. Because
the DNA segments of interest are
much longer than this [40 kilobases (kb) for
a cosmid clone; 100 kb or more for a bacterial
artificial chromosome or a gene], the
present technology requires that long lengths
of DNA be cut into overlapping short segments
(~1 kb in length) that can be sequenced
directly. The sequences from these

Percentage of the human genome sequenced to date. Almost 3% of
the genome has been sequenced in contiguous stretches longer than
10 kb and is now deposited in publicly accessible databases. Compiled
by J. Roach, as described in http://weber.u.washington.edu/~roach/
human_genome_progress2.html.

shorter pieces must then be assembled into 
the final sequence. Up to 50% of the ef- 
fort at some sequence centers goes into 
this final assembly and finishing of the sequence.
The ability to read longer fragments
would step up the pace and quality
of sequencing. 
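
The assembly step described above, in which overlapping short reads are stitched back into one longer sequence, can be illustrated with a toy greedy merger. This is only a sketch under simplifying assumptions: the reads are error-free, the sequence has no repeats, and the function names `overlap` and `greedy_assemble` are hypothetical. It is not the software used at any sequencing center.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    while True:
        # Discard fragments wholly contained in another fragment.
        contigs = [c for c in contigs
                   if not any(c != d and c in d for d in contigs)]
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Four overlapping reads taken from one 20-base toy sequence.
reads = ["ATGCGTAC", "TACGTTAG", "TAGCCTAG", "CTAGGA"]
print(greedy_assemble(reads))
```

Longer reads mean fewer fragments and longer, less ambiguous overlaps, which is one way to see why read length bears so directly on the assembly and finishing effort.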

Apart from the various genome projects,
however, there is little pressure to achieve
longer read lengths. The 500 to 700 base
lengths read by the current technology are
well suited to many scientific needs, including
pharmaceutical searches, studies of some
polymorphisms, and studies of some genetic

Other drawbacks of the present technology
include the time- and labor-intensive
nature of gel preparation and running, as
well as the comparatively large amounts of
sample required, which also increases the
cost of reagents and necessitates extra amplification
steps.

Thus, the present sequencing technology 
leaves much to be desired and must be sup- 
planted in the long term if the potential for 
genomic science is to be fully realized. 
Promising methods that could be cheaper 
and faster than PAGE include single-molecule
sequencing, mass spectrometric methods,
hybridization arrays, and microfluidic
capabilities. None of these is sufficiently 
mature, however, to be a candidate for near- 
term major scale-up. It is therefore impor- 
tant to support research aimed at improving 
the present method. Advances in hardware 
development could, for example, increase 
the lateral scan resolution of the machine so 
that more lanes of a gel can be analyzed. 
The genome community should unify its ef- 
forts to enhance the performance of 
present-day instruments.

Better software will improve the lane 
tracking, base identification, assembly, and 
finishing processes. Many of the problems of 
base identification also occur in the de- 
modulation of signals in com- 
munication and magnetic re- 
cording systems, and some of the 
existing literature in these areas 
should be used by the HGP. The 
ability to correctly assemble a fi- 
nal sequence without manual 
editing would markedly speed 
up the process. It would also be 
helpful to develop a common set 
of finishing rules. 

Because sequencing technol- 
ogy should (and is likely to) 
evolve rapidly, the large-scale 
sequencing centers must be flex- 
ible enough to incorporate new 
technologies. There is a great 
need to support the develop- 
ment of non-PAGE-based se- 
quencing that goes beyond the 
current goals of a faster version of PAGE. 
The funding for such advanced technology 
is a small fraction of the total HGP but
should be increased by approximately 50%.

Quality assurance and quality control.
DOE and NIH are recognizing that the
HGP must make data accuracy and data
quality integral to its execution. A high-quality
database can provide useful, densely
spaced markers across the genome and enable
large-scale statistical studies. A quantitative
understanding of data quality across
the whole genome sequence is thus almost 
as important as the sequence itself. Among 
the top-level steps that should be taken are 
allocating resources specifically for quality is- 
sues and establishing a separate research pro- 
gram for quality assurance and control (per- 
haps a group at each sequencing center). 


SCIENCE • VOL. 279 • 2 JANUARY 1998


The stated accuracy goal of the HGP is 
one error in 10^4 bases, which is set to be less
than the polymorphism rate. However, this 
has been a controversial issue, as genomic 
data of lower accuracy are still of great util- 
ity. For example, pharmaceutical companies 
searching for genes can use short sequences 
(400 bases) at an accuracy of one error per 
100 bases. The debate on error rates should
focus on the level of accuracy needed for 
each specific scientific objective or use of 
the genome data. The necessity of finishing 
sequences without gaps should be subject to 
the same considerations. 
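
A short calculation shows how far apart the two accuracy levels in this paragraph are. It assumes independent per-base errors and reads the HGP goal as one error per 10^4 bases; the 1 Mb stretch is an arbitrary illustrative length, and `expected_errors` is a hypothetical helper.

```python
def expected_errors(n_bases, errors_per_base):
    """Expected number of wrong bases, assuming independent errors."""
    return n_bases * errors_per_base


# A 400-base gene-hunting read at one error per 100 bases.
short_read = expected_errors(400, 1 / 100)
# An assumed 1 Mb finished stretch at one error per 10^4 bases.
finished_mb = expected_errors(1_000_000, 1e-4)
print(short_read, finished_mb)
```

A handful of errors in a short read may be tolerable for locating a gene, whereas the same per-base rate applied to finished sequence would swamp real polymorphisms, which is why the required accuracy depends on the use.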

In the real world, accuracy requirements
must be balanced against what users need, 
the cost, and the capability of the sequenc- 
ing technology to deliver a given level of 
accuracy. Establishing this balance requires 
an open dialogue among the sequence pro- 
ducers, sequence users, and the funding 
agencies, informed by quantitative analyses 
and experience. 

Assays should be developed that can accu- 
rately and efficiently measure sequence qual- 
ity. For example, it would be appropriate to 
develop, distribute, and use "gold standard" 
DNA samples that could be used routinely by 
the whole sequencing community for assessing
the quality of the sequence output. 

Research into the origin and propagation 
of errors through the entire sequencing process
is fully warranted. We see two useful
outputs from such studies: (i) more reliable 
descriptions of expected error rates in final 
sequence data, as a companion to database 
entries; and (ii) "error budgets" to be as- 
signed to different segments of mapping and 
sequencing processes to aid in developing 
the most cost-effective strategies for se- 
quencing and other needs. 

DOE and NIH should solicit and support 
detailed Monte Carlo computer simulation 
of the complete mapping and sequencing 
processes. The basic computing methods are 
straightforward: a reference segment of 
DNA (with all of the peculiarities of human
sequence) is generated and subjected to 
models of all steps in the sequencing pro- 
cess; individual bases are randomly altered 
according to errors introduced at the various 
stages; and the final reconstructed segment 
or simulated database entry is compared 
with the input segment and errors are noted. 
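
The simulation loop outlined above can be sketched in miniature. The version below is a deliberately crude illustration, not the detailed model the report calls for: it assumes a uniformly random reference, substitution errors only, reads whose positions are known exactly (so no assembly step is modeled), and a simple majority vote per base. All names and default parameters are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def simulate_consensus_error(ref_len=2000, read_len=200, n_reads=200,
                             error_rate=0.01, seed=0):
    """Return the fraction of reference bases miscalled by majority vote."""
    rng = random.Random(seed)
    # Step 1: generate a reference segment.
    reference = [rng.choice(BASES) for _ in range(ref_len)]
    votes = [Counter() for _ in range(ref_len)]
    # Step 2: draw reads at random offsets and corrupt bases at random.
    for _ in range(n_reads):
        start = rng.randrange(-read_len + 1, ref_len)
        for pos in range(max(start, 0), min(start + read_len, ref_len)):
            base = reference[pos]
            if rng.random() < error_rate:
                base = rng.choice([b for b in BASES if b != base])
            votes[pos][base] += 1
    # Step 3: rebuild the segment and compare it with the input.
    wrong = 0
    for pos, counter in enumerate(votes):
        called = counter.most_common(1)[0][0] if counter else "N"
        if called != reference[pos]:
            wrong += 1
    return wrong / ref_len


print(simulate_consensus_error(error_rate=0.05, seed=1))
```

Replacing the single `error_rate` with separate, experimentally grounded models for each stage is what would turn a toy like this into the "error budget" tool described in the text.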

Results from simulations are only as 
good as the models used for introducing 
and propagating errors. For this reason, 
the computer models must be developed 
in close association with technical experts 
in all phases of the process being studied, 
so that they best reflect the real world. 
This exercise will stimulate new experi- 
ments to validate the error-process models 
and thus will lead to increased experimental
understanding of process errors as well.

Improved software is needed to enhance 
the ability of database centers to check the 
quality of submitted sequence data before its
inclusion in the database. Many of the cur- 
rent algorithms are highly experimental and 
will be improved substantially over the next 
5 years. In addition, an ongoing software 
quality assurance program should be consid- 
ered for the large community databases, 
with advice from commercial and academic 
experts on software engineering and quality 
control. It is appropriate for the HGP to in- 
sist on a consistent level of documentation, 
both in the published literature and in user 
manuals, of the methods and structures used 
in the database centers that it supports. 
DOE and NIH should also decide on stan- 
dards for the inclusion of quality metrics for 
base identification and DNA assembly along 
with every database entry submitted. 

Informatics. Genome informatics is a 
child of the information age, a status that 
brings clear advantages and new hurdles. 
Managing such a diverse, large-scale, rapidly 
moving informatics effort is a considerable
challenge for both DOE and NIH. The in- 
frastructure supporting the requisite soft- 
ware tools ranges from small research 
groups (for example, for local special-pur- 
pose databases) to large Genome Centers 
(for process management and robotic con- 
trol systems) to community database centers 
(for GenBank and the Genome Database). 
The resources that each of these groups can 
put into increasing software sophistication, 
into ensuring ease of use, and into quality
control vary widely. Thus, in informatics ar- 
eas requiring new research (such as gene 
finding), a broad-based approach of "letting 
a thousand flowers bloom" is most appropriate.
At the other end of the spectrum, DOE
and NIH must impose community-wide 
standards for software consistency and quality
in areas of informatics in which a large
user community will be accessing major ge- 
nome databases. 

DOE and NIH should adhere to a bottom-up,
customer approach to informatics.
Part of this process would be to encourage 
forums, including close collaborative pro- 
grams, between the users and providers of 
informatics tools, with the purposes of de- 
termining what tools are needed and of 
training researchers in the use of new 

To ensure that all the database centers are
user-oriented and that they are providing ser- 
vices that are genuinely useful to the genome 
community, each database center should be 
required to establish its own "users group" (as 
is done by facilities as diverse as the National
Science Foundation's Supercomputer Centers
and NASA's Hubble Space Telescope).
Further, informatics centers must be critically
evaluated as to the actual use of their


information and services by the" 
Data formats, software components, and 
nomenclature should be standardized across 
the community. If multiple formats exist, it 
would be worthwhile to invest in systems 
that can translate among them. Data 
archiving, data retrieval, and data manipu- 
lation should be modularized so that one da- 
tabase is not overextended, and several 
groups should be involved in the develop- 
ment effort. The community should be sup- 
porting several database efforts and promot- 
ing standardized interfaces and tools among 
those efforts. 

Final notes. The HGP involves technology
development, production sequencing,
and sequence utilization. Greater coupling 
of these three areas can only improve the 
project. Technology development should be 
coordinated with the needs and problems of
production sequencing, whereas sequence 
generation and informatics tools must ad- 
dress the needs of data users. Promotion of 
such coupling is an important role for the 
funding agencies.

The HGP presents an unprecedented set 
of organizational challenges for the biology 
community. Success will require setting ob- 
jective and quantitative standards for se- 
quencing costs (capital, labor, and opera- 
tions) and sequencing output (error rate, 
continuity, and amount). It will also require 
coordinating the efforts of many laboratories
of varying sizes supported by multiple
funding sources in the United States and 
abroad.

A number of diverse scientific fields 
have successfully adapted to a "big science" 
mode of operation (nuclear and particle 
physics, space and planetary science, as- 
tronomy, and oceanography are among the 
prominent examples). Such transitions 
have not been easy on the scientists in- 
volved. However, in essentially all of these 
cases, the need to construct and allocate 
scarce facilities has been an important organizing
factor. No such centralizing force
is apparent in the genomics community, 
but the HGP is very much in need of the 
coordination it would produce. 

References and Notes 

1. F. Collins and D. Galas, Science 262, 43 (1993).

2. The status and challenges of the HGP have been
recently reviewed [L. Rowen et al., ibid. 278, 605

3. The MITRE Corporation, JASON Report JSR-97-315
(The MITRE Corporation, McLean, VA, 1997).
The participants included S. Block, J. Cornwall,
W. Dally, F. Dyson, N. Fortson, G. Joyce, H. J.
Kimble, N. Lewis, C. Max, T. Prince, R.
Schwitters, P. Weinberger, and W. H. Woodin.

4. For a basic discussion and explanation of the
terminology used, see http://www.ornl.gov/
primer/intro.html



DOE/ER-0713 (Part 1)


Part 1, Overview and Progress 

Date Published: November 1997 

Prepared for the 

U.S. Department of Energy 

Office of Energy Research 

Office of Biological and Environmental Research 

Germantown, MD 20874-1290

Prepared by the 

Human Genome Management Information System 

Oak Ridge National Laboratory 

Oak Ridge, TN 37830-6480 

managed by 

Lockheed Martin Energy Research Corporation 

for the 

U.S. Department of Energy 

Under Contract DE-AC05-96OR22464




LANL and LLNL begin 
production of DNA clone 
(cosmid) libraries 
representing single 


cosponsor Alta, Utah, 
conference highlighting 
the growing role of 
recombinant DNA 
technologies. OTA 
incorporates Alta 
proceedings into a 1986 
report acknowledging 
value of human genome 
reference sequence. 

DOE advisory committee, 
HERAC, recommends a 
15-year, multidisciplinary,
scientific, and technological 
undertaking to map and 
sequence the human 
genome. DOE designates 
multidisciplinary human 
genome centers. 

* NIH NIGMS begins funding
of genome projects. 

* Robert Sinsheimer holds 
meeting on human 
genome sequencing at 
University of California, 
Santa Cruz.

At OHER, Charles DeLisi 
and David A. Smith 
commission the first Santa 
Fe conference to assess the 
feasibility of a Human 
Genome Initiative. 


Following the Santa Fe 
conference, DOE OHER 
announces Human 
Genome Initiative. With 
$5.3 million, pilot projects 
begin at DOE national 
laboratories to develop 
critical resources and 

* Reports by OTA and NAS 
NRC recommend concerted 
genome research program. 

HUGO founded by scientists 
to coordinate efforts 

* First annual Cold Spring 
Harbor Laboratory meeting 
held on human genome 
mapping and sequencing. 

DOE and NIH sign MOU 
outlining plans for 
cooperation on genome 

Telomere (chromosome 
end) sequence having 
implications for aging and
cancer research is identified
at LANL 


DNA STSs recommended 
to correlate diverse types of
DNA clones. 

DOE and NIH establish 
joint ELSI Working Group. 


DOE and NIH present joint 
5-year U.S. HGP plan to 
Congress. The 15-year
project formally begins. 

Projects begun to mark 
genes on chromosome 
maps as sites of mRNA 

R&D begun for efficient 
production of more stable, 
large-insert BACs. 


Human chromosome 
mapping data repository, 
GDB, established.


* Low-resolution genetic 
linkage map of entire 
human genome published. 

Guidelines for data release 
and resource sharing 
announced by DOE 
and NIH. 


International IMAGE 
Consortium established to 
coordinate efficient 
mapping and sequencing of 
gene-representing cDNAs. 

DOE-NIH joint ELSI Working 
Group's Task Force on 
Genetic Information and 
Insurance releases 

DOE and NIH revise 5-year 
goals [Science 262, 43-46 
(Oct. 1,1993)]. 

* French Généthon provides
mega-YACs to the genome 

IOM releases U.S. HGP-funded
report, "Assessing
Genetic Risks." 

GRAIL sequence 
interpretation service with 
Internet access initiated at 

ADA Americans with Disabilities Act
ANL Argonne National Laboratory
BAC bacterial artificial chromosome
cDNA complementary deoxyribonucleic acid
CGAP Cancer Genome Anatomy Project
DNA deoxyribonucleic acid
DHHS Department of Health and Human Services (NIH)
DOE Department of Energy
EEOC Equal Employment Opportunity Commission
ELSI ethical, legal, and social issues
GDB Genome Database
GRAIL Gene Recognition and Analysis Internet Link
HERAC Health and Environmental Research Advisory Committee
HGP Human Genome Project, Human Genome Program
HUGO Human Genome Organisation
ICPEMC International Commission for Protection Against Environmental Mutagens and Carcinogens
IMAGE Integrated Molecular Analysis of Gene Expression
IOM Institute of Medicine (NAS)



* Genetic-mapping 5-year 
goal achieved 1 year ahead 
of schedule. 

Completion of second-generation
DNA clone
libraries representing each 
human chromosome by 

Genetic Privacy Act, first U.S.
HGP legislative product,
proposed to regulate
collection, analysis, storage,
and use of DNA samples
and genetic information
obtained from them;
endorsed by DOE-NIH joint
ELSI Working Group.

DOE Microbial Genome 
Program launched; spin-off 
of HGP.

LLNL chromosome paints 

SBH technologies from ANL 

DOE HGP Information Web 
site activated for public and 


LANL and LLNL announce 
high-resolution physical 
maps of chromosome 16
and chromosome 19,

* Moderate-resolution maps 
of chromosomes 3, 11, 12, 
and 22 published.

* First (nonviral) whole 
genome sequenced (for the 
bacterium Haemophilus 

Sequence of smallest 
bacterium, Mycoplasma
genitalium, completed, 
displaying the minimum 
number of genes needed 
for independent existence. 

* EEOC guidelines extend 
ADA employment 
protection to cover 
discrimination based on 
genetic information related 
to illness, disease, or other 

LANL Los Alamos National Laboratory
LBNL Lawrence Berkeley National Laboratory
LLNL Lawrence Livermore National Laboratory
MGP Microbial Genome Project
MOU Memorandum of Understanding
mRNA messenger ribonucleic acid
NAS National Academy of Sciences
NCHGR National Center for Human Genome Research (NIH)
NCI National Cancer Institute (NIH)
NHGRI National Human Genome Research Institute (NIH)
NIGMS National Institute of General Medical Sciences (NIH)
NIH National Institutes of Health
NRC National Research Council
OHER Office of Health and Environmental Research
ORNL Oak Ridge National Laboratory
OTA Office of Technology Assessment
R&D Research and Development
SBH sequencing by hybridization
STS sequence tagged site
YAC yeast artificial chromosome

Methanococcus jannaschii 
genome sequenced; 
confirms existence of third 
major branch of life, the 

DOE-NIH Task Force on 
Genetic Testing releases 
interim principles. 

* Integrated STS-based 
detailed human physical 
map with 30,000 STSs 
achieves an HGP goal. 

* Health Care Portability and 
Accountability Act
prohibits use of genetic 
information in certain 
health-insurance eligibility 
decisions, requires DHHS 
to enforce health- 
information privacy 

Working Group releases 
guidelines on informed 
consent for large-scale 
sequencing projects. 

DOE and NCHGR issue 
guidelines on use of 
human subjects for large- 
scale sequencing projects. 

* Saccharomyces cerevisiae
(yeast) genome sequence 
completed by 
international consortium. 

Sequence of the human 
T-cell receptor region 

Wellcome Trust sponsors 
large-scale sequencing 
strategy meeting in 
Bermuda for international 
coordination of human 
genome sequencing. 


DOE forms Joint Genome
Institute for implementing 
sequencing at DOE HGP 

* NIH NCHGR becomes 

* Escherichia coli genome 
sequence completed. 

Second large-scale 
sequencing strategy 
meeting held in Bermuda. 

* High-resolution physical 
maps of chromosomes X 
and 7 completed. 

genome sequence 

Archaeoglobus fulgidus 
genome sequence 

* NCI CCAP begins. 

* DOE had limited or no 
involvement in this event. 


Preface

More than a decade ago, the Office of Health and Environmental Research (OHER) of the U.S. Department
of Energy (DOE) struck a bold course in launching its Human Genome Initiative, convinced that
its mission would be well served by a comprehensive picture of the human genome. Organizers recog- 
nized that the information the project would generate — both technological and genetic — would con- 
tribute not only to a new understanding of human biology and the effects of energy technologies but 
also to a host of practical applications in the biotechnology industry and in the arenas of agriculture and environmental

Today, the project's value appears beyond doubt as worldwide participation contributes toward the goals of determining 
the human genome's complete sequence by 2005 and elucidating the genome structure of several model organisms as 
well. This report summarizes the content and progress of the DOE Human Genome Program (HGP). Descriptive
research summaries, along with information on program history, goals, management, and current research highlights,
provide a comprehensive view of the DOE program. 

Last year marked an early transition to the third and final phase of the U.S. Human Genome Project as pilot programs to 
refine large-scale sequencing strategies and resources were funded by DOE and the National Institutes of Health, the two 
sponsoring U.S. agencies. The human genome centers at Lawrence Berkeley National Laboratory, Lawrence Livermore 
National Laboratory, and Los Alamos National Laboratory had been serving as the core of DOE multidisciplinary HGP 
research, which requires extensive contributions from biologists, engineers, chemists, computer scientists, and mathematicians.
These team efforts were complemented by those at other DOE-supported laboratories and about 60 universities,
research organizations, companies, and foreign institutions. Now, to focus DOE's considerable resources on meeting the 
challenges of large-scale sequencing, the sequencing efforts of the three genome centers have been integrated into the 
Joint Genome Institute. The institute will continue to bring together research from other DOE-supported laboratories. 
Work in other critical areas continues to develop the resources and technologies needed for production sequencing; computational
approaches to data management and interpretation (called informatics); and an exploration of the important
ethical, legal, and social issues arising from use of the generated data, particularly regarding the privacy and confidenti- 
ality of genetic information. 

Insights, technologies, and infrastructure emerging from the Human Genome Project are catalyzing a biological revolution.
Health-related biotechnology is already a success story, and is still far from reaching its potential. Other applications
are likely to beget similar successes in coming decades; among these are several of great importance to DOE.
We can look to improvements in waste control and an exciting era of environmental bioremediation, we will see new
approaches to improving energy efficiency, and we can hope for dramatic strides toward meeting the fuel demands of 
the future. 

In 1997 OHER, renamed the Office of Biological and Environmental Research (OBER), is celebrating 50 years of conducting
research to exploit the boundless promise of energy technologies while exploring their consequences to the
public's health and the environment. The DOE Human Genome Program and a related spin-off project, the Microbial 
Genome Program, are major components of the Biological and Environmental Research Program of OBER. 

DOE OBER is proud of its contributions to the Human Genome Project and welcomes general or scientific inquiries 
concerning its genome programs. Announcements soliciting research applications appear in Federal Register, Science,
Human Genome News, and other publications. The deadline for formal applications is generally midsummer for awards 
to be made the next year, and submission of preproposals in areas of potential interest is strongly encouraged. Further
information may be obtained by contacting the program office or visiting the DOE home page (301/903-6488, 
Fax: -^52U genome<^, URL: 

Aristides Patrinos, Associate Director

Office of Biological and Environmental Research 

U.S. Department of Energy 

November 3, 1997



Introduction 1

Project Origins 1 

Anticipated Benefits of Genome Research 2 

Coordinated Efforts 2 

DOE Genome Program 3 

Five-Year Research Goals 5

Evolution of a Vision 6 

Highlights of Research Progress 9 

Clone Resources for Mapping, Sequencing, and Gene Hunting 9 

Of Mice and Humans: The Value of Comparative Analyses 13 

DNA Sequencing 14 

Informatics: Data Collection and Analysis 16 

Ethical, Legal, and Social Issues (ELSI) 18 

Technology Transfer 21 

Collaborations 21

Patenting and Licensing Highlights, FY 1994-96 22 


Technology Transfer Award 24 

1997 R&D 100 Awards 24 

Research Narratives 25 

Joint Genome Institute 26 

Lawrence Livermore National Laboratory Human Genome Center 27 

Los Alamos National Laboratory Center for Human Genome Studies 35 

Lawrence Berkeley National Laboratory Human Genome Center 41 

University of Washington Genome Center 47 

Genome Database 49 

National Center for Genome Resources 55

Program Management 59 

DOE OBER Mission 59 

Human Genome Program 62 



Coordination with Other Genome Programs 67 

U.S. Human Genome Project: DOE and NIH 67 

Other U.S. Programs 68 

International Collaborations 68 

Appendices 71 

A: Early History, Enabling Legislation (1984-90) 73

B: DOE-NIH Sharing Guidelines (1992) 75

C: Human Subjects Guidelines (1996) 77

D: Genetics on the World Wide Web (1997) 83

E: 1996 Human Genome Research Projects (1996) 89

F: DOE BER Program (1997) 95

Glossary 101

Acronym List inside back cover




Now completing its first de- 
cade, the Human Genome 
Program of the U.S. De- 
partment of Energy (DOE) 
is the longest-running 
federally funded program to analyze the 
genetic material — the genome — that de- 
termines an individual's characteristics 
at the most fundamental level. Part of
the Biological and Environmental Research (BER)
Program sponsored by the DOE Office of
Biological and Environmental Research (OBER*), the
genome program is a major component of the
larger U.S. Human Genome Project.

Since October 1990, the 
project has been supported jointly by 
DOE and the National Institutes of 
Health (NIH) National Human Genome 
Research Institute (formerly National 
Center for Human Genome Research). 
Together, the DOE and NIH components 
make up the world's largest centrally co-
ordinated biology research project ever
undertaken.

The U.S. Human Genome Project is a 
15-year endeavor to characterize the hu- 
man genome by improving existing hu- 
man genetic maps, constructing physical 
maps of entire chromosomes, and ulti- 
mately determining a complete sequence 
of the deoxyribonucleic acid (DNA) 
subunits. Parallel studies are being car- 
ried out on selected model organisms to 
facilitate interpretation of human gene
function.

The ultimate goal of the U.S. project is 
to identify the estimated 70,000 to
100,000 human genes and render them 
accessible for future biological study. A 
complete human DNA sequence will 
provide physicians and researchers in 
many biological disciplines with an ex- 
traordinary resource: an "encyclopedia" 
of human biology obtainable by com- 
puter and available 
to all. 

genome (je'nom), n.
all the genetic material
in the chromosomes of
an organism

Scientific and technical terms are
defined in the Glossary, p. 101. More
historical details and other information
appear in the Appendices beginning on
p. 71.

For 50 years, programs in the DOE Office of
Biological and Environmental Research have crossed
traditional research boundaries in seeking new
solutions to energy-related biological and
environmental challenges (see Appendix F, p. 95, and

*In 1997 the Office of Health and Environ-
mental Research (OHER) was renamed
Office of Biological and Environmental
Research (OBER).

Obtaining the 
complete se-
quence by 2005
will require a 
highly coordinated 
and focused inter- 
national effort generat- 
ing advances in biological methodology; 
instrumentation (particularly automa-
tion); and computer-based methods for
collecting, storing, managing, and ana-
lyzing the rapidly growing body of data.

Project Origins 

The potential value of detailed genetic 
information was recognized early; until 
recently, however, obtaining this infor-
mation was far beyond the capabilities of
biomedical research. DOE OBER and its 
two predecessor agencies — the Atomic 
Energy Commission and the Energy Re- 
search and Development Administra- 
tion — had long sponsored genetic 
research in both microbial and higher 
systems. These studies included explora- 
tions into population genetics; genome 
structure, maintenance, replication, dam-
age, and repair; and the consequences of
genetic mutations. These traditional DOE 
activities evolved naturally into the Hu- 
man Genome Program. 

DOE Human Genome Program Report


OBER's mission is described
more fully in the Program
Management section (p. 59)
of this report.

By 1985, progress in genetic and DNA
technologies led to serious discussions 
in the scientific community about initi- 
ating a major project to analyze the 
structure of the human genome. After 
concluding that a DNA sequence would 
offer the most useful approach for de- 
tecting inherited mutations, DOE in 
1986 announced its Human Genome 
Initiative. The initiative emphasized de- 
velopment of resources and technolo- 
gies for genome mapping, sequencing, 
computation, and infrastructure support
that would culminate in a complete se- 
quence of the human genome. 

The National Research Council issued a 
report in 1988 recommending a dedi-
cated research budget of $200 million
annually for 15 years to determine the 
sequence of the 3 billion chemical sub- 
units (base pairs) in the human genome 
and to map and identify all human genes. 

To launch the nation's Human Genome 
Project, Congress appropriated funds to 

DOE and also to NIH, which had long
supported research in genetics and mo- 
lecular biology as an integral part of its 
mission to improve the health of all 
Americans. Other federal agencies and 
foundations outside the Human Genome 
Project also contribute to genome re- 
search, and many other countries are 
making important contributions through 
their own genome research projects. 

Coordinated Efforts 

In 1988 DOE and NIH signed a Memo- 
randum of Understanding in which the 
agencies agreed to work together, coordi- 
nate technical research and activities, and 
share results. The two agencies assumed 
a joint systematic approach toward estab- 
lishing goals to satisfy both short- and 
long-term project needs. 

Early guidelines projected three 5-year 
phases, for which the first plan was pre-
sented to Congress in 1990. The 1990

Anticipated Benefits of Genome Research 

Predictions of biology as "the science
of the 21st century" have been made
by observers as diverse as Microsoft's
Bill Gates and U.S. President Bill
Clinton. Already revolutionizing biol-
ogy, genome research has spawned a
burgeoning biotechnology industry
and is providing a vital thrust to the
increasing productivity and perva-
siveness of the life sciences.

Technology and resources promoted
by the Human Genome Project al-
ready have had profound impacts on
biomedical research and promise to
revolutionize biological research and
clinical medicine. Increasingly de-
tailed genome maps have aided re-
searchers seeking genes associated
with dozens of genetic conditions, in-
cluding myotonic dystrophy, fragile X
syndrome, neurofibromatosis types 1
and 2, a kind of inherited colon cancer,
Alzheimer's disease, and familial breast
cancer.


Current and potential applications of 
genome research will address national 
needs in molecular medicine, waste 
control and environmental cleanup, 
biotechnology, energy sources, and risk
assessment.

Molecular Medicine 

On the horizon is a new era of molecu-
lar medicine characterized less by treat-
ing symptoms and more by looking to
the most fundamental causes of disease.
Rapid and more specific diagnostic tests 
will make possible earlier treatment of 
countless maladies. Medical researchers 

also will be able to devise novel therapeu-
tic regimens based on new classes of
drugs, immunotherapy techniques, avoid-
ance of environmental conditions that
may trigger disease, and possible aug- 
mentation or even replacement of 
defective genes through gene therapy. 

Microbial Genomes 

In 1994, taking advantage of new capa-
bilities developed by the genome project,
DOE formulated the Microbial Genome 
Initiative to sequence the genomes of 
bacteria useful in the areas of energy pro- 
duction, environmental remediation, toxic 
waste reduction, and industrial process- 
ing. In the resulting Microbial Genome 
Project, six microbes that live under ex-
treme conditions of temperature and pres-
sure have been sequenced completely as



plan emphasized the creation of chromo-
some maps, software, and automated
technologies to enable sequencing. 

By 1993, unexpectedly rapid progress in
chromosome mapping required updating
the goals [Science 262, 43-46 (October
1, 1993)], which now project through
1998 (see p. 5). This plan is being re- 
vised again in anticipation of the ap- 
proaching high-throughput sequencing 
phase of the project. Last year marked an 
early transition to this phase as many 
more genome sequencing projects were 
funded. The second and third phases of 
the project will optimize resources, re-
fine sequencing strategies, and, finally,
completely determine the sequence of all 
base pairs in the genome. 

Another area of DOE and NIH coopera- 
tion is in exploring the ethical, legal, and 
social issues (ELSI) arising from in-
creased availability of genetic data and
growing genetic-testing capabilities. The

two agencies established a joint work- 
ing group to confront these ELSI chal-
lenges and have cosponsored joint
projects and workshops.

DOE Genome Program 

A general overview follows of recent 
progress made in the DOE Human Ge- 
nome Program. Refer to the timeline 
(pp. ii-iii) for other achievements to- 
ward U.S. goals, including contribu- 
tions made outside DOE. 

Physical maps 

For DOE, an early goal was to develop
chromosome physical maps, which in- 
volves reconstructing the order of cloned 
DNA fragments to represent their spe- 
cific originating chromosomes. (A set of 
such cloned fragments is called a library.) 
Critical to this effort were the libraries 
of individual human chromosomes 


of August 1997. Structural studies are
under way to learn what is unique
about the proteins of these organisms,
the ultimate aim being to use the mi-
crobes and their enzymes for such
practical purposes as waste control
and environmental cleanup.


The potential for commercial develop-
ment presents U.S. industry with a
wealth of opportunities. Sales of bio-
technology products are projected to
exceed $20 billion by the year 2000.
The genome project already has
stimulated significant investment by
large corporations and prompted the
creation of new biotechnology compa-
nies hoping to capitalize on the far-
reaching implications of its research.

Energy Sources 

Biotechnology, fueled by insights reaped
from the genome project, will play a sig-
nificant role in improving the use of fos-
sil-based resources. Increased energy
demands, projected over the next
50 years, require strategies to circumvent
the many problems associated with
today's dominant energy technologies.
Biotechnology promises to help address
these needs by providing cleaner means
for the bioconversion of raw materials to
refined products. In addition, there is the
possibility of developing entirely new
biomass-based energy sources. Having
the genomic sequence of the methane-
producing microorganism Methano-
coccus jannaschii, for example, will en-
able researchers to explore the process of
methanogenesis in more detail and could
lead to cheaper production of fuel-
grade methane.

Risk Assessment 

Understanding the human genome
will have an enormous impact on the
ability to assess risks posed to indi-
viduals by environmental exposure to
toxic agents. Scientists know that ge-
netic differences make some people
more susceptible, and others more
resistant, to such agents. Far more
work must be done to determine the
genetic basis of such variability. This
knowledge will directly address
DOE's long-term mission to under-
stand the effects of low-level
exposures to radiation and other
energy-related agents, especially in
terms of cancer risk.



produced at Los Alamos National Labo- 
ratory (LANL) and Lawrence Livermore 
National Laboratory (LLNL). These librar- 
ies allowed the huge task of mapping and 
sequencing the entire 3 billion bases in 
the human genome to be broken down into 
24 much smaller single-chromosome 
units. Availability of the libraries has en- 
abled the participation of many laborato- 
ries worldwide. Some three generations 
of clone libraries with improving charac- 
teristics have been produced and widely 
distributed. In the DOE-supported proj-
ects, DNA clones representing chromo-
somes 16, 19, and 22 have been ordered
(mapped) and are now providing mate- 
rial needed for large-scale sequencing. 


Toward the goal of greatly increasing the 
speed and decreasing the cost of DNA 
sequencing, DOE has supported im-
provements in standard technologies and
has pioneered support for revolutionary 
sequencing systems. Marked improve- 
ments have been made in reagents, en- 
zymes, and raw data quality. Such novel 
approaches as sequencing by hybridiza- 
tion (using DNA "chips") and mass spec- 
trometry have already found important, 
previously unanticipated applications 
outside the Human Genome Project. 

Joint Genome Institute 

In early 1997, the human genome centers
at Lawrence Berkeley National Labora-
tory, LANL, and LLNL began collabo-
rating in the Joint Genome Institute
(JGI), within which high-throughput
sequencing will be implemented [see
p. 26 and Human Genome News 8(2),
1-2]. The initial JGI focus will be on se-
quencing areas of high biological interest
on several chromosomes, including hu-
man chromosomes 5, 16, and 19. Estab-
lishment of JGI represents a major
transition in the DOE Human Genome
Program.
Previously, most goals were pursued by
small- to medium-sized teams, with 

modest multisite collaborations. The JGI 

will house high-throughput implementa-
tions of successful technologies that
will be run with increasingly stringent
process- and quality-control systems.

In addition, a small component aimed at 
understanding how genes function in the 
body — a field known as functional ge- 
nomics — has been established and will 
grow as sequencing targets are met.
High-throughput functional genomics 
represents a new era in human biology, 
one which will have profound implica- 
tions for solving biological problems. 


In preparation for the production- 
sequencing phase, many algorithms for 
interpreting DNA sequence have been 
developed, and an increasing number 
have become available as services over 
the Internet. Last year, the GRAIL (for 
Gene Recognition and Analysis Internet 
Link) and GenQuest servers, developed
and maintained at Oak Ridge National 
Laboratory, processed an average of 
almost 40 million bases of sequence 
each month. 

As technology improves and data accu- 
mulates exponentially, continued progress 
in the Human Genome Project will de- 
pend increasingly on the development of 
sophisticated computational tools and 
resources to manage and interpret the in- 
formation. The ease with which re- 
searchers can access and use the data 
will provide a measure of the project's 
success. Critical to this success is the 
creation of interoperable databases and 
other computing and informatics tools to 
collect, organize, and interpret thousands 
of DNA clones. 

For additional information on the DOE 
genome programs, refer to Research 
Highlights, p. 9; Research Narratives, 
p. 25; this report's Part 2. 1996 Re- 
search Abstracts; and the Web site 



Five-Year Research Goals
of the U.S. Human Genome Project 

October 1, 1993, to September 30, 1998 (FY 1994 through FY 1998)* 

Major events in the U.S. Human Genome Project, including progress made toward these
goals, are charted in a timeline on pp. ii-iii.

Genetic Mapping 

• Complete the 2- to 5-cM map by 1995.

• Develop technology for rapid
genotyping.

• Develop markers that are easier to
use.

• Develop new mapping technologies.

Physical Mapping 

• Complete a sequence tagged site
(STS) map of the human genome at
a resolution of 100 kb.

DNA Sequencing

• Develop efficient approaches to se-
quencing one- to several-megabase
regions of DNA of high biological
interest.

• Develop technology for high-
throughput sequencing, focusing on
systems integration of all steps from
template preparation to data analysis.

• Build up a sequencing capacity to
allow sequencing at a collective rate
of 50 Mb per year by the end of the
period. This rate should result in an
aggregate of 80 Mb of DNA sequence
completed by the end of FY 1998.

Gene Identification

• Develop efficient methods for identify-
ing genes and for placement of known
genes on physical maps or sequenced
DNA.

Technology Development 

• Substantially expand support of in-
novative technological develop-
ments as well as improvements in
current technology for DNA se-
quencing and for meeting the needs
of the Human Genome Project as a
whole.

Model Organisms 

• Finish an STS map of the mouse
genome at a 300-kb resolution.

• Finish the sequence of the Escheri-
chia coli and Saccharomyces cerevi-
siae genomes by 1998 or earlier.

• Continue sequencing Caenorhab-
ditis elegans and Drosophila
melanogaster genomes with the aim
of bringing C. elegans to near
completion by 1998.

• Sequence selected segments of
mouse DNA side by side with corre-
sponding human DNA in areas of
high biological interest.


• Continue to create, develop, and
operate databases and database
tools for easy access to data, includ-
ing effective tools and standards for
data exchange and links among
databases.

• Consolidate, distribute, and continue
to develop effective software for
large-scale genome projects.

• Continue to develop tools for com-
paring and interpreting genome
information.

Ethical, Legal, and Social
Implications

• Continue to identify and define
issues and develop policy options
to address them.

• Develop and disseminate policy
options regarding genetic testing
services with potential widespread
use.

• Foster greater acceptance of
human genetic variation.

• Enhance and expand public and
professional education that is
sensitive to sociocultural and
psychological issues.


• Continue to encourage training
of scientists in interdisciplinary
sciences related to genome
research.

Technology Transfer 

• Encourage and enhance technol-
ogy transfer both into and out of
centers of genome research.


• Cooperate with those who would
establish distribution centers for
genome materials.

• Share all information and materi-
als within 6 months of their
development. This should be
accomplished by submission of
information to public databases
or repositories, or both, where
appropriate.

*Original 1990 goals were revised in 1993 due to rapid progress. A second revision was being developed at press time.



Evolution of a Vision:

Genome Project Origins, 

In an interview at a DNA sequencing conference in Hilton Head,
South Carolina,* David Smith, a founder and former Director of the
DOE Human Genome Program, recalled the establishment of this
country's first human genome project. The impressive early achieve-
ments and spin-off benefits, he noted, offer more than mere vindica-
tion for project founders. They also provide a tantalizing glimpse
into the future where, he observed, "scientists will be empowered to
study biology and make connections in ways undreamt of before."

The DOE Human Genome Pro- 
gram began as a natural out- 
growth of the agency's 
long-term mission to develop 
better technologies for measur-
ing health effects, particularly induced mu-
tations. As Smith explained it, "DOE had
been supporting mutation studies in Japan,
where no heritable mutations could be de-
tected in the offspring of populations ex-
posed to the atomic blasts at Hiroshima and
Nagasaki. The program really grew out of a
need to characterize DNA differences be-
tween parents and children more efficiently.
DOE led the development of many muta-
tion tests, and we were interested in devel-
oping even more sensitive detection
methods. Mortimer Mendelsohn of
Lawrence Livermore National Laboratory,
a member of the International Commission
for Protection Against Environmental
Mutagens and Carcinogens, and I decided
to hold a workshop to discuss DNA-based
methods (see Human Genome Project
chronology, p. ii).

"Ray White (University of Utah) organized 
the meeting, which took place in Alta,
Utah, in December 1984. It was a small
meeting but very stimulating intellectually. 
We concluded the obvious — that if you re- 
ally wanted to use DNA-based technolo- 
gies, you had to come up with more 
efficient ways to characterize the DNA of
much larger regions of the genome. And the 
ultimate sensitivity would be the capability 
to compare the complete DNA sequences 
of parents and their offspring."

Project Begins 

Smith recalled reaction to the first public 
statement that DOE was starting a program
with the aim of sequencing the human ge-
nome. "I announced it at the Cold Spring

view. "In fact, individual investigators can 
do things they would never be able to do 
otherwise. We're beginning to see that
demonstrated at this meeting. For the first
time, we're finding people exploring sys-
tematic ways of looking at gene function in
organisms. The genome project opens up
enormous new research fields to be mined.
Cottage-industry biologists won't need a lot
of robots, but they will have to be computer
literate to put the information all together."

The genome project also is providing en- 
abling technologies essential to the future 
of the emerging biotechnology industry,
catalyzing its tremendous growth. Accord-
ing to Smith, the technologies are

"Genomics has come of age, and it is
opening the door to entirely new
approaches to biology."

*The Seventh International Genome Sequenc-
ing and Analysis Conference, September 1995


Harbor meeting in May 1986, and there was
a big hullabaloo." After a year-long review,
a National Academy of Sciences National
Research Council panel endorsed the
project and the basic strategy proposed.
Smith pointed out that NIH and others were
also having discussions on the feasibility of
sequencing the human genome. "Once NIH
got interested, many more people became
involved. DOE and NIH signed a Memo-
randum of Understanding in October 1988
to coordinate our activities aimed at charac-
terizing the human genome." But, he ob-
served, it wasn't all smooth sailing. The
nascent project had many detractors.

Responding to Critics 

Many scientists, prominent biologists 
among them, thought having the sequence
would be a misuse of scarce resources.
Smith, laughing now, recalls one scientist
complaining, "Even if I had the sequence,
I wouldn't know what to do with it." Other
critics worried that the genome project
would siphon shrinking research funds
away from individual investigator-initiated
research projects. Smith takes the opposite 

capable of more than elucidating the human 
genome. "We're developing an infrastruc-
ture for future research. These technologies
will allow us to efficiently characterize any
of the organisms out there that pertain to
various DOE missions, with such applica-
tions as better fuels from biomass,
bioremediation, and waste control. They
also will lead to a greater understanding of
global cycles, such as the carbon cycle, and
the identification of potential biological in-
terventions. Look at the ocean; an amazing
number of microbes are in there, but we
don't know how to use them to influence
cycles to control some of the harmful
things that might be happening. Up to now,
biotechnology has been nearly all health
oriented, but applications of genome re-
search to modern biology really go beyond
health. That's one of the things motivating
our program to try to develop some of these
other biotechnological applications."

Responding to criticism about not research-
ing gene function early in the project,
Smith reasserted that the purpose of the
Human Genome Project is to build tech-
nologies and resources that will enable re-
searchers to learn about biology in a much


Present and Future Challenges, 


Far-Reaching Benefits 

more efficient way. "The genome budget is
devoted to very specific goals, and we
make sure that projects contribute toward
reaching them."

International Scope 

Smith credited the international community 
with contributing to many project suc-
cesses. "The initial planning was for a U.S.
project, but the outcome, of course, is that
it is truly international, and we would not
be nearly as far as we are today without
those contributions. Also, there's been a fair
amount of money from private companies,
and support from the Muscular Dystrophy
Association in France and The Wellcome
Trust in the United Kingdom has been ex-
tremely important."

Technology Advances 

While noting enormous advances across the 
board, Smith cited automation progress and
observed that tremendously powerful ro-
bots and automated processes are changing
the way molecular biology is done. "A lot
of novel technologies probably won't be
useful for initial sequencing but will be
very valuable for comparing sequences of
different people and for polymorphism
studies. One of the most gratifying recent
successes is the DNA polymerase engineer-
ing project. Researchers made a fairly
simple change, but it resulted in a
thermosequenase that may answer a lot of
problems, reduce the cost of sequencing,
and give us better data."

Progress in genome research requires the 
use of maturing technologies in other 
fields. "The combination of technologies
that are coming together has been fortu-
itous; for example, advances in informatics
and data-handling technologies have had a
tremendous impact on the genome project.
We would be in deep trouble if they were at
a less-mature stage of development. They
have been an important DOE focus."


Smith described tangible progress toward 
goals associated with programs on the ethi-
cal, legal, and social issues (ELSI) related
to data produced by the genome project.
"ELSI programs have done a lot to educate
the thinkers, and this has produced a higher
level of discourse in the country about
these issues. DOE is spending a large frac-
tion of its ELSI money on informing spe-
cial populations who can reach others.
Educating judges has been especially well
received because they realize the potential 
impact of DNA technology on the courts." 

According to Smith, more people and 
groups need to be involved in ELSI mat- 
ters. "We have some ELSI products: the 
DOE-NIH Joint ELSI Working Group has
an insurance task force report, and a DOE
ELSI grantee has produced draft privacy
legislation. Now it's time for others to
come and translate ELSI efforts into policy. 
Perhaps the new National Bioethics Advi- 
sory Commission can do some of this." 

New Model for Biological 

Smith spoke of a changing paradigm guid-
ing DOE-supported biology. "Some years
ago, the central idea or dogma in molecular
biology research was that information in
DNA directs RNA, and RNA directs pro-
teins. Today, I think there is a new para-
digm to guide us: Sequence implies
structure, and structure implies function.
The word 'implies' in our new paradigm
means there are rules," continued Smith,
"but these are rules we don't understand
today. With the aid of structural informa-
tion, algorithms, and computers, we will be
able to relate sequence to structure and
eventually relate structure to function. Our
effort focuses on developing the technolo-
gies and tools that will allow us to do this.

"That's how I think about what we do at 
DOE," he said. "We're working a lot on
technology and projects aimed at human
and microbial genome sequencing. For un-
derstanding sequence implications, we are
making major, increasing investments in
synchrotrons, synchrotron user facilities,
neutron user facilities, and big nuclear
magnetic resonance machines. These are all
aimed at rapid structure determination."
Smith explained that now we are seeing the
beginnings of the biotechnology revolution
implied by the sequence-to-structure-
to-function paradigm. "If you really under-
stand the relationship between sequence
and function, you can begin to design se-
quences for particular purposes. We don't
yet know that much about the world around
us, but there are capabilities out there in the
biological world, and if we can understand
them, we can put those capabilities to use."

"Comparative genomics," he continued,
"will teach us a tremendous amount about
human evolution. The current phylogenetic
tree is based on ribosomal RNA sequences,
but when we have determined whole ge-
nomic sequences of different microbes,
they will probably give us different ideas
about relationships among archaebacteria,
eukaryotes, and prokaryotes."

Feeling good about progress over the previ-
ous 5 years, Smith summed it up suc-
cinctly: "Genomics has come of age, and it
is opening the door to entirely new ap-
proaches to biology."

David Smith retired at the end of January
1996. Taking responsibility for the DOE
Human Genome Program is Aristides
Patrinos, who is also Associate Director
of the DOE Office of Biological and Envi-
ronmental Research. Marvin Frazier is
Director of the Health Effects and Life
Sciences Research Division, which man-
ages the Human Genome Program.




Looking to the Future 

Insights, technologies, and resources already emerg- 
ing from the genome project, together with advances 
in such fields as computational and structural biology, 
will provide biologists and other researchers with im-
portant tools for the 21st century.



Highlights of Research Progress 

The early years of the Hu- 
man Genome Program 
have been remarkably suc- 
cessful. Critical resources 
and infrastructures have 
been established, and technologies have 
been developed for producing several 
useful types of chromosomal maps. 
These gains are supporting the project's 
transition to the large-scale sequencing 
phase. Some highlights and trends in the 
U.S. Department of Energy's (DOE) 
Human Genome Program after FY 1993
are presented in this section. 

Clone Resources for 
Mapping, Sequencing,
and Gene Hunting 

The demands of large chromosomal 
mapping and sequencing efforts have 
necessitated the development of several 
different types of clone collections 
(called libraries) carrying human DNA. 
Three generations of DOE-developed li- 
braries are being distributed to research 
teams in the United States and abroad. 
In these libraries, human DNA seg- 
ments of various lengths are maintained 
in bacterial cells. 

genome researchers worldwide
[http://www-bio.llnl.gov/genome/html/cosmid.html]. Very high resolution chro-
mosome maps based principally on
NLGLP libraries were published in
1995 for chromosomes 16 and 19.
These are described in detail in the Re-
search Narratives section of this report
(see LLNL, p. 27, and LANL, p. 35).

PACs and BACs 

The third generation of clone resources 
supporting chromosome mapping is 
composed of P1 artificial chromosome
(PAC) and bacterial artificial chromo- 
some (BAC) libraries. A prototype PAC 
library was produced by the team of 
Leon Rosner (then at DuPont) many 
years ago, but more efficient produc- 
tion began with improvements intro- 
duced by the DOE-supported teams 
headed by Melvin Simon at Caltech 
(BACs) and Pieter de Jong at Roswell 
Park (PACs). 

In contrast to cosmids, BACs and PACs
provide a more uniform representation 
of the human genome, and the greater 
length of their inserts (90,000 to 

Transitioning to 
large-scale sequencing 

DOE Genome Research 
Web Site 


NLGLP Libraries 

The first two generations are 

chromosome-specific libraries carrying
small inserts of human DNA (15,000 to
40,000 base pairs). As part of the Na-
tional Laboratory Gene Library Project
(NLGLP) begun in 1983, these libraries
were prepared at Los Alamos National 
Laboratory (LANL) and Lawrence 
Livermore National Laboratory (LLNL) 
using DOE flow-sorting technology to 
separate individual chromosomes. Li- 
brary availability has allowed the very 
difficult whole-genome tasks to be di- 
vided into 24 more manageable single- 
chromosome projects that could be 
pursued at separate research centers. 
Completed in 1994, NLGLP libraries
have provided critical resources to 

Research Narratives 

Separate narratives, beginning on p. 25, contain detailed
descriptions of research programs and accomplishments at
these major DOE genome research facilities.

• Lawrence Livermore National Laboratory

• Los Alamos National Laboratory

• Lawrence Berkeley National Laboratory

• University of Washington Genome Sequencing

• Genome Database

• National Center for Genome Resources

Research Abstracts

Descriptions of individual research projects at other institu-
tions are given in Part 2, 1996 Research Abstracts.



300,000 base pairs) facilitates both
mapping and sequencing. Their useful- 
ness was illustrated dramatically in 
1993 when the first breast cancer-
susceptibility gene (BRCA1) was found
in a BAC clone after other types of re- 
sources had failed. The next year, with 
major support from NIH, de Jong's PACs 
contributed to the isolation of the second 
human breast cancer-susceptibility gene 


The assembly of ordered, overlapping 
sets (contigs) of high-quality clones has
long been considered an essential step 
toward human genome sequencing. 
Because the clones have been mapped 
to precise genomic locations, DNA 
sequences obtained from them can be 
located on the chromosomes with mini- 
mal uncertainty. 

The large insert size of BACs and
PACs allows researchers to visually 
map them on chromosomes by using 
fluorescence in situ hybridization 
(FISH) technology (see photomicro- 
graph below). These mapped BACs and 
PACs represent very valuable resources 
for the cytogeneticist exploring chromo- 
somal abnormalities. Two major medi- 
cal genetics resources have been 
developed: (1) The Resource for Mo-
lecular Cytogenetics at the University of
California, San Francisco, in collabora-
tion with the Lawrence Berkeley Na-
tional Laboratory (LBNL) team led by
Joe Gray and
(2) The Total Human Genome BAC-
PAC Resource at Cedars-Sinai Medical
Center, Los Angeles, developed by Julie
Korenberg's laboratory (see map, p. 12,
and Web site,


Coordinated Mapping 
and Sequencing 

A simple strategy was proposed in 1996 
for choosing BACs or PACs to elongate 
sequenced regions most efficiently 
[Nature 381, 364-66 (1996)]. The first
step is to develop a BAC end sequence 
database, with each entry having the 
BAC clone name and the sequences of 
its human insert ends. In toto, the source
BACs should represent a 15- to 20-fold 
coverage of the human genome. Then 
for any BAC or chromosomal region se- 
quenced, a comparison against the data- 
base will return a list of BACs (or 
PACs) that overlap it. Optimal choices 
for the next BACs (or PACs) to be se- 
quenced can then be made, entailing 
minimal overlap (and therefore minimal
redundancy of sequencing). 
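
The selection step can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure published in the Nature paper: the clone names, the coordinates, and the pick_next_clone helper are hypothetical, and the sketch assumes that comparing end sequences against the sequenced region has already placed each candidate clone at an estimated start and end position. Among the clones that both overlap the sequenced region and extend beyond it, it picks the one that would require the least resequencing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Clone:
    name: str   # hypothetical clone identifier
    start: int  # estimated left end of the insert (bp)
    end: int    # estimated right end of the insert (bp)


def overlap(clone: Clone, region_start: int, region_end: int) -> int:
    """Number of bases shared by the clone and the sequenced region."""
    return max(0, min(clone.end, region_end) - max(clone.start, region_start))


def pick_next_clone(clones: Sequence[Clone], region_start: int,
                    region_end: int, min_overlap: int = 1) -> Optional[Clone]:
    """Pick the clone that extends the region rightward with the least overlap."""
    candidates = [
        c for c in clones
        if c.end > region_end                                      # adds new sequence
        and overlap(c, region_start, region_end) >= min_overlap    # is anchored
    ]
    if not candidates:
        return None
    # Smallest overlap first; on ties, prefer the clone reaching furthest right.
    return min(candidates,
               key=lambda c: (overlap(c, region_start, region_end), -c.end))


# Hypothetical library placed against a sequenced region spanning 0-150,000 bp.
library = [
    Clone("bac_A", 40_000, 190_000),
    Clone("bac_B", 140_000, 300_000),
    Clone("bac_C", 120_000, 260_000),
    Clone("bac_D", 210_000, 350_000),  # no overlap, so it cannot be anchored
]
best = pick_next_clone(library, 0, 150_000)
print(best.name if best else "no extending clone found")
```

The min_overlap parameter is an assumed safeguard: in practice some overlap would be needed to confirm that the chosen clone really joins the sequenced region.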

Two pilot BAC-PAC end-sequencing 
projects were initiated in September of 
1996 to explore feasibility, optimize 
technologies, establish quality controls, 
and design the necessary informatics in- 
frastructure. Particular benefits are an- 
ticipated for small laboratories that will 
not have to maintain large libraries of 
clones and can avoid preliminary contig 
mapping (see abstracts of Glen Evans; 
Julie Korenberg; Mark Adams, Leroy 
Hood, and Melvin Simon; and Pieter de 
Jong in Part 2 of this report). 

Updated information on BAC-PAC re- 
sources can be found on the Web (http:// 
html). [See Appendix C: Human Subjects 
Guidelines, p. 77 or http://www.ornl.
gov/hgmis/archive/nchgrdoe.html for
DOE-NIH guidelines on using DNA 
from human subjects for large-scale 

cDNA Libraries 

In 1990, DOE initiated projects to en- 
rich the developing chromosome contig 
maps with markers for genes. Although 
the protein-encoding messenger RNAs 
are good representatives of their source 

genes, they are unstable and must be 
converted to complementary DNAs 
(cDNAs) for practical applications. 
These conversions are tricky, and arti- 
facts are introduced easily. The team led 
by Bento Soares (University of Iowa)
has optimized the steps and continues to
produce cDNA libraries of the highest
quality. At LLNL, individual cDNA
clones are put into standard arrays and
then distributed worldwide for charac-
terization by the international IMAGE
(for Integrated Molecular Analysis of
Gene Expression) Consortium (see box,
p. 13).

Initially supported under a DOE cDNA 
initiative, Craig Venter's team (now at 
The Institute for Genomic Research) 
greatly improved technologies for read- 
ing sequences from cDNA ends (ex- 
pressed sequence tags, called ESTs). 
Together with complementary analysis 
software, ESTs were shown to be a valu- 
able resource for categorizing cDNAs 
and providing the first clues to the func-
tions of the genes from which they are
derived. This fast EST approach has at- 
tracted millions of dollars in commercial 
investment. Mapping the cDNA onto a 
chromosome can identify the location of 
its corresponding gene. Many laborato- 
ries worldwide are contributing to the 
continuing task of mapping the estimated 
70,000 to 100,000 human genes. 


All the previously described DNA 
clones are maintained in bacterial host 
cells. However, for unknown reasons, 
some regions of the human genome ap- 
pear to be unclonable or unstable in 
bacteria. The team led by Jean-Michel 
Vos (University of North Carolina, 
Chapel Hill) has developed a human ar- 
tificial episomal chromosome (HAEC) 
system based on the Epstein-Barr virus 
that may be useful for coverage of these 
especially difficult regions. In the broader 
biomedical community, HAECs also 
show promise for use in gene therapy. 




BAC-PAC Map. The Total Human Genome BAC-PAC
Resource represents an important tool for understanding
the genes responsible for human development and disease.
The Resource, consisting of more than 5000 BAC and PAC
clones, covers every human chromosome band and 25%
of the entire human genome. Each color dot represents a
single BAC or PAC clone mapped by FISH to a specific
chromosome band represented in black and white. The
clones, which are stable and useful for sequencing, have
been integrated with the genetic and physical chromosome
maps. (Source: Julie Korenberg, Cedars-Sinai Medical Center)


Resources for Gene 

Hunting for disease genes is not a spe- 
cific goal of the DOE Human Genome 
Program. However, DOE-supported 
libraries sent to researchers worldwide 
have facilitated gene hunts by many re- 
search teams. DOE libraries have played 
a role in the discovery of genes for cystic 
fibrosis, the most common lethal inher- 
ited disease in Caucasians; Huntington's 
disease, a progressive lethal neurological 
disorder; Batten's disease, the most 
prevalent neurodegenerative childhood 
disease; two forms of dwarfism; Fanconi 
anemia, a rare disease characterized by 
skeletal abnormalities and a predisposi- 
tion to cancer; myotonic dystrophy, the 
most common adult form of muscular 
dystrophy; a rare inherited form of breast 
cancer; and polycystic kidney disease, 
which affects an estimated 500,000 
people in the United States at a healthcare 
cost of over $1 billion per year. 

The team led by Fa-Ten Kao (Eleanor 
Roosevelt Institute) has microdissected 

several chromosomes and made deriva- 
tive clone libraries broadly available to 
disease-gene hunters. This resource 
played a critical role in isolating the 
gene responsible for some 15% of colon 

Of Mice and Humans: 
The Value of
Comparative Analyses 

A remaining challenge is to recognize 
and discriminate all the functional con-
stituents of a gene, particularly regula-
tory components not represented within
cDNAs, and to predict what each gene 
may actually do in human biology. 
Comparing human and mouse se- 
quences is an exceptionally powerful 
way to identify homologous genes and 
regulatory elements that have been sub- 
stantially conserved during evolution. 

Researchers led by Leroy Hood (Uni- 
versity of Washington, Seattle) have 
analyzed more than 1 million bases of
sequence from T-cell receptor (TCR) 

To IMAGE the Human 
Gene Map 

Since 1993, the Integrated Molecular
Analysis of Gene Expression (IM-
AGE) Consortium has played a major
role in the development of a human
gene map. Founding members of the
IMAGE Consortium are Bento Soares
(Columbia University, now at Univer-
sity of Iowa), Gregory Lennon
(LLNL), Mihael Polymeropoulos
(National Institutes of Health's Na-
tional Institute of Mental Health),
and Charles Auffray (Généthon, in
France). Because cDNA molecules
represent coding (expressed-gene)
areas of the genome, sets of cloned
cDNAs are a valuable resource to
the gene-mapping community. The

cDNA libraries representing different 
tissues have many members in com- 
mon. Thus, good coordination among 
participating laboratories can minimize 
redundant work. The international IM-
AGE Consortium laboratories fulfill
this role by developing and arraying 
cDNA clones for worldwide use. 

From the IMAGE cDNA clones, re- 
searchers at the Washington University 
(St. Louis) Sequencing Center deter- 
mine ESTs with support from Merck, 
Inc. The data, which are used in gene
localization, are then entered into public
databases. More than 10,000 chromo-
somal assignments have been entered
into Genome Database (http://www.gdb.
org). Including replica copies, over

3 million clones have been distrib-
uted, probably representing about
50,000 distinct human genes. 

The IMAGE infrastructure is being 
used in two additional programs. At
LLNL, the IMAGE laboratory arrays
mouse cDNA libraries produced by
Soares for the Washington University
Mouse EST project (http://genome.
wustl.edu/est/mouse_esthmpg.html)
with sequencing sponsored by the
Howard Hughes Medical Institute.
Additional clone libraries are being
used in a collaborative sequencing
project sponsored by the NIH Na-
tional Cancer Institute as part of the
Cancer Genome Anatomy Project to
identify and fully sequence genes
implicated in major cancers (http://



chromosome regions of both human and 
mouse genomes. Many subtle functional 
elements can be recognized only by 
comparing human and mouse sequences. 
TCRs play a major role in immunity 
and autoimmune disease, and insights 
into their mechanisms may one day help 
treat or even prevent such diseases as 
arthritis, diabetes, and multiple sclerosis 
(possibly even AIDS). 

Comparative analysis is also used to 
model human genetic diseases. Given 
sequence information, researchers can 
produce targeted mutations in the mouse 
as a rapid and economical route to elu- 
cidating gene function. Such studies 
continue to be used effectively at Oak 
Ridge National Laboratory (ORNL). 

DNA Sequencing 

From the beginning of the genome 
project, DOE's DNA sequencing- 
technology program has supported both 
improvements to established method- 
ologies and innovative higher-risk strat- 
egies. The first major sequencing 
project, a test bed for incremental im- 
provements, culminated with elucida- 
tion of the highly complex TCR region 
(described above) by a team led by 

A novel "directed" sequencing strategy 
initiated at LBNL in 1993 provides a 
potential alternative approach that can 
include automation as a core design fea- 
ture. In this approach, every sequencing 
template is first mapped to its original 
position on a chromosome (resolution, 
30 bases). The advantages of this method 
include a large reduction in the number 
of sequencing reactions needed and in 
the sequence-assembly steps that follow. 
To date, this directed strategy has 
achieved significant results with simpler, 
less repetitive nonhuman sequences, par-
ticularly in the NIH-funded Drosophila
genome program. The system also is in 
use at the Stanford Human Genome 
Center and Mercator Genetics, Inc. 

The preparation of DNA clones for se- 
quencing involves several biochemical 
processing steps that require different 
solution environments. At the White- 
head Institute, Trevor Hawkins has im- 
proved systems for reversible binding of 
DNA molecules to magnetic beads that 
are compatible with complete robotic 
management. The second-generation 
Sequatron fits on a tabletop with a 
single robotic arm moving sample trays 
between servicing stations. This very 
compact system, supported by sophisti- 
cated software, may be ideal for labora- 
tories with limited or costly floor space. 

Fluorescent tags are critical components 
of conventional automated sequencing 
approaches. The team of Richard 
Mathies and Alexander Glazer (Univer- 
sity of California, Berkeley) has made a 
series of improvements in fluorescence 
systems that have decreased DNA input 
needs and markedly increased the qual- 
ity of raw data, thereby supporting 
longer useful reads of DNA sequence. 

Complementary improvements in enzy- 
mology have been achieved by the team 
of Charles Richardson and Stanley Ta- 
bor (Harvard Medical School). Current 
widely used procedures for automated 
DNA sequencing involve cycling be- 
tween high and low temperatures. The 
Harvard researchers used information 
about the three-dimensional structure of 
polymerases (enzymes needed for DNA 
replication) and how they function to 
engineer an improved Taq polymerase. 
ThermoSequenase, which is now pro- 
duced commercially as part of the 
ThermoSequenase kit, reduces the 
amount of expensive sequencing re- 
agents required and supports popular 
cycle-sequencing protocols.

The application of higher electrical 
fields in gel electrophoresis separation 
of DNA fragments can increase se- 
quencing speed and efficiency. Conven- 
tional thick gels cannot adequately 
dissipate the additional heat produced, 
however. Two promising routes to 
"thinness" are ultrathin slab gels and 



capillary systems. An ultrathin gel sys- 
tem was developed by Lloyd Smith 
(University of Wisconsin, Madison) and 
licensed for commercial development. 

The replacement of gels by pumpable 
solutions of long polymers is making 
capillary array electrophoresis (CAE) 
potentially practical for DNA sequenc- 
ing. The first CAE system for DNA was 
demonstrated by the team of Barry 
Karger (Northeastern University). In
1995, Karger and Norman Dovichi (Uni-
versity of Alberta, Canada) separately
identified CAE conditions under which 
DNA sequencing reads could be ex- 
tended usefully up to the 1000-base 
range. Another CAE system, developed 
by Edward Yeung (Iowa State Univer-
sity), has been licensed for commercial
production (see box, p. 23). Mathies has 
developed a system in which a confocal 
microscope displays DNA bands. Appli- 
cation of this system to the sizing of 
larger DNA fragments binding multiple 
fluors allows single-molecule detection. 

Replacing the gel-separation step with 
mass spectroscopy (MS) is another 
promising approach for rapid DNA se- 
quencing. MS uses differences in mass- 
to-charge ratios to separate ionized 
atoms or molecules. Early efforts at MS 
sequencing were plagued by chemical 
reactivity during the "launching" phase 
of matrix-assisted laser desorption ion-
ization (MALDI). MALDI badly de-
graded the DNA sample input. However,
the degradation chemistry was elucidated 
in Smith's laboratory, leading to improve- 
ments. At ORNL, the team of Chung- 
Hsuan Chen has performed extensive 
trials of alternative matrices and has 
achieved significant improvements that 
now support sequence reads up to 100 
DNA bases. The system is undergoing 
trials for DNA diagnostic applications. 
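
How a difference in mass-to-charge ratio separates DNA fragments can be illustrated with a short calculation. The sketch below assumes an idealized time-of-flight analyzer, a design commonly paired with MALDI but not specified in this report, in which an ion accelerated through a voltage V drifts a length L in time t = L * sqrt(m / (2 z e V)). The accelerating voltage, drift length, and the average mass of about 330 daltons per nucleotide are illustrative assumptions, not figures from the ORNL work.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton
AVG_NUCLEOTIDE_MASS_DA = 330.0          # assumed average mass per DNA base


def flight_time(mass_da: float, charge: int = 1,
                voltage: float = 20_000.0, length_m: float = 1.0) -> float:
    """Idealized drift time (seconds) for an ion in a time-of-flight tube."""
    mass_kg = mass_da * DALTON_KG
    return length_m * math.sqrt(
        mass_kg / (2.0 * charge * ELEMENTARY_CHARGE_C * voltage))


# Compare fragments of 99 and 100 bases, near the read length cited above.
t_99 = flight_time(99 * AVG_NUCLEOTIDE_MASS_DA)
t_100 = flight_time(100 * AVG_NUCLEOTIDE_MASS_DA)
print(f"99-mer:  {t_99 * 1e6:.2f} microseconds")
print(f"100-mer: {t_100 * 1e6:.2f} microseconds")
print(f"relative difference: {(t_100 - t_99) / t_100:.4%}")
```

Because flight time grows only as the square root of mass, the relative gap between fragments that differ by one base shrinks as fragments get longer, which is one reason read length is a central figure of merit for this approach.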

The most revolutionary sequencing tech- 
nology is being pursued by the team of 
Richard Keller and James Jett at LANL. 
Their goal is to read out sequence from
single DNA molecules, work that builds 

on LANL's expertise in flow cytometry. 
The strand to be sequenced is labeled 
first with fluors that distinguish the 
four DNA subunits and is then sus- 
pended in a flow stream. An exonu- 
clease cleaves the subunits, which flow 
past an interrogating laser system that 
reports the subunits' identities. All sys- 
tem constituents are operational but 
limited by the low subunit release rates 
of commercially available exonu- 
cleases. A current developmental focus 
is on identifying more active exonu- 

Synthetic DNA strands in the 15- to 30- 
base range (oligomers) play essential 
roles in DNA sequencing; in sample- 
preparation steps for the polymerase 
chain reaction, which copies DNA 
strands millions of times; and in DNA- 
based diagnostics. The cost of custom 
oligomer synthesis once was a limiting 
factor in many research projects. A 
more economical, highly parallel oligo- 
mer synthesis technology was devel- 
oped by Thomas Brennan at Stanford 
University (see last bullet, p. 22, for 
further details). 

The sequencing by hybridization 
(SBH) technology provides information 
only on short stretches of DNA in a 
single trial (interrogation), but thou- 
sands of low-cost interrogations can be 
performed in parallel. SBH is very use- 
ful for rapid classification of short 
DNAs such as cDNAs, very low cost 
DNA resequencing, and detection of 
DNA sequence differences (polymor- 
phisms) over short regions. The team of 
Radomir Crkvenjakov and Radoje
Drmanac invented one format of SBH 
while in Yugoslavia, made substantial 
improvements at Argonne National 
Laboratory (ANL), and later started 
Hyseq Inc. to commercialize these 
technologies. At ANL, another imple- 
mentation, SBH on matrices (SHOM) 
of gels, holds promise for high-accu-
racy sequence proofreading and diverse
DNA diagnostics. The ANL team, led 
by Andrei Mirzabekov, collaborates 



with the Engelhardt Institute in Moscow,
where SHOM was demonstrated initially. 
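
The reason SBH yields reliable information only over short stretches can be shown with a toy reconstruction. The Python sketch below is an idealized illustration, not the Hyseq or ANL method: it assumes error-free hybridization, a known starting probe, a known target length, and a probe set that records how many times each short word occurs. The function names spectrum and reconstruct are hypothetical. It lists every sequence consistent with the probe set.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """All overlapping k-base words in seq, with multiplicities."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reconstruct(start: str, probes: Counter, length: int) -> list:
    """Every sequence of the given length that begins with `start` and
    uses each probe exactly as often as it appears in `probes`.

    Assumes sum(probes.values()) == length - k + 1 and k >= 2.
    """
    k = len(start)
    remaining = Counter(probes)
    remaining[start] -= 1
    found = []

    def extend(seq: str) -> None:
        if len(seq) == length:
            found.append(seq)
            return
        suffix = seq[-(k - 1):]          # last k-1 bases must overlap next probe
        for base in "ACGT":
            probe = suffix + base
            if remaining[probe] > 0:
                remaining[probe] -= 1
                extend(seq + base)
                remaining[probe] += 1    # backtrack

    extend(start)
    return found


target = "ATGGCGTGCA"
with_3mers = reconstruct("ATG", spectrum(target, 3), len(target))
with_4mers = reconstruct("ATGG", spectrum(target, 4), len(target))
print("3-base probes:", with_3mers)
print("4-base probes:", with_4mers)
```

In this example the three-base probes are consistent with more than one ten-base sequence, while the four-base probes are consistent with only the original. Repeated words longer than the probes create the same ambiguity in real targets, which is why the approach suits short DNAs, resequencing, and polymorphism detection.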

Informatics: Data 
Collection and Analysis 

Explosive growth of information and the 
challenges of acquiring, representing, 
and providing access to data pose continu- 
ing monumental tasks for the large public 
databases. Over the last 3 years, the Ge- 
nome Database (GDB), the major inter- 
national repository of human genome 
mapping data, has made extensive changes 
culminating in the enhanced representa- 
tion of genomic maps and gene informa- 
tion in GDB V6.0. Major issues for the 
Genome Sequence DataBase (GSDB), 
established in 1994, are to capture and 
annotate the sequence data and to repre- 
sent it in a form capable of supporting 
complex, ad hoc queries. Both GDB and 
GSDB have been restructured recently to
handle the increasing flood of data and 
make it more useful for downstream 
biology (see Research Narratives, GDB, 
p. 49, and GSDB, p. 55). [http://www.gdb.
org and]

Victor Markowitz, formerly of LBNL, has 
developed a suite of database tools allow- 
ing substantial modifications of underly- 
ing data structures while the biologists' 
query tools remain stable. [http://gizmo.]

The Genome Annotation Consortium 
(based at ORNL) was initiated in 1997 to 
be a modular, distributed informatics fa- 
cility for analyzing and processing (e.g., 
annotating) genome-scale sequence data. 

The many improvements in World Wide 
Web software now enable maps to be 
downloaded simply by using a browser 
with accessory software provided by 
GDB. Computers sift stretches of DNA 
sequence for patterns that identify such 
biologically important features as pro- 
tein-coding regions (exons), regulatory 
areas, and RNA splice sites. Other com- 
puter tools are used to compare a new se- 

quence (i.e., a putative gene) against all 
other database entries, retrieve any ho- 
mologous sequences that already have 
been entered, and indicate the degree of 

The Gene Recognition and Analysis 
Internet Link (GRAIL) at ORNL local- 
izes genes and other biologically impor- 
tant sequence features (see box, p. 17). 

Another analytical service that returns 
informative, annotated data is MAG- 
PIE, provided through ANL by Terry 
Gaasterland. MAGPIE is designed to 
reside locally at the site of a genome 
project and actively carry out analysis 
of genome sequence data as it is gener- 
ated, with automated continued reevalu- 
ation as search databases grow (http:// 
www.mcs.anl.gov/home/gaasterl/
magpie.html). Once an automated func- 
tional overview has been established, it 
remains to pinpoint the organisms' ex- 
act metabolic pathways and establish 
how they interact. To this end, the WIT 
(What Is There?) system, which succeeds
PUMA, supports the construction of
metabolic pathways. Such constructions 
or models are based on sequence data, 
the clearly established biochemistry of 
specific organisms, and an understand- 
ing of the interdependencies of bio- 
chemical mechanisms. WIT, which was 
developed by Evgenij Selkov and Ross 
Overbeek at ANL, offers a particularly 
valuable tool for testing current hypoth-
eses about microbial biology. [http://]

Researchers at the University of Colo- 
rado have developed another approach 
for predicting coding regions in ge- 
nomic DNA, combining multiple types 
of evidence into a single scoring func- 
tion, and returning both optimal and 
ranked suboptimal solutions. The ap- 
proach is robust to substitution errors 
but sensitive to frameshift errors. The 
group is now exploring methods for 
predicting other classes of sequence re-
gions, especially promoters. [Software



GRAIL and genQuest

In 1996 the Gene Recognition and
Analysis Internet Link (GRAIL)
processed nearly 40 million bases
of sequence per month, making it
the most widely used "gene-
finding" system available. Devel-
oped at Oak Ridge National Labo-
ratory (ORNL) by a team led by
Ed Uberbacher, GRAIL uses arti-
ficial intelligence and machine
learning to discover complex rela-
tionships in sequence data. The
genQuest server, also at ORNL,
compares information generated
by GRAIL with data in protein,
DNA, and motif databases to add
further value to annotation of
DNA sequences.

GRAIL's latest version (1.3) com-
bines a Motif Graphical Client
with improved sensitivity and
splice-site recognition, better per-
formance in AT-rich regions, new
analysis systems for model organ-
isms, and frameshift detection.
This system can be used on a wide
variety of UNIX platforms, includ-
ing Sun, DEC, and SGI. The many
ways to access GRAIL include a
command line sockets client that

permits remote program calls to all
basic GRAIL-genQuest analysis
services, thus allowing convenient
integration of GRAIL results into
automated analysis pipelines.

Contact GRAIL staff through the
Web site at http://compbio.ornl.
gov or at GRAILMAIL@ornl.gov
for e-mail and ftp access.

and information: http://beagle.colorado. 


The Baylor College of Medicine (BCM) 
Search Launcher improves user access 
to the wide variety of database-search 
tools available on the Web. Search 
Launcher features a single point of en-
try for related searches, the addition of
hypertext links to results returned by re-
mote servers, and a batch client. [http://
launcher.html]

FASTA-SWAP, also from the BCM 
group, is a new pattern-search tool for 
databases that improves sensitivity and 
specificity to help detect related se- 
quences. BEAUTY, an enhanced ver-
sion of the BLAST database-search
program, improves access to informa- 

tion about the functions of matched 
sequences and incorporates additional 
hypertext links. Graphical displays al-
low correlation of hit positions with an-
notated domain positions. Future plans
include providing access to information
from and direct links to other databases,
including organism-specific databases.

PROCRUSTES uses comparisons of 
the same gene of different species to 
delimit gene structure much more accu- 
rately. The product of a collaboration 
between Pavel Pevzner (University of 
Southern California) and two Russian 
researchers, PROCRUSTES is based on 
the spliced-alignment algorithm, which 
explores all possible exon assemblies 
and finds the multiexon structure that 
best fits a related protein. [http://
www-hto.usc.edu/software/procrustes/]
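
The idea of exploring exon assemblies against a related protein can be illustrated with a brute-force toy. The Python sketch below is not the PROCRUSTES implementation: it enumerates every ordered chain of non-overlapping candidate blocks, whereas a practical tool would need a far more efficient search, and it scores similarity with the standard-library difflib module rather than a proper protein alignment score. The genomic sequence, block coordinates, and target protein are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations, product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def chains(blocks):
    """Yield every ordered chain of non-overlapping (start, end) blocks."""
    ordered = sorted(blocks)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            if all(a[1] <= b[0] for a, b in zip(combo, combo[1:])):
                yield list(combo)


def best_assembly(genome: str, blocks, target: str):
    """Return the chain whose translation is most similar to the target."""
    def score(chain):
        protein = translate("".join(genome[s:e] for s, e in chain))
        return SequenceMatcher(None, protein, target).ratio()
    return max(chains(blocks), key=score)


# Hypothetical locus: two true exons separated by an intron with a decoy block.
GENOME = "CC" + "ATGAAAACT" + "GTAAGTTTAG" + "GCTTATGGT" + "TAA"
BLOCKS = [(2, 11), (13, 19), (21, 30)]   # candidate exons (start, end)
TARGET = "MKTAYG"                         # related protein

best = best_assembly(GENOME, BLOCKS, TARGET)
best_protein = translate("".join(GENOME[s:e] for s, e in best))
print(best, best_protein)
```

Exhaustive enumeration grows exponentially with the number of candidate blocks, so this form is useful only for showing the objective being optimized.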



The ELSI component of the DOE
Human Genome Program
supports projects to help judges
understand the scientific
validity of the genetics-based
claims that are poised to flood
the nation's courtrooms. Robert
E. Orr (left) of the North
Carolina Supreme Court and
Francis X. Spina of the Massa-
chusetts Appeals Court at the
New England Regional
Conference on the Courts and
Genetics (July 1997) participate
in a hands-on laboratory
session. As a prelude to learning
the fundamentals of DNA
science and genetic testing, the
judges are precipitating DNA
(seen as streaks on the glass rod
in the tube) from a solution
containing the bacterium
Escherichia coli. (Courts and
Science On-Line Magazine:
http://www.ornl.gov/courts/

Ethical, Legal, and 
Social Issues (ELSI) 

From the outset of the Human Genome 
Project, researchers recognized that the 
resulting increase in knowledge about 
human biology and personal genetic in- 
formation would raise complex ethical 
and policy issues for individuals and 
society. Rapid worldwide progress in 
the project has heightened the urgency
of this challenge. 

Most observers agree that personal
knowledge of genetic susceptibility can 
be expected to serve humankind well,
opening the door to more accurate diag- 
noses, preventive intervention, intensi- 
fied screening, lifestyle changes, and 
early and effective treatment. But such 
knowledge has another side, too: risk of 
anxiety, unwelcome changes in personal 
relationships, and the danger of stigma- 
tization. Often, genetic tests can indi- 
cate possible future medical conditions 
far in advance of any symptoms or 
available therapies or treatments. If 
handled carelessly, genetic information 
could threaten an individual with dis- 
crimination by potential employers and 

Other issues are perhaps less immediate 
than these personal concerns but no less 
challenging. How, for example,
are products of the Human Ge- 
nome Project to be patented and 
commercialized? How are the ju- 
dicial, medical, and educational 
communities — not to mention the 
public at large — to be educated 
effectively about genetic research 
and its implications? 

To confront these issues, the DOE 
and NIH ELSI programs jointly 
established an ELSI working 
group to coordinate policy and 

research between the two agencies. 
[An FY 1997 report evaluating
the joint ELSI group is available
on the Web (http://www.ornl.gov/

The DOE Human Genome Program has 
focused its ELSI efforts on education,
privacy, and the fair use of genetic in- 
formation (including ownership and 
commercialization); workplace issues, 
especially screening for susceptibilities 
to environmental agents; and implica-
tions of research findings regarding in-
teractions among multiple genes and
environmental influences. 

A few highlights from the DOE ELSI 
portfolio for FY 1994 through FY 1997 
are outlined below. 

• Three high school curriculum mod- 
ules developed by the Biological 
Sciences Curriculum Study (BSCS). 

• An educational program in Los Ange-
les to develop a culturally and linguis-
tically appropriate genetics curriculum
based on a BSCS module (see above)
for Hispanic students and their fami-
lies.

• A series of workshops to educate a 
core group of 1000 judges around the 
nation and a handbook with compan- 
ion videotape to assist federal and 
state judges in understanding and as- 
sessing genetic evidence in an in- 
creasing number of civil and criminal 
cases (see photo above). 


• Educational materials developed by 
the Science+Literacy for Health 
Project of the American Association 
for the Advancement of Science 
(AAAS) and targeted at or above the 
6th- to 8th-grade reading levels. 
[AAAS: 202/326-6453; Your Genes,
Your Choices booklet; http://www. html]

• A program at the University of Chi- 
cago aimed at developing a knowl- 
edge base for physicians and nurses 
who will train other practitioners to 
introduce new genetic services. 

• A series of radio programs (see photo 
at right) on the science and ethical 
issues of the genome project and a 
TV documentary program on ELSI 
issues. [http://www.pbs.org]

• The Gene Letter, a monthly online 
newsletter on ELSI issues for 
healthcare professionals and consum-
ers. [http://www.geneletter.org]

• A congressional fellowship program 
in human genetics, administered 
through AAAS, for one annual fel- 
lowship for a mid-career geneticist. 

• The draft Genetic Privacy Act, pre- 
pared as a model for privacy legisla- 
tion and covering the collection, 
analysis, storage, and use of DNA 
samples and the genetic information 
derived from them. [http://www.ornl.
gov/hgmis/resource/privacy/

• Privacy studies at the Center for So- 
cial and Legal Research, including an 
analysis of the effects of new genetic 
technologies on individuals and insti- 

For details on these and other projects, 
see ELSI Abstracts, p. 45, in Part 2 of this
report. In addition to the specific projects
listed in Part 2, the DOE program spon-
sors a number of conferences and work-
shops on ELSI topics.

DOE ELSI Web Site 

Protection of Human Research Subjects 

In 1996, President Clinton appointed the National Bioethics Advisory Com-
mission to provide guidance on the ethical conduct of current and future bio-
logical and behavioral research, especially that related to genetics and the
rights and welfare of human research subjects (http://

Also in 1996, DOE and NIH issued a document providing investigators with
guidance in the use of DNA from human subjects for large-scale sequencing
projects (see Appendix C: Human Subjects Guidelines, p. 77). [http://www.



Technology Transfer 

Transferring technology to
the private sector, a pri-
mary mission of DOE, is
strongly encouraged in the 
Human Genome Program 
to enhance the nation's investment in 
research and technological competitive- 
ness. Human genome centers at 
Lawrence Berkeley National Laboratory 
(LBNL), Lawrence Livermore National 
Laboratory (LLNL), and Los Alamos 
National Laboratory (LANL) provide 
opportunities for private companies to 
collaborate on joint projects or use labo- 
ratory resources. These opportunities in- 
clude access to information (including 
databases), personnel, and special facili- 
ties; informal research collaborations; 
Cooperative Research and Development 
Agreements (CRADAs); and patent and 
software licensing. For information on 
recently developed resources, contact 
individual genome research centers or 
see Research Highlights, beginning on 
p. 9. Many universities have their own 
licensing and technology transfer offices. 

Some collaborations and technology- 
transfer highlights from FY 1994 
through FY 1996 are described below. 


Involvement of the private sector in re- 
search and development can facilitate 
successful transfer of technology to the 
marketplace, and collaborations can 
speed production of essential tools for 
genome research. A number of interac- 
tive projects are now under way, and 
others are in preliminary stages. 


One technology-transfer mechanism 
used by DOE national laboratories is 
the CRADA, a legal agreement with a 
nongovernmental organization to col- 
laborate on a defined research project. 
Under a CRADA, the two entities share
scientific and technological expertise, 
with the governmental organization pro-
viding personnel, services, facilities,

equipment, or other resources. Funds 
must come from the nongovernmental 
partner. A benefit to participating com-
panies is the opportunity to negotiate
exclusive licenses for inventions arising
from these collaborations. For periods
through 1996, the CRADAs in place in
the DOE Human Genome Program in- 
cluded the following: 

• LLNL with Applied Biosystems 
Division of Perkin-Elmer Corporation 
to develop analytical instrumentation 
for faster DNA sequencing instru- 

• LANL with Amgen, Inc., to develop 
bioassays for cell growth factors; 

• Oak Ridge National Laboratory 
(ORNL) with Darwin Molecular, 
Inc., for mouse models of human 
immunologic disease; 

• ORNL with Procter & 
Gamble, Inc., for 
analyses of liver regen- 
eration in a mouse 
model; and 

• Brookhaven National 
Laboratory with U.S. 
Biochemical Corpora- 
tion to identify proteins 
useful for primer- 
walking methods and 
large-scale sequencing. 

Work for Others 

In other collaborations, 
the LBNL genome center 
is participating in a Work 
for Others agreement 
with Amgen to automate 
the isolation and charac- 
terization of large num- 
bers of mouse cDNAs. 
The center group is focus- 
ing on adapting LBNL's 
automated colony-picking 
system to cDNA protocols 
and applying methods to 
generate large numbers of 
filter replicas for colony 


Technology Transfer 

Technology transfer involves converting 
scientific knowledge into commercially 
useful products. Through the 1980s, a se- 
ries of laws was enacted to encourage the 
development of commercial applications 
of federally funded research at universities 
and federal laboratories. Such laws [chiefly 
the Bayh-Dole Act of 1980, Stevenson- 
Wydler Act of 1980, and Federal Technol- 
ogy Transfer Act of 1986 (Public Laws 
96-517, 96-480, and 99-502, respectively)] 
were not aimed specifically at genome or 
even biomedical research. However, such 
research and the surrounding commercial 
biotechnology enterprises clearly have 
benefited from them. The biotechnology 
sector's success owes much to federal 
policies on technology transfer and intel- 
lectual property. [Source: U.S. Congress, 
Office of Technology Assessment, Fed- 
eral Technology Transfer and the Human 
Genome Project, OTA-BP-EHR-162 
(Washington, DC: U.S. Government 
Printing Office, September 1995)] 

DOE Human Genome Program Report 


filter hybridization and subsequent 
analysis. ["Work for Others" projects 
supported by an agency or organization 
other than DOE (e.g., NIH, National 
Cancer Institute, or a private company) 
can be conducted at a DOE installation 
because this work is complementary to 
DOE research missions and usually re- 
quires multidisciplinary DOE facilities 
and technologies.] 

The Resource for Molecular Cytogenetics 
was established at LBNL and the Uni- 
versity of California (UC), San Fran- 
cisco, with the support of the Office of 
Biological and Environmental Research 
and Vysis, Inc. (formerly Imagenetics). 
The Resource aims to apply fluorescent 
in situ hybridization (FISH) techniques 
to genetic analysis of human tissue 
samples; produce probe reagents; design 
and develop digital-imaging micros- 
copy; distribute probes, analysis tech- 
nology, and educational materials in the 
molecular cytogenetic community; and 
transfer useful reagents, processes, and 
instruments to the private sector for 

NIST Advanced Technology Program 

Several commercial applications of research sponsored by the U.S. 
Human Genome Project have been furthered by the Advanced 
Technology Program (ATP) of the U.S. National Institute of Stan- 
dards and Technology. ATP's mission is to stimulate economic 
growth and industrial competitiveness by encouraging high-risk 
but powerful new technologies. Its Tools for DNA Diagnostics 
program uses collaborations among researchers and industry to 
develop (1) cost-effective methods for determining, analyzing, and 
storing DNA sequences for a wide variety of diagnostic applica- 
tions ranging from healthcare to agriculture to the environment and 
(2) a new and potentially very large market for DNA diagnostic 

Awardees have included companies developing DNA diagnostic 
chips, more powerful cytogenetic diagnostic techniques based on 
comparative genomic hybridization, DNA sequencing instrumen- 
tation, and DNA analysis technology. Eventually, commercializa- 
tion of these underlying technologies is expected to generate 
hundreds of thousands of jobs. [800/287-3863, Fax: 301/926-9524, http://www.atp.nist.gov] 

Patenting and 
Licensing Highlights, 
FY 1994-96 

• A development license for single- 
molecule DNA sequencing replaced 
the 1991-94 CRADA (the first 
CRADA to be established in the U.S. 
Human Genome Project) between 
LANL and Life Technologies, Inc. 

• In 1995, a broad patent was awarded 
to UC for chromosome painting. This 
technology uses FISH to stain spe- 
cific locations in cells and chromo- 
somes for diagnosing, imaging, and 
studying chromosomal abnormalities 
and cancer. Resulting from a 1989 
CRADA between LLNL and UC, 
FISH was licensed exclusively to 

• Hyseq, Inc., was founded in 1993 by 
former Argonne National Laboratory 
researchers Radoje Drmanac and 
Radomir Crkvenjakov to commer- 
cialize the sequencing by hybridiza- 
tion (SBH) technology. Hyseq has 
exclusive patent rights to a variation 
known as format 3 of SBH or the 
"super chip." Hyseq later won an Ad- 
vanced Technology Program award 
from the U.S. National Institute of 
Standards and Technology to develop 
the technology further. 

• Oligomers — short, single-stranded 
DNAs — are crucial reagents for ge- 
nome research and biomedical diag- 
nostics. ProtoGene Laboratories, 
Inc., was founded to commercialize 
new DNA synthesis technology 
(developed initially at LBNL with 
completed prototypes at Stanford 
University) and to offer the first 
lower-cost custom oligomer syn- 
thesis. The Parallel Array Synthesis 
system, which independently synthe- 
sizes 96 oligomers per run in a stan- 
dard 96-well microtiter plate format, 
shows great promise for significant 
cost reductions. ProtoGene first 



licensed sales and distribution to LTI 
and, later, production rights as well. 
LTI operates production centers in 
the United States, Europe, and Japan. 

• The GRAIL-genQuest sequence- 
analysis software developed at 
ORNL was licensed by Martin 
Marietta Energy Systems (now 
Lockheed Martin Energy Research) 
to ApoCom, Inc., for pharmaceutical 
and biotechnology company re- 
searchers who cannot use the Internet 
because of data-security concerns. 
The public GRAIL-genQuest service 
remains freely available on the 
Internet (see box, p. 17). 

• In 1995, an exclusive license was 
granted to U.S. Biochemical Corpo- 
ration for a genetically engineered, 
heat-stable, DNA-replicating enzyme 
with much-improved sequencing 
properties. The enzyme was devel- 
oped by Stanley Tabor at Harvard 
University Medical School. 

• In 1995, an advanced capillary array 
electrophoresis system for sequenc- 
ing DNA was patented by Iowa State 
University. The system was licensed 
to Premier American Technologies 
Corporation for commercialization 
(see graphic at right and R&D 100 
Awards, next page). 

• In 1996, a patent was granted to 
LANL researchers for DNA fragment 
sizing and sorting by laser-induced 
fluorescence. An exclusive license 
was awarded to Molecular Technol- 
ogy, Inc., for commercialization of 
the single-molecule detection capa- 
bility related to DNA sizing (see 
R&D 100 Awards, next page). 


Small Business Innovation Research 
(SBIR) Program awards are designed to 
stimulate commercialization of new 
technology for the benefit of both the 
private and public sectors. The highly 
competitive program emphasizes 

cutting-edge, high-risk research with 
potential for high payoff in different ar- 
eas, including human genome research. 
Small business firms with fewer than 
500 employees are invited to submit 
applications. SBIR human genome top- 
ics concentrate on innovative and ex- 
perimental approaches for carrying out 
the goals of the Human Genome Project 
(see SBIR, p. 63, in Part 2 of this re- 
port). The Small Business Technology 
Transfer (STTR) Program fosters trans- 
fers between research institutions and 
small businesses. [DOE SBIR and 
STTR contact: Kay Etzler (301/903- 
5867, Fax: -5488, Kay.Etzler@oer.doe. 
gov), http://sbir.er.doe.gov/sbir] 

Capillary Array Electrophoresis (CAE). CAE systems promise dramatically 
faster and higher-resolution fragment separation for DNA sequencing. A 
multiplexed CAE system designed by Edward Yeung (Iowa State University) 
has been developed for commercial production by Premier American 
Technologies Corporation (PATCO). In the PATCO ESY9600 model, DNA 
samples are introduced into the 96-capillary array; as the separated 
fragments pass through the capillaries, they are irradiated all at once with 
laser light. Fluorescence is measured by a charge-coupled device that acts 
as a simultaneous multichannel detector. (Inset circle at upper left: Closeup 
view of individual capillary lanes with separated samples.) Because every 
fragment length exists in the sample, bases are identified in order accord- 
ing to the time required for them to reach the laser-detector region. 
iSoune: Thamas Kiir.t;. PATCOJ 
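
The read-out principle described in the caption (shorter fragments reach the detector first, so the order of arrival gives the order of bases) can be sketched in a few lines of Python. This is an illustrative sketch only and does not represent PATCO's software: the peak records, the four-dye-to-base mapping, and the names `Peak` and `call_bases` are all hypothetical assumptions made for the example.

```python
# Illustrative sketch: read bases in order of fragment arrival time.
# The dye-to-base mapping and the peak records are hypothetical examples.
from typing import NamedTuple

class Peak(NamedTuple):
    arrival_s: float  # time for the fragment to reach the detector (seconds)
    dye: str          # fluorescent label detected for that fragment

# Assumed four-dye scheme: one dye color per terminating base.
DYE_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}

def call_bases(peaks: list[Peak]) -> str:
    """Return the base sequence implied by arrival order (shortest fragment first)."""
    ordered = sorted(peaks, key=lambda p: p.arrival_s)
    return "".join(DYE_TO_BASE[p.dye] for p in ordered)

# Peaks may be recorded in any order; sorting by arrival time restores base order.
peaks = [
    Peak(12.4, "red"),
    Peak(10.1, "green"),
    Peak(11.3, "yellow"),
    Peak(13.0, "blue"),
]
print(call_bases(peaks))
```

In a multiplexed instrument the same ordering step would run independently for each capillary lane.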



Technology Transfer 

A Federal Laboratory Consortium 
Award for Excellence in Technology 
Transfer was presented to Edward 
Yeung and a research team at Iowa 
State University's Ames Laboratory in 
1993. Their laser-based method for 
indirect fluorescence of biological 
samples may have applications for rou- 
tine high-speed DNA sequencing (see 
graphic, p. 23). Yeung also won the 
1994 American Chemical Society 
Award for Analytical Chemistry. 

1997 R&D 100 Awards 

DOE researchers in 12 facilities across 
the country won 36 of the R&D 100 
Awards given by Research and Devel- 
opment Magazine for 1996 work. DOE 
award-winning research ranged from 
advances in supercomputing to the bio- 
logical recycling of tires. Announced in 
July 1997, these awards bring DOE's 
R&D 100 total to 453, the most of any 
single organization and twice as many as 
all other government agencies combined. 

Two DOE genome-related research 
projects received 1997 R&D 100 
Awards. One was to Yeung (see text at 
left and graphic, p. 23) for "ESY9600 
Multiplexed Capillary Electrophoresis 
DNA Sequencer." 

The other award was to Richard Keller 
and James Jett (LANL) with Amy 
Gardner (Molecular Technologies, Inc.) 
for "Rapid-Size Analysis of Individual 
DNA Fragments." This technology 
speeds determination of DNA fragment 
sizes, making DNA fingerprinting ap- 
plications in biotechnology and other 
fields more reliable and practical. 

R&D Magazine began making annual 
awards in 1963 to recognize the 100 
most significant new technologies, 
products, processes, and materials de- 
veloped throughout the world during 
the previous year (http://www.rdmag. 
com/rd100/100award.htm). Winners are 
chosen by the magazine's editors and a 
panel of 75 respected scientific experts 
in a variety of disciplines. Previous 
winners of R&D 100 Awards include 
such well-known products as the flash- 
cube (1965), antilock brakes (1969), 
automated teller machine (1973), fax 
machine (1975), digital compact cassette 
(1993), and Taxol anticancer drug (1993). 


Research Narratives 

[Figure: DNA sequencing output. Source: Linda Ashworth, LLNL] 




Joint Genome Institute 26 
Lawrence Livermore National Laboratory 27 
Los Alamos National Laboratory 35 
Lawrence Berkeley National Laboratory 41 
University of Washington Genome Center 47 
Genome Database 49 
National Center for Genome Resources 55 



Joint Genome Institute 

DOE Merges Sequencing Efforts of Genome Centers 

Elbert Branscomb 
JGI Scientific Director 
Lawrence Livermore 
National Laboratory 
7000 East Avenue, L-452 
Livermore, CA 94551 
Hhen^aht.Unl.^ov or 

In a major restructuring of its 
Human Genome Program, on 
October 23, 1996, the DOE 
Office of Biological and Envi- 
ronmental Research estab- 
lished the Joint Genome Institute (JGI) 
to integrate work based at its three 
major human genome centers. 

The JGI merger represents a shift to- 
ward large-scale sequencing via intensi- 
fied collaborations for more effective 
use of the unique expertise and resources 
at Lawrence Berkeley National Labora- 
tory (LBNL), Lawrence Livermore Na- 
tional Laboratory (LLNL), and Los 
Alamos National Laboratory (see Re- 
search Narratives, beginning on p. 27 in 
this report). Elbert Branscomb (LLNL) 
serves as JGI's Scientific Director. 
Capital equipment has been ordered, 
and operational support of about 
$30 million is projected for the 1998 
fiscal year. 

Production DNA Sequencing Begun Worldwide 

The year 1996 marked a transition to the final and most challenging 
phase of the U.S. Human Genome Project, as pilot programs aimed at 
refining large-scale sequencing strategies and resources were funded 
by DOE and NIH (see Research Highlights, DNA Sequencing, p. 14). 
Internationally, large-scale human genome sequencing was kicked 
off in late 1995 when The Wellcome Trust announced a 7-year, 
$75-million grant to the private Sanger Centre to scale up its sequenc- 
ing capabilities. French investigators also have announced intentions 
to begin production sequencing. 

Funding agencies worldwide agree that rapid and free release of data 
is critical. Other issues include sequence accuracy, types of annotation 
that will be most useful to biologists, and how to sustain the reference 

The international Human Genome Organisation maintains a Web page 
to provide information on current and future sequencing projects and 
links to sites of participating groups {http://huso, The site 
also links to reports and resources developed at the February 1996 and 
1997 Bermuda meetings on large-scale human genome sequencing, 
which were sponsored by The Wellcome Trust. 

With easy access to both LBNL and 
LLNL, a building in Walnut Creek, 
California, is being modified. Here, 
starting in late FY 1998, production 
DNA sequencing will be carried out for 
JGI. Until that time, large-scale se- 
quencing will continue at LANL, 
LBNL, and LLNL. Expectations are 
that within 3 to 4 years the Production 
Sequencing Facility will house some 
200 researchers and technicians work- 
ing on high-throughput DNA sequenc- 
ing using state-of-the-art robotics. 

Initial plans are to target gene-rich re- 
gions of around 1 to 10 megabases for 
sequencing. Considerations include gene 
density, gene families (especially clus- 
tered families), correlations to model 
organism results, technical capabilities, 
and relevance to the DOE mission (e.g., 
DNA repair, cancer susceptibility, and 
impact of genotoxins). The JGI program 
is subject to regular peer review. 

Sequence data will be posted daily on 
the Web; as the information progresses 
to finished quality, it will be submit- 
ted to public databases. 

As JGI and other investigators involved 
in the Human Genome Project are be- 
ginning to reveal the DNA sequence of 
the 3 billion base pairs in a reference 
human genome, the data already are 
becoming valuable reagents for explora- 
tions of DNA sequence function in the 
body, sometimes called "functional 
genomics." Although large-scale se- 
quencing is JGI's major focus, another 
important goal will be to enrich the se- 
quence data with information about its 
biological function. One measure of 
JGI's progress will be its success at 
working with other DOE laboratories, 
genome centers, and non-DOE aca- 
demic and industrial collaborators. In 
this way, JGI's evolving capabilities can 
both serve and benefit from the widest 
array of partners. 



Research Narratives 

Lawrence Livermore National Laboratory Human Genome Center 

The Human Genome Center 
at Lawrence Livermore 
National Laboratory 
(LLNL) was established by 
DOE in 1991. The center 
operates as a multidisciplinary team 
whose broad goal is understanding hu- 
man genetic material. It brings together 
chemists, biologists, molecular biolo- 
gists, physicists, mathematicians, com- 
puter scientists, and engineers in an 
interactive research environment fo- 
cused on mapping, DNA sequencing, 
and characterizing the human genome. 

Goals and Priorities 

In the past 2 years, the center's goals 
have undergone an exciting evolution. 
This change is the result of several fac- 
tors, both intrinsic and extrinsic to the 
Human Genome Project. They include: 
(1) successful completion of the 
center's first-phase goal, namely a 
high-resolution, sequence-ready map of 
human chromosome 19; (2) advances in 
DNA sequencing that allow accelerated 
scaleup of this operation; and (3) devel- 
opment of a strategic plan for LLNL's 
Biology and Biotechnology Research 
Program that will integrate the center's 
resources and strengths in genomics 
with programs in structural biology, in- 
dividual susceptibility, medical biotech- 
nology, and microbial biotechnology. 

The primary goal of LLNL's Human 
Genome Center is to characterize the 
mammalian genome at optimal resolu- 
tion and to provide information and ma- 
terial resources to other in-house or 
collaborative projects that allow exploi- 
tation of genomic biology in a synergis- 
tic manner. DNA sequence information 
provides the biological driver for the 
center's priorities: 

• Generation of highly accurate se- 
quence for chromosome 19. 

• Generation of highly accurate se- 
quence for genomic regions of high 
biological interest to the mission of 

the DOE Office of Biological and 
Environmental Research (e.g., genes 
involved in DNA repair, replication, 
recombination, xenobiotic metabo- 
lism, and cell-cycle control). 

• Isolation and sequence of the full in- 
sert of cDNA clones associated with 
genomic regions being sequenced. 

• Sequence of selected corresponding 
regions of the mouse genome in paral- 
lel with the human. 

• Annotation and position of the se- 
quenced clones with physical land- 
marks such as linkage markers and 
sequence tagged sites (STSs). 

• Generation of mapped chromo- 
some 19 and other genomic clones 
[cosmids, bacterial artificial chromo- 
somes (BACs), and P1 artificial chro- 
mosomes (PACs)] for collaborating 

• Sharing of technology with other 
groups to minimize duplication of 


• Support of downstream biology 
projects, for example, structural 
biology, functional studies, human 
variation, transgenics, medical bio- 
technology, and microbial biotechnol- 
ogy with know-how, technology, and 
material resources. 

Center Organization 
and Activities 

Completion and publication of the metric 
physical map of human chromosome 19 
(see p. 28) in 1995 has led to consolida- 
tion of many functions associated with 
physical mapping, with increased empha- 
sis on DNA sequencing. The center is or- 
ganized into five broad areas of research 
and support: sequencing, resources, func- 
tional genomics, informatics and analyti- 
cal genomics, and instrumentation. Each 
area consists of multiple projects, and 
extensive interaction occurs both within 
and among projects. 

Human Genome Center 
Lawrence Livermore National 
Laboratory 
Biology and Biotechnology 
Research Program 
7000 East Avenue, L-452 
Livermore, CA 94551 

Anthony V. Carrano 


510/422-5698. Fax; /4i3-3110 

carrano K^llnLgov 

Linda Ashworth 
Assistant to Center Director 
5I0.''422-56A5. Fax: -2282 

In lieu of individual abstracts, 
research projects and investi- 
gators at the LLNL Human 
Genome Center are repre- 
sented in this narrative. More 
information can be found on 
the center's Web site (see URL 


In 1997 Lawrence Berkeley Na- 
tional Laboratory, Lawrence 
Livermore National Laboratory, 
and Los Alamos National Labora- 
tory began collaborating in a Joint 
Genome Institute to implement 
high-throughput sequencing [see 
p. 26 and Human Genome News 
8(2), 1-2]. 



[Figure: map of the p-telomere (P-TEL) end of chromosome 19, band p13.3, showing numbered cosmid clones against a metric scale, with markers including D19S373 and RPS15] 

In the column labeled cosmid clones, black 
indicates a FISH-ordered clone where 
distance between clones has been 
measured. Other cosmids are shown in 
red. Genes are in red to the left of the 
metric scale. Other markers are labeled in 
black. A disease associated with a specific 
gene is shown in blue to the right of the 
metric scale. 

Restriction-mapped contig 

BAC, PAC, or P1 clone 

YAC with known and concordant size 

YAC with unknown or discordant size 

+ Sequence tagged site (STS) 

STS and/or hybridization results 

§ Polymorphic marker 

Chromosome 19 Map. In the 
current map (at left) of the first 
2 million bases at the p-telomere 
end of chromosome 19, the 
EcoR I restriction-mapped 
contigs (represented by red lines) 
provide the starting material for 
genomic sequencing across a 





The sequencing group is divided into 
several subprojects. The core team is re- 
sponsible for the construction of se- 
quence libraries, sequencing reactions, 
and data collection for all templates in 
the random phase of sequencing. The 
finishing team works with data pro- 
duced by the core team to produce 
highly redundant, highly accurate "fin- 
ish" sequence on targets of interest. Fi- 
nally, a team of researchers focuses 
specifically on development, testing, 
and implementation of new protocols 

Construction of the human 
chromosome 19 physical map was 
based on a similar strategy for 
mapping the roundworm 
Caenorhabditis elegans. View the 
complete map on the World Wide 
Web (http://www-bio. 

[Source: Adapted from figure provided 
by Linda Ashworth, LLNL] 

for the entire group, with an emphasis 
on improving the efficiency and cost ba- 
sis of the sequencing operation. 


The resources group provides mapped 
clonal resources to the sequencing 
teams. This group performs physical 
mapping as needed for the DNA se- 
quencing group by using fingerprinting, 
restriction mapping, fluorescence in situ 
hybridization, and other techniques. A 
small mapping effort is under way to 
identify, isolate, and characterize BAC 




[Bar chart: number of genes by functional category. Categories shown include structural or cytoskeletal, developmental control, transcription factors, energy metabolism, and cell surface. Axis: Number of genes] 



Putative-Gene Classification. The figure depicts the functional classification of putative genes identified in a 1.02-Mb 
region on the long arm of human chromosome 19. Analysis of the completed sequence between markers D19S208 and 
COX7A1 revealed 45 open reading frames (ORFs) or putative genes. (An ORF is a DNA region containing specific 
sequences that signal the beginning and ending of a gene.) 

Thirty of these putative genes were found to have sequence similarities to a wide variety of known genes or proteins, 
including some involved in transcription, cell adhesion and signaling, and metabolism. Many appear to be related 
functionally to such known proteins as the GTPase-activating proteins or the ETS family of transcription factors. Others 
seem to be new members of existing gene families, for example, the mRNA splicing factor, or of such pseudogenes as the 
elongation factor Tu. 

In addition to those that could be classified, 15 novel genes were identified, including one with high similarity to a 
predicted ORF of unknown function in the roundworm Caenorhabditis elegans. [Source: Adapted from graph provided by 
Linda Ashworth, LLNL] 
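
The caption's definition of an ORF (a stretch of DNA bounded by signals for the beginning and ending of a gene) can be illustrated with a minimal scan. The sketch below assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, searches only the forward strand, and ignores introns, so it is far simpler than the analysis a genome center would actually run. The function name `find_orfs` and the `min_codons` threshold are hypothetical choices made for this example.

```python
# Minimal forward-strand ORF scan; a teaching sketch, not a gene predictor.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ATG...stop stretches, end exclusive.

    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                 # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None               # look for the next ORF in this frame
    return sorted(orfs)

dna = "CCATGGCTTGGAAATAGGC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real genomic ORF detection must also handle the reverse strand and exon-intron structure, which is why putative genes are then checked for similarity to known genes and proteins, as the caption describes.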

clones (from anywhere in the human ge- 
nome) that relate to susceptibility genes, 
for example, DNA repair. These clones 
will be characterized and provided for 
sequencing and at the same time con- 
tribute to understanding the biology of 
the chromosome, the genome, and sus- 
ceptibility factors. The mapping team 
also collaborates with others using the 
chromosome 19 map as a resource for 
gene hunting. 

Functional Genomics 

The functional genomics team is respon- 
sible for assembling and characterizing 
clones for the Integrated Molecular 
Analysis of Gene Expression (called 
IMAGE) Consortium and cDNA se- 
quencing, as well as for work on gene 
expression and comparative mouse 

genomics. The effort emphasizes genes 
involved in DNA repair and links 
strongly to LLNL's gene-expression and 
structural biology efforts. In addition, 
this team is working closely with Oak 
Ridge National Laboratory (ORNL) to 
develop a comparative map and the se- 
quence data for mouse regions syntenic 
to human chromosome 19 (see p. 32). 

Informatics and Analytical Genomics 

The informatics and analytical genom- 
ics group provides computer science 
support to biologists. The sequencing 
informatics team works directly with 
the DNA sequencing group to facilitate 
and automate sample handling, data ac- 
quisition and storage, and DNA se- 
quence analysis and annotation. The 



analytical genomics team provides sta- 
tistical and advanced algorithmic exper- 
tise. Tasks include development of 
model-based methods for data capture, 
signal processing, and feature extraction 
for DNA sequence and fingerprinting 
data and analysis of the effectiveness of 
newly proposed methods for sequencing 
and mapping. 


The instrumentation group also has 
multiple components. Group members 
provide expertise in instrumentation and 
automation in high-throughput electro- 
phoresis, preparation of high-density 
replicate DNA and colony filters, fluo- 
rescence labeling technologies, and au- 
tomated sample handling for DNA 
sequencing. To facilitate seamless inte- 
gration of new technologies into pro- 
duction use, this group is coupled 
tightly to the biologist user groups and 
the informatics group. 


The center interacts extensively with 
other efforts within the LLNL Biology 
and Biotechnology Research Program 
and with other programs at LLNL, the 
academic community, other research in- 
stitutes, and industry. More than 250 
collaborations range from simple probe 
and clone sharing to detailed gene fam- 
ily studies. The following list reflects 
some major collaborations. 

• Integration of the genetic map of hu- 
man chromosome 19 with correspond- 
ing mouse chromosomes (ORNL). 

• Miniaturized polymerase chain reac- 
tion instrumentation (LLNL). 

• Sequencing of IMAGE Consortium 
cDNA clones (Washington Univer- 
sity, St. Louis). 

• Mapping and sequencing of a gene 
associated with Finnish congenital 
nephrotic syndrome (University of 
Oulu, Finland). 



The LLNL Human Genome Center has 
excelled in several areas, including 
comparative genomic sequencing of 
DNA repair genes in human and rodent 
species, construction of a metric physi- 
cal map of human chromosome 19, and 
development and application of new 
biochemical and mathematical ap- 
proaches for constructing ordered clone 
maps. These and other major accom- 
plishments are highlighted below. 

• Completion of highly accurate se- 
quencing totaling 1.6 million bases 
of DNA, including regions spanning 
human DNA repair genes, the candi- 
date region for a congenital kidney 
disease gene, and other regions of 
biological interest on chromo- 
some 19. 

• Completion of comparative sequence 
analysis of 107,500 bases of genomic 
DNA encompassing the human DNA 
repair gene ERCC2 and the corre- 
sponding regions in mouse and ham- 
ster (p. 32). In addition to ERCC2, 
analysis revealed the presence of two 
previously undescribed genes in all 
three species. One of these genes is a 
new member of the kinesin motor 
protein family. These proteins play a 
wide variety of roles in the cell, in- 
cluding movement of chromosomes 
before cell division. 

• Complete sequencing of human ge- 
nomic regions containing two addi- 
tional DNA repair genes. One of 
these, XRCC3, maps to human chro- 
mosome 14 and encodes a protein 
that may be required for chromo- 
some stability. Analysis of the ge- 
nomic sequence identified another 
kinesin motor protein gene physi- 
cally linked to XRCC3. The second 
human repair gene, HHR23A, maps 
to 19p13.2. Sequence analysis of 
110,000 bases containing HHR23A 
identified six other genes, five of 
which are new genes with similarity 


to proteins from mouse, human, 
yeast, and Caenorhabditis elegans. 

• Complete sequencing of full-length 
cDNAs for three new DNA repair 
genes (XRCC2, XRCC3, and XRCC9) 
in collaboration with the LLNL DNA 
repair group. 

• Generation of a metric physical map 
of chromosome 19 spanning at least 
95% of the chromosome. This unique 
map incorporates a metric scale to 
estimate the distance between genes 
or other markers of interest to the 
genetics community. 

• Assembly of nearly 45 million bases 
of EcoR I restriction-mapped cosmid 
contigs for human chromosome 19 
using a combination of fingerprinting 
and cosmid walking. Small gaps in 
cosmid continuity have been spanned 
by BAC, PAC, and P1 clones, which 
are then integrated into the restriction 
maps. The high depth of coverage of 
these maps (average redundancy, 
4.3-fold) permits selection of a mini- 
mum overlapping set of clones for 
DNA sequencing. 

• Placement of more than 400 genes, 
genetic markers, and other loci on the 
chromosome 19 cosmid map. Also, 
165 new STSs associated with pre- 
mapped cosmid contigs were gener- 
ated and added to the physical map. 

• Collaborations to identify the gene 
(COMP) responsible for two allelic 
genetic diseases, pseudoachondro- 
plasia and multiple epiphyseal dys- 
plasia, and the identification of 
specific mutations causing each 

• Through sequence analysis of the 2A 
subfamily of the human cytochrome 
P450 enzymes, identification of a 
new variant that exists in 10% to 
20% of individuals and results in re- 
duced ability to metabolize nicotine 
and the antiblood-clotting drug 

• Location of a zinc finger gene that 
encodes a transcription factor regu- 
lating blood-cell development adja- 
cent to telomere repeat sequences, 
possibly the gene nearest one end of 
chromosome 19. 

• Completion of the genomic and 
cDNA sequence of the gene for the 
human Rieske Fe-S protein involved 
in mitochondrial respiration. 

• Expansion of the mouse-human com- 
parative genomics collaboration with 
ORNL to include study of new 
groups of clustered transcription fac- 
tors found on human chromosome 
19q and as syntenic homologs on 
mouse chromosome 7 (p. 32). 

• Numerous collaborations (in particu- 
lar, with Washington University and 
Merck) continuing to expand the 
LLNL-based IMAGE Consortium, 
an effort to characterize the tran- 
scribed human genome. The IMAGE 
clone collection is now the largest 
public collection of sequenced cDNA 
clones, with more than one million 
arrayed clones, 800,000 sequences in 
public databases, and 10,000 mapped 

• Development and deployment of a 
comprehensive system to handle 
sample tracking needs of production 
DNA sequencing. The system com- 
bines databases and graphical inter- 
faces running on both Mac and Sun 
platforms and scales easily to handle 
large-scale production sequencing. 

• Expansion of the LLNL genome 
center's World Wide Web site to in- 
clude tables that link to each gene be- 
ing sequenced, to the quality scores 
and assembled bases collected each 
night during the sequencing process, 
and to the submitted GenBank se- 
quence when a clone is completed. 




Hunian-Mottse Honutlogies. IJ.NI r< wun hei I i\a 

Stuhbx (ahm>e) is shown in thp Mivne (jenetics 
Reseanh h'aciUly ui ORNL il)H>Jl.f,'tu>ti,i 

'I'lu'figun' at left demi'nstratis the gini'tic similarity 
[homology) of the superficially dissimilar mouse and 
huitiiw species. The .^itniiarity is .vuch thai human 
chronuisames can be cut (.^clumatically at into 
about 150 pieces (only about }0() are large enough 
to appear here}, then reassembled into a rea.ionahle 
appm.ximution of the mou.'ie genotne. The colors and 
corresponding numbers on the mouse chmmosomes 
indicate the human chromosomes containing 
homologous .'ieginems. i&mrvf: Siuiw>. i.lj^'i.j 

Comparative sequencing of homologous regions in
human and mouse at LLNL has enhanced the ability
to identify protein-coding (exon) and noncoding
DNA regions that have remained unchanged over the
course of evolution. Colors in the figure below depict
similarities in mouse and human genes involved in
DNA repair, a research interest rooted in DOE's
mission to develop better technologies for measuring
health effects, particularly mutations. [Source: Linda
Ashworth, LLNL]

[Figure: ERCC2 Region. Scale bar: 5 kb. Legend: exons from "Gene C";
exons of ERCC2 gene; exons of KLC2 gene; noncoding conserved element.]

• Implementation of a new database to 
support sequencing and mapping 
work on multiple chromosomes and 
species. Web-based automated tools 
were developed to facilitate construc- 
tion of this database, the loading of 
over 100 million bytes of chromosome 
19 data from the existing LLNL data- 
base, and automated generation of 
Web-based input interfaces. 

• Significant enhancement of the 
LLNL Genome Graphical Database 
Browser software to display and link 
information obtained at a subcosmid 
resolution from both restriction map 
hybridization and sequence feature 
data. Features, such as genes linked 
to diseases, allow tracking to frag- 
ments as small as 500 base pairs of

• Development of advanced micro- 
fabrication technologies to produce 
electrophoresis microchannels in 
large glass substrates for use in DNA 

• Installation of a new filter-spotting 
robot that routinely produces 6 x 6
x 384 filters. A 16 x 16 x 384 pattern
has been achieved. 

• Upgrade of the Lawrence Berkeley 
National Laboratory colony picker 
using a second computer so that im- 
aging and picking can occur simulta- 

Future Plans 

Genomic sequencing currently is the 
dominant function of Livermore's Hu- 
man Genome Center. The physical map- 
ping effort will ensure an ample supply 
of sequence-ready clones. For sequenc- 
ing targets on chromosome 19, this in- 
cludes ensuring that the most stable 
clones (cosmids, BACs, and PACs) are 
available for sequencing and that re- 
gions with such known physical land- 
marks as STSs and expressed sequenced 
tags (ESTs) are annotated to facilitate 
sequence assembly and analysis. The 

following targets are emphasized for 
DNA sequencing: 

• Regions of high gene density, includ- 
ing regions containing gene families. 

• Chromosome 19, of which at least 42 
million bases are sequence ready. 

• Selected BAC and PAC clones repre- 
senting regions of about 0.2 million 
to 1 million bases throughout the 
human genome; clones would be 
selected based on such high-priority 
biological targets as genes involved 
in DNA repair, replication, recombi- 
nation, xenobiotic metabolism, cell- 
cycle checkpoints, or other specific 
targets of interest. 

• Selected BAC and PAC clones from 
mouse regions syntenic with the 
genes indicated above. 

• Full-insert cDNAs corresponding to 
the genomic DNA being sequenced. 

The informatics team is continuing to 
deploy broader-based supporting data- 
bases for both mapping and sequencing. 
Where appropriate, Web- and Java-based 
tools are being developed to enable bi- 
ologists to interact with data. Recent re- 
organization within this group enables 
better direct support to the sequencing 
group, including evaluating and inter- 
facing sequence-assembly algorithms 
and analysis tools, data and process 
tracking, and other informatics func- 
tions that will streamline the sequencing 

The instrumentation effort has three 
major thrusts: (1) continued develop- 
ment or implementation of laboratory 
automation to support high-throughput 
sequencing; (2) development of the
next-generation DNA sequencer; and
(3) development of robotics to support
high-density BAC clone screening. The 
last two goals warrant further expla- 

The new DNA sequencer being devel- 
oped under a grant from the National 
Institutes of Health, with minor support 



through the DOE genome center, is de- 
signed to run 384 lanes simultaneously 
with a low-viscosity sieving medium.
The entire system would be loaded au- 
tomatically, run, and set up for the next 
run at 3-hour intervals. If successful, it 
should provide a 20- to 40-fold increase 
in throughput over existing machines. 
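
The throughput figures above can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the text (384 lanes, 3-hour cycles, a 20- to 40-fold gain); round-the-clock operation and the function name are assumptions made for illustration, not details from the report.

```python
LANES_PER_RUN = 384      # from the text: lanes run simultaneously
HOURS_PER_CYCLE = 3      # from the text: load, run, and reset every 3 hours
FOLD_GAIN = (20, 40)     # from the text: expected gain over existing machines


def lanes_per_day(lanes=LANES_PER_RUN, cycle_hours=HOURS_PER_CYCLE, hours=24):
    """Lanes processed per day, assuming uninterrupted operation (an assumption)."""
    return lanes * (hours // cycle_hours)


new_rate = lanes_per_day()

# Daily lane count of an existing machine implied by the quoted fold gains
# (low end corresponds to the 40-fold claim, high end to the 20-fold claim).
implied_baseline = tuple(new_rate / fold for fold in reversed(FOLD_GAIN))

print(new_rate, implied_baseline)
```

Under these assumptions the new instrument would process a few thousand lanes per day, which implies that the machines it is compared against handle on the order of one to two hundred lanes per day.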

An LLNL-designed high-precision spot-
ting robot, which should allow a density
of 98,304 spots in 96 cm², is now oper-
ating. The goal of this effort is to create
high-density filters representing a 10x
BAC coverage of both human and
mouse genomes (30,000 clones = 1x
coverage). Thus each filter would pro-
vide ~3x coverage, and eight such filters
would provide the desired coverage for 
both genomes. The filters would be hy- 
bridized with amplicons from individual 
or region-specific cDNAs and ESTs; 
given the density of the BAC libraries, 
clones that hybridize should represent a 
binned set of BACs for a region of in- 
terest. These BACs could be the initial 
substrate for a BAC sequencing strategy. 
Performing hybridizations in parallel in 
mouse and human DNA facilitates the 
development of the mouse map (with 
ORNL involvement), and sequencing 

BACs from both species identifies 
evolutionarily conserved and, perhaps, 
regulatory regions. 
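
The filter-coverage reasoning in the paragraph above can be reproduced in a few lines. This is a minimal sketch using the quoted figures (98,304 spots per filter, 30,000 clones per 1x coverage, a 10x target for two genomes). It assumes that every spot holds a distinct clone and that each filter carries clones from a single species; both are assumptions for illustration, chosen because they reproduce the count of eight filters given in the text.

```python
import math

SPOTS_PER_FILTER = 98_304   # from the text: spots on one 96 cm² filter
CLONES_PER_1X = 30_000      # from the text: 30,000 BAC clones = 1x coverage
TARGET_COVERAGE = 10        # from the text: 10x coverage of each genome
GENOMES = 2                 # human and mouse


def coverage_per_filter(spots=SPOTS_PER_FILTER, clones_per_1x=CLONES_PER_1X):
    """Fold coverage held by one filter if every spot is a distinct clone."""
    return spots / clones_per_1x


def filters_per_genome(target=TARGET_COVERAGE):
    """Whole filters needed to reach the target coverage of one genome.

    Assumes each filter carries clones from one species only (an assumption).
    """
    return math.ceil(target / coverage_per_filter())


per_filter = coverage_per_filter()
total_filters = GENOMES * filters_per_genome()
print(f"{per_filter:.2f}x per filter, {total_filters} filters in total")
```

Three filters per genome would fall just short of 10x, so four are needed for each species, giving eight in total.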

Information generated by sequencing 
human and mouse DNA in parallel is
expected to expand LLNL efforts in 
functional genomics. Comparative se- 
quence data will be used to develop a 
high-resolution synteny map of con- 
served mouse-human domains and 
incorporate automated northern ex- 
pression analysis of newly identified 
genes. Long range, the center hopes to 
take advantage of a variety of forms of 
expression analysis, including site- 
directed mutation analysis in the mouse. 


The Livermore Human Genome Center 
has undergone a dramatic shift in empha- 
sis toward commitment to large-scale, 
high-accuracy sequencing of chromo- 
some 19, other chromosomes, and tar- 
geted genomic regions in the human 
and mouse. The center also is commit- 
ted to exploiting sequence information 
for functional genomics studies and for 
other programs, both in house and 



Research Narratives 

Los Alamos National Laboratory Center for Human Genome Studies 

Biological research was ini- 
tiated at Los Alamos Na- 
tional Laboratory (LANL) 
in the 1940s, when the 
laboratory began to inves- 
tigate the physiological and genetic 
consequences of radiation exposure. 
Eventual establishment of the national 
genetic sequence databank called 
GenBank, the National Flow Cytometry
Resource, numerous related individual 
research projects, and fulfillment of a key 
role in the National Laboratory Gene Li- 
brary Project all contributed to LANL's se- 
lection as the site for the Center for 
Human Genome Studies in 1988. 

Center Organization 
and Activities 

The LANL genome center is organized 
into four broad areas of research and sup- 
port: Physical Mapping, DNA Sequenc- 
ing, Technology Development, and 
Biological Interfaces. Each area consists 
of a variety of projects, and work is dis- 
tributed among five LANL Divisions 
(Life Sciences; Theoretical; Computing,
Information, and Communications; 
Chemical Science and Technology; and 
Engineering Sciences and Applications). 
Extensive interdisciplinary interactions 
are encouraged.

Physical Mapping 

The construction of chromosome- and 
region-specific cosmid, bacterial artifi- 
cial chromosome (BAC), and yeast artifi- 
cial chromosome (YAC) recombinant 
DNA libraries is a primary focus of 
physical mapping activities at LANL. 
Specific work includes the construction 
of high-resolution maps of human chro- 
mosomes 5 and 16 and associated 
informatics and gene discovery tasks. 


• Completion of an integrated physical
map of human chromosome 16 con-
sisting of both a low-resolution YAC
contig map and a high-resolution
cosmid contig map (pp. 37-39).
With sequence tagged site (STS)
markers provided on average every
125,000 bases, the YAC-STS map
provides almost-complete coverage
of the chromosome's euchromatic
arms. All available loci continue to
be incorporated into the map.

• Construction of a low-resolution STS 
map of human chromosome 5 con- 
sisting of 517 STS markers region- 
ally assigned by somatic-cell hybrid 
approaches. Around 95% mega- 
YAC-STS coverage (50 million 
bases) of 5p has been achieved. Ad- 
ditionally, about 40 million bases of 
5q mega-YAC-STS coverage have
been obtained collaboratively. 

• Refinement of BAC cloning proce- 
dures for future production of 
chromosome-specific libraries. 
Successful partial digestion and clon- 
ing of microgram quantities of chro- 
mosomal DNA embedded in agarose 
plugs. Efforts continue to increase
the average insert size to about
100,000 bases.

DNA Sequencing 

DNA sequencing at the LANL center
focuses on low-pass sample sequencing
(SASE) of large genomic regions. SASE
data is deposited in publicly available
databases to allow for wide distribution.
Finished sequencing is prioritized from
initial SASE analysis and pursued by par-
allel primer walking. Informatics devel-
opment includes data tracking, gene-
discovery integration with the Sequence
Comparison ANalysis (SCAN) program,
and functional genomics interaction.


• SASE sequencing of 1.5 million 
bases from the p13 region of human
chromosome 16. 

• Discovery of more than 100 genes in 
SASE sequences. 

Center for Human Genome Studies
Los Alamos National Laboratory
P.O. Box 1663
Los Alamos, NM 87545

Larry L. Deaven
Acting Director
505/667-3912, Fax: -2891

Lynn Clark
Technical Coordinator
505/667-9376, Fax: -2891
clark@telomere.lanl.gov

Robert K. Moyzis
Director, 1988-97*

In lieu of individual abstracts, 
research projects and investi- 
gators at the LANL Center for 
Human Genome Studies are 
represented in this narrative. 
More information can be found
on the center's Web site (see 
URL above). 


In 1997 Lawrence Berkeley Na-
tional Laboratory, Lawrence
Livermore National Laboratory,
and Los Alamos National Labora-
tory began collaborating in a Joint
Genome Institute to implement
high-throughput sequencing [see
p. 26 and Human Genome News
8(2), 1-2].

*Now at University of Califor-
nia, Irvine



• Generation of finished sequence 
for a 240,000-base telomeric re- 
gion of human chromosome 7q. 
From initial sequences generated 
by SASE, oligonucleotides were 
synthesized and used for primer 
walking directly from cosmids 
comprising the contig map. Com- 
plete sequencing was performed to 
determine what genes, if any, are 
near the 7q terminus. This intri- 
guing region lacks significant 
blocks of subtelomeric repeat DNA 
typically present near eukaryotic 

• Complete single-pass sequencing of 
2018 exon clones generated from 
LANL's flow-sorted human chromo- 
some 16 cosmid library. About 950 
discrete sequences were identified by 
sequence analysis. Nearly 800 appear 
to represent expressed sequences 
from chromosome 16. 

• Development of Sequence Viewer to 
display ABI sequences with trace 
data on any computer having an 
Internet connection and a Netscape 
World Wide Web browser. 

• Sequencing and analysis of a novel 
pericentromeric duplication of a 
gene-rich cluster between 16p11.1
and Xq28 (in collaboration with 
Baylor College of Medicine). 

Technology Development 

Technology development encompasses 
a variety of activities, both short and 
long term, including novel vectors for 
library construction and physical map- 
ping; automation and robotics tools for 
physical mapping and sequencing; 
novel approaches to DNA sequencing 
involving single-molecule detection; 
and novel approaches to informatics 
tools for gene identification. 

Accomplishments

• Development of SCAN program for 
large-scale sequence analysis and an- 
notation, including a translator con- 
verting SCAN data to GIO format for 
submission to Genome Sequence 

• Application of flow-cytometric ap- 
proach to DNA sizing of P1 artificial
chromosome (PAC) clones. Less than 
one picogram of linear or supercoiled 
DNA is analyzed in under 3 minutes. 
Sizing range has been extended 
down to 287 base pairs. Efforts con- 
tinue to extend the upper limit be- 
yond 167,000 bases. 

• Characterization of the detection of 
single, fluorescently tagged nucleo- 
tides cleaved from multiple DNA 
fragments suspended in the flow 
stream of a flow cytometer (see pic- 
ture, p. 70). The cleavage rate for 
Exo III at 37°C was measured to be 
about 5 base pairs per second per 
M13 DNA fragment. To achieve a 
single-color sequencing demonstra- 
tion, either the background burst rate 
(currently about 5 bursts per second) 
must be reduced or the exonuclease 
cleavage rate must be increased sig- 
nificantly. Techniques to achieve 
both are being explored. 

• Construction of a simple and com- 
pact apparatus, based on a diode- 
pumped Nd:YAG laser, for routine 
DNA fragment sizing. 

• Development of a new approach to 
detect coding sequences in DNA. 
This complete spectral analysis of 
coding and noncoding sequences is 
as sensitive in its first implementa- 
tions as the best existing techniques. 

• Use of phylogenetic relationships to 
generate new profiles of amino acid 
usage in conserved domains. The 
profiles are particularly useful for 
classification of distantly related 



Biological Interfaces 

The Biological Interfaces effort targets 
genes and chromosome regions asso- 
ciated with DNA damage and repair, 
mitotic stability, and chromosome struc- 
ture and function as primary subjects 
for physical mapping and sequencing. 
Specific disease-associated genes on 
human chromosome 5 (e.g., Cri-du-Chat 
syndrome) and on 16 (e.g., Batten's dis-
ease and Fanconi anemia) are the sub- 
jects of collaborative biological 


• Identification of two human 7q exons 
having 99% homology to the cDNA 
of a known human gene, vasoactive 
intestinal peptide receptor 2A. Pre- 
liminary data suggests that the 
VIPR2A gene is expressed. 

• Identification of numerous expressed 
sequence tags (ESTs) localized to the 
7q region. Since three of the ESTs 
contain at least two regions with high 
confidence of homology (~90%),
genes in addition to VIPR2A may 
exist in the terminal region of 7q. 

• Generation of high-resolution cosmid 
coverage on human chromosome 5p 
for the larynx and critical regions 
identified with Cri-du-Chat syndrome, 
the most common human terminal- 
deletion syndrome (in collaboration 
with Thomas Jefferson University). 

• Refinement of the Wolf-Hirschhorn
syndrome (WHS) critical region on 
human chromosome 4p. Using the 
SCAN program to identify genes 
likely to contribute to WHS, the 
project serves as a model for defining 
the interaction between genomic se- 
quencing and clinical research. 

• Collaborative construction of contigs
for human chromosome 16, includ-
ing 1.05 million bases in cosmids
through the familial Mediterranean
fever (FMF) gene region (with
members of the FMF Consortium)
and 700,000 bases in P1 clones en-
compassing the polycystic kidney
disease gene (with Integrated
Genetics, Inc.).

• Collaborative identification and de-
termination of the complete genomic
structure of the Batten's disease gene
(with members of the BDG Consor-
tium), the gamma subunit of the hu-
man amiloride-sensitive epithelial
channel (Liddle's syndrome, with
University of Iowa), and the polycys-
tic kidney disease gene (with Inte-
grated Genetics).

• Participation in an international col-
laborative research consortium that
successfully identified the gene re-
sponsible for Fanconi anemia type A.

Chromosome 16 Physical Map (pp. 38-39). A condensed chromosome 16
physical map constructed at Los Alamos National Laboratory (LANL) is
shown in two parts on the following pages. Besides facilitating the isolation
and characterization of disease genes, the map provides the framework for
a large-scale sequencing effort by LANL, The Institute for Genomic
Research, and the Sanger Centre.

Distinct types of maps and data are shown as levels or tiers on the
integrated map. At the top of each page is a view of the banded human
chromosome to which the map is aligned. A somatic-cell hybrid breakpoint
map, which divides the chromosome into 90 intervals, was used as a
backbone for much of the map integration.

The physical map consists of both a low-resolution yeast artificial
chromosome (YAC) contig map localized to and ordered within the
breakpoint intervals with sequence tagged sites (STSs) and a high-
resolution bacteria-based clone map. The YAC-STS map provides almost
complete coverage of the chromosome's euchromatic arms, with STS
markers on average every 100,000 bases.

A high-resolution, sequence-ready cosmid contig map is anchored to the
YAC and breakpoint maps via STSs developed from cosmid contigs and by
hybridizations between YACs and cosmids.

As part of the ongoing effort to incorporate all available loci onto a single
map of this chromosome, the integrated map also features genes, expressed
sequence tags, exons (gene-coding regions), and genetic markers.

The mouse chromosome segments at the bottom of the map contain groups
that correspond to human genes mapped to the regions shown above them.

[Source: Norman Doggett, LANL]


The exhibit "Un..,
Science Museum in Los Alamos, New Mexico, describes the LANL Center
for Human Genome Studies' contributions to the Human Genome Project.
The exhibit's centerpiece is a 16-foot-long version of LANL's map of human
chromosome 16. [Source: LANL Center for Human Genome Studies]

Patents, Licenses, and 

• Rhett L. Affleck, James N. Demas,
Peter M. Goodwin, Jay A. Schecker,
Ming Wu, and Richard A. Keller,
"Reduction of Diffusional Defocusing
in Hydrodynamically Focused Flows
by Complexing with a High Molecular
Weight Adduct," United States Patent,
filed December 1996.

• R.L. Affleck, W.P. Ambrose, J.D.
Demas, P.M. Goodwin, M.E. Johnson,
R.A. Keller, J.T. Petty, J.A. Schecker,
and M. Wu, "Photobleaching to Re-
duce or Eliminate Luminescent Impu-
rities for Ultrasensitive Luminescence
Analysis," United States Patent, S-87,208,
accepted September 1997.


• J.H. Jett, M.L. Hammond,
R.A. Keller, B.L.
Marrone, and J.C. Martin,
"DNA Fragment Sizing
and Sorting by Laser-
Induced Fluorescence,"
United States Patent,
S.N. 75,001, allowed
May 1996.

• James H. Jett, "Method
for Rapid Base Sequenc-
ing in DNA and RNA
with Three Base Label-
ing," in preparation.

• Development license and
exclusive license to
LANL's DNA sizing
patent obtained by Mo-
lecular Technology, Inc.,
for commercialization of
single-molecule detection
capability to DNA sizing.

Future Plans 

LANL has joined a collabo- 
ration with California Institute of Tech- 
nology and The Institute for Genomic 
Research to construct a BAC map of 
the p arm of human chromosome 16
and to complete the sequence of a 20-
million-base region of this map.

In its evolving role as part of the new 
DOE Joint Genome Institute, LANL
will continue scaleup activities focused
on high-throughput DNA sequencing.
Initial targets include genes and DNA 
regions associated with chromosome 
structure and function, syntenic break- 
points, and relevant disease-gene loci. 

A joint DNA sequencing center was es- 
tablished recently by LANL at the Uni- 
versity of New Mexico. This facility is 
responsible for determining the DNA 
sequence of clones constructed at LANL,
then returning the data to LANL for 
analysis and archiving. 


Research Narratives 

Lawrence Berkeley National Laboratory Human Genome Center 


Since 1937 the Ernest Or- 
lando Lawrence Berkeley 
National Laboratory 
(LBNL) has been a major 
contributor to knowledge 
about human health effects resulting 
from energy production and use. That 
was the year John Lawrence went to 
Berkeley to use his brother Ernest's 
cyclotrons to launch the application of 
radioactive isotopes in biological and 
medical research. Fifty years later,
Berkeley Lab's Human Genome Center 
was established. 

Now, after another decade, an expansion
of biological research relevant to Hu- 
man Genome Project goals is being car- 
ried out within the Life Sciences 
Division, with support from the Infor- 
mation and Computing Sciences and 
Engineering divisions. Individuals in
these research projects are making 
important new contributions to the 
key fields of molecular, cellular, and 
structural biology; physical chemistry; 
data management; and scientific instru- 
mentation. Additionally, industry in- 
volvement in this growing venture is 
stimulated by Berkeley Lab's location 
in the San Francisco Bay area, home to 
the largest congregation of biotechnol- 
ogy research facilities in the world. 

In July 1997 the Berkeley genome 
center became part of the Joint Genome
Institute (see p. 26). 


Large-scale genomic sequencing has 
been a central, ongoing activity at Ber- 
keley Lab since 1991. It has been 
funded jointly by DOE (for human ge- 
nome production sequencing and tech- 
nology development) and the NIH 
National Human Genome Research In- 
stitute [for sequencing the Drosophila 
melanogaster model system, which is 
carried out in partnership with the Uni- 
versity of California, Berkeley (UCB)].
The human genome sequencing area at 
Berkeley Lab consists of five groups: 

Bioinstrumentation, Automation,
Informatics, Biology, and Development. 
Complementing these activities is a 
group in Life Sciences Division devoted 
to functional genomics, including the 
transgenics program. 

The directed DNA sequencing strategy 
at Berkeley Lab was designed and 
implemented to increase the efficiency 
of genomic sequencing (see figure, 
p. 45). A key element of the directed ap- 
proach is maintaining information about 
the relative positions of potential se- 
quencing templates throughout the entire 
sequencing process. Thus, intelligent 
choices can be made about which tem- 
plates to sequence, and the number of 
selected templates can be kept to a 
minimum. More important, knowledge 
of the interrelationship of sequencing 
runs guides the assembly process, mak- 
ing it more resistant to difficulties im- 
posed by repeated sequences. As of 
July 3, 1997, Berkeley Lab had generated
4.4 megabases of human sequence and,
in collaboration with UCB, had tallied
7.6 megabases of Drosophila sequence. 
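
The central idea of the directed approach, choosing as few templates as possible once their relative positions are known, can be illustrated with a classic greedy interval cover. The sketch below is not Berkeley Lab's software; the function name, the template names, and the coordinates are all hypothetical, and it assumes each template is described by a known start and end position on the clone.

```python
def select_templates(templates, target_start, target_end):
    """Pick a minimal set of templates whose spans cover the target region.

    templates: list of (name, start, end) with known positions (hypothetical data).
    Greedy rule: among templates starting at or before the covered point,
    take the one that reaches furthest to the right.
    """
    ordered = sorted(templates, key=lambda t: t[1])
    chosen = []
    covered_to = target_start
    i = 0
    while covered_to < target_end:
        best = None
        # Scan templates that start within the region already covered.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no template extends coverage past {covered_to}")
        chosen.append(best)
        covered_to = best[2]
    return chosen


# Hypothetical templates on a 1,200-base region.
templates = [
    ("t1", 0, 400), ("t2", 100, 500), ("t3", 350, 750),
    ("t4", 450, 900), ("t5", 700, 1100), ("t6", 850, 1200),
]
chosen = select_templates(templates, 0, 1200)
print([name for name, _, _ in chosen])
```

In this example four of the six candidate templates suffice. Because every selected template has a known position, the order of the resulting reads is fixed in advance, which is the property the text credits with making assembly more robust to repeated sequences.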

Instrumentation and Automation

The instrumentation and automation 
program encompasses the design and 
fabrication of custom apparatus to facili- 
tate experiments, the programming of 
laboratory robots to automate repetitive 
procedures, and the development of 
(1) improved hardware to extend the
applicability range of existing commer-
cial robots and (2) an integrated operat-
ing system to control and monitor
experiments. Although some discrete
instrumentation modules used in the
integrated protocols are obtained com-
mercially, LBNL designs its own custom
instruments where existing capabilities are
inadequate. The instrumentation modules
are then integrated into a large system
to facilitate large-scale production
sequencing. In addition, a significant 
effort is devoted to improving 


Human Genome Center
Lawrence Berkeley National
Laboratory
1 Cyclotron Road
Berkeley, CA 94720

Mohandas Narla
510/486-7029, Fax: -*,?4fi
mohandas_narla@macmail.lbl.gov

Joyce Pfeiffer
Administrative Assistant

Michael Palazzolo*
Director, 1996-97

In lieu of individual abstracts, 
research projects and investi- 
gators at the LBNL Human 
Genome Center are repre- 
sented in this narrative. More 
information can be found on 
the center's Web site (see URL


In 1997 Lawrence Berkeley Na-
tional Laboratory, Lawrence
Livermore National Laboratory,
and Los Alamos National Labora-
tory began collaborating in a Joint
Genome Institute to implement
high-throughput sequencing [see
p. 26 and Human Genome News
8(2), 1-2].

*Now at Amgen, Inc.


DNA Prep Machine. The DNA
Prep machine (above) was
designed by Berkeley Lab's
Martin Pollard to perform
plasmid preparation on 192
samples (2 microtiter plates)
in about 2.5 to 4 hours,
depending on the protocol.
Controlled by a personal
computer running a Visual
Basic Control program, the
instrument includes a gantry
robot equipped with pipettors,
reagent dispensers, hot and
cold temperature stations, and
a pneumatic gripper.

fluorescence-assay methods, including 
DNA sequence analysis and mass spec- 
trometry for molecular sizing. 

Recent advances in the instrumentation 
group include DNA Prep machine and 
Prep Track. These instruments are de- 
signed to automate completely the highly 
repetitive and labor-intensive DNA- 
preparation procedure to provide higher 
daily throughput and DNA of consistent 
quality for sequencing (see photos, p. 43,
and Web pages:
esd/DNAPrep/TitlePage.html and http:// 

Berkeley Lab's near-term needs are for
960 samples per day of DNA extracted
from overnight bacteria growths. The
DNA protocol is a modified boil prep
prepared in a 96-well format. Overnight
bacteria growths are lysed, and samples
are separated from cell debris by cen-
trifugation. The DNA is recovered by
ethanol precipitation.


The informatics group is focused on 
hardware and software support and 
system administration, software 

development for end sequencing, 
transposon mapping and sequence tem- 
plate selection, data-flow automation, 
gene finding, and sequence analysis.
Data-flow automation is the main em- 
phasis. Six key steps have been identi- 
fied in this process, and software is 
being written and tested to automate all 
six. The first step involves controlling 
gel quality, trimming vector sequence, 
and storing the sequences in a database. 
A program module called Move-Track-
Trim, which is now used in production,
was written to handle these steps. The 
second through fourth steps in this pro- 
cess involve assembling, editing, and 
reconstructing P1 clones of 80,000 base
pairs from 400-base traces. The fifth 
step is sequence annotation, and the 
sixth is data submission. 
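
To give a sense of the scale behind the assembly steps, the sketch below estimates the smallest number of 400-base traces that could tile an 80,000-base-pair P1 clone. Only those two figures come from the text; the 50-base overlap between adjacent traces and the function name are assumptions for illustration. Production sequencing uses more reads than this lower bound, since it also needs redundancy for accuracy.

```python
import math


def min_reads(clone_length, read_length, min_overlap=0):
    """Lower bound on reads needed to tile a clone end to end.

    Adjacent reads are required to share at least `min_overlap` bases,
    so each read after the first adds (read_length - min_overlap) new bases.
    """
    if not 0 <= min_overlap < read_length:
        raise ValueError("min_overlap must be in [0, read_length)")
    if clone_length <= read_length:
        return 1
    step = read_length - min_overlap
    return math.ceil((clone_length - read_length) / step) + 1


P1_LENGTH = 80_000   # from the text: base pairs in a P1 clone
TRACE_LENGTH = 400   # from the text: bases per trace

print(min_reads(P1_LENGTH, TRACE_LENGTH))                  # no overlap at all
print(min_reads(P1_LENGTH, TRACE_LENGTH, min_overlap=50))  # assumed 50-base overlap
```

Even the idealized case requires a couple of hundred correctly ordered traces per clone, which helps explain why automating the flow of data from gel to database is the group's main emphasis.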

Annotation can greatly enhance the bio- 
logical value of these sequences. Useful 
annotations include homologies to 
known genes, possible gene locations, 
and gene signals such as promoters. 
LBNL is developing a workbench for 
automatic sequence annotation and an-
notation viewing and editing. The goal
is to run a series of sequence-analysis
tools and display the results to compare 
the various predictions. Researchers 
then will be able to examine all the an- 
notations (for example, genes predicted 
by various gene-finding methods) and 
select the ones that look best. 
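
One simple way to compare predictions from several tools is to look for regions on which a minimum number of them agree. The sketch below is not taken from any LBNL program; the tool names and coordinates are hypothetical, intervals are half-open (start, end) pairs, and each tool's own intervals are assumed not to overlap one another.

```python
def consensus_regions(predictions, min_support=2):
    """Return regions predicted by at least `min_support` tools.

    predictions: dict mapping a tool name to a list of (start, end) intervals.
    Uses a sweep over interval boundaries, tracking how many tools cover
    the current position.
    """
    events = []
    for intervals in predictions.values():
        for start, end in intervals:
            events.append((start, 1))    # a prediction opens
            events.append((end, -1))     # a prediction closes
    events.sort()  # at equal positions, closings (-1) sort before openings (+1)

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        new_depth = depth + delta
        if depth < min_support <= new_depth:
            region_start = position
        elif new_depth < min_support <= depth:
            regions.append((region_start, position))
        depth = new_depth
    return regions


# Hypothetical output from two gene finders and one homology search.
predictions = {
    "gene_finder_a": [(100, 500), (900, 1200)],
    "gene_finder_b": [(150, 520), (2000, 2300)],
    "homology_hit": [(120, 480), (950, 1100)],
}
print(consensus_regions(predictions, min_support=2))
print(consensus_regions(predictions, min_support=3))
```

A display built on this kind of summary would still show every individual prediction, so that a researcher can inspect the regions where the tools disagree before choosing among them.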

Nomi Harris developed Genotator, an
annotation workbench consisting of a 
stand-alone annotation browser and sev- 
eral sequence-analysis functions. The 
back end runs several gene finders, 
homology searches (using BLAST), 
and signal searches and saves the results 
in ".ace" format. Genotator thus auto-
mates the tedious process of operating a 
dozen different sequence-analysis pro- 
grams with many different input and 
output formats. Genotator can function 
via command-line arguments or with 
the graphical user interface {http:// 



Prep Track. Developed at the Berkeley Lab, Prep Track is a
high-throughput, microtiter-plate, liquid-handling robotic
system for automating DNA preparation procedures.
Microtiter plates are fetched from cassettes, moved to one of
two conveyor belts, and transported to protocol-defined modules.
Plates are moved continuously and automatically through the system as each module
simultaneously processes plates in the module lift stations. The plates exit the system and are
stored in microtiter-plate cassettes.

Modules include a station capable of dispensing liquids in volumes from as low as 5 microliters
to several milliliters, four 96-channel pipettors, and the plate-fetching module. Each module is
controlled independently by programmable logic controllers (PLCs). The overall system is
controlled by a personal computer and a Visual Basic Control master that determines the order
in which plates are processed. The actions of each lift station and dispenser or pipettor are
determined locally by programs resident in each module's PLC. The Visual Basic Control
program moves the plates through the system based on the predefined protocol and on module
status reports as monitored by PLCs.

The current belt length on the Prep Track supports eight standard modules, which can be
reconfigured to any order. Standardization of mechanical, electrical, and communication
components allows new modules to be designed and manufactured easily. The current standard
module footprint is 250 mm wide, 600 mm deep, and 250 mm to the conveyor belt deck. The first
protocol to be implemented on Prep Track will be polymerase chain reaction setups, with
sequence-reaction setups to follow. [Source: LBNL]



Progress to Date 
Chromosome 5 

Over the last year, the center has focused 
its production genomic sequencing on the 
distal 40 megabases of the human chro- 
mosome 5 long arm. This region was cho- 
sen because it contains a cluster of growth 
factor and receptor genes and is likely to 
yield new and functionally related genes 
through long-range sequence analysis. 
Results to date include: 

• 40-megabase nonchimeric map con- 
taining 82 yeast artificial chromosomes 
(YACs) in the chromosome 5 distal 
long arm. 

• 20-megabase contig map in the region
of 5q23-q33 that contains 198 P1s, 60
P1 artificial chromosomes, and 495
bacterial artificial chromosomes
(BACs) linked by 563 sequence
tagged sites (STSs) to form contigs.

• 20-megabase bins containing 370 BACs 
in 74 bins in the region of 5q33-q35. 

Chromosome 21 

An early project in the study of Down
syndrome (DS), which is characterized by
chromosome 21 trisomy, constructed a
high-resolution clone map in the chromo-
some 21 DS region to be used as a pilot
study in generating a contiguous gene
map for all of chromosome 21. This
project has integrated P1 mapping efforts
with transgenic studies in the Life Sci-
ences Division. P1 maps provide a suit-
able form of genomic DNA for isolating
and mapping cDNA.

• 186 clones isolated in the major DS re-
gion of chromosome 21 comprising
about 3 megabases of genomic DNA
extending from D21S17 to ETS2.
Through cross-hybridization, overlap-
ping P1s were identified, as well as
gaps between two P1 contigs, and
transgenic mice were created from P1
clones in the DS region for use in phe-
notypic studies.

Transgenic Mice 

One of the approaches for determining 
the biological function of newly identi- 
fied genes uses YAC transgenic mice. 
Human sequence harbored by YACs in 
transgenic mice has been shown to be 
correctly regulated both temporally and 
spatially. A set of nonchimeric overlap- 
ping YACs identified from the 5q31 re- 
gion has been used to create transgenic 
mice. This set of transgenic mice, which 
together harbor 1.5 megabases of hu- 
man sequence, will be used to assess the 
expression pattern and potential func- 
tion of putative genes discovered in the 
5q31 region. Additional mapping and 
sequencing are under way in a region of 
human chromosome 20 amplified in 
certain breast tumor cell lines. 

Resource for Molecular Cytogenetics 

Divining landmarks for human disease 
amid the enormous plain of the human 
genetic map is the mission of an ambi- 
tious partnership among the Berkeley 
Lab; University of California, San Fran- 
cisco; and a diagnostics company. The 
collaborative Resource for Molecular 
Cytogenetics is charting a course toward 
important sites of biological interest on 
the 23 pairs of human chromosomes. 

The Resource employs the many tools 
of molecular cytogenetics. The most 
basic of these tools, and the cornerstone 
of the Resource's portfolio of proprietary 
technology, is a method generally known 
as "chromosome painting," which uses 
a technique referred to as fluorescence 
in situ hybridization or FISH. This tech- 
nology was invented by LBNL Re- 
source leaders Joe Gray and Dan Pinkel. 

A technology to emerge recently from 
the Resource is known as "Quantitative 
DNA Fiber Mapping (QDFM)." High- 
resolution human genome maps in a 
form suitable for DNA sequencing tra- 
ditionally have been constructed by 




[Figure: Sequencing strategy. Physical mapping clones positioned by STSs across a genomic region (scale 300 kb); (1) shear and subclone a single physical mapping clone; (2) generate a minimal spanning set of 3-kb subclones using end sequencing (scale 80 kb); generate a set of transposon insertions in each 3-kb subclone in the spanning set; mapped transposons, with the subset to be sequenced shown in solid color; 2 sequencing runs from each selected transposon (scale +200 to +400 bp).] 

Sequencing Strategy. The directed sequencing strategy used at LBNL involves four steps: (1) generate a 
P1-based physical map (using STS-content mapping) to provide a set of minimally overlapping clones, 
(2) shear and subclone each P1 clone into 3-kilobase fragments and identify a minimally overlapping 
subclone set, (3) generate and map transposon inserts in each subclone, and (4) sequence using 
commercial primer-binding sites engineered into the transposon. Subclone sequences are then assembled 
and edited, and the gaps are identified. P1 clones are reconstructed, and the resulting composite data is 
analyzed, annotated, and finally submitted to the databases. The production sequencing effort has 
generated 12 of finished, double-stranded genomic DNA sequence from both Drosophila 
and human templates. [Source: Adapted from figure provided by LBNL] 
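
Step (2) of the caption, choosing a minimally overlapping subclone set, can be pictured as an interval-covering problem. The following is a minimal sketch under stated assumptions, not LBNL's actual procedure: subclone positions are assumed to be already known as half-open base-pair intervals, and the function name `minimal_spanning_set` and the sample coordinates are hypothetical.

```python
def minimal_spanning_set(subclones, region_start, region_end):
    """Greedy tiling path: fewest subclones covering [region_start, region_end).

    subclones: list of (name, start, end) tuples in base pairs (hypothetical layout).
    Returns the chosen names in order; raises ValueError if a gap cannot be covered.
    """
    ordered = sorted(subclones, key=lambda s: s[1])
    chosen = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among subclones that start at or before the covered point,
        # keep the one that reaches farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap at position {covered_to}")
        chosen.append(best[0])
        covered_to = best[2]
    return chosen


# Illustrative 3-kb subclones across an 8-kb stretch (made-up coordinates).
subclones = [
    ("a", 0, 3000),
    ("b", 1000, 4000),
    ("c", 2500, 5500),
    ("d", 3500, 6500),
    ("e", 5000, 8000),
]
print(minimal_spanning_set(subclones, 0, 8000))
```

The greedy choice (always extend coverage as far as possible) yields a smallest covering set for intervals on a line; a real pipeline would also have to require a minimum overlap between neighbors, which this sketch omits.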

various methods of fingerprinting, hybrid- 
ization, and identification of overlapping 
STSs. However, these techniques do not 
readily yield information about sequence 
orientation, the extent of overlap of these 
elements, or the size of gaps in the map. 
Ulli Weier of the Resource developed the 
QDFM method of physical map assembly 
that enables the mapping of cloned DNA 
directly onto linear, fully extended DNA 

molecules. QDFM allows unambiguous 
assembly of critical elements leading to 
high-resolution physical maps. This task 
now can be accomplished in less than 
2 days, as compared with weeks by con- 
ventional methods. QDFM also enables 
detection and characterization of gaps in 
existing physical maps — a crucial step 
toward completing a definitive human 
genome map. 



Research Narratives 
University of Washington Genome Center 

The Human Genome Project 
soon will need to increase 
rapidly the scale at which 
human DNA is analyzed. 
The ultimate goal is to de- 
termine the order of the 3 billion bases 
that encode all heritable information. 
During the 20 years since effective 
methods were introduced to carry out 
DNA sequencing by biochemical analy- 
sis of recombinant-DNA molecules, 
these techniques have improved dra- 
matically. In the late 1970s, segments of 
DNA spanning a few thousand bases 
challenged the capacity of world-class 
sequencing laboratories. Now, a few 
million base pairs per year represent 
state-of-the-art output for a single se- 
quencing center. 

However, the Human Genome Project is 
directed toward completing the human 
sequence in 5 to 10 years, so the data 
must be acquired with technology avail- 
able now. This goal, while clearly fea- 
sible, poses substantial organizational 
and technical challenges. Organization- 
ally, genome centers must begin build- 
ing data-production units capable of 
sustained, cost-effective operation. 
Technically, many incremental refine- 
ments of current technology must be in- 
troduced, particularly those that remove 
impediments to increasing the scale of 
DNA sequencing. The University of 
Washington (UW) Genome Center is 
active in both areas. 

The HLA locus encodes genes that must 
be closely matched between organ donors 
and organ recipients. This sequence data 
is expected to lead to long-term improve- 
ments in the ability to achieve good 
matches between unrelated organ donors 
and recipients. 

The mouse locus that encodes compo- 
nents of the T-cell-receptor family is of 
interest for several reasons. The locus 
specifies a set of proteins that play a 
critical role in cell-mediated immune re- 
sponses. It provides sequence data that 
will help in the design of new experi- 
mental approaches to the study of immu- 
nity in mice — one of the most important 
experimental animals for immunological 
research. In addition, the locus will pro- 
vide one of the first large blocks of DNA 
sequence for which both human and 
mouse versions are known. 

Human-mouse sequence comparisons 
provide a powerful means of identifying 
the most important biological features of 
DNA sequence because these features are 
often highly conserved, even between 
such biologically different organisms as 
human and mouse. Finally, sequencing 
an "anonymous" region of human chro- 
mosome 7, a region about which little 
was known previously, provides experi- 
ence in carrying out large-scale sequenc- 
ing under the conditions that will prevail 
throughout most of the Human Genome 
Project. 
University of Washington 
Genome Center 
Department of Medicine 
Box 352145 
Seattle, WA 98195 

Maynard Olson 
Director 

M6/ASS-7.Ui6. Fax: -7344 
mvo@u.washington.edu 

For more information on 
research projects and investi- 
gators at the University of 
Washington Genome Center, 
see abstracts in Part 2 of this 
report and the center's Web 
site (see URL above). 

Production Sequencing 

Both to gain experience in the production 
of high-quality, low-cost DNA sequence 
and to generate data of immediate bio- 
logical interest, the center is sequencing 
several regions of human and mouse 
DNA at a current throughput of 2 mil- 
lion bases per year. This "production se- 
quencing" has three major targets: the 
human leukocyte antigen (HLA) locus 
on human chromosome 6, the mouse lo- 
cus encoding the alpha subunit of T-cell 
receptors, and an "anonymous" region 
of human chromosome 7. 

Technology for Large- 
Scale Sequencing 

In addition to these pilot projects, the 
UW Genome Center is developing incre- 
mental improvements in current sequenc- 
ing technology. A particular focus is on 
enhanced computer software to process 
raw data acquired with automated labora- 
tory instruments that are used in DNA 
mapping and sequencing. Advanced in- 
strumentation is commercially available 
for determining DNA sequence via the 
"four-color-fluorescence method," and 
this instrumentation is expected to carry 



the main experimental load of the Human 
Genome Project. Raw data produced by 
these instruments, however, require ex- 
tensive processing before they are ready 
for biological analysis. 

Large-scale sequencing involves a "divide- 
and-conquer" strategy in which the huge 
DNA molecules present in human cells 
are broken into smaller pieces that can be 
propagated by recombinant-DNA 
methods. Individual analyses ultimately 
are carried out on segments of less than 
1000 bases. Many such analyses, each of 
which still contains numerous errors, must 
be melded together to obtain finished se- 
quence. During the melding, errors in in- 
dividual analyses must be recognized and 
corrected. In typical large-scale sequenc- 
ing projects, the results of thousands of 
analyses are melded to produce highly 
accurate sequence (less than one error in 
10,000 bases) that is continuous in 
blocks of 100,000 or more bases. The 
UW Genome Center is playing a major 
role in developing software that allows this 
process to be carried out automatically 
with little need for expert intervention. 
Software developed in the UW center is 
used in more than 50 sequencing laborato- 
ries around the world, including most of 
the large-scale sequencing centers produc- 
ing data for the Human Genome Project. 
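
The "melding" step described above, in which many error-prone individual analyses are combined into accurate finished sequence, can be illustrated with a column-wise majority vote. This is a deliberately simplified sketch and not the UW center's software: it assumes the reads have already been aligned to a common coordinate system and padded with `-` where a read gives no coverage, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Column-wise majority vote over pre-aligned reads of equal length.

    '-' marks positions that a read does not cover; a column with no
    coverage at all is reported as 'N'.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


# Three overlapping reads; the second has a single miscalled base (A for T).
reads = [
    "ACGTAC--",
    "ACGAACGT",
    "-CGTACGT",
]
print(consensus(reads))
```

With enough independent reads over each position, an isolated error in one read is outvoted, which is the intuition behind reaching error rates below one in 10,000 bases; production software additionally weighs each base call by its estimated quality.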

Physical Mapping 

The UW Genome Center also is develop- 
ing improved software that addresses a 
higher-level problem in large-scale se- 
quencing. The starting point for large-scale 
sequencing typically is a recombinant- 
DNA molecule that allows propagation 
of a particular human genomic segment 
spanning 50,000 to 200,000 bases. 
Much effort during the last decade has 
gone into the physical mapping of such 
molecules, a process that allows huge 
regions of chromosomes to be defined 

in terms of sets of overlapping 
recombinant-DNA molecules whose 
precise positions along the chromosome 
are known. However, the precision re- 
quired for knowing relationships of 
recombinant-DNA molecules derived 
from neighboring chromosomal por- 
tions increases as the Human Genome 
Project shifts its emphasis from map- 
ping to sequencing. 

High-resolution maps both guide the or- 
derly sequencing of chromosomes and 
play a critical role in quality control. 
Only by mapping recombinant-DNA 
molecules at high resolution can subtle 
defects in particular molecules be rec- 
ognized. Such defective human DNA 
sources, which are not faithful replicas 
of the human genome, must be weeded 
out before sequencing can begin. The 
UW Genome Center has a major program 
in high-resolution physical mapping 
which, like the work on sequencing it- 
self, uses advanced computing tools. 
The center is producing maps of regions 
targeted for sequencing on a just-in- 
time basis. These highly detailed maps 
are proving extremely valuable in fa- 
cilitating the production of high-quality 
sequence. 
Ultimate Goal 

Although many challenges currently 
posed by the Human Genome Project 
are highly technical, the ultimate goal is 
biological. The project will deliver 
immense amounts of high-quality, 
continuous DNA sequence into pub- 
licly accessible databases. These data 
will be annotated so that biologists who 
use them will know the most likely 
positions of genes and have convenient 
access to the best available clues about 
the probable function of these genes. 
The better the technical solutions to cur- 
rent challenges, the better the center 
will be able to serve future users of the 
human genome sequence. 

DOE Human Genome Program Report, University of Washington 


Research Narratives 

Genome Database 

The release of Version 6 of the 
Genome Database (GDB) in 
January 1996 signaled a ma- 
jor change for both the scien- 
tific community and GDB 
staff. GDB 6.0 introduced a number of 
significant improvements over previous 
versions of GDB, most notably a revised 
data representation for genes and ge- 
nomic maps and a new curatorial model 
for the database. These new features, 
along with a remodeled database structure 
and new schema and user interface, pro- 
vide a resource with the potential to inte- 
grate all scientific information currently 
available on human genomics. GDB rap- 
idly is becoming the international biomedi- 
cal research community's central source 
for information about genomic structure, 
content, diversity, and evolution. 

A New Data Model 

Inherent in the underlying organization of 
information in GDB is an improved 
model for genes, maps, and other classes 
of data. In particular, genomic segments 
(any named region of the genome) and 
maps are being expanded regularly. New 
segment types have been added to support 
the integration of mapping and sequencing 
data (for example, gene elements and re- 
peats) and the construction of comparative 
maps (syntenic regions). New map types 
include comparative maps for represent- 
ing conserved syntenies between species 
and comprehensive maps that combine 
data from all the various submitted maps 
within GDB to provide a single integrated 
view of the genome. Experimental obser- 
vations such as order, size, distance, and 
chimerism are also available. 

Through the World Wide Web, GDB links 
its stored data with many other biological 
resources on the Internet. GDB's External 
Link category is a growing collection of 
cross-references established between 
GDB entities and related information in 
other databases. By providing a place for 
these cross-references, GDB can serve as 
a central point of inquiry into technical 
data regarding human genomics. 

Direct Community 
Data Submission and Curation 

Two methods for data submission are in 
use. For individuals submitting small 
amounts of data, interactive editing of 
the database through the Web became 
available in April 1996, and the process 
has undergone several simplifications 
since that time. This continues to be an 
area of development for GDB because 
all editing must take place at the Balti- 
more site, and Internet connections 
from outside North America may be too 
slow for interactive editing to be practi- 
cal. Until these difficulties are resolved, 
GDB encourages scientists with limited 
connectivity to Baltimore to submit 
their data via more traditional means 
(e-mail, fax, mail, phone) or to prepare 
electronic submissions for entry by the 
data group on site. 

For centers submitting large quantities 
of data, GDB developed an electronic 
data submission (EDS) tool, which pro- 
vides the means to specify login pass- 
word validation and commands for 
inserting and updating data in GDB. 
The EDS syntax includes a mechanism 
for relating a center's local naming con- 
ventions to GDB objects. Data submit- 
ted to GDB may be stored privately for 
up to 6 months before it automatically 
becomes public. The database is pro- 
grammed to enforce this Human Genome 
Project policy. Detailed specifications 
of GDB's EDS syntax and other sub- 
mission instructions are available (EDS 
prototype, hnp:/ 
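
The 6-month private hold described above amounts to a simple date rule. The sketch below is an illustration only, not GDB's implementation: the names `release_date` and `is_public` are hypothetical, and it assumes "6 months" means calendar months, with the day clamped when the target month is shorter.

```python
import calendar
from datetime import date

HOLD_MONTHS = 6  # maximum private period stated in the text


def release_date(submitted, hold_months=HOLD_MONTHS):
    """Date on which a privately held submission becomes public."""
    month_index = submitted.month - 1 + hold_months
    year = submitted.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day so that, e.g., 31 August + 6 months gives the end of February.
    day = min(submitted.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_public(submitted, today, hold_months=HOLD_MONTHS):
    """True once the hold period has elapsed."""
    return today >= release_date(submitted, hold_months)


print(release_date(date(1996, 1, 15)))
print(is_public(date(1996, 1, 15), date(1996, 7, 14)))
```

Computing the release date once at submission time and comparing against it on every read keeps the policy in one place, which is one way a database could "enforce" the rule without manual intervention.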

Since the EDS system was imple- 
mented, GDB has put forth an aggres- 
sive effort to increase the amount of 
data stored in the database. Conse- 
quently, the database has grown tremen- 
dously. During 1996 it grew from 1.8 to 
6.7 gigabytes. 

To provide accountability regarding data 
quality, the shift to community curation 
introduced the idea that individuals and 

Johns Hopkins University 
2024 E. Monument Street 
Baltimore, MD 21205-223^ 

Stanley Letovsky 
Informatics Director 

Robert Cottingham 
Operations Director 

Telephone for both: 410/955-9705 
Fax for both: 410/614-0434 

David Kingsbury 
Director, 1993-97* 

In lieu of individual abstracts, 
research projects and investi- 
gators at GDB are represented 
in this narrative. More infor- 
mation can be found on GDB's 
Web site (see URL above). 

*Now at Chiron Pharmaceuticals, 
Emeryville, California 



laboratories own the data they submit to 
GDB and that other researchers cannot 
modify it. However, others should be 
able to add information and comments, 
so an additional feature is the commu- 
nity's ability to conduct electronic 
online public discussions by annotating 
the database submissions of fellow re- 
searchers. GDB is the first database of 
its kind to offer this feature, and the 
number of third-party annotations is 
increasing in the form of editorial com- 
mentary, links to literature citations, and 
links to other databases external to 
GDB. These links are an important part 
of the curatorial process because they 
make other data collections available to 
GDB users in an appropriate context. 

Improved Map 
and Querying 

Accompanying the release of GDB 6.0, 
the program Mapview creates graphical 
displays of maps. Mapview was devel- 
oped at GDB to display a number of 
map types (cytogenetic, radiation hybrid, 
contig, and linkage) using common 
graphical conventions found in the lit- 
erature. Mapview is designed to stand 
alone or to be used in conjunction with 
a Web browser such as Netscape, thereby 
creating an interactive graphical display 
system. When used with Netscape, 
Mapview allows the user to retrieve de- 
tails about any displayed map object. 

Maps are accessed through the query 
form for genomic segment and its sub- 
classes via a special program that al- 
lows the user to select whole maps or 
slices of maps from specific regions of 
interest and to query by map type. The 
ability to browse maps stored in GDB 
or download them in the background 
was also incorporated into GDB 6.0. 

GDB stores many maps of each chro- 
mosome, generated by a variety of map- 
ping methods. Users who are interested 

in a region, such as the neighborhood of 
a gene or marker, will be able to see all 
maps that have data in that region, 
whether or not they contain the desired 
marker. To support database querying 
by region of interest, integrated maps 
have been developed that combine data 
from all the maps for each chromosome. 
These are called Comprehensive Maps. 

Queries for all loci in a region of inter- 
est are processed against the compre- 
hensive maps, thereby searching all 
relevant maps. Comprehensive maps are 
also useful for display purposes because 
they organize the content of a region by 
class of locus (e.g., gene, marker, clone) 
rather than by data source. This approach 
yields a much less complex presentation 
than an alignment of numerous primary 
maps. Because such information as de- 
tailed orders, order discrepancies be- 
tween maps, and nonlinear metric 
relations between maps is not always 
captured in the comprehensive maps, 
GDB continues to provide access to 
aligned displays of primary maps. 

A Variety of Searching 

Recognizing the eclectic user commu- 
nity's need to search data and formulate 
queries, GDB offers a spectrum of 
simple to complex search strategies. In 
addition, direct programming access is 
available using either GDB's object 
query language to the Object Broker 
software layer or standard query lan- 
guage to the underlying Sybase rela- 
tional database. 

Querying by Object Directly 
from GDB's Home Page 

The simplest methods search for objects 
according to known GDB accession 
numbers; sequence database-accession 
numbers; specified names, including 
wildcard symbols that will automatically 
match synonyms and primary names; and 
keywords contained anywhere in the text. 

DOE Human Genome Program Report, GDB 


Querying by Region of Interest Work in Progress 

A region of interest can be specified us- 
ing a pair of flanking markers, which 
can be cytogenetic bands, genes, 
amplimers (sequence tagged sites), or 
any other mapped objects. Given a re- 
gion of interest, the comprehensive 
maps are searched to find all loci that 
fall within them. These loci can be dis- 
played in a table, graphically as a slice 
through a comprehensive map, or as 
slices through a chosen set of primary 
maps. A comprehensive map slice 
shows all loci in the region, including 
genes, expressed sequence tags (ESTs), 
amplimers, and clones. A region also 
can be specified as a neighborhood 
around a single marker of interest. 
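
A region-of-interest query bounded by two flanking markers can be sketched as a range filter over one chromosome's comprehensive map. The example below is a hypothetical illustration and does not reflect GDB's schema or query language: the tuple layout, the marker names, and the positions are all invented for the example.

```python
def loci_in_region(comprehensive_map, flank_a, flank_b):
    """Return loci lying between two flanking markers, inclusive.

    comprehensive_map: list of (name, locus_class, position) tuples for a
    single chromosome; positions share one coordinate system (assumed).
    The flanking markers may be given in either order.
    """
    positions = {name: pos for name, _cls, pos in comprehensive_map}
    lo, hi = sorted((positions[flank_a], positions[flank_b]))
    hits = [entry for entry in comprehensive_map if lo <= entry[2] <= hi]
    return sorted(hits, key=lambda entry: entry[2])


# Invented loci on one chromosome.
chromosome_map = [
    ("M1", "amplimer", 10.0),
    ("GENE_A", "gene", 12.5),
    ("EST_1", "est", 14.0),
    ("M2", "amplimer", 20.0),
    ("CLONE_X", "clone", 25.0),
]
for name, locus_class, position in loci_in_region(chromosome_map, "M2", "M1"):
    print(name, locus_class, position)
```

A neighborhood query around a single marker follows the same pattern, with the bounds taken as the marker's position plus and minus a chosen distance.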

Results of queries for genes, amplimers, 
ESTs, or clones can be displayed on a 
GDB comprehensive map. Results are 
spread across several chromosomes dis- 
played in Mapview (see figure, p. 52). A 
query for all the PAX genes (specified 
as symbol = PAX* on the gene query 
form) retrieves genes on multiple chro- 
mosomes. Double-clicking on one of 
these genes brings up detailed gene in- 
formation via the Web browser. 
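
The wildcard query `symbol = PAX*` behaves like shell-style pattern matching on gene symbols. A minimal sketch using Python's standard `fnmatch` module follows; the function name `query_symbols` and the in-memory gene list are assumptions for illustration, not GDB's query form.

```python
from fnmatch import fnmatchcase


def query_symbols(genes, pattern):
    """Return (symbol, chromosome) pairs whose symbol matches a wildcard pattern."""
    return sorted(gene for gene in genes if fnmatchcase(gene[0], pattern))


# A few well-known human genes with their chromosomes.
genes = [
    ("PAX3", "2"),
    ("PAX6", "11"),
    ("TP53", "17"),
]
for symbol, chromosome in query_symbols(genes, "PAX*"):
    print(symbol, "on chromosome", chromosome)
```

As in the text, the matches fall on more than one chromosome, which is why the results are displayed across several chromosomes at once.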

Querying by Polymorphism 

GDB contains a large number of poly- 
morphisms associated with genes and 
other markers. Queries can be con- 
structed for a particular type of marker 
(e.g., gene, amplimer, clone), polymor- 
phism (e.g., dinucleotide repeat), or 
level of heterozygosity. These queries 
can be combined with positional queries 
to find, for example, polymorphic 
amplimers in a region bounded by 
flanking markers or in a particular chro- 
mosomal band. If desired, the retrieved 
markers can be viewed on a comprehen- 
sive map. 

Mapview 2.3 

Mapview 2.1, the next generation of the 
GDB map viewer, was released in 
March 1997. The latest version, 
Mapview 2.3, is available in all com- 
mon computing environments because 
it is written in the Java programming 
language. Most important, the new 
viewer can display multiple aligned 
maps side by side in the window, with 
alignment lines indicating common 
markers in neighboring maps. As be- 
fore, users can select individual markers 
to retrieve more information about them 
from the database. 

GDB developers have entered into a 
collaborative relationship with other 
members of the bioWidget Consortium 
so the Java-based alignment viewer will 
become part of a collection of freely 
available software tools for displaying 
biological data ( 

Future plans for Mapview include pro- 
viding or enhancing the ability to gener- 
ate manuscript-ready Postscript map 
images, highlight or modify the display 
of particular classes of map objects 
based on attribute values, and requery 
for additional information. 


Since its inception, GDB has been a re- 
pository for polymorphism data, with 
more than 18,000 polymorphisms now 
in GDB. A collaboration has been initi- 
ated with the Human Gene Mutation 
Database (HGMD) based in Cardiff, 
Wales, and headed by David Cooper 
and Michael Krawczak. HGMD's ex- 
tensive collection of human mutation 
data, covering many disease-causing 
loci, includes sequence-level mutation 
characterizations. This data set will be 
included in GDB and updated from 
HGMD on an ongoing basis. The 
HGMD team also will provide advice 



on GDB's representation of genetic 
variation, which is being enhanced to 
model mutations and polymorphisms at 
the sequence level. These modifications 
will allow GDB to act as a repository 
for single-nucleotide polymorphisms, 
which are expected to be a major source 
of information on human genetic varia- 
tion in the near future. 

Mouse Synteny 

Genomic relationships between mouse 
and man provide important clues regard- 
ing gene location, phenotype, and func- 
tion (see figure, p. 53). One of GDB's 
goals is to enable direct comparisons be- 
tween these two organisms, in collabora- 
tion with the Mouse Genome Database 



at Jackson Laboratory. GDB is making 
additions to its schema to represent this 
information so that it can be displayed 
graphically with Mapview. In addition, 
algorithmic work is under way to use 
mapping data to automatically identify 
regions of conserved synteny between 
mouse and man. These algorithms will 
allow the synteny maps to be updated 
regularly. An important application of 
comparative mapping is the ability to 
predict the existence and location of un- 
known human homologs of known, 
mapped mouse genes. A set of such pre- 
dictions is available in a report at the 
GDB Web site, and similar data will be 
available in the database itself in the 
spring of 1998. 
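
One simple way to picture the automatic identification of conserved synteny is to walk along a human chromosome and group consecutive genes whose mouse homologs lie on the same mouse chromosome. The sketch below is an assumption-laden illustration, not GDB's algorithm: the function name `synteny_blocks`, the `min_genes` threshold, and the gene names are all hypothetical, and gene order within the mouse chromosome is ignored.

```python
from itertools import groupby


def synteny_blocks(human_ordered, mouse_chrom, min_genes=2):
    """Group consecutive human genes whose mouse homologs share a chromosome.

    human_ordered: gene names in map order along one human chromosome.
    mouse_chrom: dict mapping gene name -> mouse chromosome of its homolog.
    Genes without a known mouse homolog are skipped rather than breaking a run.
    """
    mapped = [gene for gene in human_ordered if gene in mouse_chrom]
    blocks = []
    for chrom, run in groupby(mapped, key=mouse_chrom.get):
        genes = list(run)
        if len(genes) >= min_genes:
            blocks.append((chrom, genes))
    return blocks


# Invented example: g4 has no mapped mouse homolog.
human_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
mouse_location = {"g1": "11", "g2": "11", "g3": "11", "g5": "18", "g6": "18"}
print(synteny_blocks(human_order, mouse_location))
```

Because the blocks are recomputed from the mapping data alone, rerunning the function after new markers are added is enough to refresh the synteny map. An unmapped gene flanked by a block on both sides is also a natural candidate for predicting where its homolog lies in the other species.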


GDB is a participant in the Genome 
Annotation Consortium (GAC) project, 
whose goal is to produce high-quality, 
automatic annotation of genomic se- 
quences (hnp:// 
CoLab). Currently, GDB is developing 
a prototype mechanism to transition 
from GDB's Mapview display to the 
GAC sequence-level browser over 
common genome regions. GAC also 
will establish a human genome refer- 
ence sequence that will be the base 
against which GDB will refer all poly- 
morphisms and mutations. Ultimately, 
every genomic object in GDB should be 
related to an appropriate region of the 
reference sequence. 

Sequencing Progress 

The sequencing status of genomic re- 
gions now can be recorded in GDB. 

Based on submissions to sequence data- 
bases, GAC will determine genomic re- 
gions that have been completed. GDB 
also will be collaborating with the Euro- 
pean Bioinformatics Institute, in con- 
junction with the international Human 
Genome Organisation (HUGO), to 
maintain a single shared Human Se- 
quence Index that will record commit- 
ments and status for sequencing clones 
or regions. As a result, the sequencing 
status of any region can be displayed 
alongside other GDB mapping data. 


The Genome Database continues to 
seek direct community feedback and in- 
teract with the broader science commu- 
nity via various sources: 

• International Scientific Advisory 
Committee meets annually to offer 
input and advice. 

• Quarterly Review Committee confers 
frequently with the staff to track 
GDB progress and suggest change. 

• HUGO nomenclature, chromosome, 
and other editorial committees have 
specialized functions within GDB, 
providing official names and consen- 
sus maps and ensuring the high qual- 
ity of GDB's content. 

Copies of GDB are available worldwide 
from ten mirror sites (nodes) that make 
the data more easily accessible to the in- 
ternational research community. GDB 
staff meet annually with node managers 
to facilitate interaction and to benefit 
from other user perspectives. 



Research Narratives 
National Center for Genome Resources 

The National Center for 
Genome Resources 
(NCGR) is a not-for- 
profit organization cre- 
ated to design, develop, 
support, and deliver resources in sup- 
port of public and private genome and 
genetic research. To accomplish these 
goals, NCGR is developing and publish- 
ing the Genome Sequence DataBase 
(GSDB) and the Genetics and Public 
Issues (GPI) program. 

NCGR is a center to facilitate the flow 
of information and resources from ge- 
nome projects into both public and pri- 
vate sectors. A broadly based board of 
governors provides direction and strat- 
egy for the center's development. 

NCGR opened in Santa Fe in July 1994, 
with its initial bioinformatics work 
being developed through a coopera- 
tive 5-year agreement with the Depart- 
ment of Energy funded in July 1995. 
Committed to serving as a resource for 
all genomic research, the center 
works collaboratively with researchers 
and seeks input from users to ensure 
that tools and projects under develop- 
ment meet their needs. 

Genome Sequence DataBase 

GSDB is a relational database that con- 
tains nucleotide sequence data (see pie 
chart) and its associated annotation 
from all known organisms (http:// All data are freely 
available to the public. The major goals 
of GSDB are to provide the support 
structure for storing sequence data and 
to furnish useful data-retrieval services. 

GSDB adheres to the philosophy that 
the database is a "community-owned" 
resource that should be simple to update 
to reflect new discoveries about se- 
quences. A corollary to this is GSDB's 
conviction that researchers know their 
areas of expertise much better than a 
database curator and, therefore, they 

should be given ownership and control 
over the data they submit to the data- 
base. The true role of the GSDB staff is 
to help researchers submit data to and 
retrieve data from the database. 

GSDB Enhancements 

During 1996, GSDB underwent a major 
renovation to support new data types 
and concepts that are important to ge- 
nomic research. Tables within the data- 
base were restructured, and new tables 
and data fields were added. Some key 
additions to GSDB include the support 
of data ownership, sequence align- 
ments, and discontiguous sequences. 

The concept of data ownership is a cor- 
nerstone to the functioning of the new 
GSDB. Every piece of data (e.g., se- 
quence or feature) within the database is 
owned by the submitting researcher, and 
changes can be made only by the data 
owner or GSDB staff. This implementa- 
tion of data ownership provides GSDB 
with the ability to support community 
(third-party) annotation, that is, the addition 
of annotation to a sequence by other 
community researchers. 
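
The ownership rule above (only the owner or GSDB staff may change a record, while anyone may attach annotation) can be sketched as a small access check. This is a hypothetical model, not GSDB's schema: the `Record` class, the `STAFF` set, and the user names are invented for illustration.

```python
from dataclasses import dataclass, field

STAFF = {"gsdb_staff"}  # hypothetical staff account name


@dataclass
class Record:
    owner: str
    data: str
    annotations: list = field(default_factory=list)


def update(record, user, new_data):
    """Change the record itself; allowed only for the owner or staff."""
    if user != record.owner and user not in STAFF:
        raise PermissionError(f"{user} does not own this record")
    record.data = new_data


def annotate(record, user, note):
    """Attach third-party annotation without touching the owner's data."""
    record.annotations.append((user, note))


record = Record(owner="alice", data="ACGT")
annotate(record, "bob", "possible exon boundary")
update(record, "alice", "ACGTT")
print(record)
```

Keeping annotations in a separate list from the owned data is what lets outside researchers contribute while the submitter's entry stays under the submitter's control.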

Genome Sequence DataBase 
1800 Old Pecos Trail, Suite A 
Santa Fe, NM 87505 

Peter Schad 

Vice-President, Bioinformatics 

and Biotechnology 
505/995-4447, Fax: -4432 

Carol Harger 
GSDB Manager 
505/982-7840, Fax: -7690 

In lieu of individual abstracts, 
research projects and investi- 
gators at NCGR are repre- 
sented in this narrative. More 
information can be found on 
the center's Web site (see URL above). 

This chart illustrates the 
taxonomic distribution of the 
roughly 1.076 billion base pairs in the 
Genome Sequence DataBase. 
About 47% of the base pairs 
and 58% of the total database 
records represent human 
sequences (August 1997). 
[Source: Adapted from chart provided 
by Carol Harger, GSDB] 



A second enhancement of GSDB is the 
ability to store and represent sequence 
alignments. GSDB staff has been con- 
structing alignments to several key se- 
quences including the env and pol 
(reverse transcriptase) genes of the HIV 
genome, the complete chromosome VIII 
of Saccharomyces cerevisiae, and the 
complete genome of Haemophilus 
influenzae. These alignments are useful 
as possible sites of biological interest and 
for rapidly identifying differences be- 
tween sequences. 

A third key GSDB enhancement is the 
ability to represent known relationships 
of order and distance between separate 
individual pieces of sequence. These 
sets of sequences and their relative posi- 
tions are grouped together as a single 
discontiguous sequence. Such a sequence 
may be as simple as two primers that de- 
fine the ends of a sequence tagged site 
(STS), it may comprise all exons that are 
part of a single gene, or it may be as 
complex as the STS map for an entire 
chromosome. 

GSDB staff has constructed discontigu- 
ous sequences for human chromosomes 1 
through 22 and X that include markers 
from Massachusetts Institute of Technol- 
ogy-Whitehead Institute STS maps and 
from the Stanford Human Genome Cen- 
ter. The set of 2000 STS markers for 
chromosome X, which were mapped re- 
cently by Washington University at 
St. Louis, also have been added to chro- 
mosome X. About 50 genomic sequences 
have been added to the chromosome 22 
map by determining their overlap with 
STS markers. Genomic sequences are 
being added to all the chromosomes as 
their overlap with the STS markers is 
determined. These discontiguous se- 
quences can be retrieved easily and 
viewed via their sequence names using 
the GSDB Annotator. Sequence names 
follow the format of HUMCHR#MP, 
where # equals 1 through 22 or X. 
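
The naming format can be expanded mechanically into the full list of sequence names. The sketch below assumes that `#` is replaced by the chromosome label with no zero padding (for example `HUMCHR1MP` rather than `HUMCHR01MP`); the source does not say which, and the function name is hypothetical.

```python
def discontiguous_sequence_names():
    """Expand HUMCHR#MP for chromosomes 1 through 22 and X."""
    chromosomes = [str(number) for number in range(1, 23)] + ["X"]
    # Assumption: the chromosome label is inserted without zero padding.
    return [f"HUMCHR{label}MP" for label in chromosomes]


names = discontiguous_sequence_names()
print(len(names), names[0], names[-1])
```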

GSDB staff also has utilized discontigu- 
ous sequences to construct maps for 
maize and rice. The maize discontiguous 
sequences were constructed using mark- 
ers from the University of Missouri, 
Columbia. Markers for the rice 
discontiguous sequence were obtained 
from the Rice Genome Database at 
Cornell University and the Rice Ge- 
nome Research Project in Japan. 

New Tools 

As a result of the major GSDB renova- 
tion, new tools were needed for submit- 
ting and accessing database data. 
Annotator was developed as a graphical
interface that can be used to view,
update, and submit sequence data
(http://...). Maestro, a Web-based
interface, was developed to assist
researchers in data retrieval
(...html). Although both these tools
currently are available to researchers,
GSDB is continuing development to
add increased capabilities.

Annotator displays a sequence and its
associated biological information as an
image, with the scale of the image
adjustable by the user. Additional
information about the sequence or an
associated biological feature can be
obtained in a pop-up window. Annotator
also allows a user to retrieve a sequence
for review, edit existing data, or add
annotation to the record. Sequences can
be created using Annotator, and any
sequences created or edited can be saved
either to a local file for later review
and further editing or directly to the
database.

Correct database structures are impor- 
tant for storing data and providing the 
research community with tools for 
searching and retrieving data. GSDB is 
making a concerted effort to expand and 
improve these services. The first gen- 
eration of the Maestro query tool is 
available from the GSDB Web pages. 
Maestro allows researchers to perform 
queries on 18 different fields, some of 
which are queryable only through 
GSDB, for example, D segment num- 
bers from the Genome Database at 
Johns Hopkins University in Baltimore. 


Additionally, Maestro allows queries
with mixed Boolean operators for a
more refined search. For example, a
user may wish to compare relatively
long mouse and human sequences that
do not contain identified coding
regions. To obtain all sequences meeting
these criteria, the scientific name field
would be searched first for "Mus
musculus" and then for "Homo sapiens"
using the Boolean term "OR." Then the
sequence-length filter could be used to
refine the search to sequences longer
than 10,000 base pairs. To exclude
sequences containing identified
coding-region features, the "BUT NOT"
term can be used with the Feature query
field set equal to "coding region."
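
The mixed Boolean query in this example can be written out as a filter over sequence records. The Python sketch below is not Maestro's query syntax or interface, which the text does not show; the Record fields, accession labels, and sample data are all hypothetical and serve only to make the logic of "OR," the length filter, and "BUT NOT" concrete.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Record:
    accession: str             # hypothetical identifier
    scientific_name: str
    length: int                # sequence length in base pairs
    features: FrozenSet[str]   # annotated feature types


def matches(rec: Record) -> bool:
    """(Mus musculus OR Homo sapiens), longer than 10,000 bp, BUT NOT coding region."""
    is_mouse_or_human = rec.scientific_name in {"Mus musculus", "Homo sapiens"}
    is_long = rec.length > 10_000
    has_coding_region = "coding region" in rec.features
    return is_mouse_or_human and is_long and not has_coding_region


# Made-up records for illustration only.
records: List[Record] = [
    Record("EX0001", "Homo sapiens", 25_000, frozenset()),
    Record("EX0002", "Mus musculus", 40_000, frozenset({"coding region"})),
    Record("EX0003", "Mus musculus", 8_000, frozenset()),
    Record("EX0004", "Zea mays", 30_000, frozenset()),
    Record("EX0005", "Mus musculus", 12_500, frozenset({"repeat region"})),
]

hits = [rec.accession for rec in records if matches(rec)]
```

Here EX0002 is dropped by the "BUT NOT" term, EX0003 by the length filter, and EX0004 by the scientific name search.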

With Maestro, users can view the list of 
search matches a few at a time and re- 
trieve more of the list as needed. From 
the list, users can select one or several 
sequences according to their short de- 
scriptions and review or download the 
sequence information in GIO, FASTA, 
or GSDB flatfile format.
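
As a rough illustration of the retrieval steps just described, the Python sketch below pages through a list of matches a few at a time and writes one sequence in FASTA format. Only FASTA is shown because the GIO and GSDB flatfile layouts are not described in the text. The function names, the 60-character line width, and the sample data are assumptions for this example, not part of Maestro.

```python
from typing import Iterator, List, Sequence


def to_fasta(name: str, description: str, bases: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record: a '>' header line, then wrapped bases."""
    header = f">{name} {description}".rstrip()
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return "\n".join([header] + lines) + "\n"


def pages(matches: Sequence[str], page_size: int) -> Iterator[List[str]]:
    """Yield the list of search matches a few at a time."""
    for start in range(0, len(matches), page_size):
        yield list(matches[start:start + page_size])


# Hypothetical 80-base sequence and a five-item match list.
record = to_fasta("example_seq", "hypothetical entry", "ACGT" * 20, width=60)
batches = list(pages(["s1", "s2", "s3", "s4", "s5"], page_size=2))
```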

Future Plans 

Although most pieces necessary for op- 
eration are now in place, GSDB is still 
improving functionality and adding en- 
hancements. During the next year 
GSDB, in collaboration with other re- 
searchers, anticipates creating more 
discontiguous sequence maps for sev- 
eral model organisms, adding more 
functionality to and providing a Web- 
based submission tool and tool kit for 
creating GIO files. 

Genetics and Public
Issues Program

GPI serves as a crucial resource for
people seeking information and making
decisions about genetics or genomics.
GPI develops and provides information
that explains the ethical, legal, policy,
and social relevance of genetic
discoveries and applications.

To achieve its mission, GPI has set forth
three goals: (1) preparation and
development of resources, including
careful delineation of ethical, legal,
policy, and social issues in genetics and
genomics; (2) dissemination of genetic
information targeted to the public, legal
and health professionals, policymakers,
and decision makers; and (3) creation of
an information network to facilitate
interaction among groups.

GPI delivers information through four
primary vehicles: online resources,
conferences, publications, and educational
programs. The GPI program maintains a
continually evolving World Wide Web
site containing a range of material
freely accessible over the Internet.

Microbial Genome
Web Pages

NCGR also maintains informational
Web pages on microbial genomes.
These pages, created as a community
reference, contain a list of current or
completed eubacterial, archaeal, and
eukaryotic genome sequencing projects.
Each main page includes the name of
the organism being sequenced,
sequencing groups involved, background
information on the organism, and its
current location on the Carl Woese Tree
of Life. As the Microbial Genome Project
progresses, the pages will be updated as



Program Management 

The Human Genome Program
was conceived in 1986 as an
initiative within the DOE Office
of Health and Environmental
Research, which has been renamed
Office of Biological and
Environmental Research (OBER) (see
chart below). The program is administered
primarily through the OBER Health
Effects and Life Sciences Research Division
(HELSRD), both directed by David A.
Smith until his retirement in January
1996. Marvin Frazier is now Director of
HELSRD, and OBER is led by Associate
Director Aristides Patrinos, who also
serves as Human Genome Program
manager. Previous directors and managers
are listed in the table below. OBER
is within the Office of Energy Research,
directed by Martha Krebs.

DOE OBER Mission 

Based on mandates from Congress,
DOE OBER's principal missions are to
(1) develop the knowledge necessary to
identify, understand, and anticipate
long-term health and environmental
consequences of energy use and
development and (2) employ DOE's unique
scientific and technological capabilities
in solving major scientific problems in
medicine, biology, and the environment.

Genome integrity and radiation biology
have been a long-term concern of
OBER at DOE and its predecessors,
the Atomic Energy Commission (AEC)
and the Energy Research and
Development Administration (ERDA). In the
United States, the first federal support

See Appendix A, p. 73, for
information on Human
Genome Project history,
including enabling legislation.




[Organization chart: Office of Energy Research; Office of
Biological and Environmental Research; Biological and
Environmental Research Advisory Committee; Merit Panel;
Health Effects and Life Sciences Research Division;
Human Genome Task Group; Projects at Universities,
National Laboratories, and Industrial Institutions]

OHER Associate
or Acting Directors

Charles DeLisi 1985
Robert W. Wood 1987
David J. Galas 1990

Human Genome
Program Managers

Benjamin J. Barnhart
Aristides Patrinos

Institutions Conducting
Genome Research

DOE national laboratories
Academic institutions
Private-sector institutions
Companies, including Small
Business Innovation Research
Foreign institutions (Russia,
Canada, Israel)



DOE Human Genome Task Group

Members

Chair: Aristides Patrinos
Benjamin J. Barnhart
Elbert Branscomb
Daniel W. Drell
Ludwig Feinendegen
Marvin Frazier
Gerald Goldstein†
D. Jay Grimes*
Roland Hirsch
Anna Palmisano*
Michael Riches
Jay Snoddy*
Marvin Stodolsky
David G. Thomassen
John C. Wooley

Areas of expertise

Physical sciences
Genetics, Radiation biology
Scientific Director, Joint Genome Institute
Biology, ELSI, Informatics, Microbial genome
Medicine, Radiation biology
Molecular and cellular biology
Physical science, Instrumentation
Structural biology, Instrumentation
Physical sciences
Microbiology, Microbial genome
Physical sciences
Molecular biology, Informatics
Molecular biology, Biophysics
Cell and molecular biology
Computational biology

*Joined, 1997.
†Left OBER, 1997.

Biotechnology Consortium

Members

Chair: Aristides Patrinos
Charles Arntzen*
Elbert Branscomb
Charles Cantor
Anthony Carrano
Thomas Caskey
David Eisenberg
Chris Fields†
David Galas
Raymond Gesteland
Keith Hodgson
David Kingsbury†
Robert Moyzis†
Mohandas Narla*
Michael Palazzolo
Melvin Simon*
Hamilton Smith*
Lloyd Smith
Edward Uberbacher
Marc Van Montagu*
Executive Officer:
Sylvia Spengler

Affiliations

DOE Office of Biological and Environmental Research
Cornell University
Lawrence Livermore National Laboratory
Boston University
Lawrence Livermore National Laboratory
Merck Research Laboratories
University of California, Los Angeles
National Center for Genome Resources
Darwin Molecular, Inc.
University of Utah
Stanford University
University of Washington, Seattle
Chiron Pharmaceuticals
University of California, Irvine
Lawrence Berkeley National Laboratory
Amgen, Inc.
California Institute of Technology
Johns Hopkins University School of Medicine
University of Wisconsin, Madison
Lawrence Livermore National Laboratory
Oak Ridge National Laboratory
Ghent University, Belgium
Lawrence Berkeley National Laboratory

*Appointed after October 1996.
†Resigned, 1997.

Note: All members of the DOE Human Genome Task Group are ex officio
members of the Biotechnology Consortium.

for genetic research was through AEC. In
the early days of nuclear energy develop- 
ment, the focus was on radiation effects 
and broadened later under ERDA and 
DOE to include health implications of all 
energy technologies and their by-products. 

Today, extensive OBER-sponsored re- 
search programs on genomic structure, 
maintenance, damage, and repair con- 
tinue at the national laboratories and uni- 
versities. These and other OBER 
efforts support a DOE shift toward a pre- 
ventive approach to health, environment, 
and safety concerns. World-class scien- 
tists in top facilities working on leading- 
edge problems spawn the knowledge to 
revolutionize the technology, drive the 
future, and add value to the U.S. 
economy. Major OBER research includes 
characterization of DNA repair genes and 
improvement of methodologies and re- 
sources for quantifying and characteriz- 
ing genetic polymorphisms and their 
relationship to genetic susceptibilities. 

To carry out its national research and de- 
velopment obligations, OBER conducts 
the following activities: 

• Sponsors peer-reviewed research and 
development projects at universities, 
in the private sector, and at DOE na- 
tional laboratories (see box, p. 59). 

• Considers novel, beneficial initiatives 
with input from the scientific commu- 
nity and governmental sectors. 

• Provides expertise to various govern- 
mental working groups. 

• Supports the capabilities of multi- 
disciplinary DOE national laborato- 
ries and their unique user facilities 
for the nation's benefit (p. 61). 

Human Genome Program resources and 
technologies are focused on sequencing 
the human genome and related infor- 
matics and supportive infrastructure (see 
chart and tables, p. 62). The genomes of 
selected microorganisms are analyzed 
under the separate Microbial Genome
Program.



Major DOE User Facilities and Resources 
Relevant to Molecular Biology Research 

Although the genome program is contributing fundamental information about the structure of chromosomes
and genes, other types of knowledge are required to understand how genes and their products function.
Three-dimensional protein structure studies are still essential because structure cannot be predicted
fully from its encoded DNA sequence.

To enhance these and other studies, DOE builds and maintains structural biology user facilities that enable
scientists to gain an understanding of relationships between biological structures and their functions, study
disease processes, develop new pharmaceuticals, and conduct basic research in molecular biology and
environmental processes. These resources are used heavily by both academic and private-sector scientists.

Other important resources available to the research community include the clone libraries developed in the
National Laboratory Gene Library Project and distributed worldwide, the GRAIL Online Sequence
Interpretation Service, and the Mouse Genetics Research Facility.

Argonne National Laboratory 
Advanced Photon Source 

Brookhaven National Laboratory 
High-Flux Beam Reactor 
National Synchrotron Light Source 
Protein Structure Data Bank 
Scanning Transmission Electron Microscope

Lawrence Berkeley National Laboratory
Advanced Light Source
Center for X-Ray Optics
National Energy Research Scientific Computing Center 

Lawrence Livermore National Laboratory 
National Laboratory Gene Library Project 

Los Alamos National Laboratory 

National Flow-Cytometry Resource 
National Laboratory Gene Library Project 
Neutron-Scattering Center 

Oak Ridge National Laboratory 

GRAIL, Online Sequence Interpretation Service 
Mouse Genetics Research Facility 

Pacific Northwest National Laboratory 

Environmental Molecular Sciences Laboratory 

Stanford University 

Synchrotron Radiation Laboratory 



Human Genome Program 


Operating Expenditures and FY 1998 Projected Budget 
for the DOE Human Genome Program

Coordination and Resources 

Program coordination is the responsibility of the Human Genome Task Group (see 
box, p. 60), which, beginning in 1997, includes Elbert Branscomb, the Joint Genome 
Institute's Scientific Director. The task group is aided by the Biotechnology Consor- 
tium (which succeeded the former Human Genome Coordination Committee; see 
box, p. 60) to foster information exchange and dissemination. The task group
administers the DOE Human Genome Program and its evolving needs and reports to the

Associate Director for Biological and 
Environmental Research (currently 
Aristides Patrinos). The task group ar- 
ranges periodic workshops and coor- 
dinates site reviews for genome 
centers, the Joint Genome Institute, 
databases, and other large projects. It 
also coordinates peer review of research 
proposals, administration of awards, and 
collaboration with all concerned agen- 
cies and organizations. 

The Biotechnology Consortium provides
the OBER Associate Director with
external expertise in all aspects of
genomics and informatics and a
mechanism by which OBER can keep track of
the latest developments in the field. It
facilitates development and dissemination
of novel genome technologies throughout
the DOE system, ensures appropriate
management and sharing of data and
resources by all DOE contractors and
grantees, and promotes interactions with
other national and international
genomic entities.

Human Genome Program Fiscal Year Expenditures ($M)

Year     Operating   Capital Equipment   Construction
1996     68.3        5.6                 5.7
1997     73.9        6.0                 1.0
1998*    79.9        5.2                 0.0

*Projected expenses.

[Chart: Human Genome Program Operating Funds Distribution in FY 1996 ($K)]

*Includes DOE laboratories' nonresearch costs but not U.S. government administration or SBIR.
**DOE contribution to the International Human Frontiers Neurosciences Program.




The DOE Human Genome Program
communicates information in a variety
of ways. These communication systems
include the Human Genome Management
Information System (HGMIS),
projects in the Ethical, Legal, and Social
Issues (ELSI) Program, electronic
resources, meetings, and fellowships.
Some of these mechanisms are described
below. For more details, see Research
Highlights, ELSI projects, p. 18.


HGMIS provides technical communica- 
tion and information services for the 
DOE OBER Human Genome Program 
Task Group. HGMIS is charged with 

(1) helping to communicate genome- 
related matters and research to contrac- 
tors, grantees, other (nongenome project) 
researchers, and other multipliers of in- 
formation pertaining to genetic research; 

(2) serving as a clearinghouse for inquir- 
ies about the U.S. genome project; and 

(3) reducing research duplication by pro- 
viding a forum for interdisciplinary in- 
formation exchange (including resources 
developed) among genetic investigators.

HGMIS publishes the newsletter Human 
Genome News, sponsored by OBER. 
Over 14,000 HGN subscribers include
genome and basic researchers at national 
laboratories, universities, and other re- 
search institutions; professors and teach- 
ers; industry representatives; legal 
personnel; ethicists; students; genetic 
counselors; physicians; science writers; 
and other interested individuals. 

HGMIS also produces the DOE Primer 
on Molecular Genetics; a compilation of 
ELSI abstracts; and reports on the DOE 
Human Genome and Microbial Genome 
Programs, contractor-grantee work- 
shops, and other related subjects. 

Electronic versions of the primer and 
other HGMIS publications are available 
via the World Wide Web. HGMIS also

initiates and maintains other related 
Web sites (see DOE Electronic Genome 
Resources section below and DOE Web 
Sites at right). 

In addition to their print and online pub- 
lishing efforts, HGMIS staff members 
answer questions generated via Web 
sites, telephone, fax, and e-mail. They 
also furnish customized information 
about the genome project for multipliers 
of information (contact: Betty Mansfield 
at 423/576-6669, Fax: /574-9888, 

DOE Electronic Genome
Resources

Web Sites. The DOE Human Genome 
Program Home Page displays pointers 
to other programs within OBER and the 
Office of Energy Research. Links are 
made to additional biological and envi- 
ronmental information and to HGMIS, 
Genome Database, and other sites. 

HGMIS initiates and maintains the 
searchable Human Genome Project In- 
formation Web site. This site contains 
more than 1700 text files of information
for multidisciplinary technical audiences 
as well as for lay persons interested in 
learning about the science, goals, 
progress, and history of the project. Us- 
ers include almost all levels of students; 
education, medical, and legal profes- 
sionals; genetic society and support 
group members; biotechnology and 
pharmaceutical industry personnel; ad- 
ministrators; policymakers; and the press. 

The site also houses a section of
frequently asked questions, a quick fact
finder, Primer on Molecular Genetics,
all issues of Human Genome News,
DOE Human Genome Program and
contractor-grantee workshop reports,
To Know Ourselves, historical
documents, research abstracts, calendars of
genome events, and hundreds of links to
genome research and educational sites.
More than 1000 other Web pages link to
this site, resulting in more than 100,000
text file transfers each month. This

DOE Web Sites

DOE Human Genome Program

Office of Energy Research

Human Genome Project

DOE and Related Meetings




HGMIS site has received a Four-Star
designation from the Magellan Group
and the Editor's Choice Award from
LookSmart.

Genome-project and related meetings 
are listed at a Web site (see box, p. 63), 
through which users can register and 
submit research abstracts. Another listed 
related site discusses issues at the criti- 
cal intersection of genetics and the court 
system. This Web page is part of a 
project to educate and prepare the judi- 
ciary for the coming onslaught of cases 
involving genetic issues and data. 

Newsgroup. The Human Genome Pro- 
gram Newsgroup operates through the 
BIOSCI electronic bulletin board net- 
work to allow researchers worldwide to 
communicate, share ideas, and find
solutions to problems. Genome-related in-
formation is distributed through the 
newsgroup, including requests for grant 
applications, reports from recent scien- 
tific and advisory meetings, announce- 
ments of future events, and listings of 
free software and services {gnome-pr@ or

Postdoctoral Fellowships 

OBER established the Human Genome 
Distinguished Postdoctoral Research 
Program in 1990 to support research on 
projects related to the DOE Human Ge- 
nome Program. Beginning in FY 1996, 
the Human Genome Distinguished 
Postdoctoral Fellowships were merged 
with the Alexander Hollaender Distin- 
guished Postdoctoral Fellowships, 
which provide support in all areas of 
OBER-sponsored research. Postdoctoral 
programs are administered by the Oak 
Ridge Institute for Science and Educa- 
tion, a university consortium and DOE 
contractor. For additional information,
contact Linda Holmes (423/576-3192, 
holme si® or see the Web site 

Human Genome Distinguished
Postdoctoral Fellows

Names of past and current fellows in genome topics are given below
with their research institutions and titles of proposed research. For 1996
research abstracts, refer to Index of Principal and Coinvestigators on
p. 71 in Part 2 of this report.

1994 Mark Graves (Baylor College of Medicine): Graph Data
Models for Genome Mapping

William Hawe (Duke University): Synthesis of Peptide Nucleic
Acids for DNA Sequencing by Hybridization

Jingyue Ju (University of California, Berkeley): Design,
Synthesis, and Use of Oligonucleotide Primers Labeled with
Energy Transfer-Coupled Dyes

Mark Shannon (Oak Ridge National Laboratory): Comparative
Study of a Conserved Zinc Finger Gene Region

1995 Evan Eichler (Lawrence Livermore National Laboratory):
Identification, Organization, and Characterization of Zinc
Finger Genes in a 2-Mb Cluster on 19p12

Kelly Ann Frazer (Lawrence Berkeley National Laboratory):
In Vivo Complementation of the Murine Mutations Grizzled,
Mocha, and Jittery

Soo-In Hwang (Lawrence Berkeley National Laboratory):
Positional Cloning of Oncogenes on 20q13.2

James Labrenz (University of Washington, Seattle): Error
Analysis of Principal Sequencing Data and Its Role in Process
Optimization for Genome-Scale Sequencing Projects

Marie Ruiz-Martinez (Northeastern University): Multiplex
Purification Schemes for DNA Sequencing-Reaction Products:
Application to Gel-Filled Capillary Electrophoresis

Todd Smith (University of Washington, Seattle): Managing the
Flow of Large-Scale DNA Sequence Information

Alexander Hollaender Distinguished
Postdoctoral Fellows in Genome Research

Cymbeline Culiat (Oak Ridge National Laboratory): Cloning
of a Mouse Gene Causing Severe Deafness and Balance

Tau-Mu Yi (Laboratory of Structural Biology and Molecular
Medicine, Los Angeles): Structure-Function Analysis of
Alpha-Factor Receptor

1997 Jeffrey Koshi (Los Alamos National Laboratory): Construction,
Analysis, and Use of Optimal DNA Mutation Matrices

Sandra McCutchen-Maloney (Lawrence Livermore National
Laboratory): Structure and Function of a Damage-Specific
Endonuclease Complex



Coordination with Other Genome Programs 

The U.S. Human Genome
Project is supported jointly
by the Department of Energy
(DOE) and the National
Institutes of Health
(NIH), each of which emphasizes
different facets. The two agencies
coordinate their efforts through development
of common project goals and joint
support of some programs addressing
ethical, legal, and social issues (ELSI)
arising from new genome tools,
technology, and data.

Extraordinary advances in genome re- 
search are due to contributions by many 
investigators in this country and abroad. 
In the United States, such research (in- 
cluding nonhuman) also is funded by 
other federal agencies and private foun- 
dations and groups. Many countries are 
major contributors to the project through 
international collaborations and their own 
focused programs. Coordinating and 
facilitating these diverse research ef- 
forts around the world is the aim of 
the nongovernmental international 
Human Genome Organisation. 

Some details of U.S. and worldwide 
coordination are provided below. 

U.S. Human Genome 
Project: DOE and NIH 

In 1988 DOE and NIH developed a
Memorandum of Understanding that
formalized the coordination of their
efforts to decipher the human genome and
thus "enhance the human genome
research capabilities of both agencies." In
early 1990 they presented Congress
with a joint plan, Understanding Our
Genetic Inheritance. The U.S. Human
Genome Project: The First Five Years
(1991-1995). Referred to as the
Five-Year Plan, it contained short-term
scientific goals for the coordinated, multiyear
research project and a comprehensive
spending plan. Unexpectedly rapid
progress in mapping prompted early
revision of the original 5-year goals in the
fall of 1993 [Science 262, 43-46
(October 1, 1993)]. Current goals, which run
through September 30, 1998, are listed
on page 5; text of both 5-year plans is
accessible via the Web {http://www.omL

DOE and NIH have adopted a joint 
policy to promote sharing of genome 
data and resources for facilitating 
progress and reducing duplicated work. 
(See Appendix B: DOE-NIH Sharing 
Guidelines, p. 75.) 

ELSI Considerations 

NIH and DOE devote at least 3% of
their respective genome program
budgets to identifying, analyzing, and
addressing the ELSI considerations
surrounding genome technology and
the data it produces. The DOE ELSI
component focuses on research into
the privacy and confidentiality of
personal genetic information, genetics
relevant to the workplace,
commercialization (including patenting) of genome
research data, and genetic education for
the general public and targeted
communities. The NIH ELSI component
supports studies on a range of ethical issues
surrounding the conduct of genetic
research and responsible clinical
integration of new genetic technologies,
especially in testing for mutations
associated with cystic fibrosis and heritable
breast, ovarian, and colon cancers.

In 1990, the DOE-NIH Joint ELSI
Working Group was established to
identify, address, and develop policy
options; stimulate bioethics research;
promote education of professional and
lay groups; and collaborate with such
international groups as the Human
Genome Organisation (HUGO); United
Nations Educational, Scientific, and
Cultural Organization; and the
European Community. Research funded by
the U.S. Human Genome Project
through the joint working group has
produced policy recommendations
in various areas. In May 1993, for



example, the DOE-NIH Joint ELSI 
Working Group Task Force on Genetic 
Information and Insurance issued a re- 
port with recommendations for manag- 
ing the impact of advances in human 
genetics on the current system of 
healthcare coverage. In 1996, the work- 
ing group released guidelines for inves- 
tigators on using DNA from human 
subjects for large-scale sequencing 
projects. The guidance emphasizes nu- 
merous ways to preserve donor ano- 
nymity [see Appendix C, p. 77, and the 
World Wide Web ( 

In 1997, following an evaluation, the
two agencies modified the ELSI working
group into the ELSI Research and
Program Evaluation Group (ERPEG).
ERPEG will focus more specifically on 
research activities supported by DOE 
and NIH ELSI programs. 

Other U.S. Programs 

The potential impact of genome re- 
search on society and the rapid growth 
of the biotechnology industry have 
spurred the initiation of other genome 
research projects in this country and 
worldwide. These projects aim to create 
maps of the human genome and the ge- 
nomes of model organisms and several 
economically important microbes, 
plants, and animals. 

• The DOE Microbial Genome Pro- 
gram, begun in 1994, is producing 
complete genome sequence data on 
industrially important microorgan- 
isms, including those that live under 
extreme environmental conditions. 
The sequences of several microbial 
genomes have been completed. 
ober/EPR/mig_top. html] 

• In 1990, the National Science
Foundation, DOE, and the U.S. Department
of Agriculture (USDA) initiated a
project to map and sequence the
genome of the model plant Arabidopsis
thaliana. The goal of this project
is to enhance fundamental
understanding of plant processes. In 1996, the
three agencies began funding
systematic, large-scale genomic sequencing
of the 120-megabase Arabidopsis
genome, with the goal of completing
it by 2004, with DOE support
through the Office of Basic Energy
Sciences.

• USDA also funds animal genome 
research projects designed to obtain 
genome maps for economically
important species (e.g., corn, soybeans,
poultry, cattle, swine, and sheep) to 
enable genetic modifications that will 
increase resistance to diseases and 
pests, improve nutrient value, and 
increase productivity. 

• The Advanced Technology Program 
(ATP) of the U.S. National Institute 
of Standards and Technology pro- 
motes industry-government partner- 
ships in DNA sequencing and 
biotechnology through the Tools for 
DNA Diagnostics component. DOE 
staff participates in the ATP review 
process (see box, p. 22). [http://www.] 

• In 1997 the NIH National Cancer In- 
stitute established the Cancer Ge- 
nome Anatomy Project (CGAP) to 
develop new diagnostic tools for un- 
derstanding molecular changes that 
underlie all cancers {http://www. DOE 
researchers are generating clone 
libraries to support this effort. 


The current DOE-NIH Five-Year Plan 
commends the "spirit of international 
cooperation and sharing" that has char- 
acterized the Human Genome Project 
and played a major role in its success. 
Cooperation includes collaborations 
among laboratories in the United States 


and abroad as well as extensive sharing 
of materials and information among 
genome researchers around the world. 
The DOE Human Genome Program 
supports many international collabo- 
rations as well as grantees in several 
foreign institutions. 

Collaborations involving the DOE hu- 
man genome centers include mapping 
chromosomes 16 and 19, developing re- 
sources, and constructing the human 
gene map from shared cDNA libraries. 
These libraries were generated by the 
Integrated Molecular Analysis of Gene 
Expression (called IMAGE) Consor- 
tium initiated by groups at Lawrence 
Livermore National Laboratory, Colum- 
bia University, NIH National Institute 
of Mental Health, and Généthon.

Investigators from almost every major 
sequencing center in the world met in 
Bermuda in February 1996 and again in
1997 to discuss issues related to large- 
scale sequencing. These meetings were 
designed to help researchers coordinate, 
compare, and evaluate human genome 
mapping and sequencing strategies; 
consider new sequencing and infor- 
matics technologies; and discuss re- 
lease of data. 

Human Genome Organisation

Founded by scientists in 1989, HUGO 
is a nongovernmental international 
organization providing coordination 
functions for worldwide genome efforts. 
HUGO activities range from support of 
data collation for constructing genome 

maps to organizing workshops. HUGO 
also fosters exchange of data and 
biomaterials, encourages technology 
sharing, and serves as a coordinating 
agency for building relationships among 
various government funding agencies
and the genome community. 

HUGO offers short-term (2- to 10-week) 
travel awards up to $1500 for investiga- 
tors under age 40 to visit another coun- 
try to learn new methods or techniques 
and to facilitate collaborative research 
between the laboratories. 

HUGO has worked closely with interna- 
tional funding agencies to sponsor 
single-chromosome workshops (SCWs) 
and other genome meetings. Due to the 
success of these workshops as well as 
the shift in emphasis from mapping to 
sequencing, DOE and NIH began to 
phase out their funding for international 
SCWs in FY 1996 but encouraged appli- 
cations for individual SCWs as needed. 
In 1996, HUGO partially funded an in- 
ternational strategy meeting in Bermuda 
on large-scale sequencing. Principles re- 
garding data release and a resources list 
developed at the meeting are available 
on the HUGO Web site (http://hugo.gdb.

Membership in HUGO (over 1000 
people in more than 50 countries) is 
extended to persons concerned with 
human genome research and related 
scientific subjects. Its current president 
is Grant R. Sutherland (Adelaide Women 
and Children's Hospital, Australia). 
Directed by an 18-member interna- 
tional council, HUGO is supported by 
grants from the Howard Hughes Medi- 
cal Institute and The Wellcome Trust. 

Countries with 
Genome Programs 

Countries with genome
programs or strong pro-
grams in human genetics
include Australia, Brazil,
Canada, China, Denmark,
European Union, France, 
Germany, Israel, Italy, 
Japan, Korea, Mexico, 
Netherlands, Russia, 
Sweden, United Kingdom, 
and United States. 




Appendix A: Early History, Enabling Legislation (1984-90)

Appendix B: DOE-NIH Sharing Guidelines (1992)

Appendix C: Human Subjects Guidelines (1996)

Appendix D: Genetics on the World Wide Web (1997)

Appendix E: 1996 Human Genome Research Projects (1996)

Appendix F: DOE BER Program (1997)





Appendix A 
DOE Human Genome Program: Early History, Enabling Legislation

A brief history of the U.S. Department of Energy (DOE) Hu- 
man Genome Program will be useful in a discussion of the 
objectives of the DOE program as well as those of the col-
laborative U.S. Human Genome Project. The DOE Office of
Biological and Environmental Research (OBER)
and its predecessor agencies — the Atomic Energy Commis- 
sion and the Energy Research and Development Administra- 
tion — have long sponsored research into genetics, both in 
microbial systems and in mammals, including basic studies 
on genome structure, replication, damage, and repair and the 
consequences of genetic mutations. (See Appendix E for 
a discussion of the DOE Biological and Environmental 
Research Program.) 

In 1984, OBER [then named Office of Health and Environ- 
mental Research (OHER)] and the International Commission 
on Protection Against Environmental Mutagens and Carcino- 
gens cosponsored a conference in Alta, Utah, which high- 
lighted the growing roles of recombinant DNA technologies. 
Substantial portions of the meeting's proceedings were incor-
porated into the Congressional Office of Technology Assess-
ment report, Technologies for Detecting Heritable Mutations
in Humans, in which the value of a reference sequence of the 
human genome was recognized. 

Acquisition of such a reference sequence was, however, far 
beyond the capabilities of biomedical research resources 
and infrastructure existing at that time. Although the 

small genomes of several microbes had been mapped or par- 
tially sequenced, the detailed mapping and eventual sequenc- 
ing of 24 distinct human chromosomes (22 autosomes and 
the sex chromosomes X and Y) that together comprise an 
estimated 3 billion subunits was a task some thousandsfold 

DOE OHER was already engaged in several multidisciplinary
projects contributing to the nation's biomedical capabilities, 
including the GenBank DNA sequence repository, which 
was initiated and sustained by DOE computer and data- 
management expertise. Several major user facilities support- 
ing microstructure research were developed and are main- 
tained by DOE. Unique chromosome-processing resources 
and capabilities were in place at Los Alamos National Labo- 
ratory and Lawrence Livermore National Laboratory. Among 
these were the fluorescence-activated cell sorter (called 
FACS) systems to purify human chromosomes within the 
National Laboratory Gene Library Project for the production 
of libraries of DNA clones. The availability of these mono- 
chromosomal libraries opened an important path — a practical 
means of subdividing the huge total genome into 24 much 
more manageable components. 

With these capabilities, OHER began in 1986 to consider the 
feasibility of a dedicated human genome program. Leading 
scientists were invited to the March 1986 international con- 
ference at Santa Fe, New Mexico, to assess the desirability 

Enabling Legislation 

In the United States, the first federal 
support for genetics research was 
through the Atomic Energy Commis-
sion. In the early days of nuclear en-
ergy development, the focus was on
radiation effects and later broadened
under the Energy Research and De-
velopment Administration (ERDA)
and the Department of Energy to in-
clude the health implications of all
energy technologies and their
by-products. Major enabling legisla-
tion follows.

Atomic Energy Act of 1946

(P.L. 79-585): Provided the initial
charter for a comprehensive program
of research and development related
to the utilization of fissionable and
radioactive materials for medical,
biological, and health purposes.

Atomic Energy Act of 1954 

(P.L. 83-703): Further authorized
AEC "to conduct research on the bio-
logic effects of ionizing radiation."

Energy Reorganization Act of 1974 

(P.L. 93-438): Provided that responsi-
bilities of ERDA should include "en-
gaging in and supporting environ-
mental, biomedical, physical, and
safety research related to the develop-
ment of energy resources and utiliza-
tion technologies."

Federal Non-Nuclear Energy
Research and Development Act of
1974 (P.L. 93-577): Authorized
ERDA to conduct a comprehensive
non-nuclear energy research, devel-
opment, and demonstration program
to include the environmental and so-
cial consequences of the various tech-
nologies.

DOE Organization Act of 1977 

(P.L. 95-91): Instructed the depart-
ment "to assure incorporation of na-
tional environmental protection goals
in the formulation and implementa-
tion of energy programs; and to ad-
vance the goal of restoring, protect-
ing, and enhancing environmental
quality, and assuring public health
and safety," and to conduct "a com-
prehensive program of research and
development on the environmental
effects of energy technology and



and feasibility of implementing such a project. With virtual
unanimity, participants agreed that ordering and eventually 
sequencing DNA clones representing the human genome 
were desirable and feasible goals. With the receipt of this 
enthusiastic response, OHER initiated several pilot projects.
Program guidance was further sought from the DOE Health 
Effects Research Advisory Committee (HERAC). 

HERAC Recommendation 

The April 1987 HERAC report recommended that DOE and 
the nation commit to a large, multidisciplinary scientific and
technological undertaking to map and sequence the human 
genome. DOE was particularly well suited to focus on re- 
source and technology development, the report noted; 
HERAC further recommended a leadership role for DOE 
because of its demonstrated expertise in managing complex 
and long-term multidisciplinary projects involving both the
development of new technologies and the coordination of 
efforts in industries, universities, and its own laboratories. 

Evolution of the nation's Human Genome Project further ben- 
efited from a 1988 study by the National Research Council 
(NRC) entitled Mapping and Sequencing the Human Ge- 
nome, which recommended that the United States support this 
research effort and presented an outline for a multiphase plan. 

DOE and NIH Coordination 

The National Institutes of Health (NIH) was a necessary par- 
ticipant in the large-scale effort to map and sequence the hu- 
man genome because of its long history of support for bio- 
medical research and its vast community of scientists. This 
was confirmed by the NRC report, which recommended a 
major role for NIH. In 1987, under the leadership of Director 
James Wyngaarden, NIH established the Office of Genome 
Research in the Director's Office. In 1988, DOE and NIH
signed a Memorandum of Understanding in which the agen- 
cies agreed to work together, coordinate technical research 
and activities, and share results. In 1990, DOE and NIH sub- 
mitted a joint research plan outlining short- and long-term 
goals of the project.



Appendix B 
DOE-NIH Guidelines for Sharing Data and Resources

At its December 7, 1992, meeting, the DOE-NIH Joint Sub-
committee on the Human Genome approved the following
sharing guidelines, developed from the DOE draft of Septem-
ber 1991.*

The information and resources generated by the Human Ge- 
nome Project have become substantial, and the interest in 
having access to them is widespread. It is therefore desirable 
to have a statement of philosophy concerning the sharing of 
these resources that can guide investigators who generate the 
resources as well as those who wish to use them. 

A key issue for the Human Genome Project is how to pro- 
mote and encourage the rapid sharing of materials and data 
that are produced, especially information that has not yet 
been published or may never be published in its entirety. 
Such sharing is essential for progress toward the goals of the 
program and to avoid unnecessary duplication. It is also de- 
sirable to make the fruits of genome research available to the 
scientific community as a whole as soon as possible to expe- 
dite research in other areas. 

Although it is the policy of the Human Genome Project to 
maximize outreach to the scientific community, it is also nec- 
essary to give investigators time to verify the accuracy of 
their data and to gain some scientific advantage from the ef- 
fort they have invested. Furthermore, in order to assure that 
novel ideas and inventions are rapidly developed to the ben- 
efit of the public, intellectual property protection may be 
needed for some of the data and materials. 

After extensive discussion with the community of genome 
researchers, the advisors of the NIH and DOE genome pro-
grams have determined that consensus is developing around
the concept that a 6-month period from the time the data or 
materials are generated to the time they are made available 
publicly is a reasonable maximum in almost all cases. More 
rapid sharing is encouraged. 

Whenever possible, data should be deposited in public data- 
bases and materials in public repositories. Where appropriate 
repositories do not exist or are unable to accept the data or 
materials, investigators should accommodate requests to the 
extent possible. 

The NIH and DOE genome programs have decided to re- 
quire all applicants expecting to generate significant amounts 
of genome data or materials to describe in their application 
how and when they plan to make such data and materials 
available to the community. Grant solicitations will specify 
this requirement. These plans in each application will be re- 
viewed in the course of peer review and by staff to assure 
they are reasonable and in conformity with program philoso- 
phy. If a grant is made, the applicant's sharing plans will be- 
come a condition of the award and compliance will be re- 
viewed before continuation funding is provided. Progress 
reports will be asked to address the issue. 

*Reprinted from Human Genome News 4(5), 4 (1993).






Appendix C
NIH-DOE Guidance on Human Subjects Issues
in Large-Scale DNA Sequencing

Date issued: August 9, 1996

The Human Genome Project (HGP) is now entering into
large-scale DNA sequencing. To meet its complete sequenc-
ing goal, it will be necessary to recruit volunteers willing to
contribute their DNA for this purpose. The guidance pro-
vided in this document is intended to address ethical issues
that must be considered in designing strategies for recruit-
ment and protection of DNA donors for large-scale

Nothing in this document should be construed to differ from,
or substitute for, the policies described in the Federal Regu-
lations for the Protection of Human Subjects [45CFR46
(NIH) and 10CFR745 (DOE)]. Rather, it is intended to
supplement those policies by focusing on the particular is-
sues raised by large-scale human DNA sequencing. This
statement addresses six topics: (1) benefits and risks of ge-
nomic DNA sequencing; (2) privacy and confidentiality; (3)
recruitment of DNA donors as sources for library construc-
tion; (4) informed consent; (5) IRB approval; and (6) use of
existing libraries.

The guidance provided in this statement is intended to afford
maximum protection to DNA donors and is based on the be-
lief that protection can best be achieved by a combination of
approaches including:

• ensuring that the initial version of the complete human
DNA sequence is derived from multiple donors;

• providing donors with the opportunity to make an in-
formed decision about whether to contribute their DNA
to this project; and

• taking effective steps to ensure the privacy and confi-
dentiality of donors.

1. Benefits and Risks of Genomic DNA Sequencing

The HGP offers great promise for the improvement of human
health. As a consequence of the HGP, there will be a more
thorough understanding of the genetic bases of human biol-
ogy and of many diseases. This, in turn, will lead to better
therapies and, perhaps more importantly, prevention strate-
gies for many of those diseases. Similarly, as the technology
developed by the HGP is applied to understanding the biol-
ogy of other organisms, many other human activities will be
affected including agriculture, environmental management,
and biologically based industrial processes.

While the HGP offers great promise to humanity, there will
be no direct benefit, in either clinical or financial terms, to
any of the individuals who choose to donate DNA for
large-scale sequencing. Rather, the motivation for donation is
likely to be an altruistic willingness to contribute to this his-
toric research effort.

However, individuals who donate DNA to this effort may 
face certain risks. Information derived from the donors will 
become available in public databases. Such information may 
reveal, for example, DNA sequence-based information about
disease susceptibility. If the donor becomes aware of such 
information, it could lead to emotional distress on her/his 
part. If such health-related information becomes known to 
others, discrimination against the donor (e.g., in insurance or 
in employment) could result. Unwanted notoriety is another 
potential risk to donors. Therefore, those engaged in 
large-scale sequencing must be sensitive to the unique fea- 
tures of this type of research and ensure that both the protec- 
tions normally afforded research subjects and the special is- 
sues associated with human genomic DNA sequencing are 
thoroughly addressed. 

While some risks to donors can already be identified, the
probability of adverse events materializing appears to be 
low. However, the risks of harm to individuals will increase 
if confidentiality is not maintained and/or the number of do- 
nors is limited to a very few individuals. Either, or both, of 
these situations would increase the possibility of a donor's 
identity being revealed without his/her knowledge or 

A final issue to consider is characterized in a statement taken 
from the OPRR Guidebook,¹ which points out that "some ar-
eas [of genetic research] present issues for which no clear
guidance can be given at this point, either because enough is 
not known about the risks presented by the research, or be- 
cause no consensus on the appropriate resolution of the prob- 
lem exists." It is anticipated that the DNA sequence informa- 
tion produced by the Human Genome Project will be used in 
the future for types of research which cannot now be pre- 
dicted and the risks of which cannot be assessed or disclosed. 

2. Privacy and Confidentiality

In general, one of the most effective ways of protecting vol-
unteers from the unexpected, unwelcome, or unauthorized use
of information about them is to ensure that there are no op- 
portunities for linking an individual donor with information 
about him/her that is revealed by the research. By not col- 
lecting information about the identity of a research subject 
and any biological material or records developed in the 
course of the research, or by subsequently removing all 



identifiers ("anonymizing" the sample), the possibility of risk 
to the subject stemming from the results of the research is 
greatly reduced. Large-scale DNA sequence determination 
represents an exception because each person's DNA sequence 
is unique and, ultimately, there is enough information in any 
individual's DNA sequence to absolutely identify her/him. 
However, the technology that would allow the unambiguous 
identification of an individual from his/her DNA sequence is 
not yet mature. Thus, for the foreseeable future, establishing 
effective confidentiality, rather than relying on anonymity, 
will be a very useful approach to protecting donors. 

Investigators should introduce as many disconnects between 
the identity of donors and the publicly available information 
and materials as possible. There should not be any way for any- 
one to establish that a specific DNA sequence came from a par- 
ticular individual, other than resampling an individual's DNA 
and comparing it to the sequence information in the public data- 
base. In particular, no phenotypic or demographic information 
about donors should be linked to the DNA to be sequenced.²
For the purposes of the HGP such information will rarely be 
useful, and recording such information could result in possible 
misuse and compromise donor confidentiality. 

Confidentiality should be "two way." Not only should others 
be unable to link a DNA sequence to a particular individual, 
but no individual who donates DNA should be able to confirm 
directly that a particular DNA sequence was obtained from
their DNA sample.³ This degree of confidentiality will pre-
clude the possibility of re-contacting DNA donors, providing
another degree of protection for them. It should be clear to 
both investigators and to donors that the contact involved in 
obtaining the initial specimen will be the only contact.⁴

Another approach for protecting all DNA donors is to reduce 
the incentive for wanting to know the identities of particular 
donors. If the initial human sequence is a "mosaic" or "patch-
work" of sequenced regions derived from a number of differ-
ent individuals, rather than that of a single individual, there
would be considerably less interest in who the specific donors 
were. Although there may be scientific justification that each 
clone library used for sequencing should be derived from one
person, there is no scientific reason that the entire initial hu- 
man DNA sequence should be that of a single individual. As 
approximately 99.9% of the human DNA sequence is common 
between any two individuals, most of the fundamental bio- 
logical information contained in the human DNA sequence is 
common to all people. 
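
The scale implied by the 99.9% figure can be made concrete with a short calculation. The sketch below is illustrative arithmetic only: it combines that figure with the estimate of 3 billion subunits quoted in Appendix A, and the function and constant names are hypothetical, not part of the guidance.

```python
# Illustrative arithmetic only. Both inputs are the approximate figures
# quoted in the report, not measured values.
GENOME_SIZE_BASES = 3_000_000_000  # "an estimated 3 billion subunits"
SHARED_FRACTION = 0.999            # "approximately 99.9%" common to two people


def expected_differences(genome_size: int, shared_fraction: float) -> int:
    """Approximate count of positions that differ between two genomes."""
    return round(genome_size * (1.0 - shared_fraction))


if __name__ == "__main__":
    print(expected_differences(GENOME_SIZE_BASES, SHARED_FRACTION))
```

Under these assumptions, two individuals would differ at roughly 3 million positions, which is small relative to the whole sequence but still enough to make each person's sequence unique, as section 2 notes.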

To increase the likelihood that the first human DNA sequence 
will be an amalgam of regions sequenced from different 
sources, a number of clone libraries must be made available. 
Although a number of large insert libraries have been made, 
most do not meet all of the standards set in this document; 
therefore, these libraries should be used as substrates for 
large-scale sequencing only under circumscribed conditions 
(see section 6). Starting immediately, new libraries
will be developed that have the advantage of being con- 
structed in accordance with the ethical principles discussed 
in this document; they may also confer some additional sci- 
entific benefit. Such libraries are critical for the long-range 
needs of the HGP. 

3. Source/Recruitment of DNA Donors 
for Library Construction 

Another implication of the fact that 99.9% of the human 
DNA sequence is shared by any two individuals is that the 
backgrounds of the individuals who donate DNA for the first 
human sequence will make no scientific difference in terms 
of the usefulness and applicability of the information that 
results from sequencing the human genome. At the same 
time, there will undoubtedly be some sensitivity about the 
choice of DNA sources. There are no scientific reasons why 
DNA donors should not be selected from diverse pools of 
potential donors.⁵

There are two additional issues that have arisen in consider- 
ing donor selection. These warrant particular discussion: 

• It is recognized that women have historically been 
underrepresented in research, so it can be anticipated 
that concerns might arise if males (sperm DNA) were 
used exclusively as the source of DNA for large-scale 
sequencing. Although there would be no scientific basis 
for concern, because even in the case of a male source, 
half of the donor's DNA would have come from his 
mother and half from his father, nevertheless perceptions 
are not to be dismissed. While the choice of donors will 
not be dictated to investigators, it is expected that, be- 
cause multiple libraries will be produced, a number of 
them will be made from female sources while others will 
be made from male sources. 

• Staff of laboratories involved in library construction and
DNA sequencing may be eager to volunteer to be donors 
because of their interest and belief in the HGP. However, 
proximity to the research may create some special vul- 
nerabilities for laboratory staff members. It is also pos- 
sible that they will feel pressure to donate and there may 
be an increased likelihood that confidentiality would be 
breached. Finally, there is a potential that the choice of 
persons so closely involved in the research may be inter-
preted as elitist. For all of these reasons, it is recom-
mended that donors should not be recruited from labora-
tory staff, including the principal investigator.


4. Informed Consent 

Obtaining informed consent specifically for the purpose of 
donating DNA for large-scale sequencing raises some unique 
concerns. Because anonymity cannot be guaranteed and con- 
fidentiality protections are not absolute, the disclosure pro- 
cess to potential donors must clearly specify what the pro- 
cess of DNA donation involves, what may make it different 
from other types of research, and what the implications are 
of one's DNA sequence information being a public scientific 

Federal regulations (45CFR46 and 10CFR745) require the
disclosure of a number of issues in any informed consent 
document. They include such issues as potential benefits of 
the research, potential risks to the donor, control and owner- 
ship of donated material, long-term retention of donated ma- 
terial for future use, and the procedures that will be followed. 
In addition, there are several other disclosures that are of 
special importance for donors of DNA for large-scale se- 
quencing. These include: 

• the meaning of confidentiality and privacy of informa-
tion in the context of large-scale DNA sequencing, and
how these issues will be addressed; 

• the lack of opportunity for the donor to later withdraw 
the libraries made from his/her DNA or his/her DNA 
sequence information from public use; 

• the absence of opportunity for information of clinical 
relevance to be provided to the donor or her/his family; 

• the possibility of unforeseen risks; and 

• the possible extension of risk to family members of the 
donor or to any group or community of interest (e.g., 
gender, race, ethnicity) to which a donor might belong. 

Many academic human genetics units have considerable ex- 
perience in dealing with research subjects and obtaining in- 
formed consent, while the laboratories that are likely to be 
involved in making the libraries for sequencing have, in gen- 
eral, much less experience of this type. Therefore, library 
makers are encouraged to establish a collaboration with one 
or more human genetics units, with the latter being respon- 
sible for recruiting donors, obtaining informed consent, ob- 
taining the necessary biological samples, and providing a 
blinded sample to the library maker. Collaboration with tis-
sue banks may be considered as long as these banks are col-
lecting tissues in accordance with this guidance. The library
maker should have no contact with the donor and no oppor- 
tunity to obtain any information about the donor's identity. 

5. IRB Approval 

Effective immediately, projects to construct libraries for 
large-scale DNA sequencing must obtain Institutional Re- 
view Board (IRB) approval before work is initiated. IRBs 
should carefully consider the unique aspects of large-scale 
sequencing projects. Some of the informed consent provi- 
sions outlined may be somewhat at odds with the usual and 
customary disclosures found in most protocols involving hu-
man subjects and which IRBs usually consider. For example,
research subjects usually are given the opportunity to with- 
draw from a research project if they change their minds 
about participating. In the case of donors for large-scale se- 
quencing, it will not be possible to withdraw either the librar- 
ies made from their DNA or the DNA sequence information 
obtained using those libraries once the information is in the 
public domain. By the time a significant amount of DNA se- 
quence data has been collected, the libraries, as well as indi- 
vidual clones from them, will have been widely distributed 
and the sequence information will have been deposited in 
and distributed from public databases. In addition, there will 
be no possibility of returning information of clinical rel- 
evance to the donor or his/her family. 

6. Use of Existing Libraries for 
Large-Scale Sequencing 

Many of the existing libraries (including those derived from 
anonymous donors) were not made in complete conformity 
with the principles elaborated above. The potential risks that 
may result from their use will be minimized by the rapid in- 
troduction of several new libraries constructed in accordance 
with this guidance, which NCHGR and DOE are taking steps 
to initiate. This will ensure that the existing libraries will 
only contribute small amounts to the first complete human
DNA sequence. In the interim, existing libraries can continue 
to be used for large-scale sequencing, only if IRB approval 
and consent for "continued use" are obtained⁶ and approval
by the funding agency is granted. 

It is important that in obtaining consent for continued use of
existing libraries, no coercion of the DNA donor occur. It is 
therefore recommended that consideration be given to 
whether it is appropriate for the individual who previously 
recruited the donor to recontact him/her to obtain this con- 
sent. In some cases an IRB may determine that the recontact 
should be made by a third party to assure that the donors are
fully informed and allowed to choose freely whether their 
DNA can continue to be used for this purpose. 




This document is intended to provide guidance to investiga- 
tors and IRBs who are involved in large-scale sequencing 
efforts. It is designed to alert them to special ethical con- 
cerns that may arise in such projects. In particular, it pro- 
vides guidance for the use of existing and the construction 
of new DNA libraries. Adhering to this guidance will ensure 
that the initial version of the complete human sequence is 
derived from multiple, diverse donors; that donors will have 
the opportunity to make an informed decision about 
whether to contribute their DNA to this project; and that 
effective steps will be taken by investigators to ensure the 
privacy and confidentiality of donors. 

Investigators funded by NCHGR and DOE to develop new 
libraries for large-scale human DNA sequencing will be re- 
quired to have their plans for the recruitment of DNA do- 
nors, including the informed consent documents, reviewed 
and approved by the funding agency before donors are re- 
cruited. Investigators involved in large-scale human se- 
quencing will also be asked to observe those aspects of this 
guidance that pertain to them. 

Approved August 17, 1996, by:

Francis S. Collins, M.D., Ph.D., Director, National Center
for Human Genome Research, National Institutes
of Health

Aristides N. Patrinos, Ph.D., Associate Director, Office of
Health and Environmental Research, U.S. Department
of Energy


1. Office of Protection from Research Risks, Protecting
Human Research Subjects: Institutional Review Board 
Guidebook (OPRR: U.S. Government Printing Office, 

2. It is recognized that it will be trivially easy to deter-
mine the sex of the donor of the library, by assaying for the
presence or absence of Y chromosome in the library. 

3. There are a number of approaches to preventing a 
DNA donor from knowing that his/her DNA was actually 
sequenced as part of the HGP. For example, each time a 
clone library is to be made, an appropriately diverse pool of 
between five and ten volunteers can be chosen in such a 
way that none of them knows the identity of anyone else in 
the pool. Samples for DNA preparation and for preparation 
of a cell line can be collected from all of the volunteers 
(who have been told that their specimen may or may not 
eventually be used for DNA sequencing) and one of those 
samples is randomly and blindly selected as the source actu- 
ally used for library construction. In this way, not only will 
the identity of the individual whose DNA is chosen not be 
known to the investigators, but that individual will also not 
be sure that s/he is the actual source. 
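
The pooled selection described in this note can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a procedure prescribed by the guidance: the function name and the sample codes are hypothetical, and it assumes the samples have already been relabeled with codes that carry no identifying information.

```python
import secrets


def select_library_source(coded_sample_ids):
    """Pick one coded sample uniformly at random from a pool of five to ten.

    The caller is assumed to hold only anonymous codes, so neither the
    person running the draw nor any volunteer learns whose DNA was used.
    """
    pool = list(coded_sample_ids)
    if not 5 <= len(pool) <= 10:
        raise ValueError("pool should contain between five and ten samples")
    if len(set(pool)) != len(pool):
        raise ValueError("sample codes must be unique")
    # secrets.choice draws from the operating system's random source,
    # so the outcome cannot be reproduced from a known seed.
    return secrets.choice(pool)


if __name__ == "__main__":
    codes = [f"S{n:02d}" for n in range(1, 8)]  # seven hypothetical codes
    print(select_library_source(codes))
```

With a pool of this size, each volunteer has at most a one-in-five chance of being the source, which is what keeps any individual donor from being sure that his or her sample was the one sequenced.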

4. Although recontacting donors should not be possible, 
investigators will potentially want to be able to resample a 
donor's genome. Thus, at the time the initial specimen is ob- 
tained, in addition to making a clone library representing the 
donor's genome, it should also be used to prepare an addi- 
tional aliquot of high molecular weight DNA for storage and 
a permanent cell line. Either resource could then be used as a 
source of the donor's genome in case additional DNA were 
needed or comparison with the results of the analysis of the 
cloned DNA were desired. 

5. There has been discussion in the scientific community 
about the sex of DNA donors. A library prepared from a fe-
male donor will contain DNA from the X chromosome in an
amount equivalent to the autosomes, but will completely lack 
Y chromosomal DNA. Conversely, a library prepared from a 
male donor will contain Y DNA, but both X and Y DNA will 
only be present at half the frequency of the DNA from the 
other chromosomes. Scientifically, then, there are both ad- 
vantages and disadvantages inherent in the use of either a 
male or a female donor. The question of the sex of the donor 
also involves the question of the use of somatic or germ line 
DNA to make libraries. For making libraries, useful amounts 
of germ line DNA can only be obtained from a male source 
(i.e., from sperm); it is not possible to obtain enough ova 
from a female donor to isolate germ line DNA for this pur-
pose. Opinion is divided in the scientific community about
whether germ line or somatic DNA should be used for 
large-scale sequencing. Somatic DNA is known to be rear- 
ranged, relative to germ line DNA, in certain regions (e.g., 
the immunoglobulin genes) and the possibility has been
raised that other developmentally based rearrangements may 
occur, although no example of the latter has been offered. 
While some believe that the sequence product should not 
contain any rearrangements of this sort, others consider this 
potential advantage of germ line DNA to be relatively minor 
in comparison to the need to have the X chromosome fully 
represented in sequencing efforts and prefer the use of so- 
matic DNA. 
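
The arithmetic behind the representation argument in item 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the guidance: it assumes the library is made from somatic DNA of a donor with a standard 46,XX or 46,XY karyotype and that every chromosome is cloned with equal efficiency. The names COPIES_PER_CELL and relative_representation are hypothetical.

```python
from fractions import Fraction

# Assumed chromosome copies per diploid somatic cell (46,XX and 46,XY karyotypes).
COPIES_PER_CELL = {
    "female": {"autosome": 2, "X": 2, "Y": 0},
    "male": {"autosome": 2, "X": 1, "Y": 1},
}


def relative_representation(donor_sex):
    """Expected library representation of each chromosome class, relative to an autosome."""
    copies = COPIES_PER_CELL[donor_sex]
    return {name: Fraction(count, copies["autosome"]) for name, count in copies.items()}


if __name__ == "__main__":
    for sex in ("female", "male"):
        shares = relative_representation(sex)
        print(sex, {name: str(share) for name, share in shares.items()})
```

Under these assumptions the female library carries X at the same level as the autosomes and no Y, while the male library carries X and Y each at half the autosomal level, which is the trade-off the paragraph describes.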

6. Individuals whose DNA was used for library construc- 
tion (with the exception of those created from deceased or 
anonymous individuals) should be fully informed about the
risks and benefits described above, should freely choose 
whether they would like their DNA to continue to be used for 
this purpose, and their decision should be documented. 


Executive Summary of Joint 
NIH-DOE Human Subjects 

1. Those engaged in large-scale sequencing must be
sensitive to the unique features of this type of research 
and ensure that both the protections normally afforded 
research subjects and the special issues associated with 
human genomic DNA sequencing are thoroughly 

2. For the foreseeable future, establishing effective 
confidentiality, rather than relying on anonymity, will be 
a very useful approach to protecting donors. 

3. Investigators should introduce as many disconnects 
between the identity of donors and the publicly available 
information and materials as possible. 

4. No phenotypic or demographic information about 
donors should be linked to the DNA to be sequenced. 

5. There are no scientific reasons why DNA donors should
not be selected from diverse pools of potential donors. 

6. While the choice of donors will not be dictated to 
investigators, it is expected that, because multiple 
libraries will be produced, a number of them will be 
made from female sources while others will be made 
from male sources. 

7. It is recommended that donors should not be recruited 
from laboratory staff, including the principal investigator 

8. The disclosure process to potential donors must clearly 
specify what the process of DNA donation involves, 
what may make it different from other types of research, 
and what the implications are of one's DNA sequence 
information being a public scientific resource. 

9. Library makers are encouraged to establish a collabora- 
tion with one or more human genetics units [or tissue 

10. The library maker should have no contact with the donor 
and no opportunity to obtain any information about the 
donor's identity. 

11. Effective immediately, projects to construct libraries for
large-scale DNA sequencing must obtain Institutional 
Review Board (IRB) approval before work is initiated. 

12. Existing libraries can continue to be used for large-scale 
sequencing, only if IRB approval and consent for 
continued use are obtained and approval by the funding 
agency is granted. 

13. It is important that in obtaining informed consent for 
continued use of existing libraries, no coercion of the 
DNA donor occur. 

DOE Human Genome Program Report, Appendices 81 


Appendix D 
Human Genome Project and Genetics on the World Wide Web 

August 1997

The World Wide Web offers the easiest path to information
about the Human Genome Project and related genetics topics.
Some useful sites to visit are included in the list below.

Human Genome Project

DOE Human Genome Program

http://www.rr.ilor gKv/prxHluition/obfr/huglop.hlml 

Devoted to the DOE component of the U.S. Human Genome
Project and to the DOE Microbial Genome Program.
Links to many other sites.

Human Genome Project Information

http://www. orrtl. f;t>v/hymi.s 

Comprehensive site covering topics related to the U.S.
and worldwide Human Genome Projects. Useful for updating
scientists and providing educational material for
nonscientists, in support of DOE's commitment to public
education. Developed and maintained for DOE by the
Human Genome Management Information System
(HGMIS) at Oak Ridge National Laboratory.

NIH National Human Genome Research Institute

http://www. nhgn.nih.fiiH' 

Site of the NIH sector of the U.S. Human Genome

DOE Human Genome Program

* Human Genome News 


Quarterly newsletter reporting on the worldwide Human
Genome Project

Biological Sciences Curriculum Study (BSCS) Teaching

Online versions in preparation; hard copies available
ftom7m/.S.M .S.S.SO 

• "Genes, Environment, and Human Behavior," tentative
title, in preparation

• "Mapping and Sequencing the Human Genome:
Science, Ethics, and Public Policy" (1992)

• "The Human Genome Project: Biology, Computers,
and Privacy" (1996)

*Print copy available from HGMIS (see p. 87 or inside front cover
for contact information).

• "The Puzzle of Inheritance: Genetics and the Methods
of Science" (1997)

*Primer on Molecular Genetics, 1992


Explains the science behind the genome research. 

*To Know Ourselves, 1996


Booklet reviewing DOE's role, history, and achieve- 
ments in the Human Genome Project and introducing 
the science and other aspects of the project. 

Ethical, Legal, and Social Issues Related
to Genetics Research

HGMIS Gateways Web page

http://www.oml gov^gmi.t/link.t.html

Choose "Ethical, Legal, and Social Issues."

Center for Bioethics, University of Pennsylvania


Full-text articles about such ethical issues as human
cloning; includes a primer on bioethics.

Courts and Science On-Line Magazine (CASOLM)


Coverage of genetic issues affecting the courts. 

ELSI in Science

Teaching modules designed to stimulate discussion on
implications of scientific research.

Eubios Ethics Institute jp/~macrr/indrx.html

Site includes newsletter summarizing literature in bioethics
and biotechnology.

Genetic Privacy Act 

http:/Avww.oml gov/hgmi.t/rr.tourrr/rl.ti.htm\ 

Model legislation written with support of the DOE Hu- 
man Genome Program. 

MCET: The Human Genome Project

http://phorni\ mcri.rdu/humangrnome/index.html

ELSI issues for high school students.


National Bioethics Advisory Committee

hllp://www.nih.t;()v/nhar/nha( him

The bioethics committee offers advice to the National
Science and Technology Council and others on bioethical
issues arising from research related to human biology
and behavior.

National Center for Genomic Resources


Comprehensive Genetics and Public Issues page; in- 
cludes congressional bills related to genetic privacy. 

The Gene Letter

Bimonthly newsletter to inform consumers and profes- 
sionals about advances in genetics and encourage dis- 
cussion about emerging policy dilemmas. 

Your Genes, Your Choices


Booklet written in simple English, describing the Human
Genome Project; the science behind it; and how
ethical, legal, and social issues raised by the project may 
affect people's everyday lives. 

General Genetics and Biotechnology 

Many of the following sites contain links to both educational 
and technical material. 

HGMIS Community Education and Outreach Gateways 
Web Page 


Access Excellence (om/ae/index.html

Extensive genetic and biotechnology resources for
teachers and nonscientists.

BIO Online (Biotechnology Industry Organization)

http://www.hto. ( om 

Comprehensive directory of biotechnology sites on the 


Biotech industry site; profiles biotech companies by 


An interactive educational resource and biotech reference
tool; includes a dictionary of 6000 life science

Biotechnology Information Center, USDA National
Agricultural Library 

http://www. nal. 

Comprehensive agricultural biotechnology resource; 
includes a bibliography on patenting biotechnology
products and processes (.http://www.nal.usda.g0v/hi1:/ 

Bugs 'N Stuff 

List of microbial genomes being sequenced, research
groups, genome sizes, and facts about selected organisms.
Links to related sites.

Careers in Genetics 

hiip //www 

Online booklet from the Genetics Society of America,
including several profiles of geneticists. See also career
sections of sites specified above, such as Access Excellence.

Carolina Biological Supply Company

htlp://www( arost I tom/lips him 

Teaching materials for all levels. Includes mini-lessons
on selected scientific topics, two online magazines,
What's New, software, catalogs, and publications.

Cell & Molecular Biology Online 

http://www.iiai nn/usrrs/pmgannon 

Links to electronic publications, current research, educational
and career resources, and more.

CERN Virtual Library, Genetics section, Biosciences

genetics. html

Includes an organism index linking to other pertinent information
on the U.S. and international Human Genome Projects, and links
to research sites.

Classic Papers in Genetics


Covers the early years, with introductory notes. See also
Access Excellence site above for genetics history. 


Community of Science Web Server 


Links to Medline, U.S. Patent Citation Database, Commerce
Business Daily, The Federal Register, and other

Database of Genome Sizes 


Lists numerous organisms with genome sizes, scientific 
and common names, classifications, and references. 

Genetic and biological resources links 

Genetics Education Center, University of Kansas Medical 

Educational information on human genetics, career re- 

Genetics Glossary 


Glossary of terms related to genetics. 

Genetics Webliography 

Extensive links for researchers and nonscientists from
Georgetown University Library. 

Genomics: A Global Resource 

Many links. Website a joint project of the Pharmaceuti- 
cal Research and Manufacturers of America and the 
American Institute of Biological Sciences; includes 
Genomics Today, a daily update on the latest news in the 

Hispanic Educational Genome Project 

Designed to educate high school students and their fami- 
lies about genetics and the Human Genome Project. 
Links to other projects.

Howard Hughes Medical Institute 

Home page of major U.S. philanthropic organization 
that supports research in genetics, cell biology, immu- 
nology, structural biology, and neuroscience. Excellent 
introductory information on these topics. 

Library of Congress 


Microbial Database 

http://www. html 

Lists completed and in-progress microbial genomes, 
with funding sources. 

MIT Biology Hypertextbook<:gbio/700lmain.html 

All the basics. 

Science and Mathematics Resources 

http://www:%ci lib. uci. edu 

More than 2000 Web references, including Frank 
Potter's Science Gems and Martindale's Health Science 
Guide. For teachers at all levels. 

Virtual Courses on the Web 

http://lenti. med. umn. edu/~mwd/courses. html 

Links to Web tutorials in biology, genetics, and more. 

Welch Web 


Links to many Internet biomedical resources, dictionaries, 
encyclopedias, government sites, libraries, and more, from 
the Johns Hopkins University Welch Library. 

Why Files 

Illustrated explanations of the science behind the news. 

Images on the Web 

Biochemistry Online 

http://biochem.arach-net. com 

Essays, courses, 3-D images of biomolecules, modeling,

Bugs in the News! 

Microbiology information and a nice collection of im- 
ages of biological molecules. 

Cells Alive! 

Images (some moving) of different types of cells. 


Cn3D (See in 3-D) 


3-D molecular structure viewer allowing the user to visual- 
ize and rotate structure data entries from Entrez. Highly 
technical, for researchers. 

Cytogenetics Gallery 


Photos (karyotypes) of normal and abnormal chromo- 

DNA Learning Center, Cold Spring Harbor Laboratory 


Animated images of PCR and Southern Blotting tech- 

Gene Map from the 1996 Genome Issue of Science 

Click on particular areas of chromosomes and find genes. 

Images of Biological Molecules 

3-D structures of proteins and nucleic acids obtained from 
Brookhaven National Laboratory Protein Database and 

Lawrence Livermore National Laboratory Chromosome 19 
Physical Map 

Los Alamos National Laboratory Chromosome 16 
Physical Map 


Science Magazine Genome Issue (10/96)'icience/content/vot274/ 

Full text includes a "clickable" gene map. 

Science News 


Online version of weekly popular science magazine with 
full text of selected articles. 

Journals and Magazines 

HGMIS Journals Gateways Web page 


Choose "Journals, Books, Periodicals." 

Biochemistry and Molecular Biology Journals 

hnp://biochem. arach-net. com/beasley/joumals. html 

Comprehensive list. 

Nature, Nature Genetics, and Nature Biotechnology 

http://w WW. nature, com 

Abstracts of articles, full text of letters and editorials.

Science Magazine 

http://www. .iciencemag. org 

Abstracts and some full-text articles. 

Medical Genetics

Blazing a Genetic Trail 

Illustrated booklet from the Howard Hughes Medical 
Institute on hunting for disease genes. 

Directory of National Genetic Voluntary Organizations 
and Related Resources 

Support groups for people with genetic diseases and 
their families. 


A database of more than 6000 genes; describes their 
functions, products, and biomedical applications. 

Gene Therapy 

Web course covering the basics, with links to other sites. 

Inherited-Disease Genes Found by Positional Cloning 

Links to OMIM. 

NIH Office of Recombinant DNA Activities 

Includes a database of human gene therapy protocols. 

Online Mendelian Inheritance in Man (OMIM) 

A comprehensive, authoritative, and up-to-date human 
gene and genetic disorder catalog that supports medical 
genetics and the Human Genome Project 


Promoting Safe and Effective Genetic Testing in the 
United States (1997) 

Principles and recommendations by a joint NIH-DOE 
Human Genome Project group that examined the devel- 
opment and provision of gene tests in the United States. 

Understanding Gene Testing 

Illustrated brochure from the National Cancer Institute. 

Science in the News 

InScight: hnp://^cight 

Short summaries of major stories, some with links to 
related articles in other sources. 

HMS Beagle 

Biweekly electronic journal featuring major science 
stories, profiles, book reviews, and other items of interest 

Science Daily 

Headline stories, articles, and links to news services, 
newspapers, magazines, broadcast sources, journals, and 
organizations. Also offers weekly bulletins for updates 
by e-mail. 

Science Guide 

Daily news and information service and free science 
news e-mailer. Also contains directories of newsgroups,
grant and funding resources, employment, and online 


http://www. sciencenow. org 

Daily online news service from Science magazine offers 
articles on major science news. 

Web Search Tools 

Biosciences Index to WWW Virtual Library 


"Search the Net" 

Comprehensive list of search tools, libraries, world fact 
books, and other useful information. 



Prepared August 1997 by 

Human Genome Management Information System 

Oak Ridge National Laboratory 

1060 Commerce Park, MS 6480 

Oak Ridge, TN 37830 


DOE Human Genome Program Report, Appendices 87 


Appendix E 
1996 Human Genome Research Projects 

Research abstracts of these projects appear in Part 2 of this report.


Advanced Detectors for Mass Spectrometry 
W.H. Benner and J.M. Jaklevic 

Lawrence Berkeley National Laboratory, Berkeley, California 

Mass Spectrometer for Human Genome 


Chung-Hsuan Chen 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

Genomic Sequence Comparisons 
George Church 

Harvard Medical School, Boston, Massachusetts 

A PAC/BAC End-Sequence Data Resource for
Sequencing the Human Genome: A 2-Year Pilot
Pieter de Jong

Roswell Park Cancer Institute, Buffalo, New York

Multiple-Column Capillary Gel Electrophoresis 
Norman Dovichi 

University of Alberta, Edmonton, Canada 

DNA Sequencing with Primer Libraries 

John J. Dunn and F. William Studier 

Brookhaven National Laboratory, Upton, New York 

Rapid Preparation of DNA for Automated 


John J. Dunn and F. William Studier 

Brookhaven National Laboratory, Upton, New York

A PAC/BAC End-Sequence Database for 
Human Genomic Sequencing 

Glen A. Evans 

University of Texas Southwestern Medical Center, Dallas, Texas 

Automated DNA Sequencing by Parallel Primer 

Glen A. Evans 

University of Texas Southwestern Medical Center, Dallas, Texas 

*Parallel Triplex Formation as Possible
Approach for Suppression of DNA-Viruses
V.L. Florentiev 

Russian Academy of Sciences, Moscow, Russia 

Advanced Automated Sequencing Technology: 
Fluorescent Detection for Multiplex DNA 
Raymond F. Gesteland 

University of Utah, Salt Lake City, Utah 

Resource for Molecular Cytogenetics 
Joe Gray and Daniel Pinkel 

University of California, San Francisco 

DNA Sample Manipulation and Automation 

Trevor Hawkins 

Whitehead Institute and Massachusetts Institute of Technol- 
ogy, Cambridge, Massachusetts 

Construction of a Genome-Wide Characterized
Clone Resource for Genome Sequencing
Leroy Hood, Mark D. Adams,1 and Melvin Simon2

University of Washington, Seattle

1The Institute for Genomic Research, Rockville, Maryland

2California Institute of Technology, Pasadena, California

DNA Sequencing Using Capillary Electrophoresis 
Barry L. Karger 

Northeastern University, Boston, Massachusetts 

Ultrasensitive Fluorescence Detection of DNA 
Richard A. Mathies and Alexander N. Glazer 

University of California, Berkeley 

Joint Human Genome Program Between 
Argonne National Laboratory and the 
Engelhardt Institute of Molecular Biology 
Andrei Mirzabekov 

Argonne National Laboratory, Argonne, Illinois, and 
Engelhardt Institute of Molecular Biology, Moscow, Russia 

High-Throughput DNA Sequencing: SAmple
SEquencing (SASE) Analysis as a Framework
for Identifying Genes and Complete
Large-Scale Genomic Sequencing
Robert K. Moyzis

Los Alamos National Laboratory, Los Alamos, New Mexico

One-Step PCR Sequencing 
Barbara Ramsay Shaw 

Duke University, Durham, North Carolina 

*Projects designated by an asterisk were funded through small emergency
grants to Russian scientists following December 1992 site reviews by David
Galas (formerly of OHER, renamed OBER in 1997), Raymond Gesteland
(University of Utah), and Elbert Branscomb (LLNL).


Automation of the Front End of DNA Sequencing

Lloyd M. Smith and Richard A. Guilfoyle 

University of Wisconsin, Madison 

High-Speed DNA Sequence Analysis by Matrix- 
Assisted Laser Desorption Mass Spectrometry 

Lloyd M. Smith 

University of Wisconsin, Madison 

Analysis of Oligonucleotide Mixtures by 
Electrospray Ionization-Mass Spectrometry
Richard D. Smith 

Pacific Northwest National Laboratory, Richland, Washington 

High-Speed Sequencing of Single DNA Mol- 
ecules in the Gas Phase by FTICR-MS 
Richard D. Smith 

Pacific Northwest National Laboratory, Richland, Washington 

Characterization and Modification of DNA 
Polymerases for Use in DNA Sequencing 

Stanley Tabor 

Harvard University, Boston, Massachusetts 

Modular Primers for DNA Sequencing 
Levy Ulanovsky1,2

1Argonne National Laboratory, Argonne, Illinois
2Weizmann Institute of Science, Rehovot, Israel

Time-of-Flight Mass Spectroscopy of DNA for 
Rapid Sequence 
Peter Williams 

Arizona State University, Tempe, Arizona 

Development of Instrumentation for DNA 
Sequencing at a Rate of 40 Million Bases Per Day 
Edward S. Yeung 

Iowa State University, Ames, Iowa 


Resolving Proteins Bound to Individual DNA 


David Allison and Bruce Warmack 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

*Improved Cell Electrotransformation by


Alexandre S. Boitsov 

St. Petersburg State Technical University, St. Petersburg, Russia 

Overcoming Genome Mapping Bottlenecks 
Charles R. Cantor 

Boston University, Boston, Massachusetts 

Preparation of PAC Libraries 
Pieter J. de Jong 

Roswell Park Cancer Institute, Buffalo, New York 

Chromosomes by Third-Strand Binding 
Jacques R. Fresco 

Princeton University, Princeton, New Jersey 

Chromosome Region-Specific Libraries for 
Human Genome Analysis 
Fa-Ten Kao 

Eleanor Roosevelt Institute for Cancer Research, Denver, 

*Identification and Mapping of DNA-Binding
Proteins Along Genomic DNA by DNA-Protein
Crosslinking
V.L. Karpov 

Engelhardt Institute of Molecular Biology, Russian Academy 
of Sciences, Moscow, Russia 

A PAC/BAC Data Resource for Sequencing 
Complex Regions of the Human Genome: 
A 2-Year Pilot Study
Julie R. Korenberg 

Cedars Sinai Medical Center, Los Angeles, California 

Mapping and Sequencing of the Human 
X Chromosome 

D. L. Nelson 

Baylor College of Medicine, Houston, Texas 

*Sequence-Specific Proteins Binding to the 

Repetitive Sequences of High Eukaryotic 


Olga Podgornaya 

Institute of Cytology, Russian Academy of Sciences, 
St. Petersburg, Russia 

*Protein-Binding DNA Sequences
O.L. Polanovsky 

Engelhardt Institute of Molecular Biology, Russian Academy 
of Sciences, Moscow, Russia 


*Development of Intracellular Flow Karyotype


A.I. Poletaev 

Engelhardt Institute of Molecular Biology, Russian Academy 
of Sciences, Moscow, Russia 

Mapping and Sequencing with BACs and 


Melvin I. Simon 

California Institute of Technology, Pasadena, California 

Towards a Globally Integrated, 
Sequence-Ready BAC Map of the Human
Melvin I. Simon

California Institute of Technology, Pasadena, California

Generation of Normalized and Subtracted 
cDNA Libraries to Facilitate Gene Discovery 
Marcelo Bento Soares 

Columbia University, New York, New York 

Mapping in Man-Mouse Homology Regions 

Lisa Stubbs 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

Positional Cloning of Murine Genes 

Lisa Stubbs 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

Human Artificial Episomal Chromosomes 
(HAECS) for Building Large Genomic Libraries 
Jean-Michel H. Vos 

University of North Carolina, Chapel Hill 

*Cosmid and cDNA Map of a Human
Chromosome 13q14 Region Frequently Lost
at B Cell Chronic Lymphocytic Leukemia 

N.K. Yankovsky 

N.I. Vavilov Institute of General Genetics, Moscow, Russia 


BCM Server Core 
Daniel Davison 

Baylor College of Medicine, Houston, Texas 

A Freely Sharable Database-Management 
System Designed for Use in Component-Based, 
Modular Genome Informatics Systems 
Nathan Goodman 

The Jackson Laboratory, Bar Harbor, Maine 

A Software Environment for Large-Scale 


Mark Graves 

Baylor College of Medicine, Houston, Texas 

Generalized Hidden Markov Models for 
Genomic Sequence Analysis 
David Haussler 

University of California, Santa Cruz 

Identification, Organization, and Analysis of 
Mammalian Repetitive DNA Information 

Jerzy Jurka 

Genetic Information Research Institute, Palo Alto, California 

*TRRD, GERD and COMPEL: Databases on 
Gene-Expression Regulation as a Tool for 
Analysis of Functional Genomic Sequences 

N.A. Kolchanov 

Institute of Cytology and Genetics, Novosibirsk, Russia 

Data-Management Tools for Genomic Databases 

Victor M. Markowitz and I-Min A. Chen

Lawrence Berkeley National Laboratory, Berkeley, California 

The Genome Topographer: System Design 
T. Marr
Cold Spring Harbor Laboratory, Cold Spring Harbor, 
New York 

A Flexible Sequence Reconstructor for 
Large-Scale DNA Sequencing: A Customizable 
Software System for Fragment Assembly 
Gene Myers 

University of Arizona, Tucson 

The Role of Integrated Software and Databases 
in Genome Sequence Interpretation and 
Metabolic Reconstruction 
Ross Overbeek 

Argonne National Laboratory, Argonne, Illinois 


Database Transformations for Biological


G. Christian Overton, Susan B. Davidson, and 

Peter Buneman 

University of Pennsylvania, Philadelphia

Las Vegas Algorithm for Gene Recognition: 
Suboptimal and Error-Tolerant Spliced
Pavel A. Pevzner

University of Southern California, Los Angeles, California

Foundations for a Syntactic Pattern- 
Recognition System for Genomic DNA 
Sequences: Languages, Automata, Interfaces, 
and Macromolecules 
David B. Searls 

SmithKline Beecham Pharmaceuticals, King of Prussia, 

Analysis and Annotation of Nucleic Acid 


David J. States

Washington University, St. Louis, Missouri

Gene Recognition, Modeling, and Homology 
Search in GRAIL and genQuest 
Edward C. Uberbacher 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

Informatics Support for Mapping in 
Mouse-Human Homology Regions
Edward Uberbacher 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

SubmitData: Data Submission to Public 
Genomic Databases 
Manfred D. Zorn

Lawrence Berkeley National Laboratory, University of 
California, Berkeley 


The Human Genome: Science and the Social 
Consequences; Interactive Exhibits and Pro- 
grams on Genetics and the Human Genome 
Charles C. Carlson 

The Exploratorium, San Francisco, California 

Documentary Series for Public Broadcasting 
Graham Chedd and Noel Schwerin 

Chedd-Angier Production Company, Watertown, 

Human Genome Teacher Networking Project 
Debra L. Collins and R. Neil Schimke 

University of Kansas Medical Center, Kansas City, Kansas 

Human Genome Education Program 
Lane Conn 

Stanford Human Genome Center, Palo Alto, California 

Your World/Our World-Biotechnology & You: 
Special Issue on the Human Genome Project 
Jeff Davidson and Laurence Weinberger 

Pennsylvania Biotechnology Association, State College, 

The Human Genome Project and Mental 
Retardation: An Educational Program 

Sharon Davis 

The Arc of the United States, Arlington, Texas 

Pathways to Genetic Screening: Molecular 
Genetics Meets the High-Risk Family 
Troy Duster 

University of California, Berkeley 

Intellectual Property Issues in Genomics 

Rebecca S. Eisenberg 

University of Michigan Law School, Ann Arbor, Michigan 

AAAS Congressional Fellowship Program 
Stephen Goodman 

The American Society of Human Genetics, Bethesda, 

A Hispanic Educational Program for Scientific, 
Ethical, Legal, and Social Aspects of the Human 
Genome Project 
Margaret C. Jefferson and Mary Ann Sesma1

California State University and 1Los Angeles Unified School
District, Los Angeles, California

Implications of the Geneticization of Health
Care for Primary Care Practitioners
Mary B. Mahowald

University of Chicago, Chicago, Illinois


Nontraditional Inheritance: Genetics and the 
Nature of Science; Instructional Materials for 
High School Biology 
Joseph D. McInerney and B. Ellen Friedman

Biological Sciences Curriculum Study, Colorado Springs, 

The Human Genome Project: Biology, 
Computers, and Privacy: Development of 
Educational Materials for High School Biology 
Joseph D. McInerney and Lynda B. Micikas
Biological Sciences Curriculum Study, Colorado Springs, 

Involvement of High School Students in Se- 
quencing the Human Genome 
Maureen M. Munn, Maynard V. Olson, and Leroy Hood

University of Washington, Seattle 

The Gene Letter: A Newsletter on Ethical, Legal, 
and Social Issues in Genetics for Interested 
Professionals and Consumers 
Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt

The Shriver Center for Mental Retardation, Waltham, 

The DNA Files: A Nationally Syndicated Series 
of Radio Programs on the Social Implications of 
Human Genome Research and Its Applications 
Bari Scott

Genome Radio Project, KPFA-FM, Berkeley, California 

Communicating Science in Plain Language:
The Science + Literacy for Health: Human
Genome Project
Maria Sosa, Judy Kass, and Tracy Gath

American Association for the Advancement of Science,
Washington, D.C.

The Community College Initiative 

Sylvia J. Spengler and Laurel Egenberger 

Lawrence Berkeley National Laboratory, Berkeley, California 

Genome Educators 

Sylvia Spengler and Janice Mann 

Lawrence Berkeley National Laboratory, Berkeley, California 

Getting the Word Out on the Human Genome 
Project: A Course for Physicians 

Sara L. Tobin and Ann Boughton1

Stanford University, Palo Alto, California
1Thumbnail Graphics, Oklahoma City, Oklahoma

The Genetics Adjudication Resource Project 
Franklin M. Zweig 

Einstein Institute for Science, Health, and the Courts,
Bethesda, Maryland 


Alexander Hollaender Distinguished
Postdoctoral Fellowships

Linda Holmes and Eugene Spejewski 

Oak Ridge Institute for Science and Education, Oak Ridge, 


Human Genome Management Information 


Betty K. Mansfield and John S. Wassom 

Oak Ridge National Laboratory, Oak Ridge, Tennessee 

Human Genome Program Coordination 

Sylvia J. Spengler 

Lawrence Berkeley National Laboratory, Berkeley, California

Support of Human Genome Program Proposal 

Walter Williams

Oak Ridge Institute for Science and Education, Oak Ridge, 

Former Soviet Union Office of Health and 
Environmental Research Program 
James Wright 

Oak Ridge Institute for Science and Education, Oak Ridge, 


1996 Phase I 

An Engineered RNA/DNA Polymerase to 
Increase Speed and Economy of DNA 

Mark W. Knuth 

Promega Corporation, Madison, Wisconsin 


Directed Multiple DNA Sequencing and 
Expression Analysis by Hybridization 
Gualberto Ruano

BIOS Laboratories, Inc., New Haven, Connecticut 

1996 Phase II 

A Graphical Ad Hoc Query Interface Capable 
of Accessing Heterogeneous Public Genome 
Joseph Leone 

CyberConnect Corporation, Storrs, Connecticut 

Low-Cost Automated Preparation of Plasmid, 
Cosmid, and Yeast DNA 
William P. MacConnell 

MacConnell Research Corporation, San Diego, California 

GRAIL-GenQuest: A Comprehensive
Computational Framework for DNA Sequence

Ruth Ann Manning

ApoCom, Inc., Oak Ridge, Tennessee

Appendix F: DOE BER Program 

Text and photos in this appendix first appeared in a brochure
prepared by the Human Genome Management Information
System for the DOE Office of Biological and Environmental
Research to announce a symposium celebrating 50 years of
achievements in the Biological and Environmental Research
Program. "Serving Science and Society into the New
Millennium" was held on May 21-22, 1997, at the National
Academy of Sciences in Washington, D.C. The color
brochure and other recent publications related to BER
research, including the historically comprehensive A Vital
Legacy, may be obtained from HGMIS at the address on the
inside front cover.


Biological and Environmental Research Program

Aristides Patrinos, Ph.D.

Associate Director for Energy Research

for the

Office of Biological and Environmental Research

U.S. Department of Energy

301/903-3251, Fax 301/903-5051

DOE Biological and 
Environmental Research 

An Extraordinary Legacy 

To exploit the boundless promise of energy technologies and shed
light on their consequences to public health and the environment,
the Biological and Environmental Research program of the U.S.
Department of Energy's (DOE) Office of Health and
Environmental Research (OHER) has engaged in a variety of
multidisciplinary research activities:

• Establishing the world's first Human Genome Program.

• Developing advanced diagnostic tools and
treatments for human disease.

• Assessing the health effects of radiation. 


National User Facilities

Dedicated biomedical resources, such as
those maintained by BER at several DOE
laboratories, are available at little or no
charge. These resources enable scientists
to gain an understanding of relationships
between biological structures and their
functions, study disease processes,
develop new pharmaceuticals, and
conduct basic research in molecular
biology and environmental processes.

William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) is a
national collaborative user facility for providing innovative approaches to meet
the needs of DOE's environmental missions.


An Enduring Mandate 

DOE is carrying forward Congressional mandates that began 
with its predecessors, the Atomic Energy Commission and the 
Energy Research and Development Agency: 

Contribute to a Healthy Citizenry 

• Develop innovative technologies for tomorrow's 
biomedical sciences. 

• Provide the basis for individual risk assessments by 
determining the human genome's fine structure by the 
year 2005. 

• Conduct research into advanced medical technologies 
and radiopharmaceuticals. 

• Build and support national user facilities for 
determining biological structure, and ultimately 
function, at the molecular and cellular level. 

Understand Global Climate 

Predict the effects of energy production and its use on the 
regional and global environment by acquiring data and 
developing the necessary understanding of environmental 

Contribute to Environmental 

Conduct fundamental research to establish a better 
scientific basis for remediating contaminated sites. 

DOE user facilities are revealing the molecular details of 
life. Knowing the 3-D structure of the ras protein (above), 
an important molecular switch governing human cell 
growth, will enable interventions to shut off this switch in 
cancer cells. 


Determining the fine structure— DNA sequence — of the 
microorganism Methanococcus jannaschii (pictured at right, 
top) and other minimal life forms in DOE's Microbial
Genome Program will benefit medicine, agriculture,
industrial and energy production, and environmental
bioremediation. The circular representation of the single
M. jannaschii chromosome, which was fully sequenced in
1996, illustrates the location of genes and other important
features. (Vertical bar represents a portion of a sequencing 





Fifty Years of Achievements . . .

Leading to Innovative Solutions 

Tools for Medicine and Research 

Radioisotopes developed for medicine and medical imaging are 
being merged with current knowledge in biology and genetics to 
discover new ways of diagnosing and treating cancer and other 
disorders, detecting genes in action, and understanding normal 
development and function of human organ systems. 

• Radioactive molecules used in medical imaging for positron 
emission tomography (PET) and magnetic resonance imaging 
(MRI) allow noninvasive diagnosis, monitoring, and 
exploration of human disorders and their treatments. 

• Isotopes and other tracers of 
brain activity are being used to 
explore drug addiction, the 
effects of smoking, 
Alzheimer's disease, 
Parkinson's disease, and 

• Technetium-99m is used to 
diagnose diseases of the 
kidney, liver, heart, brain, and 
other organs in about 
13 million patients per year. 

• Striking successes have been 
achieved using charged atomic 
particles to treat thyroid diseases, 
pituitary tumors, and eye cancer, 
among other disorders. 

Genome Projects 

A legacy of DOE research on genetic 
effects paved the way for the world's 
first Human Genome Program. Now new 
genomic technologies are being applied 
to environmental cleanup through the 
DOE Natural and Accelerated 
Bioremediation Research and Microbial 
Genome programs, healthcare and risk 
assessment, and such other national 
priorities as industrial processes and 

One-quarter of all patients in U.S. 
hospitals undergo tests using descendants 
of cameras developed by BER to follow 
radioactive tracers in the body. PET
scanning has been key to a generation of 
brain metabolism studies as well as 
diagnostic tests for heart disease and 
cancer. PET studies above reveal brain
metabolism differences in recovering 
alcoholics (left, 10 days, and right, 
30 days, after withdrawal from alcohol). 

The laser-based flow 
cytometer developed at 
DOE national 
laboratories enables
researchers to separate 
human chromosomes 
for analysis. 


Discover the breadth of current activities and recent accomplishments via the BER Web Site: 


Radiation Risks and Protection Guidelines 

BER studies have become the foundation for laws and 
standards that protect the population, including workers 
exposed to radiological sources:

• Guidelines for the safe use of diagnostic X rays and 

• Safety standards for the presence of radionuclides in 
food and drinking water. 

• Radiation-detection systems and dosimetry 

Finding a Link Between DNA Damage 
and Cancers 

Studies of DNA damage have uncovered similar 
mechanisms at work in damage caused by radiation 
exposure, X rays, ultraviolet light, and cancer-causing 
chemicals. A screening test for such chemicals is now 
one of the first hurdles a new compound must clear on 
its way to regulatory and public acceptance.

Tracking the Regional and Global 
Movement of Pollutants 

BER research helped to establish the earliest and most 
authoritative monitoring network in the world to 
detect airborne radioisotopes. The use of atmospheric 
tracers has led to the improved ability to predict the 
dispersion of pollutants. 

Understanding Global Change 

Important achievements in environmental research 
have led to enhanced capabilities in studying global 
change, including more accurate predictions of 
global and regional climate changes induced by 
increasing atmospheric concentrations of 
greenhouse gases. 

Human chromosomes "painted" by fluorescent dyes to detect 
abnormal exchange of genetic material frequently present in 
cancer. Chromosome paints also serve as valuable resources for 
other clinical and research applications. 

". . . (it's) not so much where we stand
as in what direction we are moving."

[Oliver Wendell Holmes, Sr.]


computing is 
faster and 
more realistic 
solutions to 
climate change. 

The Unmanned Aerospace Vehicle (above) conducts 
measurements to quantify the fate of solar radiation falling on
the earth. 

Creating a New Science of Ecology 

BER achievements in using radioactive tracers to follow 
the movements of animals, routes of chemicals through 
food chains, decomposition of forest detritus, together 
with the program's introduction of computer simulations, 
created the new field of radioecology.






This glossary was adapted from definitions in the DOE
Primer on Molecular Genetics (1992).

Adenine (A): A nitrogenous base, one member of the base 
pair A-T (adenine-thymine). 

Allele: Alternative form of a genetic locus; a single allele for 
each locus is inherited separately from each parent (e.g.. at a 
locus for eye color the allele might result in blue or brown 

Amino acid: Any of a class of 20 molecules that are
combined to form proteins in living things. The sequence of
amino acids in a protein and hence protein function are
determined by the genetic code.

Amplification: An increase in the number of copies of a spe- 
cific DNA fragment; can be in vivo or in vitro. See cloning, 
polymerase chain reaction.

Arrayed library: Individual primary recombinant clones 
(hosted in phage, cosmid, YAC, or other vector) that are 
placed in two-dimensional arrays in microtiter dishes. Each 
primary clone can be identified by the identity of the plate 
and the clone location (row and column) on that plate. Ar- 
rayed libraries of clones can be used for many applications, 
including screening for a specific gene or genomic region of 
interest as well as for physical mapping. Information
gathered on individual clones from various genetic linkage and
physical map analyses is entered into a relational database
and used to construct physical and genetic linkage maps
simultaneously; clone identifiers serve to interrelate the
multilevel maps. Compare library, genomic library.
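
The plate-and-position addressing described for arrayed libraries can be sketched in Python. This is only an illustration: it assumes 96-well microtiter plates (8 rows labeled A to H, 12 columns) and clones numbered from 0, and the names clone_address and clone_index are hypothetical, not taken from the source.

```python
ROWS = "ABCDEFGH"            # assumed: 8 rows per microtiter plate
COLS = 12                    # assumed: 12 columns per plate
WELLS = len(ROWS) * COLS     # 96 wells per plate under these assumptions


def clone_address(index):
    """Map a 0-based clone number to (plate, row letter, column).

    Plate and column numbers are 1-based, as they would appear on a label.
    """
    if index < 0:
        raise ValueError("clone index must be non-negative")
    plate, well = divmod(index, WELLS)
    row, col = divmod(well, COLS)
    return plate + 1, ROWS[row], col + 1


def clone_index(plate, row, column):
    """Inverse of clone_address: recover the 0-based clone number."""
    return (plate - 1) * WELLS + ROWS.index(row) * COLS + (column - 1)
```

Because every clone has exactly one (plate, row, column) address, that address can serve as the identifier that ties the clone to its entries in the physical and linkage maps.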

Autoradiography: A technique that uses X-ray film to
visualize radioactively labeled molecules or fragments of
molecules; used in analyzing length and number of DNA
fragments after they are separated by gel electrophoresis.

Autosome: A chromosome not involved in sex
determination. The diploid human genome consists of 46
chromosomes, 22 pairs of autosomes, and 1 pair of sex
chromosomes (the X and Y chromosomes).


BAC: See bacterial artificial chromosome. 

Bacterial artificial chromosome (BAC): A vector used to
clone DNA fragments (100- to 300-kb insert size; average,
150 kb) in Escherichia coli cells. Based on naturally
occurring F-factor plasmid found in the bacterium E. coli.
Compare cloning vector.

Bacteriophage: See phage. 

Base pair (bp): Two nitrogenous bases (adenine and
thymine or guanine and cytosine) held together by weak bonds.
Two strands of DNA are held together in the shape of a 
double helix by the bonds between base pairs. 

Base sequence: The order of nucleotide bases in a DNA
molecule.

Base sequence analysis: A method, sometimes automated, 
for determining the base sequence. 

Biotechnology: A set of biological techniques developed
through basic research and now applied to research and
product development. In particular, the use by industry of
recombinant DNA, cell fusion, and new bioprocessing techniques.

bp: See base pair. 

cDNA: See complementary DNA. 

Centimorgan (cM): A unit of measure of recombination
frequency. One centimorgan is equal to a 1% chance that a
marker at one genetic locus will be separated from a marker
at a second locus due to crossing over in a single generation.
In human beings, 1 centimorgan is equivalent, on average, to
1 million base pairs.
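
The two equivalences in this definition (1 cM is a 1% chance of separation per generation, and roughly 1 million base pairs on average in humans) allow a rough conversion, sketched below in Python. The function names are hypothetical, and the results are order-of-magnitude estimates only: the definition gives a genome-wide average, and the simple proportional rule for recombination chance is assumed here to hold only for short distances.

```python
BP_PER_CM = 1_000_000   # human average stated in the definition above


def cm_to_bp(cm):
    """Approximate physical distance (base pairs) for a genetic distance in cM."""
    return cm * BP_PER_CM


def recombination_chance(cm):
    """Approximate chance per generation that two markers are separated.

    Uses the definition 1 cM = 1%; treated here as a small-distance
    approximation (an assumption of this sketch).
    """
    return cm / 100
```

For example, two markers 2.5 cM apart would be expected to lie on the order of 2.5 million base pairs apart.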

Centromere: A specialized chromosome region to which
spindle fibers attach during cell division.

Chromosome: The self-replicating genetic structure of cells
containing the cellular DNA that bears in its nucleotide
sequence the linear array of genes. In prokaryotes,
chromosomal DNA is circular, and the entire genome is carried on
one chromosome. Eukaryotic genomes consist of a number
of chromosomes whose DNA is associated with different
kinds of proteins.

Clone bank: See genomic library. 

Clone: A group of cells derived from a single ancestor. 

Cloning: The process of asexually producing a group of
cells (clones), all genetically identical, from a single
ancestor. In recombinant DNA technology, the use of DNA
manipulation procedures to produce multiple copies of a single
gene or segment of DNA is referred to as cloning DNA.

DOE Human Genome Program Report, Glossary 101


Cloning vector: DNA molecule originating from a virus, a
plasmid, or the cell of a higher organism into which another
DNA fragment of appropriate size can be integrated without
loss of the vector's capacity for self-replication; vectors
introduce foreign DNA into host cells, where it can be reproduced
in large quantities. Examples are plasmids, cosmids, and
yeast artificial chromosomes; vectors are often recombinant
molecules containing DNA sequences from several sources.

cM: See centimorgan. 

Code: See genetic code. 

Codon: See genetic code. 

Complementary DNA (cDNA): DNA that is synthesized 
from a messenger RNA template; the single-stranded form is 
often used as a probe in physical mapping. 

Complementary sequence: Nucleic acid base sequence that 
can form a double-stranded structure by matching base pairs 
with another sequence; the complementary sequence to 
G-T-A-C is C-A-T-G.
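
The base-by-base matching in this definition can be sketched in Python. The function names are hypothetical, and sequences are written without hyphens. The second function reads the partner strand in the opposite direction, a common convention that the definition above does not itself use.

```python
# Pairing rules for DNA: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]
```

Calling complement("GTAC") gives "CATG", matching the example in the definition.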

Conserved sequence: A base sequence in a DNA molecule 
(or an amino acid sequence in a protein) that has remained 
essentially unchanged throughout evolution. 

Contig: Group of clones representing overlapping regions of 
a genome. 

Contig map: A map depicting the relative order of a linked 
library of small overlapping clones representing a complete 
chromosomal segment.
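
The grouping of overlapping clones into contigs can be sketched as an interval-merging pass in Python. This is a simplification under a stated assumption: each clone's start and end coordinates are taken as already known, whereas in a real mapping project overlaps would have to be inferred from experimental data. The function name and the coordinates are hypothetical.

```python
def build_contigs(clones):
    """Group clones into contigs of overlapping members.

    clones: iterable of (name, start, end) tuples with assumed coordinates.
    Returns a list of contigs; each contig is a list of clone names in
    left-to-right order.
    """
    contigs = []
    current, current_end = [], None
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if current and start <= current_end:
            # Clone overlaps the contig built so far: extend it.
            current.append(name)
            current_end = max(current_end, end)
        else:
            # Gap found: close the previous contig and start a new one.
            if current:
                contigs.append(current)
            current, current_end = [name], end
    if current:
        contigs.append(current)
    return contigs
```

The order of names within each returned contig corresponds to the relative clone order that a contig map depicts.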

Cosmid: Artificially constructed cloning vector containing 
the cos gene of phage lambda. Cosmids can be packaged in 
lambda phage particles for infection into E. coli; this permits
cloning of larger DNA fragments (up to 45 kb) than can be 
introduced into bacterial hosts in plasmid vectors. 

Crossing over: The breaking during meiosis of one maternal
and one paternal chromosome, the exchange of
corresponding sections of DNA, and the rejoining of the chromosomes.
This process can result in an exchange of alleles between
chromosomes. Compare recombination.

Cytosine (C): A nitrogenous base, one member of the base 
pair G-C (guanine and cytosine). 


Deoxyribonucleotide: See nucleotide.


Diploid: A full set of genetic material, consisting of paired
chromosomes, one chromosome from each parental set. Most
animal cells except the gametes have a diploid set of
chromosomes. The diploid human genome has 46 chromosomes.
Compare haploid.

DNA (deoxyribonucleic acid): The molecule that encodes
genetic information. DNA is a double-stranded molecule
held together by weak bonds between base pairs of
nucleotides. The four nucleotides in DNA contain the bases
adenine (A), guanine (G), cytosine (C), and thymine (T). In
nature, base pairs form only between A and T and between G
and C; thus the base sequence of each single strand can be
deduced from that of its partner.

DNA probe: See probe. 

DNA replication: The use of existing DNA as a template for 
the synthesis of new DNA strands. In humans and other eu- 
karyotes, replication occurs in the cell nucleus. 

DNA sequence: The relative order of base pairs, whether in 
a fragment of DNA, a gene, a chromosome, or an entire ge- 
nome. See base sequence analysis. 

Domain: A discrete portion of a protein with its own func- 
tion. The combination of domains in a single protein deter- 
mines its overall function. 

Double helix: The shape that two linear strands of DNA as- 
sume when bonded together. 


E. coli: Common bacterium that has been studied intensively
by geneticists because of its small genome size, normal lack 
of pathogenicity, and ease of growth in the laboratory. 

Electrophoresis: A method of separating large molecules 
(such as DNA fragments or proteins) from a mixture of simi- 
lar molecules. An electric current is passed through a me- 
dium containing the mixture, and each kind of molecule trav- 
els through the medium at a different rate, depending on its 
electrical charge and size. Separation is based on these differ- 
ences. Agarose and acrylamide gels are the media commonly 
used for electrophoresis of proteins and nucleic acids. 

Endonuclease: An enzyme that cleaves its nucleic acid
substrate at internal sites in the nucleotide sequence.

Enzyme: A protein that acts as a catalyst, speeding the rate at
which a biochemical reaction proceeds but not altering the 
direction or nature of the reaction. 


EST: Expressed sequence tag. See sequence tagged site.

Eukaryote: Cell or organism with membrane-bound, struc- 
turally discrete nucleus and other well-developed subcellular 
compartments. Eukaryotes include all organisms except 
viruses, bacteria, and blue-green algae. Compare prokaryote. 
See chromosome. 

Evolutionarily conserved: See conserved sequence. 

Exogenous DNA: DNA originating outside an organism. 

Exon: The protein-coding DNA sequence of a gene. Com- 
pare intron. 

Exonuclease: An enzyme that cleaves nucleotides
sequentially from free ends of a linear nucleic acid substrate.

Expressed gene: See gene expression. 

FISH (fluorescence in situ hybridization): A physical map- 
ping approach that uses fluorescein tags to detect hybridiza- 
tion of probes with metaphase chromosomes and with the 
less-condensed somatic interphase chromatin. 

Flow cytometry: Analysis of biological material by detec- 
tion of the light-absorbing or fluorescing properties of cells 
or subcellular fractions (i.e., chromosomes) passing in a nar- 
row stream through a laser beam. An absorbance or fluores- 
cence profile of the sample is produced. Automated sorting 
devices, used to fractionate samples, sort successive droplets 
of the analyzed stream into different fractions depending on 
the fluorescence emitted by each droplet. 

Flow karyotyping: Use of flow cytometry to analyze and 
separate chromosomes on the basis of their DNA content. 

Gamete: Mature male or female reproductive cell (sperm or 
ovum) with a haploid set of chromosomes (23 for humans). 

Gene: The fundamental physical and functional unit of he- 
redity. A gene is an ordered sequence of nucleotides located 
in a particular position on a particular chromosome that en- 
codes a specific functional product (i.e., a protein or RNA 
molecule). See gene expression. 

Gene expression: The process by which a gene's coded
information is converted into the structures present and
operating in the cell. Expressed genes include those that are
transcribed into mRNA and then translated into protein and those
that are transcribed into RNA but not translated into protein
(e.g., transfer and ribosomal RNAs).

Gene family: Group of closely related genes that make simi- 
lar products. 

Gene library: See genomic library. 

Gene mapping: Determination of the relative positions of 
genes on a DNA molecule (chromosome or plasmid) and of 
the distance, in linkage units or physical units, between them.

Gene product: The biochemical material, either RNA or 
protein, resulting from expression of a gene. The amount of 
gene product is used to measure how active a gene is; abnor- 
mal amounts can be correlated with disease-causing alleles. 

Genetic code: The sequence of nucleotides, coded in triplets 
(codons) along the mRNA, that determines the sequence of 
amino acids in protein synthesis. The DNA sequence of a 
gene can be used to predict the mRNA sequence, and the
genetic code can in turn be used to predict the amino acid
sequence.
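
The two prediction steps in this definition (DNA to mRNA, then mRNA to amino acids) can be sketched in Python. The sketch rests on several assumptions not stated in the source: the input is the coding strand of a gene with no introns, the reading frame starts at the first base, the standard codon table applies, and amino acids are written as one-letter symbols. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")   # '*' marks a stop codon
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_dna):
    """Predict the mRNA from the coding strand: same sequence, U for T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA in triplets (codons) until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Under these assumptions, the DNA sequence ATGGCCTAA is transcribed to AUGGCCUAA and translated to the two-residue peptide "MA" (methionine, alanine) before the stop codon.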

Genetic engineering technology: See recombinant DNA
technology.

Genetic map: See linkage map. 

Genetic material: See genome. 

Genetics: The study of the patterns of inheritance of specific
traits.

Genome: All the genetic material in the chromosomes of a 
particular organism; its size is generally given as its total 
number of base pairs. 

Genome project: Research and technology development
effort aimed at mapping and sequencing some or all of the 
genome of human beings and other organisms. 

Genomic library: A collection of clones made from a set of 
randomly generated overlapping DNA fragments represent- 
ing the entire genome of an organism. Compare library, ar- 
rayed library. 

Guanine (G): A nitrogenous base, one member of the base 
pair G-C (guanine and cytosine). 




Haploid: A single set of chromosomes (half the full set of 
genetic material), present in the egg and sperm cells of ani- 
mals and in the egg and pollen cells of plants. Human beings 
have 23 chromosomes in their reproductive cells. Compare
diploid.

Heterozygosity: The presence of different alleles at one or 
more loci on homologous chromosomes. 

Homeobox: A short stretch of nucleotides whose base
sequence is virtually identical in all the genes that contain it. It
has been found in many organisms from fruit flies to human
beings. In the fruit fly, a homeobox appears to determine
when particular groups of genes are expressed during
development.

Homology: Similarity in DNA or protein sequences between 
individuals of the same species or among different species. 

Homologous chromosome: Chromosome containing the 
same linear gene sequences as another, each derived from 
one parent. 

Human gene therapy: Insertion of normal DNA directly 
into cells to correct a genetic defect.

Human Genome Initiative: Collective name for several
projects begun in 1986 by DOE to (1) create an ordered set
of DNA segments from known chromosomal locations,
(2) develop new computational methods for analyzing
genetic map and DNA sequence data, and (3) develop new
techniques and instruments for detecting and analyzing
DNA. This DOE initiative is now known as the Human
Genome Program. The national effort, led by DOE and NIH, is
known as the Human Genome Project.

Hybridization: The process of joining two complementary 
strands of DNA or one each of DNA and RNA to form a 
double-stranded molecule. 

Informatics: The study of the application of computer and 
statistical techniques to the management of information. In 
genome projects, informatics includes the development of 
methods to search databases quickly, to analyze DNA se- 
quence information, and to predict protein sequence and 
structure from DNA sequence data. 

In situ hybridization: Use of a DNA or RNA probe to de- 
tect the presence of the complementary DNA sequence in 
cloned bacterial or cultured eukaryotic cells. 

Interphase: The period in the cell cycle when DNA is repli- 
cated in the nucleus; followed by mitosis. 

Intron: The DNA base sequence interrupting the protein- 
coding sequence of a gene; this sequence is transcribed into 
RNA but is cut out of the message before it is translated into 
protein. Compare exon. 

In vitro: Outside a living organism. 


Karyotype: A photomicrograph of an individual's chromo- 
somes arranged in a standard format showing the number, 
size, and shape of each chromosome type; used in 
low-resolution physical mapping to correlate gross chromo- 
somal abnormalities with the characteristics of specific
diseases.

kb: See kilobase.

Kilobase (kb): Unit of length for DNA fragments equal to 
1000 nucleotides. 

Library: An unordered collection of clones (i.e., cloned 
DNA from a particular organism), whose relationship to each 
other can be established by physical mapping. Compare ge- 
nomic library, arrayed library. 

Linkage: The proximity of two or more markers (e.g., genes, 
RFLP markers) on a chromosome; the closer together the 
markers are, the lower the probability that they will be sepa- 
rated during DNA repair or replication processes (binary fis- 
sion in prokaryotes, mitosis or meiosis in eukaryotes), and 
hence the greater the probability that they will be inherited
together.

Linkage map: A map of the relative positions of genetic loci 
on a chromosome, determined on the basis of how often the 
loci are inherited together. Distance is measured in 
centimorgans (cM). 

Localize: Determination of the original position (locus) of a
gene or other marker on a chromosome. 



Locus (pl. loci): The position on a chromosome of a gene or
other chromosome marker; also, the DNA at that position. 
The use of locus is sometimes restricted to mean regions of 
DNA that are expressed. See gene expression. 


Macrorestriction map: Map depicting the order of and dis- 
tance between sites at which restriction enzymes cleave
chromosomes.

Mapping: See gene mapping, linkage map, physical map. 

Marker: An identifiable physical location on a chromosome 
(e.g., restriction enzyme cutting site, gene) whose inheritance 
can be monitored. Markers can be expressed regions of DNA 
(genes) or some segment of DNA with no known coding 
function but whose pattern of inheritance can be determined. 
See RFLP, restriction fragment length polymorphism. 

Mb: See megabase. 

Megabase (Mb): Unit of length for DNA fragments equal to 
1 million nucleotides and roughly equal to 1 cM. 

Meiosis: The process of two consecutive cell divisions in the 
diploid progenitors of sex cells. Meiosis results in four rather 
than two daughter cells, each with a haploid set of
chromosomes.

Messenger RNA (mRNA): RNA that serves as a template for 
protein synthesis. See genetic code. 

Metaphase: A stage in mitosis or meiosis during which the 
chromosomes are aligned along the equatorial plane of the cell. 

Mitosis: The process of nuclear division in cells that produces 
daughter cells that are genetically identical to each other and 
to the parent cell. 

mRNA: See messenger RNA. 

Multifactorial or multigenic disorder: See polygenic
disorder.

Multiplexing: A sequencing approach that uses several pooled 
samples simultaneously, greatly increasing sequencing speed.

Mutation: Any heritable change in DNA sequence. Compare
polymorphism.


Nitrogenous base: A nitrogen-containing molecule having 
the chemical properties of a base. 

Nucleic acid: A large molecule composed of nucleotide
subunits.

Nucleotide: A subunit of DNA or RNA consisting of a ni- 
trogenous base (adenine, guanine, thymine, or cytosine in 
DNA; adenine, guanine, uracil, or cytosine in RNA), a phos- 
phate molecule, and a sugar molecule (deoxyribose in DNA 
and ribose in RNA). Thousands of nucleotides are linked to 
form a DNA or RNA molecule. See DNA, base pair, RNA. 

Nucleus: The cellular organelle in eukaryotes that contains 
the genetic material. 


Oncogene: A gene, one or more forms of which is associated 
with cancer. Many oncogenes are involved, directly or indi- 
rectly, in controlling the rate of cell growth. 

Overlapping clones: See genomic library. 

P1-derived artificial chromosome (PAC): A vector used to
clone DNA fragments (100- to 300-kb insert size; average,
150 kb) in Escherichia coli cells. Based on bacteriophage (a
virus) P1 genome. Compare cloning vector.

PAC: See P1-derived artificial chromosome.

PCR: See polymerase chain reaction. 

Phage: A virus for which the natural host is a bacterial cell. 

Physical map: A map of the locations of identifiable land- 
marks on DNA (e.g., restriction enzyme cutting sites, genes), 
regardless of inheritance. Distance is measured in base pairs. 
For the human genome, the lowest-resolution physical map 
is the banding patterns on the 24 different chromosomes; the 
highest-resolution map would be the complete nucleotide 
sequence of the chromosomes. 



Plasmid: Autonomously replicating, extrachromosomal cir- 
cular DNA molecules, distinct from the normal bacterial ge- 
nome and nonessential for cell survival under nonselective 
conditions. Some plasmids are capable of integrating into the 
host genome. A number of artificially constructed plasmids 
are used as cloning vectors. 

Polygenic disorder: Genetic disorder resulting from the
combined action of alleles of more than one gene (e.g., heart 
disease, diabetes, and some cancers). Although such disor- 
ders are inherited, they depend on the simultaneous presence 
of several alleles; thus the hereditary patterns are usually 
more complex than those of single-gene disorders. Compare 
single-gene disorders. 

Polymerase chain reaction (PCR): A method for amplify- 
ing a DNA base sequence using a heat-stable polymerase and 
two 20-base primers, one complementary to the (+)-strand at
one end of the sequence to be amplified and the other 
complementary to the (-)-strand at the other end. Because the 
newly synthesized DNA strands can subsequently serve as 
additional templates for the same primer sequences, succes- 
sive rounds of primer annealing, strand elongation, and dis- 
sociation produce rapid and highly specific amplification of 
the desired sequence. PCR also can be used to detect the ex- 
istence of the defined sequence in a DNA sample. 
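
The way two primers bracket the sequence to be amplified can be sketched in Python. This is an illustration only: it assumes a perfect-match search on a known (+)-strand template, uses short primers for readability rather than the 20-base primers mentioned above, and models each round as an exact doubling, which is an idealization. All function names are hypothetical.

```python
def reverse_complement(seq):
    """Complement each base and reverse the result."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, forward, reverse):
    """Return the (+)-strand region bracketed by the two primers, or None.

    forward: primer identical to the (+)-strand at one end of the target,
             and therefore complementary to the (-)-strand.
    reverse: primer complementary to the (+)-strand at the other end, so
             its reverse complement is what appears in the template.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


def copies_after(cycles, start=1):
    """Idealized copy number if every round doubled the target exactly."""
    return start * 2 ** cycles
```

A return value of None corresponds to the detection use mentioned in the definition: under the perfect-match assumption, no product is expected when either primer site is missing from the sample.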

Polymerase, DNA or RNA: Enzymes that catalyze the syn- 
thesis of nucleic acids on preexisting nucleic acid templates, 
assembling RNA from ribonucleotides or DNA from
deoxyribonucleotides.

Polymorphism: Difference in DNA sequence among indi- 
viduals. Genetic variations occurring in more than 1% of a 
population would be considered useful polymorphisms for 
genetic linkage analysis. Compare mutation.

Primer: Short preexisting polynucleotide chain to which new 
deoxyribonucleotides can be added by DNA polymerase. 

Probe: Single-stranded DNA or RNA molecules of specific 
base sequence, labeled either radioactively or immunologi- 
cally, that are used to detect the complementary base se- 
quence by hybridization. 

Prokaryote: Cell or organism lacking a membrane-bound, 
structurally discrete nucleus and other subcellular compart- 
ments. Bacteria are prokaryotes. Compare eukaryote. See
chromosome.

Promoter: A site on DNA to which RNA polymerase will 
bind and initiate transcription. 

Protein: A large molecule composed of one or more chains
of amino acids in a specific order; the order is determined by
the base sequence of nucleotides in the gene coding for the
protein. Proteins are required for the structure, function, and
regulation of the body's cells, tissues, and organs, and each
protein has unique functions. Examples are hormones,
enzymes, and antibodies.

Purine: A nitrogen-containing, double-ring, basic compound
that occurs in nucleic acids. The purines in DNA and RNA
are adenine and guanine.

Pyrimidine: A nitrogen-containing, single-ring, basic
compound that occurs in nucleic acids. The pyrimidines in DNA
are cytosine and thymine; in RNA, cytosine and uracil.


Rare-cutter enzyme: See restriction enzyme cutting site. 

Recombinant clone: Clone containing recombinant DNA
molecules. See recombinant DNA technology. 

Recombinant DNA molecules: A combination of DNA
molecules of different origin that are joined using recombinant
DNA technologies.

Recombinant DNA technology: Procedure used to join to- 
gether DNA segments in a cell-free system (an environment 
outside a cell or organism). Under appropriate conditions, a 
recombinant DNA molecule can enter a cell and replicate 
there, either autonomously or after it has become integrated 
into a cellular chromosome. 

Recombination: The process by which progeny derive a 
combination of genes different from that of either parent. In 
higher organisms, this can occur by crossing over. 

Regulatory region or sequence: A DNA base sequence that 
controls gene expression. 

Resolution: Degree of molecular detail on a physical map of 
DNA, ranging from low to high.

Restriction enzyme, endonuclease: A protein that recog- 
nizes specific, short nucleotide sequences and cuts DNA at 
those sites. Bacteria contain over 400 such enzymes that rec- 
ognize and cut over 100 different DNA sequences. See re- 
striction enzyme cutting site. 



Restriction enzyme cutting site: A specific nucleotide se- 
quence of DNA at which a particular restriction enzyme cuts 
the DNA. Some sites occur frequently in DNA (e.g., every 
several hundred base pairs), others much less frequently 
(rare-cutter; e.g., every 10,000 base pairs).
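
Locating cutting sites and the resulting fragment sizes can be sketched in Python. The example uses EcoRI, which recognizes GAATTC and cuts after the first G; the choice of enzyme, the assumption of a linear molecule, and all function names are additions for illustration, not part of the source.

```python
def cut_positions(dna, site, offset):
    """Return 0-based positions where the enzyme cuts the given strand.

    site:   recognition sequence, e.g. "GAATTC" for EcoRI.
    offset: number of bases of the site that lie before the cut
            (1 for EcoRI, which cuts between G and AATTC).
    """
    dna = dna.upper()
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i + offset)
        i = dna.find(site, i + 1)
    return positions


def fragment_lengths(dna, site, offset):
    """Lengths of the fragments produced by cutting a linear molecule."""
    cuts = [0] + cut_positions(dna, site, offset) + [len(dna)]
    return [right - left for left, right in zip(cuts, cuts[1:])]
```

A mutation that destroys one recognition site merges two adjacent fragments into one longer fragment, which is the kind of size difference between individuals that an RFLP reflects.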

Restriction fragment length polymorphism (RFLP): 

Variation between individuals in DNA fragment sizes cut by 
specific restriction enzymes; polymorphic sequences that 
result in RFLPs are used as markers on both physical maps 
and genetic linkage maps. RFLPs are usually caused by mu- 
tation at a cutting site. See marker. 

RFLP: See restriction fragment length polymorphism. 

Ribonucleic acid (RNA): A chemical found in the nucleus 
and cytoplasm of cells; it plays an important role in protein 
synthesis and other chemical activities of the cell. The struc- 
ture of RNA is similar to that of DNA. There are several 
classes of RNA molecules, including messenger RNA, transfer 
RNA, ribosomal RNA, and other small RNAs, each serving 
a different purpose. 

Ribonucleotide: See nucleotide. 

Sex chromosome: The X or Y chromosome in human be- 
ings that determines the sex of an individual. Females have 
two X chromosomes in diploid cells; males have an X and a 
Y chromosome. The sex chromosomes comprise the 23rd 
chromosome pair in a karyotype. Compare autosome. 

Shotgun method: Cloning of DNA fragments randomly 
generated from a genome. See library, genomic library. 

Single-gene disorder: Hereditary disorder caused by a mu- 
tant allele of a single gene (e.g., Duchenne muscular dys- 
trophy, retinoblastoma, sickle cell disease). Compare poly- 
genic disorders. 

Somatic cell: Any cell in the body except gametes and their 
precursors. 

Southern blotting: Transfer by absorption of DNA fragments 
separated in electrophoretic gels to membrane filters 
for detection of specific base sequences by radiolabeled 
complementary probes. 

STS: See sequence tagged site. 

Ribosomal RNA (rRNA): A class of RNA found in the ribo- 
somes of cells. 

Ribosomes: Small cellular components composed of spe- 
cialized ribosomal RNA and protein; site of protein synthe- 
sis. See ribonucleic acid (RNA). 

RNA: See ribonucleic acid. 

Sequence: See base sequence. 

Sequence tagged site (STS): Short (200 to 500 base pairs) 
DNA sequence that has a single occurrence in the human 
genome and whose location and base sequence are known. 
Detectable by polymerase chain reaction, STSs are useful for 
localizing and orienting the mapping and sequence data re- 
ported from many different laboratories and serve as land- 
marks on the developing physical map of the human ge- 
nome. Expressed sequence tags (ESTs) are STSs derived 
from cDNAs. 

Sequencing: Determination of the order of nucleotides (base 
sequences) in a DNA or RNA molecule or the order of amino 
acids in a protein. 

Tandem repeat sequences: Multiple copies of the same 
base sequence on a chromosome; used as a marker in 
physical mapping. 

Technology transfer: The process of converting scientific 
findings from research laboratories into useful products by 
the commercial sector. 

Telomere: The end of a chromosome. This specialized 
structure is involved in the replication and stability of linear 
DNA molecules. See DNA replication. 

Thymine (T): A nitrogenous base, one member of the base 
pair A-T (adenine-thymine). 

Transcription: The synthesis of an RNA copy from a se- 
quence of DNA (a gene); the first step in gene expression. 
Compare translation. 

Transfer RNA (tRNA): A class of RNA having structures 
with triplet nucleotide sequences that are complementary to 
the triplet nucleotide coding sequences of mRNA. The role 
of tRNAs in protein synthesis is to bond with amino acids 
and transfer them to the ribosomes, where proteins are as- 
sembled according to the genetic code carried by mRNA. 
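
The transcription and transfer RNA entries above can be made concrete with a minimal sketch. It assumes the input is the coding (sense) strand of a gene written 5' to 3', and it uses a deliberately tiny codon table containing only four entries of the standard genetic code; the function names are hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Four entries of the standard genetic code; None marks a stop codon.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": None}


def transcribe(coding_strand):
    """mRNA has the sequence of the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def anticodon(codon):
    """tRNA triplet complementary to an mRNA codon, written 5' to 3'."""
    return "".join(COMPLEMENT_RNA[base] for base in reversed(codon))


def translate(mrna):
    """Read codons from the first base until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("ATGTTTGGCTAA")
print(mrna, translate(mrna), anticodon("AUG"))
```

A full implementation would carry all 64 codons; the short table keeps the example readable.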



Transformation: A process by which the genetic material 
carried by an individual cell is altered by incorporation of 
exogenous DNA into its genome. 

Translation: The process in which the genetic code carried 
by mRNA directs the synthesis of proteins from amino acids. 
Compare transcription. 

tRNA: See transfer RNA. 

Uracil: A nitrogenous base normally found in RNA but not 
DNA; uracil is capable of forming a base pair with adenine. 

Vector: See cloning vector. 

Virus: A noncellular biological entity that can reproduce 
only within a host cell. Viruses consist of nucleic acid 
covered by protein; some animal viruses are also surrounded by 
membrane. Inside the infected cell, the virus uses the 
synthetic capability of the host to produce progeny virus. 

VLSI: Very large scale integration allowing more than 
100,000 transistors on a chip. 

YAC: See yeast artificial chromosome. 

Yeast artificial chromosome (YAC): A vector used to clone 
DNA fragments (up to 400 kb); it is constructed from the 
telomeric, centromeric, and replication origin sequences 
needed for replication in yeast cells. Compare cloning vector. 



DOE/ER-0713 (Part 2) 


Part 2, 1996 Research Abstracts 

Date Published: November 1997 

Prepared for the 

U.S. Department of Energy 

Office of Energy Research 

Office of Biological and Environmental Research 

Germantown, MD 20874-1290 

Prepared by the 

Human Genome Management Information System 

Oak Ridge National Laboratory 

Oak Ridge, TN 37830-6480 

managed by 

Lockheed Martin Energy Research Corporation 

for the 

U.S. Department of Energy 

Under Contract DE-AC05-96OR22464 



More than a decade ago, the Office of Health and Environmental Research (OHER) of the U.S. Department 
of Energy (DOE) struck a bold course in launching its Human Genome Initiative, convinced that 
its mission would be well served by a comprehensive picture of the human genome. Organizers recognized 
that the information the project would generate, both technological and genetic, would contribute 
not only to a new understanding of human biology and the effects of energy technologies but also to a host of 
practical applications in the biotechnology industry and in the arenas of agriculture and environmental protection. 

Today, the project's value appears beyond doubt as worldwide participation contributes toward the goals of determining 
the human genome's complete sequence by 2005 and elucidating the genome structure of several model organisms as 
well. This report summarizes the content and progress of the DOE Human Genome Program (HGP). Descriptive 
research summaries, along with information on program history, goals, management, and current research highlights, 
provide a comprehensive view of the DOE program. 

Last year marked an early transition to the third and final phase of the U.S. Human Genome Project as pilot programs to 
refine large-scale sequencing strategies and resources were funded by DOE and the National Institutes of Health, the two 
sponsoring U.S. agencies. The human genome centers at Lawrence Berkeley National Laboratory, Lawrence Livermore 
National Laboratory, and Los Alamos National Laboratory had been serving as the core of DOE multidisciplinary HGP 
research, which requires extensive contributions from biologists, engineers, chemists, computer scientists, and 
mathematicians. These team efforts were complemented by those at other DOE-supported laboratories and about 60 universities, 
research organizations, companies, and foreign institutions. Now, to focus DOE's considerable resources on meeting the 
challenges of large-scale sequencing, the sequencing efforts of the three genome centers have been integrated into the 
Joint Genome Institute. The institute will continue to bring together research from other DOE-supported laboratories. 
Work in other critical areas continues to develop the resources and technologies needed for production sequencing; com- 
putational approaches to data management and interpretation (called informatics); and an exploration of the important 
ethical, legal, and social issues arising from use of the generated data, particularly regarding the privacy and confidenti- 
ality of genetic information. 

Insights, technologies, and infrastructure emerging from the Human Genome Project are catalyzing a biological 
revolution. Health-related biotechnology is already a success story, and is still far from reaching its potential. Other 
applications are likely to beget similar successes in coming decades; among these are several of great importance to DOE. 
We can look to improvements in waste control and an exciting era of environmental bioremediation, we will see new 
approaches to improving energy efficiency, and we can hope for dramatic strides toward meeting the fuel demands of 
the future. 

In 1997 OHER, renamed the Office of Biological and Environmental Research (OBER), is celebrating 50 years of 
conducting research to exploit the boundless promise of energy technologies while exploring their consequences to the 
public's health and the environment. The DOE Human Genome Program and a related spin-off project, the Microbial 
Genome Program, are major components of the Biological and Environmental Research Program of OBER. 

DOE OBER is proud of its contributions to the Human Genome Project and welcomes general or scientific inquiries 
concerning its genome programs. Announcements soliciting research applications appear in Federal Register, Science, 
Human Genome News, and other publications. The deadline for formal applications is generally midsummer for awards 
to be made the next year, and submission of preproposals in areas of potential interest is strongly encouraged. Further 
information may be obtained by contacting the program office or visiting the DOE home page (301/903-6488, 
Fax: SSZl,, URL: 

Aristides Patrinos, Associate Director 
Office of Biological and Environmental Research 
U.S. Department of Energy 
November 1997 




The research abstracts in this section were funded in FY 1996 by the DOE Office of Health and Environmental 
Research, which was renamed Office of Biological and Environmental Research in 1997. 

These unedited abstracts were contributed by DOE Human Genome Program grantees and contractors. 
Names of principal investigators are in bold print. Contact information, submitted in 1996, is for the first person named 
unless another investigator is designated as contact person. Principal investigators of research projects described by 
abstracts in this section are listed under their respective subject categories, and an index of all investigators named in 
the abstracts is given at the end of this report. 

Part 1 of this report contains narratives that represent DOE Human Genome Program research in large, 
multidisciplinary projects. As a convenience to the reader, these narratives are reprinted (without graphics) as an appendix to this 
volume, Part 2. The projects represent work at the Joint Genome Institute (p. 72), Lawrence Livermore National 
Laboratory Human Genome Center (p. 73), Los Alamos National Laboratory Center for Human Genome Studies (p. 77), 
Lawrence Berkeley National Laboratory Human Genome Center (p. 81), University of Washington Genome Center 
(p. 85), Genome Database (p. 87), and National Center for Genome Resources (p. 91). Only the contact persons for 
these organizations are listed in the Index to Principal and Coinvestigators. More information on research carried out in 
these projects can be found on their listed Web sites. 



1996 Research Abstracts 1 

Sequencing 1 

Mapping 19 

Informatics 33 

Ethical, Legal, and Social Issues 45 

Infrastructure 59 

Small Business Innovative Research 63 

Projects Completed FY 1994-95 67 

Appendix: Narratives from Large, Multidisciplinary Research Projects 71 

(Text reprinted from Human Genome Program Report: Part 1, Overview and Progress) 

Index to Principal and Coinvestigators 93 

Acronym List Inside back cover 



1996 Research Abstracts 

Project Categories and Principal Investigators 
Sequencing 1 

W.H. Benner and J.M. Jaklevic 1 
Chung-Hsuan Chen 1 
George Church 2 
Pieter de Jong 2 
Norman Dovichi 3 
John J. Dunn and F. William Studier 3 
John J. Dunn and F. William Studier 4 
Glen A. Evans 4 
Glen A. Evans 5 
*V.L. Florentiev 5 
Raymond F. Gesteland 6 
Joe Gray and Daniel Pinkel 7 
Trevor Hawkins 8 
Leroy Hood, Mark D. Adams, and Melvin Simon 8 
Barry L. Karger 9 
Richard A. Mathies and Alexander N. Glazer 9 
Andrei Mirzabekov 10 
Robert K. Moyzis 12 
Barbara Ramsay Shaw 13 
Lloyd M. Smith and Richard A. Guilfoyle 13 
Lloyd M. Smith 14 
Richard D. Smith 14 
Richard D. Smith 15 
Stanley Tabor 16 
Levy Ulanovsky 16 
Peter Williams 17 
Edward S. Yeung 17 

Mapping 19 

David Allison and Bruce Warmack 19 
*Alexandre S. Boitsov 19 
Charles R. Cantor 19 
Pieter J. de Jong 20 
Jacques R. Fresco 21 
Fa-Ten Kao 21 
*V.L. Karpov 22 

*Russian scientists designated by an asterisk received small emergency grants following December 1992 site reviews by David Galas (formerly DOE 
Office of Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997), Raymond Gesteland 
(University of Utah), and Elbert Branscomb (Lawrence Livermore National Laboratory). 



Julie R. Korenberg 22 
D.L. Nelson 23 
*Olga Podgornaya 24 
*O.L. Polanovsky 25 
*A.I. Poletaev 26 
Melvin I. Simon 26 
Melvin I. Simon 27 
Marcelo Bento Soares 27 
Lisa Stubbs 28 
Lisa Stubbs 29 
Jean-Michel H. Vos 30 
*N.K. Yankovsky 30 

Informatics 33 

Daniel Davison 33 
Nathan Goodman 33 
Mark Graves 34 
David Haussler 34 
Jerzy Jurka 34 
*N.A. Kolchanov 35 
Victor M. Markowitz and I-Min A. Chen 36 
T. Marr 37 
Gene Myers 38 
Ross Overbeek 38 
G. Christian Overton, Susan B. Davidson, and Peter Buneman 39 
Pavel A. Pevzner 40 
David B. Searls 41 
David J. States 41 
Edward C. Uberbacher 42 
Edward Uberbacher 44 
Manfred D. Zorn 44 

Ethical, Legal, and Social Issues 45 

Charles C. Carlson 45 
Graham Chedd and Noel Schwerin 45 
Debra L. Collins and R. Neil Schimke 45 
Lane Conn 46 
Jeff Davidson and Laurence Weinberger 47 
Sharon Davis 47 
Troy Duster 48 
Rebecca S. Eisenberg 48 
Stephen Goodman 49 
Margaret C. Jefferson and Mary Ann Sesma 50 
Mary B. Mahowald 50 
Joseph D. McInerney and B. Ellen Friedman 51 
Joseph D. McInerney and Lynda B. Micikas 52 
Maureen M. Munn, Maynard V. Olson, and Leroy Hood 52 
Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt 53 
Bari Scott 53 
Maria Sosa 54 
Sylvia J. Spengler 54 
Sylvia Spengler and Janice Mann 55 
Sara L. Tobin and Ann Boughton 55 
Franklin M. Zweig 56 

Infrastructure 59 

Linda Holmes and Eugene Spejewski 59 
Betty K. Mansfield and John S. Wassom 59 
Sylvia J. Spengler 
Walter Williams 61 
James Wright 

Small Business Innovation Research 63 

Mark W. Knuth 63 
Gualberto Ruano 63 
Joseph Leone 
William P. MacConnell 
Ruth Ann Manning 64 



Advanced Detectors for Mass Spectrometry 

W.H. Benner and J.M. Jaklevic 

Human Genome Group; Engineering Science Department; 
Lawrence Berkeley National Laboratory; University of 
California; Berkeley, CA 94720 
510/486-7194, Fax: -5857 

Mass spectrometry is an instrumental method capable of 
producing rapid analyses with high mass accuracy. When 
applied to genome research, it is an attractive alternative to 
gel electrophoresis. At present, routine DNA analysis by 
mass spectrometry is seriously constrained to small DNA 
fragments. In contrast to other mass spectrometry facilities 
in which the development of ladder sequencing is emphasized, 
we are exploring the application of mass spectrometry 
to procedures that identify short sequences. This approach 
helps the molecular biologists associated with 
LBL's Human Genome Center to identify redundant 
sequences and vector contamination in clones rapidly, 
thereby improving sequencing efficiency. We are also 
attempting to implement a rapid mass spectrometry-based 
screening procedure for PCR products. 

The implementation of these applications requires that the 
performance of matrix-assisted-laser-desorption-ionization 
(MALDI) and electrospray mass spectrometry be improved. 
Our focus is the development of new ion detectors 
which will advance the state of the art of each of these 
two types of spectrometers. One of the limitations for 
applying mass spectrometry to DNA analysis relates to the 
poor efficiency with which conventional electron multipliers 
detect large ions, a problem most apparent in 
MALDI-TOF-MS. To solve this problem, we are developing 
alternative detection schemes which rely on heat pulse 
detection. The kinetic energy of impacting ions is converted 
into heat when ions strike a detector, and we are 
attempting to measure such heat pulses indirectly. We are 
developing a type of cryogenic detector called a 
superconducting tunnel junction device which responds to the 
phonons produced when ions strike the detector. This 
detector does not rely on the formation of secondary electrons. 
We have demonstrated this type of detector to be at 
least two orders of magnitude more sensitive, on an 
area-normalized basis, than microchannel plate ion detectors. 
This development could extend the upper mass limit 
of MALDI-TOF-MS and increase sensitivity. 

Electrospray ion sources generate ions of mega-Dalton 
DNA with minimal fragmentation, but the mass spectrometric 
analyses of these large ions usually lead only to a 
mass-to-charge distribution. If ion charge was known, actual 
mass data could be determined. To address this problem, 
we are developing a detector that will simultaneously 
measure the charge and velocity of individual ions. We 
have been able to mass analyze DNA molecules in the 1 to 
10 MDa range using charge-detection mass spectrometry. 
In this technique, individual electrospray ions are directed 
to fly through a metal tube which detects their image 
charge. Simultaneous measurement of their velocity provides 
a way to measure their mass when ions of known 
energy are sampled. Several thousand ions can be analyzed 
in a few minutes, thus generating statistically significant 
mass values regarding the ions in a sample population. 
We are attempting to apply this technology to the 
analysis of PCR products. 

DOE Contract No. DE-AC03-76SF00098. 

Mass Spectrometer for Human 
Genome Sequencing 

Chung-Hsuan Chen, Steve L. Allman, and K. Bruce 

Oak Ridge National Laboratory; Oak Ridge, TN 37831 
423/574-5895, Fax: -2115 

The objective of this program is to develop an innovative 
fast DNA sequencing technology for the Human Genome 
Project. It can also be applied to fast screening of genetic 
and contagious diseases, DNA fingerprinting, and 
environmental impact analysis. 

The approach of this program is to replace conventional 
gel electrophoresis sequencing methods by using lasers 
and mass spectrometry for sequencing. The present gel 
sequencing method usually takes hours to days to complete 
a DNA analysis or sequencing run, since different lengths of 
DNA segments need to be separated in dense gel. With 
the laser desorption mass spectrometry (LDMS) approach, 
various sizes of DNA segments are separated in the 
vacuum chamber of a mass spectrometer. Thus, the time 
taken to separate various sizes of DNA is less than one 
second, compared to hours using other methods. 

Recently, we successfully demonstrated sequencing short 
DNA segments with this approach. We have also succeeded 
in using LDMS for fast screening for cystic fibrosis, 
identifying both point mutations and deletions. In 
addition, we had preliminary success in using LDMS to 
achieve DNA fingerprinting. Thus, LDMS is going to 
emerge as a new and important biotechnological tool for 
DNA analysis. 

DOE Contract No. DE-AC05-84OR21400. 

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 



Genomic Sequence Comparisons 

George Church 

Harvard Medical School; Boston, MA 02115 
617/432-0503 or -7562, Fax: -7266 

The first objective of this project is completion of an 
automated system to sequence DNA using electrophore 
mass-tag (EMT) primers for dideoxy sequencing. The 
prototype machine will contain a 60-capillary array with 400 
EMT-labeled sequence ladders per capillary. The system is 
designed to use 100-fold less reagent and have 500-fold 
higher speed (1000 bases per sec per instrument) than 
current sequencing technology. EMTs are cleaved and 
laser-desorbed from membranes for subsequent detection by 
ECTOF mass spectrometry. The second objective is to 
overcome the limitations of purely hypothetical annotation 
of the growing number of reading frames in new genome 
sequences. We measure gene product levels and interactions 
using DNA microarrays, whole genome in vivo 
footprinting and crosslinking. 

Our approach involves system integration of instrumentation, 
organic chemistry, molecular biology, electrophoresis 
and software to the task of increasing sequencing accuracy 
and efficiency. Likewise we integrate such instruments and 
others with the needs of acquiring and annotating 
large-scale microbial and human genomic sequence and 
population polymorphisms. 

To establish functions for new genes, we use large scale 
phenotyping by multiplexed growth competition assays, 
both by targeted deletion and by saturation insertional mu- 
tagenesis. We will continue to develop a system to se- 
quence DNA using electrophore mass-tags (EMTs). We 
will establish genome-scale experimental methods for se- 
quence annotation. 

The most significant findings in 1995-1996 were 1) Dem- 
onstration of use of electrophore mass-tags in dideoxy se- 
quencing. 2) Development of IR-laser desorption method 
and model. 3) A novel dsDNA microarray synthesis strat- 
egy. 4) A new amplifiable differential display for 
whole-genome in vivo DNA-protein interactions. 5) Estab- 
lishment and application of a microbial DNA-protein inter- 
action database. 

DOE Grant No. DE-FG02-87ER60565. 

A PAC/BAC End-Sequence Data 
Resource for Sequencing the Human 
Genome: A 2-Year Pilot Study 

Pieter de Jong 

Roswell Park Cancer Institute; Buffalo, NY 14263 
716/845-3168, Fax: -%%^9, 

Large-scale sequencing of the human genome requires the 
availability of high-fidelity clones with large genomic 
inserts and a mechanism to find clones with minimal 
overlaps within the clone collections. The first need can be 
satisfied with bacterial artificial chromosome libraries (PACs 
and BACs) that already exist and with further such libraries 
now being developed. However, a cost-effective way of 
establishing high-resolution contig maps for the human 
genome has not yet been established. Recently, a new 
approach for virtual screening for overlapping clones has 
been proposed by several research groups and has been 
discussed eloquently in a manuscript by Venter et al., 1996 
(Nature). We will implement this approach for use with 
our human PAC and BAC libraries and use the first year as 
a pilot stage. The goal of the one-year pilot is to prove the 
feasibility of large-scale end sequencing and to 
demonstrate usefulness. 

The first goal will be met by sequencing the ends for 
40,000 clones from our existing PAC library and from 
BAC libraries currently being developed under NIH 
funding within our laboratory. The end-sequencing will be 
based on our new DOP-vector PCR procedure (Chen et al., 
1996, Nucleic Acids Research 24, 2614-2616). All 
sequence data will be made available through public 
databases (GSDB, GDB, GenBank) and will also become 
BLAST searchable through the UTSW WWW site from 
our collaborator, Glen Evans. In view of our current 
under-developed informatics structure, we do not expect to 
provide BLAST search access through our own web site 
during the pilot phase. 

To prove the usefulness of available end sequences, we 
will prepare a chromosome 14-enriched clone collection 
from our current 20-fold deep PAC library. To detect the 
chromosome 14 clones, we will use as hybridization 
probes a set of 1,000 mapped STS markers available from 
Paul Dear (MRC, Cambridge, UK), the approximately 600 
markers present in the Whitehead map, and the in situ mapped 
BAC and PAC clones available from Julie Korenberg. We will 
hybridize with these existing markers in probe pools 
specific for regions of chromosome 14. Thus we will isolate 
region-enriched PAC clone collections. 

Assuming that the clone collections will be at least 
50%-specific for chromosome 14 (50% false positives) 
and will include most of the chromosome 14 PACs from 
our library, a collection of about 35,000 clones is expected. 




Hence, the bulk of the end sequences obtained during the 
first year will be derived from the chromosome 14 en- 
riched set and should result in a sequence ready clone col- 
lection covering about 100 Mbp of the human genome. 
The purity of the chromosome 14 PAC collection will be 
characterized in a number of different ways, including test- 
ing with independent markers not used as probes and by 
FISH analysis of a representative set of PAC clones. To 
test the usefulness of the end sequence resource, the 
Sanger Centre will sequence chromosome 14 PACs from 
our collection and identify overlapping clones by virtual 
screening, using our end-sequence database. 

If overlapping clones can not be found with the expected 
level of redundancy in the end-sequence database, we will 
screen the original PAC library with probes or STS mark- 
ers derived from the sequenced PAC clones. 
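
The virtual screening step described above can be sketched as a lookup of clone end sequences against a finished clone sequence. This is only an illustration under simplifying assumptions: matching is by exact substring on either strand, whereas a real pipeline would use an alignment tool such as BLAST and tolerate sequencing errors. The function names, clone names, and sequences are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def virtual_screen(finished, end_db):
    """Find clones whose end sequences occur in a finished clone sequence.

    end_db maps a clone name to its (left_end, right_end) reads.
    Returns a list of (clone, which_end, strand, position) tuples.
    """
    hits = []
    for clone, ends in end_db.items():
        for label, read in zip(("left", "right"), ends):
            for strand, query in (("+", read), ("-", revcomp(read))):
                pos = finished.find(query)
                if pos != -1:
                    hits.append((clone, label, strand, pos))
    return hits


# Toy data: one finished clone and three library clones with end reads.
finished = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
end_db = {
    "cloneA": ("GGCTTAAC", "TTTTTTTT"),
    "cloneB": (revcomp("TAGCATCG"), "CCCCCCCC"),
    "cloneC": ("AAAAAAAA", "GGGGGGGG"),
}

hits = virtual_screen(finished, end_db)
print(hits)
# When extending to the right, the hit nearest the right-hand end of the
# finished sequence implies the smallest overlap.
print(max(hits, key=lambda hit: hit[3]))
```

A clone with exactly one end inside the finished sequence is a candidate for extending the contig; choosing the hit closest to the end keeps redundant sequencing to a minimum.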

Subcontract under Glen Evans' DOE Grant No. DE-FC03- 


Multiple-Column Capillary Gel 

Norman Dovichi 

Department of Chemistry; University of Alberta; 
Edmonton, Alberta, Canada T6G 2G2 
403/492-2845, Fax: -S23\. 
hnp://hobbes. chem. 

The objective of this project is to develop high-throughput 
DNA sequencing instrumentation. A two-dimensional arrayed 
capillary electrophoresis instrument is under development. 

We have developed multiple capillary DNA sequencers. 
These instruments have several important attributes. First, 
by operation at electric fields greater than 100 V/cm, we 
are able to separate DNA sequencing fragments rapidly 
and efficiently. Second, the separation is performed with 
3%T 0%C polyacrylamide. This low viscosity, 
non-crosslinked matrix can be pumped from the capillary 
and replaced with fresh material when required. Third, we 
operate the capillary at elevated temperature. High 
temperature operation eliminates compressions, speeds the 
separation, and increases the read length. Fourth, our 
fluorescence detection cuvette is manufactured locally by 
means of microlithography technology. These detection 
cuvettes provide robust and precise alignment of the 
optical system. Currently, 5, 16, and 90 capillary instruments 
are in operation in our lab; 32 and 576 capillary devices 
are under development. Fifth, we use both avalanche 
photodiode photodetectors and CCD cameras for high 
sensitivity detection. We have obtained detection limits of 120 
fluorescein molecules injected onto the capillaries. High 
sensitivity is important in detecting the low concentration 
fragments generated in long sequencing reads. This 
combination of low concentration acrylamide, high temperature 
operation, and high sensitivity detection allows separation 
of fragments over 800 bases in length in 90 minutes. 

DOE Grant No. DE-FG02-91ER61123. 

DNA Sequencing with Primer Libraries 

John J. Dunn, Laura-Li Butler-Loffredo, and F. William 
Studier 

Biology Department; Brookhaven National Laboratory; 
Upton, NY 11973 
516/344-3012, Fax: -3407 


Primer walking using oligonucleotides selected from a li- 
brary is an attractive strategy for large-scale DNA se- 
quencing. Strings of three adjacent hexamers can prime 
DNA sequencing reactions specifically and efficiently 
when the template is saturated with a single stranded 
DNA-binding protein (1), and a library of all 4,096 
hexamers is manageable. We would like to be able to se- 
quence directly on 35-kbp fesmid templates, but the signal 
from a single round of synthesis is relatively weak and 
triple-hexamer priming has not yet been adapted for cycle 
sequencing. We reasoned that a hexamer library might be 
used for cycle sequencing if combinations of hexamers 
could be selectively ligated by using other hexamers as the 
template for alignment. In this way, the longer primers 
needed for cycle sequencing could be generated easily and 
economically without the need for complex machines for 
de novo synthesis. 

We found that ordered ligation of 3 hexamers to form an 
18-mer occurs readily on a template of the 3 complemen- 
tary hexamers (offset by three base pairs) that can base 
pair unambiguously to form a double-stranded complex of 
indefinite length (2). Each hexamer forms three comple- 
mentary base pairs with two other hexamers, generating 
complementary chains of contiguous hexamers with strand 
breaks staggered by three bases. Two adjacent hexamers in 
the chain to be ligated contain 5' phosphate groups and the 
others are unphosphorylated. Both T4 and T7 DNA ligase 
can ligate the phosphorylated hexamers to their neighbors 
in such a complex at hexamer concentrations in the 50-100 
µM range, producing an 18-mer and leaving three 
unphosphorylated hexamers. The products of these ligation 
reactions can be used directly for fluorescent cycle sequencing 
of 35-kbp templates. 

Unambiguous ligation requires that alternative complexes 
with perfect base pairing not be possible with the combina- 
tion of hexamers used. Since the combination of hexamers 
is dictated by the sequence of the desired ligation product, 
some oligonucleotides cannot be produced unambiguously 
by this method. However, 82.5% of all possible 18-mers 
could potentially be generated starting with a library of all 




4096 hexamers, more than adequate for high throughput 
DNA sequencing by primer walking. 
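
The hexamer bookkeeping in this abstract can be sketched in a few lines. The sketch enumerates the library of all 4^6 = 4,096 hexamers, splits a desired 18-mer primer into its three hexamers, and derives three complementary template hexamers offset by three bases. The treatment of the third template hexamer (pairing with the last three bases of the primer and the first three bases of another copy, which is one way to read "a double-stranded complex of indefinite length") is an assumption made for illustration, as are the function names and the example primer. The check for ambiguous alternative complexes is not implemented.

```python
from itertools import product


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Every possible hexamer: 4**6 = 4096 oligonucleotides.
HEXAMER_LIBRARY = {"".join(bases) for bases in product("ACGT", repeat=6)}


def ligation_design(primer18):
    """Return (primer hexamers, template hexamers) for an 18-mer primer."""
    if len(primer18) != 18:
        raise ValueError("expected an 18-base primer")
    primer_hexamers = [primer18[i:i + 6] for i in (0, 6, 12)]
    # Template hexamers bridge the junctions, staggered by three bases.
    templates = [
        revcomp(primer18[3:9]),                    # bridges hexamers 1 and 2
        revcomp(primer18[9:15]),                   # bridges hexamers 2 and 3
        revcomp(primer18[15:18] + primer18[0:3]),  # assumed: links 3 back to 1
    ]
    return primer_hexamers, templates


primer_hexamers, templates = ligation_design("ACGGTCATTGCAGATCCA")
print(len(HEXAMER_LIBRARY), primer_hexamers, templates)
```

Because every hexamer is already in the library, designing a new primer reduces to picking six tubes rather than synthesizing an 18-mer de novo.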

DOE Grant No. DE-AC02-76CH00016. 

(1) Kieleczawa, J., Dunn, J. J., and Studier, F. W. DNA sequencing by 
primer walking with strings of contiguous hexamers. Science 258, 
1787-1791 (1992). 

(2) Dunn, J. J., Butler-Loffredo, L., and Studier, F. W. Ligation of 
hexamers on hexamer templates to produce primers for cycle 
sequencing or the polymerase chain reaction. Anal. Biochem. 228, 

Rapid Preparation of DNA for 
Automated Sequencing 

John J. Dunn, Matthew Randesi, and F. William Studier 

Biology Department; Brookhaven National Laboratory; 

Upton, NY 11973 

516/344-3012, Fax: -3407, 

We have developed a vector, referred to as a fesmid, for 
making libraries of approximately 35-kbp DNAs for map- 
ping and sequencing. The high efficiency lambda packag- 
ing system is used to generate libraries of clones. These 
clones are propagated at very low copy number under con- 
trol of the replication and partitioning functions of the F 
factor, which helps to stabilize potentially toxic clones. A 
P1 lytic replicon under control of the lac repressor allows 
amplification simply by adding IPTG. The cloned DNA 
fragment is flanked by packaging signals for bacteriophage 
T7, and infection with an appropriate T7 mutant packages 
the cloned sequence into T7 phage particles, leaving most 
of the vector sequence behind. The size of the vector por- 
tion is such that genomic fragments packageable in lambda 
(normal capacity 48.5 kbp) should also be packaged in T7 
(normal capacity 40 kbp). 

We have made fesmid libraries of several bacterial DNAs, 
including Borrelia burgdorferi (the cause of Lyme disease), 
Bartonella henselae (the cause of cat scratch fever), E. 
coli, B. subtilis, H. influenzae, and S. pneumoniae, some of
which have been reported to be difficult to clone in cosmid
vectors. Human DNA is also readily cloned in these vec- 
tors. Brief amplification followed by infection with a gene 
3 and 17.5 double mutant of T7, which is defective in rep- 
licating its own DNA, produces lysates in which essen- 
tially all of the phage particles contain the cloned DNA 
fragment. Simple techniques yield high-quality DNA from 
these phage particles. Primers for direct sequencing from
the ends of fesmid clones have been made. 

Primer walking from the ends of fesmid clones could be an 
efficient way to sequence bacterial genomes, YACs, or 
other large DNAs without the need for prior mapping of 
clones. The ends of fesmids from a random library provide
multiple sites to initiate primer walking. Merging of the
elongating sequences from different clones will simulta-
neously generate the sequence of the original DNA and
determine the order of the clones. The packaged fesmid
DNAs are a convenient size for multiple restriction analy-
ses to confirm the accuracy of the nucleotide sequence.

DOE Grant No. DE-AC02-76CH00016. 

A PAC/BAC End-Sequence Database 
for Human Genomic Sequencing 

Glen A. Evans, Dave Burbee, Chris Davies, Trey Fondon, 

Tammy Oliver, Terry Franklin, Lisa Hahner, Shane Probst, 

and Harold R. (Skip) Garner

Genome Science and Technology Center and McDermott 

Center for Human Growth and Development; University 

of Texas Southwestern Medical Center at Dallas; Dallas, 

TX 75235-8591 

214/648-1660, Fax: -1666

While current plans call for completing the human genome
sequence in 2003, major obstacles remain in achieving the
speed and efficiency necessary to complete the task of
mapping and sequencing. To address this problem, we
proposed a novel approach to large scale construction
of sequence-ready physical clone maps of the human ge-
nome utilizing end-specific sequence sampling. An earlier
pilot project was carried out to develop a GSS (ge-
nomic sequence sampled) map of human chromosome 11
by sequencing the ends of 17,952 chromosome 11-specific
cosmids. This chromosome 11-specific end-sequence data-
base allows rapid and sensitive detection of clone overlaps
for chromosome 11 sequencing.

In this project, we propose to evaluate the utility of PAC 
and BAC end-sequences representing the entire human 
genome as a tool for complete, high accuracy mapping and 
sequencing. In this approach, we utilized total genomic 
PAC/BAC libraries (constructed by P. de Jong, RPCI), fol-
lowed by end-sequencing of both ends of each clone in the
library and limited regional mapping of a subset of clones 
as sequencing nucleation points by FISH (Fluorescence in 
situ hybridization). 

To initiate regional analysis, a single clone would be se- 
quenced by shotgun or primer directed sequencing, the 
entire sequence used to search the end-database for over- 
lapping clones, and the minimal overlapping clones for 
extending the sequence selected. This approach would al- 
low rational and efficient simultaneous mapping and se- 
quencing, as well as expediting the coordination and ex- 
change of information between large and small groups par-
ticipating in the human genome project.
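
The search step described above (using a finished clone sequence to query the end-sequence database for overlapping clones) can be sketched as a simple k-mer lookup. This is a minimal illustration under stated assumptions, not the project's software: the function names, the clone and end labels, the k-mer size and the hit threshold are all hypothetical, and reverse-complement matches and sequencing errors are ignored.

```python
def build_end_index(end_reads, k=12):
    """Map each k-mer to the set of (clone, end) reads that contain it."""
    index = {}
    for (clone, end), seq in end_reads.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add((clone, end))
    return index


def find_overlapping_clones(finished_seq, index, k=12, min_hits=3):
    """Count k-mers shared between a finished sequence and each end read.

    Returns only the end reads with at least min_hits shared k-mers.
    """
    hits = {}
    for i in range(len(finished_seq) - k + 1):
        for key in index.get(finished_seq[i:i + k], ()):
            hits[key] = hits.get(key, 0) + 1
    return {key: n for key, n in hits.items() if n >= min_hits}


# Toy data: cloneA's end read lies inside the finished sequence, cloneB's does not.
toy_ends = {
    ("cloneA", "T7"): "GGGTTTACGT",
    ("cloneB", "SP6"): "TTTTTTTTTT",
}
toy_index = build_end_index(toy_ends, k=4)
toy_hits = find_overlapping_clones("AAACCCGGGTTTACGTACGATCG", toy_index, k=4)
```

Choosing the clone whose matching end lies closest to the edge of the finished sequence would then give the minimally overlapping clone for extension; that selection step is left out of the sketch.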

In this pilot project proposal we are carrying out auto-
mated end-sequencing of approximately 40,000 PAC and
BAC clones representing the entire human genome, as
well as about 500 PAC clones localized to human chromo-
somes 11 and 15. The clones and resulting end-sequence
database will be used to 1) nucleate regions of interest
for large scale sequencing, concentrating on regions of
chromosomes 11 and 15, 2) compare with regions
mapped by other methods to confirm the mapping accu-
racy, and 3) evaluate the use of random clone end
sequence libraries. DNA sequencing is being carried out in
an entirely automated fashion using a Beckman/Sagian
robotic system, ABI 377 automated sequencers and auto-
mated sequence data processing, annotation and publica-
tion using a Hewlett Packard/Convex superparallel com-
puter located at the UTSW genome center. FISH analysis
of a sample of PAC clones has been carried out and de-
fines the potential chimera rate in existing PAC libraries as
less than 1.2%. This effort will be coordinated with efforts
of other groups carrying out PAC and BAC library con- 
struction, PAC and BAC end-sequencing and FISH analy- 
sis to avoid duplication of effort and provide a comprehen- 
sive end-sequence library and data set for use by the inter- 
national human genome sequencing effort. 

DOE Grant No. DE-FC03-96ER62294. 

Automated DNA Sequencing by 
Parallel Primer Walking 

Glen A. Evans, Dave Burbee, Chris Davies, Jeff 

Schageman, Shane Probst, Terry Franklin, Ken Kupfer, 

and Harold R. (Skip) Garner

Genome Science and Technology Center and McDermott 

Center for Human Growth and Development; University 

of Texas Southwestern Medical Center at Dallas; Dallas,

TX 75235-8591 

214/648-1660, Fax: -1666


The development of efficient mapping approaches coupled 
with high throughput, automated DNA sequencing remains 
one of the key challenges of the Human Genome Project. 
Over the past few years, a number of strategies to expedite 
clone-by-clone DNA sequencing have been developed in- 
cluding efficient shotgun sequencing, sequencing of nested 
deletions, and transposon-mediated primer insertion. We 
have developed a novel sequencing strategy applicable to 
high throughput, large scale genomic analysis based upon 
DNA sequencing primed directly on cosmid templates
using custom-designed, automatically synthesized oligo- 
nucleotide primers. This approach of directed primer 
"walking" would allow the number of sequencing reac- 
tions and the efficiency of sequencing to be vastly im- 
proved over traditional shotgun sequencing. 

Custom primer design has been carried out using software
we developed for prediction of "walking" primers directly
from the output of ABI 377 automated DNA sequencers,
and the output used to automatically program synthesis of
the custom primers using 96 or 192 channel oligonucle-
otide synthesizers constructed at UTSW. Automated opera-
tion of the sequencing system is thus possible where re-
sults of each sequencing reaction are used to predict, syn-
thesize, and carry out appropriate extension reactions for
downstream "walking". An automated prototype system has
been assembled where dye terminator DNA sequencing 
can be carried out from 96 cosmid templates simulta- 
neously followed by prediction of oligonucleotide "walk- 
ing" primers for extending the sequence of each fragment, 
and programming an attached 96-channel oligonucleotide 
synthesizer to initiate a second round of sequencing. Using 
a set of nested cosmids covering 800 kb at 5X redundancy, 
primer directed sequencing should allow completion of 
800 kb of finished, high accuracy DNA sequence in 8 to 
16 cycles. Furthermore, coupling of automated DNA se- 
quencing instrumentation to DNA sequence analysis pro- 
grams and multichannel oligonucleotide synthesizers will 
allow almost complete automation of the sequencing process
and the development of instrumentation for completely 
unattended DNA sequencing. 
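
The prediction of a "walking" primer from each read, as described above, can be illustrated with a very small selection routine. This is a sketch under assumptions, not the UTSW software: the function names, the 20-base primer length, the 60-base search window and the 40-60% GC limits are hypothetical defaults, and real primer design would also screen melting temperature, repeats and self-complementarity.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def pick_walking_primer(read, primer_len=20, window=60, gc_min=0.4, gc_max=0.6):
    """Pick the candidate primer closest to the 3' end of a read.

    Candidates are taken from the last `window` bases, must contain no
    ambiguous 'N' calls, and must fall inside the GC range.
    Returns (start_index, primer) or None if nothing qualifies.
    """
    start_limit = max(0, len(read) - window)
    for start in range(len(read) - primer_len, start_limit - 1, -1):
        candidate = read[start:start + primer_len]
        if "N" in candidate:
            continue
        if gc_min <= gc_fraction(candidate) <= gc_max:
            return start, candidate
    return None


clean_read = "ACGT" * 20            # 80 bases, 50% GC everywhere
ragged_read = "ACGT" * 10 + "N"     # ambiguous call at the 3' end
primer_clean = pick_walking_primer(clean_read)
primer_ragged = pick_walking_primer(ragged_read)
primer_none = pick_walking_primer("A" * 50)
```

In an automated loop, the returned primer sequence would be passed to the synthesizer and the next read would start downstream of it; that coupling is outside the scope of this sketch.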

DOE Grant No. DE-FG03-95ER62055. 

♦Parallel Triplex Formation as Possible 
Approach for Suppression of 
DNA-Viruses Reproduction 

V.L. Florentiev, A.K. Shchyolkina, I.A. Il'icheva, E.N. 
Timofeev, and S. Yu Tsybenko
Engelhardt Institute of Molecular Biology; Russian 
Academy of Sciences; Moscow 117984, Russia
Fax: +7-095/135-1405

It is well known that homopurine or homopyrimidine 
single stranded oligonucleotides can bind to 
homopurine-homopyrimidine sequences of two-stranded 
DNA to form stable three-stranded helices. In such tri- 
plexes two identical strands have antiparallel orientation. 
We denote these triplexes as "antiparallel" or "classical".

A particular interest of investigators to triplexes has arisen 
due to an elegant idea of using triplexes as 
sequence-specific tools for purposeful influence on DNA 
duplexes. Triplex forming oligonucleotides were shown to 
be potentially useful as regulators of gene expression and 
subsequently as therapeutical (antiviral) agents. 

A significant limitation to the practical application of anti- 
parallel triplex is the requirement for homopurine tracts in 
target DNA sequences. Numerous investigations slightly

expanded the repertoire of triple-forming sequences but 
did not completely remove this limitation. 

It was recently shown that during homologous recombina- 
tion promoted by RecA a triple-stranded DNA intermedi- 
ate was formed. Such a structure is a new form of the triple 
helix. In sharp contrast with the "classical" triplexes their 
third strand is parallel to the identical strand of the 
Watson-Crick duplex. We denote this structure as "paral-
lel" triplex. Until recently, the parallel triplex was obtained only
by deproteinization of joint molecules generated by recom- 
bination proteins. 

We first obtained experimental (chemical probe, melting
curves and fluorescence dye binding) results that provide
convincing evidence for protein-independent formation
of parallel triplex [1] and then confirmed this fact by FTIR
data [2]. Because the parallel triplex can be formed for any
sequence, it might be an "ideal" potential tool for sequence
specific recognition of DNA. Unfortunately, low stability
of parallel triplexes prohibits practical application of these

Earlier we found that propidium iodide selectively stabi-
lizes the parallel triplexes [3]. This fact was the basis of
a new approach to stabilization of parallel triplexes that we
are now developing. The approach consists in the use of a tar-
geting oligonucleotide that contains, in an internucleotide
linkage, an alkyl insert coupled with an intercalating ligand
through a linker. The length of the linker was chosen to allow
the ligand to intercalate in the same stacking contact (the length
of the linker was picked by molecular dynamics calculations).

A preliminary study showed that the presence of intercalating
inserts considerably increases the stability of DNA duplexes
[4]. We are now investigating in detail the effect of such
modification of targeting oligonucleotides on the stability of
parallel triplexes.

DOE Grant No. OR00033-93CIS005.


1. Shchyolkina, A. K., Timofeev, E. N., Borisova, O. F., Il'icheva, I. A.,
Minyat, E. E., Khomyakova, E. B., and Florentiev, V. L. (1994) The
R-form DNA does exist. FEBS Letters, 339, 113-118.

2. Dagneaux, C., Gousset, H., Shchyolkina, A. K., Ouali, M., Letellier,
R., Liquier, J., Florentiev, V. L., and Taillandier, E. (1996) Parallel and
antiparallel AA-T intramolecular triple helices. Nucleic Acids Res.,

3. Borisova, O. F., Shchyolkina, A. K., Timofeev, E. N., Tsybenko, S. Yu.,
Mirzabekov, A., and Florentiev, V. L. (1995) Stabilization of parallel
triplex with propidium iodide. J. Biomol. Struct. Dynam., 13, 15-27.

4. Timofeev, E. N., Smirnov, I. P., Haff, L. A., Tishchenko, E. L.,
Mirzabekov, A. D., and Florentiev, V. L. (1996) Methidium
intercalator inserted into synthetic oligonucleotides. Tetrahedron
Lett., 37, 8467-8470.

Advanced Automated Sequencing 
Technology: Fluorescent Detection for 
Multiplex DNA Sequencing 

Andy Marks, Tony Schurtz, F. Mark Ferguson, Leonard
Di Sera, Alvin Kimball, Diane Dunn, Doug Adamson, Pe-
ter Cartwright, Robert B. Weiss,¹ and Raymond F. Gesteland

Department of Human Genetics and ¹Howard Hughes
Medical Institute; University of Utah; Salt Lake City, 
UT 84112 

Gesteland: 801/581-5190, Fax: /585-3910 

Automation of a large-scale sequencing process based on 
instrumentation for automated DNA hybridization and de- 
tection is a focal point of our research. Recently, we have 
devised a method for amplifying fluorescent light output 
on nylon membranes by using an alkaline phosphatase- 
conjugated probe system combined with a fluorogenic al-
kaline phosphatase substrate [1]. The amplified signal al-
lows sensitive detection of DNA hybrids in the
sub-femtomole/band range. 

On the basis of this detection chemistry, automated devices 
for detecting DNA on blotted microporous membranes us- 
ing enzyme-linked fluorescence, termed Probe Chambers, 
have been built. The fluorescent signal is collected by a 
CCD camera operating in a Time Delay and Integration 
mode. Concentrated solutions of probes and enzymes are 
stored in Peltier-cooled, septa-sealed vials and delivered by
syringe pumps residing in a gantry style pipetting robot.
Fluorescence excitation is generated by a mercury arc 
lamp acting through a fiber optic "light line". Three 30 x 
63 centimeter sequencing membranes can be simulta- 
neously processed, currently revealing up to 108 lane sets 
per multiplex cycle. A probing cycle is completed approxi- 
mately every eight hours. 

Integration of the Probe Chamber into the production pipeline
is accomplished through connections to the laboratory
data base. A critical component of a high-throughput se- 
quencing laboratory is the software for interfacing to in- 
strumentation and managing work flow. The Informatics 
Group of the Utah Genome Center has designed and 
implemented an innovative system for automating and 
managing laboratory processes. This software allows the 
model of workflow to be easily defined. Given such a 
model, the system allows the user to direct and track the
flow of laboratory information. The core of the system is a 
generic, client-server process management engine that al- 
lows users to define new processes without the need for 
custom programming. Based on these definitions, the soft- 
ware will then route information to the next process, track 
the progress of each task, perform any automated opera-
tions, and provide reports on these processes. To further
increase the usefulness of our laboratory information sys-
tem, we have augmented it with hand-held mobile comput-
ing devices (Apple Newtons) that link to the database
through RF networking cards.
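
The idea of a generic process management engine, where users define a workflow as data and the software routes and tracks items, can be sketched as follows. This is only an illustration of the concept, not the Utah Genome Center system: the class name, the process names and the routing table are hypothetical.

```python
class WorkflowEngine:
    """Table-driven workflow tracker: processes are defined as data, not code."""

    def __init__(self, routes):
        # routes maps each process name to the next process (None = finished).
        self.routes = routes
        self.location = {}   # item id -> current process
        self.history = {}    # item id -> list of completed processes

    def start(self, item, process):
        """Register an item at its first process."""
        self.location[item] = process
        self.history[item] = []

    def complete(self, item):
        """Mark the current process done and route the item to the next one."""
        done = self.location[item]
        self.history[item].append(done)
        self.location[item] = self.routes.get(done)
        return self.location[item]

    def report(self):
        """Count how many items are waiting at each process."""
        counts = {}
        for process in self.location.values():
            key = process or "finished"
            counts[key] = counts.get(key, 0) + 1
        return counts


# Hypothetical membrane-processing workflow defined purely as data.
routes = {"blot": "probe", "probe": "image", "image": "base_call", "base_call": None}
engine = WorkflowEngine(routes)
engine.start("membrane-1", "blot")
engine.start("membrane-2", "blot")
next_step = engine.complete("membrane-1")
status = engine.report()
```

Adding a new laboratory process in this style means editing the routing table only, which is the property the abstract emphasizes.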

Base calling software has been developed to support our
automated, large scale sequencing effort. First-stage se-
quence calling identifies putative bands; however, depend-
ing on the number of reader indel errors (2-6%), merging
first-stage sequence without the aid of cutoff information
can be difficult. To improve our base calling we have em-
ployed Fuzzy Logic to establish confidence metrics. The
logic produces a confidence metric for each band using 
band height, width, uniqueness, shape, and the gaps to ad- 
jacent bands. The confidence metric is then used to iden- 
tify the largest block of highest quality sequence to be 

DOE Grant No. DE-FG03-94ER61817. 
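
The last base-calling step described above, using a per-band confidence metric to find the largest block of highest-quality sequence, can be sketched as a search for the longest run of bands above a cutoff. This is an illustration under assumptions: a plain threshold stands in for the fuzzy-logic metric, and the function name, the 0.8 cutoff and the example values are hypothetical.

```python
def best_block(confidences, threshold=0.8):
    """Return (start, end) of the longest run with confidence >= threshold.

    `end` is exclusive; (0, 0) is returned when no band qualifies.
    """
    best = (0, 0)
    run_start = None
    for i, c in enumerate(confidences):
        if c >= threshold:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best[1] - best[0]:
                best = (run_start, i + 1)
        else:
            run_start = None
    return best


# Hypothetical per-band confidence values.
band_confidence = [0.9, 0.5, 0.85, 0.95, 0.9, 0.3, 0.9]
block = best_block(band_confidence)
```
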

[1] Cherry, J. L., Young, H., Di Sera, L. J., Ferguson, F. M., Kimball, A. W.,
Dunn, D. M., Gesteland, R. F., and Weiss, R. B. (1994) Enzyme-linked
fluorescent detection for automated multiplex DNA sequencing.
Genomics 20, 68-74.

Resource for Molecular Cytogenetics 

Donna Albertson, Colin Collins, Joe Gray,¹ Steven

Lockett, Daniel Pinkel,¹ Damir Sudar, Heinz-Ulrich

Weier, and Manfred Zorn

Lawrence Berkeley National Laboratory; Berkeley,

CA 94720 and ¹University of California; San Francisco,

CA 94143 

Gray; 415/476-3461, Fax: -illS. 

Pinkel: 415/476-3659, Fax: -821S. 

The purpose of the Resource for Molecular Cytogenetics is 
to develop molecular cytogenetic techniques, instruments 
and reagents needed to facilitate large scale genomic DNA 
sequencing and to assist in identification and functional 
characterization of genes involved in disease susceptibility, 
genesis and progression. This work is closely coordinated 
with the LBNL Human Genome Program and directly sup- 
ports research in the LBNL Life Sciences Division and the 
UCSF Cancer Center. Work currently is in four areas: 
a) Genome analysis technology, b) Probe development and
physical map assembly, c) Digital imaging microscopy and
d) Informatics. The Resource acts as a catalyst for research
in several areas so some support comes from Industry, the 
NIH and NIST. 

Probe development and physical map assembly: The Re- 
source maintains a list of over a thousand publicly available 
probes suitable for molecular cytogenetic studies. These in- 
clude approximately 600 probes each selected by the Re- 
source to contain a known STS or EST. Probes selected by 
the Resource can be requested through our web page. 

The Resource also participates in the development of low 
and high resolution physical maps to facilitate analysis and 
characterization of genetic abnormalities associated with 
human disease. Low resolution mapping panels with 
probes distributed at few megabase intervals have been 
completed this year for chromosomes 1, 2, 3, 7, 8, 10, and 
20. The mapped STSs associated with these probes facili- 
tate movement from low to high resolution physical maps. 
STS content mapping and DNA fingerprinting have been 
applied to develop a high resolution, sequence-ready map 
comprised of BAC and P1 clones for the ~1 Mb region of
chromosome 20 between WI9227 and D20S902. This re-
gion is amplified in ~10% of human breast cancers. Ap-
proximately 300 kb of this region has been sequenced by
the LBNL Human Genome Program.

Quantitative DNA fiber mapping (QDFM) has been devel- 
oped this year to facilitate high resolution analysis of ge- 
nomic overlap between cloned probes. In this approach, 
cloned DNA molecules are uniformly stretched during dry- 
ing by the hydrodynamic action of a receding meniscus. 
The position of specific sequences along the stretched 
DNA molecules is visualized by fluorescence in situ hy- 
bridization (FISH) and measured by digital image analysis. 
QDFM has been used to map gamma alpha transposons, 
plasmid or cosmid probes along P1 molecules, and P1 or
PAC clones along straightened YAC molecules with few
kilobase resolution. QDFM is now being studied to deter-
mine its utility in the assembly of minimally overlapping,
sequence-ready contigs, assessment of the integrity of
cloned BACs and mapping of subclones prepared for di-
rected DNA sequencing along the clone from which they
were derived. 

Genome analysis technology: The Resource has partici- 
pated in the development of comparative genomic hybrid- 
ization (CGH) as a tool for detection and mapping of 
changes in relative DNA sequence copy number in humans 
and mouse. This year, CGH to arrays of cloned probes 
(CGHa) has been demonstrated. This is advantageous be-
cause it allows aberrations to be mapped with resolution
determined by the genomic spacing of probes on the array.
CGHa also is attractive since it appears to be linear over a
relative copy number range of at least 10^4 between the two
nucleic acid samples being compared. 

The Resource has participated in the development of FISH 
approaches to analysis of relative gene expression in nor- 
mal and aberrant tissues. FISH with cloned or predicted 
expressed sequences, previously developed in C. elegans, 
is now being applied to the assessment of expression of 
human genes. The C. elegans work suggests a throughput 
of several dozen sequences per month. Information from 
this approach will be important in assessment of the func- 
tion of newly discovered genes, including those predicted 
from DNA sequencing. 

Digital imaging microscopy: The Resource supports work 
in microscopy, image processing and analysis methods 
needed for CGH and CGHa, 3D FISH, tissue analysis, rare 
event detection, multi-color image acquisition, aberration 
scoring for biodosimetry, and analysis of FISH to DNA 
fibers. Developments this year include an improved pack- 
age for CGH and prototype systems for analysis of DNA 
fibers, CGHa arrays and semiautomatic segmentation of 
nuclei in three dimensions. 

Informatics: The Resource maintains a web site at http:// that summarizes information about 
mapped probes. Probes developed by the Resource can be 
requested directly through this page. In addition, the Re-
source has developed a Web page for exchange of ge-
nomic, genetic and biologic information between geo-
graphically dispersed collaborators. The page, under pass-
word control, carries information about physical maps,
genomic sequence, sequence annotation, and gene expres- 
sion images. 

DOE Contract No. DE-AC03-76SF00098.

DNA Sample Manipulation and 

Trevor Hawkins 

Center for Genome Research; Whitehead Institute/Massa-
chusetts Institute of Technology; Cambridge, MA 02139
617/252-1910, Fax: -1902, 

The objective of this project is to develop a high-through-
put, fully automated robotic device for the complete auto-
mation of the sequencing process. We also aim to further
develop DNA sequencing electrophoresis systems and to 
integrate these devices with our robotics. 

We have built the Sequatron, an integrated, robotic device 
which automates the tasks of DNA purification and setup 
of thermal cycle sequencing reactions. The major compo- 
nent of our system is an articulated CRS 255A robotic arm 
which is track mounted. The deck of the robot contains 
several new or modified XYZ robotic workstations, a 
novel thermal cycler with automated heated lids, carou-
sels, and custom built plate feeders.

Biochemically, we have employed our Solid-phase revers- 
ible immobilization (SPRI) technique to isolate and ma- 
nipulate the DNA throughout the process. 

Specifically we have set up the Sequatron to isolate DNA
from M13 phage or crude PCR products using the same
protocol and procedures. From M13 phage we obtain ap-
proximately 1 µg of DNA per well, which is sufficient for
multiple sequencing reactions.

The current throughput of the system is 80 microtiter plates
of samples from M13 phage supernatants or crude PCR
products to sequence ready samples every 24 hours. Re-
cently, new enzymes, new energy transfer primers and higher
density microtiter plates have opened up possible increases
to in excess of 25,000 samples per 24 hour period.
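
The throughput figures above can be related by simple arithmetic. The sketch below assumes standard 96-well plates for the current rate and 384-well plates as one possible "higher density" format; neither plate format is stated in the abstract, so both are assumptions made only for illustration.

```python
def samples_per_day(plates_per_day, wells_per_plate):
    """Samples processed per 24 hours for a given plate format."""
    return plates_per_day * wells_per_plate


# 80 plates per 24 hours is from the text; the well counts are assumptions.
current_rate = samples_per_day(80, 96)     # assumed 96-well plates
dense_rate = samples_per_day(80, 384)      # assumed 384-well plates
exceeds_target = dense_rate > 25_000
```

Under these assumptions the current rate is 7,680 samples per day, and the same number of 384-well plates would give 30,720, consistent with the stated potential of more than 25,000.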

DOE Grant No. DE-FG02-95ER62099.

Relevant Publication 

DeAngelis, M., Wang, D., & Hawkins, T. (1995) Nucl. Acids Res. 23,


Construction of a Genome-Wide 
Characterized Clone Resource for 
Genome Sequencing 

Leroy Hood, Mark D. Adams,¹ and Melvin Simon²

University of Washington; Seattle, WA 98195-7730

206/616-5014, Fax: /685-7301

¹The Institute for Genomic Research; Rockville, MD


²California Institute of Technology; Pasadena, CA 91125

Bacterial artificial chromosomes (BACs) represent the 
state of the art cloning system for human DNA because of 
their stability and ease of manipulation. Venter, Smith and 
Hood (Nature 381:364-366, 1996) have proposed a strat- 
egy based on the use of sequences from the ends of all 
clones in a deep coverage BAC library to produce a
sequence-ready set of clones for the human genome. We 
propose to demonstrate the effectiveness of this strategy by 
performing a directed test, initially on chromosomes 16 
and 22, and continuing on to chromosome 1. All available
markers on chromosome 16 (including the large number of 
soon-to-be-available radiation hybrid markers) will be 
used to screen the existing 8x BAC library at CalTech. 
This will serve to evaluate the quality of the library in 
terms of representation of broad chromosomal regions. A 
similar procedure will be used for chromosome 22, except 
that the existing BAC map will be used to select more 
evenly spaced markers for screening, including use of 
end-sequence markers from the current chromosome 22 
BAC map constructed in the Simon lab. Each identified 
clone will be rearrayed from the library and end se- 
quenced. This information will dovetail nicely with ongo- 
ing sequencing projects at TIGR and the Sanger Centre, 
which will in turn provide additional information on the 
average degree of BAC overlap detectable by this method, 
the degree of interference with genome-wide repeats, and 
the appropriate use of fingerprinting as an early or late ad- 
dition to the end-sequencing information. In addition, we 
will develop and implement cost-effective, 
high-throughput methods of preparing and end-sequencing 
BAC DNA that are suitable for scaling to characterization 

of the full 400,000 clones necessary for characterization of 
a 15x human BAC library. 
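
The relationship between clone count, library depth and insert size implied above can be worked through with the usual coverage formula, coverage = clones x mean insert / genome size. The sketch assumes a haploid human genome of 3.0 x 10^9 bp, a figure not given in the abstract; the function names are hypothetical.

```python
GENOME_SIZE_BP = 3.0e9  # assumed haploid human genome size


def mean_insert_size(coverage, genome_size_bp, n_clones):
    """Mean insert size (bp) implied by a library depth and clone count."""
    return coverage * genome_size_bp / n_clones


def end_sequence_spacing(genome_size_bp, n_clones):
    """Average distance (bp) between end sequences, two ends per clone."""
    return genome_size_bp / (2 * n_clones)


insert_bp = mean_insert_size(15, GENOME_SIZE_BP, 400_000)
spacing_bp = end_sequence_spacing(GENOME_SIZE_BP, 400_000)
```

Under that assumption, 400,000 clones at 15x depth correspond to a mean insert of about 112.5 kb, and end-sequencing both ends of every clone would place a sequence tag roughly every 3.75 kb across the genome.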

DOE Grant No. DE-FC03-96ER62299.

DNA Sequencing Using Capillary 

Barry L. Karger 

Barnett Institute; Northeastern University; Boston, MA


617/373-2867 or -2868, Fax: -2855 


During the past year, we have made major progress in the 
design of a replaceable polymer matrix for DNA sequenc- 
ing and the development of the first generation multiple 
capillary array of 12 capillaries. We also implemented 
ultrafast separation of dsDNA (e.g. 30 sec for complete 
resolution of the standard ΦX174-HaeIII restriction frag-

In the separation of sequencing reaction products, we com- 
pleted a study on the role of polymer molecular weight and 
concentration. Using linear polyacrylamide (LPA), the 
polymer with which we have had our most success, we 
have achieved 1000 base read lengths in 1 1/2 hrs. Optimi- 
zation of column length, electric field and column tem- 
perature (50° C) was required. Using emulsion polymer- 
ization, we are now able to produce LPA powders with 
MW of - lO-" k Da. The fully replaceable matrix is very 
powerful for rapid sequencing of long reads. 

We have successfully implemented a 12-capillary array 
instrument and are using it to study issues of ruggedness in 
routine sequencing. As part of this, we have developed a 
sample clean-up procedure which reduces all reactions to a 
similar state in terms of sample solution prior to injection. 
The results of this work have led to the design of a 96-cap-
illary array that we will implement over the next year.

We have also achieved very fast separations of ss- and 
dsDNA using short capillaries and very high fields. For
example, sequencing 300 bases in 3^ mins. has been 
shown, as well as very rapid mutational analysis. Imple- 
mentation of such speeds on a capillary array will create 
an instrument for high throughput automated analysis. 

DOE Grant No. DE-FG02-90ER60985. 


Ultrasensitive Fluorescence Detection 
of DNA 

Richard A. Mathies and Alexander N. Glazer 

Departments of Chemistry and Molecular and Cell 
Biology; University of California; Berkeley CA 94720 
510/642-4192, Fax: -3599, 

The overall goal of this project is to develop new fluores- 
cence labeling methods, separation methods and detection 
technologies for DNA sequencing and genomic analysis. 

Highlights along with representative publications are given 

Energy Transfer Primers. Families of sequencing and PCR
primers have been developed that contain both fluores-
cence donor and acceptor chromophores.¹ These labeled
primers with optimized excitation and emission properties
provide from 2- to 20-fold enhanced signal intensities in
automated DNA sequencing with slab gels and with capil-
lary arrays.² The reduced spectral cross talk of these ET
primers also makes them valuable in PCR product and
STR analyses.³

New Intercalation Dye Labels. A new family of
heterodimeric bis-intercalation dyes has been synthesized
exploiting the concept of fluorescence energy transfer be-
tween two different cyanine intercalators.⁴ By tailoring the
spectroscopic properties of the dyes, labels with intense
emission above 650 nm following 488 nm excitation have
been fabricated. By adjusting the spacing linker between
the two dyes, the binding affinity has also been optimized.
These molecules are useful for noncovalent multiplex la-
beling of ds-DNA in a wide variety of multicolor analy-

Capillary Electrophoresis Chips. Capillary and capillary
array electrophoresis systems have been photolithographi-
cally fabricated on 2 x 3 inch glass substrates.⁶ These devices
provide high quality electrophoretic separations of
ds-DNA fragments and DNA sequencing reactions with a
10-fold increase in speed.⁷ Arrays of up to 32 capillaries on
a single chip have been fabricated.

Single DNA Molecule Fluorescence Detection. A
confocal fluorescence system has been used to demon-
strate that single molecule fluorescence burst counting can
be used to detect CE separations of ds-DNA fragments.
Fragments as small as 50 bp can be counted and mass sen-
sitivities as low as 100 molecules per electrophoresis band
are possible. This technology should be valuable in incipi-
ent cancer and trace pathogen detection.⁸

DOE Grant No. DE-FG03-91ER61125. 

1. Ju, J., Ruan, C., Fuller, C. W., Glazer, A. N., and Mathies, R. A.
Fluorescence Energy Transfer Dye-Labeled Primers for DNA
Sequencing and Analysis, Proc. Natl. Acad. Sci. U.S.A. 92,
4347-4351 (1995).

2. Ju, J., Glazer, A. N., and Mathies, R. A. Energy Transfer Primers: A
New Fluorescence Labeling Paradigm for DNA Sequencing and
Analysis, Nature Medicine 2, 180-182 (1996).

3. Wang, Y., Ju, J., Carpenter, B., Atherton, J. M., Sensabaugh, G. F., and
Mathies, R. A. High-Speed, High-Throughput TH01 Allelic Sizing
Using Energy Transfer Fluorescent Primers and Capillary Array
Electrophoresis, Analytical Chemistry 67, 1197-1203 (1995).

4. Benson, S. C., Zeng, Z., and Glazer, A. N. Fluorescence Energy
Transfer Cyanine Heterodimers with High Affinity for
Double-Stranded DNA. I. Synthesis and Spectroscopic Properties,
Anal. Biochem. 231, 247-255 (1995).

5. Zeng, Z., Benson, S. C., and Glazer, A. N. Fluorescence Energy
Transfer Cyanine Heterodimers with High Affinity for
Double-Stranded DNA. II. Applications to Multiplex Restriction
Fragment Sizing, Anal. Biochem. 231, 256-260 (1995).

6. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Fragment
Separations Using Microfabricated Capillary Array Electrophoresis
Chips, Proc. Natl. Acad. Sci. U.S.A. 91, 11348-11352 (1994).

7. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Sequencing
Using Capillary Array Electrophoresis Chips, Analytical Chemistry

8. Haab, B. B. and Mathies, R. A. Single Molecule Fluorescence Burst
Detection of DNA Fragments Separated by Capillary Electrophoresis,
Analytical Chemistry 67, 3253-3260 (1995).

Joint Human Genome Program 
Between Argonne National Laboratory 
and the Engelhardt Institute of 
Molecular Biology 

Andrei Mirzabekov,¹,² G. Yershov,¹,² Y. Lysov,² V.
Barsky,¹ V. Shick,² and S. Bavikin¹
¹Argonne National Laboratory; Argonne, IL 60439
630/252-3161 or -3361, Fax: /252-3387

²Engelhardt Institute of Molecular Biology; 117984 Mos-
cow, Russia

In 1996, more than thirty U.S. and Russian research work-
ers participated in the joint Human Genome Program be-
tween Argonne National Laboratory and Engelhardt Insti-
tute of Molecular Biology on the development of sequenc-
ing by hybridization with oligonucleotide microchips
(SHOM).

During this year, about twenty Russian scientists have
been working from 3 months to 1 year at ANL. In this pe-
riod, 3 papers have been published, 5 papers accepted
for publication, and 3 more papers submitted for publica-
tion.
The main research efforts of the group have been concen- 
trated in three directions: 

I. Improvement of SHOM technology. 

II. Development of SHOM for the needs of Human Ge- 
nome Program. 

III. Development of new approaches based on SHOM 

I. Improvement of SHOM technology 

As a major result of the work in this direction, simple, reli- 
able and effective methods of microchip manufacturing, 
sample preparations, and quantitative hybridization analy- 
sis by fluorescence microscopy have been developed or 

1. Photopolymerization technique for production of
micromatrices of polyacrylamide gel pads on 
hydrophobicized glass surface was improved to become a 
simple, highly reproducible and inexpensive procedure (7). 

2. New and cheaper chemistry of the oligonucleotide im- 
mobilization has been developed and introduced for pro- 
duction of more durable microchips. It is based on the use 
of amino-oligonucleotides and aldehyde-gels instead of 
3-methyluridine-oligonucleotides and hydrazide-gels (3). 

3. A four-pin robot has been constructed with computer control
of every microchip element production. High quality
microchips with 4100 immobilized oligonucleotides have
been manufactured and the complexity of the microchips
can easily be scaled up to a few tens of thousands of elements.

4. A two-color fluorescence microscope has been equipped
for regular use with proper mechanics and software. It
allows investigators to regularly use the automatic
quantitative monitoring of the hybridization on the whole
microchip and to measure the kinetics of hybridization as well as
the melting curves of duplexes formed with all microchip
oligonucleotides (1,2,8).

5. Four-color fluorescence microscope was manufactured
and four proper fluorescence dyes are at present under se- 

6. Chemical methods of introduction of several fluores- 
cence dyes into DNA and RNA with or without fragmenta- 
tion have been developed and regularly used in SHOM 
experiments (4). 

7. A theory describing the kinetics of hybridization with
gel-immobilized oligonucleotides has been developed (5).

8. Simple and relatively inexpensive equipment (around 
$10,000 per set) has been produced for manual manufac- 
turing of microchips and fluorescence measurement of hy- 
bridization, which will enable every laboratory to produce 
and practically use microchips containing up to 100 immo- 
bilized oligonucleotides or other compounds. 

II. Application of SHOM

Although the main goal of our SHOM development is to
produce a simple de novo sequencing procedure, a number
of other SHOM applications have been tested as intermediate
steps in the SHOM research.

1. Sequence analysis and sequencing

A number of technical problems must be solved for de
novo sequencing, although the requirements are much less
stringent for comparative sequence analysis. Among these:

a) Reliable discrimination of perfect and mismatched
duplexes. We have significantly improved the discrimination
by decreasing the length of hybridized oligonucleotides to
6- and 8-mers (1,7) and by using 5-mers in "contiguous
stacking" hybridization (1,2). Essential improvement was
also achieved by automatic measuring of the melting
curves for duplexes formed in each microchip element,
calculating their thermodynamic parameters (free energy,
enthalpy and entropy) for different regions of the melting
curves, and comparing them with these parameters for
perfect duplexes. In addition, a highly reliable
discrimination was achieved by using two-color fluorescence
microscopy and by quantitative comparison of the hybridization
pattern of a known DNA or synthetic oligonucleotides and
DNA under study labeled with different fluorophores (8).

b) Difference in hybridization efficiency depends on the
GC-content and the length of the duplex. We have equalized
the efficiency by choosing proper concentration for
the immobilized oligonucleotide (6,7) and also by increasing
the effective length of immobilized oligonucleotides
by adding at one or both of their ends 5-nitroindole as a
universal base or a mixture of four bases (2).

c) Interference of hairpins and other structures in DNA 
with less stable duplexes formed upon the DNA hybridiza- 
tion with comparatively short immobilized oligonucle- 
otides of the microchip. This interference was decreased 
by fragmentation of the analysed sample of DNA and RNA 
in the course of incorporation of a fluorescence label (4). 
We have also tested incorporation by a chemical bond of 
an intercalator into immobilized oligonucleotides that
stabilized its base pairing with DNA over hairpin formation

d) Necessity to increase the microchip complexity for
sequencing long DNA stretches. As an alternative, further
development of so-called contiguous stacking hybridization
was shown to improve the efficiency of 8-mer microchip
up to that of 13-mer microchip so that DNA of several
kilobases in length could be sequenced by SHOM (2).

e) 6-mer microchips for sequencing and sequence analysis.
We have now come to the stage of manufacturing microchips
containing 4,096 (i.e. all possible) 6-mers. The control
tests partly described above have shown that these
microchips can be effectively used for sequence analysis,
mutation diagnostics and detection of sequencing mistakes
by conventional gel-sequencing methods. We hope that
after demonstrating the efficiency of 6-mer microchips, we
shall be able to get sufficient financial support for
production of the microchip with all 65,536 8-mers.
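The chip sizes quoted here follow from counting every oligonucleotide of length k over the four bases, i.e. 4^k. The short Python sketch below enumerates such a probe set and lists which probes a target sequence contains. The function names are illustrative assumptions and are not taken from the report; the sketch ignores hybridization chemistry entirely.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return every possible oligonucleotide of length k (4**k strings)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def spectrum(sequence, k):
    """Return the set of k-mers that occur in a target sequence.

    In an idealized SHOM experiment, these are the chip elements
    (up to complementarity) expected to give a perfect-duplex signal.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


hexamers = all_kmers(6)
octamer_count = 4 ** 8
print(len(hexamers), octamer_count)
```

A complete 8-mer chip needs 16 times as many elements as a complete 6-mer chip, which is consistent with the authors treating it as a separate, more costly manufacturing step.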

2. Mutation diagnostics and gene polymorphism analysis 

The improvements described above have been introduced 
for reliable ("Yes" or "No" mode) identification of 
single-base changes in human genomic DNA. The effi- 
ciency of SHOM has been demonstrated for identification 
of a number of β-thalassemia mutations (1,2,8) and HLA
allele variations in the human genome. 

3. Identification of microorganisms and gene expression 

Bacterial microchips have been manufactured and tested. 
Their ability for reliable identification of a number of bac- 
terial strains in the sample has been demonstrated (6). The 
chips containing oligonucleotides complementary to spe- 
cific regions of 16S ribosomal RNA were hybridized with 
samples of rRNA, total RNA, DNA and RNA transcripts 
of PCR-amplified genomic rDNA. Similar preliminary 
experiments demonstrated the efficiency of SHOM for 
monitoring the gene expression. 

III. Development of new approaches based on the 
SHOM technology 

1. Enzymatic modification of nucleic acids on selected ele- 
ments of the oligonucleotide chip. The gel pads of the oli- 
gonucleotide chip are separated from each other by hydro- 
phobic glass surface. It prevents the cross-talking of the 
chip elements when a drop of solution is applied on speci- 
fied elements. At the same time, a high porosity of the gel 
allows diffusion of large proteins into the gel. We have 
demonstrated that immobilized oligonucleotides can be 
enzymatically phosphorylated and ligated with contigu- 
ously stacked 5-mer after hybridization with DNA. A 
walking sequencing procedure by stacked pentanucleotides 
was proposed that is based on enzymatic ligation and 
phosphorylation on oligonucleotide chips (9).

2. DNA fractionation on oligonucleotide chips. Due to the 
same properties, the oligonucleotide chips are used for 
fractionation of DNA after DNA hybridization with some 
complementary oligonucleotides of the chip. A new proce- 
dure for sequencing long DNA pieces was proposed that is 
based on fractionation of DNA on fractionating oligo- 
nucleotide chips followed by sequencing of the isolated 
DNA by SHOM on sequencing microchips. The procedure 
allows the investigator to skip cloning and mapping of 
long DNA pieces (9). 


It appears that the major technical problems of SHOM
have been in most part solved, and this technology can
already be applied for sequence analysis and checking the
accuracy of conventional sequencing methods. A number
of other applications in the Human Genome Program are
within the reach of SHOM, such as mutation screening,
gene polymorphism studies, detection of microorganisms,
gene expression studies, etc. Application of SHOM for de
novo DNA sequencing requires manufacturing of more
complicated microchips and improvement of some other,
already available methods.

DOE Contract No. W-31-109-Eng-38.


1. Yershov G., Barsky V., Belgovsky A., Kirillov Eu., Kreindlin E.,
Ivanov I., Parinov S., Guschin D., Drobishev A., Dubiley S.,
Mirzabekov A. DNA analysis and diagnostics on oligonucleotide
microchips // Proc. Natl. Acad. Sci. 1996. Vol. 93. P. 4913-4918.

2. Parinov S., Barsky V., Yershov G., Kirillov Eu., Timofeev E.,
Belgovskiy A., Mirzabekov A. DNA sequencing by hybridization to
microchip octa- and decanucleotides extended by stacked
pentanucleotides // Nucl. Acids Res. 1996. Vol. 24. N 15. P.

3. Timofeev E., Kochetkova S., A., Mirzabekov A. Regioselective
immobilization of short oligonucleotides to acrylic copolymer gels //
Nucl. Acids Res. 1996. Vol. 24. N 16. P. 3142-3148.

4. Proudnikov D., Mirzabekov A. Chemical methods of DNA and RNA
fluorescent labelling // Nucl. Acids Res. 1996. In press.

5. Livshits M., Mirzabekov A. Theoretical analysis of the kinetics of
DNA hybridization with gel-immobilized oligonucleotides //
Biophys. J. 1996. Vol. 71. In print.

6. Guschin D., Mobarry B., Proudnikov D., Stahl D., Rittmann B.,
Mirzabekov A. Oligonucleotide microchips as genosensors for
determinative and environmental studies in microbiology // Applied
and Environmental Microbiology. In print.

7. Guschin D., Yershov G., Zaslavsky A., Gemmell A., Shick V., Lysov
Yu., Mirzabekov A. A simple method of oligonucleotide microchip
manufacturing and properties of the microchips // submitted for
publication.

8. Drobyshev A., Mologina N., Shik V., Pobedimskaya D., Yershov G.,
Mirzabekov A. Sequence analysis by hybridization with
oligonucleotide microchip: identification of beta-thalassemia
mutations // Gene (in print).

9. Dubiley S., Kirillov Eu., Lysov Yu., Mirzabekov A. DNA fractionation,
sequence analysis and ligation of immobilized oligomers on
oligonucleotide chips // submitted for publication.

10. Timofeev E., Smirnov I.P., Haff L.A., Tishchenko E.I., Mirzabekov
A.D., Florentiev V.L. Methidium Intercalator Inserted into Synthetic
Oligonucleotides // Tetrahedron Letters 1996. V. 37, N 47. P. 8467.

Relevant Publication 

Methods of DNA sequencing by hybridization based on optimizing
concentration of matrix-bound oligonucleotide and device for
carrying out same, by Khrapko K., Khorlin A., Ivanov I., Ershov G.,
Lysov Yu., Florentiev V., Mirzabekov A. US Patent 5,552,270. Sep. 3,
1996. PCT/RU92/00052. Filed Mar 18, 1992.

High-Throughput DNA Sequencing: SAmple 
SEquencing (SASE) Analysis as a Framework 
for Identifying Genes and Complete 
Large-Scale Genomic Sequencing 

Robert K. Moyzis and Jeffrey K. Griffith¹

Center for Human Genome Studies; Los Alamos National 

Laboratory; Los Alamos, NM 87545 

505/667-3912, Fax: -2891

¹University of New Mexico; Albuquerque, NM 87131

The human chromosome 5 and 16 physical maps (Doggett
et al., Nature 377:Suppl:335-365, 1995; Grady et al.,
Genomics 32:91-96, 1996) provide the ideal framework
for initiating large-scale DNA sequencing. These physical
mapping studies have shown clearly that gene density in
humans will vary greatly. For example, band 16q21,
consisting of 8 Mb of DNA, has no genes or trapped exons
assigned to it as yet. In contrast, band 16p13.3 has an
extremely high density of coding regions in the DNA
examined to date (i.e., multiple genes/cosmid). Given this wide
variation in gene density and current sequencing costs, we
propose that newly targeted genomic regions should be
analyzed first by a "Lewis and Clark" exploratory
approach, before committing to full length DNA sequencing.
We are using a SAmple SEquencing (SASE) approach to
rapidly generate aligned sequences along the chromosome
5 and 16 physical maps. SASE analysis is a method for
rapidly "scanning" large genomic regions at minimal
cost, identifying and localizing most genes. Briefly,
individual cosmids are partially digested with Sau3A and 3 kb
fragments are recloned into double-strand sequencing
vectors. By sequencing both ends of a 1X sampling of these
recloned fragments along with end sequences of the
cosmid, 70% sequence coverage is achieved with 98%
clone coverage. The majority of this clone coverage is
ordered by the relationship between the subclone end
sequences. These ordered sequences are ideal substrates for
directed sequencing strategies (for example, primer
walking or transposon sequencing). SASE analysis has been
initiated on the 40 Mb short arm of chromosome 16 and
the 45 Mb short arm of chromosome 5. We propose to
make SASE sequences, along with feature annotation,
publicly available through GSDB. Because such data are
sufficient to allow PCR amplification of the sequenced
region from GSDB submissions alone, the need for
extensive clone archiving and distribution is eliminated.
This will allow for the effective "democratization" of the
genome, allowing numerous laboratories to share and
contribute to the growing genome databases.
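The coverage figures quoted for SASE can be compared against the standard idealized random-sampling (Lander-Waterman) estimate, in which a sampling depth c leaves a fraction exp(-c) of the target uncovered. The Python sketch below is only a rough check under stated assumptions: the 400-base read length is a hypothetical value not given in the abstract, and the model ignores the cosmid end sequences and the non-randomness of Sau3A partial digestion, so it is not expected to reproduce the reported 70% exactly.

```python
import math


def fraction_covered(depth):
    """Expected fraction of a target covered at a given random sampling depth."""
    return 1.0 - math.exp(-depth)


def clone_depth(seq_depth, insert_len, read_len, reads_per_clone=2):
    """Depth of clone (insert) coverage implied by a given sequence depth.

    Each subclone contributes reads_per_clone * read_len bases of sequence
    but spans insert_len bases of the cosmid.
    """
    return seq_depth * insert_len / (reads_per_clone * read_len)


# Assumed values: 1X sequence sampling (from the text), 3 kb inserts
# (from the text), and a hypothetical 400-base read length.
seq_cov = fraction_covered(1.0)
clone_cov = fraction_covered(clone_depth(1.0, 3000, 400))
print(round(seq_cov, 3), round(clone_cov, 3))
```

Under these assumptions the model gives roughly 63% sequence coverage and about 98% clone coverage, which is in the same range as the figures reported in the abstract.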

DOE Grant No. DE-FG03-96ER62298. 

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 



One-Step PCR Sequencing 

Kenneth W. Porter, J. David Briley, and Barbara Ramsay Shaw

Department of Chemistry; Duke University; Durham, NC 


919/660-1553, Fax: -1605, 

A method is described to simultaneously amplify and se- 
quence DNA using a new class of nucleotides containing 
boron. During the polymerase chain reaction, 
boron-modified nucleotides, i.e. 2'-deoxynucleoside
5'-α-[P-borano]-triphosphates [1,2], are incorporated into the
product DNA. The boranophosphate linkages are resistant 
to nucleases and thus the positions of the borano- 
phosphates can be revealed by exonuclease digestion, 
thereby generating a set of fragments that defines the DNA 
sequence. The boranophosphate method offers an alterna- 
tive to current PCR sequencing methods. 

Single-sided primer extension with dideoxynucleotide 
chain terminators is avoided with the consequence that the 
sequencing fragments are derived directly from the origi- 
nal PCR products. Boranophosphate sequencing is demon- 
strated with the Pharmacia and the Applied Biosystems 
373A automatic sequencers producing data that is compa- 
rable to cycle sequencing. 

DOE Grant No. DE-FG02-97ER62376 and NIH Grant No. 


[1] Sood, A., Shaw, B. R., and Spielvogel, B. F. (1990) J. Amer. Chem.
Soc. 112, 9000-9001.
[2] , J., Shaw, B. R., Porter, K., Spielvogel, B. F., and Sood, A.
(1992) Angew. Chem. Int. Ed. Engl. 31, 1373-1375.

Automation of the Front End of DNA 

Lloyd M. Smith and Richard A. Guilfoyle 

University of Wisconsin; Madison, WI 53706 
Guilfoyle: 608/265-6138, Fax: -6780 
raguilfo@facstaff.wisc.edu

The objective of this project is to continue developing 
more efficient tools and methods addressing the 
"front-end" processes of large-scale DNA sequencing. Our 
specific aims are high-throughput purification and map- 
ping of cosmid inserts, controlled fragmentation of random 
inserts, direct selection vectors for cloning and sequencing, 
high-throughput M13 clone isolations, and 
high-throughput template purifications. 

An approach to multi-cosmid purifications was developed
using a cell-harvester and binding to GF/C glass fiber
filter-bottom microtiter plates. This method proved
inadequate because the yields were low and the DNA was
easily fragmented. In the last year we have started examining
the use of triplex-affinity capture (TAC) for this purpose as
applied to BACs, based on our previous success with TAC
purification and restriction mapping of cosmids (1,2).

We initially proposed to control random fragmentation for 
shotgun cloning using CviJI and its methyltransferase.
Instead, we are now exploring automating it by scaled- 
down nebulization and parallel processing. 

We have made a vector, M13-102 (3,4; patented), for
facilitating construction and improving quality of M13
shotgun libraries. It allows direct selection of recombinants
and dephosphorylation of inserts to reduce chimerics, and
contains universal primers for fluorescent sequencing and a
triplex sequence for easy TAC purification of linearized
RF DNA. We also made a version of this vector,
M13-100Z, which expresses the alpha-peptide of β-gal. Its
utility is in flow cytometry based clone isolation. We
continue to develop these vectors for multiple cloning sites
and insert flipping for use in closing steps of large-scale
sequencing projects.

We continue to develop high-throughput clone isolations
by flow cytometric cell sorting. M13 or plasmid clones can
be isolated in microtiter wells at rates
up to 2 per second using our present FacStar-Plus
cytometer and collection assembly. Theoretical rates are much
higher. This bypasses plating onto solid media and any
need for plaque/colony picking. We initially tried
isolations after microencapsulation of cells in agarose gel
microbeads, but with hardware and software improvements we can
now distinguish positively selected transfected cells from
background. Efficiency of sorting is very sensitive to
detection efficiency. We continue to investigate different
methods of fluorescence detection for various plasmid and
M13 vector systems including fluorogenic substrates for
β-gal, fluorescent-tagged antibodies to M13 or cell surface
proteins, and green fluorescent protein as a reporter.

We have been developing a solid-phase filter plate method
for M13 template purifications using carboxylated
polystyrene beads (Bangs Labs, IN) for automating on the
Hamilton 2200. It should process 96 samples in under 30
minutes and deliver 1-2 micrograms per sample for
cycle-sequencing. This approach has proven superior to
others we have tried with respect to amenability to
automation (5,6).

Ancillary projects. We reported a method for direct
fluorescence analysis of genetic polymorphisms using
oligonucleotide arrays on glass supports (7), which spun off
other projects including enhanced discrimination by
artificial mismatch hybridization (8), restriction
hybridization ordering of shotgun clones, and restriction site
indexing-PCR (RSI-PCR) (9, patent applied for). RSI-PCR
is an alternative strategy to extra-long PCR which has
application in large gap filling (>45 kb), differential
gene expression analysis, RFLP and EST marker
production, end-sequencing and others.

Our most significant findings are the following:

1. Improved direct selection M13 cloning vector

2. Rapid restriction mapping of cosmids using 
triple-helix affinity capture 

3. High-throughput M13 template production using car- 
boxylated beads 

4. Sequencing of a cosmid encoding the Drosophila 
GABA receptor 

5. Improved detection of sequencing clones by 

6. RSI-PCR, a strategy to obtain mapped and 
sequence-ready DNA directly from up to 0.5 kb re- 
gions of a complex genome using palindromic class II 
restriction enzymes; bypasses conventional cloning 
methodology (see previous section for applications). 

DOE Grant No. DE-FG02-91ER61122.

1. Ji, H., Smith, L.M., and Guilfoyle, R.A. (1994) GATA 11, 43-47.

2. Ji, H., Francisco, T., Smith, L.M., and Guilfoyle, R.A. (1996)
Genomics 31, 185-192.

3. Guilfoyle, R. and Smith, L.M. (1994) Nucleic Acids Res. 22, 100-107.

4. Chen, D., Johnson, A.F., Severin, J.M., Rank, D.R., Smith, L.M., and
Guilfoyle, R.A. (1996) Gene 172, 53-57.

5. Kolner, D.E., Guilfoyle, R.A., and Smith, L. (1994) DNA Sequence 4,

6. Johnson, A.F., Wang, R., Ji, H., Chen, D., Guilfoyle, R.A., and Smith,
L.M. (1996) Anal. Biochem. 234, 83-95.

7. Guo, Z., Guilfoyle, R.A., Thiel, A.J., Wang, R., and Smith, L.M. (1994)
Nucleic Acids Res. 22, 5455-5465.

8. Guo, Z., Liu, Q., and Smith, L.M. (submitted).

9. Guilfoyle, R.A., Guo, Z., Kroening, D., Leeck, C., and Smith,

High-Speed DNA Sequence Analysis by 
Matrix-Assisted Laser Desorption 
Mass Spectrometry 

Lloyd M. Smith and Brian Chait¹

Department of Chemistry; University of Wisconsin;

Madison, WI 53706

608/263-2594, Fax: /265-6780,

¹Rockefeller University; New York, NY 10021

Our mass spec research has focused primarily on the
possibility of utilizing Matrix-Assisted Laser
Desorption/Ionization Mass Spectrometry (MALDI-MS) as an alternative
method to conventional gel electrophoresis for DNA se- 
quence analysis. In this approach, extension fragments gen- 
erated by the Sanger sequencing reactions are separated by 
size and detected in the mass spectrometer in one step. 
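The idea of reading a sequence from a mass spectrum of Sanger extension fragments can be illustrated in a few lines of Python: adjacent peaks in the ladder differ by the mass of one nucleotide residue, and the four residues have distinct masses. This is a minimal sketch under idealized assumptions (every fragment present, no fragmentation, adducts, or peak overlap). The function names and the 5000 Da primer mass are hypothetical; the residue values are standard average masses of the deoxynucleotide monophosphate residues in a DNA chain.

```python
# Average masses (Da) of deoxynucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(primer_mass, extension):
    """Idealized masses of successive extension fragments of a primer."""
    masses = []
    total = primer_mass
    for base in extension:
        total += RESIDUE_MASS[base]
        masses.append(total)
    return masses


def read_ladder(primer_mass, masses, tol=2.0):
    """Call bases from the mass differences between adjacent ladder peaks.

    A difference that matches no residue within tol Da is called 'N'.
    """
    calls = []
    previous = primer_mass
    for mass in sorted(masses):
        diff = mass - previous
        base, err = min(
            ((b, abs(diff - m)) for b, m in RESIDUE_MASS.items()),
            key=lambda pair: pair[1],
        )
        calls.append(base if err <= tol else "N")
        previous = mass
    return "".join(calls)


peaks = ladder_masses(5000.0, "GATTACA")
print(read_ladder(5000.0, peaks))
```

The smallest gap between residue masses (A versus T, about 9 Da) sets the mass resolution needed, which is one reason the fragmentation and resolution limits discussed in this abstract matter for longer fragments.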

Our group has shown fragmentation to be a major factor
limiting accessible mass range, sensitivity, and mass
resolution in the analysis of DNA by MALDI-MS. This DNA
fragmentation was shown to be strongly dependent on both
the MALDI matrix and the nucleic acid sequence
employed. Fragmentation is proposed to follow a pathway in
which nucleobase protonation leads to cleavage of the
N-glycosidic bond with base loss, followed by cleavage of
the phosphodiester backbone. Modifications of the
deoxyribose sugar ring by replacing the 2' hydrogen with more
electron-withdrawing groups such as the hydroxyl or
fluoro group were shown to stabilize the N-glycosidic
bond, partially or completely blocking fragmentation at the
modified nucleosides. The stabilization provided by these
chemical modifications was also shown to expand the
range of matrices useful for nucleic acid analysis, yielding
in some cases greatly improved performance.

DOE Grant No. DE-FG02-91ER61130. 
Relevant Publication 

Zhu, L.; Parr, G. R.; Fitzgerald, M. C.; Nelson, C. M.; Smith, L. M.
Oligodeoxynucleotide fragmentation in MALDI/TOF Mass
spectrometry using 355 nm radiation. J. Am. Chem. Soc. 1995, 117,

Analysis of Oligonucleotide Mixtures 
by Electrospray Ionization-Mass

Richard D. Smith. David C. Muddiman, James E. Bruce, 
and Harold R. Udseth 

Environmental Molecular Sciences Laboratory; Pacific
Northwest National Laboratory; Richland, WA 99352
509/376-0723, Fax: -5824.
http://www.emsl.pnl.gov:2080/docs/msd/fticr/advmasspec.html

This project aims to develop electrospray ionization mass
spectrometry (ESI-MS) methods for high speed DNA
sequencing of oligonucleotide mixtures that can be
integrated into an effective overall sequencing strategy. A
second goal is to develop mass spectrometric methods that can
be effectively utilized in post-genomic research in broad
areas of DNA characterization, such as with polymerase
chain reaction to rapidly and accurately identify single
base polymorphisms. ESI produces intact molecular ions
from DNA fragments of different size and sequence with
high efficiency [1]. Our aim is to determine ESI mass
spectrometry conditions that are compatible with
biological sample preparation, allow efficient ionization of
DNA, and allow the analysis of complex mixtures
(e.g., Sanger sequencing ladder). We have developed a
novel on-line microdialysis method at PNNL to remove
salts, detergents, and buffers from such biological
preparations as PCR and dideoxy sequencing mixtures. This has
allowed for rapid and efficient desalting (e.g., of samples
having 0.25 M NaCl), allowing ESI mass spectral analysis
without the typically problematic Na-adducts.
Oligonucleotide ions are typically produced from ESI with
a broad distribution of net charge states for each molecular
species, leading to difficulties in analysis of
complex mixtures [1]. To make identification of each
component in a sequencing mixture possible, the charge states of
molecular ions can be reduced using gas-phase reactions.
The charge-state reduction methods being examined
include: (1) reactions with organic acids and bases (in the
solution to be electrosprayed and the ESI-MS interface or
the gas phase); (2) the labeling of the oligonucleotides
with a designed functional group for production of
molecular ions of very low charge states; and (3) the shielding
of potential charge sites on the oligonucleotide phosphate/
phosphodiester groups with polyamines (and the
subsequent gas-phase removal of the neutral amines). In initial
studies two methods for charge state reduction of gas
phase oligonucleotide negative ions have been tested: (1)
the addition of acids and bases to the oligonucleotide
solution and (2) the formation of diamine adducts followed by
dissociation in the interface region [2,3]. Several methods
show promise for charge state reduction and results have
been demonstrated for series of smaller oligonucleotides.
We have recently demonstrated for the first time that PCR
products can be rapidly detected using ESI-MS, with
significant improvements projected [4,5]. Finally, new mass
spectrometric methods have been developed to provide the
dynamic range expansion necessary for addressing DNA
sequencing mixtures [6]. Our overall aim is to provide a
foundation for the development of an overall approach to
high speed sequencing (including the rapid and precise
PCR product characterization) using cost effective
high-throughput instrumentation.
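The difficulty caused by a broad charge-state distribution can be made concrete with the usual relation for a negative ion that has lost z protons, m/z = (M - z * m_p) / z. The Python sketch below, with illustrative function names and an arbitrary 6000 Da example mass that is not taken from the abstract, lists the peaks one species produces and shows how two adjacent peaks determine the charge and neutral mass. It is a simplified picture that ignores adducts and isotopic structure.

```python
PROTON_MASS = 1.007276  # Da


def negative_ion_mz(neutral_mass, charge):
    """m/z of an ion formed by removing `charge` protons from a neutral molecule."""
    return (neutral_mass - charge * PROTON_MASS) / charge


def charge_envelope(neutral_mass, charges):
    """Map each charge state to its m/z for a single molecular species."""
    return {z: negative_ion_mz(neutral_mass, z) for z in charges}


def mass_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Recover (z, M) from peaks of charge z and z + 1 of the same species."""
    a = mz_low_charge + PROTON_MASS   # equals M / z
    b = mz_high_charge + PROTON_MASS  # equals M / (z + 1)
    z = round(b / (a - b))
    return z, z * a


envelope = charge_envelope(6000.0, range(2, 7))
z, mass = mass_from_adjacent_peaks(envelope[3], envelope[4])
print(z, round(mass, 3))
```

Each component of a mixture contributes one such series of peaks, so a sequencing ladder with many components quickly produces overlapping series; reducing every species to one or two low charge states leaves far fewer peaks to assign.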

DOE Contract No. DE-AC06-76RLO-1830. 

[1] "New Developments in Biochemical Spectrometry:
Electrospray Ionization", R. D. Smith, J. A. Loo, C. G. Edmonds,
C. J. Barinaga, and H. R. Udseth, Anal. Chem., 62, 882-889 (1990).
[2] "Charge State Reduction of Oligonucleotide Negative Ions from
Electrospray Ionization", X. Cheng, D. C. Gale, H. R. Udseth, and
R. D. Smith, Anal. Chem., 67, 586-593 (1995).
[3] "Charge-State Reduction with Improved Signal Intensity of
Oligonucleotides in Electrospray Ionization Spectrometry", D. C.
Muddiman, X. Cheng, H. R. Udseth, and R. D. Smith, J. Am. Soc. Mass
Spectrom., 7 (8), 697-706 (1996).
[4] "Analysis of Double-stranded Polymerase Chain Reaction Products
from the Bacillus cereus Group by Electrospray Ionization Fourier
Transform Ion Cyclotron Resonance Mass Spectrometry", D. S.
Wunschel, K. F. Fox, A. Fox, J. E. Bruce, D. C. Muddiman, and R. D.
Smith, Rapid Commun. in Mass Spectrom., 10, 29-35 (1996).
[5] "Characterization of PCR Products From Bacilli Using Electrospray
Ionization FTICR Spectrometry", D. C. Muddiman, D. S.
Wunschel, C. Liu, L. Pasa-Tolic, K. F. Fox, A. Fox, G. A. Anderson,
and R. D. Smith, Anal. Chem., 68, 3705-3712 (1996).
[6] "Colored Noise Waveforms and Quadrupole Excitation for the
Dynamic Range Expansion in Fourier Transform Ion Cyclotron
Resonance Mass Spectrometry", J. E. Bruce, G. A. Anderson, and
R. D. Smith, Anal. Chem., 68, 534-541 (1996).


High-Speed Sequencing of Single DNA 
Molecules in the Gas Phase by 

Richard D. Smith. David C. Muddiman, S. A. Hofstadler, 

and J. E. Bruce 

Environmental Molecular Sciences Laboratory; Pacific 

Northwest National Laboratory; Richland, WA 99352 

509/376-0723, Fax: -5824, 



This project is aimed at the development of a totally new
concept for high speed DNA sequencing based upon the
analysis of single (i.e., individual) large DNA fragments
using electrospray ionization (ESI) combined with Fourier
transform ion cyclotron resonance (FTICR) mass
spectrometry. In our approach, large single-stranded DNA
segments extending to as much as 25 kilobases (and possibly
much larger) are transferred to the gas phase using ESI.
The multiply-charged molecular ions are trapped in the
cell of an FTICR mass spectrometer, where one or more
single ions are then selected for analysis in which their
mass-to-charge ratios (m/z) are measured both rapidly and
non-destructively. Single ion detection is achievable due to
the high charge state of the electrosprayed ions and the
unique sensitivity of new FTICR detection methodologies.

Initial efforts under this project have demonstrated the ca- 
pability for the formation, extended trapping, isolation, 
and monitoring of sequential reactions of highly charged 
DNA molecular ions with molecular weights well into the 
megadalton range [1-6]. We have shown that large 
multiply-charged individual ions of both single and 
double-stranded DNA anions can also be efficiently 
trapped in an FTICR cell, and their mass-to-charge ratios 
measured with very high accuracy. Thus, it is feasible to 
quickly determine the mass of each lost unit as the DNA is 
subjected to rapid reactive degradation steps. One ap- 
proach is to develop methods based upon the use of 
ion-molecule or photochemical processes that can promote 
a stepwise reactive degradation of gas-phase DNA anions. 
Successful development of one of these approaches could 
greatly reduce the cost and enhance the speed of DNA
sequencing, potentially allowing for sequencing DNA
segments of more than 25 kilobases in length, on a time scale
of minutes with negligible error rates with the added po- 
tential for conducting many such measurements in parallel. 
Instrumentation optimized for these purposes is currently 
being introduced and promises to greatly advance the 
methodology. The techniques being developed promise to 
lead to a host of new methods for DNA characterization, 
potentially extending to the size of much larger DNA
restriction fragments (>500 kilobases).

DOE Contract No. DE-AC06-76RLO-1830. 





[1] "Trapping, Detection and Reaction of Very Large Single Molecular
Ions by Mass Spectrometry," R. D. Smith, X. Cheng, J. E. Bruce, S. A.
Hofstadler, and G. A. Anderson, Nature, 369, 137-139 (1994).

[2] "Charge State Shifting of Individual Multiply-Charged Ions of Bovine
Albumin Dimer and Molecular Weight Determination Using an
Individual-Ion Approach," X. Cheng, R. Bakhtiar, S. Van Orden, and
R. D. Smith, Anal. Chem., 66, 2084-2087 (1994).

[3] "Trapping, Detection, and Mass Measurement of Individual Ions in a
Fourier Transform Ion Cyclotron Resonance Mass Spectrometer," J. E.
Bruce, X. Cheng, R. Bakhtiar, Q. Wu, S. A. Hofstadler, G. A.
Anderson, and R. D. Smith, J. Amer. Chem. Soc., 116, 7839-7847

[4] "Direct Charge Number and Molecular Weight Determination of
Large Individual Ions by Electrospray Ionization-Fourier Transform
Ion Cyclotron Resonance Mass Spectrometry", R. Chen, Q. Wu, D. W.
Mitchell, S. A. Hofstadler, A. L. Rockwood, and R. D. Smith, Anal.
Chem., 66, 3964-3969 (1994).

[5] "Trapping, Detection and Mass Determination of Coliphage T4 (108
MDa) Ions by Electrospray Ionization Fourier Transform Ion
Cyclotron Resonance Mass Spectrometry", R. Chen, X. Cheng, D. W.
Mitchell, S. A. Hofstadler, A. L. Rockwood, Q. Wu, M. G. Sherman,
and R. D. Smith, Anal. Chem., 67, 1159-1163 (1995).

[6] "Accurate Molecular Weight Determination of Plasmid DNA Using
Mass Spectrometry", X. Cheng, D. G. Camp II, Q. Wu, R. Bakhtiar,
D. L. Springer, B. J. Morris, J. E. Bruce, G. A. Anderson, C. G.
Edmonds, and R. D. Smith, Nucleic Acid Res., 24, 2183-2189 (1996).

Characterization and Modification of 
DNA Polymerases for Use in DNA 

Stanley Tabor 

Harvard University; Boston, MA 02115-5730 
617/432-3128, Fax: -3362, 
http://sbweb.med.harvard.edu/~bcmp

Our studies are directed towards improving the properties 
of DNA polymerases for use in DNA sequencing. The pri- 
mary focus is understanding the mechanism by which 
DNA polymerases discriminate against nucleotide analogs, 
and the mechanism by which they incorporate nucleotides 
processively without dissociating from the DNA template. 

We are comparing three DNA polymerases that have been 
used extensively for DNA sequencing; E. coli DNA poly- 
merase I, T7 DNA polymerase, and Taq DNA polymerase. 
These are related to one another, and this homology has 
been exploited to construct active site hybrids that have 
been used to determine the structural basis for differences 
in their activities. Specifically, the hybrids have been used 
(1) to determine why E. coli DNA polymerase I and Taq 
DNA polymerase discriminate strongly against 
dideoxynucleotides, and (2) to understand how T7 DNA 
polymerase interacts with its processivity factor, 
thioredoxin, to confer high processivity. 

Based on these studies, we have been able to modify Taq
DNA polymerase and E. coli DNA polymerase I to make
them incorporate dideoxynucleotides much more
efficiently, and to have increased processivity in the presence
of thioredoxin. The ability to incorporate 
dideoxynucleotides efficiently greatly improves the unifor- 
mity of band intensities on a DNA sequencing gel, thereby 
increasing the accuracy of the DNA sequence obtained. In 
addition, the efficient use of dideoxynucleotides reduces 
the amount of these analogs required for DNA sequencing, 
an important issue when using fluorescently modified 
dideoxy terminators. In an approach that complements 
these studies, we, in collaboration with Dr. Thomas 
Ellenberger (Harvard Medical School), are determining the 
crystal structure of T7 DNA polymerase in a complex with 
thioredoxin and a primer-template. Knowledge of this 
structure will allow the rational design of specific
mutations that will enable DNA polymerases to incorporate
other analogs useful for DNA sequencing more efficiently, 
such as those with fluorescent moieties on the bases. 

DOE Grant No. DE-FG02-96ER62251. 
Relevant Publications

Tabor, S., and Richardson, C. C. (1995). A single residue in DNA
polymerases of the Escherichia coli DNA polymerase I family is
critical for distinguishing between deoxy- and dideoxyribonucleotides.
Proc. Natl. Acad. Sci. U.S.A. 92, 6339-6343.

Bedford, E., Tabor, S., and Richardson, C. C. (1997). The thioredoxin
binding domain of bacteriophage T7 DNA polymerase confers
processivity on Escherichia coli DNA polymerase I. Proc. Natl. Acad.
Sci. U.S.A. 94, 479-484.

Modular Primers for DNA Sequencing 

Mugasimangalam Raja,1,2 Dina Sonkin,1 Lev Lvovsky,1
and Levy Ulanovsky1,2

1Center for Mechanistic Biology and Biotechnology;
Argonne National Laboratory, Argonne, IL 60439-4833
Ulanovsky: 630/252-3940; Fax: -3387
2Dept. of Structural Biology; Weizmann Institute of
Science; Rehovot 76100, Israel

We are developing molecular approaches to DNA sequenc- 
ing enabling primer walking without the step of chemical 
synthesis of oligonucleotide primers between the walks. 
One such approach involves "modular primers" described 
earlier, consisting of 5-mers, 6-mers or 7-mers (selected 
from a presynthesized library), annealing to the template 
contiguously with each other. Another approach, that we 
have termed DENS (Differential Extension with Nucle- 
otide Subsets), works by selectively extending a short 
primer, making it a long one at the intended site only. 
DENS starts with a limited initial extension of the primer
(at 20-30 °C) in the presence of only 2 of the 4 possible
dNTPs. The primer is extended by 6-9 bases or longer at
the intended priming site, which is deliberately selected
(as is the two-dNTP set) to maximize the extension
length. The subsequent sequencing/termination reaction at
60-65 °C then accepts the extended primer at the intended
site, but not at alternative sites, where the initial extension 

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 


(if any) is generally much shorter. DENS allows the use of 
primers as long as 8-mers (degenerate in 2 positions) 
which prime much more strongly than modular primers 
involving 5-7 mers and which (unlike the latter) can be 
used with thermostable polymerases, thus allowing 
cycle-sequencing with dye-terminators for Taq, as well as 
making double-stranded DNA sequencing more robust. 
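
The selection step in DENS can be illustrated with a short sketch. It is not the authors' software; the function names and both example sequences are hypothetical, and the model is deliberately crude: the primer is assumed to extend base by base until it meets a position that needs a dNTP absent from the two-dNTP subset.

```python
from itertools import combinations

def extension_length(downstream: str, dntps: str) -> int:
    """Count bases added before the first base that needs a dNTP not in `dntps`.

    `downstream` is the sequence that would be synthesized just 3' of the primer.
    """
    added = 0
    for base in downstream:
        if base not in dntps:
            break
        added += 1
    return added

def best_dntp_pair(downstream: str) -> str:
    """Pick the two-dNTP subset that gives the longest initial extension."""
    pairs = ["".join(p) for p in combinations("ACGT", 2)]
    return max(pairs, key=lambda pair: extension_length(downstream, pair))

# Hypothetical sequences (assumptions for illustration only).
intended_site = "AGGAAGAGTCCA"
alternative_site = "ACTGGAAGTCAG"

pair = best_dntp_pair(intended_site)
print(pair,
      extension_length(intended_site, pair),
      extension_length(alternative_site, pair))
```

With these made-up sequences the chosen subset is A+G, which extends the primer by 8 bases at the intended site but by only 1 base at the alternative site, so only the intended site would be expected to survive the higher-temperature sequencing reaction.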

These technologies are expected to speed up genome se- 
quencing in more than one way: 

a) Reduction in redundancy would result from more
efficient and rapid closure of even long gaps, which are
currently avoided at the price of 7- to 9-fold redundancy in
shotgun sequencing. Instantly available primers would also improve
the quality of sequencing. Stretches of sequence that have 
too low confidence level (high suspected error rate) can be 
resequenced without synthesizing new oligos and without 
growing any new subclones. 

b) Further down the road, the completion of the automa- 
tion of the closed cycle of primer walking will be made 
possible via the elimination of the need to synthesize the 
walking primers. Combined with the capillary sequencers, 
the instant availability of the walking primers should re- 
duce the time per walking cycle from 2-3 days now to 
about 1.5-2.0 hours, an improvement in speed by a factor
of 20-50. 

c) The closed-end automation would minimize both the 
labor cost and human errors. As primer walking has mini- 
mal, if any, front-end and back-end bottlenecks inherent to 
shotgun, the cost of sequencing would be essentially that 
of reagents, 5 cents/base or less. 
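
The speed-up quoted in item b) can be checked with a few lines of arithmetic. This is only a sketch; it assumes that a "day" means 24 hours of elapsed time, which the abstract does not state.

```python
def speedup_range(old_hours, new_hours):
    """Return (smallest, largest) speed-up for ranges of old and new cycle times."""
    smallest = min(old_hours) / max(new_hours)
    largest = max(old_hours) / min(new_hours)
    return smallest, largest

old_cycle = (2 * 24, 3 * 24)  # 2-3 days per walking cycle, assuming 24-hour days
new_cycle = (1.5, 2.0)        # projected hours per walking cycle

print(speedup_range(old_cycle, new_cycle))
```

Under that assumption the range works out to 24-fold to 48-fold, consistent with the rounded "factor of 20-50" given above.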

DOE Grant No. DE-FG02-94ER61831.

Time-of-Flight Mass Spectroscopy of 
DNA for Rapid Sequence 

Peter Williams, Chau-Wen Chou, David Dogniel, Jennifer
Krone, Kathy Lewis, and Randall Nelson
Department of Chemistry and Biochemistry; Arizona State
University; Tempe, AZ 85287
602/965-4107, Fax: -2147

There are three potential roles for mass spectrometry rel- 
evant to the Human Genome Project: 

a) The most obvious role is the one on which all groups
have been focussing: development of an alternative, faster
sequence ladder readout method to speed up large-scale
sequencing. Progress here has been difficult and slow
because the mass spectrometry requirements exceed the
current capabilities of mass spectrometry even for proteins,
and DNA presents significantly more difficulty than pro- 
teins. We have shown previously that pulsed laser ablation 


of DNA from frozen aqueous films has the potential to 
yield sequence-quality mass spectra, but that ionization in 
this approach is erratic and uncontrollable. We are focus- 
sing on developing ionization methods using ion (or elec- 
tron) attachment to vapor-phase DNA (ablated from ice 
films) in an electric field-free environment; results of this 
approach will be reported. 

b) Mass spectrometry may not ultimately compete favor- 
ably in speed with large-scale multiplexing of conven- 
tional or near-term technologies such as capillary electro- 
phoresis. However, as the Genome project nears comple- 
tion there will be an increasing need for rapid small-scale 
DNA analysis, where the multiplex advantage will not be 
so great and mass spectrometry could play a more signifi- 
cant role there. With this in mind we are looking at ways to 
speed up the overall mass spectrometric analysis, e.g. 
simple rapid cleanup of sequence mixtures, and at genera- 
tion of short sequence ladders by exopeptidase digestion. 

c) Given the genome data base(s) at the completion of the 
project, with rapid search capability, a need will arise for 
comparably rapid generation of search input data to iden- 
tify often very small quantities of proteins isolated from 
biochemical investigations. With this in mind we have de- 
veloped extremely rapid enzyme digestion techniques opti- 
mized for mass spectrometric readout, using endopepti- 
dases covalently coupled directly to the mass spectrometer 
probe tip. The elimination of autolysis and transfer losses 
allows rapid (few minute) endopeptidase digestion and 
mass analysis of as little as 1 picomole of protein, leading
to an unambiguous database identification. An alternative
search procedure uses partial amino-acid sequence infor- 
mation. With the added use of exopeptidases to generate a 
peptide ladder sequence in the mass spectrum of the en- 
dopeptidase digest, on the order of a dozen residues of in- 
ternal sequence can be generated in a total analysis time of 
20 minutes or less, again using only picomoles of sample. 
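
How a peptide ladder yields sequence can be sketched as follows. This is an idealized illustration, not the authors' analysis software: it assumes noise-free peaks of equal charge, an exopeptidase that removes one residue at a time, and a hypothetical test peptide. The residue masses are standard monoisotopic values; the function names and the 0.02 Da tolerance are assumptions.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def read_ladder(peak_masses, tol=0.02):
    """Call one residue per gap between adjacent ladder peaks.

    Residues of equal mass (Leu/Ile) are reported together; a gap that
    matches no residue within `tol` is reported as '?'.
    """
    peaks = sorted(peak_masses, reverse=True)
    calls = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls

# Hypothetical peptide trimmed one residue at a time from its C-terminus.
peptide = "GASPVK"
ladder = [peptide_mass(peptide[:n]) for n in range(len(peptide), 0, -1)]
print(read_ladder(ladder))
```

For this made-up ladder the residues are read from the C-terminus inward as K, V, P, S, A. Real spectra would need peak picking, charge-state handling, and a tolerance matched to the instrument.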

DOE Grant No. DE-FG02-91ER61127. 

Development of Instrumentation for 
DNA Sequencing at a Rate of 40 
Million Bases Per Day 

Edward S. Yeung, Huan-Tsung Chang, Qingbo Li,
Xiandan Lu, and Eliza Fung
Ames Laboratory and Department of Chemistry; Iowa
State University; Ames, IA 50011
515/294-8062, Fax: -0266

We have developed novel separation, detection, and imag- 
ing techniques for real-time monitoring in capillary elec- 
trophoresis. These techniques will be used to substantially 
increase the speed, throughput, reliability, and sensitivity 
in DNA sequencing applications in highly multiplexed 





capillary arrays. We estimate that it should be possible to 
eventually achieve a raw sequencing rate of 40 million 
bases per day in one instrument based on the standard 
Sanger protocol. We have reached a stage where an actual 
sequencing instrument with 100 capillaries can be built to 
replace the Applied Biosystems 373 or 377 instruments, 
with a net gain in speed and throughput of 100-fold and 
24-fold, respectively. 

The substantial increase in sequencing rate is a result of 
several technical advances in our laboratory. (1) The use of 
commercial linear polymers for sieving allows replaceable 
yet reproducible matrices to be prepared that have lower 
viscosity (thus faster migration rates) compared to poly- 
acrylamide. (2) The use of a charge-injection device camera 
allows random data acquisition to decrease data storage and 
data transfer time. (3) The use of distinct excitation wave- 
lengths and cut-off emission filters allows maximum light 
throughput for efficient excitation and sensitive detection 
employing the standard 4-dye coding. (4) The use of 
index matching and 1:1 imaging reduces stray light without
sacrificing the convenience of on-column detection. 

Continuing efforts include further optimization of the 
separation matrix, development of new column condition- 
ing protocols, refinement of the excitation/emission optics, 
design of a pressure injection system for 96-well titer 
plates, validation of a new 2-color base-calling scheme, 
simplification of software to allow essentially real-time 
data processing, implementation of voltage programming 
to shorten the total run times, and scale up of the technol- 
ogy to allow parallel sequencing in up to 1,000 capillaries. 
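
A rough sense of what the 40 million bases per day target implies per capillary can be had from simple division. The abstract does not say how many capillaries that figure assumes, so the sketch below evaluates both array sizes mentioned in the text (100 and 1,000); the function name is hypothetical.

```python
def per_capillary_rate(total_bases_per_day: float, n_capillaries: int) -> float:
    """Raw bases each capillary must deliver per day to meet the total."""
    return total_bases_per_day / n_capillaries

target = 40_000_000  # raw bases per day, from the abstract

for n in (100, 1000):
    per_day = per_capillary_rate(target, n)
    per_minute = per_day / (24 * 60)
    print(f"{n} capillaries: {per_day:,.0f} bases/day each, "
          f"about {per_minute:.0f} bases/min")

# The quoted gains over the two commercial instruments (100-fold and 24-fold)
# imply a relative throughput between those instruments themselves.
implied_ratio = 100 / 24
print(f"implied throughput ratio between the two instruments: {implied_ratio:.1f}")
```

At 1,000 capillaries each capillary would need about 40,000 raw bases per day; at 100 capillaries the requirement is ten times higher, which suggests the target presumes the larger array, although the text does not state this.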

Relevant Publications 

K. Ueno and E. S. Yeung, "Simultaneous Monitoring of DNA Fragments
Separated by Capillary Electrophoresis in a Multiplexed Array of 100
Channels," Anal. Chem. 66, 1424-1431 (1994).
X. Lu and E. S. Yeung, "Optimization of Excitation and Detection
Geometry for Multiplexed Capillary Array Electrophoresis of DNA
Fragments," Appl. Spectrosc. 49, 605-609 (1995).
Q. Li and E. S. Yeung, "Evaluation of the Potential of a Charge Injection
Device for DNA Sequencing by Multiplexed Capillary
Electrophoresis," Appl. Spectrosc. 49, 825-833 (1995).
E. N. Fung and E. S. Yeung, "High-Speed DNA Sequencing by Using
Mixed Poly(ethyleneoxide) Solutions in Uncoated Capillary
Columns," Anal. Chem. 67, 1913-1919 (1995).
Q. Li and E. S. Yeung, "Simple Two-Color Schemes for
DNA Sequencing Based on Standard 4-Label Sanger Chemistry,"
Appl. Spectrosc. 49, 1528-1533 (1995).





Our work for 1996 and 1997 will include the following: 

1. Comparative study of the kinetics of entry of DNA of
different molecular forms into E. coli cells DH10B/r and
DH5α during electrotransformation. Study of the optimal
regimes of cell-wall permeabilization for the DH10B/r cells.

2. Study of the efficiency of BAC cloning in DH10B/r
cells using the new electrotransformation method.
Optimization of the procedure for DH10B/r cells.

3. Modernization of the electronic equipment in accor- 
dance with results of the biological experiments. To ex- 
pand the studies, we need to extend the capability of the 
instrumentation to increase its flexibility and to improve 
the accuracy and reproducibility of the electric fields we 
generate by incorporating electronic components with 
higher tolerances. 

DOE Grant No. OR00033-93CIS015.

Overcoming Genome Mapping

Charles R. Cantor
Center for Advanced Biotechnology; Boston University;
Boston, MA 02215
617/353-8500, Fax: -8501

Most traditional DNA analysis is done based on fraction- 
ation of DNA by length. We have, instead, begun to ex- 
plore the use of DNA sequences as capture and detection 
methods to expedite a number of procedures in genome 

Triplet repeats like (GGC)n are an important class of
human genetic markers, and they are also responsible for a
number of inherited diseases involving the central nervous 
system. For both of these reasons it would be very useful 
to have a way to monitor the status of large numbers of 
triplet repeats simultaneously. We are developing methods 
to isolate and profile classes of such repeats. 

In one method, genomic DNA is cut with one or more re- 
striction nucleases, and splints are ligated onto the ends of 
the fragments. Then fragments containing a specific class 
of repeats are isolated by capture on magnetic microbeads 
containing an immobilized simple repeating sequence. The 
desired material is then released, and, if necessary, a selec- 
tive PCR is done to reduce the complexity of the sample. 
Otherwise the entire captured sample is amplified by PCR. 
The spectrum of repeats is then examined by electrophoresis
on an automated fluorescent gel reader. In our case the
Pharmacia ALF is used, because of its excellent quantita- 
tive signal accuracy. A very complex spectrum of bands is 

*Projects designated by an asterisk received small emergency grants following December 1992 site reviews by David Galas (formerly DOE Office of
Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997), Raymond Gesteland (University
of Utah), and Elbert Branscomb (Lawrence Livermore National Laboratory).

Resolving Proteins Bound to Individual 
DNA Molecules 

David Allison, Bruce Warmack, Mitch Doktycz, Tom
Thundat, and Peter Hoyt
Molecular Imaging Group; Health Sciences Research
Division; Oak Ridge National Laboratory; Oak Ridge, TN
Allison: 423/574-6199, Fax: -6210
Warmack: 423/574-6202, Fax: -6210

We have precisely located sequence specific proteins 
bound to individual DNA molecules by direct AFM imaging.
Using a mutant EcoRI endonuclease that site-specifically
binds but doesn't cleave DNA, bound enzyme has
been imaged and located, with an accuracy of ±1%, on 
well characterized plasmids and bacteriophage lambda 
DNA (48 kb). Cosmids have been mapped and, by incor- 
porating methods for anchoring molecules to surfaces and 
straightening to prevent molecular entanglement, BAC- 
sized clones could be analyzed. 
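
To make the mapping idea concrete, the sketch below finds EcoRI recognition sites (GAATTC) in a sequence, expresses them as fractional positions along the molecule (the quantity an image would yield), and converts a relative accuracy into base pairs. It is an illustration only: the function names and the toy sequence are hypothetical, and reading the ±1% figure as a fraction of total molecule length is an assumption.

```python
import re

ECORI_SITE = "GAATTC"

def ecori_sites(seq: str) -> list[int]:
    """0-based start positions of EcoRI recognition sites in `seq`."""
    return [m.start() for m in re.finditer(ECORI_SITE, seq.upper())]

def fractional_positions(seq: str) -> list[float]:
    """Site positions as a fraction of molecule length."""
    return [pos / len(seq) for pos in ecori_sites(seq)]

def uncertainty_bp(length_bp: int, relative_accuracy: float) -> float:
    """Positional uncertainty in base pairs for a given relative accuracy."""
    return length_bp * relative_accuracy

toy = "AAGAATTCTTGAATTCAA"  # hypothetical sequence with two sites
print(ecori_sites(toy), fractional_positions(toy))
print(uncertainty_bp(48_000, 0.01))
```

Under that reading, ±1% on a 48 kb molecule corresponds to roughly ±480 bp.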

This direct imaging approach could be rapidly developed 
to locate other sequence-specific proteins on genomic 
clones. Enzymatic proteins, involved in identifying and 
repairing damaged or mutated regions on DNA molecules, 
could be imaged bound to lesion sites. Transcription factor 
proteins that identify gene-start regions and other regula- 
tory proteins that modulate the expression of genes by 
binding to specific control sequences on DNA molecules 
could be precisely located on intact cloned DNAs. 

Conventional gel-based techniques for identifying site- 
specific protein binding sites must rely upon fragment 
analysis for identifying restriction enzyme sites, or, for non- 
cutting proteins, upon gel-shift methods that can only ad- 
dress small DNA fragments. Conversely, AFM imaging is 
a general approach that is applicable to the analysis of all 
site-specific DNA protein interactions on large-insert clones. 
This technique could be developed for high-throughput 
analysis, can be accomplished by technicians, uses readily 
available relatively inexpensive instrumentation, and should 
be a technology fully transferable to most laboratories. 

DOE Contract No. DE-AC05-84OR21400.

*Improved Cell Electrotransformation
by Macromolecules

Alexandre S. Boitsov, Boris V. Oskin, Anton O. Reshetin,
and Stepan A. Boitsov
Department of Biophysics; St. Petersburg State Technical
University; 195251 St. Petersburg, Russia
+7-812/277-5959, Fax: /247-2088 or /534-3314




seen representing hundreds of DNA fragments. We have 
shown that this spectrum is dramatically different with 
DNAs from unrelated individuals, and the spectrum is 
markedly dependent on the choice of restriction enzyme, 
as expected. Repeated measurements on the same sample 
are highly reproducible. The ability of the method to detect 
a specific altered repeat length in a complex DNA sample 
has been validated by examining several individuals with 
normal or expanded repeat sequences in the Huntington's 
disease gene. One very powerful application of this 
method may be the analysis of potential DNA differences 
in monozygotic twins discordant for a genetic disease. 
This method can be used to capture genome subsets
containing any interspersed repeat. It will also detect
insertions and deletions near such repeats. Methylation
differences between samples are also detectable when
methylation-sensitive restriction enzymes are used.

Conventional analysis of triplet repeats is very laborious 
since individual repeats must be analyzed by electrophore- 
sis on DNA sequencing gels. The decrease in effort for 
such analyses will scale linearly as the number of repeats 
that can be analyzed simultaneously, so we are potentially 
looking at something like a factor of 100 improvement if 
the above scheme under development can be effectively 

As an alternative approach, we are developing chip-based 
methods that can detect the length of a tandemly-repeating 
sequence without any need for gel electrophoresis. Here 
the goal is to build an array of all possible repeat sequence 
lengths flanked by single-copy DNA. When an actual 
sample is hybridized to such an array, the specific alleles 
in the sample will produce perfect duplexes at their
corresponding points in the array and mismatched duplexes
elsewhere. Thus, the task of scoring the repeat lengths is 
reduced to the task of distinguishing perfect and imperfect 
duplexes. Currently we are exploring a number of different 
enzymatic protocols that offer the promise of making such 
distinctions reliably. 
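
The array logic described above can be sketched in a few lines. This is a simplified model, not the protocol under development: a "perfect duplex" is taken to mean exact complementarity over the full probe, the flanking sequences are hypothetical, and every function name is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def build_array(left_flank: str, unit: str, right_flank: str, max_repeats: int) -> dict:
    """One probe per possible repeat count, complementary to flank-repeat-flank."""
    return {
        n: reverse_complement(left_flank + unit * n + right_flank)
        for n in range(1, max_repeats + 1)
    }

def perfect_duplex_positions(sample: str, array: dict) -> list:
    """Repeat counts whose probe is exactly complementary to the sample strand."""
    target = reverse_complement(sample)
    return [n for n, probe in array.items() if probe == target]

# Hypothetical single-copy flanks around a (GGC)n repeat.
left, right = "ACGTTGCA", "TTAGGCAT"
array = build_array(left, "GGC", right, max_repeats=40)

sample_allele = left + "GGC" * 17 + right
print(perfect_duplex_positions(sample_allele, array))
```

Here the sample with 17 repeats matches only the probe at position 17; every other position would carry a looped-out or mismatched duplex, which is what the enzymatic step is meant to discriminate.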

In other work we are using enzyme-enhanced sequencing 
by hybridization (SBH) as a device for the rapid prepara- 
tion of DNA samples for mass spectrometry. For example, 
partially duplex DNA probes can capture and generate se- 
quence ladders from any arbitrary DNA sequence. Current 
MALDI protocols allow sequence to be read to lengths of 
50 to 60 bases. While this is probably insufficient for most 
de novo DNA sequencing, it is an extremely promising 
approach for comparative or diagnostic DNA sequencing. 

DOE Grant No. DE-FG02-93ER61609.

Preparation of PAC Libraries 

Joe Catanese, Baohui Zhao, Eirik Frengen, Chenyan Wu,
Xiaoping Guan, Chira Chen, Eugenia Pietrzak,
Panayotis A. Ioannou,1 Julie Korenberg,2 Joel Jessee,3 and
Pieter J. de Jong
Department of Human Genetics; Roswell Park Cancer
Institute; Buffalo, NY 14263
de Jong: 716/845-3168, Fax: -8849
pieter@dejong.med.buffalo.edu
http://bacpac.med.buffalo.edu
1The Cyprus Institute of Neurology and Genetics; Nicosia
2Cedars Sinai Medical Center; Los Angeles, CA 90048
3Life Technologies; Gaithersburg, MD 20898

Recently, we have developed procedures for the cloning of
large DNA fragments using a bacteriophage P1-derived
vector, pCYPAC1 (Ioannou et al. (1994), Nature Genetics
6: 84-89). A slightly modified vector (pCYPAC2) has now
been used to create a 15-fold redundant PAC library of the
human genome, arrayed in more than 1,000 384-well
dishes. DNA was obtained from blood lymphocytes from a
male donor. The library was prepared in four distinct
sections designated RPCI-1, RPCI-3, RPCI-4 and RPCI-5,
respectively, each having 120 kbp average inserts. The
RPCI-1 segment of the library (3X; 120,000 clones,
including 25% non-recombinant) has been distributed to
over 40 genome centers worldwide and has been used in
many physical mapping studies, positional cloning efforts
and in various large-scale DNA sequencing enterprises.
Screening of the RPCI-1 library with numerous markers
results in an average of 3 positive PACs per
autosome-derived probe or STS marker. In situ hybridization
results with 250 PAC clones indicate that chimerism is low or
non-existent. Distribution of RPCI-3 (3X; 78,000 clones,
less than 1% non-recombinants, 4% empty wells) is now
underway, and the further RPCI-4 and -5 segments (< 5%
empty wells) will be distributed upon request. To facilitate
screening of the PAC library, we have provided the RPCI-1
PAC library to several screening companies and
noncommercial resource centers. In addition, we are now distributing
high-density colony membranes at cost-recovery price,
mainly to groups having a copy of the PAC library. The
combined RPCI-1 and -3 segments (6X) can be
represented on 11 colony filters of 22x22 cm, using duplicate
colonies for each clone. We are currently generating a
similar PAC library from the 129 strain.
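
The redundancy figures quoted for the library can be cross-checked from clone count, insert size, and genome size. The sketch below is an outside illustration: the 3,000 Mb haploid genome size is an assumption not given in the abstract, the function names are hypothetical, and the representation estimate uses the usual Poisson approximation for randomly distributed clones.

```python
import math

def coverage(n_clones: int, insert_kb: float, genome_mb: float,
             frac_recombinant: float = 1.0, frac_filled: float = 1.0) -> float:
    """Fold coverage contributed by the usable clones in a library."""
    usable = n_clones * frac_recombinant * frac_filled
    return usable * insert_kb / (genome_mb * 1000)

def prob_represented(fold_coverage: float) -> float:
    """Chance that a given locus lies in at least one clone (Poisson model)."""
    return 1.0 - math.exp(-fold_coverage)

GENOME_MB = 3000  # assumed haploid human genome size

# RPCI-1: 120,000 clones, 25% non-recombinant, 120 kbp average inserts.
c = coverage(120_000, 120, GENOME_MB, frac_recombinant=0.75)
print(f"RPCI-1 coverage: {c:.1f}X, P(locus represented) = {prob_represented(c):.3f}")
```

With these inputs RPCI-1 comes out at about 3.6X, in line with the nominal 3X and with the reported average of 3 positive PACs per probe, and a given locus would be expected to be present in roughly 97% of cases.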

To facilitate the additional use of large-insert bacterial 
clones for functional studies, we have prepared new PAC 
& BAC vectors with a dominant selectable marker gene 
(the blasticidin gene under control of the beta-actin pro- 
moter), an EBV replicon and an "update feature". This fea- 
ture utilizes the specificity of Transposon Tn7 for the Tn7att 
sequence (in the new PAC and BAC vectors) to transpose 
marker genes, other replicons and other sequences into PACs 





or BACs. Hence, it facilitates retrofitting existing PAC/ 
BAC clones (made with the new vectors) with desirable 
sequences without affecting the inserts. The new vector(s) 
are being applied to generate second generation libraries 
for human (female donor), mouse and rat. 

DOE Grant No. DE-FG02-94ER61883 and NIH Grant No. 

Development of Affinity Technology for 
Isolating Individual Human 
Chromosomes by Third-Strand 

Jacques R. Fresco and Marion D. Johnson III 
Department of Molecular Biology; Princeton University; 
Princeton, NJ 08544-1011 
609/258-3927, Fax: -6730
esteckman@molbiol.princeton.edu

Prior to the onset of this grant, solution conditions had 
been developed for binding a 17-residue third strand 
oligodeoxyribonucleotide probe to a specific human chro- 
mosome (HC) 17 multicopy alpha satellite target sequence 
cloned into DNA vectors of varying size up to 50 kb. 
Binding was shown to be both highly efficient and spe- 
cific. Moreover, initial experiments with fluorescent-la- 
beled third strands and human lymphocyte metaphase 
spreads and interphase nuclei proved similarly successful. 
During the current research period, the technology for such 
third strand-based cytogenetic examination, i.e., Triplex In
Situ Hybridization or TISH, of such spreads was perfected, 
so that it is now a highly reproducible method. Compari- 
son of spreads of different individuals by TISH and FISH 
analysis has provided a new basis for detecting alpha satel- 
lite DNA polymorphisms, the basis of which requires fur- 
ther investigation. 

This year work also commenced on the development of 
comparable probes specific for alpha satellite sequences in 
HC-X, 11, and 16. The work with HC-X has reached the
stage where we are ready to test the probe for TISH-based
cytogenetic analysis. Solution studies of the interaction of
the probes designed for HC-11 and HC-16 alpha satellite
targets are following the well-established path we em- 
ployed for HC-17 and HC-X. With the expectation of suc- 
cess in these cases during the coming year, the way should 
be clear for the development and application of compa- 
rable probes for alpha satellite sequences of any other hu- 
man chromosomes that may be of interest, and possibly of 
other eukaryotic species. 

Meanwhile, we have begun to turn our attention to two 
other goals, one being the exploitation of our probes for 
the isolation of individual human chromosomes by affinity 


purification, as we originally proposed. The other goal is 
to exploit our probes as aids in flow sorting human
chromosomes, a direction of work we expect to pursue in
collaboration with the Los Alamos National Laboratory, just
as soon as they indicate a readiness to do so. Finally, we 
have begun to evaluate the possibility of using third-strand 
binding fluorescent probes for detection of single copy 
genes by means of photon counting, a goal which we plan 
to undertake with our colleague Robert Austin of our
Physics Department.

DOE Grant No. DE-FG02-96ER622202.

Chromosome Region-Specific Libraries 
for Human Genome Analysis 

Fa-Ten Kao
Eleanor Roosevelt Institute for Cancer Research; Denver,
CO 80206
303/333-4515, Fax: -8423

The objective of this project is to construct and character- 
ize chromosome region-specific libraries as resources for 
genome analysis. We have used our chromosome
microdissection and MboI linker-adaptor technique (PNAS 88,
1844, 1991) to construct region-specific libraries for hu- 
man chromosome 2 and other chromosomes. The libraries 
have been critically evaluated for high quality, including 
insert size, proportion of unique vs repetitive sequence 
microclones, percentage of microclones derived from dis- 
sected region, etc. 

We have constructed and characterized 11 region-specific
libraries for the entire human chromosome 2 (the second
largest human chromosome, with 243 Mb of DNA),
including 4 libraries for the short arm and 6 libraries for the long
arm, plus a library for the centromere region. The libraries
are large, containing hundreds of thousands of microclones
in plasmid vector pUC19, with a mean insert size of 200
bp. About 40-60% of the microclones contain unique
sequences, and 70-90% of the microclones were
derived from the dissected region. In addition, we have
isolated and characterized many unique-sequence
microclones from each library that can be readily
sequenced as STSs, or used in isolating other clones with
large inserts (such as YAC, BAC, PAC, P1 or cosmid) for
contig assembly. These libraries have been used
successfully for high-resolution physical mapping and for
positional cloning of disease-related genes assigned to these
regions, e.g., the cloning of the gene for hereditary
nonpolyposis colorectal cancer (Cell 75, 1215, 1993).

For each library, we have established a plasmid sub-library
containing at least 20,000 independent microclones. These
sub-libraries have been deposited with ATCC for permanent
maintenance and general distribution. The ATCC Repository
numbers for these libraries are: #87188 for 2P1 library




(region 2p23-p25, comprising 25 Mb); #87189 for 2P2
library (2p21-p23, 28 Mb); #87103 for 2P3 library
(2p14-p16, 22 Mb); #87104 for 2P4 library (2p11-p13, 28
Mb); #77419 for 2Q1 library (2q35-q37, 28 Mb); #87308
for 2Q2 library (2q33-q35, 24 Mb); #87309 for 2Q3
library (2q31-q32, 26 Mb); #87310 for 2Q4 library
(2q23-q24, 19 Mb); #87409 for 2Q5 library (2q21-q22, 23
Mb); #87410 for 2Q6 library (2q11-q14, 31 Mb); and
#87411 for 2CEN library (2p11.1-q11.1, 4 Mb). Details of
these libraries have been described: Hum. Genet. 93, 557,
1994 (for 2P1 library); Cytogenet. Cell Genet. 68, 17,
1995 (for 2P2 library); Somat. Cell Mol. Genet. 20, 353,
1994 (for 2P3 library); Somat. Cell Mol. Genet. 20, 133,
1994 (for 2P4 library); Genomics 14, 769, 1992 (for 2Q1
library); Somat. Cell Mol. Genet. 21, 335, 1995 (for 2Q2,
2Q3 & 2Q4 libraries); Somat. Cell Mol. Genet. 22, 57,
1996 (for 2Q5, 2Q6 & 2CEN libraries).
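
As a quick consistency check, the region sizes listed above can be tallied against the stated library counts and the quoted 243 Mb chromosome size. The sizes are taken as printed in this text, which was recovered from a scan, so individual values may carry transcription errors; the dictionary name is an illustrative choice.

```python
# Region sizes in Mb, as printed in the text above.
LIBRARY_MB = {
    "2P1": 25, "2P2": 28, "2P3": 22, "2P4": 28,
    "2Q1": 28, "2Q2": 24, "2Q3": 26, "2Q4": 19, "2Q5": 23, "2Q6": 31,
    "2CEN": 4,
}

short_arm = sum(mb for name, mb in LIBRARY_MB.items() if name.startswith("2P"))
long_arm = sum(mb for name, mb in LIBRARY_MB.items() if name.startswith("2Q"))
total = sum(LIBRARY_MB.values())

print(len(LIBRARY_MB), "libraries")
print("short arm:", short_arm, "Mb; long arm:", long_arm, "Mb; total:", total, "Mb")
print("ratio to quoted 243 Mb:", round(total / 243, 2))
```

The tally gives 11 libraries (4 short-arm, 6 long-arm, 1 centromeric) and a summed size of 258 Mb, slightly above the quoted 243 Mb. One possible reading is that adjacent dissected regions share boundary bands, but the text does not say so.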

Region-specific libraries and short insert microclones for 
chromosome 2 are particularly useful resources for its 
eventual sequencing because this chromosome is less ex- 
ploited and detailed mapping information is lacking. We 
have also constructed 3 region-specific libraries for the 
entire chromosome 18 using similar methodologies,
including 18P library (18p11.32-p11.1, 22 Mb); 18Q1 library
(18q11.1-q12.3, 25 Mb); and 18Q2 library (18q21.1-q23,
34 Mb). Details of these libraries have been described
(Somat. Cell Mol. Genet. 22, 191-199, 1996).

DOE Grant No. DE-FG03-94ER61819. 

*Identification and Mapping of
DNA-Binding Proteins Along Genomic
DNA by DNA-Protein Crosslinking

V.L. Karpov, O.V. Preobrazhenskaya, S.V. Belikov, and
D.E. Kamashev
Engelhardt Institute of Molecular Biology; Russian
Academy of Sciences; Moscow 17984, Russia
Fax: +7-095/135-1405

In 1995-1996 we continued to map and identify nonhistone 
proteins binding at loci along the yeast chromosome. Using 
DNA-protein crosslinking in vivo, we detected two polypep- 
tides that probably correspond to core subunits of yeast 
RNA-polymerase II in the coding region of the transketolase 
gene (TKL2). Several nonhistone proteins were detected 
that bind to the upstream region of TKL2 and to an 
intergenic spacer between calmodulin (CMD1) and
mannosyl transferase (ALG1) genes. The apparent molecular
weight of these proteins was estimated. We also developed 
a new method to synthesize strand-specific probes. 

Using DNA-protein crosslinking in vitro, we found the 
amino acid residues of the Lac-repressor that interact with
DNA. Only Lys-33 crosslinks with the Lac-operator in the 
specific complex. 

In addition to Lys-33, the N-terminal end of the protein 
also crosslinks in a nonspecific complex. Our results
demonstrate that, in the presence of an inducer, the repressor's
N-termini crosslink to the operator's outermost nucle- 
otides. We suggest that binding of an inducer changes the 
orientation of the DNA-binding domain of the Lac repres- 
sor to the opposite of that found for the specific complex. 

We plan to use a new method to increase resolution and 
thus identify amino acids and nucleotides that participate 
in DNA-protein recognition. The mechanisms of transcrip- 
tion regulation of some yeast genes will thus be further 
elucidated. Our approaches are based on DNA-protein 
crosslinking. Detailed analysis will be done for specific 
and nonspecific complexes, in the presence and absence of 
inducers. This will allow us to make some conclusions 
about possible conformational rearrangements in 
DNA-protein complexes during gene activation at the 
protein's DNA-binding domains. 

DOE Grant No. OR00033-93CIS007. 

1. Papatsenko D.A., Belikov S.V., Preobrazhenskaya O.V., and Karpov
V.L. Two-dimensional gels and hybridization for studying
DNA-protein contacts by crosslinking // Methods in Molecular and
Cellular Biology. 1995. V. 5. No. 3. P. 171-177.

2. Kamashev D., Esipova N.G., Ebralidse K., and Mirzabekov A.D.
Mechanism of lac repressor switch-off: Orientation of lac repressor
DNA-binding domain is reversed upon inducer binding // FEBS Lett.
1995. V. 375. P. 27-30.

3. Papatsenko D.A., Pripotova I.V., Belikov S.V., and Karpov V.L.
Mapping of DNA-binding proteins along yeast genome by
UV-induced DNA-protein crosslinking // FEBS Letters. 1996. 381,

4. Belikov S.V., Papatsenko D.A., and Karpov V.L. A method to
synthesize strand-specific probes // Anal. Biochemistry. 1996,

A PAC/BAC Data Resource for 
Sequencing Complex Regions of the 
Human Genome: A 2-Year Pilot Study

Julie R. Korenberg 

Cedars Sinai Medical Center; University of California; 
Los Angeles, CA 90048-1869 
310/855-7627, Fax: /652-8010 

While complete sequencing of the human genome at
99.99% accuracy is an immediate goal of the Human
Genome Project, a serious technical deficiency remains: the
ability to rapidly and efficiently construct sequence-ready
maps as sequencing templates. This is particularly problematic
in regions with unusual genome structure. An understanding
of these troublesome regions prior to
genome-wide sequencing will provide quality assurance as 
well as reliable sequencing strategies in these regions. 

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts



This proposal will generate a "whole genome" data re- 
source to enable rapid and reliable sequencing of genomic 
DNA by the definition and characterization of the more 
than 52 regions of high homology now known to be dis- 
tributed within unrelated genomic regions and cloned in 
BACs and PACs. To do this, we will: 

1. Define regions of true homology in the human genome 
by characterizing subsets of the 4,700 BAC/PACs that 
generate multiple hybridization signals using fluorescence 
in situ hybridization (FISH). Of the 1,200 sites of multiple 
signals, more than 52 regions contain repeats as defined by 
600 BAC/PACs. The chimerism rate, multiple clone wells, 
and chromosome of origin will be defined by re-streaking 
each clone, followed by fingerprint, FISH and PCR-based
end-sequence analyses on hybrid panels and radiation hy- 

Data will be shared with large sequencing efforts, depos- 
ited in the 4D database, available with annotation on ftp 
server and through GDB. 

2. Generate contigs of BACs and PACs in regions of com- 
plex genome organization. Using STS, EST analyses, fin- 
gerprinting, BAC/PAC to BAC/PAC Southerns, end se- 
quence walking in 3.5-20X libraries, and metaphase/inter- 
phase FISH, contigs will be seeded in 2-5 of the regions of 
known genome complexity, each of which is estimated as 
2-5 Mb. These data will be used to evaluate and provide 
independent quality assurance of the STS and Radiation 
hybrid, and genetic maps in these regions. The most significant
of these include 1p36/1q; 2p/q; multiple sites;
8p23 and 8 further sites; 9p/q.

3. Define additional regions of complex genomic structure. 
Library screening using known members of multiple mem- 
ber retro-transposon and other known repeated sequences 
defined by the NCBI database, followed by FISH analyses
to determine structure and potential large regions of asso- 
ciated homologies. 

Collaboration with other genome and sequencing centers 
will provide quality control in the generation of 
sequence-ready maps for sequencing templates. 

We believe that this effort is important because 1) it will
provide a critical mapping tool necessary for the generation of
sequence-ready maps; 2) if initiated now, the problem areas
could be delineated before scale-ups to full production
occur in major genome centers; 3) it represents a modest cost,
such that the cost of these data would comprise only a
small fraction of the cost of the entire genome sequence
and would vastly decrease the cost of sequencing errors; and 4)
it could be completed in a short time (2 to 3 years) so as
to be of maximum benefit to sequencing centers. The Principal
Investigator in this project is ideally suited for this
effort because the group has developed the technology and 
initiated FISH and genome analyses of over 4000 clones. 

We believe that this project represents a critical and timely 
effort to enable rapid and cost effective human genome 

Subcontract under Glen Evans' DOE Grant No. 

Mapping and Sequencing of the 
Human X Chromosome 

D.L. Nelson, E.E. Eichler, B.A. Firulli, Y. Gu, J. Wu,
E. Brundage, A.C. Chinault, M. Graves, A. Arenson,
R. Smith, E.J. Roth, H.Y. Zoghbi, Y. Shen, M.A. Wentland,
D.M. Muzny, J. Lu, K. Timms, M. Metzger, and R.A. Gibbs
Department of Molecular and Human Genetics and Human
Genome Center, Baylor College of Medicine; Houston,
TX 77030
713/798-4787, Fax: -6370 or -5386,
http://www.bcm.tmc.edu/molgen

The human X chromosome is significant from both medi- 
cal and evolutionary perspectives. It is the location of sev- 
eral hundred genes involved in human genetic disease, and 
has maintained synteny among mammals: both of these 
aspects are due to its role in sex determination and the hap- 
loid nature of the chromosome in males. We have ad- 
dressed the mapping of this chromosome through a num- 
ber of efforts, ranging from long-range YAC-based map- 
ping to genomic sequence determination. 

YAC mapping. The YAC-based map of the X is essentially 
complete. We have constructed a 40 Mb physical map of 
the Xp22.3-Xp21.3 region, spanning an interval from the 
pseudoautosomal boundary (PABX) to the Duchenne mus- 
cular dystrophy gene. This region is highly annotated, with 
85 breakpoints defining 53 deletion intervals, 175 STSs
(20 of which are highly polymorphic), and 19 genes. 

Cosmid binning. The YAC-based physical map is being used in
a systematic effort to identify and sort cosmids prepared at 
LLNL from flow sorted X chromosomes into intervals. 
Gene identification through use of a common database for 
cDNA pool hybridization data is continuing. Over 50 
YACs have been utilized as probes to the gridded cosmid
arrays. These have identified over 9000 cosmids from the 
24,000 member library. An additional 4000 cosmids have 
been identified using a variety of probes, with the bulk 
coming from cDNA pool probes. More recent emphasis 
has been placed on BAC clones as their identity for 
sequencing has been established. These have been identi- 
fied using the usual methods. 

Cosmid contig construction. Creation of long-range conti- 
nuity in cosmids and BACs proceeds from clones identi- 
fied by the YAC-based binning experiments. Identification 
of STS carrying clones is carried out by a combined PCR/ 

hybridization protocol, and adds to the specificity of the 
overlap data. Cosmids are grown and DNA is prepared by
an Autogen robot. DNAs are digested and analyzed by the 
AB362 GeneScanner for collection of fingerprint data. The 
use of novel fluorescent dyes (BODIPY) in this applica- 
tion has increased signal strength markedly. End fragment 
detection is currently carried out with traditional Southern 
hybridization; however, additional dyes will permit detection
without hybridization in the GeneScanner protocol.
Data are transferred to a Sybase database and analyzed 
with ODS (J. Arnold, U. Georgia) software for overlap. 
ODS output is ported to GRAM (LANL) for map con- 
struction. A fully automated approach has yet to be 
achieved, but this goal is increasingly in reach. 
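
The overlap analysis described above is performed by ODS, whose algorithm the abstract does not describe. Purely as a generic illustration of how fingerprint data can suggest overlap, the following Python sketch counts restriction fragments that two clones share within a size tolerance. The function names, the 1% tolerance, and the scoring rule are all assumptions made for this example and are not taken from ODS or GRAM.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.01):
    """Count fragments of clone A that pair with a distinct fragment of
    clone B whose size agrees within a relative tolerance (assumed 1%)."""
    a = sorted(sizes_a)
    b = sorted(sizes_b)
    i = j = shared = 0
    # Walk both sorted lists once, pairing fragments of similar size.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def overlap_score(sizes_a, sizes_b, tolerance=0.01):
    """Shared fragments divided by the smaller fragment count (0.0 to 1.0)."""
    smaller = min(len(sizes_a), len(sizes_b))
    if smaller == 0:
        return 0.0
    return shared_fragments(sizes_a, sizes_b, tolerance) / smaller
```

A high score for a pair of cosmids would only flag them as candidates for overlap; a real pipeline would also need a statistical test against chance band sharing.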

Sequencing. An independently funded project awarded to 
RAG seeks to develop long-range genomic sequence for 
~2 Mb of the human X chromosome. In support of this 
project, cosmids have been constructed and isolated for the 
1.6 Mb region between FRAXA and FRAXF in
Xq27.3-Xq28. To date, the complete sequences of the regions
surrounding the FMR1 and IDS genes have been
determined (180 and 130 kb, respectively), along with an
additional ~700 kb of the interval. This sequence has led to
identification of the gene involved in FRAXE mental retardation.
Additional sequence in Xq28 has been determined,
including that of a cosmid containing the two genes,
DXS1357E and a creatine transporter. This sequence has
been duplicated to chromosome 16p11 in recent evolutionary
history. Comparative sequence analysis reveals 94%
sequence identity over 25 kb, and the presence of 
pentameric repeats which are likely to have mediated the 
duplication event. A number of technical advances in 
sequencing have been developed, including the use of 
BODIPY dyes in AB373 sequencing protocols, which has 
offered enhanced base calling due to reduced mobility 
shifting, improved single strand template protocols for 
much reduced cost, and streamlined informatics processes 
for assembly and annotation. 

DOE Grant Nos. DE-FG05-92ER61401 and
DE-FG03-94ER61830 and NIH Grant No. 5P30 

*Sequence-Specific Proteins Binding to
the Repetitive Sequences of High 
Eukaryotic Genome 

Olga Podgornaya, Ivan Lobov, Ivan Matveev, Dmitry
Lukjanov, Natella Enukashvily, and Elena Bugaeva
Institute of Cytology; Russian Academy of Sciences; St.
Petersburg 194064, Russia
Telephone and Fax: +7-812/520-9703
podg@ivm.stud.pu.ru

Repetitive sequences make up the greater part of the
eukaryotic genome, but until the last few years there was
little interest in their role. The situation changed
when alpha-satellites in human and minor satellites in
mouse became candidates for centromere function.
A number of centromere-specific proteins are
under investigation, but none seems to distinguish the
centromeric function of particular sequences within long arrays of
tandemly repeated satellites. The proteins associated with
these arrays are poorly known. We are trying to find out which
proteins are involved in maintaining the heterochromatin
structure of different types of repetitive sequences.

The major proportion of total genomic satellite DNA remains
attached to the nuclear matrix (NM) after DNase I
and high-salt treatment. We followed this association at
various steps during NM preparation by in situ hybridization
with the mouse satellite probe. Two mouse species
were used: M. musculus and M. spretus. Both contain the
same repertoire of satellite DNAs but in different amounts. 
In M. musculus the centromeric heterochromatin contains 
major satellite (MA) as the principal component. In M. 
spretus the minor satellite (MI) is predominant. To test 
DNA-binding activity of the proteins after chromatogra- 
phy of the soluble NM proteins on cationic and anionic 
ion-exchange columns, gel shift assays were performed 
with a cloned dimer of MA and a trimer of MI. To produce
antibodies, the DNA-protein complexes obtained from 
large-scale gel-shift assays were isolated and injected into 
a guinea pig. 

The gel shift assay with column fractions from M. muscu- 
lus NM and MA shows a ladder of complexes. The com- 
plexes could be competed out with an excess of MA DNA 
but not with the same amount of E. coli DNA. Antibodies 
from the immune serum caused a hypershift of the MA/ 
NM protein complexes. Preimmune serum at the same di- 
lution did not alter the mobility of the complexes. A com- 
bination of western and Southern blots allows us to con- 
clude that a protein with a molecular weight of about 80 
kD and some similarity to the intermediate filaments is
responsible for the MA/NM interaction. 

Specific DNA-binding activity to the MI has been tested 
after column fractionation of the M. spretus NM extract. A 
ladder of complexes can be competed out with an excess 
of unlabeled MI but not E. coli or MA DNA. MI contains 
the CENPB-box sequence, which is the binding site for the 
protein CENPB, one of the centromeric proteins. Fractions 
from the NM extract with MI-specific binding activity do
not contain CENPB, as shown by western blotting with 
anti-CENPB antibodies. 

The same kind of work is going on with human analogs of 
MA and MI sequences, using large clones of satellite and 
alpha-satellite DNA and nuclear matrices. 


Few satellite DNA-binding proteins have been isolated,
none of them directly from the NM. Our long-term aim is
to understand the role of these proteins in heterochromatin 
formation and in heterochromatin association with NM. 

Extracts from hand-isolated nuclear envelopes from frog
oocytes were tested for the specific DNA-binding activity 
to (T2G4)116. A fragment of Tetrahymena telomere from a 
YAC plasmid was used as a labelled probe in a gel-shift 
assay. The DNA-protein complexes from the assay were 
cut out and injected into a guinea pig. The antibodies (AB) 
obtained stained one protein with an m.w. of about 70 kD 
in the nuclear envelope of the oocyte, nothing in the inner 
part of the oocyte, and 70 kD and 120 kD in the frog liver
nuclei. The immunofluorescent AB stained fine patches on
the oocyte nuclear envelope and a number of intranuclear
spots in the frog blood cells. 

The electron-microscope immuno-gold technique showed 
that the protein is localized in the outer surface of the oo- 
cyte nuclear envelope in cup-like structures. DNA-binding 
activity to the same sequence has been tested and found in 
the mouse nuclear matrix extracts. The activity could be
eluted from the DEAE52 ion exchange column in 0.15 
NaCl. The activity could be competed out with the frag- 
ment itself but not with E. coli DNA in the same amounts. 
AB stained a 70-kD protein in active fractions after ion 
exchange chromatography. In nuclear matrix preparations, 
the AB recognized a 120-kD protein as well. The AB 
caused hypershift of the complexes on the gel shift assay.
The AB has some affinity to the keratins. In the mouse cell
culture 3T3 line the staining is intranuclear, with fine dots
forming chains surrounding dark areas, which do not cor- 
respond to the nucleoli. 

Similar results were observed when a mouse cell line was 
transformed with head- and tail-less human keratin constructs
(Bader et al., 1991, J Cell Biol 115: 1293). These
results suggest that the nuclear proteins detected with the 
AB may be natural analogs of this artificial keratin con- 
struct. The pattern of staining did not resemble the picture 
of telomere-specific staining. Possibly the protein recog- 
nized intragenomic (T2G4)2 sequence, which is present in 
25% of murine GenBank sequences rather than telomere. 
We are going to do immunocytochemical investigations of 
frog and mouse development in order to determine the 
point when transcription of the 120-kD protein is initiated
and the staining becomes intranuclear. 

As a continuation of the previous project, the multiple
alignment of all the Alu sequences from GenBank is under
way. We are also trying to obtain antibodies to the main
Alu-binding proteins to find out how many proteins can
bind to the Alu sequence.

DOE Grant No. OR00033-93CIS014.

*Protein-Binding DNA Sequences 

O.L. Polanovsky, A.G. Stepchenko, and N.N. Luchina 
Engelhardt Institute of Molecular Biology; Russian 
Academy of Sciences; Moscow 117984, Russia 
Fax: +7-095/135-1405,

The POU domain of the Oct-2 transcription factor binds the
octamer sequence ATGCAAAT and a number of degenerate
sequences. It has been shown that the POUs and POUh domains
recognize the left and right parts of the oct-sequence,
respectively. The recognized sequences partly overlap in
the native octamer. In degenerate recognition sites,
these core sequences may be separated by a spacer of up to
four nucleotides. These data changed our view of
the number and structure of potential targets recognized on
DNA by POU proteins.
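
To give a feel for how a spacer of up to four nucleotides widens the set of candidate sites, the following Python sketch scans a DNA string for two half-sites separated by 0 to 4 bases. The half-sites ATGC and AAAT are an assumption made only for illustration (a plain split of the octamer ATGCAAAT); the abstract says the sequences actually recognized by the two subdomains partly overlap in the native octamer and does not spell them out. The function name find_oct_like_sites is hypothetical.

```python
import re

# Assumed half-sites: a simple split of the octamer ATGCAAAT.
LEFT_HALF = "ATGC"
RIGHT_HALF = "AAAT"
MAX_SPACER = 4  # spacer limit stated in the abstract


def find_oct_like_sites(seq, max_spacer=MAX_SPACER):
    """Return (start, spacer_length, matched_text) for each candidate site."""
    seq = seq.upper()
    # The lookahead reports a candidate at every start position; the lazy
    # quantifier picks the shortest spacer that completes the site.
    pattern = re.compile(
        "(?=(%s([ACGT]{0,%d}?)%s))" % (LEFT_HALF, max_spacer, RIGHT_HALF)
    )
    return [(m.start(), len(m.group(2)), m.group(1)) for m in pattern.finditer(seq)]
```

With these assumed half-sites, the native octamer is reported with a spacer length of 0, and a site such as ATGCGGAAAT with a spacer length of 2.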

Protein-DNA binding results from the interaction of
conserved amino acid residues with a DNA target. In
POU proteins, the amino acid residues at positions 47 (Val), 50
(Cys), and 51 (Asn) of the POUh domain are absolutely conserved.
To examine a possible role of Val47, we
substituted this residue with each of the 19 other amino acid
residues and investigated the interaction of the mutant proteins
with the homeospecific site and its variants
(ATAANNN) and with the oct sequence. It was shown that the
Ile47 mutant retains the affinity and specificity. Replacement
of Val with Ser, Thr, or His partially reduces the affinity.

Asi>47 mutant sharply relax the specificity of protein-DNA 
recognition. Mutations at position 47 have much stronger
effects on binding to homeospecific sites than to octamer
motifs. Our data indicate that there is no simple
one-letter code of protein/DNA recognition. It has been
shown that this recognition is determined not only by the
nature of the radicals involved in the contact but also by
the structure of the DNA-binding domain as a whole and probably
by cooperative interaction of the POUs and POUh domains.

Proposals for 1997. The role of Cys50 in POU domain/
DNA recognition will be investigated. This residue is
absolutely conserved in POU proteins but is variable in
related homeo-proteins. Our preliminary data suggest
that the residue at position 50 of the POU homeodomain
has a key role in discrimination between TAAT-like and
octamer sequences. The role of the nucleotides flanking
the DNA target will also be investigated.

DOE Grant No. OR00033-93CIS005. 

Relevant Publications 

1. Stepchenko A.G. (1994) Noncanonical oct-sequences are targets for
mouse Oct-2B transcription factor. FEBS Letters, V.337, P.175-178.

2. Stepchenko A.G., Polanovsky O.L. (1996) Interaction of Oct proteins
with DNA. Molecular Biology, V.30, P.296-302.

3. Stepchenko A.G., Luchina N.N., Polanovsky O.L. The role of
conservative Val47 for POU homeodomain/DNA recognition. FEBS
Letters, in press.

*Development of Intracellular Flow 
Karyotype Analysis 

V.V. Zenin,1 N.D. Aksenov,1 A.N. Shatrova,1 N.V. Klopov,2
L.S. Cram,3 and A.I. Poletaev
Engelhardt Institute of Molecular Biology; Russian
Academy of Sciences; Moscow 117984, Russia
Poletaev: +7-095/135-9824, Fax: -1405
polet@polet.msk.su
1Institute of Cytology; Russian Academy of Sciences;
St. Petersburg, Russia
2St. Petersburg Institute of Nuclear Physics; Gatchina, Russia
3Los Alamos National Laboratory; Los Alamos, NM 87545

Instrumentation for univariate fluorescent flow analysis of
chromosome sets has been developed for human cells. A
new method of cell preparation and intracellular staining
of chromosomes with different dyes was developed and
improved. A cell suspension for flow analysis must satisfy
the following requirements: a minimal amount of free chromosomes
and debris (dead cells, cell fragments, etc.); chromosome
structure must be stabilized inside mitotic cells;
chromosomes must be stained inside the cells up to saturation
with the dyes used; and chromosomes must be releasable
from cells with the least possible mechanical treatment.
The method includes enzyme treatment (chymotrypsin),
incubation with saponin, and separation of
prestained cells from debris on a sucrose gradient. The
protocol was tested and improved over
several months of work and allows us to obtain a well-stained
sample with a minimal amount of contaminants [2].

A special magnetic mixing/stirring device was constructed 
to perform cell membrane breaking. It was placed inside 
the flow chamber of a serial flow cytometer ATC-3000 
equipped with additional electronic card for time-gated 
data acquisition [1]. The rupturing of prestained mitotic 
cells is performed by means of a small magnetic rod vibrating
in an alternating magnetic field. The efficiency of
mitotic cell breaking with the electromagnetic cell-breaking
device was tested using different human cell lines [2,3].

The device works in a stepwise mode: a defined volume of 
sample is delivered to the breaking chamber for rupturing 
mitotic cell (cells) for a defined time period, followed by 
buffer wash to move the released chromosomes from the 
breaking chamber to the point of the analysis. The infor- 
mation about the chromosomes appearing at the point of 
analysis is accumulated in list mode files, making it pos- 
sible to resolve chromosome sets arising from single cells 
on the basis of time gating. The concentration of cells in 
the sample must be kept low to ensure that only one cell at 
a time enters the breaking device. 
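
The idea of resolving single-cell chromosome sets by time gating can be sketched as follows. This is a simplified Python illustration, not the authors' software: it assumes each list-mode event is a (time, fluorescence) pair and that chromosomes from different cells are separated by a quiet interval longer than a chosen max_gap. The function names and the gap-based rule are assumptions made for the example.

```python
def split_into_cell_sets(events, max_gap):
    """Group (time, fluorescence) events into sets; a pause longer than
    max_gap between consecutive events starts a new set (assumed rule)."""
    sets = []
    current = []
    last_time = None
    for time, signal in sorted(events):
        if last_time is not None and time - last_time > max_gap:
            sets.append(current)
            current = []
        current.append((time, signal))
        last_time = time
    if current:
        sets.append(current)
    return sets


def summarize(chromosome_set):
    """Return (number of events, summed fluorescence); the sum serves
    here as a stand-in for the overall DNA content of the set."""
    return len(chromosome_set), sum(signal for _, signal in chromosome_set)
```

Keeping the cell concentration low, as the text requires, is what makes a simple gap rule of this kind plausible: if two cells were ruptured together, their chromosomes would fall into one set.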

The developed software classifies chromosome sets ac- 
cording to different criteria: total number of chromosomes, 
overall DNA content in the set, and the number of chromosomes
of a certain type [2,3]. In addition, it is possible to determine
the presence of extra chromosomes or loss of
chromosome types. Thus this approach combines the high 
performance of flow cytometry (quantitation and high 
throughput) with the advantages of image analysis (cell to 
cell karyotype analysis and skills of trained cytogeneti- 
cist). The data analysis capabilities offer extensive flexibil- 
ity in determining important features of the karyotypes 
under study. This development offers the potential to duplicate
most of what is determined by clinical cytogeneticists.
The results obtained so far agree well
with the project goals formulated earlier [4].

DOE Grant No. OR00033-93CIS008. 


[1] V.V. Zenin, N.D. Aksenov, A.N. Shatrova, Y.V. Kravatsky, A.
Kuznetsova, L.S. Cram, A.I. Poletaev: "Time-gated human chromosome
flow analysis"; XVII Congress of the International Society for
Analytical Cytology, 1994, Lake Placid, USA; Cytometry Supplement
7, p. 58.

[2] V.V. Zenin, N.D. Aksenov, A.N. Shatrova, Y.V. Kravatsky, A.
Kuznetsova, L.S. Cram, A.I. Poletaev: "Time-gated flow analysis of
human chromosomes"; DOE Human Genome Program,
Contractor-Grantee Workshop IV, November 13-17, 1994; Santa Fe,
New Mexico, p. 13.

[3] V.V. Zenin, N.D. Aksenov, A.N. Shatrova, N.V. Klopov, L.S. Cram,
A.I. Poletaev: "Cell by cell flow analysis of human chromosome
sets"; DOE Human Genome Program, Contractor-Grantee Workshop
V, January 28-February 1, 1996; Santa Fe, New Mexico, p. 112.

[4] Andrei I. Poletaev, Sergei I. Stepanov, Valeri V. Zenin, Nikolay
Aksenov, Tatijana V. Nasedkina and Yuri V. Kravazky: "Development
of Intracellular Flow Karyotype Analysis"; DOE Human Genome,
1993 Program Report, p. 34-35.

Mapping and Sequencing with BACs 
and Fosmids 

Ung-Jin Kim, Hiroaki Shizuya, and Melvin I. Simon
Division of Biology; California Institute of Technology;
Pasadena, CA 91125
Kim: 818/395-4901, Fax: /796-7066,
Simon: 818/395-3944, Fax: /796-7066

simonm @ starbasel 

BACs and fosmids are stable, nonchimeric, and highly 
representative cloning systems. BACs maintain 
large-fragment genomic inserts (100 to 300 kb) that are
easily prepared for most types of experiments, including 
DNA sequencing. 

We have improved the methods for generating BACs and 
developed extensive BAC libraries. We have constructed 
human BAC libraries with more than 175,000 clones from 
male fibroblast and sperm, and a mouse BAC library with 
more than 200,000 clones. We are currently expanding the
human library with the aim of achieving a total of 50X coverage
of the human genome, using sperm samples from anonymous
donors.


The BAC libraries provide resources to bridge the gap between
genetic-cytogenetic information and detailed physical
characteristics of genomic regions that include DNA
sequence information. They also provide reliable tools for 
generating a high-resolution, integrated map on which a 
variety of information and resources are correlated. Using 
primarily the human BAC library constructed from fibro- 
blasts, we have assembled a physical contig map of chro- 
mosome 22 [1]. First, the entire library was screened by 
most of the known chromosome 22-specific markers that 
include cDNA, anonymous STS markers, FISH-mapped 
cosmids and fosmids, YAC-Alu PCR products, 
FISH-mapped BACs, and flow-sorted chromosome 22 
DNA. The positive clones have been assembled into 
contigs by means of the STS-contents or other markers 
assigned to BAC clones. Most of the contigs were con- 
firmed by using a restriction fingerprinting scheme origi- 
nally developed by Sulston and Coulson, and modified in 
our laboratory. Currently, the contigs cover over 80% of 
the chromosome arm. Various physical or genetic land- 
marks on this chromosome can now be precisely localized 
simply by assigning them to BACs or contigs on the map. 
Using BAC end sequence information from each of the 
chromosome 22-specific BACs, it is now possible to close 
the gaps efficiently by screening deeper BAC libraries 
with new probes specific to the ends of contigs. 
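
The STS-content step of contig assembly described above can be illustrated with a small grouping routine. The Python sketch below joins clones that share at least one marker into candidate contigs using a union-find structure. It is only an illustration of the general idea: the clone and marker names are made up, the function name is hypothetical, and the sketch neither orders clones within a contig nor applies the fingerprint confirmation mentioned in the text.

```python
def group_clones_by_shared_markers(clone_markers):
    """Group clones into candidate contigs: clones sharing any marker
    (for example an STS) end up in the same group."""
    parent = {clone: clone for clone in clone_markers}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_clone_for_marker = {}
    for clone, markers in clone_markers.items():
        for marker in markers:
            if marker in first_clone_for_marker:
                # Merge this clone's group with the group that already holds the marker.
                parent[find(clone)] = find(first_clone_for_marker[marker])
            else:
                first_clone_for_marker[marker] = clone

    groups = {}
    for clone in clone_markers:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Gaps between the resulting groups correspond to the places where, as the text notes, end-sequence probes would be used to screen deeper libraries.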

The resulting BAC contig map is now serving as a road 
map for sequencing the chromosome. Chromosome 
22-specific BAC clones have been distributed to our collaborators,
including The Sanger Center and Dr. Bruce Roe
at the University of Oklahoma, and many of the clones have
already been sequenced. The BAC end sequencing scheme [2]
will play a crucial role toward the complete sequencing of
chromosome 22, and we are currently sequencing the ends 
of these BACs directly using the miniprepped BAC DNA 
as templates. 

DOE Grant No. DE-FG03-89ER60891.


[1] Kim et al. (1996) A Bacterial Artificial Chromosome-based
framework contig map of human chromosome 22q. Proc. Natl. Acad.
Sci. USA 93 (13): pp 6297-6301.

[2] Venter, C., Smith, H.O., and Hood, L. (1996) Nature 381: pp 364-366.

Towards a Globally Integrated,
Sequence-Ready BAC Map of the 
Human Genome 

Ung-Jin Kim, Hiroaki Shizuya, and Melvin I. Simon
Division of Biology; California Institute of Technology;
Pasadena, CA 91125
Kim: 818/395-4901, Fax: /796-7066,
Simon: 818/395-3944, Fax: /796-7066

http://www. tree, 


BAC clones are ideal for genome analysis since they are
non-chimeric, stably maintain large-fragment genomic inserts
(100-300 kb) [1], and it is easy to prepare BAC DNA
samples for most types of experiments, including DNA
sequencing [2]. We have improved the BAC cloning technique in
the past years and constructed >20X human BAC libraries.
As BACs are proving to be the most efficient reagents for
large-scale genomic sequencing, we intend to increase the
depth of the library to 50X genomic equivalence. Using
the ESTs, especially the Unigenes that have been chromosomally
assigned by other means such as radiation hybrid
mapping and YAC-based STS content mapping, we plan to
organize the BAC library into a mapped resource. The resulting
BAC-EST framework map will provide a high-resolution
EST (or gene) map and instant entry points for
gene finding and large-scale genomic sequencing. We also
intend to determine the end sequences of the BAC inserts
from a significant number of the clones (at least 350,000
clones or 15X genomic equivalence) within two years [3].
All the BAC-EST mapping data and BAC end sequences
will be made available via public databases and Web
servers. The mapping data and end sequence information
will dramatically facilitate the process of finding clones 
that extend the sequenced regions with minimal overlaps. 
Thus, the tagged BAC libraries will serve as a reliable and 
facile sequence-ready resource and an organizing tool to 
support and coordinate simultaneously multiple sequenc- 
ing projects all over the genome. 
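
The relation between clone count and genomic equivalence quoted above can be checked with simple arithmetic: coverage is the number of clones times the mean insert size, divided by the genome size. The Python sketch below assumes a mean insert of 130 kb and a haploid genome of 3 x 10^9 bp. Neither figure is given in the abstract, which states only the 100-300 kb insert range, so both are assumptions chosen for illustration; the function names are hypothetical.

```python
import math

# Assumed values for illustration only; not stated in the abstract.
MEAN_INSERT_BP = 130_000
GENOME_BP = 3_000_000_000


def genomic_equivalents(num_clones, mean_insert_bp=MEAN_INSERT_BP, genome_bp=GENOME_BP):
    """Library depth: total cloned bases divided by genome size."""
    return num_clones * mean_insert_bp / genome_bp


def clones_for_coverage(target_coverage, mean_insert_bp=MEAN_INSERT_BP, genome_bp=GENOME_BP):
    """Smallest whole number of clones that reaches the target depth."""
    return math.ceil(target_coverage * genome_bp / mean_insert_bp)
```

Under these assumptions, genomic_equivalents(350_000) comes to roughly 15, in line with the abstract's pairing of 350,000 clones with 15X genomic equivalence.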

DOE Grant No. DE-FC03-96ER62242. 


[1] Shizuya, H., Birren, B., Kim, U.-J., Mancino, V., Slepak, T., Tachiiri,
Y., and Simon, M.I. (1992) Proc. Natl. Acad. Sci. USA 89.

[2] Kim, U.-J., Birren, B.W., Yu-Ling Sheng, Tatiana Slepak, Valeria
Mancino, Cecilie Boysen, Hyung-Lyun Kang, Melvin I. Simon, and
Hiroaki Shizuya (1996) Genomics 34, 213-218.

[3] Venter, C., Smith, H.O., and Hood, L. (1996) Nature 381: pp 364-366.

Generation of Normalized and 
Subtracted cDNA Libraries to 
Facilitate Gene Discovery

Marcelo Bento Soares, Maria de Fatima Bonaldo, Pierre
Jelenc, and Susan Baumes
Department of Psychiatry; Columbia University; and The
New York State Psychiatric Institute; New York, NY
212/960-2313, Fax: /781-3577,

Large-scale single-pass sequencing of cDNA clones randomly
picked from libraries has proven quite powerful for
identifying genes, and the use of normalized libraries, in which
the frequency of all cDNAs is within a narrow range, has
been shown to expedite the process by minimizing the redundant
identification of the most prevalent mRNAs. In an

attempt to contribute to the ongoing gene discovery ef- 
forts, we have further optimized our original procedure for 
construction of normalized directionally cloned cDNA libraries [1]
and we have successfully applied it to generate a
number of human cDNA libraries from a variety of adult 
and fetal tissues [2]. To date we have constructed libraries 
from infant brain, fetal brain, adult brain, fetal 
liver-spleen, full-term and 8-9 week placentae, adult 
breast, retina, ovary tumor, melanocytes, parathyroid tu- 
mor, senescent fibroblasts, pineal glands, multiple sclero- 
sis plaques, testis, B cells, fetal heart, fetal lung, 8-9 week 
fetuses and pregnant uterus. Several additional libraries are 
currently in preparation. All libraries have been contrib- 
uted to the IMAGE consortium, and they are being widely 
used for sequencing and mapping. 

However, given the large scale nature of the ongoing se- 
quencing efforts and the fact that a significant fraction of 
the human genes has been identified already, the discovery 
of novel cDNAs is becoming increasingly more challeng- 
ing. In an effort to expedite this process further, in collabo- 
ration with Greg Lennon (LLNL) we have developed and 
applied subtractive hybridization strategies to eliminate 
pools of sequenced cDNAs from libraries yet to be surveyed.
Briefly, single-stranded DNA obtained from pools
of arrayed and sequenced I.M.A.G.E. clones is used as
templates for PCR amplification of cDNA inserts with 
flanking T7 and T3 primers. PCR amplification products 
are then used as drivers in hybridizations with normalized 
libraries in the form of single-stranded circles. The remain- 
ing single-stranded circles (subtracted library) are purified 
by hydroxyapatite chromatography, converted to
double-stranded circles, and electroporated into bacteria.
Preliminary characterization of a subtracted fetal
liver-spleen library indicates that the procedure is effective
in enhancing the representation of novel cDNAs.

In an effort to enhance the representation of full-length 
cDNAs in our libraries, as we strive towards our final ob- 
jective of generating full-length normalized cDNA librar- 
ies, we have adapted our normalization protocol to take 
advantage of the fact that it is now possible to produce 
single-stranded circles in vitro by sequentially digesting 
supercoiled plasmids with Gene II protein and Exonu- 
clease III (Life Technologies). This has proven significant 
because it circumvents the biases introduced by differen- 
tial growth of clones containing small and large cDNA in- 
serts when single-strands are produced in vivo upon super- 
infection with a helper phage. 

DOE Grant No. DE-FG02-91ER61233. 

[1] Soares, M.B., Bonaldo, M.F., Su, L., Lawton, L. & Efstratiadis, A.
(1994). Construction and characterization of a normalized cDNA
library. Proc. Natl. Acad. Sci. USA 91(20), 9228-9232.

[2] Bonaldo, M.F., Lennon, G., and Soares, M.B. (1996) Normalization
and subtraction: Two approaches to facilitate gene discovery. Genome
Research 6, 791-806.

Mapping in Man-Mouse Homology 

Lisa Stubbs, Johannah Doyle, Ethan Carver,
Mark Shannon, Joomyeong Kim, Linda Ashworth,1 and
Elbert Branscomb1
Biology Division; Oak Ridge National Laboratory; Oak
Ridge, TN 37831
423/574-0854, Fax: -1283, or

1Human Genome Center; Lawrence Livermore National
Laboratory; Livermore, CA 94550

Numerous studies have confirmed the notion that mouse 
and human chromosomes resemble each other closely 
within blocks of syntenic homology that vary widely in 
size, containing from just a few to several hundred related 
genes. Within the best-mapped of these homologous 
regions, the presence and location of specific genes can be 
accurately predicted in one species, based upon the 
mapping results obtained in the other. In addition, information 
regarding gene function derived from the analysis of 
human hereditary traits or mapped murine mutations can 
also be extrapolated from one species to another. However, 
syntenic relationships are still not established for many 
human regions, and local rearrangements, including apparent 
deletions, inversions, insertions, and transposition 
events, complicate most of the syntenically homologous 
regions that appear simple on the gross genetic level. 
Because of these complications, the power of prediction 
afforded in any homology region increases tremendously 
with the level of resolution and degree of internal 
consistency associated with a particular set of comparative 
mapping data. Our groups have been interested in further 
defining the borders of syntenic linkage groups in human and 
mouse, in elucidating mechanisms behind evolutionary 
rearrangements that distinguish chromosomes of mammalian 
species, and in devising means of exploiting the 
relationships between the two genomes for the discovery 
and analysis of new genes and other functional units in 
mouse and man. 

One of the larger contiguous blocks of mouse-human 
genomic homology includes the proximal portion of mouse 
chromosome 7 (Mmu7). Detailed analysis of this large 
region of mouse-human homology has served as the initial 
focus of these collaborative studies. Our results have 
shown that gene content, order and spacing are remarkably 
well-conserved throughout the length of this approximately 
23 cM/29 Mb region of mouse-human homology, 
except for six internal rearrangements of gene sequence in 
mouse relative to man. One of these differences involves a 
small segment of H19q13.4 genes whose murine counterparts 
have been transposed out of the large Mmu7/H19q 
conserved synteny region into a separate linkage group 
located on mouse chromosome 17. The six internal 
rearrangements, including two transpositions and four local 
inversions, are clustered together at two sites; our data 
suggest that the rearrangements occurred in a coincident 
fashion, or were commonly associated with unstable DNA 
sequences at those sites. Interestingly, both rearranged 
regions are occupied by large tandemly clustered gene 
families, suggesting that these locally repeated sequences may 
have contributed to their evolutionary instability. The 
structure and conserved functions of genes within these 
and other clustered gene families located on H19q also 
represent an active line of interest to our group. More 
recently, we have extended mapping studies to include 
clustered gene families located in other chromosomal regions, 
and are working to define the borders of mouse-human 
syntenic segments on a broader, genome-wide scale. 
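
The kind of gene-order comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it simply assumes two lists of orthologous gene names in map order (the names used in the usage note are hypothetical) and splits the mouse order into maximal runs that are collinear with, or inverted relative to, the human order. Boundaries between runs mark candidate rearrangement breakpoints.

```python
def conserved_runs(human_order, mouse_order):
    """Split mouse_order into maximal runs of genes that are adjacent in
    human_order, either in the same direction (+1) or inverted (-1).

    Genes absent from human_order are ignored. Returns a list of
    (genes_in_run, direction) tuples; direction is 0 for a single gene.
    """
    position = {gene: i for i, gene in enumerate(human_order)}
    genes = [g for g in mouse_order if g in position]
    index = [position[g] for g in genes]

    runs = []
    start, direction = 0, 0
    for k in range(1, len(index) + 1):
        # Step between consecutive mouse genes, measured on the human map.
        step = index[k] - index[k - 1] if k < len(index) else None
        if step in (1, -1) and direction in (0, step):
            direction = step      # the run continues in a consistent direction
            continue
        runs.append((genes[start:k], direction))
        start, direction = k, 0   # a breakpoint: start a new run
    return runs
```

For example, `conserved_runs(list("ABCDEF"), list("ABEDCF"))` reports a collinear run `A, B`, an inverted run `E, D, C`, and the single gene `F`, so two breakpoints flank a local inversion.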

DOE Contract No. DE-AC05-96OR22464 and Contract 
No. W-7405-ENG-48 with Lawrence Livermore National 
Laboratory. 

Positional Cloning of Murine Genes 

Lisa Stubbs, Cymbeline Culiat, Ethan Carver, Johannah 
Doyle, Laura Chittenden, Mitchell Walkowicz, Nestor 
Cacheiro, Greg Lennon,¹ Gary Wright,² Joe Rutledge,³ 
Robert Nicholls,⁴ and Walderico 
Biology Division; Oak Ridge National Laboratory; Oak 
Ridge, TN 37831 
423/574-0854, Fax: -1283, or 

stuht'sljt^ omLgov 

¹Human Genome Center; Lawrence Livermore National 
Laboratory; Livermore, CA 94550 
²University of Texas Southwestern Medical Center at 
Dallas; Dallas, TX 
³Children's Hospital and Medical Center, University of 
Washington School of Medicine; Seattle, WA 98105 
⁴Department of Genetics; Case Western Reserve University; 
Cleveland, Ohio 

Chromosome rearrangements, notably deletions and 
translocations, have proved invaluable as tools in the mapping 
and molecular cloning of acquired and inherited human 
diseases. Because balanced translocations are cytologically 
visible, and generally produce profound disturbances in 
both gene expression and DNA structure without necessarily 
disturbing the structure of multiple genes, this type of 
mutation provides an especially valuable "tag" that greatly 
simplifies mapping, cloning, and assessment of candidate 
genes associated with a disease. Although balanced 
translocations are relatively rare in human populations, they are 
readily induced in the mouse. Using various mutagenesis 
protocols, we have generated numerous translocation-bearing 
mutant mouse strains that display an impressive variety 
of health-related anomalies, including obesity, polycystic 
kidneys, gastrointestinal disorders, limb and skeletal 
deformities, neural tube defects, ataxias, tremors, hereditary 
deafness and blindness, reproductive dysfunction, and 
complex behavioral defects. The ability to map the genes 
associated with translocation breakpoints cytogenetically, 
first crudely through straightforward banding techniques 
and then to a higher level of resolution using fluorescence 
in situ hybridization methods, allows us to avoid the costly 
and time-consuming crosses that are required for the 
mapping of most mutant genes. With this rapidly obtained, 
crude-level mapping information available, we can readily 
assess possible relationships between newly arising mutant 
phenotypes and linked candidate genes or related diseases 
that map to homologous regions of the human genome. 
Using this approach, we have recently begun to define the 
map positions of several mutations. Mapping results have 
led us to the identification of candidate genes for two 
mutations: one associated with congenital and 
predisposition to severe gastric ulcers, and another associated 
with late-onset obesity. So far, we have characterized in 
detail only a fraction of the strains that comprise this valuable, 
recently generated mutant collection. As an integral 
part of this program, we are actively exploring new 
strategies and integrating information, technology and 
resources derived from the Human Genome research effort 
that promise to increase the efficiency of breakpoint 
mapping and cloning dramatically. The mutations are scattered 
widely throughout the mouse genome, corresponding to a 
broad selection of human homology regions. As new 
breakpoints are mapped, and large numbers of newly 
sequenced cDNA clones are assigned to the mouse and 
human maps, the potential for rapid association between 
cloned gene and mapped mutation will increase 
dramatically. This large collection of murine translocation mutants 
therefore represents a powerful resource for linking 
mapped cDNA clones to health-related phenotypes 
throughout the genome. 

In addition to the analysis of translocation mutants, we 
have also characterized other types of mouse mutations, 
including: (1) tottering and leaner, allelic mutations 
associated with ataxia and epilepsy in mice, and representing 
murine models for the human diseases familial hemiplegic 
migraine and episodic ataxia, respectively; and (2) jdf2, a 
locus associated with mutations causing runting, 
neuromuscular tremors and male sterility, which is located in a 
mouse region related to the Prader-Willi/Angelman 
syndrome gene interval of human 15q11-q13. Both sets of 
mutations affect large, complex, and highly conserved 
genes, and provide important animal models for the 
exploration of the diverse roles their human counterparts may 
play in human disease. In concert with these gene cloning 
studies, we have been involved in exploring new means of 
exploiting mouse-human genomic conservation in the 
isolation of functionally significant sequences from large 
cloned regions of human DNA. The methods we have 
developed hold great promise as an efficient tool for gene 
discovery in cloned genomic regions. 

DOE Contract No. DE-AC05-96OR22464. 





Human Artificial Episomal 
Chromosomes (HAECS) for Building 
Large Genomic Libraries 

Min Wang, Panayotis A. Ioannou,' Michael Grosz, Subrata 
Banerjee, Evy Bashiardes,' Michelle Rider, Tian-Qiang 
Sun,' and Jean-Michel H. Vos' 

Lineberger Comprehensive Cancer Center and 'Depart- 
ment of Biochemistry and Biophysics; University of North 
Carolina; Chapel Hill, NC 27599 
Vos: 919/966-3036, Fax: -3015. 
'The Cyprus Institute of Neurology and Genetics; Nicosia, 

Of some 100,000 human genes, only a few thousand have 
been cloned, mapped or sequenced so far. Much less is 
known about other chromosomal regions such as those 
involved in DNA replication, chromatin packaging, and 
chromosome segregation. Construction of detailed physi- 
cal maps is only the first step in localizing, identifying and 
determining the function of genetic units in human cells. 
Studying human gene function and regulation of other 
critical genomic regions that span hundreds of kilobase 
pairs of DNA requires the ability to clone an entire 
functional unit as a single DNA fragment and transfer it stably 
into human cells. 

We have developed a human artificial episomal 
chromosome (HAEC) system based on the latent replication origin of 
the large herpes Epstein-Barr virus (EBV) for the 
tion and stable maintenance of DNA as circular 
minichromosomes in human cells. [1, 2] Individual HAECS 
carried human genomic inserts ranging from 60 to 330 kb 
and appeared genetically stable. An HAEC library of 1500 
independent clones carrying random human genomic frag- 
ments with average sizes of 150 to 200 kb was established 
and allowed recovery of the HAEC DNA. This autologous 
HAEC system with human DNA segments directly cloned 
in human cells provides an important tool for functional 
study of large mammalian DNA regions and gene 

Current efforts are focused on (a) shuttling large BAC/PAC 
genomic inserts in human and rodent cells, (b) 
packaging BAC/PAC/HAEC clones as large infectious 
herpes viruses for shuttling genomic inserts between 
mammalian cells, and (c) constructing bacterial-based 
human and rodent HAEC libraries. (a) We have designed a 
"pop-in" vector, which can be inserted into current 
BAC- or PAC-based clones via site-specific integration. 
This "CRE-LOXP"-mediated system has been used to 
establish BAC/PAC up to 250 kb in size in human cells as 
HAECs. (b) We have obtained packaging of 160-180 kb 
exogenous DNA into infectious virions using the human 
lymphotropic Epstein-Barr virus. After delivery into 
human beta-lymphoblast cells, the HAEC DNA was stably 
established as 160-180 kb functional autonomously 
replicating episomes. [5,7] We have also generated a hybrid 
BAC/HAEC vector, which can shuttle large DNA inserts, 
i.e., at least up to 260 kb, between bacteria and human 
cells. Such a system is being used to develop large insert 
libraries, whose clones can be directly transferred into 
human or rodent cells for functional analysis. These 
HAEC-derived systems will provide useful molecular 
tools to study large genetic units in humans and rodents, 
and complement the functional interpretation of current 
sequencing efforts. 

DOE Contract No. DE-FG05-91ER61135. 

[1] Sun, T.-Q., Fenstermacher, D. & Vos, J.-M.H. Human artificial 
episomal chromosomes for cloning large DNA in human cells. Nature 
Genet. 8, 33-41 (1994). 
[2] Sun, T.-Q. & Vos, J.-M.H. Engineering of 100-300 kb of DNA as 
persisting extrachromosomal elements in human cells using the 
HAEC system, in Methods Molec. Genet. (ed. Adolph, K.W.) 

(Academic Press, San Diego. CA, 199.^1 
[3] Vos, J.-M.H. Herpes viruses as Genetic Vectors, in Viruses in Human 
Gene Therapy (ed. Vos, J.-M.H.) 109-140 (Carolina Academic Press 
& Chapman & Hall, Durham, N.C., USA & London, UK, 1995). 
[4] Kelleher, Z. & Vos, J.-M. Long-Term Episomal Gene Delivery in 
Human Lymphoid Cells using Human and Avian Adenoviral-assisted 
Transfection. Biotechniques 17, 1110-1117 (1994). 
[5] Banerjee, S., Livanos, E. & Vos, J.-M.H. Therapeutic Gene Delivery 
in Human beta-lymphocytes with Engineered Epstein-Barr Virus. 
Nature Medicine 1, 1303-1308 (1995). 
[6] Sun, T.-Q., Livanos, E. & Vos, J.-M.H. Engineering a 
mini-herpesvirus as a general strategy to transduce up to 180 kb of 
functional self-replicating human mini-chromosomes. Gene Therapy 
3, 1081-1088 (1996). 
[7] Wang, S. & Vos, J.-M.H. An HSV/EBV based vector for High 
Efficient Gene Transfer to Human Cells in vitro/in vivo. J. Virol. 70, 


*Cosmid and cDNA Map of a Human 
Chromosome 13q14 Region Frequently 
Lost at B Cell Chronic Lymphocytic 
Leukemia 

N.K. Yankovsky, B.I. Kapanadze, A.B. Semov, 

A.V. Baranova. and G.E. Sulimova 

N.I. Vavilov Institute of General Genetics; Moscow 

117809, Russia 

■H7-095/I35-5363, Fax: -\2%9. and (send to both addresses) 

We are mapping a human chromosome 13q14 region 
frequently lost in the human blood malignancy called B-cell 
chronic lymphocytic leukemia (BCLL). The final goal of 
the project is to find a putative oncosuppressor gene lost in 
the region in BCLL. We have constructed a cosmid contig 
between the D13S1168 and D13S25 loci in the region. The 
interval had been shown to be in the center of the 
BCLL-associated deletions. The contig consists of more than 100 
cosmids from the LANL human chromosome 13 specific 
library (LA13NC01). We estimated the distance between the 
D13S1168 and D13S25 loci as about 540 kb. We are 
constructing a transcriptional map of the region. Seven 
different cDNA clones were found with two of the cosmid 
clones. All cosmids corresponding to the minimal tiling 
path between D13S1168 and D13S25 are being used as 
probes for screening new cDNA clones. I.M.A.G.E. 
Consortium (LLNL) cDNA clones assigned to 13q14 will be 
mapped against the cosmid contig. Mapped cDNA clones 
will be checked as candidate oncosuppressor genes for 
BCLL. 
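
A minimal tiling path, as mentioned above, is the smallest set of overlapping clones that still spans the interval of interest. The following Python sketch shows one common way to choose it, a greedy interval cover. It is an illustration only: the abstract does not say how the path was selected, and the clone names and coordinates in the usage note are hypothetical.

```python
def minimal_tiling_path(clones, start, end):
    """Greedy cover of the interval [start, end] with the fewest clones.

    clones: list of (name, clone_start, clone_end) in a common coordinate
    system (for example, kb from the proximal marker). Raises ValueError
    if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Among clones that begin at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in contig at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path
```

With five hypothetical cosmids `c1` (0-40), `c2` (10-45), `c3` (30-80), `c4` (42-70) and `c5` (75-120), covering positions 0 to 100 selects `c1`, `c3` and `c5`; the other two clones are redundant for probe preparation.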

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 31 



BCM Server Core 

Daniel Davison and Randall Smith 

Baylor College of Medicine; Houston, TX 77030 

713/798-3738, Fax: -3759, 

We are providing a variety of molecular biology-related 
search and analysis services to Genome Program investi- 
gators to improve the identification of new genes and their 
functions. These services are available via the BCM 
Search Launcher World Wide Web (WWW) pages which 
are organized by function and provide a single 
point-of-entry for related searches. Pages are included for 
1) protein sequence searches, 2) nucleic acid sequence 
searches, 3) multiple sequence alignments, 4) pairwise se- 
quence alignments, 5) gene feature searches, 6) sequence 
utilities, and 7) protein secondary structure prediction. The 
Protein Sequence Search Page, for example, provides a 
single form for submitting sequences to WWW servers 
that provide remote access to a variety of different protein 
sequence search tools, including BLAST, FASTA, 
PROSITE, and BLOCKS searches. The BCM Search 
Launcher extends the functionality of other WWW ser- 
vices by adding additional hypertext links to results re- 
turned by remote servers. For example, links to the NCBI's 
Entrez database and to the Sequence Retrieval System 
(SRS) are added to search results returned by the NCBI's 
WWW BLAST server. These links provide easy access to 
Medline abstracts, links to related sequences, and addi- 
tional information which can be extremely helpful when 
analyzing database search results. For novice or infrequent 
users of sequence database search tools, we have pre-set 
the parameter values to provide the most informative 
first-pass sequence analysis possible. 

A batch client interface to the BCM Search Launcher for 
Unix and Macintosh computers has also been developed to 
allow multiple input sequences to be automatically 
searched as a background task, with the results returned as 
individual HTML documents directly on the user's system. 
The BCM Search Launcher as well as the batch client are 
available on the WWW at URL 
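
The batch workflow described above (many input sequences, one HTML result document per sequence) can be sketched in Python as follows. This is not the BCM batch client: the `search` argument is a hypothetical stand-in for whatever function submits a sequence to a search service and returns its text report, and the file naming is an assumption made for the example.

```python
import html
from pathlib import Path


def read_fasta(text):
    """Yield (identifier, sequence) pairs from multi-FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def run_batch(fasta_text, search, out_dir):
    """Run `search(name, sequence)` for every record and save each report
    as its own HTML file in out_dir. Returns the list of written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, sequence in read_fasta(fasta_text):
        report = search(name, sequence)  # hypothetical search callable
        # Keep file names safe: replace anything unusual with "_".
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in name)
        page = (
            "<html><body>"
            f"<h1>{html.escape(name)}</h1>"
            f"<pre>{html.escape(report)}</pre>"
            "</body></html>"
        )
        path = out / f"{safe}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

Passing the search step in as a callable keeps the sketch independent of any particular server; a real client would also need rate limiting and error handling for failed submissions.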

The BCM/UH Server Core provides the necessary compu- 
tational resources and continuing support infrastructure for 
the BCM Search Launcher. The BCM/UH Server Core is 
composed of three network servers and currently supports 
electronic mail and WWW-based access; ultimately, spe- 
cialized client-server access will also be provided. The 
hardware used includes a 2048-processor MasPar 
massively parallel SIMD computer, a DEC Alpha AXP/OSF1, 
a Sun 2-processor SparcCenter 1000 server, and several 
Sun Sparc workstations. 

In addition to grouping services available elsewhere on the 
WWW and providing access to services developed at 
BCM and UH, the BCM/UH Server Core will also provide 
access to services from developers who are unwilling or 
unable to provide their own Internet network servers. 

Grant Nos.: DOE, DE-FG03-95ER62097/A000; National 
Library of Medicine, R01-LM05792; National Science 
Foundation, BIR 91-11695; National Research Service 
Award, F32-HG00133-01; NIH, P30-HG00210 and 

A Freely Sharable 
Database-Management System 
Designed for Use in Component-Based, 
Modular Genome Informatics Systems^ 

Steve Rozen,' Lincoln Stein,' and Nathan Goodman 

The Jackson Laboratory; Bar Harbor, ME 04609 

Goodman: 207/288-6158, Fax: -6078, 

'Whitehead Institute for Biomedical Research; Cambridge, 

MA 02139 edu/informatics/workflow 

We are constructing a data-management component, built 
on top of commercial data-management products, tuned to 
the requirements of genome applications. The core of this 
genome data manager is designed to: 

• support the semantic and object-oriented data models 
that have been widely embraced for representing ge- 
nome data, 

• provide domain-specific built-in types and operations 
for storing and querying biomolecular sequences, 

• provide built-in support for tracking laboratory work 
flows, and admit further extensions for other 
special-purpose types, 

• allow core facilities to be readily extended to meet the 
diverse needs of biological applications 

The core data manager is being constructed on top of 
Sybase, Oracle, and Informix Universal Server. The 
software is available free of charge and is freely 

We will be reporting progress on the core data manager's 
architecture and interface at the URLs above, and we so- 
licit comments on its design. 

DOE Grant No. DE-FG02-95ER62101. 

'Originally called Database Management Research for the 
Human Genome Project, this project was initiated in 1995 
at the Massachusetts Institute of Technology-Whitehead 

*Projects designated by an asterisk received small emergency grants following December 1992 site reviews by David Galas (formerly DOE Office of 
Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997), Raymond Gesteland (University 
of Utah), and Elbert Branscomb (Lawrence Livermore National Laboratory). 





A Software Environment for Large- 
Scale Sequencing 

Mark Graves 

Department of Cell Biology; Baylor College of Medicine; 

Houston, TX 77030 

713/798-8271, Fax: -3759; 

http://www.bcm.tmc.edu 

http://.<itork. bcm.tmc. edu/gfp 

Our approach is to implement software systems which 
manage primary laboratory sequence data and explore and 
annotate functional information in genome sequence and 
gene products. 

Three software systems have been developed and are be- 
ing used: two sequence data managers which use different 
sequence assembly packages, FAK and Phrap, and a series 
of analysis and annotation tools which are available via the 
Internet. In addition, we have developed a prototype appli- 
cation for data mining of sequence data as it is related to 
metabolic pathways. 

Products of this project are the following: 

1. GRM - a sequence reconstruction manager using the 
FAK assembly engine (available since October 1995). 

2. GFP - a sequence finishing support tool using the Phrap 
assembly engine (available since March 1996). 

3. A series of gene recognition tools (available since early 

4. A tool for visualizing metabolic pathways data and 
exploring sequence data related to metabolic pathways 
(prototype available since August 1996). 

DOE Grant No. DE-FG03-94ER61618. 

Generalized Hidden Markov Models 
for Genomic Sequence Analysis 

David Haussler, Kevin Karplus,' and Richard Hughey' 

Computer Science Department and 'Computer Engineering 

Department; University of California; Santa Cruz, CA 


408/459 2105, Fax: -4829, 


We have developed an integrated probabilistic method for 
locating genes in human DNA based on a generalized hid- 
den Markov model (HMM). Each state of a generalized 
HMM represents a particular kind of region in DNA, such 
as an initial exon for a gene. The states are connected by 
transitions that model sites in DNA between adjacent re- 

gions, e.g. splice sites. In the full HMM, parametric statis- 
tical models are estimated for each of the states and transi- 
tions. Generalized HMMs allow a variety of choices for 
these models, such as neural networks, high order Markov 
models, etc. All that is required is that each model return a 
likelihood for the kind of region or transition it is supposed 
to model. These likelihoods are then combined by a dy- 
namic programming method to compute the most likely 
annotation for a given DNA contig. Here the annotation 
simply consists of the locations of the transitions identified 
in the DNA, and the labeling of the regions between transi- 
tions with their corresponding states. 
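
The dynamic programming step described above can be sketched in Python with the Viterbi algorithm. Two simplifications should be stated plainly: this is an ordinary HMM that labels one nucleotide at a time, not a generalized HMM that scores whole variable-length regions, and it is not the Genie implementation. The two states and all probabilities below are hypothetical values chosen only to make the example run.

```python
import math

STATES = ("coding", "noncoding")

# Hypothetical parameters, for illustration only.
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def to_log(table):
    """Convert a (possibly nested) table of probabilities to log space."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (most likely state path, its log probability)."""
    if not observations:
        return [], 0.0
    score = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for symbol in observations[1:]:
        prev = score[-1]
        column, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            column[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][symbol]
            pointers[s] = best_prev
        score.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[-1][last]
```

Calling `viterbi("ATATGCGCGC", STATES, to_log(START), to_log(TRANS), to_log(EMIT))` labels the AT-rich prefix as noncoding and the GC-rich remainder as coding; the single state change in the returned path plays the role of a "transition" in the annotation described above.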

This method has been implemented in the genefinding pro- 
gram Genie, in collaboration with Frank Eeckman, Martin 
Reese and Nomi Harris at Lawrence Berkeley Labs. David 
Kulp, at UCSC, has been responsible for the core imple- 
mentation. Martin Reese developed the splice site models, 
promoter models, and datasets. You can access Genie at 
the second www address given above, submit sequences, 
and have them annotated. Nomi Harris has written a dis- 
play tool called Genotater that displays Genie's annotation 
along with the annotation of other genefinders, as well as 
the location of repetitive DNA, BLAST hits to the protein 
database, and other useful information. Papers and further 
information about Genie can be found at the first www 
address above. Since the ISMB '96 paper, Genie's exon 
models have been extended to explicitly incorporate 
BLAST and BLOCKS hits into their probabilistic 
framework. This results in a substantial increase in gene 
predicting accuracy. Experimental results in tests using a 
standard set of annotated genes showed that Genie identi- 
fied 95% of coding nucleotides correctly with a specificity 
of 88%, and 76% of exons were identified exactly. 
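
The accuracy figures above can be made concrete with a short Python sketch. It assumes the convention commonly used in gene-finder evaluation, which the abstract does not spell out: sensitivity is the fraction of true coding nucleotides that were predicted as coding, and specificity is the fraction of predicted coding nucleotides that are truly coding. Exon coordinates in the usage note are hypothetical, with inclusive ends.

```python
def positions(exons):
    """Set of nucleotide positions covered by (start, end) exons, ends inclusive."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN); specificity = TP / (TP + FP),
    following the assumed gene-finding convention.
    """
    truth, predicted = positions(true_exons), positions(predicted_exons)
    true_positive = len(truth & predicted)
    sensitivity = true_positive / len(truth) if truth else 0.0
    specificity = true_positive / len(predicted) if predicted else 0.0
    return sensitivity, specificity


def exact_exon_fraction(true_exons, predicted_exons):
    """Fraction of true exons predicted with both boundaries exactly right."""
    truth = set(true_exons)
    if not truth:
        return 0.0
    return len(truth & set(predicted_exons)) / len(truth)
```

For true exons `[(10, 19), (30, 39)]` and predicted exons `[(10, 19), (32, 43)]`, 18 of 20 true coding positions are recovered out of 22 predicted, and one of the two exons has exact boundaries. The gap between nucleotide-level and exon-level scores in this toy case mirrors the gap between the 95% and 76% figures reported above.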

DOE Grant No. DE-FG03-95ER62112. 

Identification, Organization, and 
Analysis of Mammalian Repetitive 
DNA Information 

Jerzy Jurka 

Genetic Information Research Institute; Palo Alto, CA 


415/326-5588, Fax: -2001, 

There are three major objectives in this project: organiza- 
tion of databases of mammalian repetitive sequences, 
development of specialized software for analysis of repeti- 
tive DNA, and sequence studies of new mammalian re- 

Our approach is based on extensive usage of computer 
tools to investigate and organize publicly available 
sequence information. We also pursue collaborative research 
with experimental laboratories. The results are widely 
disseminated via the Internet, peer-reviewed scientific 
publications and personal interactions. Our most recent research 
concentrates on mechanisms of retroposon integration in 
mammals (Jurka, J., PNAS, in press; Jurka, J. and 
Klonowski, P., J. Mol. Evol. 43:685-689). 

We continue to develop reference collections of 
mammalian repeats, which have become a worldwide resource for 
annotation and study of newly sequenced DNA. The reference 
collections are being revised annually as part of a larger 
database of repetitive DNA, called Repbase. The recent 
influx of sequence data to public databases created an un- 
precedented need for automatic annotation of known re- 
petitive elements. We have designed and implemented a 
program for identification and elimination of repetitive 
DNA known as CENSOR. 
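
The idea of identifying and eliminating known repeats from a query sequence can be illustrated with a deliberately simple Python sketch. It is not the CENSOR algorithm: it assumes exact k-mer matching against a small reference collection instead of sequence alignment, so it misses diverged copies. The reference element and the value of `k` in the usage note are hypothetical.

```python
def mask_repeats(sequence, repeat_library, k=8, mask_char="N"):
    """Replace every k-mer of `sequence` that also occurs in any reference
    repeat with mask characters. Exact matching only (an assumption of
    this sketch)."""
    sequence = sequence.upper()
    # Index all k-mers found in the reference repeats.
    kmers = set()
    for repeat in repeat_library:
        repeat = repeat.upper()
        for i in range(len(repeat) - k + 1):
            kmers.add(repeat[i:i + k])
    masked = list(sequence)
    for i in range(len(sequence) - k + 1):
        if sequence[i:i + k] in kmers:
            masked[i:i + k] = mask_char * k
    return "".join(masked)
```

For example, with the hypothetical library `["GGCCGGGCGCGGTGGCTCA"]`, the call `mask_repeats("ATATAT" + "GGCCGGGCGCGG" + "ATATAT", library)` masks the twelve bases shared with the reference element and leaves both flanks unchanged, which is the form of output a downstream database search would then receive.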

Reference collections of mammalian repeats and the CEN- 
SOR program are available electronically (via anonymous 
ftp to; directory repository/repbase). CEN- 
SOR can also be run via electronic mail (mail "help" mes- 
sage to 

DOE Grant No. DE-FG03-95ER62139. 

Databases on Gene-Expression 
Regulation as a Tool for Analysis of 
Functional Genomic Sequences 

A.E. Kel, O.A. Podkolodnaya, O.V. Kel, A.G. 
Romaschenko, E. Wingender,¹ G.C. Overton,² and N.A. 
Kolchanov 
Institute of Cytology and Genetics; Novosibirsk, Russia 
Kolchanov: +7-3832/353-335, Fax: -336 or 7356-558, 
¹Gesellschaft für Biotechnologische Forschung; 
Braunschweig, Germany 
²Department of Genetics; University of Pennsylvania 
School of Medicine; Philadelphia, PA 19104-6145 

The database on transcription regulatory regions in 
eukaryotic genomes (TRRD) has been developed [1] (http:// 
pub/trrd/). The main principle of data representation in 
TRRD is the modular structure and hierarchy of transcription 
regulatory regions. Each TRRD entry corresponds to a gene as 
an entire unit. Information on gene regulation is provided 
(cell-cycle and cell-type specificity, developmental 
stage specificity, influence of various molecular signals on 
gene expression). The TRRD database contains information 
about the structural organization of gene transcription 
regulatory regions. TRRD contains descriptions of known 
promoters and enhancers in 5' and 3' regions and in introns. 
The description of binding sites for transcription factors includes 
the nucleotide sequence and precise location, the names of factors 
that bind to the site, and the experimental evidence by which the 
binding site was revealed. We provide cross-references to the 
TRANSFAC database [2] for both sites and factors as well 
as for genes. TRRD 3.3 release includes 340 vertebrate 

The Gene Expression Regulation Database (GERD) 
collects information on features of gene expression as well 
as information about gene transcription regulation. The 
current release of GERD contains 75 entries with information 
on expression regulation of genes expressed in 
hematopoietic tissues in the course of ontogenesis and blood 
cell differentiation. The COMPEL database contains information 
about composite elements, which are functional units 
essential for highly specific transcription regulation [3]. 
Direct interactions between transcription factors binding to 
their target sites within composite elements result in 
convergence of different signal transduction pathways. 
Nucleotide sequences and positions of composite elements, 
binding factors and types of their DNA binding domains, and 
experimental evidence confirming synergistic or 
antagonistic action of factors are registered in COMPEL. 
Cross-references to the TRANSFAC factors table are given. 
TRRD and COMPEL are cross-referenced to 
each other. COMPEL 2.1 release includes 140 composite 

We have developed software for analysis of transcription 
regulatory region structure. The CompSearch program is 
based on the oligonucleotide weight matrix method. To collect 
sets of binding sites for construction of the matrices we have 
used the TRANSFAC and TRRD databases. The CompSearch 
program takes into account the fine structure of 
experimentally confirmed NFATp/AP-1 composite elements 
collected in COMPEL (distances between binding sites in 
composite elements, their mutual orientation). By means 
of the program we have found potential composite 
elements of the NFATp/AP-1 type in the regulatory regions of 
various cytokine genes. Analysis of composite elements 
could be the first approach to reveal specific patterns of 
transcription signals encoding the regulatory potential of 
eukaryotic promoters. 
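
The weight matrix approach described above can be sketched in Python as follows. This is not the CompSearch program: the matrices are built from a handful of toy sites supplied by the caller (the sites in the usage note are hypothetical and are not TRANSFAC or TRRD entries), only one strand is scanned, and the composite-element step checks spacing alone and ignores mutual orientation.

```python
import math


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, score) for each window scoring at least threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous characters
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def composite_elements(hits_a, hits_b, width_a, max_gap):
    """Pair hits where site B starts 0..max_gap bases after site A ends.

    Returns (position_a, position_b, gap) tuples.
    """
    pairs = []
    for pos_a, _ in hits_a:
        for pos_b, _ in hits_b:
            gap = pos_b - (pos_a + width_a)
            if 0 <= gap <= max_gap:
                pairs.append((pos_a, pos_b, gap))
    return pairs
```

As a usage example, matrices built from the toy sites `["GGAAA", "GGAAA", "GGAAT", "GGAAA"]` and `["TGACTCA", "TGAGTCA", "TGACTCA"]` and scanned over `"CCGGAAACTTGACTCAGC"` with thresholds 5.0 and 6.5 give one hit each, at positions 2 and 9, which `composite_elements` pairs with a gap of 2 bases. The distance constraint is what distinguishes a candidate composite element from two unrelated single-site matches.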


1. Kel O.V., Romaschenko A.G., Kel A.E., Naumochkin A.N., Kolchanov 
N.A. Proceedings of the 28th Annual Hawaii International 
Conference on System Sciences [HICSS] (1995), v. 5, Biotechnology 
Computing, IEEE Computer Society Press, Los Alamitos, California, p. 

2. Wingender E., Dietze P., Karas H., and Knüppel R. TRANSFAC: a 
database on transcription factors and their DNA binding sites (1996). 
Nucl. Acids Res., 1996, v. 24, pp. 238-241. 

3. Kel O.V., Romaschenko A.G., Kel A.E., Wingender E., Kolchanov N.A. 
A compilation of composite regulatory elements affecting 
gene transcription in vertebrates (1995). Nucl. Acids Res., v. 23, pp. 




Recent Publications 

Kel, A., Kel, O., Ischenko, I., Kolchanov, N., Karas, H., Wingender, E. 
and Sklenar, H. (1996). TRRD and COMPEL databases on 
transcription linked to TRANSFAC as tools for analysis and recognition of 
regulatory sequences. Computer Science and Biology: Proceedings of 
the German Conference on Bioinformatics (GCB'95), R. Hofestädt, 
T. Lengauer, M. Löffler, D. Schomburg (eds.). University of Leipzig, 
Leipzig 1996, pp. 113-117. 
Wingender, E., Kel, A.E., Kel, O.V., Karas, H., Heinemeyer, T., Dietze, 
P., Knüppel, R., Romaschenko, A.G. and Kolchanov, N.A. (1997). 
TRANSFAC, TRRD and COMPEL: Towards a federated database 
system on transcriptional regulation. Nucleic Acids Res., in press. 
Ananko E.A., Ignatieva E.V., Kel A.E., Kolchanov N.A. (1996). 
WWWTRRD: Hypertext information system on transcription 
regulation. Computer Science and Biology: Proceedings of the 
German Conference on Bioinformatics (GCB'96), R. Hofestädt, T. 
Lengauer, M. Löffler, D. Schomburg (eds.). University of Leipzig, 
Leipzig 1996, pp. 153-155. 
A.E. Kel, O.V. Kel, O.V. Vishnevsky, M.P. Ponomarenko, I.V. Ischenko, 
H. Karas, N.A. Kolchanov, H. Sklenar, E. Wingender (1997). TRRD 
and COMPEL databases on transcription linked to TRANSFAC as 
tools for analysis and recognition of regulatory sequences. 
Holger Karas, Alexander Kel, Olga Kel, Nikolay Kolchanov, and Edgar 
Wingender (1997). Integrating knowledge on gene regulation by a 
federated database approach: TRANSFAC, TRRD and COMPEL. 
Jurnal Molekularnoy Biologii (Russian), in press. 
Kel A.E., Kolchanov N.A., Kel O.V., Romaschenko A.G., Ananko E.A., 
Ignatyeva E.V., Merkulova T.I., Podkolodnaya O.A., Stepanenko I.L., 
Kochetov A.V., Kolpakov F.A., Podkolodniy N.L., Naumochkin A.N. 
(1997). TRRD: A database on transcription regulatory regions of 
eukaryotic genes. Jurnal Molekularnoy Biologii (Russian), in press. 
O.V. Kel, A.E. Kel, A.G. Romaschenko, E. Wingender, N.A. Kolchanov 
(1997). Composite regulatory elements: classification and description 
in the COMPEL database. Jurnal Molekularnoy Biologii (Russian), 
in press. 

Data-Management Tools for Genomic 

Victor M. Markowitz and I-Min A. Chen 

Information and Computing Sciences Division; Lawrence 
Berkeley National Laboratory; Berkeley, CA 94720 
510/486-6835, Fax: -4004, 

The Object-Protocol Model (OPM) data management tools 
provide facilities for constructing, maintaining, and explor- 
ing efficiently molecular biology databases. Molecular bi- 
ology data are currently maintained in numerous molecular 
biology databases (MBDs), including large archival MBDs 
such as the Genome Database (GDB) at Johns Hopkins 
School of Medicine, the Genome Sequence Data Base 
(GSDB) at the National Center for Genome Resources, 
and the Protein Data Bank (PDB) at Brookhaven National 
Laboratory. Constructing, maintaining, and exploring 
MBDs entail complex and time-consuming processes. 

The goal of the Object-Protocol Model (OPM) data man- 
agement tools is to provide facilities for efficiently con- 
structing, maintaining, and exploring MBDs, using 
application-specific constructs on top of commercial data-
base management systems (DBMSs). The OPM tools will
also provide facilities for reorganizing MBDs and for ex- 
ploring seamlessly heterogenous MBDs. The OPM tools 
and documentation are available on the Web and are devel- 
oped in close collaboration with groups maintaining 
MBDs, such as GDB, GSDB, and PDB.

Current work focuses on providing new facilities for con- 
structing and exploring MBDs. The specific aims of this 
work are:

(1) Extend the OPM query language with additional con- 
structs for expressing complex conditions, and enhance the 
OPM query optimizer for generating more efficient query 

(2) Develop enhanced OPM query interfaces supporting 
MBD-specific data types (e.g., protein data type) and op- 
erations (e.g., protein data display and 3D search), and as- 
sisting users in specifying and interpreting query results. 

(3) Provide support for customizing MBD interfaces. 

(4) Extend the OPM tools with facilities for managing per- 
missions (object ownership) in MBDs, and for physical 
database design of relational MBDs, including specifica- 
tion of indexes, allocation of segments, and handling of 
redundant (denormalized) data. 

(5) Develop OPM tools for constructing and maintaining 
multiple OPM views for both relational and non-relational 
(e.g., ASN.1, AceDB) MBDs. For a given MBD, these tools
will allow customizing different OPM views for different
groups of scientists. For heterogeneous MBDs, these tools will
allow exploring them using common OPM interfaces. 

(6) Develop tools for constructing OPM-based
multidatabase systems of heterogeneous MBDs and for 
exploring and manipulating data in these MBDs via OPM 
interfaces. As part of this effort, the OPM-based 
multidatabase system which consists currently of GDB 6.0 
and GSDB 2.0, will be extended to include additional 
MBDs, primarily GSDB 2.2 (when it becomes available), 
PDB, and Genbank. 

(7) Develop facilities for reorganizing OPM-based 
MBDs. The database reorganization tools will support au-
tomatic generation of procedures for reorganizing MBDs
following restructuring (revision) of MBD schemas. 

In the past year, the OPM data management tools have been 
extended in order to address specific requirements of devel- 
oping MBDs such as GDB 6 and the new version of PDB. 

The current version of the OPM data management tools 
(4.1) was released in June 1996 for Sun/OS, Sun/Solaris 
and SGI. The following OPM tools are available on the
Web:

(1) an editor for specifying OPM schemas; 


DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 



(2) a translator of OPM schemas into relational database 
specifications and procedures; 

(3) utilities for publishing OPM schemas in text (Latex), 
diagram (Postscript), and Html formats; 

(4) a translator of OPM queries into SQL queries; 

(5) a retrofitting tool for constructing OPM schemas 
(views) for existing relational genomic databases; 

(6) a tool for constructing Web-based form interfaces to 
MBDs that have an OPM schema; this tool was developed 
by Stan Letovsky at Johns Hopkins School of Medicine, as 
part of a collaboration. 
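
The schema-to-relational translation in item (2) can be illustrated with a small sketch. It is not the OPM translator and does not use OPM syntax: the dictionary schema format, the class names, and the `to_relational` function are all assumptions made for illustration. The idea shown is only that simple attributes become columns, references become foreign keys, and set-valued attributes become side tables.

```python
import sqlite3

# Hypothetical object-style schema (NOT OPM syntax): class -> attributes.
# An attribute is a plain SQL type, ("ref", class), or ("set", type).
SCHEMA = {
    "Clone": {"name": "text", "length": "integer"},
    "Gene": {
        "symbol": "text",
        "clone": ("ref", "Clone"),
        "aliases": ("set", "text"),
    },
}


def to_relational(schema):
    """Return CREATE TABLE statements for an object-style schema."""
    statements = []
    for cls, attrs in schema.items():
        columns = ["id INTEGER PRIMARY KEY"]
        side_tables = []
        for attr, kind in attrs.items():
            if isinstance(kind, str):
                # Simple attribute: one column.
                columns.append(f"{attr} {kind.upper()}")
            elif kind[0] == "ref":
                # Reference to another class: foreign-key column.
                columns.append(f"{attr}_id INTEGER REFERENCES {kind[1]}(id)")
            elif kind[0] == "set":
                # Set-valued attribute: separate table keyed by owner id.
                side_tables.append(
                    f"CREATE TABLE {cls}_{attr} "
                    f"({cls}_id INTEGER REFERENCES {cls}(id), "
                    f"value {kind[1].upper()})"
                )
            else:
                raise ValueError(f"unknown attribute kind: {kind!r}")
        statements.append(f"CREATE TABLE {cls} ({', '.join(columns)})")
        statements.extend(side_tables)
    return statements


# Check that the generated statements are valid SQL.
conn = sqlite3.connect(":memory:")
for statement in to_relational(SCHEMA):
    conn.execute(statement)
```

A real translator would also generate the procedures that keep the tables consistent with the object view; the sketch stops at table definitions.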

The OPM data management tools have been highly suc- 
cessful in developing new genomic databases, such as 
GDB 6 (released in January 1996; http://gdbgeneral.gdb. 
org/gdb/) and the relational version of PDB (http://, and in constructing OPM 
views and interfaces for existing genomic databases such 
as GSDB 2.0. The OPM data management tools are cur-
rently used by over ten groups in the USA and Europe. The
research underlying these tools is described in several pa- 
pers published in scientific journals and presented at data- 
base and genome conferences. 

In the past year the OPM tools have been presented at da- 
tabase and bioinformatics conferences, including the Inter- 
national Symposium on Theoretical and Computational 
Genome Research, Heidelberg, Germany, March 1996, the 
Workshop on Structuring Biological Information, Heidel- 
berg, Germany, March 1996, the Meeting on Genome 
Mapping and Sequencing, Cold Spring Harbor, May 1996, 
the International Sybase User Group Conference, May 
1996, the Bioinformatics -Structure Conference, Jerusa- 
lem, November 1996, and the Pacific Symposium on 
Bioinformatics, January 1997. 

The results of the research and development underlying 
the OPM tools work have been presented in papers pub- 
lished in proceedings of database and bioinformatics con- 
ferences; these papers are available at 

DOE Contract No. DE-AC03-76SF00098.

The Genome Topographer: System 

S. Cozza, D. Cuddihy, R. Iwasaki, M. Mallison, C. Reed, 
J. Salit, A. Tracy, and T. Marr
Cold Spring Harbor Laboratory; Cold Spring Harbor, NY 

Marr: 516/367-8393, Fax: -»46\, or 

Genome Topographer (GT) is an advanced genome 
informatics system that has received joint funding from 
DOE and NIH over a number of years. DOE funding has
focused on GT tools supporting computational genome 
analysis, principally on sequence analysis. GT is scheduled 
for public release next spring under the auspices of the 
Cold Spring Harbor Human Genome Informatics Research 
Resource. GT has 17 major existing frameworks: 1. Views,
including printing, 2. Default manager, 3. Graphical User
Interface, 4. Query, 5. Project Manager, 6. Workspace
Manager, 7. Asynchronous Process Manager, 8. Study
Manager, 9. Help, 10. Application, 11. Notification, 12.
Security, 13. World Wide Web Interface, 14. NCBI, 15.
Reader, 16. Writer, 17. External Database Interface. GT
Frameworks are independent sets of VisualWorks (client) 
or SmallTalkDB (GemStone) classes which interact to per- 
form the duties required to satisfy the responsibilities of 
the specific framework. Each framework is clearly defined 
and has a well-defined interface to use it. These frame- 
works are used over and over in GT to perform similar du- 
ties in different places. GT has basic tools and special 
tools. Basic tools get used many times in different applica- 
tions, while special tools tend to be special purpose, de- 
signed to do fairly limited things, although the distinction 
is somewhat arbitrary. Tools typically use several frame-
works when they get assembled. Basic Tools: 1. Project
Browser, 2. Editor/Viewer, 3. Query, 4. NCBI Entrez, 5.
File reader/writer, 6. Map comparison, 7. Database Admin-
istrator, 8. Login, 9. Default, 10. Help. Special Tools: 1.
Study Manager, 2. Compute Server, 3. Sequence Analysis,
4. Genetic Analysis. These frameworks and tools are com-
bined with a comprehensive database schema of very rich
biological expression linked with pluggable computational
tools. Taken together, these features allow users to con- 
struct, with relative ease, on-line databases of the primary 
data needed to study a genetic disease (or genes and phe- 
notypes in general) from the stage of family collection and 
diagnostic ascertainment through cloning and functional 
analysis of candidate genes, including mutational analysis, 
expression information, and screening for biochemical in- 
teractions with candidate molecules. GT was designed on 
the premise that a highly informative, visual presentation 
of comprehensive data to a knowledgeable user is essential 
to their understanding. The advanced software engineering 
techniques that are promoted by using relatively new ob-
ject-oriented products have allowed GT to become a highly
interactive and visually-oriented system that allows the 
user to concentrate on the problem rather than on the com- 
puter. Using the rich data representational features charac- 
teristic of this technology, the GT software enables users to 
construct models of real-world, complex biological phe- 
nomena. These unique features of GT are key to the thesis 
that such a system will allow users to discover otherwise 
intractable networks of interactions exhibited by complex 
genetic diseases. 

The VisualWorks development environment allows the 
development of code that runs unchanged across all major 
workstations and personal computers, including PCs,
Macintoshes and most Unix workstations. 

DOE Grant No. DE-FG02-91ER61190. 

A Flexible Sequence Reconstructor for 
Large-Scale DNA Sequencing: A 
Customizable Software System for 
Fragment Assembly 

Gene Myers and Susan Larson 

Department of Computer Science; University of Arizona;

Tucson, AZ 85721

602/621-6612, Fax: -4246

http://www.cs.arizona.edu/faktory

We have completed the design and begun construction of a 
software environment in support of DNA sequencing 
called the "FAKtory". The environment consists of (1) our
previously described software library, FAK, for the core
combinatorial problem of assembling fragments, (2) a Tcl/
Tk-based interface, and (3) a software suite supporting a
modest database of fragments and a processing pipeline 
that includes clipping and vector prescreening modules. A 
key feature of our system is that it is highly customizable; 
the structure of the fragment database, the processing pipe- 
line, and the operation of each phase of the pipeline are 
specifiable by the user. Such customization need only be 
established once at a given location; subsequently, users
see a relatively simple system tailored to their needs. In- 
deed one may direct the system to input a raw dataset of 
say ABI trace files, pass them through a customized pipe- 
line, and view the resulting assembly with two button 

The system is built on top of our FAK software library and 
as a consequence one receives (a) high-sensitivity overlap 
detection, (b) correct resolution of large high-fidelity re-
peats, (c) near perfect multi-alignments, and (d) support of
constraints that must be satisfied by the resulting assem- 
blies. The FAKtory assumes a processing pipeline for frag- 
ments that consists of an INPUT phase, any number and 
sequence of CLIP, PRESCREEN, and TAG phases, fol-
lowed by an OVERLAP and then an ASSEMBLY phase.
The sequence of clip, prescreen, and tag phases is 
customizable and every phase is controlled by a panel of 
user-settable preferences each of which permits setting the 
phase's mode to AUTO, SUPERVISED, or MANUAL. 
This setting determines the level of interaction required by 
the user when the phase is run, ranging from none to 
hands-on. Any diagnostic situations detected during pipe-
line processing are organized into a log that permits one to
confirm, correct, or undo decisions that might have been
made automatically. 
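
The phase-and-mode design described above can be sketched in a few lines. This is an illustration under stated assumptions, not FAKtory code: the phase functions, the `confirm` callback, and the log tuple layout are hypothetical. Each phase proposes a change; AUTO applies it, SUPERVISED asks the callback, and MANUAL only records the proposal, with every decision written to a log for later review.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"              # apply changes without asking
    SUPERVISED = "supervised"  # ask for confirmation of each change
    MANUAL = "manual"          # record the proposal, leave it to the user


def clip_phase(seq):
    """Hypothetical CLIP phase: trim ambiguous 'N' bases from both ends."""
    trimmed = seq.strip("N")
    if trimmed == seq:
        return seq, None
    return trimmed, f"clipped {len(seq) - len(trimmed)} N bases"


def make_prescreen(vector):
    """Hypothetical PRESCREEN phase factory: drop a leading vector prefix."""
    def prescreen_phase(seq):
        if seq.startswith(vector):
            return seq[len(vector):], f"removed vector prefix {vector}"
        return seq, None
    return prescreen_phase


def run_pipeline(seq, phases, confirm=lambda name, note: True):
    """Run (name, phase, mode) triples in order; return (sequence, log)."""
    log = []
    for name, phase, mode in phases:
        proposed, note = phase(seq)
        if note is None:
            continue  # nothing to report for this phase
        if mode is Mode.AUTO:
            applied = True
        elif mode is Mode.SUPERVISED:
            applied = confirm(name, note)
        else:
            applied = False
        if applied:
            seq = proposed
        log.append((name, mode.name, note, applied))
    return seq, log


phases = [
    ("CLIP", clip_phase, Mode.AUTO),
    ("PRESCREEN", make_prescreen("GATC"), Mode.SUPERVISED),
]
result, log = run_pipeline("NNGATCACGTACGTNN", phases)
```

Because the log keeps the note and the decision for each phase, a reviewer can later confirm or reverse an automatic choice, which is the role the abstract assigns to the diagnostic log.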

The customized fragment database contains fields whose 
type may be chosen from TIME, TEXT, NUMBER, and 
WAVEFORM. One can associate default values for fields 
unspecified on input and specify a controlled vocabulary lim-
iting the range of acceptable values for a given field (e.g.,
John, Joe, or Mary for the field Technician, and [1, 36] for
the field Lane). This database may be queried with 
SQL-like predicates that further permit approximate 
matching over text fields. Common queries and/or sets of 
fragments selected by them may be named and referred to 
later by said name. The pipeline status of a fragment may 
be part of a query. 

The system permits one to maintain a collection of alterna- 
tive assemblies, to compare them to see how they are dif- 
ferent, and directly manipulate assemblies in a fashion 
consistent with sequence overlaps. The system can be cus- 
tomized so that a priori constraints reflecting a given se- 
quencing protocol (e.g. double-barreled or transposon- 
mapped) are automatically produced according to the syn- 
tax of the names of fragments (e.g. X.f and X.r for any X 
are mates for double-barreled sequencing). The system 
presents visualizations of the constraints applied to an as- 
sembly, and one may experiment with an assembly by add- 
ing and/or removing constraints. Finally, one may edit the 
multi-alignment of an assembly while consulting the raw 
waveforms. Special attention was given to optimizing the 
ergonomics of this time-intensive task. 
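
The naming rule mentioned above (X.f and X.r are mates in double-barreled sequencing) lends itself to a short sketch. The function name and the returned pair format are assumptions for illustration; only the suffix convention comes from the text.

```python
from collections import defaultdict


def mate_pairs(fragment_names):
    """Pair fragments named X.f and X.r that share the same template X."""
    by_template = defaultdict(dict)
    for name in fragment_names:
        template, sep, suffix = name.rpartition(".")
        if sep and suffix in ("f", "r"):
            by_template[template][suffix] = name
    # A constraint is produced only when both reads of a template exist.
    return sorted(
        (ends["f"], ends["r"])
        for ends in by_template.values()
        if len(ends) == 2
    )


pairs = mate_pairs(["a1.f", "a1.r", "b2.f", "c3.r", "d4"])
```

Each returned pair would then be turned into an assembly constraint, for example on the relative orientation and separation of the two reads.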

DOE Grant No. DE-FG03-94ER61911.

The Role of Integrated Software and 
Databases in Genome Sequence 
Interpretation and Metabolic 

Terry Gaasterland, Natalia Maltsev, Ross Overbeek, and 

Evgeni Selkov 

Mathematics and Computer Science Division; Argonne

National Laboratory; Argonne, IL 60439 

630/252-4171, Fax: $<)%(>, 

MAGPIE: http://www.mcs.anl.gov/home/gaasterl/

WIT: http://

As scientists successfully sequence complete genomes, the 
issue of how to organize the large quantities of evolving 
sequence data becomes paramount. Through our work in
comparative whole genome analysis (MAGPIE, 
Gaasterland) and metabolic reconstruction algorithms 
(WIT, Overbeek, Maltsev, and Selkov), we carry genome 
interpretation beyond the identification of gene products to 
customized views of an organism's functional properties. 


MAGPIE is a system designed to reside locally at the site 
of a genome project and actively carry out analysis of ge- 
nome sequence data as it is generated.'^ DNA sequences 
produced in a sequencing project mature through a series 
of stages that each require different analysis activities. 
Even after DNA has been assembled into contiguous frag- 
ments and eventually into a single genome, it must be 
regularly reanalyzed. Any new data in public sequence da- 
tabases may provide clues to the identity of genes. Over a 
year, for 2 megabases with 4-fold coverage, MAGPIE will 
request on the order of 100,000 outputs from remote 
analysis software, manipulate and manage the output, up- 
date the current analysis of the sequence data, and monitor 
the project sequence data for changes that initiate reanalysis.

In collaboration with Canada's Institute for Marine Bio-
sciences and the Canadian Institute for Advanced Re-
search, MAGPIE is being used to maintain and study com-
parative views of all open reading frames (ORFs) across
fully sequenced genomes (currently 5), nearly completed 
genomes (currently 2) and 1 genome in progress 
(Sulfolobus solfataricus). Together, these genomes repre- 
sent multiple archaeal and bacterial genomes and one eu- 
karyotic genome. This analysis provides the necessary data 
to assign phylogenetic classifications to each ORF (e.g.,
"AE" for archaeal and eukaryotic). This data in turn pro- 
vides the basis for validating and assessing functional an- 
notations according to phylogenetic neighborhood (e.g., 
selecting the eukaryotic form of a biochemical function 
over a bacterial form for an "AE" ORF).'
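
A minimal sketch of the classification idea follows. Only the code "AE" (archaeal and eukaryotic) appears in the text; the letter "B" for bacterial, the function names, and the annotation tuples are assumptions made for illustration and do not describe MAGPIE's internals.

```python
# Letters per domain; only "A" and "E" are attested in the abstract,
# "B" for bacterial is an assumption.
DOMAIN_LETTERS = {"archaeal": "A", "bacterial": "B", "eukaryotic": "E"}


def classify_orf(hit_domains):
    """Build a code such as "AE" from the domains in which an ORF has hits."""
    return "".join(sorted({DOMAIN_LETTERS[d] for d in hit_domains}))


def preferred_annotations(code, annotations):
    """Prefer annotations whose source domain lies in the ORF's code.

    annotations is a list of (domain, description) pairs; if none match,
    the full list is returned unchanged.
    """
    kept = [(d, text) for d, text in annotations if DOMAIN_LETTERS[d] in code]
    return kept or list(annotations)


code = classify_orf(["eukaryotic", "archaeal", "archaeal"])
choice = preferred_annotations(
    code, [("bacterial", "form 1"), ("eukaryotic", "form 2")]
)
```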

Once an automated functional overview has been estab-
lished, it remains to pinpoint the organisms' exact meta-
bolic pathways and establish how they interact. To this end,
the WIT (What Is There) system supports efforts to de- 
velop metabolic reconstructions. Such constructions, or 
models, are based on sequence data, clearly established 
biochemistry of specific organisms, and understanding of the
interdependencies of biochemical mechanisms. WIT thus 
offers a valuable tool for testing current hypotheses about 
microbial behavior. For example, a reconstruction may
begin with a set of established enzymes (enzymes with 
strong similarities in identified coding regions to existing 
sequences for which the enzymatic function is known) and 
putative enzymes (enzymes with weak similarity to se- 
quences of known function). From these initial "hits," 
within a phylogenetic perspective, we identify an initial set 
of pathways. This set can be used to generate a set of ex- 
pected enzymes (enzymes that have not been clearly de- 
tected, but that would be expected given the set of hypoth- 
esized pathways) and missing enzymes (enzymes that oc- 
cur in the pathways but for which no sequence has yet 
been biochemically identified for any organism). Further 
reasoning identifies tentative connective pathways.
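
The distinction between expected and missing enzymes can be expressed with set operations. The sketch below is not the WIT algorithm: the pathway table, the enzyme identifiers, and the `min_fraction` threshold used to decide that a pathway is hypothesized are all assumptions for illustration.

```python
def reconstruct(pathways, detected, sequenced_anywhere, min_fraction=0.5):
    """Return (hypothesized pathways, expected enzymes, missing enzymes).

    pathways: mapping of pathway name -> set of enzyme identifiers.
    detected: enzymes with established or putative hits in this genome.
    sequenced_anywhere: enzymes with a known sequence in any organism.
    """
    # Assumption: a pathway is hypothesized when enough of it is detected.
    hypothesized = {
        name
        for name, enzymes in pathways.items()
        if len(enzymes & detected) / len(enzymes) >= min_fraction
    }
    required = set().union(*(pathways[name] for name in hypothesized))
    undetected = required - detected
    # Missing: needed by a pathway, but no sequence known for any organism.
    missing = undetected - sequenced_anywhere
    # Expected: needed, not detected here, but a sequence exists elsewhere.
    expected = undetected - missing
    return hypothesized, expected, missing


# Placeholder identifiers, not real enzymes.
PATHWAYS = {"p1": {"E1", "E2", "E3", "E4", "E5"}, "p2": {"E6", "E7", "E8"}}
result = reconstruct(
    PATHWAYS,
    detected={"E1", "E2", "E3"},
    sequenced_anywhere={"E1", "E2", "E3", "E4"},
)
```

In this toy input, pathway p1 is hypothesized, E4 is expected (worth a more sensitive search), and E5 is missing (no sequence to search with).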

In addition to helping curators develop metabolic recon- 
structions, WIT lets users examine models curated by ex- 
perts, follow connections between more than two thousand 
metabolic diagrams, and compare models (e.g., which of 
certain genes that are conserved among bacterial genomes 
are found in higher life). The objective is to set the stage
for meaningful simulations of microbial behavior and thus 
to advance our understanding of microbial biochemistry 
and genetics. 

DOE Contract No. W-31-109-Eng-38 (ANL FWP No.


[1] T. Gaasterland and C. Sensen, Fully Automated Genome Analysis that
Reflects User Needs and Preferences - a Detailed Introduction to the
MAGPIE System Architecture. Biochimie, 78(4), (accepted).

[2] T. Gaasterland, J. Lobo, N. Maltsev, and G. Chen. Assigning Function
to CDS Through Qualified Query Answering. In Proc. 2nd Int. Conf.
Intell. Syst. for Mol. Bio., Stanford U. (1994).

[3] T. Gaasterland and E. Selkov. Automatic Reconstruction of Metabolic
Structure from Incomplete Genome Sequence Data. In Proc. Int.
Conf. Intell. Syst. for Mol. Bio., Cambridge, England (1995).

Database Transformations for 
Biological Applications 

G. Christian Overton, Susan B. Davidson,¹ and Peter Buneman

Department of Genetics and ¹Department of Computer and
Information Science; University of Pennsylvania;

Philadelphia, PA 19104 

Overton: 215/573-3105, Fax: -M\\.coverton@cbil.humgen. 

Davidson: 215/898-3490, Fax: -0587, xusan@ci'! 

Buneman: 215/898-7703, Fax: -0587,


http://sdmc.iss.nus.sg/kleisli-stuff/MoreInfo.html

We have implemented a general-purpose query system, 
Kleisli, that provides access to a variety of "non-standard" 
data sources (e.g., ACeDB, ASN.1, BLAST), as well as to
"standard" relational databases. The system represents a 
major advance in the ability to integrate the growing num-
ber and diversity of biology data sources conveniently and
efficiently. It features a uniform query interface, the CPL 
query language, across heterogeneous data sources, a 
modular and extensible architecture, and most significantly 
for dealing with the Internet environment, a programmable 
optimizer. We have demonstrated the utility of the system
in composing and executing queries that were considered 
difficult, if not unanswerable, without first either building 
a monolithic database or writing highly application- 
specific integration code (details and examples available at 
URL above). 

In conjunction with other software developed in our group, 
we have assembled a toolset that supports a range of data 

integration strategies as well as the ability to create spe- 
cialized data warehouses initialized from community data- 
bases. Our integration strategy is based upon the concept 
of "mediators", which serve a group of related applications 
by providing a uniform structural interface to the relevant 
data sources. This approach is cost-effective in terms of 
query development time and maintenance. We have exam- 
ined in detail methods for optimizing queries such as "re- 
trieve all known human sequence containing an Alu repeat 
in an intragenic region" where the data sources are hetero- 
geneous and distributed across the Internet. 

Transformation of data resources, that is the structural re- 
organization of a data resource from one form to another, 
arises frequently in genome informatics. Examples include 
the creation of data warehouses and database evolution. 
Implementing such transformations by hand on a case by 
case basis is time consuming and error prone. Conse- 
quently there is a need for a method of specifying, imple- 
menting and formally verifying transformations in a uni- 
form way across a wide variety of different data models. 
Morphase is a prototype system for specifying transforma- 
tions between data sources and targets in an intuitively ap- 
pealing, declarative language based on Horn clause logic. 
Transformation specifications in Morphase are translated
into CPL and executed in the Kleisli system. The 
data-types underlying Morphase include arbitrarily nested 
records, sets, variants, lists and object identity, thus captur-
ing the types common to most data formats relevant to ge-
nome informatics, including ASN.1 and ACE. Morphase
can be connected to a wide variety of data sources simulta-
neously through Kleisli. In this way, data can be read from
multiple heterogeneous data sources, transformed using 
Morphase according to the desired output format, and in- 
serted into the target data source. 
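
To make the idea of a declarative transformation over nested data concrete, here is a small sketch. It is neither Morphase nor CPL and does not use Horn clauses: the rule format (target field mapped to a path into the source, with "*" meaning every element of a list), the field names, and the record contents are hypothetical placeholders.

```python
def extract(value, path):
    """Follow a path into nested dicts and lists; "*" maps over a list."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if head == "*":
        return [extract(item, rest) for item in value]
    return extract(value[head], rest)


def transform(source, rules):
    """Build a flat target record from declarative field -> path rules."""
    return {target: extract(source, path) for target, path in rules.items()}


# Hypothetical rules: what the target looks like, not how to compute it.
RULES = {
    "locus": ("gene", "symbol"),
    "chromosome": ("map", "chromosome"),
    "clone_names": ("clones", "*", "name"),
}

source_record = {
    "gene": {"symbol": "GENE1"},
    "map": {"chromosome": "22"},
    "clones": [{"name": "c1"}, {"name": "c2"}],
}
target_record = transform(source_record, RULES)
```

Keeping the rules as data rather than code is what makes such a specification easy to inspect for conceptual errors, the property the authors report for their declarative approach.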

We have tested Morphase by applying it to a variety of 
different transformation problems involving Sybase, ACE 
and ASN.1. For example, we used it to specify a transfor-
mation between the Sanger Center's Chromosome 22 ACE
database (ACE22DB) and a Chromosome 22 Sybase data- 
base (Chr22DB), as well as between a portion of GDB and 
Chr22DB. Some of these transformations had already been 
hand-coded without our tools, forming a basis for comparison.

Once the semantic correspondences between objects in the 
various databases were understood, writing the transforma-
tion program in Morphase was easy, even for a non-expert
user of the system. Furthermore, it was easy to find conceptual
errors in the transformation specification. In contrast, the 
hand-coded programs were obtuse, difficult to understand, 
and even more difficult to debug. 

DOE Grant No. DE-FG02-94ER61923. 

Relevant Publications 

P. Buneman, S.B. Davidson, K. Hart, C. Overton, and L. Wong, "A Data
Transformation System for Biological Data Sources," in Proceedings
of VLDB, Sept. 1995 (Zurich, Switzerland). Also available as
Technical Report MS-CIS-95-10, University of Pennsylvania, March
S.B. Davidson, C. Overton, and P. Buneman, "Challenges in Integrating
Biological Data Sources," J. Computational Biology 2 (1995), pp
A. Kosky, "Transforming with Recursive Data Structures,"
PhD Thesis, December 1995.
S.B. Davidson and A. Kosky, "Effecting Database Transformations Using
Morphase," Technical Report MS-CIS-96-05, University of
A. Kosky, S.B. Davidson, and P. Buneman, "Semantics of Database
Transformations," Technical Report MS-CIS-95-25, University of
Pennsylvania, 1995.
K. Hart and L. Wong, "Pruning Nested Data Values Using Branch
Expressions With Wildcards," in Abstracts of MIMBD, Cambridge,
England, July 1995.

Las Vegas Algorithm for Gene
Recognition: Suboptimal and 
Error-Tolerant Spliced Alignment 

Sing Hoi Sze and Pavel A. Pevzner¹

Departments of Computer Science and ¹Mathematics;
University of Southern California; Los Angeles, CA 90089
Pevzner: 213/740-2407, Fax: -2424
ppevzner@hto.usc.edu

Recently, Gelfand, Mironov, and Pevzner (Proc. Natl. 
Acad. Sci. USA, 1996, 9061-9066) proposed a spliced
alignment approach to gene recognition that provides 99%
accurate recognition of human genes if a related mamma-
lian protein is available. However, even 99% accurate gene
predictions are insufficient for automated sequence annota- 
tion in large-scale sequencing projects and therefore have 
to be complemented by experimental gene verification. 
100% accurate gene predictions would lead to a substantial 
reduction of experimental work on gene identification. Our 
goal is to develop an algorithm that either predicts an exon 
assembly with accuracy sufficient for sequence annotation 
or warns a biologist that the accuracy of a prediction is 
insufficient and further experimental work is required. We 
study suboptimal and error-tolerant spliced alignment 
problems as the first steps towards such an algorithm, and 
report an algorithm which provides 100% accurate recog- 
nition of human genes in 37% of cases (if a related mam- 
malian protein is available). For 52% of genes, the algo- 
rithm predicts at least one exon with 100% accuracy. 
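
The core of spliced alignment, choosing a chain of candidate exons whose concatenation best matches a related target, can be shown with a deliberately naive sketch. It is not the authors' algorithm: it enumerates every chain (exponential in the number of blocks), compares nucleotides rather than a protein target, and uses an assumed +1/-1/-1 scoring scheme. All names and sequences are illustrative.

```python
from itertools import combinations


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (score only, linear space)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def best_chain(genomic, blocks, target):
    """Exhaustively pick the non-overlapping block chain that fits target.

    blocks are half-open (start, end) candidate exons in genomic.
    Returns (score, chain).
    """
    blocks = sorted(blocks)
    best = (float("-inf"), ())
    for size in range(1, len(blocks) + 1):
        for chain in combinations(blocks, size):
            overlapping = any(
                chain[i][1] > chain[i + 1][0] for i in range(size - 1)
            )
            if overlapping:
                continue
            spliced = "".join(genomic[s:e] for s, e in chain)
            score = global_score(spliced, target)
            if score > best[0]:
                best = (score, chain)
    return best


genomic = "ATGGCC" + "TTTTTT" + "GATTAA"  # exon, intron, exon (toy data)
candidates = [(0, 6), (6, 12), (12, 18), (3, 9)]
score, chain = best_chain(genomic, candidates, "ATGGCCGATTAA")
```

A suboptimal or error-tolerant variant would keep several high-scoring chains instead of one and compare them, which is one way to judge whether a prediction is reliable enough for annotation.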

DOE Grant No. DE-FG03-97ER62383. 

Foundations for a Syntactic Pattern- 
Recognition System for Genomic DNA 
Sequences: Languages, Automata, 
Interfaces, and Macromolecules 

David B. Searls and G. Christian Overton¹
SmithKline Beecham Pharmaceuticals; King of Prussia,
PA 19406

610/270-4551, Fax: -5580,
¹Department of Genetics; University of Pennsylvania;
Philadelphia, PA 19104 

Viewed as strings of symbols, biological macromolecules 
can be modelled as elements of formal languages. Genera- 
tive grammars have been useful in molecular biology for 
purposes of syntactic pattern recognition, for example in 
the author's work on the GenLang pattern matching sys-
tem, which is able to describe and detect patterns that are
probably beyond the capability of a regular expression 
specification. More recently, grammars have been used to 
capture intramolecular interactions or long-distance depen- 
dencies between residues, such as those arising in folded 
structures. In the work of Haussler and colleagues, for ex- 
ample, stochastic context-free grammars have been used as 
a framework for "learning" folded RNA structures such as 
tRNAs, capturing both primary sequence information and 
secondary structural covariation. Such advances make the 
study of the formal status of the language of biological 
macromolecules highly relevant, and in particular the find- 
ing that DNA is beyond context-free has already created 
challenges in algorithm design. 
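
A stem-loop is a standard example of a pattern that a context-free rule captures and a regular expression cannot, because the stem may nest to any depth. The sketch below recognizes the toy grammar S -> x S x' | loop, where x and x' are complementary bases. It is not GenLang and not a stochastic grammar; the function name and the minimum stem and loop lengths are assumptions.

```python
# Watson-Crick pairs only; wobble pairs such as G-U are ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_stem_loop(rna, min_stem=2, min_loop=3):
    """Recognize S -> x S x' | loop by peeling complementary end pairs."""
    i, j = 0, len(rna) - 1
    stem = 0
    # Peel a pair only if at least min_loop bases would remain inside.
    while j - i + 1 >= min_loop + 2 and (rna[i], rna[j]) in PAIRS:
        i += 1
        j -= 1
        stem += 1
    return stem >= min_stem and j - i + 1 >= min_loop


hairpin = is_stem_loop("GGGAAAUCCC")      # three G-C pairs around a loop
not_hairpin = is_stem_loop("GGGAAAUGGG")  # ends are not complementary
```

Each application of the rule x S x' ties a base near the 5' end to one near the 3' end, which is exactly the kind of long-distance dependency the paragraph describes.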

Moreover, to date, such methods have not been able to 
capture relationships between strings in a collection, such 
as those that arise via intermolecular interactions, or evolu- 
tionary relationships implicit in alignments. Recently we 
have attempted to remedy this by showing (1) how formal 
grammars can be extended to describe interacting collec- 
tions of molecules, such as hybridization products and, 
potentially, multimeric or physiological protein interac- 
tions, and (2) how simple automata can be used to model 
evolutionary relationships in such a way that complex 
model-based alignment algorithms can be automatically 
generated by means of visual programming. These results 
allow for a useful generalization of the language-theoretic 
methods now applied to single molecules. 

In addition, we describe a new software package — 
bioWidget — for the rapid development and deployment of 
graphical user interfaces (GUIs) designed for the scientific 
visualization of molecular, cellular and genomics informa-
tion. The overarching philosophy behind bioWidgets is
componentry: that is, the creation of adaptable, reusable 
software, deployed in modules that are easily incorporated 
in a variety of applications and in such a way as to pro-
mote interaction between those applications. This is in
sharp distinction to the common practice of developing 
dedicated applications. The bioWidgets project addition- 
ally focuses on the development of specific applications 
based on bioWidget componentry, including chromo- 
somes, maps, and nucleic acid and peptide sequences. 

The current set of bioWidgets has been implemented in 
Java with the goal in mind of delivering local applications 
and distributed applets via Intranet/Internet environments
as required. The immediate focus is on developing inter- 
faces for information stored in distributed heterogeneous 
databases such as GDB, GSDB, Entry, and ACeDB. The 
issues we are addressing are database access, reflecting 
database schemas in bioWidgets, and performance. We are 
also directing our efforts into creating a consortium of 
bioWidget developers and end-users. This organization 
will create standards for and encourage the development of 
bioWidget components. Primary participants in the consor- 
tium include Gerry Rubin (UC Berkeley) and Nat 
Goodman (Jackson Labs). 

DOE Grant No. DE-FG02-92ER61371. 
Relevant Publications 

D.B. Searls, "String Variable Grammar: A Logic Grammar Formalism for
DNA Sequences," Journal of Logic Programming 24 (1,2):73-102
D.B. Searls, "Formal Grammars for Intermolecular Structure," First
International Symposium on Intelligence in Neural and Biological
Systems, 30-37 (1995).
D.B. Searls and K.P. Murphy, "Automata-Theoretic Models of Mutation
and Alignment," Third International Conference on Intelligent
Systems for Molecular Biology, 341-349 (1995).
D.B. Searls, "bioTk: Componentry for Genome Informatics Graphical
User Interfaces," Gene 163 (2):GC1-16 (1995).

Analysis and Annotation of Nucleic 
Acid Sequence 

David J. States, Ron Cytron, Pankaj Agarwal, and Hugh


Institute for Biomedical Computing; Washington 

University; St. Louis, MO 63108 

314/362-2134, Fax: -0234,

http://www.ibc.

Bayesian estimates for sequence similarity: There is an 
inherent relationship between the process of pairwise se- 
quence alignment and the estimation of evolutionary dis- 
tance. This relationship is explored and made explicit. As- 
suming an evolutionary model and given a specific pattern 
of observed base mismatches, the relative probabilities of 
evolution at each evolutionary distance are computed us- 
ing a Bayesian framework. The mean or the median of this 
probability distribution provides a robust estimate of the 
central value. Bayesian estimates of the evolutionary dis- 
tance incorporate arbitrary prior information about variable 
mutation rates both over time and along sequence position,

thus requiring only a weak form of the molecular-clock 

The endpoints of the similarity between genomic DNA 
sequences are often ambiguous. The probability of evolu- 
tion at each evolutionary distance can be estimated over 
the entire set of alignments by choosing the best alignment 
at each distance and the corresponding probability of du- 
plication at that evolutionary distance. A central value of 
this distribution provides a robust evolutionary distance 
estimate. We provide an efficient algorithm for computing 
the parametric alignment, considering evolutionary dis- 
tance as the only parameter. 
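
The Bayesian distance estimate described earlier in this abstract can be sketched numerically. The abstract does not name its evolutionary model, so the sketch assumes the Jukes-Cantor model, under which a site differs with probability p(d) = 3/4 (1 - exp(-4d/3)) at distance d; the grid, the upper bound `d_max`, and the default flat prior are likewise assumptions.

```python
import math


def posterior_distance(n_sites, n_mismatches, d_max=3.0, steps=3000,
                       prior=lambda d: 1.0):
    """Posterior mean and median of evolutionary distance on a grid.

    Assumes the Jukes-Cantor model and a binomial likelihood for the
    observed mismatches; prior(d) may encode rate information.
    """
    grid = [d_max * (i + 0.5) / steps for i in range(steps)]
    log_likes = []
    for d in grid:
        p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
        log_likes.append(
            n_mismatches * math.log(p)
            + (n_sites - n_mismatches) * math.log(1.0 - p)
        )
    top = max(log_likes)  # subtract the maximum to avoid underflow
    weights = [prior(d) * math.exp(ll - top)
               for d, ll in zip(grid, log_likes)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    mean = sum(d * w for d, w in zip(grid, posterior))
    median = grid[-1]
    cumulative = 0.0
    for d, w in zip(grid, posterior):
        cumulative += w
        if cumulative >= 0.5:
            median = d
            break
    return mean, median


mean_d, median_d = posterior_distance(n_sites=100, n_mismatches=10)
```

Passing a non-flat `prior` is where information about variable mutation rates would enter; the mean and median returned here correspond to the "central value" the abstract refers to.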

These techniques and estimates are used to infer the dupli- 
cation history of the genomic sequence in C. elegans and 
in S. cerevisiae. Our results indicate that repeats discovered
using a single scoring matrix show a considerable bias in 
subsequent evolutionary distance estimates. 

Model-based sequence scoring metrics: The PAM-based
DNA comparison metric has been extended to incorporate
biases in nucleotide composition and mutation rates, ex- 
tending earlier work (States, Gish and Altschul, 1993). A 
codon-based scoring system has been developed that incor-
porates the effects of biased codon utilization frequencies.

A dynamic programming algorithm has been developed 
that will optimally align sequences using a choice of com- 
parison measures (non-coding vs. coding, etc.). We are in 
the process of evaluating this approach as a means for 
identifying likely coding regions in cDNA sequences. 

Efficient sequence similarity search tools: Most se-
quence search tools have been designed for use with pro-
tein sequence queries a few hundred residues long. The
analysis of genomic DNA sequence necessitates the use of 
queries hundreds of kilobases or even megabases in length. 
A memory- and computationally efficient search tool has
been developed for the identification of repeats and se- 
quence similarity in very large segments of nucleic acid 
sequence. The tool implements optimal encoding of the 
word table, repeat filters, flexible scoring systems, and 
analytically parameterized search sensitivity. Output for- 
mats are designed for the presentation of genomic se- 
quence searches. 
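
The role of a word table in repeat detection can be shown with a minimal sketch. It uses a plain dictionary, not the optimal encoding the abstract mentions, and the function names and the `max_occurrences` filter are assumptions for illustration.

```python
from collections import defaultdict


def word_table(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def repeated_words(seq, k, max_occurrences=None):
    """Words seen at least twice; optionally drop very frequent ones."""
    repeats = {}
    for word, positions in word_table(seq, k).items():
        if len(positions) < 2:
            continue
        if max_occurrences is not None and len(positions) > max_occurrences:
            continue  # crude filter for low-complexity or abundant repeats
        repeats[word] = positions
    return repeats


seeds = repeated_words("ACGTTTACGTAA", k=4)
```

Positions that share a word serve as seeds; a full tool would extend each seed pair into an alignment and score it, which is where the flexible scoring systems come in.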

Federated databases: A Sybase server and mirror for 
GSDB are being developed to facilitate the annotation of 
repeat sequence elements in public data repositories. 

DOE Grant No. DE-FG02-94ER61910. 

Gene Recognition, Modeling, and 
Homology Search in GRAIL and 

Ying Xu, Manesh Shah, J. Ralph Einstein, Sherri Matis,
Xiaojun Guan, Sergey Petrov, Loren Hauser,' Richard J.
Mural,' and Edward C. Uberbacher
Computer Science and Mathematics and 'Biology
Divisions; Oak Ridge National Laboratory; Oak Ridge,
Uberbacher: 423/574-6134, Fax: -7860,

GRAIL is a modular expert system for the analysis and 
characterization of DNA sequences which facilitates the 
recognition of gene features and gene modeling. A new 
version of the system has been created with greater sensi- 
tivity for exon prediction (especially in AT rich regions), 
more accurate splice site prediction, and robust indel error 
detection capability. GRAIL 1.3 is available to the user in 
a Motif graphical client-server system (XGRAIL), through 
WWW-Netscape, by e-mail server, or callable from other 
analysis programs using Unix sockets. 

In addition to the positions of protein coding regions and 
gene models, the user can view the positions of a number 
of other features including poly-A addition sites, potential 
Pol II promoters, CpG islands and both complex and 
simple repetitive DNA elements using algorithms devel- 
oped at ORNL. XGRAIL also has a direct link to the 
genQuest server, allowing characterization of newly ob- 
tained sequences by homology-based methods using a 
number of protein, DNA, and motif databases and com- 
parison methods such as FastA, BLAST, parallel 
Smith-Waterman, and special algorithms which consider 
potential frameshifts during sequence comparison. 

Following an analysis session, the user can use an annota- 
tion tool which is part of the XGRAIL 1.3 system to gener- 
ate a "feature table" report describing the current sequence 
and its properties. Links to the GSDB sequence database 
have been established to record computer-based analysis 
of sequences during submission to the database or as third 
party annotation. 

Gene Modeling and Client-Server GRAIL: In addition 
to the current coding region recognition capabilities based 
on a multiple sensor-neural network and rule base, mod- 
ules for the recognition of features such as splice junc- 
tions, transcription and translation start and stop, and other 
control regions have been constructed and incorporated 
into an expert system (GAP III) for reliable 
computer-based modeling of genes. Heuristic methods and 
dynamic programming are used to construct first pass gene
models which include the potential for modification of ini- 
tially predicted exons. These actions result in a net im- 
provement in gene characterization, particularly in the rec-
ognition of very short coding regions. Translation of gene
models and database searches are also supported through 
access to the genQuest server (described below). 

Model Organism Systems: A number of model organism 
systems have been designed and implemented and can be 
accessed within the XGRAIL 1.3 client including Escheri- 
chia coli, Drosophila melanogaster and Arabidopsis 
thaliana. The performance of these systems is basically 
equivalent to the Human GRAIL 1.3 system. Additional 
model organism systems, including several important mi- 
croorganisms, are in progress. 

Error Detection in Coding Sequences: Single-pass DNA 
sequencing is becoming a widely used technique for gene 
identification from both cDNA and genomic DNA se- 
quences. An appreciably higher rate of base insertion and 
deletion errors (indels) in this type of sequence can cause 
serious problems in the recognition of coding regions, ho- 
mology search, and other aspects of sequence interpreta- 
tion. We have developed two error detection and "correc- 
tion" strategies and systems which make low-redundancy
sequence data more informative for gene identification and
characterization purposes. The first algorithm detects se-
quencing errors by finding changes in the statistically pre-
ferred reading frame within a possible coding region and
then rectifies the frame at the transition point to make the
potential exon candidate frame-consistent. We have incor-
porated this system in GRAIL 1.3 to provide analysis
which is very error tolerant. Currently the system can de- 
tect about 70% of the indels with an indel rate of 1%, and 
GRAIL identifies 89% of the coding nucleotides compared 
to 69% for the system without error correction. The algo- 
rithm uses dynamic programming and runs in time and 
space linear in the size of the input sequence.
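
The frame-tracking idea described above can be sketched as a small dynamic program. The following Python example is an illustrative assumption, not the GRAIL implementation: it takes a hypothetical table of per-window coding scores for each of the three reading frames, charges a hypothetical `switch_penalty` for every change of frame, and returns the best-scoring frame assignment together with the points where the frame changes (candidate indel sites).

```python
def best_frame_path(scores, switch_penalty):
    """Choose one reading frame per window, penalizing frame changes.

    scores[i][f] is an assumed coding score for window i in frame f.
    Returns (frames, switch_points).
    """
    if not scores:
        return [], []
    n_frames = len(scores[0])
    best = list(scores[0])   # best total score ending in each frame
    back = []                # back[i - 1][f]: previous frame on the best path
    for row in scores[1:]:
        new_best, pointers = [], []
        for f in range(n_frames):
            # Staying in frame f is free; arriving from another frame costs.
            prev = max(range(n_frames),
                       key=lambda g: best[g] - (switch_penalty if g != f else 0))
            cost = switch_penalty if prev != f else 0
            new_best.append(best[prev] - cost + row[f])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace back from the best final frame.
    f = max(range(n_frames), key=lambda g: best[g])
    frames = [f]
    for pointers in reversed(back):
        f = pointers[f]
        frames.append(f)
    frames.reverse()
    switch_points = [i for i in range(1, len(frames))
                     if frames[i] != frames[i - 1]]
    return frames, switch_points
```

Because the number of frames is fixed at three, each window costs a constant amount of work, so time and space grow linearly with the number of windows.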

In the second method, a Smith-Waterman type comparison 
is facilitated in which the frame of DNA translation to pro- 
tein sequence can change within the sequence. The transi- 
tion points in the translation frame are determined during 
the comparison process and a best match to potential pro- 
tein homologs is obtained with sections of translations 
from more than one frame. The algorithm can detect ho- 
mologies with a sensitivity equivalent to Smith-Waterman 
in the presence of 5% indel errors. 

Detection of Regulatory Regions: An initial Polymerase 
II promoter detection system has been implemented which 
combines individual detectors for TATA, CAAT, GC, cap, 
and translation start elements and distance information us- 
ing a neural network. This system finds about 67% of 
TATA containing promoters with a false positive rate of 
one per 35 kilobases. Additionally, systems to detect po-
tential polyA addition sites and CpG islands have been in-
corporated into GRAIL.

The GenQuest Sequence Comparison Server: The
genQuest server is an integrated sequence comparison
server which can be accessed via e-mail, using Unix sock-
ets from other applications, Netscape, and through a Motif
graphical client-server system. The basic purpose of the 
server system is to facilitate rapid and sensitive compari- 
son of DNA and protein sequences to existing DNA, pro- 
tein, and motif databases. Databases accessed by this sys- 
tem include the daily updated GSDB DNA sequence data- 
base, SwissProt, the dbEST expressed sequence tag data- 
base, protein motif libraries and motif analysis systems 
(Prosite, BLOCKS), a repetitive DNA library (from J.
Jurka), Genpept, and sequences in the PDB protein struc- 
tural database. These options can also be accessed from the 
XGRAIL graphical client tool. 

The genQuest server supports a variety of sequence query 
types. For searching protein databases, queries may be sent 
as amino acid or DNA sequence. DNA sequence can be 
translated in a user specified frame or in all 6 frames. 
DNA-DNA searches are also supported. User selectable 
methods for comparison include the Smith-Waterman dy- 
namic programming algorithm, FastA, versions of BLAST, 
and the IBM dFLASH protein sequence comparison algo- 
rithm. A variety of options for search can be specified in- 
cluding gap penalties and option switches for 
Smith-Waterman, FastA, and BLAST, the number of align-
ments and scores to be reported, desired target databases 
for query, choice of PAM and Blosum matrices, and an 
option for masking out repetitive elements. Multiple target 
databases can be accessed within a single query. 
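
Translation of a DNA query in a chosen frame or in all six frames, as mentioned above, can be sketched in Python as follows. This is a minimal illustration using the standard genetic code, not the genQuest server's code; the function names and the frame labels (`+1` to `+3`, `-1` to `-3`) are assumptions made for the example.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return {frame_label: protein} for the three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames
```

Each of the six translations could then be compared against a protein database in the same way as an amino acid query.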

Additional Interfaces and Access: Batch GRAIL 1.3 is a 
new "batch" GRAIL client that allows users to analyze groups
of short (300-400 bp) sequences for coding character and 
automates a wide choice of database searches for homol- 
ogy and motifs. A Command Line Sockets Client has been 
constructed which allows remote programs to call all the 
basic analysis services provided by the GRAIL-genQuest 
system without the need to use the XGRAIL interface. 
This allows convenient integration of selected GRAIL 
analyses into automated analysis pipelines being con- 
structed at some genome centers. An XGRAIL Motif 
Graphical Client for the GRAIL release 1.3 has been con- 
structed using Motif with versions for a wide variety of 
UNIX platforms including Sun, Dec, and SGI. The e-mail
version of GRAIL can be accessed at and
the e-mail version of genQuest can be accessed at
Instructions can be obtained by sending the
word "help" to either address. The Motif or Sun versions
of XGRAIL, batch GRAIL, and XgenQuest client software
are available by anonymous ftp from
( Both GRAIL and genQuest are accessible
over the World Wide Web (URL
Communications with the GRAIL staff should be ad-
dressed to

DOE Contract No. DE-AC05-840R21400. 

Informatics Support for Mapping 
in Mouse-Human Homology Regions 

Edward Uberbacher, Richard Mural,' Manesh Shah,
Loren Hauser,' and Sergey Petrov
Computer Science and Mathematics Division and 'Biology
Division; Oak Ridge National Laboratory; Oak Ridge, TN
423/574-6134, Fax: -7860,

The purpose of this project is to develop databases and 
tools for the Oak Ridge National Laboratory (ORNL) 
Mouse-Human Mapping Project, including the construc- 
tion of a mapping database for the project; tools for man- 
aging and archiving cDNAs and other probes used in the 
laboratory; and analysis tools for mapping, interspecific 
backcross, and other needs. Our initial effort involved in- 
stalling and developing a relational SYBASE database for 
tracking samples and probes, experimental results, and 
analyses. Recent work has focused on a corresponding 
ACeDB implementation containing mouse mapping data 
and providing numerous graphical views of this data. The 
initial relational database was constructed with SYBASE 
using a schema modeled on one implemented at the 
Lawrence Livermore National Laboratory (LLNL) center; 
this was because of documentation available for the LLNL 
system and the opportunity to maximize compatibility with 
human chromosome 19 mapping. (Major homologies exist
between human chromosome 19 and mouse chromosome 
7, the initial focus of the ORNL work.) 

Our ACeDB implementation was modeled, with some
modification, on the Lawrence Berkeley National
Laboratory (LBNL) chromosome 21 ACeDB system and 
designed to contain genetic and physical mouse map data 
as well as homologous human chromosome data. The use- 
fulness of exchanging map information with LLNL (hu- 
man chromosome 19) and potentially with other centers 
has led to the implementation of procedures for data export 
and the import of human mapping data into ORNL data- 

User access to the system is being provided by workstation 
forms-based data entry and ACeDB graphical data brows- 
ing. We have also implemented the LLNL database 
browser to view human chromosome 19 data maintained at 
LLNL, and arrangements are being made to incorporate 
mouse mapping information into the browser. Other appli- 
cations such as the Encyclopedia of the Mouse, specific 
tools for archiving and tracking cDNAs and other mapping 
probes, and analysis of interspecific backcross data and 
YAC restriction mapping have been implemented. 

We would like to acknowledge use of ideas from the 
LLNL and LBNL Human Genome Centers. 

DOE Contract No. DE-AC05-840R21400. 

SubmitData: Data Submission 
to Public Genomic Databases 

Manfred D. Zorn

Software Technologies and Applications Group; 
Information and Computing Sciences Division; Lawrence 
Berkeley National Laboratory; University of California; 
Berkeley CA 94720 

510/486-5041, Fax: -4004, 
http://www-hgc. html 

Making information generated by the various genome 
projects available to the community is very important for 
the researcher submitting data and for the overall project to 
justify the expenses and resources. Public genome data- 
bases generally provide a protocol that defines the required 
data formats and details how they accept data, e.g., se- 
quences, mapping information. These protocols have to 
strike a balance between ease of use for the user and op- 
erational considerations of the database provider, but are in 
most cases rather complex and subject to change to accom- 
modate modifications in the database. 

SubmitData is a user interface that formats data for sub- 
mission to GSDB or GDB. The user interface serves data 
entry purposes, checking each field for data types, allowed 
ranges and controlled values, and gives the user feedback 
on any problems. Besides one-time submissions, templates 
can be created that can later be merged with 
TAB-delimited data files, e.g., as produced by common 
spreadsheet programs. Variables in the template are then 
replaced by values in defined columns of the input data 
file. Thus submitting large amounts of related data be- 
comes as easy as selecting a format and supplying an input 
filename. This allows easy integration of data submission 
into the data generation process. 
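
The template merge described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SubmitData itself: the `$name` variable syntax, the field names, and the function `merge_template` are hypothetical and do not reflect the actual GSDB or GDB submission formats.

```python
import csv
import io
from string import Template


def merge_template(template_text, tsv_text):
    """Fill the template once per data row.

    The first row of the TAB-delimited text is assumed to name the columns,
    and each $variable in the template is replaced by the matching column.
    """
    template = Template(template_text)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [template.substitute(row) for row in reader]


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    template_text = "locus: $locus\nchromosome: $chromosome\n"
    tsv_text = "locus\tchromosome\nEXAMPLE1\t19\nEXAMPLE2\t7\n"
    for record in merge_template(template_text, tsv_text):
        print(record)
```

A row that lacks a column named in the template raises `KeyError`, which gives a simple form of the field checking mentioned above.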

The interface is generated directly from the protocol speci- 
fications. A specific parser/compiler interprets the protocol 
definitions and creates internal objects that form the basis 
of the user interface. Thus a working user interface, i.e., 
static layout of buttons and fields, data validation, is auto- 
matically generated from the protocol definitions. Protocol 
modifications are propagated by simply regenerating the 

The program has been developed using ParcPlace 
VisualWorks and currently supports GSDB, GDB and
RHdb data submissions. The program has been updated to 
use VisualWorks 2.0. 

DOE Contract No. DE-AC03-76SF00098. 

Ethical, Legal, and Social Issues 

The Human Genome: Science and 
the Social Consequences; Interactive 
Exhibits and Programs on Genetics 
and the Human Genome 

Charles C. Carlson

The Exploratorium; San Francisco, CA 94123 
415/561-0319, Fax: -0307; 

From April through September 1995, the Exploratorium 
mounted a special exhibition called Diving into the Gene 
Pool consisting of 26 interactive exhibits developed over 
the course of three years. The exhibits introduce the science 
of genetics and increase public awareness of the Human 
Genome Project and its implications for society. Founded 
on the success of exhibits developed for the 1992 genetics
and biotechnology symposium "Winding Your Way Through 
DNA" (co-hosted with the University of California, San 
Francisco), the 1995 exhibition aimed to create an engag- 
ing and accessible presentation of specific information 
about genetic science and our understanding of the struc- 
ture and function of the human genome, genetic technol- 
ogy, and ethical issues surrounding current genetic science. 

In addition to creating a unique collection of exhibits, the 
project developed a range of supplemental public program- 
ming to provide a public forum for discussion and interac-
tion about genetics and bioethics. A lecture series entitled
"Bioethics and the Human Genome Project" featured such
key thinkers as Mary-Claire King, Leroy Hood, David
Martin, Troy Duster, Michael Yesley, William Atchley, and 
Joan Hamilton (among others). A weekend event program 
focused on biodiversity in animal and plant life with 
events such as "Seedy Science," "Blooming Genes," and 
"Dog Diversity." A Biotech Weekend offered access to 
new technologies through demonstrations by local biotech 
firms and genetic counselors. And a specially-commis- 
sioned theatre piece, "Dog Tails," provided an instructive
and comic look for kids into the foundations of genetics 
and issues of diversity. 

In the 5-month exhibition period, approximately 300,000 
visitors had the opportunity to visit the exhibition, and 
well over 5,000 participated in the special programming. 
Following the exhibition's close, the new exhibits will be- 
come a permanent part of the Exploratorium's collection 
of over 650 interactive exhibits. 

Additional funding for 1995-96 will support formal outside 
evaluation of the effectiveness of the exhibits, and support 
exhibit remediation based on the evaluation findings. This 
activity will both strengthen the Exploratorium's permanent 
collection of genetics exhibits and help to develop a feasi- 
bility study for a travelling version of the genetics exhibi- 
tion for other museums around the country and the world. 

DOE Grant No. DE-FG03-93ER61583. 

Documentary Series for Public 

Graham Chedd and Noel Schwerin
Chedd-Angier Production Company; Watertown, MA
617/926-8300, Fax: -2710

Designed as a 4-hour documentary series for Public
Broadcasting, Genetics in Society (working title) will ex- 
plore the ethical, legal, and social implications of genetic 
technology. Currently funded and in production for a 90- 
minute special (Testing Family Ties), the first program pro- 
files several individuals and families as they confront ge- 
netic tests and the information they generate. One high- 
risk cancer family struggles to make sense of their genetic 
legacy as it debates prophylactic surgery and whether or 
not to test for BRCA1 and BRCA2. In a family without that
family risk, news of the Ashkenazi BRCA1 finding pushes
an anxious Jewish woman to demand testing for herself
and her young daughter. In another, a woman chooses to
carry to term her prenatally diagnosed Cystic Fibrosis 
twins, despite social and personal pressures. In a third, a 
scientist researching the so-called "obesity gene" at a 
biotech company debates the proper "marketing" of his 
research and confronts the larger questions it raises about 
what should be considered "normal" and what constitutes 
therapy vs enhancement. 

Testing Family Ties will explore not only what genetic 
technology does — in testing, drug development, and po- 
tential therapy — but what it means to our sense of self, 
family, and future and to our concepts of health and nor- 

Depending on outstanding funding requests, Genetics in
Society will be broadcast in the Fall of 1996 or the Winter 
of 1997 on PBS. Noel Schwerin is Producer/Director. Gra- 
ham Chedd is Executive Producer. 

DOE Grant No. DE-FG06-95ER61995.

Human Genome Teacher Networking 

Debra L. Collins and R. Neil Schimke
Genetics Education Center; Division of Endocrinology and
Genetics; University of Kansas Medical Center; Kansas
City, KS 66160-7318
913/588-6043, Fax: ^060,

This project links over 150 middle and secondary teachers 
from throughout the United States with genetic and public 
policy professionals, as well as families who are knowl- 
edgeable about the ethical, legal, and social implications
(ELSI) of the Human Genome Project. Teachers network
with peers and professionals, and acquire new sources of 
information during four phases: 1) the first one-week sum-
mer workshop to update teachers on human genetics con- 
cepts and new sources for classroom curricula including 
online resources; 2) classroom use of new materials and 
information; 3) the second one-week summer workshop 
where teachers return to exchange successful teaching 
ideas and plan peer teaching sessions and mentor network- 
ing; 4) dissemination of genetic information through 
in-services and workshops for colleagues; and collabora- 
tion with genetic professionals participating in our Mentor
Network.

The applications of Human Genome Project technology 
are emphasized. Individuals who have contact and experi- 
ence with patients, including clinical geneticists, genetic 
counselors, attorneys, laboratory geneticists and families,
take part in didactic sessions with teachers. Throughout the 
workshop, family panels provide an opportunity for par- 
ticipants to compare their textbook-based knowledge of 
genetic conditions with the personal experiences of fami- 
lies who discuss their condition, including: diagnosis, 
treatment, genetic risk, decisions, insurance, employment, 
family planning, and confidentiality. 

Because of this project, teachers feel more prepared and 
confident teaching about human genetics, the Human Ge- 
nome Project and ELSI topics. The teachers are effective 
in disseminating knowledge of genetics to their students 
who show a significant increase in human genome knowl- 
edge compared to students whose teachers have not par- 
ticipated in this project.

Teacher dissemination activities extend the project beyond 
participation at summer workshops. To date, 55 workshop 
participants have completed all four project phases by or- 
ganizing more than 200 local, regional, and national 
teacher education programs to disseminate knowledge and 
resources. More than 1500 colleagues and members of the general pub-
lic have participated in teacher workshops, and over
56,000 students have been reached through project partici- 
pants and their peers. 

The project participants organize interdisciplinary peer 
teaching sessions including bioethical decision making 
sessions combining debate and biology classes; sessions 
for social studies teachers; human genetics and 
multi-cultural collaborations; cooperative learning activi- 
ties; and curricular development sessions. Students were 
involved in sessions on ethics, politics, economics and law. 
Teachers organize bioethics curriculum writing sessions, 
laboratory activities using electrophoresis as well as other 
biotechnology, and sessions on genetic databases. 

A World Wide Web home page for Genetics Education as- 
sists teachers in remaining current on genetic information 
and helps them find answers to student inquiries. The
home page has links to numerous genome sites, sources of
information on genetic conditions, networking opportuni- 
ties with other genetics education programs, teaching re- 
sources, lesson plan ideas, and the Mentor Network of ge- 
netic professionals and a network of family support groups 
willing to work with teachers and their students. 

DOE Grant No. DE-FG02-92ER61392.

Human Genome Education Program 

Lane Conn 

Human Genome Education Program; Stanford Human 
Genome Center; Palo Alto, CA 94304 
415/812-2003, Fax: -1916,

The Human Genome Education Program (HGEP) operates 
within the Stanford Human Genome Center. It is a collabo- 
rative effort among HGEP staff, Genome Center scientists,
collaborating staff from other education programs, experi-
enced high school teachers, and an Advisory Panel in the 
fields of science, education, social science, assessment 
and ethics. 

The Human Genome Project will have a profound impact 
on society with its applications in testing for and improv- 
ing treatment of genetic and the many uses of 
DNA profiling. The goal of HGEP is to help prepare high 
school students and community members to be able to 
make educated decisions on the personal, ethical, social 
and policy questions raised by the application of genome 
information and technology in their lives. 

The primary objectives for HGEP are to (1) develop a hu- 
man genome curriculum for high school science and (2) 
provide education outreach to schools and community groups in
the San Francisco Bay Area. To achieve Objective 1, the 
HGEP is working to develop, field test and prepare for 
national dissemination two laboratory-based curriculum
units for high school students. Unit 1, "Dealing With Ge- 
netic Disorders," explores the variety of treatment options 
potentially available for a genetic disorder, including gene 
therapy. Unit 2, "DNA Snapshots, Peeking at Your DNA," 
explores human relatedness through examining the
student's own DNA polymorphisms using PCR. 

Each unit is centered around a societal or ethical problem 
raised by these important applications of genome informa- 
tion and technology. Students use modeling exercises and 
inquiry laboratory experiments to learn about the science 
behind a given application. Students then combine the sci- 
ence they have learned with other relevant information to 
choose a solution to the societal/ethical problem posed in 
the unit. As a culminating activity, the students work in 
groups to present and defend their solution. 


To achieve Objective 2, the HGEP provides Genome Cen- 
ter tours for teacher, student and community groups that 
involve pre-tour lectures; tour exploration of genome map- 
ping, sequencing and informatics; and post-tour lecture
and discussion on genome applications, and their social 
and ethical implications. Also, the education program con- 
tinues to work to establish and sustain local science educa- 
tion partnerships among schools, industry, universities and 
national laboratories. 

DOE Grant No. DE-FG03-96ER62161. 

Your World/Our World-Biotechnology & 
You: Special Issue on the Human 
Genome Project 

Jeff Davidson and Laurence Weinberger

Pennsylvania Biotechnology Association; State College,
PA 16801
814/238-4080, Fax: -4081,

Your World/Our World is a biotechnology science maga- 
zine published semi-annually by the non-profit Pennsylva- 
nia Biotechnology Association (PBA) describing for sev- 
enth to tenth grade students the excitement and achieve- 
ments of contemporary biotechnology. This is the only 
continuing source of biotechnology education specifically 
directed to this age group - an age at which students too 
frequently are turned off from science. The special Spring
1996 issue will be devoted to the presentation of the sci- 
ence behind the HGP, the HGP itself, and the ethical, legal, 
and social issues generated by the project. The strong em- 
phasis on attractive graphic presentation and age appropri- 
ate text that have been the hallmark of the earlier issues, 
which have been highly acclaimed and well received by 
the educational, scientific, and business community, will
be continued. 

PBA believes that increased educational opportunities to 
learn about biotechnology are most effective if presented 
at the seventh to tenth grade levels for the following
reasons:

• Full semester life science and biology classes often 
occur for the first time in these grades; 

• Across the nation, textbooks are typically 10 to 14 
years old, and even the most recent textbooks are 
quickly dated by the rapid development in the biologi- 
cal sciences; 

• Curricula at this level are more flexible than high 
school curricula, allowing the addition of information 
about exciting biological developments; and 

• Science at this level is generally not elective, and, 
therefore, a very comprehensive student population is 
addressed rather than the more selective populations 
available later in the educational program. 


In creating Your World/Our World, the PBA defined the 
following educational goals to guide the development of 
the magazine: 

• Contribute to general science literacy and an educated 

• Contribute to biological and technological literacy; 

• Motivate students to pursue additional science study 
and careers in science, particularly among women and 
minority populations. 

PBA recognizes that it has been a point of pride that 
biotechnologists have been uniquely concerned with the 
impact of their technology on society and have been the 
first to raise and encourage responsible public debate with- 
out being forced to do so by others. To do less now for the 
children would be a breach of this responsible history. Ac- 
cordingly, this special HGP issue will address the ethical, 
legal, and social issues raised by the new genomic tech- 
nologies. Special ethics advisors have been recruited to aid 
in the development of these aspects. 

A complimentary copy of the special issue and its teachers' 
guide will be mailed to every public and private school 
seventh to tenth grade science teacher (approximately 
40,000) in the United States. A cover announcement will 
explain the origin and development of the magazine and of 
the special edition. Teachers will be invited to purchase
full classroom packets (30 copies & teacher's guide) from 
the PBA, but, if they are not able to afford the packets, 
they will be asked to respond by postcard indicating their 
interest. The cost of the packets will probably be in the $20
range. The PBA is actively seeking additional support so 
that the issue may be distributed for free or at a reduced 
cost. In addition, parts of the special issue will be available 
over the Internet via a World Wide Web Page. 

PBA believes this is a unique opportunity to educate 
America's youth about the HGP and ensure that accurate
non-sensational information will be made available to our 
country's children. 

DOE Grant No. DE-FG02-95ER62107.

The Human Genome Project and 
Mental Retardation: An Educational 

Sharon Davis 

Department of Research and Program Services; The Arc 
of the United States; Arlington, TX 76010 
817/261-6003, Fax: /277-3491,

The Arc of the United States, a national organization on 
mental retardation, with 140,000 members and more than 
1000 affiliated chapters, proposes to educate its general
membership and volunteer leaders about the Human Ge-
nome Project as it relates to mental retardation. A large 
number of identified causes of mental retardation are ge- 
netic, and many family members of The Arc deal with is- 
sues related to a genetic condition on a daily basis. We be- 
lieve it is critical for our members and leaders to be edu- 
cated about the scientific and ethical, legal and social as- 
pects of the HGP, so that the association can evaluate and 
discuss the issues and develop positions based on adequate 

The major objectives of the proposed three-year project 
are to develop and disseminate educational materials for 
members/leaders of The Arc to inform them about the Hu- 
man Genome Project and mental retardation and to con- 
duct training on the scientific and ethical, legal and social 
aspects of the Human Genome Project and mental retarda- 
tion using The Arc's existing training vehicles. 

The Arc will develop and disseminate educational materi- 
als oriented toward families and conduct training at its na- 
tional and state conventions, local chapter meetings and at 
board of directors' meetings. The American Association of
University Affiliated Programs for Persons with Develop- 
mental Disabilities (AAUAP) will assist with the project 
by providing needed expertise. The AAUAP membership 
includes university faculty who are experts on the genetic 
causes of mental retardation and on related ethical, legal 
and social issues. An advisory panel of university scientists 
and leaders of The Arc will guide the project. 

DOE Grant No. DE-FG03-96ER62162.

Pathways to Genetic Screening: 
Molecular Genetics Meets the High- 
Risk Family 

Troy Duster and Diane Beeson'
Institute for the Study of Social Change; University of
California; Berkeley, CA 94705
510/642-0813, Fax: /8674,
'Department of Sociology; California State University;
Hayward, CA 94542

The proliferation of genetic screening and testing is requir- 
ing increasing numbers of Americans to integrate genetic 
knowledge and interventions into their family life and per- 
sonal experience. This study examines the social processes 
that occur as families at risk for two of the most common 
autosomal recessive diseases, sickle cell disease (SC) and 
cystic fibrosis (CF), encounter genetic testing. Since each 
of these diseases is found primarily in a different ethnic/ 
racial group (CF in European Americans and SC in African
Americans), this research will clarify the role of culture in 
integrating genetic testing into family life and reproductive 
planning. A third type of genetic disorder, the
thalassemias, has recently been added to our sample in or-
der to extend our comparative frame to include other eth- 
nic and racial groups. In California, the thalassemias pri- 
marily affect Southeast Asian immigrants, although an- 
other risk group is from the Mediterranean region. 
Thalassemias, like cystic fibrosis and sickle cell disease, 
have a similar pattern of inheritance and raise similarly 
serious bio-medical challenges and issues of information 

Data are drawn from interviews with members of families 
in which a gene for CF, SC or thalassemia has been identi- 
fied. Data collection consists primarily of focused inter- 
views with approximately 400 individuals from families in 
which at least one member has been identified as having a 
genetic disorder (or trait). In the most recent phase of the 
research, we are conducting focus groups selected to 
achieve stratified homogeneity around key social dimen- 
sions such as gender and relationship to disease. This is 
clarifying the social processes that facilitate and inhibit 
genetic testing. 

We are currently assessing the concerns expressed by re- 
spondents about the potential uses of genetic information. 
We find strong patterns of concern, often based on per- 
sonal experience, that genetic information may be used in 
ways that family members perceive as dangerous and/or 
discriminatory. First among these concerns is fear of losing 
access to health care. Additional concerns include fear of 
genetic discrimination in employment and other types of 
insurance, particularly life insurance. Similar patterns of 
concern exist among members of each ethnic group, and 
are frequently the focus of attention among family mem- 
bers, but take somewhat different form within each cul- 
tural group. These concerns constitute a growing obstacle 
to widespread use of genetic testing. 

DOE Grant No. DE-FG03-92ER61393. 

Intellectual Property Issues in 

Rebecca S. Eisenberg 

University of Michigan Law School; Ann Arbor, MI 48109
313/763-1372, Fax: -9375, 

Intellectual property issues have been uncommonly salient 
in the recent history of advances in genomics. Beginning 
with the filing of patent applications by NIH on the first
batch of expressed sequence tags (ESTs) from the labora- 
tory of Dr. Craig Venter, each new development has been 
met with speculation about its strategic significance from 
an intellectual property perspective. Are ESTs of unknown 
function patentable, or is further work necessary before 
they satisfy patent law standards? Will patents on such 
fragments promote commercial investment in product de- 
velopment, or will they interfere with scientific communi- 


DOE Human Genome Program Report, Part 2, 1996 Research Abstracts 



cation and collaboration and retard the overall research 
effort? Without patent rights, how may the owners of pri- 
vate cDNA sequence databases earn a return on their in- 
vestment while still permitting other investigators to obtain 
access to the information on reasonable terms? What are 
the rights of those who contribute resources such as cDNA 
libraries that are used to create the databases, and of those 
who identify sequences of interest out of the morass of 
information in the databases by formulating appropriate 
queries? Will the disclosure of ESTs in the public domain 
preclude patenting of subsequently characterized 
full-length genes and gene products? And why would a 
commercial firm invest its own resources in generating an 
EST database for the public domain? 

Two factors have contributed to the fascination with intel- 
lectual property in this setting. First is a perception that 
some pioneers in genomics have sought to claim intellec- 
tual property rights that reach beyond their actual achieve- 
ments to cover future discoveries yet to be made by others. 
For example, the controversial NIH patent applications 
claimed rights not only in the ESTs that were actually set 
forth in the specifications, but also in the full-length 
cDNAs that might be obtained by using the ESTs as 
probes, as well as in other, undisclosed fragments of those 
genes. More recently, private owners of cDNA sequence 
databases have set as a condition for access agreement to 
offer the database owners licenses to any resulting intellec- 
tual property. These efforts to claim rights to the future 
discoveries of others raise issues about the fairness and 
efficiency of the law in allocating rewards and incentives 
along the path of cumulative innovation. 

Second is the counterintuitive alignment of interests in the 
debate. It was a public institution, NIH, that initially fa- 
vored patenting discoveries that some representatives of 
industry thought should remain unpatented, and it was a 
major pharmaceutical firm, Merck & Co., that ultimately
took upon itself the quasi-governmental function of spon- 
soring a university-based effort to place comparable infor- 
mation in the public domain. These topsy-turvy positions 
in the public and private sectors raise intriguing questions 
about the proper roles of government and industry in 
genomics research, and about who stands to benefit (and 
who stands to lose) from the private appropriation of ge- 
nomic information. 

DOE Grant No. DE-FG02-94ER61792. 

AAAS Congressional Fellowship 

Stephen Goodman 

The American Society of Human Genetics; Bethesda, MD 


301/571-1825, Fax: /530-7079, 

Few individuals in the genetics community are conversant 
with federal mechanisms for developing and implementing 
policy on human genetics research. In 1995 the American
Society of Human Genetics (ASHG), in conjunction with
DOE, initiated an American Association for the Advancement
of Science (AAAS) Congressional Fellowship Program
to strengthen the dialogue between the professional
genetics community and federal policymakers. The fellow- 
ship will allow genetics professionals to spend a year as 
special legislative assistants on the staff of members of 
Congress or on congressional committees. Directed toward 
productive scientists, the program is intended to attract 
independent investigators. 

In addition to educating the scientific community about the 
public policy process, the fellowship is expected to dem- 
onstrate the value of science-government interactions and 
make practical contributions to the effective use of scien- 
tific and technical knowledge in government. The program 
includes an orientation to legislative and executive opera- 
tions and a year-long weekly seminar on issues involving 
science and public policy. 

Unlike similar government programs, this fellowship is 
aimed primarily at scientists outside government. It em- 
phasizes policy-oriented public service rather than obser- 
vational learning and designates its fellows as free agents 
rather than representatives of their sponsoring societies. 

One of the goals of DOE and ASHG is to develop a group 
of nongovernmental professionals who will be equipped to 
deal with issues concerning human genetics policy devel- 
opment and implementation, particularly in the current 
environment of health-care reform and managed care. 
Graduates of this program will serve as a resource for consultation
in the development of public-health policy concerning
genetic disease.

Fellowship candidates must demonstrate exceptional basic 
understanding of and competence in human genetics; hold 
an earned degree in genetics, biology, life sciences, or a 
similar field; have a well-grounded and appropriately 
documented scientific and technical background; have a 
broad professional background in the practice of human 
genetics as demonstrated by national or international repu- 
tation; be cognizant of related nonscientific matters that 
impact on human genetics; exhibit sensitivity toward po- 
litical and social issues; have a strong interest and some 
experience in applying personal knowledge toward the
solution of social problems; be a member of ASHG; be 
articulate, literate, adaptable, and interested in working on 
long-range public policy problems; be able to work with a 
variety of people of diverse professional backgrounds; and 
function well during periods of intense pressure. 

The first fellow is working in the office of Senator 
Wellstone, Democrat from Minnesota, and devoting most
of his time to studying and commenting on health-care and 
science issues. 

DOE Grant No. DE-FG02-95ER61974. 

A Hispanic Educational Program for 
Scientific, Ethical, Legal, and Social 
Aspects of the Human Genome Project 

Margaret C. Jefferson and Mary Ann Sesma¹

Department of Biology and Microbiology; California State
University; Los Angeles, CA 90032
213/343-2059, Fax: -2095,

¹Los Angeles Unified School District

The primary objectives of this grant are to develop, imple- 
ment, and distribute culturally competent, linguistically 
appropriate, and relevant curriculum that leads to Hispanic 
student and family interactions regarding the science, ethi- 
cal, legal, and social issues of the Human Genome Project. 
By opening up channels of familial dialogue between par- 
ents and their high school students, entire families can be 
exposed to genetic health and educational information and 
opportunities. In addition, greater interaction is anticipated 
between students and teachers, and parents and teachers. 
In the Los Angeles Unified School District alone, over 
65% of the approximately 850,000 student enrollment are 
bilingual Hispanics. The 1990 census data revealed that 
the U.S.A. had a total population of 248,709,873, of which 
22,354,059 were Hispanics, and thus, there is a need for 
materials to be disseminated throughout the U.S.A. that are 
relevant and understandable to this population. 

Student curriculum consists of BSCS HGP-ELSI curricu- 
lum available in both English and Spanish; supplemental 
lesson plans developed and utilized by high school teach- 
ers in predominantly Hispanic classrooms that will be 
available via the World Wide Web; student-developed sur- 
veys that ascertain knowledge and perceptions of genetics 
and HGP-ELSI in Hispanic and other ethnic communities 
in the greater Los Angeles area; the University of Wash- 
ington High School Human Genome Program exercises on 
DNA synthesis and sequencing; and career ladders and 
opportunities in genetics. The supplemental lesson plans 
are focused on four major units: the Cell; Mendelian Genetics
and its Extensions; Molecular Genetics; and the Human
Genome Project and ELSI. The concise concepts underlying
each unit are being utilized in two ways: (a) first,
the student activities emphasize logical, problem-solving
exercises; tools or technologies applicable to that concept;
when and where appropriate, a focus on the Hispanic
population; and an understanding of the problems and
compassion for the families associated with learning of
genetic diseases; (b) second, the concepts serve as the
springboard for the topics that the students include in sci- 
ence newsletters to their parents. In addition to on-campus 
activities, we intend to arrange field trips and/or classroom 
demonstrations of genetic and molecular biology techniques 
by scientists and other experts. The speakers would also be 
asked to discuss career opportunities and the educational 
requirements needed to enter the specific careers presented. 

The parent curriculum consists of two major activities. 
First, the student-parent newsletter is designed to draw the
parents into the curriculum. Students write newsletters on 
a biweekly basis. Each newsletter relates to a student cur- 
riculum subunit and the specific subunit concepts. English, 
Spanish, social science as well as biology and chemistry 
teachers assist the students in its production. The other major
activity that involves the parents is the parent focus
groups. Parents from each participating school are invited 
to monthly focus groups at their specific campus. The fo- 
cus groups discuss issues related to genetics and health, 
legal and social issues as well as science issues that stem 
from the student newsletters. The discussions are in both 
English and Spanish with translators available. Links with 
other programs have been established. 

DOE Grant No. DE-FG03-94ER61797. 

Implications of the Geneticization of
Health Care for Primary Care 

Mary B. Mahowald, John Lantos, Mira Lessick, Robert

Moss, Lainie Friedman Ross, Greg Sachs, and Marion Verp 

Department of Obstetrics and Gynecology and MacLean 

Center for Clinical Medical Ethics; University of Chicago; 

Chicago, IL 60637 

312/702-9300, Fax: -0840, 

"Geneticization" refers to the process by which advances 
in genetic research are increasingly applicable to all areas 
of health care.¹ Studies show that primary caregivers are
often deficient in their knowledge of genetics and genetic 
tests, and the ethical, legal, and social implications of this 
knowledge." Accordingly, this project prepares primary 
caregivers who have no special training in genetics or ge- 
netic counseling to deal with the implications of the Hu- 
man Genome Project for their practice. 

Phase I (fall 1995): Generic topics will be addressed by PI 
and Co-PIs with Robert Wood Johnson clinical scholars 
and clinical ethics fellows, led by visiting or internal experts. 


Topics: Goals, Methods, & Achievements of the HGP; Typology
of Genetic Conditions; Scientific, Clinical, Ethical,
and Legal Aspects of Gene Therapy; Concepts of Disease;
Genetic Disabilities; Gender and Socio-economic Differences;
Cultural and Ethnic Differences; Directive or Nondirective
genetic counseling.

Speakers: Jeff Leiden; Julie Palmer; Dan Brock; Anita Silvers;
Abby Lippman; James Bowman; Beth Fine

Phase II (Jan.-Mar. 1996): Teams of individuals, all
trained in the same area of primary care, will identify and 
address issues specific to their area, developing course out- 
lines, bibliography, and methodology based on grand 
rounds given by national expert. 

Primary Care Area 

Pediatrics: Genetics expert: Stephen Friend; Ethics expert: Lainie F. Ross + fellow
Obstetrics/Gynecology: Genetics expert: Joe Leigh Simpson; Ethics expert: Marion Verp + fellow
Medicine: Genetics expert: Tom Caskey; Ethics expert: Greg Sachs + fellow
Family medicine: Genetics expert: Noralane Lindor; Ethics expert: Robert Moss + fellow
Nursing: Genetics expert: Mira Lessick; Ethics expert: Colleen Scanlon + fellow

Phase III (Apr.-May 1996): Policy issues will be identi- 
fied and addressed as above for all areas of primary care, 
based on grand rounds given by national expert. 

Policy team: Genetics expert: Sherman Elias; Ethics expert:
John Lantos + trainee

Phase IV (Oct.-Dec. 1996): Presentation of content developed
to new group of fellows and scholars by each of the
above teams, followed by evaluation & revision. 

Phase V (spring 1997): NATIONAL CONFERENCE and 
CME/CNE WORKSHOPS for primary caregivers, key- 
noted by Victor McKusick. 

DOE Grant No. DE-FG02-95ER61990. 


¹Lippman, A., Prenatal genetic testing and screening, Amer J Law & Med XVII, 15-50 (1991).
²Hofman, K.J., Tambor, E.S., Chase, G.A., Geller, G., Faden, R.R., and Holtzman, N.A., Physicians' knowledge of genetics and genetic tests, Acad Med 68, 625-32 (1993).
³Holtzman, N.A., The paradoxical effect of medical training, J Clin Ethics 2, 241-42 (1992).
⁴Forsman, I., Education of nurses in genetics, Amer J of Hum Genetics
⁵Williams, J.D., Pediatric nurse practitioners' knowledge of genetic disease, Ped Nursing 9, 119-21 (1983).
⁶George, J.B., Genetics: Challenges for nursing education, J Ped Nursing


Nontraditional Inheritance: Genetics 
and the Nature of Science; Instructional 
Materials for High School Biology 

Joseph D. McInerney and B. Ellen Friedman

Biological Sciences Curriculum Study; Colorado Springs, 

CO 80918 

719/531-5550, Fax: -9104,

There often is a gap between the public's and scientists' 
views of new research findings, particularly if the public's 
understanding of the nature of science is not sound. Large 
quantities of new evidence and consequent changes in sci- 
entific explanations, such as those associated with the Hu- 
man Genome Project and related genetics research, can 
accentuate those different views. Yet an appealing second- 
ary effect of the unusually fast acquisition of data is that 
our view of genetics is changing rapidly during a brief 
time period, a relatively recent phenomenon in the field of 
biological sciences. This situation provides an outstanding 
opportunity to communicate the nature and methods of 
science to teachers and students, and indirectly to the pub- 
lic at large. The immediacy of new explanations of genetic 
mechanisms lets nontechnical audiences actually experience
a changing view of various aspects of genetics, and in
so doing, gain an appreciation of the nature of science that 
rarely is felt outside of the research laboratory. 

The Biological Sciences Curriculum Study (BSCS) is de- 
veloping a curriculum module that brings this active view 
of the nature and methods of science into the classroom 
via examples from recent discoveries in genetics. We will 
distribute this print module free of charge to interested 
high school biology teachers in the United States. 

The examples selected for classroom activities include the 
instability of trinucleotide repeats as an explanation of ge- 
netic anticipation in Huntington disease and myotonic dys- 
trophy, and the more widespread genetic mechanism of 
extranuclear inheritance, illustrated by mitochondrial in- 
heritance. Background materials for teachers discuss a 
wider range of phenomena that require nontraditional 
views of inheritance, including RNA editing, genomic im- 
printing, transposable elements, and uniparental disomy. 
The genetics topics in the module share the common characteristic
that they are not adequately explained by the traditional,
Mendelian concepts that are taught in introductory
biology at the high school level. In addition to updating
the genetics curriculum and communicating the nature
of science, the module devotes one activity to the ethical
and social aspects of new genetics discoveries by challenging
students to consider the current reluctance to test asymptomatic
minors for the presence of the HD gene.

The major challenge we have faced in this project is to 
make relatively technical genetics information accessible 
to high school teachers and students and to turn the often
passive treatment of scientific processes into an active ex- 
perience that helps students develop an understanding and 
appreciation of the nature and methods of science. The 
module is being field tested in classrooms across the coun- 
try. Evaluation data from the field test will guide final revi- 
sion of the module prior to distribution. 

DOE Grant No. DE-FG03-95ER61989. 

The Human Genome Project: Biology, 
Computers, and Privacy: Development 
of Educational Materials for High 
School Biology 

Joseph D. McInerney, Lynda B. Micikas, and B. Ellen Friedman


Biological Sciences Curriculum Study; Colorado Springs, 

CO 80918 

719/531-5550, Fax: -9104,

One of the challenges faced by the Human Genome 
Project (HGP) is to handle effectively the enormous quan- 
tities and types of data that emerge as a result of progress 
in the project. The informatics aspect of the HGP offers an 
excellent example of the interdependence of science and 
technology. In addition, the electronic storage of genomic
information raises important questions of ethics and public 
policy, many revolving around privacy. 

The Biological Sciences Curriculum Study (BSCS) ad- 
dresses the scientific, technological, ethical, and policy 
aspects of genome informatics in the instructional program 
titled The Human Genome Project: Biology, Computers, 
and Privacy. The program, intended for use in high school 
and college biology, consists of software and a 150-page 
print module. The software includes two model databases: 
a research database housing anonymous data (map data, 
sequence data, and biological/clinical information) and a 
registry that attaches names of 52 fictitious individuals 
(three kindreds) to genomic data. Students manipulate the 
database software as they work through seven classroom 
inquiries described in the print material. Also included are
50 pages of background material for teachers. 
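
The split between the anonymous research database and the named registry can be illustrated with a minimal sketch. It is not the BSCS software: the class names, field names, and the `allow_named_access` privacy flag are hypothetical, chosen only to show how names might be kept apart from genomic data and released only by choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchRecord:
    """Anonymous entry: a sample ID plus data, with no name attached."""
    sample_id: str
    map_locus: str        # hypothetical map-data field
    sequence: str         # hypothetical sequence-data field
    clinical_note: str    # hypothetical biological/clinical field


@dataclass
class RegistryEntry:
    """Named entry tying a fictitious individual to a sample ID."""
    name: str
    kindred: str
    sample_id: str
    allow_named_access: bool = False  # assumed privacy choice, off by default


class ResearchDatabase:
    """Holds anonymous records keyed by sample ID."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.sample_id] = record

    def get(self, sample_id):
        return self._records[sample_id]


class Registry:
    """Links names to research records, subject to each person's choice."""

    def __init__(self, research_db):
        self._db = research_db
        self._entries = {}

    def add(self, entry):
        self._entries[entry.name] = entry

    def lookup(self, name):
        entry = self._entries[name]
        if not entry.allow_named_access:
            raise PermissionError(f"{name} has not released named data")
        return self._db.get(entry.sample_id)
```

In this sketch a query by name fails until the fictitious individual's entry is marked as released, while the research database can always be queried by sample ID without exposing a name.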

An introductory activity lets students become familiar with 
the software and dramatically demonstrates the advantages 
of technology in analysis of sequence data. In activities 1 
and 2, students use the database to construct pedigrees and 
make initial choices about privacy with regard to genetic 
tests for their fictitious person. Activity 3 expands genetic 
anticipation, and in activities 4 and 5, students deal in 
depth with decision-making, ethics, and public policy, re- 
visiting their earlier decision about testing and data acces- 
sibility. A final extension activity shows how comparisons 
with genomic data can be used to test hypotheses about the 
biological relationships between individual humans and
about the evolutionary significance of DNA sequence 
similarities between different species. 

External reviews and evaluation data from a field test in- 
volving 1,000 students in schools across the United States 
were used to guide final revision of the materials. BSCS 
will distribute the module free of charge to more than 
10,000 high school and college biology teachers. 

DOE Grant No. DE-FG03-93ER61584.

Involvement of High School Students in 
Sequencing the Human Genome 

Maureen M. Munn, Maynard V. Olson, and Leroy Hood 

Department of Molecular Biotechnology, University of 

Washington; Seattle, WA 98195

206/616-4538, Fax: /685-7344, mmunn® 

For the past two years, we have been developing a pro- 
gram that involves high school students in the excitement 
of genetic research by enabling them to participate in se- 
quencing the human genome. This program provides high 
school teachers with the proper training, equipment, and 
support to lead their students through the exercise of sequencing
small portions of DNA. The participating classrooms
carry out two experimental modules, DNA synthesis
(an introduction to DNA replication and the techniques
used to study it) and DNA sequencing. Both of these experiments
consist of three parts: synthesizing DNA fragments
using Sequenase and a biotin-labeled primer, benchtop
electrophoresis using denaturing polyacrylamide gels,
and colorimetric DNA detection that is specific for the
biotinylated primer. Students analyze their sequencing data
and enter it into a DNA assembly program. This year, in 
collaboration with Eric Lynch and Mary-Claire King from 
the Department of Genetics at the University of Washington,
the students will be sequencing a region of chromosome
5q that may be involved in a form of hereditary deafness.

Students also consider the ethical, legal and social issues 
(ELSI) of genome research in a unit that explores the topic 
of presymptomatic testing for Huntington's disease (HD). 
This module was developed by Sharon Durfy and Robert 
Hansen from the Department of Medical History and Eth- 
ics at the University of Washington. It provides a scenario 
about a family that carries the HD allele, descriptions of 
the clinical and genetic aspects of the disorder, an exercise 
in drawing pedigrees and an autoradiograph showing the 
PCR assay used to detect HD. Students use an ethical 
decision-making model to decide whether, as a character 
from the scenario, they would be tested presymptomati- 
cally for the HD allele. Through this experience, they de- 
velop the skills to define ethical issues, ask and research 
the relevant questions about a particular topic and make 
justifiable ethical decisions. 

In the first two years of this program, our focus was on the 
development of robust, classroom friendly modules that 
can be presented in up to six classes at one time. This year 
we will focus on disseminating this program to local, re- 
gional, and national sites. During a week-long workshop in 
July, 1995, we trained an additional thirteen high school 
teachers, bringing our current number to twenty teachers at 
thirteen schools. We have recruited local scientists to act as 
mentors to each of the schools and provide classroom sup- 
port. On the regional level, four of our teachers are from 
outside the greater Seattle area and will be supported dur- 
ing the classroom experiments by scientists in their region. 
We have presented this program at national meetings and 
workshops, including the Human Genome Teacher Networking
Project Workshop in Kansas City, KS (June,
1995) and the meeting of the National Association of Biology
Teachers in Phoenix, AZ (October 1995). We have
also distributed our modules to teachers and scientists 
throughout the nation to encourage the development of 
similar programs. This year we will also develop and pilot 
a module using automated sequencing. This will enable 
distant schools to participate in the program by providing 
them with the option of sending their DNA samples to the 
UW genome center for electrophoresis.

While we hope the human genome sequencing experience
will interest some students in science careers, a broader 
goal is to encourage high school students to think con- 
structively and creatively about the implications of scien- 
tific findings so that the coming generation of adults will 
make judicious decisions affecting public policies. 

DOE Grant No. DE-FG03-96ER62175. 

The Gene Letter: A Newsletter on 
Ethical, Legal, and Social Issues in 
Genetics for Interested Professionals 
and Consumers 

Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt¹

The Shriver Center for Mental Retardation; Division of
Social Science, Ethics and Law; Waltham, MA 02254
617/642-0230, Fax: l%9^-5}A0,
¹Also at Massachusetts Department of Public Health, Boston, MA

We propose to develop a newsletter on ELSI-related issues 
for dissemination to a broad general audience of profes- 
sionals and consumers. No such focussed public newsletter 
currently exists. Entitled The Gene Letter, the newsletter 
will be distributed monthly on-line, through the Internet. 
Updated weekly on the Internet, it will be poised to react 
in a timely fashion to new developments in science, law, 
medicine, ethics, and culture. The newsletter does not propose
to provide comprehensive education in genetics for
the American public, but rather to begin an information
network that interested people can use for further information.
It will be the most widely-distributed newsletter on
ELSI genetics in the world, with the largest consumer 
readership. Features will be largely informational and will 
include new scientific/medical developments and attendant 
ELSI issues, new court decisions, legislation, and regula- 
tions, balanced responses to new concerns in the media, 
and new developments related to health that may be of in- 
terest to health care providers and consumers. Features 
will present balanced opinions. An editorial board will review
each issue, prior to publication, for cultural sensitivity,
emphasis, balance, and concerns of persons with disabilities.
The Gene Letter will also include factual information
on upcoming events, new ELSI research, where to
find genetics on the Internet, new publications (annotated),
and where to find further information about each feature.
Readers will be invited to send letters, queries, news, bibli- 
ography, comments, and consumer concerns either on The 
Gene Letter Internet chatroom or in hard copy. A hard 
copy of the first on-line issue will be used to assess readers'
needs and interests. It will be distributed to 500 community
college students representing blue-collar ethnic
groups, and to 2000 members of a broad general audience. 

A special evaluation of readers' knowledge and ethical/ 
social concerns raised by The Gene Letter will take place 
at the end of the second year in order to assess outcome. It 
is our intention that The Gene Letter become self-supporting
after two years.

DOE Grant No. DE-FG02-96ER62174.

The DNA Files: A Nationally 
Syndicated Series of Radio Programs 
on the Social Implications of Human 
Genome Research and Its Applications 

Bari Scott, Matt Binder, and Jude Thilman 

Genome Radio Project; KPFA-FM; Berkeley, CA 94704 

510/848-6767 ext 235, Fax: /883-0311, 

The DNA Files is a series of nationally distributed public
radio programs furthering public education on develop- 
ments in genetic science. Program content is guided by a 
distinguished body of advisors and will include the voices 
of prominent genetic researchers, people affected by ad- 
vances in the clinical application of genetic medicine, 
members of the biotech industry, and others from related 
fields. They will provide real-life examples of the complex 
social and ethical issues associated with new discoveries in 
genetics. In addition to the general public radio audience, 
the series will target educators, scientists, and involved 
professionals. Ancillary educational materials will be distributed
in paper and digital form through over two dozen
collaborative organizations and fulfillment of listener requests.

"DNA and Behavior: Is Our Fate Written in Our Genes?"
is the pilot documentary for the series, scheduled for re- 
lease in early 1996. The show will help the lay person un- 
derstand and evaluate recent research in the area of behav- 
ioral genetics. Recently, we've seen news media reports on 
newly discovered genetic factors being related to behav- 
iors such as alcoholism, mental illness, sexual orientation 
and aggression. This program will look at several ex- 
amples of these "genetic factors" and evaluate the 
strengths and weaknesses of various methodologies in- 
volved in the research; and introduce such controversial 
issues as the re-emergence of a eugenics movement based 
on theoretical suppositions drawn from recent work in be- 
havioral genetics. 

With information linking major diseases such as breast 
cancer, colon cancer, and arteriosclerosis to genetic fac- 
tors, new dangers in public perception emerge. Many 
people who hear about them mistakenly conclude that 
these diseases can now be easily diagnosed and even 
cured. On the other end of the public perception spectrum, 
unfounded fears of extreme, and highly unlikely, conse- 
quences also appear. Will society now genetically engineer 
whole generations of people with "designer genes" offer- 
ing more "desirable physical qualities"? The DNA Files 
will ground public understanding of these issues in reality. 
"DNA and the Law" reviews the scientific basis for genetic
fingerprinting and looks at cases of alleged genetic
discrimination by insurance companies, employers and 
others. This program also looks at disputes over paternity, 
intellectual property rights, the commercialization of ge- 
netic information, informed consent and privacy issues. 
Other shows include "The Search for a Breast Cancer 
Gene," "Prenatal Genetic Testing and Treatment," "Evolu- 
tion and Genetic Diversity," "Sickle-Cell Disease and 
Thalassemia: Hope for a Cure," and "Theology, Mythol- 
ogy and Human Genetic Research." 

DOE Grant No. DE-FG03-95ER62003.

Communicating Science in Plain 
Language: The Science+ Literacy for 
Health: Human Genome Project 

Maria Sosa, Judy Kass, and Tracy Gath 

American Association for the Advancement of Science; 

Washington, DC 20005 

202/326-6453, Fax: /371-9849,

Recent literacy surveys have found that a large number of 
adults lack the skills to bring meaning to much of what is 
written about science. This, in effect, denies them access to 
vital information about their health and well-being. To address
this need, the American Association for the Advancement
of Science (AAAS) is developing a 2-year project to
provide low-literate adults with the background knowledge 
necessary to address the social, ethical, and legal implica- 
tions of the Human Genome Project. 

With its Science + Literacy for Health: Human Genome
Project, AAAS is using its existing network of adult edu- 
cation providers and volunteer science and health profes- 
sionals to pursue the following overall objectives: (1) to 
develop new materials for adult literacy classes, including 
a high-interest reading book and accompanying curricu- 
lum, an implementation framework, a short video provid- 
ing background information on genetics, a database of re- 
sources, and fact sheets that will assist other organizations 
and researchers in preparing easy-to-read materials about 
the human genome project, and (2) to develop and conduct 
a campaign to disseminate project materials to libraries 
and community organizations carrying out literacy pro- 
grams throughout the United States. 

Because not every low-literate adult is enrolled in a lit- 
eracy class, our model for helping scientists communicate 
in simple language will have impact beyond classrooms 
and learning centers. In preliminary contacts, community
groups providing health services have indicated that the 
proposed materials are not only desirable but needed; in- 
deed such groups often receive requests for information on 
heredity and genetics. The module developed by AAAS 
should enable other medical and scientific organizations to 
communicate more effectively with economically disad- 
vantaged populations, which often include a large number 
of low-literate individuals. 

DOE Grant No. DE-FG02-95ER61988.

The Community College Initiative 

Sylvia J. Spengler and Laurel Egenberger 

Lawrence Berkeley National Laboratory; Berkeley, CA 


510/486-4879, Fax: -5717

The Community College Initiative prepares community 
college students for work in biotechnology. A combined 
effort of Lawrence Berkeley National Laboratory (LBNL) 
and the California Community Colleges, we aim to de- 
velop mechanisms to encourage students to pursue science 
studies, to participate in forefront laboratory research, and 
to gain work experience. The initiative is structured to up- 
grade the skills of students and their instructors through 
four components. 

Summer Student Workshops: Four-week summer residential
programs for students who have completed the first
year of the biotechnology academic program. Ethical, legal,
and social concepts are integrated into the laboratory exercises,
and students learn to identify commonly shared values
of the scientific community as well as increase their
understanding of issues of personal and public concern.

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts

Teacher Workshop Training: Seminars for biotechnology 
instructors to improve, upgrade, and update their under- 
standing of current technology and laboratory practices, 
with emphasis on curriculum development in current top- 
ics in ethical, legal, and social issues in science. 

Sabbatical Fellowships: For community college instruc- 
tors to provide investigative and field experience in re- 
search laboratories. During the fellowship, teachers also 
assist in development of student summer research activities.

Summer Faculty-Student Teams: Post-fellowship faculty 
and biotechnology students who have finished their second 
year of study team on a research project. 

Genome Educators

Sylvia Spengler and Janice Mann 

Human Genome Program; Life Sciences Division; 

Lawrence Berkeley National Laboratory; Berkeley, CA


510/486-4879, Fax: -5717



Genome Educators is an informal network of educational 
professionals who have an active interest in all aspects of 
genetics research and education. This national group in- 
cludes scientists, researchers, educational curriculum de- 
velopers, ethicists, health professionals, high school teach- 
ers and instructors at college and graduate levels, and oth- 
ers in occupations affected by genetic research. 

Genome Educators is a unique collaborative effort dedi- 
cated to sharing information and resources to further un- 
derstanding of current advances in the field of genetics. 
Seminars, workshops, and special events are sponsored at 
frequent intervals. Genome Educators maintains an active 
World Wide Web site (URL: 
tion/Genome). This site contains a calendar of events, di- 
rectory of participating genome educators, and information 
about educational resources and reference tools. Participat- 
ing genome educators may publish articles and talks of 
interest at this site. In addition, a monitored discussion 
group is maintained to facilitate dialog and resource shar- 
ing among participants. 

Getting the Word Out on the Human 
Genome Project: A Course for 

Sara L. Tobin and Ann Boughton¹

Department of Biochemistry and Molecular Biology;
Center for Biomedical Ethics; Stanford University; Palo
Alto, CA 94304-1709

415/725-2663, Fax: -6131
¹Thumbnail Graphics; Oklahoma City, OK 73118

Progressive identification of new genes and implications 
for medical treatment of genetic diseases appear almost 
daily in the scientific and medical literature, as well as in 
public media reports. However, most individuals do not 
understand the power or the promise of the current explo- 
sion in knowledge of the human genome. This is also true 
of physicians, most of whom completed their medical 
training prior to the application of recombinant DNA tech- 
nology to medical diagnosis and treatment. This lack of 
training prevents physicians from appreciating many of the 
recent advances in molecular genetics and may delay their 
acceptance of new treatment regimens. In particular, physi- 
cians practicing in rural communities are often limited in 
their access to resources that would bring them into the 
mainstream of current molecular developments. This 
project is designed to fill two important functions: first, to
provide solid training for physicians in the field of molecu- 
lar medical genetics, including the impact, implications, 
and potential of this field for the treatment of human dis- 
ease; second, to utilize physicians as informed community 
resources who can educate both their patients and commu- 
nity groups about the new genetics. 

We propose to develop a flexible, user-friendly, interactive 
multimedia CD-ROM designed for continuing education 
of physicians in applications of molecular medical genet- 
ics. To initiate these objectives, we will develop the design 
of the CD and will produce a prototype providing a de- 
tailed presentation of one of the four training areas. These 
areas are (1) Genetics, including DNA as a molecular blueprint,
chromosomes as vehicles for genetic information,
and patterns of inheritance; (2) Recombinant techniques, 
stressing cloning and analytical tools and techniques ap- 
plied to medical case studies; (3) Current and future clini- 
cal applications, encompassing the human genome project, 
technical advances, and disease diagnosis and prognosis; 
and (4) Societal implications, focusing on approaches to 
patient counseling, genetic dilemmas faced by patients and 
practitioners, and societal values and development of an 
ethical consensus. Area (2) will be presented in the prototype.

The CD format will permit the use of animation, video, 
and audio, in addition to graphic illustrations and photo- 
graphs. We will build on our existing base of computer 
generated illustrations. A hypertext glossary, user notes,
practice tests, and customized settings will be utilized to
tailor the CD to the needs of the user. Brief, 
multiple-choice examinations will be evaluated for con- 
tinuing medical education credits by the Office of Continu- 
ing Medical Education. The CD will be programmed to 
permit updates of scientific and medical advances either 
by downloading from the Internet or from a disc available 
by subscription. 

This is a cooperative project involving individuals with 
documented expertise in teaching of molecular medical 
genetics, continuing medical education, graphic design, 
and CD-ROM production. The content of the CD will be 
supervised by a scientific board of directors. We present 
mechanisms for the evaluation of the CD by rural Okla- 
homa physicians. Arrangements have been made for distri- 
bution of the CD by a national publisher of medical and 
scientific materials. This CD will provide a powerful tool 
to educate physicians and the public about the power and 
potential of the human genome project for the benefit of 
human health. 

DOE Grant No. DE-FG03-96ER62172. 

The Genetics Adjudication Resource 

Franklin M. Zweig 

Einstein Institute for Science, Health, and the Courts; 

Bethesda, MD 20814

301/961-1949, Fax: /913-0448


The Einstein Institute for Science, Health, and the Courts 
is preparing the foundation for a new utility needed to pre- 
pare the nation's 21,000 courts to adjudicate the genetics 
and ELSI-related issues that foreseeably will rush into the 
courtroom as the Human Genome Project completes its 
genomic mapping and sequencing mission during the next 
ten years. This project initiates practical collaboration 
among courts, legal and policy-making institutions, and 
science centers leading to modalities for understanding the 
scientific validity of claims, and for the resolution of ethical,
legal, and social disputes arising within the genetic
testing and gene therapy contexts. Our objective over the 
ensuing decade is to facilitate genetic testing and gene 
therapy dispute management, and to avoid to the extent 
possible the confusion that characterized adjudication of 
forensic DNA technologies during the decade just ended. 

The outlines of a genetics adjudication utility were given 
form by the 1995 Working Conversation on Genetics, Evo- 
lution, and the Courts, involving 37 federal and state 
judges and others in science and policymaking leadership 
positions from across the nation. The courts are becoming 
aware of genetics, molecular biology, and their applications,
and judges want public confidence to be maintained
as the profound and complex issues set in motion by the
HGP begin the long course of litigation. Modalities for 
understanding the underpinning science are needed, as 
well as instrumentalities to assure that the best cases are 
actually filed and pursued. Because the courts are the
front-line for resolving disputes, creative lawyering will
assure an abundance of lawsuits. Many such lawsuits will
request the courts to make policy judgments, perhaps best
undertaken by state legislatures and Congress. Accord- 
ingly, a new adjudication utility should provide forums for 
judicial/legislative exchange, preparatory deliberations in 
anticipation of pressure to make rushed policies under con- 
ditions of great social uncertainty in the wake of human 
genetics progress. 

EINSHAC will provide a design, planning, communica- 
tions, and implementation center for a multipurpose re- 
source project available to the courts. It will undertake 
over an 18-month period the following tasks, pilot-testing
each and assessing the best organizational locales for those 
that exhibit promise: 

1. Judicial Education in Genetics & ELSI-Related Issues 
for six Judicial Branch leadership associations and nine 
metropolitan courts — aimed at 1,000 judges — in conjunction
with scientific faculty and coaches mobilized by
DOE/national laboratories and the American Society for 
Human Genetics. 

2. Judicial Digital Electronic Collegium — technological 
modernization of the courts community by providing ac- 
cess to ELSI and genetics information through Internet 

3. Amicus Brief Development Trust Fund — a process and 
resources to support law development at the state and fed- 
eral appeals courts level. 

4. Genetics Indigent Party Trust Fund — a process and re- 
sources at the state and federal trial level to sustain meritorious
civil cases holding promise of effective law development.

5. Establishment of a Pro-Bono Legal Services Clearing- 
house — a personal and on-line referral resource for per- 
sons seeking representation for genetics and ELSI-related 


6. Access to Neutral Expert Witnesses — advisors to courts 
encountering particularly complex cases deemed right for 
the judicial exercise of Federal Rule of Evidence 706 and 
its State counterparts. 

7. Pilot of Judicial/Legislative ELSI Policy Forums — pro- 
vision of neutral staff and coordination in three 
mid-Atlantic states considering legislation related to health 
care, insurance, privacy, medical records. 



8. National Training Center for Minority Justice Personnel — facilitating
a leadership preparation program for the nation's minority
court-related personnel in a consortium arrangement with the
Ruffin Society of Massachusetts, the College of Criminal Justice
at Northeastern University, and the Flaschner Judicial Institute.

The Project actively involves judges, scientists, and prominent
lawyers. It will report to the EINSHAC Board of Directors
that includes prominent judges, justices and scientists,
several of whom participated in the 1995 Working
Conversation on Genetics, Evolution and the Courts. As a
continuing guidance forum, EINSHAC will conduct a
Working Conversation followup in Orleans, Cape Cod in
July, 1996.

DOE Grant No. DE-FG02-96ER62081.




Alexander Hollaender Distinguished
Postdoctoral Fellowships

Linda Holmes and Eugene Spejewski

Oak Ridge Institute for Science and Education; Oak Ridge,

TN 37831-0117

423/576-3192, Fax: /241-5220

The Alexander Hollaender Distinguished Postdoctoral Fellowships,
sponsored by the Department of Energy (DOE),
Office of Health and Environmental Research (OHER),
support research in the fields of life, biomedical, and environmental
sciences. Since the DOE Human Genome Distinguished
Postdoctoral Fellowships and DOE Global
Change Distinguished Postdoctoral Fellowships both had
their last application cycles in FY 1995, the Hollaender
program is now open to recent PhD graduates in the fields
of human genome and global change, as well.

Fellowships of up to 2 years are tenable at any DOE, uni- 
versity, or private laboratory providing the proposed ad- 
viser at that laboratory receives at least $150,000 per year 
in support from OHER. Fellows earn stipends of $37,500 
the first year and $40,500 the second. To be eligible, appli- 
cants must be U.S. citizens or permanent residents at the 
time of application, and must have received their doctoral 
degrees within two years of the earliest possible starting 
date, which is May 1 of the appointment year.

The Oak Ridge Institute for Science and Education 
(ORISE), administrator of the fellowships, prepares and 
distributes program literature to universities and laborato- 
ries across the country, accepts applications, convenes a 
panel to make award recommendations, and issues stipend 
checks to fellows. The review panel identifies finalists 
from which DOE selects the award winners. Deadline for 
the FY 1999 fellowship cycle is January 15, 1998. For 
more information or an application packet, contact Linda 
Holmes at the Oak Ridge Institute for Science and Education,
P.O. Box 117, Oak Ridge, TN 37831-0117
(423/576-9975, Fax: /241-5220).

DOE Contract No. DE-AC05-76OR00033.

Human Genome Management 
Information System 

Betty K. Mansfield, Anne E. Adamson, Denise K. Casey, 

Sheryl A. Martin, John S. Wassom, Judy M. Wyrick, 

Laura N. Yust, Murray Browne, and Marissa D. Mills 

Life Sciences Division; Oak Ridge National Laboratory; 

Oak Ridge, TN 37830 

423/576-6669, Fax: /574-9888


The Human Genome Management Information System 
(HGMIS), established in 1989, provides information about 
the international Human Genome Project in print and 
World Wide Web formats to both technical and general 
audiences. HGMIS is sponsored by the Human Genome
Program Task Group of the DOE Office of Biological and 
Environmental Research to help fulfill DOE's commitment 
to informing scientists, policymakers, and the public about 
the program's funded research and the context in which the 
research is conducted. Several HGMIS products, including 
the Web sites and newsletter, have won technical and elec- 
tronic communication awards. 

HGMIS goals center on facilitating research at the inter- 
face of genomics and other biological disciplines that seek 
revolutionary solutions to biological, environmental, and 
biomedical challenges. By communicating information 
about the Human Genome Project and its impact, HGMIS 
increases the use of project-generated resources, reduces 
duplicative research efforts, and fosters collaborations and 
contributions to biology from other research disciplines. 

Furthermore, communicating scientific and societal issues 
to nonscientist audiences contributes to increased science 
literacy, thus laying a foundation for more informed deci- 
sion making and public-policy development. For example, 
since 1995 HGMIS has been participating in a project to 
educate the judiciary about the basics of genetics and gene 
testing. The aim is to prepare judges for the flood of cases 
involving genetic evidence that soon will enter the nation's 

Information Resources 

In keeping with its goals, HGMIS produces the following 
information resources in print and on the Web: 

Human Genome News (HGN). A quarterly forum for in- 
terdisciplinary information exchange, HGN uniquely pre- 
sents a broad spectrum of topics related to the Human Ge- 
nome Project in a single publication. Articles feature topics 
that include project goals, progress, and direction; available
resources; applications of project data and resources
to provide a better understanding of biological processes;
related or spinoff programs; medical uses of genome data;
ethical, legal, and social considerations; legislative updates;
other publications; meeting calendars; and funding
information. Most HGN articles also contain sources of
additional information. In May 1997, DOE acknowledged
the newsletter's value by presenting an exceptional service
award to HGN's managing editor at a symposium celebrating
50 years of biological and environmental research.

Among 14,000 domestic and foreign HGN subscribers are 
genome and basic researchers at universities, national 
laboratories, nonprofit organizations, and industrial facili- 
ties; educators; industry representatives; legal personnel; 
ethicists; students; genetic counselors; medical professionals;
science writers; and other interested individuals. All 41
issues of HGN, indexed and searchable, are accessible via
the HGMIS Web site. 

Other Publications. HGMIS also produces the DOE 
Primer on Molecular Genetics, progress reports on the 
DOE Human Genome Program, Santa Fe contractor- 
grantee workshop proceedings, 1-page topical handouts, 
and other related resource documents. Expanded and re- 
vised by HGMIS from an earlier DOE document, the DOE 
Primer on Molecular Genetics continues to be in demand. 
It is used as a handout for genome centers; a resource for 
new staff training by companies that make products for 
genome scientists; and an educational tool for teachers, 
genetic counselors, and such organizations as high schools, 
universities, and medical schools for student and 
continuing-education curricula. More than 35,000 hard 
copies have been distributed. The primer also is available 
in several formats at the HGMIS Web site, including an 
Adobe Acrobat version that can be used to print "originals"
from users' printers.

Distribution of Documents. HGMIS has distributed more 
than 65,000 copies of items requested by subscribers, 
meeting attendees, and managers of genetics meetings and 
educational events. These items include HGN, program 
and workshop reports, DOE-NIH 5-year plans, DOE
Primer on Molecular Genetics, and To Know Ourselves. 
On request, HGMIS supplies multiple copies of publica- 
tions for meetings and educational purposes. 

Electronic Communications. In November 1994, HGMIS 
began producing a comprehensive, text-based Web server 
called Human Genome Project Information, which is de- 
voted to topics relating to the science and societal issues 
surrounding the genome project. In July 1997, this site was 
divided to better serve the two diverse audience categories 
that represent the majority of users: scientists and the pub- 
lic. The sites contain more than 1700 text files that are ac- 
cessed over 1.2 million times each year. Each month, 
about 10,000 host computers connect to the HGMIS sites 
directly and through more than 1000 other Web sites. In 
addition, HGMIS links to the National Institutes of Health
and international Human Genome Organisation sites, as 
well as to sites dedicated to education and to the ethical, 
legal, and social implications of the Human Genome 

All HGMIS publications are published on the Web site, 
along with such DOE-sponsored documents as Your 
Genes, Your Choices; the Genetic Privacy Act; and histori- 
cal and other documents pertaining to the Human Genome 
Project. HGMIS collaborates with the Einstein Institute for 
Science, Health, and the Courts to produce CASOLM, the 
online magazine for judicial education in genetics and biomedical
issues. HGMIS also maintains the Genetics section
of the Virtual Library from CERN (Switzerland) and
the DOE Human Genome Program pages and moderates
the BioSci Human Genome Newsgroup. 

Information Source 

HGMIS answers individual questions and supplies general 
information about the Human Genome Project by telephone,
fax, and e-mail and, as appropriate, links scientists
with questions to appropriate Human Genome Project con- 
tacts. HGMIS staff exchange ideas and suggestions with 
investigators, industry representatives, and others when 
attending occasional scientific conferences and 
genome-related meetings and displaying the DOE Human 
Genome Project traveling exhibit. HGMIS staff also make 
presentations on the Human Genome Project to educa- 
tional, judicial, and other groups. 

HGMIS resources serve as a primary source for the popu- 
lar media and for discipline-specific publications that 
broaden the distribution of genome project information by 
extracting and reprinting from HGMIS resources and by
linking to various parts of the HGMIS Web site. 

HGMIS continuously monitors changes in the direction of 
the international Human Genome Project and searches for
ways to strengthen the content relevancy of the newsletter, 
the Web site, and other services. 

DOE Contract No. DE-AC05-96OR22464. 

Human Genome Program 

Sylvia J. Spengler 

Lawrence Berkeley National Laboratory; Berkeley, CA


510/486-4879, Fax: -5717


The DOE Human Genome Program of the Office of 
Health and Environmental Research (OHER) has devel- 
oped a number of tools for management of the Program. 
Among these was the Human Genome Coordinating Committee
(HGCC), established in 1988. In 1996, the HGCC
was expanded to a broader vision of the role of genomic 
technologies in OHER programs, and the name was 
changed to reflect this broadening. The HGCC is now the 
Biotechnology Forum. The Forum is chaired by the Asso- 
ciate Director, OHER. Members of the Human Genome 
Program Management Task group are ex officio members, 
as are members of the Health and Environmental Research 
Advisory Committee's subcommittee on the Human Genome.
Responsibilities of the Forum include: assisting
OHER in overall coordination of DOE-funded genome 
research; facilitating the development and dissemination of 
novel genome technologies; recommending establishment 
of ad hoc task groups in specific areas, such as informatics,
technologies, model organisms; and evaluation of progress
and consideration of long-term goals. Members also serve
on the Joint DOE-NIH Subcommittee on the Human Genome,
for interagency coordination. The coordination
group also participates in interface programs with other 
facilities and provides scientific support for development 
of other OHER goals, as requested. 

Support of Human Genome Program 
Proposal Reviews 

Walter Williams 

Education/Training Division; Oak Ridge Institute for 
Science and Education; Oak Ridge, TN 37831-0117
423/576-4811, Fax: /241-2727

The Oak Ridge Institute for Science and Education 
(ORISE), operated by Oak Ridge Associated Universities, 
provides assistance to the DOE Office of Health and Envi- 
ronmental Research in the technical review of proposals 
submitted in response to solicitations by the DOE Human 
Genome Program. ORISE staff members create and maintain
a database of all proposal information, including abstracts,
relevant names and addresses, and budget data.
This information is compiled and presented to proposal 
reviewers. Before review meetings, ORISE staff members 
make appropriate hotel and meeting arrangements, provide 
each reviewer with proposal copies and evaluation guidelines,
and coordinate reviewer travel and honoraria payment.
Onsite meeting support includes collecting all reviewer
evaluation forms and scores, entering reviewer
scores into the database, preparing appropriate reports, 
providing onsite computer support and handling all logis- 
tical issues. Other support includes assistance with pro- 
gram advertising and preparation of reviewer comments 
following each review. ORISE may also assist with pre- 
and post-review activities related to conferences, seminars, 
and site visits. 

DOE Contract No. DE-AC05-76OR00033. 


Former Soviet Union Office of Health 
and Environmental Research Program 

James Wright 

Education/Training Division; Oak Ridge Institute for 
Science and Education; Oak Ridge, TN 37831-0117
423/576-1716, Fax: /241-2727

The Former Soviet Union Office of Health and Environ- 
mental Research Program, sponsored by the U.S. Depart- 
ment of Energy, Office of Health and Environmental Re- 
search, recognizes outstanding scientists in the field of 
health and environmental research from the independent 
states of the former Soviet Union. The program fosters the 
international exchange of new ideas and innovative ap- 
proaches in health and environmental research; strengthens 
ties and encourages continuing collaboration among Russian
and U.S. scientists; and establishes and maintains
environmental research capability in the former Soviet
Union. The program has supported more than 23 Russian
principal investigators and approximately 110 other research
associates in Moscow, St. Petersburg, and
Novosibirsk. More importantly, the program has enabled
many high-quality Russian biological, genome informatics,
physical mapping and mutagenesis detection, human genetics,
biochemistry, DNA sequencing technology, protein
analysis, molecular genetics, and other related research
infrastructures to continue operating in an uncertain economic
environment.

DOE Contract No. DE-AC05-76OR00033. 




Small Business Innovation Research 

1996 Phase I 

An Engineered RNA/DNA Polymerase 
to Increase Speed and Economy of
DNA Sequencing 

Mark W. Knuth 

Promega Corporation; Madison, WI 53711-5399 
608/274-4330, Fax: /277-2601 

DNA sequence information is the cornerstone for consider- 
able experimental design and analysis in the biological 
sciences. The proposed studies will focus on advancing 
DNA sequencing by creating a new enzyme that eliminates 
the need for an oligonucleotide primer to initiate DNA 
synthesis at a defined site, and that can use dideoxy nucle- 
otides for chain termination. The new method should re- 
duce the time and cost required to obtain DNA sequences 
and enhance the speed and cost effectiveness of current 
DNA sequencing technologies. Phase I studies will focus 
on purifying mutant T7 RNA polymerases known to incorporate
dNTPs into DNA chains, developing protocols for
rapid small-scale mutant enzyme purification, evaluating
the purified mutants for properties relevant to DNA sequencing,
developing facile mutagenesis schemes, and producing
mutant RNA/DNA polymerases with altered promoter
recognition. The results from Phase I will provide
the foundation for Phase II research, which will focus on
refining properties of the mutant by (1) expanding the
number of mutations examined using the purification protocols,
assays, and mutagenesis screening methods developed
in Phase I, (2) examining the effect of each mutation
on enzymatic properties important to DNA sequencing
applications, and (3) optimizing conditions for sequencing
performance. In Phase III, Promega will commercialize the 
new mutant enzymes through its own extensive distribu- 
tion network and by collaborating with major instrumenta- 
tion firms to adapt the technology to automated DNA se- 
quencing systems. 

DOE Grant No. DE-FG02-96ER8226.

Directed Multiple DNA Sequencing and 
Expression Analysis by Hybridization 

Gualberto Ruano 

BIOS Laboratories, Inc.; New Haven, CT 06511 
800/678-9487 or 203/773-1450, Fax: 800/315-7435 or 

The overall goal of this project is to develop molecular 
resources with direct applications to either DNA sequence 
analysis or gene expression analysis in multiplexed for- 
mats using sequential hybridization of Peptide Nucleic 
Acid (PNA) oligomer probes. PNA oligomers hybridize 
more stably and specifically to cognate DNA targets than 
conventional DNA oligonucleotides. The Phase I project 
discussed here is concerned with development of PNA 
probe technology having direct application either to the 
directed sequencing process or to gene expression profil- 
ing. With regard to directed sequencing, we seek improve- 
ments in the three multiply repeated steps associated with 
this process, namely (1) probe assembly, (2) sequencing 
reactions, and (3) gel electrophoresis. In PNA hybridiza- 
tion sequencing, sequences are generated directly from the 
template by multiplex DNA sequencing using anchor 
primers known to have frequent annealing sites. Electro- 
phoresis is performed en masse for each anchor primer 
reaction, blotted to nylon membranes and individual se- 
quences are selectively exposed by iterative hybridization 
to specific 8-mer PNA probes derived from sequences sta- 
tistically over-represented in expressed DNA and obtained 
from a pre-synthesized library. Additionally, the same PNA 
library can be used as a source of hybridization probes for 
querying expression patterns of specific genes in any cell 
line or tissue. Specific gene expression can be monitored 
by coupling gene-specific RT-PCR with hybridization 
when cDNA products are separated by gel electrophoresis 
and blotted to nylon membranes. Patterns of gene expres- 
sion are then resolved by hybridization using PNA oligo- 
mers. Bands corresponding to specific genes can be 
deconvoluted using sequence information from RT-PCR 
primers and PNA probes. Higher throughput expression 
analysis can be achieved by multiplexed gel electrophore- 
sis, blotting and iterative probing of RT-PCR reactions 
with individual PNA probes. 

DOE Grant No. DE-FG02-96ER8213.
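
The abstract above describes 8-mer PNA probes "derived from sequences statistically over-represented in expressed DNA" but does not say how over-representation was measured. As a rough illustration only, the Python sketch below counts overlapping 8-mers in a set of sequences and ranks them against a deliberately naive expectation in which all 4^8 = 65,536 possible 8-mers are equally likely. The function names, the uniform null model, and the toy input sequences are assumptions made for this example; they are not taken from the project.

```python
from collections import Counter


def kmer_counts(sequences, k=8):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def overrepresented(sequences, k=8, top=5):
    """Rank k-mers by observed count relative to a uniform expectation.

    Assumption: the expected count is total / 4**k, i.e. every k-mer is
    equally likely. A real analysis would use a background model that
    reflects base composition.
    """
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return []
    expected = total / 4 ** k
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kmer, n, n / expected) for kmer, n in ranked[:top]]


# Hypothetical toy input standing in for expressed sequences.
toy_sequences = ["ACGTACGTACGT", "ACGTACGTTT"]
top_hits = overrepresented(toy_sequences, k=8, top=3)
```

Each entry of `top_hits` holds the 8-mer, its observed count, and the ratio of observed to expected counts; 8-mers with the highest ratios would be the candidates for a presynthesized probe library under this simplified model.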





1996 Phase II

A Graphical Ad Hoc Query Interface 
Capable of Accessing Heterogeneous 
Public Genome Databases 

Joseph Leone 

CyberConnect Corporation; Storrs, CT 06268
860/486-2783, Fax: /429-2372

The interoperability of public genome databases is ex- 
pected to be crucial in making the Human Genome Project 
a success. This project will develop software tools in 
which users in the genome community can learn or exam- 
ine public genome database schemes in a relatively short 
time and can produce a correct Structured Query Language 
(SQL) expression easily. In Phase I, a concept system was 
constructed and the effectiveness of formulating ad hoc 
queries graphically was demonstrated. Phase II will focus 
on transforming the concept system into a product that is 
robust and portable. Two types of computer programs will 
be developed. One is a client program which is to be dis- 
tributed to community users who intend to access public 
genomic databases and link them with local databases. The 
other is a server program and a suite of software tools de- 
signed to be used by those genome centers which intend to 
make their databases publicly accessible. 
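
As a rough illustration of what a graphical query builder has to produce, the sketch below turns a structured selection (a table, some columns, and equality filters) into a parameterized SQL statement. It is a minimal sketch under stated assumptions, not the project's software: the `locus` table, its columns, and the `build_query` function are hypothetical, and Python's built-in `sqlite3` module stands in for a public genome database.

```python
import sqlite3

def build_query(table, columns, filters):
    """Turn a point-and-click selection into a parameterized SELECT."""
    # Identifiers cannot be bound as parameters, so validate them.
    for name in [table, *columns, *filters]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
        params = list(filters.values())
    return sql, params

# Hypothetical stand-in for a public genome database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locus (name TEXT, chromosome TEXT)")
conn.executemany(
    "INSERT INTO locus VALUES (?, ?)",
    [("D19S1", "19"), ("D21S1", "21")],
)

# The user "clicks" the locus table, the name column, and one filter.
sql, params = build_query("locus", ["name"], {"chromosome": "19"})
rows = conn.execute(sql, params).fetchall()
```

Generating the statement from a validated selection, rather than from typed text, is one way such a tool could help users arrive at a correct SQL expression without knowing the schema in detail.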

DOE Grant No. DE-FG02-95ER81906. 

Low-Cost Automated Preparation of 
Plasmid, Cosmid, and Yeast DNA 

Tuyen Nguyen, Randy F. Sivila, Joshua P. Dyer, and 
William P. MacConnell 

MacConnell Research Corporation; San Diego, CA 92121 
619/452-2603, Fax: -6753 

MacConnell Research currently manufactures and sells a 
low cost automated bench-top instrument that can purify 
up to 24 samples of plasmid DNA simultaneously in one 
hour at a cost of $0.65 per sample and under $8000 for the 
instrument. The patented instrument uses a form of 
agarose gel electrophoresis to purify the plasmid DNA and 
electroelutes it into a volume of approximately 20 µl. The 
instrument has many advantages over other robotic and 
manual methods: it is two times faster, at least six times 
less expensive, much smaller, easier to operate, cheaper 
per sample, and yields DNA pure enough for direct use in 
fluorescent automated sequencing. The instrument process 
begins with bacterial culture, which is loaded directly 
into a disposable cassette in the machine. 

In Phase II work we are developing an instrument which 
simultaneously purifies plasmid DNA from up to 192 
(2 × 96) bacterial samples in 1.5 hours. Prototypes of this 
instrument thus far constructed have allowed the 
purification of 3-7 micrograms of high purity plasmid DNA 
per lane from 1.5 ml of bacterial culture. We have 
attempted to optimize all of the following: instrument 
electrophoretic run parameters, lysis chemistry, lysis 
reagent delivery devices, reagent storage at room 
temperature, desalting processes, and overall instrument 
mechanical and electronic control. Instrument prototypes 
have also been used to prepare cosmid or yeast DNA in 
quantities of 1-5 micrograms per cassette lane. Trials thus 
far have yielded plasmid DNA of sufficient purity for 
direct use in automated fluorescent and manual sequencing 
as well as other molecular biology protocols. We have 
studied the purity of the resulting DNA when directly 
sequenced on Licor 4000 Long Reader and ABI 373A 
automated DNA sequencers. Results from the Licor 4000 
instrument give routine read lengths of >850 base pairs 
with 98% accuracy, while ABI 373A reads generally exceed 
400 base pairs with similar accuracy. 

The proposed 2 × 96-channel instrument will purify up 
to 1200 plasmid DNA preps per eight hour day. It will 
significantly reduce the cost and technician labor of high 
throughput plasmid DNA purification for automated 
sequencing and mapping. 

DOE Grant No. DE-FG03-94ER81802/A000. 

GRAIL-GenQuest: A Comprehensive 
Computational Framework for DNA 
Sequence Analysis 

Ruth Ann Manning 

ApoCom, Inc.; Oak Ridge, TN 37830 
423/482-2500, Fax: /220-2030 

Although DNA sequencing in the Human Genome 
Project is occurring fairly systematically, biotechnology 
companies have focused on sequencing regions thought 
to contain particular disease genes. The client-server 
DNA sequence analysis system GRAIL is the most accu- 
rate and widely used computer-based system for locating 
and characterizing genes in DNA sequences, but it is not 
accessible to many biotechnology environments. The 
GRAIL client software and graphical displays have been 
developed for high-end UNIX-based computer worksta- 
tions. Such workstations are standard equipment in uni- 
versities and large companies, but personal computers 
(PCs) and Macintosh computers are the prevalent 
technology within the biotechnology community. This 
Phase I project will design Macintosh- and Windows- 
based client graphical user interface prototypes for 





The growth of DNA databases is expected to continue at a 
fast pace in the attempt to sequence the human genome 
completely by the year 2005. Parallel processing is a vi- 
able solution to handle searching through the ever-increas- 
ing volume of data. During Phase I, genQuest — the se- 
quence comparison server portion of the GRAIL system — 
will be parallelized for shared-memory platforms and will 
use PVM¹ for the development of genQuest servers on 
networks of PCs and workstations and other innovative, 
high-performance computer architectures. 

Prototype graphical interface systems for Macintosh, NT 
Windows, and Windows 95 that mimic the function and 
operation of the current GRAIL-genQuest clients will 
enable a larger portion of biotechnology companies to make 
use of the GRAIL suite of analysis tools. Parallel genQuest 
servers will improve response time for searches and 
increase user capacity per server. Such fast shared- and 
distributed-memory computing solutions will improve the 
cost-performance ratio and make parallel searches more 
affordable to the biotechnology community using general 
multipurpose hardware. 
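
The idea of parallelizing a sequence-comparison server can be sketched as partitioning the database, scoring each partition independently, and merging the hits. The example below is an assumption-laden illustration, not genQuest's algorithm: the scoring (a count of shared 8-mers), the function names, and the toy database are hypothetical, and a Python thread pool stands in for the shared-memory or message-passing workers the abstract describes. A real speedup on compute-bound scoring would need separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def kmers(seq, k=8):
    """Set of all length-k substrings of seq (empty if seq is shorter)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def score_chunk(query_kmers, chunk, k=8):
    """Score every (name, sequence) pair in one database partition."""
    return [(len(query_kmers & kmers(seq, k)), name) for name, seq in chunk]

def parallel_search(query, database, workers=4, k=8, top=3):
    """Split the database across workers, score each part, merge best hits."""
    entries = list(database.items())
    # Round-robin partitioning keeps the chunks close to equal in size.
    chunks = [entries[i::workers] for i in range(workers)]
    query_kmers = kmers(query, k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda c: score_chunk(query_kmers, c, k), chunks)
    scored = [hit for part in partial for hit in part]
    # Highest score first; ties broken by name for a stable order.
    return sorted(scored, key=lambda hit: (-hit[0], hit[1]))[:top]

# Hypothetical three-entry database.
database = {
    "a": "ACGTACGTAC",
    "b": "TTTTTTTTTT",
    "c": "ACGTACGTTT",
}
hits = parallel_search("ACGTACGTAC", database, workers=2)
```

Because each partition is scored independently, the merged result is the same for any number of workers; only the elapsed time changes.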

DOE Grant No. DE-FG02-95ER81923. 

¹The Parallel Virtual Machine (PVM) message-passing 
library allows a collection of UNIX-based computers to 
function as a single multiple-processor supercomputer. 




Projects Completed FY 1994-95 

Projects in this section have been completed or did not receive support through the DOE Human Genome Program in 
FY 1996. 


Sequencing by Hybridization: Methods to Generate 
Large Arrays of Oligonucleotides 
Thomas M. Brennan 

Sequencing by Hybridization: Development of an 
Efficient Large-Scale Methodology 
Radomir Crkvenjakov 

Genomic Instrumentation Development: Detection 
Systems for Film and High-Speed Gel-Less Methods 
Jack B. Davidson and Robert S. Foote 

Single-Molecule Detection Using Charge-Coupled 
Device Array Technology 

M. Bonner Denton, Richard Keller, Mark E. 
Baker, Colin W. Earle, and David A. Radspinner 

Coupling Sequencing by Hybridization with Gel 
Sequencing for Inexpensive Analysis of Genes and 

Radoje Drmanac, Snezana Drmanac, and 

Ivan Labat 

Physical Structure and DNA Sequence of Human 

Glen A. Evans 

Using Scanning Tunneling Microscopy to Sequence 
the Human Genome 

Thomas L. Ferrell, Robert J. Warmack, 
David P. Allison, K. Bruce Jacobson, 
Gilbert M. Brown, and Thomas G. Thundat 

DNA Sequence Analysis by Solid-Phase Hybridization 
Robert S. Foote, Richard A. Sachleben, and 
K. Bruce Jacobson 

DNA Sequencing Using Stable Isotopes 

K. Bruce Jacobson, Heinrich F. Arlinghaus, 
Gilbert M. Brown, Robert S. Foote, 
Frank W. Larimer, Richard A. Sachleben, 
Norbert Thonnard, and Richard P. Woychik 

Preparation of Oligonucleotide Arrays for Hybridiza- 
tion Studies 

Michael C. Pirrung, Steven W. Shuey, 
David C. Lever, Lara Fallon, J.-C. Bradley, and 
William P. Hawe 

Improvement and Automation of Ligation-Mediated 
Genomic Sequencing 

Arthur D. Riggs and Gerd F Pfeifer 

♦Analysis of a 53-Kb Nucleotide Sequence from the 
Right Genome Terminus of the Variola Major Virus 
Strain India-1967 

Sergei N. Shchelkunov, Vladimir M. Blinov, 
Sergei M. Resenchuk, Alexei V. Totmenin, 
Viktor N. Krasnykh, Ludmilla V. Olenina, 
Oleg I. Serpinsky, and Lev S. Sandakhchiev 

A High-Speed Automated DNA Sequencer 
Lloyd M. Smith 

Characterization and Modification of DNA 
Polymerases for Use in DNA Sequencing 
Stanley Tabor 


♦Toward Cloning Human Chromosome 19 in Yeast 
Artificial Chromosomes 

Inga P. Arman, Alexander B. Devin, Svetlana P. 
Legchilina, Irina G. Efimenko, Marina E. 
Smirnova, and Dina V. Glazkova 

A Panel of Mouse-Human Monochromosomal 
Hybrid Cell Lines, Each Containing a Single Differ- 
ent Tagged Human Chromosome 

Arbansjit K. Sandhu, G. Pal Kaur, and 

Raghbir S. Athwal 

♦Preparation of a Set of Molecular Markers for 
Human Chromosome 5 Using G+C-Rich and 
Functional Site-Specific Oligonucleotides 

M.L. Filipenko, A.I. Muravlev, E.I. Jantsen, 
V.V. Smirnova, N.A. Chikaev, V.P. Mishin, and 
M.A. Ivanovich 

An Improved Method for Producing Radiation 
Hybrids Applied to Human Chromosome 19 

Cynthia L. Jackson and Hon Fong L. Mark 




Completed Projects 

Construction of a Human Genome Library Com- 
posed of Multimegabase Acentric Chromosome 

Michael J. Lane, Peter Hahn, and John Hozier 

Reagents for Understanding and Sequencing the 
Human Genome 

J.R. Korenberg, X-N. Chen, S. Mitchell, 
S. Gerwehr, Z. Sun, D. Noya, R. Hubert, 
U-J. Kim, H. Shizuya, X. Wu, J. Silva, B. Birren, 
T.J. Hudson, P. de Jong, E. Lander, and M. Simon 

Chromosome Mapping by FISH to Interphase Nuclei 
Barbara J. Trask 

Flow Karyotyping and Flow Instrumentation Development 

Ger van den Engh and Barbara Trask 

Isolation of Specific Human Telomeric Clones by 
Homologous Recombination and YAC Rescue 
Geoffrey Wahl and Linnea Brody 

Development of Diallelic Marker Maps Using 

Deborah A. Nickerson and Pui-Yan Kwok 


Multiplex Mapping of Human cDNAs 

William C. Nierman, Donna R. Maglott, and 

Scott Durkin 

Physical Mapping in Preparation for DNA Sequencing 
Andreas Gnirke, Regina Lim, Gane Wong, 
Jun Yu, Roger Bumgarner, and Maynard Olson 

Construction of a Genetic Map Across Chromosome 21 
Elaine A. Ostrander 

Integrated Physical Mapping of Human cDNAs 
Mihael H. Polymeropoulos 

Sequence-Tagged Sites for Human Chromosome 19 

Michael J. Siciliano and Anthony V. Carrano 

cDNA/STS Map of the Human Genome: Methods 
Development and Applications Using Brain cDNAs 

James M. Sikela, Akbar S. Khan, Arto K. 
Orpana, Andrea S. Wilcox, Janet A. Hopkins, and 
Tamara J. Stevens 

Physical Structure of Human Chromosome 21 
Cassandra L. Smith, Denan Wang, 
Kaoru Yoshida, Jesus Sainz, Carita Fockler, and 
Meite Bremer 

Physical Mapping of Human Chromosome 16 

David F. Callen, Sinoula Apostolou, Elizabeth 
Baker, Helen Kozman, Sharon A. Lane, 
Julie Nancarrow, Hilary A. Phillips, Scott A. 
Whitmore, Norman A. Doggett, John C. Mulley, 
Robert I. Richards, and Grant R. Sutherland 

*A Method for Direct Sequencing of Diploid 
Genomes on Oligonucleotide Arrays; Theoretical 
Analysis and Computer Modeling 
Alexander B. Chetverin 

Sampling-Based Methods for the Estimation of DNA 
Sequence Accuracy 

Gary Churchill and Betty Lazareva 

Computer-Aided Genome Map Assembly with 
SIGMA (System for Integrated Genome Map Assembly) 

Michael J. Cinkosky, Michael A. Bridgers, 
William M. Barber, Mohamad Ijadi, and 
James W. Fickett 

Informatics for the Sequencing by Hybridization 

Aleksandar Milosavljevic and Radomir 


Sequencing by Hybridization Algorithms and 
Computational Tools 

Radoje Drmanac, Ivan Labat, and 
Nick Stavropoulos 

HGIR: Information Management for a Growing Map 
James W. Fickett, Michael J. Cinkosky, 
Michael A. Bridgers, Henry T. Brown, Christian 
Burks, Philip E. Hempfner, Iran N. Lai, Debra 
Nelson, Robert M. Pecherer, Doug Sorenson, 
Peichen H. Sgro, Robert D. Sutherland, 
Charles D. Troup, and Bonnie C. Yantis 





Identification of Genes in Anonymous DNA 

Christopher A. Fields and Carol A. Soderlund 

Algorithms in Support of the Human Genome Project 
Dan Gusfield, Jim Knight, Kevin Murphy, 
Paul Stelling, Lushen Wang, Archie Cobbs, 
Paul Horton, Richard Karp, and Gene Lawler 

Predicting Future Disease: Issues in the Develop- 
ment, Application, and Use of Tests for Genetic 

Ruth E. Bulger and Jane E. Fullarton 

HUGO International Yearbook: Genetics, Ethics, 
Law, and Society (GELS) 

Alex Capron and Bartha Knoppers 

BISP: VLSI Solutions to Sequence-Comparison 

Tim Hunka