Biophysics 101 Readings

Background

These readings are intended to encourage out-of-the-box discussion of important current problems that have technological components. You may have already read some of them. More technical texts will be used as needed. Starting comments for discussion can be posted at 101 Project Ideas.

  • Global Crises, Global Solutions edited by Bjørn Lomborg

Book describing the global prioritization experiment 'Copenhagen Consensus', in which eight expert economists made a prioritized list of ten global challenges. More info

  • Natural Capitalism by P Hawken, A Lovins and L Hunter Lovins ISBN 0316353000

About how we can move towards a sustainable economy by taking into account the value of natural resources and processes. More info


Conversation starters

Articles

Technical references

  • Beginning Perl for Bioinformatics, James Tisdall; O'Reilly & Assoc. 2001. $32 B&N, $28 Amazon. See also www.bookpool.com.
  • Bioinformatics: Sequence and Genome Analysis, David Mount; CSHL 2001. $75 B&N, $75 Amazon.
  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, S. Eddy, A. Krogh & G. Mitchison. $35. This book covers basic and advanced models for sequence analysis.
  • Introductory Statistics, Neil A. Weiss, 5th edition, Addison-Wesley 1999. $93
  • Mathematica 4.2 is free for all students of this course (check with your TF if you are not a full-time Harvard student), or more permanently at the $140 student price.
  • MATLAB $99. Since it is not free for some students, it will not be used (but will be accepted for the problem sets if you already use it).
  • Developing Bioinformatics Computer Skills, Cynthia Gibas & Per Jambeck, O'Reilly 2001. $28
  • The Perl CD Bookshelf, Linda Walsh (editor), O'Reilly. $72. This CD contains 6 "books" that might supplement numerous free resources on the web: general Perl documents & downloads, Perl for biologists, and Perl tutorials.


Personalized Medicine

A gene has been found in which a variant allele that increases the risk of a heart attack is much more prevalent in whites and East Asians, but causes a much greater increase in risk in persons of African descent. The article discusses the implications of this discovery for race-based medicine.

  • Pollack A. (2005) A Special Drug Just for You.... NYT [1]

A drug receives FDA approval, but on the condition that people for whom it is prescribed first receive a test for an enzyme deficiency that could cause anemia when taking the drug.

  • The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437, 1299-1320. Full text & PDF

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
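
As a toy illustration of the linkage-disequilibrium (LD) statistics behind the "block-like structure" and "SNPs correlate with their neighbours" findings, the sketch below computes the disequilibrium coefficient D and r² for two biallelic SNPs from phased haplotype counts. The haplotype counts and allele coding are invented for the example, not taken from HapMap data.

```python
# Hypothetical LD calculation between two biallelic SNPs (alleles coded 0/1),
# from made-up phased two-SNP haplotype counts.
from collections import Counter

haplotypes = ["00"] * 45 + ["01"] * 5 + ["10"] * 5 + ["11"] * 45
n = len(haplotypes)
counts = Counter(haplotypes)

p_ab = counts["11"] / n                  # frequency of the 1-1 haplotype
p_a = (counts["10"] + counts["11"]) / n  # allele-1 frequency at SNP 1
p_b = (counts["01"] + counts["11"]) / n  # allele-1 frequency at SNP 2

D = p_ab - p_a * p_b                     # disequilibrium coefficient
r2 = D ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
print(f"D = {D:.3f}, r^2 = {r2:.3f}")    # high r^2: one SNP tags the other
```

When r² is high, genotyping one SNP of the pair is nearly as informative as genotyping both, which is what lets a tag-SNP subset capture most common variation.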

  • Chung Y et al. (2005) Embryonic and extraembryonic stem cell lines derived from single mouse blastomeres. Nature advance online publication, 16 October 2005; doi:10.1038/nature04277. Full text & PDF

Abstract: The most basic objection to human embryonic stem (ES) cell research is rooted in the fact that ES cell derivation deprives embryos of any further potential to develop into a complete human being. ES cell lines are conventionally isolated from the inner cell mass of blastocysts and, in a few instances, from cleavage stage embryos. So far, there have been no reports in the literature of stem cell lines derived using an approach that does not require embryo destruction. Here we report an alternative method of establishing ES cell lines—using a technique of single-cell embryo biopsy similar to that used in pre-implantation genetic diagnosis of genetic defects—that does not interfere with the developmental potential of embryos. Five putative ES and seven trophoblast stem (TS) cell lines were produced from single blastomeres, which maintained normal karyotype and markers of pluripotency or TS cells for up to more than 50 passages. The ES cells differentiated into derivatives of all three germ layers in vitro and in teratomas, and showed germ line transmission. Single-blastomere-biopsied embryos developed to term without a reduction in their developmental capacity. The ability to generate human ES cells without the destruction of ex utero embryos would reduce or eliminate the ethical concerns of many.

See also: Stem Cell Test Tried on Mice Saves Embryo, NYT

  • Leisure-time physical activity at midlife and the risk of dementia and Alzheimer's disease Full text & PDF

Summary: Leisure-time physical activity at midlife at least twice a week was associated with a reduced risk of dementia and AD [...], even after adjustments for age, sex, education, follow-up time, locomotor disorders, APOE genotype, vascular disorders, smoking, and alcohol drinking. The associations were more pronounced among the APOE 4 carriers.

  • Weston AD and Hood L. (2004) Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J Proteome Res. 2004 Mar-Apr;3(2):179-96. PDF, Medline. Read and commented upon by Mark and Chiki 18-Oct.
  • Two very interesting articles by Atul Gawande on the practice of medicine.
    • Medicine’s Money Problem, by Atul Gawande [2]
    • Bad Medicine. Atul Gawande writes about who pays the price when patients sue doctors. Here, with Daniel Cappello, he talks about the costs and consequences of medical malpractice [3]

Chiki's readings

Jeffrey's readings

  • Wade N. (2005) Genetic Find Stirs Debate on Race-Based Medicine. NYT [4]

Abstract: This paper models how the evolving field of pharmacogenomics (PG), which is the science of using genomic markers to predict drug response, may impact drug development times, attrition rates, costs, and the future returns to research and development (R&D). While there still remains an abundance of uncertainty around how PG will impact the future landscape of pharmaceutical and biological R&D, we identify several likely outcomes. We conclude PG has the potential to significantly reduce both expected drug development costs (via higher probabilities of technical success, shorter clinical development times, and smaller clinical trials) and returns. The impact PG has on expected returns is partially mitigated by higher equilibrium prices, expedited product launches, and longer effective patent lives. Our conclusions are, of course, accompanied by numerous caveats.

"Our paper is structured around three inter-related economic perspectives of pharmacogenomics: its impact on costs, returns (net revenues), and social welfare. Specifically, we will first consider how pharmacogenomics is likely to impact drug development costs, times, and attrition rates (all of which are related). Second, we will consider the financial risks and returns associated with the use of pharmacogenomics in drug development. Third, we will discuss the potential welfare implications of pharmacogenomics in terms of its likely impact on patient access to new and existing drugs, firm incentives for R&D, and public health. We will proceed as follows.

Section II will describe the primary ways in which pharmacogenomics is likely to influence the cost of drug development. For example, relative to traditional drug development, pharmacogenomics has the potential to reduce clinical trial costs (through smaller and possibly fewer clinical trials), drug attrition rates, and clinical developmental times. We consider and discuss these and several other related issues in this section. Section III will analyze the market environment for products developed using this technology. Specifically, we will consider how product revenues may be affected when markets are segmented using pharmacogenomics. For products developed using pharmacogenomics there is likely to be a substantially different post-launch cash flow profile. This section will build upon our previous work, which examined the market demand side issues of PG segmentation and product pricing, and consider the impact PG may have on present value net revenues. More specifically, we will describe how expedited product launches and longer effective patent lives, via shorter clinical development times, may, in certain circumstances, increase present value product cash flows despite a smaller, segmented patient population. This section will also consider the net present value (NPV) profile of a pharmacogenomically-developed product versus a traditionally developed one. This will be based, as will all our qualitative remarks in this paper, on a formal mathematical model we present in the appendix. While generalized conclusions will not be drawn, the principal factors affecting a product’s NPV within both environments (a traditional development paradigm versus a pharmacogenomics one) will be discussed and some predictions will be offered with caution. Section IV will conclude the paper with a brief discussion on the potential welfare implications of pharmacogenomics."
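
The paper's formal model lives in its appendix and is not reproduced here, but a stripped-down NPV comparison makes the qualitative trade-off concrete. Every parameter value below is invented for the illustration, not taken from the paper: PG is assumed to halve the treatable market while raising the price, the success probability, and the speed of development.

```python
# A hypothetical net-present-value (NPV) sketch of the trade-off described
# above: pharmacogenomic (PG) development shrinks the treatable market but
# shortens trials, raises success probability, and leaves more effective
# patent life at launch. All parameter values are invented for illustration.

def npv(dev_years, dev_cost_per_year, p_success, annual_revenue,
        patent_years_remaining, discount=0.10):
    """Expected NPV at the start of clinical development."""
    costs = sum(dev_cost_per_year / (1 + discount) ** t
                for t in range(dev_years))
    market_years = patent_years_remaining - dev_years  # exclusivity at launch
    revenues = sum(annual_revenue / (1 + discount) ** (dev_years + t)
                   for t in range(market_years))
    return p_success * revenues - costs

# Traditional: big market, long trials.  PG: half the market at a 30% price
# premium, with faster and more reliable development.
traditional = npv(dev_years=8, dev_cost_per_year=100, p_success=0.20,
                  annual_revenue=1000, patent_years_remaining=16)
pg = npv(dev_years=5, dev_cost_per_year=60, p_success=0.40,
         annual_revenue=0.5 * 1000 * 1.3, patent_years_remaining=16)
print(f"traditional NPV: {traditional:7.1f}")
print(f"PG NPV:          {pg:7.1f}")
```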

Definitions, Background, Requests for Research

Existing Technologies and Lines of Research

Policy Issues in Implementation - Public Health, Access to Care, Insurance Policy

Personal Genome Project (PGP)

HapMap resources


New Energy Sources

Metabolic Engineering Resources

  • Ind. Eng. Chem. Res. 2005, 44, 6154-6163. Vunjak-Novakovic G, Kim Y, Wu X, Berzin I, & Merchuk JC. Air-Lift Bioreactors for Algal Growth on Flue Gas: Mathematical Modeling and Pilot-Plant Studies. (Species: Dunaliella parva & tertiolecta)
  • Biotechnol Bioeng. 2005 Sep 5;91(5):643-8. Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, Palsson BO. In silico design and adaptive evolution of Escherichia coli for production of lactic acid.

  • J Mol Evol. 2005 Aug;61(2):171-80. Rozen DE, Schneider D, Lenski RE. Long-term experimental evolution in Escherichia coli. XIII. Phylogenetic history of a balanced polymorphism.

  • Burgard AP, Pharkya P, Maranas CD. (2003) OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 84(6):647-57. PDF

The advent of genome-scale models of metabolism has laid the foundation for the development of computational procedures for suggesting genetic manipulations that lead to overproduction. In this work, the computational OptKnock framework is introduced for suggesting gene deletion strategies leading to the overproduction of chemicals or biochemicals in E. coli. This is accomplished by ensuring that a drain towards growth resources (i.e., carbon, redox potential, and energy) must be accompanied, due to stoichiometry, by the production of a desired product. Computational results for gene deletions for succinate, lactate, and 1,3-propanediol (PDO) production are in good agreement with mutant strains published in the literature. While some of the suggested deletion strategies are straightforward and involve eliminating competing reaction pathways, many others suggest complex and nonintuitive mechanisms of compensating for the removed functionalities. Finally, the OptKnock procedure, by coupling biomass formation with chemical production, hints at a growth selection/adaptation system for indirectly evolving overproducing mutants.
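
In outline, and in notation chosen for this sketch rather than taken verbatim from the paper, OptKnock's bilevel structure couples an outer search over knockout indicators y_j with an inner flux-balance problem that the cell is assumed to solve:

```latex
% Sketch of the OptKnock bilevel program. S is the stoichiometric matrix,
% u the flux vector, y_j = 0 marks reaction j as knocked out, and K bounds
% the number of knockouts.
\begin{align*}
\max_{y \in \{0,1\}^n}\quad & v_{\text{product}} \\
\text{s.t.}\quad & v \in \arg\max_{u}\,\bigl\{\, u_{\text{biomass}} \;:\;
      S u = 0,\;\; y_j\, lb_j \le u_j \le y_j\, ub_j \;\;\forall j \,\bigr\} \\
& \textstyle\sum_j (1 - y_j) \le K .
\end{align*}
```

The inner problem is the usual flux-balance linear program; in the paper the bilevel problem is collapsed into a single mixed-integer linear program via LP duality.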

  • Martin VJ, Pitera DJ, Withers ST, Newman JD, Keasling JD. (2003) Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat Biotechnol. 21(7):796-802. Full text & PDF

Not exactly personalized medicine, but a very interesting application of systems and synthetic biology to drug production. Coincidentally, Jay Keasling will be giving a talk at the ICSB 2005 meeting during class on Thursday, October 20.

Abstract: Isoprenoids are the most numerous and structurally diverse family of natural products. Terpenoids, a class of isoprenoids often isolated from plants, are used as commercial flavor and fragrance compounds and antimalarial or anticancer drugs. Because plant tissue extractions typically yield low terpenoid concentrations, we sought an alternative method to produce high-value terpenoid compounds, such as the antimalarial drug artemisinin, in a microbial host. We engineered the expression of a synthetic amorpha-4,11-diene synthase gene and the mevalonate isoprenoid pathway from Saccharomyces cerevisiae in Escherichia coli. Concentrations of amorphadiene, the sesquiterpene olefin precursor to artemisinin, reached 24 μg caryophyllene equivalent/ml. Because isopentenyl and dimethylallyl pyrophosphates are the universal precursors to all isoprenoids, the strains developed in this study can serve as platform hosts for the production of any terpenoid compound for which a terpene synthase gene is available.


  • Price N, Reed J, Palsson B. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nature Reviews Microbiology 2, 886-897 (2004); doi:10.1038/nrmicro1023 PDF

Abstract: Microbial cells operate under governing constraints that limit their range of possible functions. With the availability of annotated genome sequences, it has become possible to reconstruct genome-scale biochemical reaction networks for microorganisms. The imposition of governing constraints on a reconstructed biochemical network leads to the definition of achievable cellular functions. In recent years, a substantial and growing toolbox of computational analysis methods has been developed to study the characteristics and capabilities of microorganisms using a constraint-based reconstruction and analysis (COBRA) approach. This approach provides a biochemically and genetically consistent framework for the generation of hypotheses and the testing of functions of microbial cells.
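
A minimal sketch of the core constraint-based calculation such reviews describe, flux balance analysis: maximize flux through a biomass reaction subject to steady-state mass balance (Sv = 0) and flux capacity bounds. The three-reaction network below is a toy invented for the example, not taken from the paper.

```python
# Minimal flux-balance-analysis (FBA) sketch on a toy network:
#   R1: uptake -> A;  R2: A -> B;  R3: B -> biomass (drain)
# Maximize biomass flux subject to S v = 0 and capacity bounds.
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix S (rows: metabolites A, B; columns: R1, R2, R3)
S = np.array([[1, -1,  0],
              [0,  1, -1]])

bounds = [(0, 10), (0, 100), (0, 100)]  # flux capacity constraints
c = [0, 0, -1]                          # linprog minimizes, so negate v_biomass

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print("optimal fluxes:", res.x)         # expected: [10, 10, 10]
print("max biomass flux:", -res.fun)    # limited by the uptake bound of 10
```

Genome-scale models work the same way, just with thousands of reactions in S instead of three.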


Bioremediation Readings

  • A survey of the present and future of bioremediation[5]. It seems that the modeling involved in bioremediation would use almost all the same tools as modeling involved in biomass production.


  • Lovley D. Cleaning up with genomics: applying molecular biology to bioremediation. Nature Reviews Microbiology 1, 35-44 (2003); doi:10.1038/nrmicro731 PDF. Read and commented upon 18-Oct. --Jleith 02:09, 18 October 2005 (EDT)

Abstract: Bioremediation has the potential to restore contaminated environments inexpensively yet effectively, but a lack of information about the factors controlling the growth and metabolism of microorganisms in polluted environments often limits its implementation. However, rapid advances in the understanding of bioremediation are on the horizon. Researchers now have the ability to culture microorganisms that are important in bioremediation and can evaluate their physiology using a combination of genome-enabled experimental and modelling techniques. In addition, new environmental genomic techniques offer the possibility for similar studies on as-yet-uncultured organisms. Combining models that can predict the activity of microorganisms that are involved in bioremediation with existing geochemical and hydrological models should transform bioremediation from a largely empirical practice into a science.


  • (Semi-)review of and comments on "The Controlled Eutrophication Process: Using Microalgae for CO2 Utilization and Agricultural Fertilizer Recycling," J.R. Benemann et al. (2002)

This is a working paper for a conference on controlling greenhouse gases, held in Kyoto in 2002 [6]. It is not peer-reviewed, and it does not appear to have been intended for journal publication. The authors estimate, admittedly optimistically, that algae farms would cost $40,000/ha to build and $12,000/ha per year to operate. Their cost breakdown does not mention the cost of water explicitly. While the increasing strain on the water supply has been in the news much less than the increasing strain on the oil supply, it is a major trend nonetheless, and it seems especially important in the desert Southwest, where the population is expected to keep growing rapidly in the coming years.

This brings up an important point for those of us embarking on metabolic engineering. Our goal isn't just to maximize the output of a desired product; it's to maximize the output of that product relative to the cost of the inputs. Carbon dioxide and oxygen are of course free, but even water may not be. Has anyone who has looked at metabolic engineering in more depth found examples that take the need for water into account? Also, the build-up of waste products that impede the system's production seems to be what really requires bringing water into the system. The metabolic engineer, then, needs to look not only at the lipids produced but also at other products whose concentrations cannot be allowed to rise above a certain value.

Going back to the figures in Benemann's article, it would be interesting to compare the cost per unit energy for biomass oil production versus crude oil production. The cost to produce crude oil is much less than, and effectively decoupled from, its market price. A caveat for this comparison, of course, is that it entirely neglects the cost of refining the crude oil versus the cost of refining the feedstock.

Benemann et al. estimate that the algal farms described in their article can produce 1 kWhr for $0.15, given the estimated yields and production costs of the algae. This includes taxes, transportation, and sundry other costs. Estimates for the cost to produce a barrel of crude oil vary widely, from $1.50 for OPEC oil[7] (2005) to approximately $10.00 for oil in the U.S.[8] (2003), or $14.00[9] (2003). The second figure is from the DOE, and so I'm inclined to trust it more. If a barrel of oil provides 5.8 million BTU = 1714 kWhr[10], and we say that a barrel of oil costs $10 to produce, then 1 kWhr from oil costs only $0.0058, about a thirtieth as much as from algae under Benemann et al.'s scheme. This directly contradicts Benemann's own statement that the cost of a kWhr from biomass-oil-producing algae "is about a factor of two to three-fold higher than current costs for fossil fuels".

It seems like a lot of the data we have on costs for fuels etc. is mutually contradictory. Ugh. -- Jason Leith
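
For reference, here is the barrel-to-kWh arithmetic above made explicit; the per-barrel production costs and the 1,714 kWh/barrel figure are the ones quoted in the paragraph, not independent estimates.

```python
# Cost per kWh of heat from crude oil at the production-cost estimates
# quoted above (OPEC, US per the DOE, and a higher US estimate).
KWH_PER_BARREL = 1714   # the 5.8 million BTU/barrel figure quoted above

for cost_per_barrel in (1.50, 10.00, 14.00):
    cost_per_kwh = cost_per_barrel / KWH_PER_BARREL
    print(f"${cost_per_barrel:5.2f}/bbl  ->  ${cost_per_kwh:.4f}/kWh")
# At $10/bbl this gives ~$0.0058/kWh, the figure used in the comparison above.
```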

Biodiesel Readings

  • A DOE report on research needs for further development of the biomass-oil industry in the United States[11].
  • The DOE report on biomass oil from algae[12], which Briggs draws heavily upon for his article[13].
  • The conference paper that discusses algae farms in the desert Southwest, which forms the basis of Briggs's discussion[14].


  • Toda M. et al. (2005) Green chemistry: Biodiesel made with sugar catalyst. Nature 438, 178. Full text

Abstract: The production of diesel from vegetable oil calls for an efficient solid catalyst to make the process fully ecologically friendly. Here we describe the preparation of such a catalyst from common, inexpensive sugars. This high-performance catalyst, which consists of stable sulphonated amorphous carbon, is recyclable and its activity markedly exceeds that of other solid acid catalysts tested for 'biodiesel' production.

  • Reuters (2005) Virgin Airways boss eyes plants for fleet fuel 16 November 2005. [15] Discusses the possibility of using vegetable fuel for aircraft as well as for automobiles and ships.
  • Biodiesel Vehicle Fuel: GHG Reductions, Air Emissions, Supply and Economic Overview by Lawrence Schmidt, March 11, 2004
    "Economic Viability: Research shows that biodiesel is not currently cost competitive with petroleum diesel. However, at a B20 blend with some form of tax exemption on the biodiesel portion of the fuel, per litre costs range close to or slightly less than the retail cost of fully taxed petroleum diesel. Furthermore, assuming a trend toward higher costs for petroleum diesel and decreased costs of biodiesel production (increased technology efficiencies), economic viability of biodiesel as a niche fuel seems probable over the short term, relying on some for of modest tax exemption."
  • A WORLD WIDE REVIEW OF THE COMMERCIAL PRODUCTION OF BIODIESEL – A technological, economic and ecological investigation based on case studies, by Mag. Stephan Friedrich, Wien 2004. "INTRODUCTION: As early as the beginning of the 20th century Rudolf Diesel proposed vegetable oil as fuel for his engine. A short time later, before and during World War Two, vegetable oil was examined in "up-to-date" diesel engines. In 1940 first trials with vegetable oil methyl and ethyl esters were carried out in France and, at the same time, scientists in Belgium were using palm oil ethyl ester as a fuel for buses. In 1973, the oil crisis refocused attention on and interest in local energy sources. In recent decades, research concerning and knowledge about the external benefits of renewable raw materials have intensified the efforts for sustainable energy sources. Biodiesel plays a major role in this field because of the world-wide research, development and deployment activities around this sustainable energy source. Due to this recent increased awareness and development, the objective of this thesis is to provide a world-wide review of the production of Biodiesel. For the description of the Biodiesel scene a three-stage development is referred to:
    • Phase I consists of the very first ideas and thoughts of Biodiesel being used as a fuel until the actual adaptation of the ideas on the part of the decision makers who are then motivated to put these ideas into practice. The end of Phase I (=beginning of Phase II) is the political decision to invest money and other resources to Biodiesel research.
    • Phase II is characterised by research efforts, pilot projects, setting of frame conditions and financially supported technical trials.
    • Countries in Phase III show a Biodiesel economy based primarily on a feasible economic production, distribution and use of Biodiesel, and a self supportive Biodiesel economy."
  • Costs of Biodiesel Production, prepared for the Energy Efficiency and Conservation Authority by John Duncan, May 2003.
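
To make the B20 tax-exemption point from the Schmidt summary concrete, here is the toy per-litre cost comparison promised above. Every price and tax rate in it is an assumption invented for the illustration, not a figure from the report.

```python
# Toy per-litre cost of a B20 blend (20% biodiesel, 80% petroleum diesel),
# with and without an excise-tax exemption on the biodiesel fraction.
# All prices and tax rates below are assumptions for illustration only.
diesel_pretax = 0.60   # $/L, petroleum diesel before tax (assumed)
fuel_tax      = 0.20   # $/L excise tax (assumed)
biodiesel     = 0.75   # $/L, biodiesel production cost (assumed)

diesel_retail = diesel_pretax + fuel_tax

# B20 with the biodiesel fraction exempted from the excise tax:
b20_exempt = 0.8 * (diesel_pretax + fuel_tax) + 0.2 * biodiesel
# B20 with the excise tax applied to the whole blend:
b20_taxed = 0.8 * diesel_pretax + 0.2 * biodiesel + fuel_tax

print(f"fully taxed diesel       : ${diesel_retail:.3f}/L")
print(f"B20, biodiesel tax-exempt: ${b20_exempt:.3f}/L")   # slightly cheaper
print(f"B20, fully taxed         : ${b20_taxed:.3f}/L")    # more expensive
```

With these assumed numbers the exemption is exactly what tips B20 just under the retail cost of fully taxed diesel, matching the quoted conclusion.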

Articles that suggest potential problems with biodiesel

  • Particle size and mutagenic effects due to biodiesel[16].
  • Various challenges, costs, etc. A thorough article that explains technical and economic details of biodiesel and ethanol production. Advocates a "cradle-to-grave" approach to determining the costs and benefits of the use of a technology, not dissimilar to what Morten has said.[17]

Links regarding (renewable) energy

Biocomplexity and Randomness readings

The problem with defining complexity and randomness is that they both already have very precise technical definitions, relating them to algorithmic information theory and Gödel's Incompleteness Theorem in surprising and insightful ways.

Gregory Chaitin defines the complexity of something as "the size of the smallest program which computes it or a complete description of it".

Irreducible randomness is therefore defined as "those binary strings of length n that require programs of about length n" to compute.
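
Chaitin's program-size complexity is uncomputable, but an off-the-shelf compressor gives a crude, computable upper bound that illustrates the definition: a string can only be algorithmically random if nothing much shorter than the string itself (here, its compressed form) reproduces it. A minimal sketch:

```python
# Compressed size as a rough upper bound on program-size complexity:
# a structured string compresses far below its length; a random string
# is (with high probability) incompressible.
import os
import zlib

def compressed_size(s: bytes) -> int:
    return len(zlib.compress(s, 9))

regular = b"01" * 500        # highly structured: a tiny "program" suffices
random_ = os.urandom(1000)   # incompressible with high probability

print("regular:", compressed_size(regular), "bytes for", len(regular))
print("random: ", compressed_size(random_), "bytes for", len(random_))
```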

Crutchfield and Young attempt to define a measure of the chaos in a system using statistical mechanics. While this is useful for describing the complexity of the logistic equation at different parameter values, our intuitive notion of complexity says that nothing the logistic equation produces could really be considered complex, simply because the equation that produces it is so simple. We want a measure of complexity that enables us to identify when a system acts as an autonomous agent.

According to Investigations, Stuart Kauffman defines a molecular autonomous agent as a self-reproducing system that is capable of performing at least one thermodynamic work cycle. This definition correctly classifies all living things as autonomous agents, along with some artifacts that can only be produced by living beings.

However appealing that definition is, it is not rigorous enough for our purposes. We wish to have a measure, and this definition does not make it obvious how one would calculate the "degree of life".

We need a better word than complexity, because it has so many meanings. We choose the word interrelatedness, because what makes a system complex is that it is composed of many interrelated parts, and the relations result in a system whose overall description is simpler than the description of each of the parts. Furthermore, the definition should include the notion that complex systems tend to have organization at multiple scales. That is, the interrelatedness measure should be able to take into account measurements taken at different resolution levels.

In Towards a mathematical definition of "life", Chaitin incorporates a measure that takes into account multiple scales and mutual information. One first measures the mutual information of the whole ensemble, identifying clustered regions of interrelatedness at every scale. For example, the amount of genetic information in DNA is well approximated by the mutual information of each cell because all the other uncorrelated positions and momenta will cancel each other out. Therefore, there is an important cluster of mutual information whose diameter is approximately the size of a cell. At larger scales, clusters of mutual information at the characteristic sizes of tissues and organs will appear. A complex system is therefore one in which clusters of mutual information exist at all scales. Equivalently, Fourier transforms of complex systems exhibit structure at many different wavelengths. Finally, he is able to show that the resulting measure is capable of assigning higher complexity measures to sequences with bilateral symmetry and large repeated sequences than it does to simple repeated sequences and large random numbers. --JeremyZucker 17:36, 12 January 2006 (EST)
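
One rough numerical handle on "clusters of mutual information at every scale" is to estimate the mutual information I(k) between symbols a distance k apart from empirical pair frequencies; organization at a characteristic scale then shows up as a peak in I(k). The sequence below is a hypothetical construction with repeats of length 4, chosen for the sketch.

```python
# Estimate I(k), the mutual information between symbols k positions apart,
# from empirical pair frequencies. A sequence of random 4-letter blocks,
# each emitted twice, plants structure at lag 4; an i.i.d. sequence would
# give I(k) near zero at every lag.
import math
import random
from collections import Counter

def mutual_information(seq, k):
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(p[0] for p in pairs)
    p_y = Counter(p[1] for p in pairs)
    return sum((c / n) * math.log2(c * n / (p_x[x] * p_y[y]))
               for (x, y), c in p_xy.items())

random.seed(0)
blocks = ("".join(random.choice("ACGT") for _ in range(4)) for _ in range(500))
seq = "".join(b + b for b in blocks)
for k in (1, 2, 4, 8):
    print(f"I({k}) = {mutual_information(seq, k):.3f} bits")  # peak at k = 4
```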