Sequence and Structure of Biopolymers

In the previous section we discussed the basic physics of homopolymers, i.e. chains of segments which are all the same. As a result, their physics is that of rather homogeneous blobs and globules. The two cases we focused on were

Random-walk polymer:
Size $R \approx N^{1/2} b$
Volume $V \approx N^{3/2} b^3$
Concentration $c = N/V \approx 1/(N^{1/2} b^3)$
Number of self-contacts $\approx N b^3 c \approx N^{1/2}$

Collapsed polymer ($w < -b^3$):
Size $R \approx N^{1/3} b$
Volume $V \approx N b^3$
Concentration $c = N/V \approx 1/b^3$
Number of self-contacts $\approx N b^3 c \approx N$
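
To get a feel for these scaling estimates, the short sketch below evaluates them for a given chain length. It is only illustrative: prefactors of order one are dropped, lengths are measured in units of b by default, and the function names are hypothetical.

```python
def random_walk_estimates(N, b=1.0):
    """Scaling estimates for a random-walk coil (order-one prefactors dropped)."""
    R = N ** 0.5 * b              # size R ~ N^(1/2) b
    V = R ** 3                    # volume V ~ N^(3/2) b^3
    c = N / V                     # segment concentration c = N/V
    contacts = N * b ** 3 * c     # number of self-contacts ~ N b^3 c
    return {"R": R, "V": V, "c": c, "contacts": contacts}


def collapsed_estimates(N, b=1.0):
    """Scaling estimates for a collapsed globule (order-one prefactors dropped)."""
    R = N ** (1.0 / 3.0) * b      # size R ~ N^(1/3) b
    V = R ** 3                    # volume V ~ N b^3
    c = N / V                     # segment concentration ~ 1/b^3
    contacts = N * b ** 3 * c     # number of self-contacts ~ N
    return {"R": R, "V": V, "c": c, "contacts": contacts}


if __name__ == "__main__":
    for N in (100, 10_000):
        coil = random_walk_estimates(N)
        globule = collapsed_estimates(N)
        print(f"N = {N}: coil contacts ~ {coil['contacts']:.0f}, "
              f"globule contacts ~ {globule['contacts']:.0f}")
```

The coil has only of order $N^{1/2}$ self-contacts, while in the globule essentially every segment touches another.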

In both these cases, the number of equivalent states is large, i.e. exponential in N. We can roughly count the number of states if we consider polymers drawn on a cubic lattice, where the lattice step is equal to the polymer segment length b.

For the simple random walk on the lattice, we have $W = 6^N$ since at each step there are six choices for the direction.

If we disallow reversal of direction, which causes sharp hairpins, we have $W \approx 5^N$, still large.

For the collapsed polymer, it turns out that the number of states is still exponential in N; numerical studies indicate that $W \approx 1.85^N$ for the simple cubic lattice (see Pande et al, J. Phys. A 27, 6231 (1994) for a recent numerical study).

One other case worth mentioning is that of the self-avoiding walk on the simple cubic lattice, for which $W \approx 4.7^N$ (see de Gennes' book, p.39 for more details).

In any case, we see that all of these homopolymer cases are characterized by exponentially many states, $W \approx z^N$, and therefore by an entropy proportional to the number of segments, $S/k_B = \ln W \propto N$.
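
As a small illustration of this counting, the sketch below converts the values of z quoted above into an entropy per segment, $S/(N k_B) = \ln z$, and into the order of magnitude of W for a given N. The dictionary labels and function names are hypothetical; the z values are the ones given in the text.

```python
import math

# Effective number of choices per step, z, as quoted in the text.
Z_VALUES = {
    "simple random walk": 6.0,
    "no immediate reversal": 5.0,
    "self-avoiding walk": 4.7,
    "collapsed polymer": 1.85,
}


def entropy_per_segment(z):
    """Entropy per segment in units of k_B, S/(N k_B) = ln z, for W = z**N."""
    return math.log(z)


def log10_states(z, N):
    """log10 of the number of states W = z**N (avoids overflow for large N)."""
    return N * math.log10(z)


if __name__ == "__main__":
    N = 100
    for name, z in Z_VALUES.items():
        print(f"{name}: S/(N kB) = {entropy_per_segment(z):.2f}, "
              f"W ~ 10^{log10_states(z, N):.0f} for N = {N}")
```

Even the most restrictive case, the collapsed polymer, has an astronomically large number of states for a chain of only 100 segments.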


The biological polymers we have been focusing on - proteins and nucleic acids - are distinguished in being heteropolymers with particular sequences of segments. Each segment of a protein can be one of 20 amino acids; each segment of a nucleic acid can be one of 4 nucleotides. Their sequence of segments determines the interactions between different parts of these molecules, and governs their folding into unique structures.

For these folded structures, considered at the spatial resolution of our lattice model, $W \ll z^N$.
In some cases (e.g. globular proteins with one structure) really only one conformational state is populated, i.e. W = 1.


Nucleic acid complementary base pairing (hybridization)

Nucleic acids (NAs) are held together by hydrogen bonding and base stacking, usually acting between complementary bases (a=t and g≡c for DNA).

Base-pairing defines both the DNA double helix, and the stem regions of RNA stem-loop (or helix-loop) structures. I will tend to talk about DNA, but most of the following comments apply to RNA base-paired stem structures as well.

We can easily understand the stability of these structures by studying base-unpairing fluctuations for a few cases.

1. Unzipping from an end:

Suppose we have a DNA double helix made of complementary-sequence strands, for example:

5'-ccatgattcg-3'
3'-ggtactaagc-5'

In general, let's talk about the total length of the strands as N nucleotides, so in the specific example above we have N = 10. Now consider the unzipping of this double helix from the right end. If we unpair n bases, leaving the other N-n bases paired, we will get a `frayed' molecule.

In our example, if n = 5 we have

         ttcg-3'
        a
5'-ccatg
3'-ggtac
        t
         aagc-5'

We can imagine that for each base-pair that we break apart, there is an energy cost $\epsilon$. Of course, this energy must depend on the sequence, and great effort has been made to determine the `right' values of $\epsilon$ for each base pair.

One of the most commonly used models for base-pairing free energies is due to Breslauer et al (Proc. Natl. Acad. Sci. USA 83, 3746 (1986)). It actually considers the energies of breaking adjacent bases, with application to predicting the melting temperature of short DNAs in mind. Table II of Breslauer et al shows how {a,t}-rich regions contribute (Gibbs) free energies of about 2 kcal/mol per base pair, while {g,c}-rich regions contribute closer to 3-4 kcal/mol per base pair. Using $1\, k_B T \approx 0.6$ kcal/mol at 300 K tells us that we have something like $3\, k_B T$ of Gibbs free energy holding a-t pairs together, and more like $5$-$6\, k_B T$ holding g-c pairs together.

These data are for rather strongly bound double helices at 25 °C and 1 M NaCl; at more physiological salt concentrations of 0.1 M NaCl, the Gibbs free energies of base-pairing are closer to $2\, k_B T$ for a-t pairs, and $4\, k_B T$ for g-c pairs. Therefore we can roughly use $\epsilon = 3\, k_B T$ as the free energy difference between paired and unpaired bases in our unzipping example.
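
The unit conversion used above can be checked with a few lines of code. This is a minimal sketch: the gas-constant value is the standard one in kcal/(mol K), and the function names are hypothetical.

```python
# Molar gas constant in kcal/(mol K); k_B T per molecule corresponds to R T per mole.
R_KCAL_PER_MOL_K = 1.987204e-3


def kT_in_kcal_per_mol(T_kelvin=300.0):
    """Thermal energy k_B T expressed in kcal/mol at temperature T (in Kelvin)."""
    return R_KCAL_PER_MOL_K * T_kelvin


def kcal_per_mol_to_kT(dG_kcal_per_mol, T_kelvin=300.0):
    """Convert a free energy in kcal/mol to units of k_B T."""
    return dG_kcal_per_mol / kT_in_kcal_per_mol(T_kelvin)


if __name__ == "__main__":
    print(f"kB T at 300 K = {kT_in_kcal_per_mol():.3f} kcal/mol")
    for dG in (2.0, 3.0, 4.0):
        print(f"{dG} kcal/mol = {kcal_per_mol_to_kT(dG):.1f} kB T")
```

With these numbers, 2 kcal/mol is a little over 3 $k_B T$ and 3-4 kcal/mol is roughly 5-7 $k_B T$, consistent with the rounded values in the text.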

So, the work that must be done to unzip n bases will be something like

$$\Delta G = n \epsilon \approx 3 n k_B T$$
 
Therefore we can use the Boltzmann distribution to estimate the probability that n bases are unzipped:
$$P(n) \propto \exp[-\Delta G/(k_B T)] = \exp[-n\epsilon/(k_B T)]$$
 
This simple calculation already tells us something important. Although the individual interactions between bases are not strong, only $3\, k_B T$ per base pair, they quickly add up to make disassembly of even a few bases extraordinarily unlikely. Plugging in $\epsilon = 3\, k_B T$ tells us the probability of unzipping n bases, relative to just leaving the molecule in its double-helix form:
$$\frac{P(n)}{P(0)} = \exp[-3n]$$
 
Unzipping n = 2 bases is suppressed by a factor $e^{-6} \approx 0.0025$ relative to the fully paired state. Unzipping of longer runs of bases becomes impressively unlikely.
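
The rapid fall-off of this Boltzmann factor is easy to tabulate. The sketch below assumes the same rough value of 3 $k_B T$ per base pair used in the text; the function name is hypothetical.

```python
import math


def relative_unzip_probability(n, eps_kT=3.0):
    """P(n)/P(0) = exp(-n * eps / kB T) for n end base pairs unzipped.

    eps_kT is the pairing free energy per base pair in units of kB T
    (3 is the rough average adopted in the text).
    """
    return math.exp(-eps_kT * n)


if __name__ == "__main__":
    for n in range(0, 6):
        print(f"n = {n}: P(n)/P(0) = {relative_unzip_probability(n):.2e}")
```

Each additional unpaired base costs another factor of $e^{-3} \approx 0.05$.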

Note that for the case n = N where the two strands actually get pulled completely apart, there is now additional entropy associated with the relative translation of the now-separated ssDNAs. This entropy is roughly

$$S_{\rm trans} \approx -k_B \ln(b^3 c)$$
 
where c is the (molecular) concentration of the DNA in solution. Therefore the relative probability of the fully unzipped to zipped states is
$$\frac{P(N)}{P(0)} = \frac{\exp[-3N]}{b^3 c}$$
 
If N is not too large, low concentrations can make dissociation of the two strands likely. But once N > 10, the exponential energy dependence starts to make it impossible for the strands to come apart even at very low concentration.
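
To see where the crossover lies, the sketch below evaluates the ratio of fully unzipped to fully zipped probabilities for a few strand lengths. The segment length b = 0.7 nm (the single-stranded DNA value quoted later in these notes) and the 1 nM concentration are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITER = 1e24       # cubic nanometers in one liter


def b3c(conc_molar, b_nm=0.7):
    """Dimensionless product b^3 c for a molar concentration (b in nm, assumed)."""
    number_density_per_nm3 = conc_molar * AVOGADRO / NM3_PER_LITER
    return b_nm ** 3 * number_density_per_nm3


def dissociation_ratio(N, conc_molar, eps_kT=3.0, b_nm=0.7):
    """P(N)/P(0) = exp(-eps N / kB T) / (b^3 c): fully unzipped vs fully zipped."""
    return math.exp(-eps_kT * N) / b3c(conc_molar, b_nm)


if __name__ == "__main__":
    conc = 1e-9  # 1 nM, an illustrative (assumed) concentration
    for N in (4, 6, 8, 10, 12):
        print(f"N = {N:2d}: P(N)/P(0) = {dissociation_ratio(N, conc):.2e}")
```

Under these assumptions the ratio drops from much greater than one for very short duplexes to much less than one by about N = 10.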

The net result is that not-too-long DNAs (e.g. 24 bp) are extremely stable against accidental disassembly by randomly acting thermal forces. On the other hand, two strands of DNA can be pulled apart by directed forces which progressively do $\approx 3\, k_B T$ of work per base pair.


Problem: Consider an N-base-pair double helix, held together by base-pairing interactions contributing $\epsilon = 3\, k_B T$ of cohesive energy per base-pair. Estimate the lifetime or off-time for the two strands, given that a new `attempt' at unzipping is made every $t_0 \approx 10^{-11}$ sec.

To do this you will need to interpret the Boltzmann distribution as giving the probability for a single attempt to excite an unzipped state. This assumption is widely used as a simple kinetic model for thermally excited processes.

Plot your result for lifetime as a function of N, for N < 20. You will want to use a logarithmic scale for the lifetime. How long does a dsDNA have to be for it to be considered as `stable'?


Problem: Roughly estimate the force that must be applied to the two strands of a double helix to pull them apart (Hint: think of the work done per base pair unzipped).


Problem: Using the pure harmonic model for the weak-stretching behavior of a flexible polymer as a model for the elasticity of single-stranded DNA ($b \approx 0.7$ nm), find the number of bases which unzip as a function of the displacement x of the two strand ends. At what force does unzipping begin to occur?

Impressive experiments on unzipping of long double helices have been done in the last few years. See especially Essevaz-Roulet et al, Proc. Natl. Acad. Sci. USA 94, 11935 (1997).


Problem: Everything we have done up to now considers base-pairing at 25 °C. However, in the biochemistry lab dsDNAs are often unzipped using elevated temperatures. The model presented in the paper by Breslauer et al allows one to compute the relative Gibbs free energy of double-helix vs. single-stranded DNA at any temperature, from $\Delta G = \Delta H^0 - T \Delta S^0$ where T is in Kelvin.

Find the temperature dependence of DG for the 12-mer

5'-aggtcgccgccc-3'
3'-tccagcggcggg-5'

using the model of Breslauer et al (Proc. Natl. Acad. Sci. 83, 3746 (1986)).

At what temperature does this double helix `melt'?

Why is it not very important to worry about the `ideal gas' translational entropy

$$S_{\rm trans} \approx -k_B \ln(b^3 c)$$
 
of the two strands in the unpaired state (c is concentration of the oligomers)?


Even a moderate number of weak interactions acting in parallel can make large structures very stable against thermal fluctuation. This strategy is used over and over again, to fold biomolecules into their active forms, and to assemble folded biomolecules into larger multi-molecule structures.


Bubble in middle of hybridized nucleic acid

We might worry about the probability of a single-stranded bubble opening up in the middle of a double helix or RNA stem:

Suppose n base-pairs unpair, opening a bubble of 2n bases. As before, the energy associated with breaking the n base-pairs will be $n\epsilon$, where $\epsilon \approx 3\, k_B T$ on average.

So far this is the same as the end-fraying discussed above. But for a loop, there is the constraint that the loop close. We previously discussed the entropy change associated with this, and found it to be $-(3/2)\, k_B \ln(2n)$. Therefore, the free energy to open n bases in the interior of a dsNA region is

$$\Delta F_{\rm bubble} = n\epsilon + \frac{3}{2} k_B T \ln(2n)$$
 
Well, this gives us the probability of opening n bases, up to a normalization constant:
$$P_{\rm bubble}(n) \propto \exp[-\Delta F_{\rm bubble}/(k_B T)] = \frac{\exp[-n\epsilon/(k_B T)]}{(2n)^{3/2}}$$
 
The loop entropy adds insult to injury: internal loops caused by n base-pairs spontaneously breaking are even less likely to occur than end-fraying, as long as $\epsilon$ is a few $k_B T$.
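
The comparison between end-fraying and internal bubbles can be made concrete with the two unnormalized Boltzmann weights. This is a sketch assuming the rough value of 3 $k_B T$ per base pair; the function names are hypothetical.

```python
import math


def fray_weight(n, eps_kT=3.0):
    """Unnormalized weight exp(-n eps / kB T) for fraying n base pairs at an end."""
    return math.exp(-eps_kT * n)


def bubble_weight(n, eps_kT=3.0):
    """Unnormalized weight exp(-n eps / kB T) / (2n)^(3/2) for an internal bubble."""
    return math.exp(-eps_kT * n) / (2 * n) ** 1.5


if __name__ == "__main__":
    for n in range(1, 6):
        ratio = bubble_weight(n) / fray_weight(n)
        print(f"n = {n}: fray = {fray_weight(n):.2e}, "
              f"bubble = {bubble_weight(n):.2e}, bubble/fray = {ratio:.3f}")
```

The ratio of the two weights is just $(2n)^{-3/2}$, so the loop-closure penalty grows only as a power of n while the base-pairing penalty grows exponentially.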

The one situation where end fraying and internal bubbles become important is when $\epsilon \to 0$, which occurs as the temperature approaches the melting temperature, which is between 40 and 60 °C depending on the DNA sequence.


RNA stem-loop structure

RNAs are organized into a hierarchy of stem-loop structures, each of which can be considered to be a short base-paired `stem' region terminated by a ssRNA loop.  Here is a picture of a long loop formed by unpairing n base-pairs:

Such loops are always at least 4 bases in circumference. This lower limit on loop size follows from simple physical considerations, which we can see by examining very short loops formed by unpairing only a few bases:
 

We suppose that we have an RNA sequence that could completely base-pair. However, to do this would require a sharp `hairpin' bend which would cost a lot of energy (and which would really be stereochemically impossible). This large energy cost can be thought of in terms of a bending energy contribution from the loop. If there are 2n unpaired bases in the loop (i.e. n unpaired complementary base-pairs) the net stem-loop free energy has the form:

$$F_{\rm loop} = n\epsilon + \frac{3}{2} k_B T \ln(2n) + \text{bending energy}$$
 
A small loop would look something like this (rather magnified compared to the previous sketch so as to show the individual bases in the loop):

What should we use for the bending energy of the 2n unpaired bases? The 2n unpaired bases form a polygon with 2n+1 sides, and the bending angle at each vertex is $\theta = 2\pi/(2n+1)$.

We can suppose that at each vertex there is a bending energy $\propto \theta^2$, and thus a rough model for the bending energy is

$$\text{bending energy} = \kappa (2n+1)\, \theta^2 = \frac{4\pi^2 \kappa}{2n+1}$$
 
The bending constant $\kappa$ can be expected to be on the order of a few $k_B T$, so that bends of more than a radian cost a relatively large amount of energy.
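
To see how steeply this bending energy rises for small loops, the sketch below evaluates the polygon model for a few values of n. The value of 3 $k_B T$ for the bending constant is an assumption within the "few $k_B T$" range mentioned above, and the function name is hypothetical.

```python
import math


def bending_energy_kT(n, kappa_kT=3.0):
    """Bending energy (in kB T) of a loop of 2n unpaired bases.

    The loop is modeled as a polygon with 2n + 1 sides, a bend angle
    theta = 2 pi / (2n + 1) at each vertex, and an energy kappa * theta^2
    per vertex. kappa_kT = 3 is an assumed value of the bending constant.
    """
    sides = 2 * n + 1
    theta = 2.0 * math.pi / sides
    return kappa_kT * sides * theta ** 2


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"n = {n} (loop of {2 * n} bases): "
              f"bending energy = {bending_energy_kT(n):.1f} kB T")
```

For n = 1 the bending cost is tens of $k_B T$, far more than the pairing energy gained by closing one more base pair, which is why very tight hairpins are disfavored.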

This gives a net loop free energy of

$$F_{\rm loop} = n\epsilon + \frac{3}{2} k_B T \ln(2n) + \frac{4\pi^2 \kappa}{2n+1}$$
 
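For large n, the base-pairing energy and loop entropy contribute a diverging free energy cost, and for small n, the bending energy produces a large upturn in energy. Therefore the minimum free energy occurs at some intermediate n. Ignoring the ln term, and dropping a factor of 2 that a strict derivative with respect to n would give (well within the accuracy of this rough model), the minimum occurs when
$$\epsilon \approx \frac{4\pi^2 \kappa}{(2n+1)^2}$$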
 
or for
$$2n+1 = \left( \frac{4\pi^2 \kappa}{\epsilon} \right)^{1/2}$$
 
 
 
Given $\kappa = 3\, k_B T$ and $\epsilon = 3\, k_B T$ we find the loop size to be
$$2n = (4\pi^2)^{1/2} - 1 \approx 5.3$$
 
In fact, RNA and DNA have a minimum size of this type of loop of about 4 bases.
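
The closed-form estimate quoted above depends only on the ratio of the bending constant to the pairing energy, so it is easy to explore how sensitive the preferred loop size is to that ratio. The sketch below simply evaluates the text's formula; the function name and the choice of ratios other than 1 are illustrative assumptions.

```python
import math


def loop_size_estimate(kappa_over_eps):
    """Loop size 2n from the rough estimate 2n + 1 = sqrt(4 pi^2 kappa / eps).

    kappa_over_eps is the ratio of the bending constant to the pairing
    free energy per base pair (the text uses a ratio of 1).
    """
    return math.sqrt(4.0 * math.pi ** 2 * kappa_over_eps) - 1.0


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 2.0):
        print(f"kappa/eps = {ratio}: 2n = {loop_size_estimate(ratio):.1f} bases")
```

Because the loop size scales only as the square root of the ratio, order-of-magnitude agreement with the observed minimum of about 4 bases does not require fine tuning of either parameter.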

The model presented here is very rough but illustrates the basic physics that controls the size of single-stranded nucleic acid loops.



File translated from TEX by TTH, version 2.53.
On 9 Mar 2001, 13:12.