The Protein Folding Problem

To carry out its assigned function, a protein molecule must adopt a specific three-dimensional form, called its “native” state, as it emerges from the ribosome. Under the “native” conditions of the cell, the majority of protein molecules exist in their native state. Protein folding is the process that takes a polypeptide chain from its linear amino acid sequence to the defined spatial structure characteristic of the native state. How a protein’s amino acid sequence determines its three-dimensional atomic structure is, in essence, “the protein folding problem”.

Protein folding mechanisms were first investigated by Christian Anfinsen in the late 1950s.  In a series of experiments, he demonstrated that the denaturation of a protein could be reversed (i.e., the denatured protein could be refolded to its native structure) without any auxiliary agent.

 

Anfinsen's experiment

Anfinsen’s experimental enzyme was bovine pancreatic ribonuclease. The protein, 124 amino acid residues long, has eight cysteines that form four disulfide bonds (–CH2–S–S–CH2–). A reducing agent was used to cleave the disulfide bonds in solution. Next, urea was added up to a concentration of 8 M to denature (or unfold) the protein. The denatured protein did not show any enzymatic activity.

 

At this point, if urea was removed first and an oxidizing agent was then added to allow the disulfide bonds to reform, the product obtained was practically indistinguishable from the starting native protein and had regained full biological activity. In contrast, if the oxidizing agent was added first and urea was withdrawn afterwards, the product obtained was a mixture of many (or all) of the 105 possible isomeric disulfide-bonded forms. The mixture regained barely 1% of the activity of the native enzyme.
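
The figure of 105 isomeric forms follows from counting the ways eight cysteines can be paired into four disulfide bonds: 7 × 5 × 3 × 1 = 105. The short Python sketch below checks this both by formula and by brute-force enumeration. The function names and the labels C1 to C8 are illustrative only. If one assumes (and this is only an assumption) that all 105 forms are equally likely in the scrambled mixture, the native pairing would make up about 1/105 of it, which is in line with the roughly 1% activity reported above.

```python
from math import prod

def count_disulfide_pairings(n_cysteines):
    """Number of ways to pair all cysteines into disulfide bonds."""
    if n_cysteines % 2 != 0:
        raise ValueError("need an even number of cysteines to pair them all")
    # The first cysteine can pick any of (n-1) partners, the next unpaired
    # one any of (n-3), and so on: (n-1) * (n-3) * ... * 3 * 1.
    return prod(range(n_cysteines - 1, 0, -2))

def enumerate_pairings(residues):
    """Yield every complete pairing of the given residues as a list of tuples."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail

# Hypothetical labels for the eight cysteines of the enzyme.
cysteines = [f"C{i}" for i in range(1, 9)]

n_by_formula = count_disulfide_pairings(len(cysteines))
n_by_enumeration = sum(1 for _ in enumerate_pairings(cysteines))

# ASSUMPTION: every pairing is equally likely in the scrambled product.
fraction_native_by_chance = 1 / n_by_formula

print(n_by_formula, n_by_enumeration)
print(f"{fraction_native_by_chance:.2%}")
```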

         

Schematic representation of unfolding and refolding of bovine pancreatic ribonuclease. Symbols used: R – reducing agent; U – up to 8 M urea; O – oxidizing agent; red dot – cysteine; red line – disulfide bond; + indicates addition; – indicates withdrawal.

Levinthal’s paradox

Evidently, Anfinsen’s experiment established that the amino acid sequence of a protein determines its native structure and gave rise to the protein folding problem.  In fact, three different but interrelated problems are associated with protein folding: (a) the “folding code” that would indicate the particular combination of interatomic forces dictating the native three-dimensional structure of the protein; (b) computational “structure prediction” of the protein from its amino acid sequence; and (c) kinetics and thermodynamics of the remarkably rapid “folding process”. 

 

An unfolded polypeptide chain usually has a large number of degrees of freedom, so the molecule can exist in a vast conformational space (about 10^143 possible conformations, as estimated by Cyrus Levinthal). Clearly, if a protein chain “walked” randomly through this conformational space to arrive at the correctly folded state, it would take an astronomically long time. Yet most proteins fold in milliseconds to seconds. This contradiction is known as Levinthal’s paradox.
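
To get a feel for the scale of the paradox, the arithmetic can be sketched in a few lines of Python. The number of conformations is taken from the text. The sampling rate of 10^13 conformations per second is an assumption chosen purely for illustration, and the age of the universe is a rounded value. The calculation works with base-10 logarithms to keep the numbers manageable.

```python
import math

# From the text: roughly 10^143 possible conformations.
LOG10_CONFORMATIONS = 143
# ASSUMPTION (not from the text): the chain tries 10^13 conformations per second.
LOG10_SAMPLES_PER_SECOND = 13
# Approximate age of the universe (~13.8 billion years) in seconds.
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600

def log10_exhaustive_search_seconds(log10_conformations, log10_rate):
    """log10 of the time (in seconds) needed to visit every conformation once."""
    return log10_conformations - log10_rate

log10_seconds = log10_exhaustive_search_seconds(
    LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND
)
log10_universe_ages = log10_seconds - math.log10(AGE_OF_UNIVERSE_SECONDS)

print(f"Exhaustive search: about 10^{log10_seconds} seconds")
print(f"That is about 10^{log10_universe_ages:.0f} times the age of the universe")
```

Even if the assumed sampling rate were wrong by many orders of magnitude, the conclusion would not change: an exhaustive random search cannot account for folding times of milliseconds to seconds.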

 

Yet, to Levinthal this was no paradox. In his view, which subsequently proved to be correct, the assumption of a random walk is not valid. Instead, he speculated that protein folding is accelerated and guided by the rapid formation of local interactions, which essentially determine the further folding of the polypeptide. Levinthal did not specify what those interactions are or how they speed up and guide protein folding.

Physical forces in protein folding

We now know that protein folding is driven by forces exerted on the atoms of the amino acid chain. The side chains are particularly relevant here, since they are what distinguishes one protein from another. Hence the folding code must be written in the side chains, and not the backbone.

 

The forces on certain atoms or groups of atoms arise from interactions with other atoms of the protein itself as well as with the solvent molecules (solvent-induced forces). The interactions can take the form of H-bonds, ion pairs, van der Waals contacts and the water-mediated hydrophobic effect. There has been, and still is, a debate as to which of these interactions is the dominant component of the folding code. None of them, however, has been fully ruled out.

Electrostatic interactions

Electrostatic (or ion pair) interactions, as we have seen in earlier chapters, arise from the charged side chains of protein molecules.  Folding is not likely to be dominated by these interactions since most proteins contain relatively few localized charged residues, and also protein stabilities are found to be mostly independent of pH (within a range) and salt concentration.

Hydrogen bonds

At one time, H-bonds were considered the dominant factor, since almost all possible H-bonding interactions are actually formed in native structures. H-bonds among backbone amides and carbonyls are certainly the key components of all secondary structures. Subsequently, it was found that the strength of an H-bond between a donor and an acceptor in a protein is effectively weakened by competing H-bonds with solvent water molecules. Hence, H-bonds cannot be considered a significant driving force for protein folding.

van der Waals interactions

The energy of van der Waals interactions in tightly packed folded proteins was found to be comparable to that of hydrophobic interactions.  Hence, the contribution of van der Waals interactions to protein folding has been recognized, but certainly not as a dominant factor.

Hydrophobicity – a major player

On the other hand, considerable evidence emerged in favor of hydrophobicity as the major player in protein folding. Two compelling observations were: (a) large negative free energies are associated with the transfer of non-polar solutes from water to organic solvents, and (b) most proteins have hydrophobic cores, with non-polar side chains tending to be buried in the core, sequestered from the polar environment of water molecules.

 

Yet, the issue of dominance is far from settled. Newer experimental and computational analyses found H-bonds, both within the protein and between the protein and solvent water molecules, to contribute at least as much as the hydrophobic effect. Further, when hydrophobic and hydrophilic functional groups, rather than individual atoms, were considered as the sites of interaction, the forces on the hydrophilic groups were found to be significantly stronger than those on the hydrophobic groups. Leaving the question of dominance aside, it may be inferred that the hydrophilic and hydrophobic groups play complementary roles in accelerating the folding process.

Protein energy landscape

For a protein molecule, the completely folded native structure is the lowest energy state. The problem of folding is to attain this thermodynamically stable state. Protein folding differs clearly from a simple chemical reaction. The latter proceeds from a reactant to a product through a pathway that is essentially a succession of individual chemical structures. An unfolded protein, in contrast, is not a single microscopic structure but an ensemble of many conformations. Hence, folding is a transition from disorder to order that requires the completely folded state to have minimum entropy as well.

 

Therefore, the entropy-energy thermodynamics of folding can be explained by considering the nature of the conformational space of the protein.  A mapping of the chain conformation to the intramolecular-plus-solvation free energy defines the energy landscape.  Further, using statistical mechanical approaches, the density of states (DOS), that is, the number of conformational states at each energy level, can be determined.  The logarithm of the DOS is the conformational entropy.  However, since an enormous number of conformational states are involved, simplified models are required to understand a protein’s DOS. 
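
As an illustration of what such a simplified model can look like, the Python sketch below computes the DOS and the conformational entropy for a deliberately crude toy chain. The model is an assumption made here for illustration and is not taken from the text: each of n residues can take m local conformations, exactly one of which is native, and every native residue lowers the energy by one unit (called epsilon in the comments). The entropy is reported in units of the Boltzmann constant, so that S(E) = ln DOS(E).

```python
from math import comb, log

def density_of_states(n_residues, n_states_per_residue):
    """Return {energy: number of conformations} for the toy model.

    ASSUMPTION: energy = -(number of native residues), in units of epsilon.
    """
    dos = {}
    for n_native in range(n_residues + 1):
        energy = -n_native
        # Choose which residues are native, then give each non-native
        # residue any of its (m - 1) non-native local conformations.
        dos[energy] = comb(n_residues, n_native) * (
            n_states_per_residue - 1
        ) ** (n_residues - n_native)
    return dos

def conformational_entropy(dos):
    """Entropy in units of k_B at each energy level: S(E) = ln DOS(E)."""
    return {energy: log(count) for energy, count in dos.items()}

# Hypothetical parameters: 10 residues, 3 local conformations each.
dos = density_of_states(10, 3)
entropy = conformational_entropy(dos)

for energy in sorted(dos):
    print(f"E = {energy:4d}  DOS = {dos[energy]:6d}  S/k_B = {entropy[energy]:.2f}")
```

In this toy model the lowest energy level contains a single conformation (the native one) and therefore has zero conformational entropy, while the higher energy levels each contain thousands of conformations.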

Entropy-energy “trade-off”

The folding process has been best described as an approach to, followed by a descent into, a funnel-shaped free energy landscape (Figure on right).  The unfolded state has the highest energy as well as entropy while the completely folded state has the lowest energy and entropy.  Folding proceeds through multiple pathways, each of which is a ‘trade-off’ between energy and entropy.  The energy landscape is accompanied by rules defining what configurations are available from a given configuration.  Thus, the possibility of “random walk” is ruled out.
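
The energy-entropy trade-off can be made concrete with a small numerical sketch. The toy model used here is an assumption for illustration only and is not the model behind the funnel picture: a chain of n residues, each with m local conformations of which one is native, where every native residue lowers the energy by epsilon. With k native residues, the free energy in units of kT is F(k)/kT = -k (epsilon/kT) - ln W(k), where W(k) is the number of conformations with exactly k native residues. The sketch shows how the free energy minimum moves from a disordered, high-entropy state to the native state as epsilon/kT grows.

```python
from math import comb, exp, log

def n_conformations(n, m, k):
    """Number of conformations with exactly k native residues (toy model)."""
    return comb(n, k) * (m - 1) ** (n - k)

def free_energy_profile(n, m, eps_over_kT):
    """F(k)/kT = E(k)/kT - S(k)/k_B for k = 0..n native residues."""
    return [
        -k * eps_over_kT - log(n_conformations(n, m, k))
        for k in range(n + 1)
    ]

def partition_function(n, m, eps_over_kT):
    """Sum of Boltzmann weights over all conformations, grouped by k."""
    return sum(
        n_conformations(n, m, k) * exp(k * eps_over_kT)
        for k in range(n + 1)
    )

def native_population(n, m, eps_over_kT):
    """Equilibrium fraction of chains in the single native conformation."""
    return exp(n * eps_over_kT) / partition_function(n, m, eps_over_kT)

# Hypothetical parameters: 10 residues, 3 local conformations each.
N, M = 10, 3
for eps_over_kT in (0.0, 1.0, 2.0, 4.0):
    profile = free_energy_profile(N, M, eps_over_kT)
    k_min = min(range(N + 1), key=lambda k: profile[k])
    p_native = native_population(N, M, eps_over_kT)
    print(f"eps/kT = {eps_over_kT:.1f}  "
          f"free energy minimum at k = {k_min}  "
          f"native population = {p_native:.4f}")
```

When the energy gain per native residue is small compared with kT, entropy wins and the chain sits among the many partly non-native conformations. When it is large, the low energy of the native state outweighs its lack of entropy. This simple model has a smooth free energy profile and therefore does not capture ruggedness or kinetic trapping.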

 

The loss of conformational entropy of the protein is compensated by an increase in solvent entropy due to the burial of non-polar side chains and the reorganization of the polar groups. Where energy and entropy do not cancel completely, free energy barriers arise. As a result, proteins can be transiently trapped in local free energy minima, as depicted by the ruggedness of the landscape. This is known as kinetic trapping. Eventually, the kinetically trapped proteins reach the native state.

Folding in cellular environment

The cytosolic environment of the cell is fairly crowded, with a protein concentration of ~300–400 mg/ml. In this milieu, spontaneous protein folding is likely to be error-prone, inefficient and time-consuming. Another problem is that, because biological activity requires conformational flexibility, proteins are generally only marginally stable and, therefore, susceptible to misfolding. Non-specific interactions in the misfolded states often lead to the formation of toxic aggregates. This causes the loss of protein function and, further, the accumulation of toxic protein species may lead to diseases such as Alzheimer’s and Parkinson’s. To overcome this challenge, the cell engages a network of molecular chaperones.

 

Chaperones assist in a number of cellular processes such as de novo protein folding, refolding of stress-denatured or aggregated proteins and assembly of oligomeric proteins.  Some of the heat-shock proteins (Hsps), whose expression is up-regulated in response to heat-induced folding stress, function as chaperones.  Yet, there are chaperones which are abundant and functional under non-stress conditions.

Protein folding site

The mechanisms of protein folding are intimately related to the question of when and where in the cell the process occurs. It is thought that folding of a nascent polypeptide does not begin in the early stages of translation. Some secondary structures, such as the α-helix, do form during the passage of a nascent chain through the ribosomal exit channel. However, folding initiates outside the exit tunnel, once the polypeptide chain is exposed at the ribosomal surface and has had enough time to accumulate the sequence elements necessary for the formation of tertiary structures.

 

Cotranslational folding is dependent upon the nature of the nascent chain, speed of translation and interactions with relevant chaperones.  Some chaperones can associate with the ribosome to assist the nascent polypeptide early in the folding process (cotranslationally) while there are others which do not associate with the ribosome but act at a later stage of translation or after chain release.