reaction leaves the C of the carboxyl group directly linked to the
N of the amino group. This linked group of atoms (CONH) is called
the peptide bond. Polypeptides can be thought of as a string of
alpha carbons alternating with peptide bonds. Since each alpha carbon
is attached to an R-group, a given polypeptide is distinguished
by the sequence of its R-groups. In the protein databases, each
R-group is represented by a single letter of the English alphabet.
A = alanine
C = cysteine
D = aspartate
E = glutamate
F = phenylalanine
G = glycine
H = histidine
I = isoleucine
K = lysine
L = leucine
M = methionine
N = asparagine
P = proline
Q = glutamine
R = arginine
S = serine
T = threonine
V = valine
W = tryptophan
Y = tyrosine
Amino acid R-groups can be divided into four families: water-insoluble
(hydrophobic), water-soluble (hydrophilic), positively charged, and
negatively charged (both very hydrophilic).
The names of the amino acids in the table above have been color
coded according to family resemblances. The small amino acid glycine
is a special case as it effectively has no R-group.
The twenty amino
acids can be assigned to a musical scale, for example the C-major
scale below. A number of different criteria can be used in selecting
pitches to represent the amino acids. In this scale, the amino acids
are ordered by their relative hydrophobicity. Some of the choices
below are arbitrary; for example, Q, N, D and E all have the same
hydrophobicity. Click here to play this Amino Acid Scale. The
duration of each note varies with the number of DNA codons associated
with the amino acid. The DNA codons are represented by a harp playing
the three bases of each codon under its amino acid. The last three
codons to sound are stop codons and do not correspond to any amino
acid.
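The scale described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mapping actually used for the recordings: the hydrophobicity values are the Kyte-Doolittle hydropathy values (an assumed choice of scale), the starting pitch of MIDI note 48 is arbitrary, and the names HYDROPATHY, CODON_COUNT and build_scale are made up for this example. The most hydrophobic amino acid gets the lowest pitch, and each note lasts one beat per codon in the standard genetic code.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; higher = more hydrophobic).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Number of codons for each amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2,
    "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1,
    "Y": 2, "V": 4,
}

C_MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets within one octave
BASE_MIDI_NOTE = 48  # an arbitrary starting C, chosen only for this sketch


def build_scale():
    """Map each amino acid letter to (MIDI pitch, duration in beats)."""
    # Most hydrophobic first, so it receives the lowest pitch. sorted() is
    # stable, so tied residues (E, Q, D, N) simply keep their dictionary order.
    ordered = sorted(HYDROPATHY, key=lambda aa: -HYDROPATHY[aa])
    scale = {}
    for degree, aa in enumerate(ordered):
        octave, step = divmod(degree, 7)
        pitch = BASE_MIDI_NOTE + 12 * octave + C_MAJOR_STEPS[step]
        scale[aa] = (pitch, CODON_COUNT[aa])
    return scale


scale = build_scale()
```

With these assumptions, isoleucine sits at the bottom of the scale and arginine at the top, and the four tied residues land on neighboring notes in an order that is arbitrary, as the text notes.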
The linear order
of amino acids in a polypeptide is called its primary structure.
Primary structure is represented in the protein databases by a
string of the single letters, like a long word or sentence. The
order of letters is the order in which the amino acids were strung
together when the polypeptide was synthesized. This order is specified
by genetic information in the form of a string of DNA codons: sets
of three bases from the four-base DNA alphabet. Click here to see
a table of the Genetic Code. The letters below represent the amino acid
sequence of human calmodulin. Calmodulin is a calcium-binding protein;
the four calcium binding sites are underlined. Click here to see
the coding DNA for Calmodulin.
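The relationship between a string of DNA codons and a string of single letters can be sketched as follows. The codon table is the standard genetic code written in compact form; the function name translate and the short DNA string in the example are made up for illustration and are not taken from any database entry.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code; * = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string into one-letter amino acid codes."""
    protein = []
    # Read the string three bases at a time, ignoring any incomplete codon.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # a stop codon codes for no amino acid; translation ends
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCTGACCAACTGACTGAATAA"))  # prints MADQLTE
```

The three codons that map to "*" are the stop codons, which is why they have no note of their own in the amino acid scale.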
Assigning pitches to amino acids allows us to play the tune generated by the
sequence of amino acids in a polypeptide. Because
humans communicate using speech, our brains are very good at recognizing
sound patterns. Click on the link below to hear the patterns represented
in the primary sequence of calmodulin, played using a major scale
like that illustrated above.
tune + Calcium binding sites
The entire sequence is played through by the same synthesized voices
used in the scale above, with vibraphone entering four times to
play the Calcium binding sites. When an amino acid is repeated,
the note is sustained until the next amino acid appears in the sequence.
The protein sequence is accompanied by harp playing the DNA codons.
After you have listened to the tune a couple of times, see if you
can follow the written sequence as the tune plays out.
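The rule for repeated amino acids amounts to collapsing each run of identical letters into one longer note. A minimal sketch, assuming one beat per amino acid (the beat length and the function name sustain_repeats are assumptions made for this example):

```python
from itertools import groupby


def sustain_repeats(sequence):
    """Collapse runs of the same amino acid into (letter, beats) pairs."""
    # groupby yields one group per run of consecutive identical letters.
    return [(aa, len(list(run))) for aa, run in groupby(sequence)]


print(sustain_repeats("MADQLTEEQ"))
```

Here the two adjacent E's become a single note held for two beats, while the Q's on either side of them stay separate notes because they are not adjacent to each other.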
Proteins fold up into complex three-dimensional shapes that give them the ability
to interact with other molecules -- Calmodulin is shaped to recognize
and bind Calcium. The first level of protein folding is called secondary
structure and consists of very regular folding patterns stabilized
by weak interactions between the atoms of the peptide bond. Three
common types of secondary structure are alpha helix, beta strands,
and turns. An alpha helix looks like a spring (and is springy), beta strands are folded back
and forth like an accordion pleat and may align with other beta
strands to form a beta sheet, and turns are just a simple bend in
the protein chain. The sequence below is marked to show what parts
of the amino acid sequence of calmodulin fold in these different
ways. In the musical example that follows the sequence, these different
folds are represented by the different instruments listed. The "DNA
harp" plays throughout, but the flute begins with the first
alpha helix below. The vibraphone continues to play the Calcium
binding sites.
= alpha helix (flute)
[ ] = beta strand (tubular chime)
<> = turn
Note that each
Calcium binding site is bracketed by two regions of alpha helix.
These regions of secondary structure compose part of a larger, more
complex folding pattern called the protein tertiary structure. The
image below represents the tertiary structure of Calmodulin. Locate
the helical ribbons, the flat beta strands and the turns in this
figure. The four yellow balls represent the Calcium bound at the
two ends of this dumbbell-shaped molecule. In this image, the sequence
begins with the lime green section at the upper right and ends with
the blue helix at the lower left.
Tertiary structure is maintained by interactions of R-groups. The relative placement
of the different R-groups determines whether a given section of
a protein will form an alpha helix, a beta strand, or some other
folding configuration. Hydrophobic (water insoluble) groups tend
to hang out together, hydrophilic (water soluble) groups tend
to hang out together, and the positive and negative R-groups may
attract each other. Generally hydrophobic R-groups will line up
along one edge of a helix or in the interior of a globular region.
The next musical
example illustrates how the hydrophobic and hydrophilic R-groups
are distributed in the protein. The lower and higher solubility
R groups are represented by different voices so that the musical
phrases are divided into two groups: one group consists of the
hydrophobic amino acids (lower pitches) and the other group consists
of the hydrophilic (higher pitches) amino acids. Among the hydrophilic
amino acids, the charged ones have been assigned the highest pitches.
You can think of the folding structure of a protein as a hydrophobic
core decorated or overlaid with hydrophilic surfaces. The lower
tones in these duets define the structural core of the protein.
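Dividing a sequence into a lower and an upper voice can be sketched like this. The sketch assumes the hydrophobic set is I, V, L, F, C, M, A and G and treats every other amino acid as hydrophilic; the "-" rest symbol and the name split_voices are made up for this example.

```python
HYDROPHOBIC = set("IVLFCMAG")  # assumed membership of the lower voice
REST = "-"  # placeholder for a silent beat


def split_voices(sequence):
    """Return (lower, upper) voices with rests where the other voice plays."""
    lower, upper = [], []
    for aa in sequence:
        if aa in HYDROPHOBIC:
            lower.append(aa)
            upper.append(REST)
        else:
            lower.append(REST)
            upper.append(aa)
    return "".join(lower), "".join(upper)


lower, upper = split_voices("MADQLTEEQ")
print(lower)  # prints MA--L----
print(upper)  # prints --DQ-TEEQ
```

Keeping the rests in place means the two voices stay aligned with the original sequence, so they can be played together as a duet.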
In this piece, the 146 amino acids of beta globin have again been separated
into hydrophobic and hydrophilic groups, with harp, bass and guitar
representing the lower octave of hydrophobic R-groups (I
V L F C M A G), and flute representing the upper notes
of the hydrophilic groups. The sequence plays through three times,
with different species contributing their improvisational changes.
Think of an
ensemble with four musicians: tree shrew, human, Sumatran
tiger and African elephant. The piece begins with the
tree shrew playing both the upper and lower voices
with flute and harp respectively. At the point indicated on the
Globin Sequences chart, tree shrew
hands off to human, who plays bass in a duet
with tree shrew's harp, and who takes over the flute line.
On the second
iteration of the tune, tiger enters, plays guitar
in trio with tree shrew and human
and also takes over the upper flute part. Again about halfway
through, elephant enters, adding harp to make
a "hydrophobic quartet," and in turn takes the flute
line to the end of the sequence. In all the duets, trios and quartets,
sequence divergence is heard as a chord; otherwise the species
play in genetic unison.
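The idea of hearing divergence as a chord can be sketched by comparing aligned sequences position by position. The species labels and the four-letter fragments below are made up for illustration and are not real globin data; the function name chords is also hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
def chords(aligned):
    """For each position, list the distinct amino acids across all species.

    A single-letter list means the species play in unison at that position;
    a longer list means the sequences have diverged and a chord sounds.
    """
    sequences = list(aligned.values())
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [sorted(set(column)) for column in zip(*sequences)]


example = {
    "species_a": "VHLT",  # made-up fragments, for illustration only
    "species_b": "VHLS",
    "species_c": "VQLT",
}
for position, notes in enumerate(chords(example), start=1):
    label = "unison" if len(notes) == 1 else "chord"
    print(position, "".join(notes), label)
```

In this made-up example, positions 1 and 3 are played in unison, while positions 2 and 4 sound as two-note chords.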
In the final iteration, tree shrew and human
play both the hydrophobic and hydrophilic lines of the full sequence
on their respective instruments. As you listen to the last section,
which part of the sequence has diverged more between these two species:
the hydrophobic core or the hydrophilic surface notes?