What happens if you change the structure of a protein

Hydrogen bonds between the carbonyl oxygen of one amino acid and the amino hydrogen of the amino acid four residues further along the chain hold the stretch of amino acids in a right-handed coil. Every helical turn in an alpha helix has 3.6 amino acid residues. The tertiary structure of a polypeptide chain is its overall three-dimensional shape, once all the secondary structure elements have folded together. Interactions between polar, nonpolar, acidic, and basic R groups within the polypeptide chain create the complex three-dimensional tertiary structure of a protein.
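The helix geometry can be checked with a little arithmetic. This is a minimal sketch that assumes the idealised textbook values for an alpha helix: 3.6 residues per turn and an axial rise of about 1.5 Å per residue. The second value is not stated in this text. The function name is illustrative.

```python
# Idealised alpha-helix parameters (standard textbook values, assumed here).
RESIDUES_PER_TURN = 3.6      # residues per helical turn
RISE_PER_RESIDUE_A = 1.5     # axial rise per residue, in angstroms

# Pitch = distance along the helix axis covered by one full turn.
PITCH_A = RESIDUES_PER_TURN * RISE_PER_RESIDUE_A          # 5.4 angstroms
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN           # 100 degrees


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (number of turns, axial length in angstroms) of an ideal alpha helix."""
    turns = n_residues / RESIDUES_PER_TURN
    length_a = n_residues * RISE_PER_RESIDUE_A
    return turns, length_a


turns, length_a = helix_geometry(18)
print(f"pitch {PITCH_A:.1f} A, {DEGREES_PER_RESIDUE:.0f} deg per residue")
print(f"18 residues: {turns:.1f} turns, {length_a:.1f} A long")
```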

When protein folding takes place in the aqueous environment of the body, the hydrophobic R groups of nonpolar amino acids mostly lie in the interior of the protein, while the hydrophilic R groups lie mostly on the outside.
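Hydropathy scales give a rough estimate of which stretches of a sequence are likely to be buried. This minimal sketch assumes the Kyte-Doolittle scale, which is one of several published scales; the text does not name one. It computes the average hydropathy of a whole sequence and of each window along it. A high score is only a hint that a stretch prefers the protein interior. The function names are illustrative.

```python
# Kyte-Doolittle hydropathy index: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def hydropathy_profile(sequence: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each run of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


print(round(gravy("ILVK"), 2))    # mostly hydrophobic residues -> positive
print(round(gravy("DEKR"), 2))    # charged residues -> negative
print(hydropathy_profile("DEKRILVFAMDEKR", window=5))
```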

Cysteine side chains form disulfide linkages in the presence of oxygen; these are the only covalent bonds that form during protein folding. All of these interactions, weak and strong, determine the final three-dimensional shape of the protein.

When a protein loses its three-dimensional shape, it will no longer be functional.

Tertiary structure: The tertiary structure of proteins is determined by hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages.

The quaternary structure of a protein is how its subunits are oriented and arranged with respect to one another. As a result, quaternary structure only applies to multi-subunit proteins; that is, proteins made from more than one polypeptide chain. Proteins made from a single polypeptide will not have a quaternary structure. In proteins with more than one subunit, weak interactions between the subunits help to stabilize the overall structure.

Enzymes often play key roles in bonding subunits to form the final, functioning protein. For example, insulin is a ball-shaped, globular protein that contains both hydrogen bonds and disulfide bonds that hold its two polypeptide chains together.

Four levels of protein structure: The four levels of protein structure can be observed in these illustrations.

Denaturation is a process in which proteins lose their shape and, therefore, their function because of changes in pH or temperature. Each protein has its own unique sequence of amino acids, and the interactions between these amino acids create a specific shape.

Pepsin, the enzyme that breaks down protein in the stomach, only operates at a very low pH. The stomach maintains a very low pH to ensure that pepsin continues to digest protein and does not denature. Because almost all biochemical reactions require enzymes, and because almost all enzymes only work optimally within relatively narrow temperature and pH ranges, many homeostatic mechanisms regulate appropriate temperatures and pH so that the enzymes can maintain the shape of their active site.

It is often possible to reverse denaturation because the primary structure of the polypeptide, the covalent bonds holding the amino acids in their correct sequence, is intact. Once the denaturing agent is removed, the original interactions between amino acids return the protein to its original conformation and it can resume its function.

However, denaturation can be irreversible in extreme situations, like frying an egg. The heat from a pan denatures the albumin protein in the liquid egg white and it becomes insoluble. The protein in meat also denatures and becomes firm when cooked. Denaturing a protein is occasionally irreversible: (Top) The protein albumin in raw and cooked egg white.

Chaperone proteins, or chaperonins, are helper proteins that provide favorable conditions for protein folding to take place.

The chaperonins clump around the forming protein and prevent other polypeptide chains from aggregating. Once the target protein folds, the chaperonins dissociate.

NMR has been used to study proteins directly in the cell, even if they are unfolded. We will see that NMR can be used to reveal all the atomic positions within proteins, and how these move and change in real time when interacting with other molecules such as other proteins or drugs. Such information can tell the structural biologist how a protein can exert its function inside the cell.

With this information, we can better understand how proteins lose function in disease, how to engineer them to be more effective or how to design drugs to alter their behaviour. NMR is based on the observation that certain atomic nuclei have a property called a spin magnetic moment.

A common analogy is that each nucleus behaves as if it were a tiny bar magnet pointing in a particular direction. A typical protein sample at millimolar concentration will contain about 10^17 molecules and therefore about 10^17 copies of each given atom (see Figure). Once the sample is placed in the external magnetic field (B0), equilibrium is reached after a short time (a few seconds). The bar magnets then rotate, or precess, around the external field, all at the same resonance (Larmor) frequency but at a range of different angles. They behave like a large collection of spinning tops tilted to various degrees from the ground (second column in the figure). However, instead of cancelling out, there are now slightly more magnets in an orientation parallel to the external magnetic field B0.
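The estimate of about 10^17 molecules follows from Avogadro's number. In this sketch, the 1 mM concentration and the 500 µL sample volume are assumed typical values. Neither number is taken from the text.

```python
AVOGADRO = 6.02214076e23   # molecules per mole


def molecules_in_sample(concentration_molar: float, volume_litres: float) -> float:
    """Number of molecules = concentration (mol/L) x volume (L) x Avogadro's number."""
    return concentration_molar * volume_litres * AVOGADRO


# Assumed sample: 1 mM protein in 500 microlitres.
n = molecules_in_sample(1e-3, 500e-6)
print(f"{n:.1e} molecules")   # on the order of 10^17
```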

This occurs because this orientation has the lowest energy in the external magnetic field B0. This slight bias within the group produces, on average, a slight net upward magnetic moment, which we call bulk magnetisation.
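The bias is very slight, and it can be quantified. For a spin-1/2 nucleus the two orientations differ in energy by hν. The Boltzmann distribution then gives a fractional population excess of tanh(hν/2kT). This minimal sketch assumes a 1H Larmor frequency of 600 MHz and a temperature of 298 K; both are illustrative choices.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
K_B = 1.380649e-23    # Boltzmann constant, J/K


def population_excess(larmor_hz: float, temperature_k: float) -> float:
    """Fractional excess (N_parallel - N_antiparallel) / N_total for a spin-1/2 nucleus.

    N_parallel / N_antiparallel = exp(x) with x = h*nu / (k*T), which gives tanh(x / 2).
    """
    x = H * larmor_hz / (K_B * temperature_k)
    return math.tanh(x / 2)


excess = population_excess(600e6, 298.0)   # assumed: 1H on a 600 MHz magnet, 25 degrees C
print(f"{excess:.1e}")   # roughly one extra parallel spin per ~20,000
```

A stronger magnet gives a larger energy gap and therefore a larger excess. This is one reason why high-field spectrometers are more sensitive.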

At equilibrium this magnetisation is stationary, pointing along the z-axis. No net magnetisation is present in the x-y plane, because the spins are out of phase and do not precess together in synchrony. A given atom in a protein is represented as many vectors with different directions, as there will be many copies in the sample (bottom panel).

The individual vectors average or sum to generate a bulk magnetisation vector (thick black line) with properties that represent all of these identical atoms (top panel).

Before an external magnetic field B0 is applied, individual vectors point in all directions and no bulk magnetisation vector is present (left). However, after a B0 field is applied (grey arrow in bottom panel), the sample generates a net magnetisation along the magnetic field direction (the z-axis), which can be represented by a bulk magnetisation vector (thick black arrow in top panel).

When a short RF-pulse along the x-axis has been applied, the bulk magnetisation is nudged into the x-y plane. Immediately afterwards it starts to rotate about the z-axis in a corkscrew motion at its Larmor frequency (chemical shift) as it returns to its equilibrium position. The RF-pulse is effective because it generates a short-lived oscillating B1 magnetic field in the coil, along the x-axis. This field oscillates at the same Larmor frequency as the nuclei under study, which allows it to rotate the magnetisation toward the x-y plane.

This is similar to pushing a child on a swing: one constant push (constant B1) is not as effective as pushing at the natural frequency of the swing (oscillating B1). A very short radiofrequency (RF) electrical pulse of carefully chosen length is applied through a wire coil that is close to the sample and wrapped around the x-axis. This generates a weak, oscillating magnetic field, B1, perpendicular to B0.

This transverse RF magnetic field tips the bulk magnetisation away from the vertical axis exactly into the x-y plane. Following the pulse, the spins start to precess out of phase again. The bulk magnetisation returns to align with the z-axis in a corkscrew motion, precessing around the z-axis at its distinctive resonance (Larmor) frequency (see Figure). Immediately after the pulse, the x-component of the rotating bulk magnetisation induces a simple oscillating electrical current with exponentially decaying amplitude. This current is recorded as a time-dependent free-induction decay (FID) in the same coil that generated the pulse.
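The "carefully chosen length" of the pulse sets how far the magnetisation is tipped. The flip angle is θ = γB1τ, where τ is the pulse duration. Writing ν1 = γB1/2π for the nutation frequency, a 90° pulse, which tips the magnetisation into the x-y plane, lasts τ = 1/(4ν1). The 25 kHz B1 strength below is an assumed, illustrative value, and the function names are hypothetical.

```python
import math


def flip_angle_degrees(nutation_hz: float, pulse_s: float) -> float:
    """theta = gamma * B1 * tau, written with nu1 = gamma * B1 / (2 * pi)."""
    return math.degrees(2 * math.pi * nutation_hz * pulse_s)


def pulse_length_for(angle_deg: float, nutation_hz: float) -> float:
    """Pulse duration (s) needed for a given flip angle at nutation frequency nu1."""
    return (angle_deg / 360.0) / nutation_hz


tau90 = pulse_length_for(90.0, 25e3)   # assumed B1 strength: 25 kHz nutation frequency
print(f"90-degree pulse: {tau90 * 1e6:.1f} us")
print(f"twice as long: {flip_angle_degrees(25e3, 2 * tau90):.0f} degrees")
```

Doubling the pulse length gives a 180° pulse, which inverts the magnetisation instead of placing it in the x-y plane. That is why the pulse length has to be calibrated.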

The key to this experiment is measuring the oscillating signal perpendicular to the B0 field. B0 is much stronger and would mask this signal if you tried to measure along the z-axis.

For clarity, Figure 17 only shows multiple copies of the same atom. However, a protein contains many different atoms, so the FID we record is a mixture of oscillations at different frequencies. Fourier transformation of this complicated FID by a computer generates a frequency-dependent spectrum consisting of signals at the Larmor frequencies of the different atoms in the molecule.
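The step from FID to spectrum can be imitated numerically. This sketch makes several assumptions:

* quadrature (complex) detection;
* three hypothetical nuclei at arbitrary offset frequencies from the carrier;
* a single T2 decay time;
* a spectral width of 2 kHz.

It builds the FID as a sum of exponentially decaying oscillations. It then recovers the frequencies with NumPy's FFT and a crude peak picker. The function names are illustrative.

```python
import numpy as np


def simulate_fid(freqs_hz, amplitudes, t2_s, sw_hz=2000.0, n_points=4096):
    """FID as a sum of decaying complex oscillations, one per magnetically distinct atom."""
    t = np.arange(n_points) / sw_hz                  # sample times (dwell time = 1 / sw)
    fid = np.zeros(n_points, dtype=complex)
    for f, a in zip(freqs_hz, amplitudes):
        fid += a * np.exp(2j * np.pi * f * t) * np.exp(-t / t2_s)
    return fid


def to_spectrum(fid, sw_hz=2000.0):
    """Fourier transform the FID; return (frequency axis in Hz, magnitude spectrum)."""
    spec = np.fft.fftshift(np.fft.fft(fid))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(fid), d=1.0 / sw_hz))
    return freqs, np.abs(spec)


def pick_peaks(freqs, magnitude, rel_threshold=0.2):
    """Local maxima that exceed a fraction of the tallest peak."""
    cut = rel_threshold * magnitude.max()
    idx = [i for i in range(1, len(magnitude) - 1)
           if magnitude[i] > cut
           and magnitude[i] >= magnitude[i - 1]
           and magnitude[i] > magnitude[i + 1]]
    return freqs[idx]


true_offsets = [-300.0, 120.0, 450.0]               # hypothetical offsets from the carrier, Hz
fid = simulate_fid(true_offsets, [1.0, 0.5, 0.8], t2_s=0.1)
freqs, magnitude = to_spectrum(fid)
print(np.round(pick_peaks(freqs, magnitude), 1))
```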

The number of signals is equal to the number of magnetically different atoms in the molecule. The position of a signal is called its chemical shift. It is measured in parts per million (ppm) relative to the frequency of a standard chemical included in the sample. Using the ppm scale instead of a Larmor frequency scale makes spectra independent of the B0 magnetic field of a given NMR spectrometer. For a protein, when using an RF-pulse designed to perturb only hydrogens, there could be a very large number of 1H nuclei within the combined amino acids.

Each 1H atom in a protein is surrounded by a unique chemical environment from electrons in nearby atoms in the biomolecule. This environment leads to a slightly different Larmor frequency compared with other 1H atoms. The nearby electrons shield the nucleus from the full strength of the external B0 magnetic field, which in turn affects its rate of precession (Larmor frequency).

For example, if electrons are pulled away from a hydrogen, the hydrogen is less shielded. Its Larmor frequency is therefore shifted downfield, and its chemical shift increases. This is seen for amide hydrogens, which are attached to nitrogen, an electronegative atom (see Figure). Conversion of Larmor frequency (Hz) into chemical shift (ppm), as seen in the 1D 1H NMR spectrum of a protein, makes the values independent of the magnet strength used.
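The relationships in the last few paragraphs can be written as two equations:

* ν = (γ/2π)B0(1 − σ), where σ is the fractional shielding by nearby electrons;
* δ = (ν − ν_ref)/ν_ref × 10^6 ppm.

This minimal sketch uses γ/2π ≈ 42.577 MHz/T for 1H. The shielding values for the reference compound and the amide hydrogen are hypothetical. The sketch shows that the offset in hertz grows with B0 while the shift in ppm does not.

```python
GAMMA_1H_MHZ_PER_T = 42.577   # gyromagnetic ratio of 1H divided by 2*pi, MHz per tesla


def larmor_hz(b0_tesla: float, sigma: float = 0.0) -> float:
    """Precession frequency of a 1H nucleus whose electrons shield it by a fraction sigma."""
    return GAMMA_1H_MHZ_PER_T * 1e6 * b0_tesla * (1.0 - sigma)


def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """delta = (nu - nu_ref) / nu_ref * 1e6."""
    return (nu_hz - nu_ref_hz) / nu_ref_hz * 1e6


SIGMA_REF = 30e-6     # hypothetical shielding of the reference compound
SIGMA_AMIDE = 22e-6   # hypothetical, less shielded amide hydrogen

for b0 in (14.1, 18.8):                          # roughly "600 MHz" and "800 MHz" magnets
    ref = larmor_hz(b0, SIGMA_REF)
    amide = larmor_hz(b0, SIGMA_AMIDE)
    print(f"B0 = {b0} T: offset {amide - ref:.0f} Hz, "
          f"shift {chemical_shift_ppm(amide, ref):.2f} ppm")
```

The less shielded amide hydrogen precesses faster than the reference, so its shift is positive (downfield). The same peak lies about 4800 Hz from the reference on the lower field and about 6400 Hz on the higher one, but in both cases it sits at 8.00 ppm.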

Each peak represents the hydrogen atoms connected to different carbons or nitrogens in the protein. The chemical shifts differ because each 1H nucleus experiences a slightly different magnetic environment, depending on its chemical group and position in the protein. Their bulk magnetisation vectors therefore rotate at slightly different frequencies.

Hydrogens found in common chemical groups (amides, aromatics, aliphatics, methyls, etc.) appear in characteristic regions of the spectrum. The well-dispersed peaks between 6 and 10 ppm in the backbone amide region indicate that the protein is well folded. It is common to record a higher-dimensional spectrum to better resolve overlapping signals. One example is a 2D spectrum that plots the chemical shift values for pairs of atoms connected by a covalent bond.

Studying one nucleus from one atom in a protein can be informative. To study all of them, however, we must know which chemical shift belongs to which atom in the protein. This requires a series of experiments that measure multiple types of magnetically active nuclei (1H, 13C, 15N) on recombinant proteins that have incorporated 13C and 15N isotopes. A simple H-N two-dimensional (2D) spectrum can be recorded on a recombinant protein that has incorporated the relatively inexpensive 15N isotope.

After our assignment experiments described above, we can label each peak in this spectrum as an amino acid according to the chemical shift value for its backbone amide nitrogen and hydrogen. The simple H-N 2D spectrum is incredibly powerful, as it is an excellent check on the condition of a protein, before embarking on lengthy experiments such as structure determination.

It can tell you whether the protein is folded, by checking whether the peaks are well dispersed in the spectrum and not simply concentrated in the middle of the spectrum between 7. It can tell you whether the protein is aggregated from the shape of the peaks: if they are spread out and broadened, that could indicate some form of self-association. It can also indicate whether parts of your protein are dynamic, as peaks from such regions are usually missing from the spectrum.

Crucially, H-N 2D spectra are frequently used to look at interactions with other proteins, ligands or drugs. Binding partners are often unlabelled (they contain the natural 12C and 14N isotopes) to ensure they do not contribute to the spectra.

However, when they are added to a labelled protein, we can quickly tell which of its amino acids are involved in binding, as these peaks will shift due to the new environment created by the binding partner. This allows us to map the binding surface on to the protein and estimate the strength of binding by titrating the binding partner into the protein and recording a series of 2D spectra to follow the peak positions.
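The titration analysis can be sketched in a few lines. This sketch makes two assumptions that the text does not state:

* The 1H and 15N shift changes are combined into a single perturbation using a weighting factor for 15N. A value of 0.14 is commonly used, but it is a choice.
* Binding is 1:1 and in fast exchange, so the observed perturbation is proportional to the fraction of protein bound.

The Kd is then found by a simple grid search. The titration data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def combined_csp(d_h_ppm, d_n_ppm, alpha=0.14):
    """Combined 1H/15N chemical shift perturbation; alpha down-weights the 15N change."""
    return np.sqrt(np.asarray(d_h_ppm) ** 2 + (alpha * np.asarray(d_n_ppm)) ** 2)


def fraction_bound(p_tot, l_tot, kd):
    """Exact fraction of protein bound for 1:1 binding (all concentrations in molar)."""
    b = p_tot + l_tot + kd
    return (b - np.sqrt(b * b - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def fit_kd(p_tot, l_tot, csp, kd_grid):
    """Grid search over Kd; for each Kd the best maximal CSP is a 1-parameter least squares."""
    best = None
    for kd in kd_grid:
        f = fraction_bound(p_tot, l_tot, kd)
        d_max = float(f @ csp) / float(f @ f)
        rss = float(np.sum((csp - d_max * f) ** 2))
        if best is None or rss < best[0]:
            best = (rss, kd, d_max)
    return best[1], best[2]


# Synthetic titration: 100 uM labelled protein, ligand added up to 1 mM.
p_tot = 100e-6
l_tot = np.linspace(0.0, 1e-3, 11)
observed = 0.3 * fraction_bound(p_tot, l_tot, 50e-6)   # "true" Kd = 50 uM, max CSP = 0.3 ppm
kd_fit, dmax_fit = fit_kd(p_tot, l_tot, observed, np.logspace(-6, -3, 301))
print(f"Kd ~ {kd_fit * 1e6:.0f} uM, max CSP ~ {dmax_fit:.2f} ppm")
print(combined_csp(0.10, 0.50))   # one residue's 1H and 15N shift changes, in ppm
```

Residues with the largest combined perturbations are the candidates for the binding surface.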

Each numbered peak in this 2D spectrum represents an amino acid in a simple protein domain through its backbone or sidechain amide group. An amide group has one nitrogen and one hydrogen and given each amino acid is in a slightly different chemical environment based on how the protein has folded and which sidechain it contains, the chemical shift values for each N and H pair are different for each amino acid.

To obtain the full 3D structure of a protein, we need to assign chemical shift values to all atoms. In a NOESY (nuclear Overhauser effect spectroscopy) experiment, the transfer of magnetisation from one hydrogen atom to another hydrogen atom nearby in 3D space is recorded.

The size or strength of the bulk magnetisation vector after the transfer has occurred in a NOESY experiment tells us how close that atom was to the nearby atom. After identifying all possible nuclear Overhauser effects (NOEs) for the protein, we produce a series of atom-atom distances that connect the polypeptide to itself and help define its fold.
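The distance information comes from the steep dependence of the NOE on internuclear separation. In the simplest (isolated spin-pair) approximation, NOE intensity scales as r^-6. An unknown distance can therefore be calibrated against a pair of hydrogens at a known distance. The 2.5 Å reference distance and the intensities below are hypothetical. Real structure calculations usually convert NOEs into distance ranges rather than exact values.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance_a: float) -> float:
    """Isolated spin-pair approximation: intensity is proportional to r**-6,
    so r = r_ref * (I_ref / I) ** (1/6)."""
    return ref_distance_a * (ref_intensity / intensity) ** (1.0 / 6.0)


# A cross-peak 64 times weaker than a reference pair 2.5 A apart (hypothetical numbers)
# corresponds to twice the distance, because 2**6 = 64.
print(f"{noe_distance(1.0, 64.0, 2.5):.2f} A")
print(f"{noe_distance(32.0, 64.0, 2.5):.2f} A")   # half the intensity: only about 12% farther
```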

We use a computer to find the fold that is consistent with all of these measured distances, by running a series of molecular dynamics simulations that is repeated many times. Protein NMR spectroscopy is powerful because, once an ensemble of structures is determined, further experiments can be performed that detail the dynamics of each atom and the bonds they form.

NMR thus gives information about how the protein moves in solution. Combined with additional molecular dynamics techniques (Table 3), it makes it possible to estimate the protein's conformational ensemble.

NMR dynamics experiments can also be performed on assigned proteins that do not have an NMR structure. The results are then simply mapped onto a previously determined model, which makes the process quicker.

As such, NMR can quickly provide a wealth of information as soon as a protein has been purified. NMR is an essential tool because protein motions are central to function inside the cell, as we saw when we considered how proteins fold and their associated dynamics.

One of the drawbacks of X-ray crystallography is the need for a crystal to produce the diffraction patterns. A drawback of NMR is the limit on the size of protein that can be studied. In the s, Nigel Unwin was trying to determine the shape of a protein called bacteriorhodopsin.

Unable to produce a crystal of the molecule, Unwin used electron microscopy (EM) to obtain the structural outline of this protein, demonstrating how it can move protons across a membrane. Later improvements in the methodology enabled Richard Henderson to determine the first atomic-resolution images of bacteriorhodopsin using newly developed cryo-EM methods.

It opened the door to the structural determination and functional understanding of very large complex protein structures without the need for crystallisation.

Such is the progress and quality of cryo-EM that its images now rival those of X-ray crystallography, with the added advantage of easier sample preparation. Transmission EM (TEM) operates on the same basic principles as a light microscope but uses a beam of electrons to examine the structures of cells and tissues. The incoming and outgoing lenses of a light microscope are replaced by a series of coil-shaped electromagnetic lenses, through which the electron beam travels to produce magnified images.

Only parts of the beam are transmitted through the sample, depending on its thickness and electron transparency. A final lens then refocuses the beam and projects an image of the sample onto a camera detector.

To help improve the contrast of the very thin samples, heavy metal stains are often used to bind the proteins and stop the transmission of the electrons. The image then shows regions of the specimen where the electron transmission has been prevented. Biological molecules such as individual proteins and complexes are not compatible with the high vacuum needed for TEM as the high energy electrons burn the protein and evaporate the water that surrounds them.

Cryo-EM uses the same principle as TEM but cools the samples to cryogenic temperatures and embeds them in an environment of vitreous ice, allowing protein and protein complexes to be studied.

To do this, an aqueous protein sample solution is applied to a grid mesh and plunge-frozen in liquid ethane. The process is so quick that the water molecules do not have time to arrange into a crystalline lattice. Stains are not needed here, as the surrounding buffer provides enough contrast to observe the specimen; to improve contrast, multiple images are taken instead.

Randomly orientated proteins are struck by the electron beam, producing a faint image on the detector. A computer then decides what is a faint molecular image of the proteins and what is background. Similar images are then grouped together. The computer averages thousands of similar images to generate high signal-to-noise 2D images (Figure 20). These are used to clean up the dataset by removing contamination and other junk particles. Software then calculates how all the good molecular images relate to each other and generates a high-resolution 3D image, or density map.
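A toy simulation shows why averaging thousands of faint images helps. It assumes that the particle images are already aligned and in the same orientation. That is a big assumption, because real pipelines must first align and classify them. It also assumes that the noise is independent from image to image. Under these conditions, averaging N images reduces the noise by a factor of about √N. The blob standing in for a particle, the noise level and the image count are all hypothetical.

```python
import numpy as np


def make_particle(size: int = 64, radius: float = 8.0) -> np.ndarray:
    """A Gaussian blob standing in for one projection image of a particle."""
    y, x = np.mgrid[:size, :size] - size / 2
    return np.exp(-(x ** 2 + y ** 2) / (2 * radius ** 2))


def snr(image: np.ndarray, clean: np.ndarray) -> float:
    """Signal-to-noise ratio: spread of the true signal over spread of the residual noise."""
    return float(clean.std() / (image - clean).std())


rng = np.random.default_rng(0)
clean = make_particle()
n_images = 400
# Each particle image is the clean image buried in strong, independent Gaussian noise.
noisy = clean + rng.normal(0.0, 2.0, size=(n_images,) + clean.shape)

single = snr(noisy[0], clean)
averaged = snr(noisy.mean(axis=0), clean)
print(f"single image SNR {single:.3f}, average of {n_images}: {averaged:.3f}")
print(f"improvement {averaged / single:.1f}x (sqrt(N) = {np.sqrt(n_images):.0f})")
```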

The amino acid chain is then threaded into this map in a similar process to X-ray crystallography. Cryo-EM offers a significant advantage in that through the direct acquisition of the images, the specimen can be statistically analysed allowing for the reconstruction of the structural information and different conformations can be determined in the same sample. It is also possible to control the chemical environment, which in turn allows for effective examination of different functional states of different types of molecules.

The final major advantage of cryo-EM is that large intact complexes can be studied. This allows the 3D structures of ribosomes, proteins and viruses to be determined almost to the atomic scale.

Image processing outline illustrated with data from the small pore-forming toxin lysenin.

To capture the initial images, protein samples are transferred onto a copper mesh grid coated with a perforated carbon film. A beam of electrons is then used to capture a faint trace image of the protein. The computer determines what is protein and what is background. Similar images of the protein in the same orientation are placed into groups. Using thousands of similar images of the protein, the computer generates a high-resolution 2D image by averaging all the faint images.

A 3D image is then calculated by working out how the 2D images relate to each other. This produces an electron density map from which the structure is then determined. Biochemist 41, 46–

Cryo-EM of proteins and their complexes promises to revolutionise structural biology, as many life processes depend on large dynamic macromolecular assemblies. However, like all the methods described here, it has some nuances.

For example, it can be difficult to prepare a grid that shows a good spread of protein orientations, as the proteins sometimes preferentially align towards the hydrophobic air-water interface. On occasion the proteins will denature, and screening multiple grids under different conditions can be expensive.

Nevertheless, datasets sufficient for high-resolution structures can be recorded in just a few hours or overnight. The amount of protein required is much less than for X-ray crystallography or NMR, and the samples do not have to be as pure. All of this helps balance the cost of this incredibly powerful technique. There is a vast range of other methods that are also used to study protein structure and interactions. Many of them can be performed in just one day and yield information complementary to the techniques mentioned above.

Table 3 gives an overview of some of the more common methods and their applications. Within this essay, we have explored proteins through the eyes of a structural biologist. We have considered the following areas: The structural organisation of proteins and their range of shapes and conformations. How to experimentally determine protein structures and their interactions at the molecular level.

We hope this will inspire readers to view some of the suggested resources which provide more detail on uncovering protein structure. All other pdb codes for figures are indicated in their legends. We also acknowledge Bryan Sutton for providing the electron density figure.

Volume 64, Issue 4. The Understanding Biochemistry issues of Essays in Biochemistry provide informative and accessible up-to-date overviews of key areas of biochemistry for post students, teachers and undergraduates.

Review Article, September 25. Uncovering protein structure. Elliott J Stollar. Correspondence: Elliott Stollar. Essays Biochem 64 (4).

A correction has been published: Correction: Uncovering protein structure. Figure 3. Proteins have diverse structures and functions.

Figure 4. Resonance stabilisation causes the peptide bond to have double-bond character and carry a dipole. Table 1 The two principal systems for classifying protein domains. Box 1 Thermodynamics Box.

The second law of thermodynamics states that the entropy of the universe always increases. In other words, for protein folding to occur favourably, the entropy of the universe must increase as a result of this process.

Entropy is often described as disorder, which is a familiar term to most of us in a physical sense, for example, as we have seen in the main text, water molecules that surround and interact with an unfolded protein are quite ordered and constrained and it is only when proteins fold and expel these water molecules that they can leave the protein surface and move around more and essentially increase their disorder.

A better way to think of entropy is to do with the number of ways energy can be distributed in a system. For example, if an object is hot, it has lots of thermal energy concentrated in one place in the object. However, if you place that object in some cold water, heat always transfers to the water and heats it up as the thermal energy is dispersed and spread away from the object into the water. This happens as energy dispersal increases the number of ways that energy can be distributed.

In fact, whenever there is greater movement of bonds or atoms in molecules there are more ways to distribute energy. In an exothermic reaction, energy is released to the surroundings and increases the entropy of the universe as the energy has now been dispersed.
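The bookkeeping between heat released to the surroundings and the entropy of the universe can be made explicit with the Gibbs free energy. At constant temperature and pressure, heat released by the system (−ΔH) raises the entropy of the surroundings by −ΔH/T. It follows that:

* ΔG = ΔH − TΔS_system;
* ΔS_universe = ΔS_system − ΔH/T = −ΔG/T.

A process is therefore favourable when ΔG < 0. The folding enthalpy and entropy below are hypothetical round numbers. They are chosen only to show how a protein can be stable at one temperature and unfolded at a higher one.

```python
def gibbs_kj(delta_h_kj: float, delta_s_kj_per_k: float, temperature_k: float) -> float:
    """Delta G = Delta H - T * Delta S (system quantities, kJ/mol and kJ/(mol K))."""
    return delta_h_kj - temperature_k * delta_s_kj_per_k


def universe_entropy_change(delta_h_kj: float, delta_s_kj_per_k: float,
                            temperature_k: float) -> float:
    """Delta S(universe) = Delta S(system) + Delta S(surroundings),
    where the surroundings gain -Delta H / T."""
    return delta_s_kj_per_k - delta_h_kj / temperature_k


# Hypothetical folding reaction: releases heat, but the system itself loses entropy.
DH, DS = -200.0, -0.6          # kJ/mol, kJ/(mol K)
for T in (298.0, 350.0):
    dg = gibbs_kj(DH, DS, T)
    ds_univ = universe_entropy_change(DH, DS, T)
    state = "folding favourable" if dg < 0 else "unfolded state favoured"
    print(f"T = {T:.0f} K: dG = {dg:+.1f} kJ/mol, "
          f"dS_univ = {ds_univ:+.4f} kJ/(mol K) -> {state}")
```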

The Gibbs free energy is used to keep track of the entropy change of the universe (eqn 1).

Figure Feedback inhibition in metabolic pathways.

Table 2 Types of motions found in proteins. These can range in timescales from hours to fractions of seconds.

Turns generally occur when the protein chain needs to change direction in order to connect two other elements of secondary structure. The most common is the beta turn, in which the change of direction is executed in the space of four residues.

You will sometimes hear the phrase "beta hairpin" which can be used to describe a beta turn joining two anti-parallel beta strands together. Beta turns are subdivided into numerous types on the basis of the details of their geometry. Some regions of the protein chain do not form regular secondary structure and are not characterized by any regular hydrogen bonding pattern.

These regions are known as random coils and are found in two locations in proteins. Random coils can be 4 to 20 residues long, although most loops are not longer than 12 residues. Most loops are exposed to the solvent and have polar or charged side chains. In some cases loops have a functional role, but in many cases they do not. As a result, loop regions are often poorly conserved.

As we have learned, the order of the AAs is the primary structure, and all residues in a polypeptide chain have the same main-chain atoms.

What varies are the side chains (R groups). Do the specific AAs present dictate the secondary structure? As shown in the figure, all amino acids can be found in all secondary structure elements, but some are more or less common in certain elements. Pro and Gly, for instance, are disfavored in helices but favored in beta turns. We can take this a step further and ask whether combinations of 2, 3, or 4 amino acids dictate secondary structure. We find a stronger correlation, but still not one strong enough to reliably predict tertiary structure.

Proteins are abundant in all organisms and are fundamental to life. The diversity of protein structure underlies the very large range of their functions: enzymes (biological catalysts), storage, transport, messengers, antibodies, regulation, and structural proteins. Proteins are linear heteropolymers of fixed length; i. There is therefore a great diversity of possible protein sequences. The linear chains fold into specific three-dimensional conformations. These conformations are determined by the sequence of amino acids and are therefore also extremely diverse, ranging from completely fibrous to globular.
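The scale of that sequence diversity is easy to quantify. With 20 standard amino acids there are 20^n possible sequences of length n. The 100-residue length below is just an example.

```python
import math


def sequence_space(length: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of a given length over the 20 standard amino acids."""
    return alphabet ** length


n = sequence_space(100)
print(f"20^100 has {len(str(n))} digits (about 10^{math.log10(n):.0f})")
```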

Covalent disulfide bonds can be introduced between cysteine residues placed in close proximity in 3D space -- this provides rigidity for the resulting 3D structure.

Ribbon diagrams like the one shown here are a common way to visualize proteins. Protein structures can be determined to an atomic level by X-ray diffraction and neutron-diffraction studies of crystallized proteins, and more recently by nuclear magnetic resonance (NMR) spectroscopy of proteins in solution.

The structures of many proteins, however, remain undetermined. This is ribonuclease A, an enzyme responsible for the degradation of RNA. The image depicts all atoms of one half of the molecule (cyan for side chains, brown for hydrogen atoms) and just the main chain and side chains for the other half.

The alternate view shows main-chain atoms and H-bonds (purple). Hydrogens constitute about half the atoms in a protein, but they are seldom shown explicitly. They are hard to detect with X-ray crystallography because of their low electron density, and showing them greatly complicates the picture.

Mutations that alter structure locally can be distinguished from those that do not through a machine-learning logistic regression method. This discriminative power was particularly unexpected given the enormous structural variability of pentamers.

Mutants for which our method predicted a change of structure were also enriched in terms of disrupting stability and function. Although it distinguished change from no change in structure, the new method overall failed to distinguish between mutants with and without an effect on stability or function.

Local structural change can be predicted. Future work will have to establish how useful this new perspective on predicting the effect of nsSNPs will be in combination with other methods. Evolution creates the specific protein landscape that we observe today. Mutations are random but selection is the driving force that shapes the observable protein variety by favoring those deviations that maintain or improve phenotype.

Although many different sequences map to similar structures, point mutants can change structure dramatically [ 4 — 6 ]. Some of the intricate details of 3D structures are crucial for function. Therefore, such local conformational changes may impact protein function and may cause disease. Usually, this is more likely for structure changes connected to binding sites. For instance, the disruption of hydrophobic interactions, or the introduction of charged residues into buried sites, or mutations that break beta-sheets often impact phenotype severely and raise the susceptibility for disease [ 7 — 9 ].

Using 83 X-ray mutant structures from 13 classes of proteins, an early work pioneered the prediction of local structural changes by expert rules operating on position-dependent rotamers [ 10 ]. It is unclear how well such an approach would cope with the protein variety found in the current PDB [ 11 ]. Thus, we followed a different approach. We compiled a set of structurally superimposed pairs of protein fragments with identical sequence except for one central residue mismatch. We then applied machine learning to predict structural change from sequence.

Then we applied two techniques for redundancy reduction. Each pentamer from the first set (cdhit98) was paired with each pentamer from the second set (hval0). We also filtered out pairs for which either fragment was already in a much larger fragment that fulfilled the above criteria.

This procedure yielded 35, pentamer pairs. For each pair, we calculated the root mean square displacement (RMSD) over all C-alpha atoms after optimal superposition of the two pentamer backbones (McLachlan algorithm [ 15 ], as implemented in ProFit [ 16 ]). To turn the continuous RMSD differences into a binary problem (mutant changes structure or not), we had to decide what constitutes a structural effect and what is neutral in that sense.
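Here is a minimal sketch of the RMSD-after-superposition step. It uses the Kabsch SVD method, a swapped-in but equivalent way of finding the optimal rigid-body superposition; the study itself used McLachlan's algorithm as implemented in ProFit. It is applied to hypothetical five-atom C-alpha traces.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition.

    Kabsch SVD method (swapped in here; the study used McLachlan's algorithm in ProFit).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                   # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                            # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would mean a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal rotation taking p onto q
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Hypothetical C-alpha coordinates (angstroms) for a pentamer, and a copy that has been
# rotated 40 degrees about z and translated: its RMSD after superposition should be ~0.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
               [8.6, 4.2, 1.0], [10.0, 7.5, 2.0]])
theta = np.radians(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = ca @ rot_z.T + np.array([1.0, -2.0, 5.0])
print(f"{kabsch_rmsd(ca, moved):.6f}")
```

A local conformational change, such as displacing one atom, leaves a residual RMSD that no rigid superposition can remove. That residual is what the structural-change label is built from.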

There is no scientifically meaningful definition of structural change for pentamers. We therefore chose thresholds that appeared reasonable given the observed distributions and that separated all pentamer pairs into equal numbers of structurally neutral and non-neutral pairs. For each such pair, we randomly designated one fragment as the wild-type fragment and the central mismatch residue of the other fragment as the mutant amino acid.

For comparison, we also used two data sets that had been used previously (Additional file 1). The first set comprised 12, functionally neutral and 35, functional effect mutants from 3, proteins [ 17 , 18 ].

The second consisted of mutants having an effect on protein stability and mutants with no effect on stability covered by 47 proteins [ 19 , 20 ]. Various methods predict other aspects of the impact for amino acid changes, e.

Both methods return raw numerical scores reflecting direction and reliability of the prediction. SNAP values range from neutral for function to change of function. We adhered to the same decision cutoffs as mentioned above to define neutral and non-neutral. We applied logistic regression to learn the structural change upon amino acid change. Many protein features may be relevant for the given prediction task. Our feature construction procedure adhered to a protocol established during the development of SNAP [ 17 ].
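Logistic regression itself is compact enough to sketch. The version below fits weights by batch gradient descent on the mean log-loss. It is a generic illustration, not the authors' implementation. The two features and the synthetic labels are hypothetical stand-ins for the sequence-derived features used in the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(x @ w + b)             # predicted probability of "structural change"
        w -= lr * (x.T @ (p - y)) / n      # gradient of the mean log-loss w.r.t. w
        b -= lr * float(np.mean(p - y))    # and w.r.t. the bias
    return w, b


rng = np.random.default_rng(1)
x = rng.normal(size=(400, 2))                       # two hypothetical features per mutant
y = (3.0 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(float)

w, b = fit_logistic(x, y)
pred = (sigmoid(x @ w + b) >= 0.5).astype(float)
print("weights", np.round(w, 2), "training accuracy", float(np.mean(pred == y)))
```

The model outputs a probability. Thresholding it at 0.5 gives the binary "change / no change" call, and the raw probability can serve as a reliability score.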

All features were derived from protein sequence alone and were extracted from PredictProtein [23], a wrapper that combines a large number of independent prediction methods. We used three conceptually different types of features: (1) global features describing the overall characteristics of a protein, (2) local features describing one particular pentamer and its immediate sequence neighborhood, and (3) difference features that explicitly describe sequence-derived aspects in which the wild-type and mutant amino acids differ.

The bin that represented the sequence length was set to 0. Amino acid composition was encoded by 20 values representing relative frequencies of standard amino acids.

We predicted secondary structure and solvent accessibility using PROFphd [ 24 , 25 ]. Three values represented the relative content of residues in predicted helix, strand and loop conformation and, similarly, three values were used to encode the relative content of predicted buried, intermediate and exposed residues.
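Both global encodings above reduce to the same operation: the relative content of each symbol in a per-residue string. The sketch below applies one helper to the 20 standard amino acids and to predicted-state strings. The one-letter state codes (H/E/L for helix, strand and loop; b/i/e for buried, intermediate and exposed) are an assumed representation of the predictor output. Non-standard symbols are ignored, which is also an assumption.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

def relative_content(symbols, alphabet):
    """Fraction of positions carrying each symbol of `alphabet`; other symbols are ignored."""
    counts = Counter(s for s in symbols if s in alphabet)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in alphabet]

# 20 composition values from the sequence itself
composition = relative_content("MKTAYIAKQR", STANDARD_AA)
# 3 values each from per-residue predicted states (hypothetical one-letter codes)
ss_content = relative_content("LLHHHHHLEE", "HEL")   # helix, strand, loop
acc_content = relative_content("eebbbiibee", "bie")  # buried, intermediate, exposed
```

Here `ss_content` is [0.5, 0.2, 0.3] and `acc_content` is [0.4, 0.2, 0.4].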

We considered window lengths of 1 (position of change only), 5, 9, 13, 17 and 21 consecutive residues centered on the position of change. Values were normalized to the interval [0, 1]. The biochemical characteristics of an amino acid influence the local structural conformation. We considered six structural and biochemical propensities: mass, volume [26], hydrophobicity [27], C-beta branching [28], helix breaker (proline only) and electric charge of the side chain.
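The windowed features can be sketched for the two propensities whose definitions are fully stated: helix breaker (proline only) and C-beta branching (valine, isoleucine, threonine). The numeric scales for mass, volume, hydrophobicity and charge are not reproduced. Windows are assumed to be truncated at the sequence ends, and min-max scaling is assumed as the normalization to [0, 1]. Function names are hypothetical.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

# Binary propensities that follow directly from their definitions.
HELIX_BREAKER = {aa: 1.0 if aa == "P" else 0.0 for aa in STANDARD_AA}
BETA_BRANCHED = {aa: 1.0 if aa in "VIT" else 0.0 for aa in STANDARD_AA}

WINDOW_LENGTHS = (1, 5, 9, 13, 17, 21)

def minmax_normalize(scale):
    """Rescale a per-residue property to [0, 1]."""
    lo, hi = min(scale.values()), max(scale.values())
    return {aa: (v - lo) / (hi - lo) for aa, v in scale.items()}

def window_mean(seq, pos, length, scale):
    """Mean property over `length` residues centred on `pos`; windows are
    truncated at the sequence ends (an assumption; padding is another option)."""
    half = length // 2
    window = seq[max(0, pos - half): pos + half + 1]
    return sum(scale.get(aa, 0.0) for aa in window) / len(window)

def window_features(seq, pos, scale):
    """One value per window length, from the mutated position alone up to 21 residues."""
    return [window_mean(seq, pos, n, scale) for n in WINDOW_LENGTHS]
```

For instance, `window_features("AAPAA", 2, HELIX_BREAKER)` gives 1.0 for the single-position window and 0.2 for every longer, truncated window.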

Evolutionary information contained in sequence profiles is a valuable source of knowledge about which amino acids are compatible with a specific region in the protein: while some residues are tolerated, others could disrupt structure. Furthermore, we used position-specific independent counts (PSIC) [31] and followed the protocol for sequence extraction and multiple alignment generation described elsewhere [17].
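Real PSSMs and PSIC scores involve sequence weighting and are not reproduced here. As a simplified stand-in, the sketch below shows the basic idea behind profile-based conservation. It derives relative frequencies and log-odds scores for a single alignment column and assumes a uniform pseudocount and, unless another background is given, uniform background frequencies. Gaps are ignored, and function names are hypothetical.

```python
import math
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(column, pseudocount=1.0):
    """Relative amino acid frequencies in one alignment column (gaps '-' ignored),
    smoothed with a uniform pseudocount."""
    counts = Counter(c for c in column if c in AA)
    total = sum(counts.values()) + pseudocount * len(AA)
    return {a: (counts[a] + pseudocount) / total for a in AA}

def log_odds(profile, background=None):
    """Log2 odds of each amino acid versus a background distribution (uniform by default)."""
    bg = background or {a: 1.0 / len(AA) for a in AA}
    return {a: math.log2(profile[a] / bg[a]) for a in AA}
```

For the column "LLLLI-", leucine gets a log-odds score of 2.0 (frequency 0.2 against a background of 0.05). Amino acids never observed in the column get negative scores. These are the residues a profile flags as poorly tolerated at that position.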

In addition, we used the following predicted structural and functional features: secondary structure [32, 33] and solvent accessibility [24, 25, 32], protein flexibility [34], protein disorder [35–38], protein-protein interaction hotspots [39–41] and DNA-binding residues [42].

Most prediction methods used to generate features returned both a discrete prediction and a score reflecting the strength and reliability of the prediction. We incorporated both outputs in our feature set. Two-state predictions (disorder, protein and DNA interaction) were encoded as one of two mutually exclusive combinations, (1, 0) or (0, 1), indicating the presence (1) or absence (0) of a state.
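The encoding of discrete predictions plus their scores can be sketched as a one-hot vector followed by the raw score. The same helper covers two-state and three-state predictions. The exact vector layout (states first, score last) is an assumption for illustration, and the function name is hypothetical.

```python
def encode_states(state, states, score=None):
    """One-hot vector over `states`, optionally followed by the method's raw score.
    Two-state example: states=(True, False); three-state example: states=("H", "E", "L")."""
    if state not in states:
        raise ValueError(f"unknown state {state!r}")
    vec = [1 if s == state else 0 for s in states]
    return vec + ([score] if score is not None else [])
```

A residue predicted as disordered with score 0.83 would become `[1, 0, 0.83]`. A strand prediction would become `[0, 1, 0]` before any score is appended.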

Three-state predictions (secondary structure elements helix, strand and other; solvent accessibility states buried, intermediate and exposed) were handled similarly. Flexibility was predicted as a numerical value only.

We considered the location of the site of change relative to a protein domain to be an important feature. For example, a hydrophobic-to-polar exchange within the core of a domain may have a more severe impact on local structure than a change in a surface loop. Of specific interest were whether the residue resided in a domain, the conservation of that position within the domain alignment, how well the residue fitted into the alignment position, and the posterior probability of that match.
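The four domain-related values can be sketched as a lookup against domain matches for the protein. The `DomainHit` container and its field names are hypothetical and stand for whatever a domain search reports per aligned position. Residues outside any domain are assumed to receive zeros.

```python
from dataclasses import dataclass

@dataclass
class DomainHit:
    """Hypothetical container for one domain match; field names are illustrative."""
    start: int                 # 0-based first residue covered by the domain
    end: int                   # 0-based last residue (inclusive)
    conservation: list[float]  # per-position conservation in the domain alignment
    fit: list[float]           # per-position score for how well the residue fits its column
    posterior: list[float]     # per-position posterior probability of the match

def domain_features(pos, hits):
    """[in_domain, conservation, fit, posterior] for residue `pos`; zeros outside any domain."""
    for hit in hits:
        if hit.start <= pos <= hit.end:
            i = pos - hit.start
            return [1.0, hit.conservation[i], hit.fit[i], hit.posterior[i]]
    return [0.0, 0.0, 0.0, 0.0]
```

If domain matches can overlap, taking the first matching hit is only one possible policy. Choosing the hit with the highest posterior would be another.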

We represented the difference in a particular property separately by its absolute value and its sign, encoded as 0 (negative) or 1 (positive). The following properties were encoded this way: changes in each of the six amino acid propensities, differences in conservation scores (PSSM, relative frequency, PSIC), changes in IUPred predictions for both short and long disorder, and changes in predicted secondary structure and solvent accessibility.
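The absolute-value-plus-sign encoding can be sketched for the two propensities with exact definitions (helix breaker and C-beta branching). Two details are assumptions: the difference is taken as mutant minus wild type, and a zero difference is given sign 1.

```python
# Binary propensities that follow directly from their definitions.
PROPENSITIES = {
    "helix_breaker": lambda aa: 1.0 if aa == "P" else 0.0,
    "beta_branched": lambda aa: 1.0 if aa in "VIT" else 0.0,
}

def difference_features(wild_type, mutant):
    """Map each property to (|delta|, sign bit), where delta = mutant - wild type
    and the sign bit is 1 for delta >= 0 (zero treated as positive by assumption)."""
    feats = {}
    for name, prop in PROPENSITIES.items():
        delta = prop(mutant) - prop(wild_type)
        feats[name] = (abs(delta), 1 if delta >= 0 else 0)
    return feats
```

For a valine-to-proline exchange, the helix-breaker value rises (magnitude 1, sign 1) and the C-beta branching value falls (magnitude 1, sign 0). Splitting magnitude and direction lets a linear model weight them independently.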

For the latter two, we ran PROFphd on the raw sequence rather than on a sequence profile. Although this mode reduced prediction performance, it allowed us to observe actual differences in the predicted outcome between wild type and mutant, which the use of sequence alignments would otherwise have disguised.


