Quantitative Structure-Activity Relationships (QSAR)
History of QSAR
The first application of QSAR is attributed to Hansch (1969), who developed an equation that related biological activity to certain electronic characteristics and the hydrophobicity of a set of structures.
$\log(1/C) = k_1 \log P - k_2 (\log P)^2 + k_3 \sigma + k_4$
where:
C = minimum effective dose
P = octanol-water partition coefficient
σ = Hammett substituent constant
k_x = constants derived from regression analysis
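As a minimal sketch of how the Hansch equation behaves, the Python below evaluates log(1/C) over a range of log P values. The coefficients k1 to k4 and the substituent constant used here are hypothetical, chosen only for illustration; real values come from regression on measured activities. Because the (log P)^2 term is subtracted, the predicted activity passes through a maximum at log P = k1/(2 k2).

```python
def hansch_log_inv_c(log_p, sigma, k1, k2, k3, k4):
    """Hansch equation: log(1/C) = k1*logP - k2*(logP)**2 + k3*sigma + k4."""
    return k1 * log_p - k2 * log_p ** 2 + k3 * sigma + k4


def optimum_log_p(k1, k2):
    """log P at the maximum of the parabola (derivative set to zero)."""
    return k1 / (2.0 * k2)


# Hypothetical coefficients, not taken from any published QSAR.
K1, K2, K3, K4 = 1.2, 0.3, 0.8, 2.0
SIGMA = 0.23  # illustrative substituent constant

for log_p in (0.0, 1.0, 2.0, 3.0, 4.0):
    activity = hansch_log_inv_c(log_p, SIGMA, K1, K2, K3, K4)
    print(f"log P = {log_p:.1f}  log(1/C) = {activity:.3f}")

print("optimum log P:", optimum_log_p(K1, K2))
```

With these illustrative numbers, activity rises up to log P = 2 and falls beyond it, which is the parabolic dependence on hydrophobicity that the equation encodes.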
Hansch's Approach
Log P is a measure of the drug's hydrophobicity, which was selected as a measure of its ability to pass through cell membranes. The log P (or log Po/w) value reflects the relative solubility of the drug in octanol (representing the lipid bilayer of a cell membrane) and water (the fluid within the cell and in blood). Log P values may be measured experimentally or, more commonly, calculated.
Calculating Log P
$\log P = \log K_{o/w} = \log\left([X]_{octanol} / [X]_{water}\right)$
Most programs use a group additivity approach, shown here for benzyl bromide (C6H5CH2Br):

Fragment            Contribution
1 aromatic ring       0.780
7 H's on carbon       1.589
1 C-Br bond          -0.120
1 alkyl C             0.195
Some programs use more complicated algorithms that include factors such as the dipole moment and molecular size and shape.
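The group additivity estimate is just a sum of fragment contributions. The sketch below adds up the four contributions listed above, assuming the values pair with the fragments in the order given (ring, hydrogens, C-Br bond, alkyl carbon). The dictionary keys and function name are illustrative only.

```python
# Fragment contributions from the list above. Each value is the total
# for that fragment type (e.g. all 7 hydrogens together give 1.589).
FRAGMENT_CONTRIBUTIONS = {
    "1 aromatic ring": 0.780,
    "7 H on carbon": 1.589,
    "1 C-Br bond": -0.120,
    "1 alkyl C": 0.195,
}


def additive_log_p(contributions):
    """Estimate log P as the sum of fragment contributions."""
    return sum(contributions.values())


log_p = additive_log_p(FRAGMENT_CONTRIBUTIONS)
print(f"estimated log P = {log_p:.3f}")  # prints 2.444
```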
Hansch's Approach...
The Hammett substituent constant (σ) reflects the drug molecule's intrinsic reactivity, related to electronic factors caused by aryl substituents. In chemical reactions, aromatic ring substituents can alter the rate of reaction by up to 6 orders of magnitude. For example, the rate of the reaction below is ~10^5 times slower when X = NO2 than when X = CH3.
[Reaction scheme: X-substituted benzylic chloride + CH3OH → the corresponding methyl ether + HCl]
Hammett Equation
Hammett observed a linear free energy relationship between the log of the relative rate constants for ester hydrolysis and the log of the relative acid ionization (equilibrium) constants for a series of substituted benzoic esters & acids.
$\log(k_X/k_H) = \rho \log(K_X/K_H) = \rho\sigma$
He arbitrarily assigned ρ, the reaction constant, a value of 1 for the acid ionization of benzoic acid, so that σ = log(K_X/K_H) for that reference reaction.
Definition of Hammett ρ
[Reaction scheme: X-C6H4-COOH ⇌ X-C6H4-COO⁻ + H⁺]

Substituent    σp       Eq. constant
-NH2          -0.66     0.00000554
-OCH3         -0.27     0.000015
-CH3          -0.17     0.000023
-H             0.00     0.000034
-Cl            0.23     0.000055
-COCH3         0.50     0.000088
-CN            0.66     0.000128
-NO2           0.78     0.000166
Hammett Plot
[Figure: log K (y-axis, about -5.3 to -3.7) plotted against σp (x-axis, -1 to 0.5)]
These σp values are obtained from the best-fit line having a slope of 1.
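The plot can be reproduced numerically from the substituent table above. The sketch below fits a straight line to log K against the tabulated substituent constants with NumPy's `polyfit`; the slope of that line is the reaction constant for benzoic acid ionization, which should come out close to 1. The variable names are illustrative.

```python
import numpy as np

# (substituent constant, equilibrium constant) pairs from the table above.
TABLE = {
    "-NH2": (-0.66, 0.00000554),
    "-OCH3": (-0.27, 0.000015),
    "-CH3": (-0.17, 0.000023),
    "-H": (0.00, 0.000034),
    "-Cl": (0.23, 0.000055),
    "-COCH3": (0.50, 0.000088),
    "-CN": (0.66, 0.000128),
    "-NO2": (0.78, 0.000166),
}

sigma_p = np.array([s for s, _ in TABLE.values()])
log_k = np.log10(np.array([k for _, k in TABLE.values()]))

# Least-squares line: log K = slope * sigma_p + intercept
slope, intercept = np.polyfit(sigma_p, log_k, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Individual values of log(K_X/K_H) computed directly from the table do not match the tabulated constants exactly, which is consistent with the constants being read off the best-fit line rather than from each point.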
Hammett Plot
Aryl substituent constants (σ) were determined by measuring the effect of a substituent on a reaction rate (or Keq). These are listed in tables, and are constant in widely different reactions. Reaction constants (ρ) for other reactions may also be determined by comparing the relative rates (or Keq) of two differently substituted reactants, using the substituent constants described above. Some of these values (σ and ρ) are listed on the following slide.
[Reaction scheme: X-substituted benzylic chloride + CH3OH → the corresponding methyl ether + HCl; ρ = -5.0]
Substituent (Sigma) Values
σ (the electronic effect of the substituent; negative values are electron donating)

Substituent    σ        Substituent    σ
p-NH2         -0.66     p-Cl           0.23
p-OCH3        -0.27     p-COCH3        0.50
p-CH3         -0.17     p-CN           0.66
m-CH3         -0.07     p-NO2          0.78
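As a worked check on these numbers, the relative rate of two substituted reactants follows from the difference in their substituent constants multiplied by the reaction constant. The sketch below applies this to the methanolysis reaction above (reaction constant -5.0) for X = NO2 versus X = CH3; the function name is illustrative.

```python
RHO = -5.0  # reaction constant quoted for the methanolysis reaction
SIGMA_P = {"NO2": 0.78, "CH3": -0.17}  # para substituent constants


def log_relative_rate(rho, sigma_x, sigma_ref=0.0):
    """log10(k_X / k_ref) = rho * (sigma_X - sigma_ref)."""
    return rho * (sigma_x - sigma_ref)


log_ratio = log_relative_rate(RHO, SIGMA_P["NO2"], SIGMA_P["CH3"])
slowdown = 10 ** (-log_ratio)

print(f"log10(k_NO2 / k_CH3) = {log_ratio:.2f}")       # -4.75
print(f"NO2 compound reacts about {slowdown:.1e} times slower")
```

The result, a factor of roughly 5.6 x 10^4, is in line with the slowdown of about five orders of magnitude quoted earlier for this reaction.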
density, pKa, ionization energy, boiling point, ΔHvaporization, refractive index, molecular weight, dipole moment (μ), ΔHhydration, reduction potential, lipophilicity parameter π = log PX - log PH
ovality, HOMO energy, polarizability, molecular volume, vdW surface area, molar refractivity, hydration energy
QSAR Methodology
Often it is found that several descriptors are correlated; that is, they describe observables that are closely related, such as MW and boiling point in a homologous series. Statistical analysis is used to determine which of the variables best describe (correlate with) the observed biological activity, and which are cross-correlated. The final QSAR involves only the most important 3 to 5 descriptors, eliminating those with high cross-correlation.
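A simple way to spot cross-correlated descriptors is to compute the pairwise correlation matrix and flag pairs above a cutoff. The sketch below does this for a small homologous series (n-pentane to n-decane) using approximate molecular weights and boiling points, plus a third, purely hypothetical descriptor column. The 0.9 cutoff is an assumption for illustration, not a rule from the slides.

```python
import numpy as np

NAMES = ["MW", "bp_C", "descriptor_z"]

# Rows: n-pentane ... n-decane. MW and boiling point (deg C) are
# approximate literature values; descriptor_z is hypothetical.
X = np.array([
    [72.15, 36.1, 0.30],
    [86.18, 68.7, -0.10],
    [100.20, 98.4, 0.25],
    [114.23, 125.7, -0.20],
    [128.26, 150.8, 0.15],
    [142.28, 174.1, -0.05],
])

THRESHOLD = 0.9  # assumed cutoff for "highly cross-correlated"

corr = np.corrcoef(X, rowvar=False)  # descriptors are columns

flagged = [
    (NAMES[i], NAMES[j])
    for i in range(len(NAMES))
    for j in range(i + 1, len(NAMES))
    if abs(corr[i, j]) > THRESHOLD
]
print("cross-correlated pairs:", flagged)
```

Here only the MW / boiling point pair is flagged, so one of the two would be dropped before building the final equation.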
2 points exactly determine a line (2 compounds, 2 properties).
3 points exactly determine a plane (and so on in higher dimensions).
A data set in which the number of drug candidates is similar to the number of descriptors will therefore give a high (and meaningless) correlation.
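This overfitting effect is easy to demonstrate with random numbers. In the sketch below, "activities" are generated independently of the "descriptors", so there is nothing real to find. With 5 compounds and 4 descriptors plus an intercept, the fit is still exact; with 200 compounds the correlation collapses. All data are synthetic and the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_r(n_compounds, n_descriptors):
    """Fit random activities to random descriptors; return r(observed, fitted)."""
    X = rng.normal(size=(n_compounds, n_descriptors))
    y = rng.normal(size=n_compounds)  # activity unrelated to X
    A = np.column_stack([X, np.ones(n_compounds)])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(y, A @ coef)[0, 1]


# 4 descriptors + intercept = 5 parameters for 5 compounds: exact fit.
r_small = fit_r(n_compounds=5, n_descriptors=4)
# Same number of descriptors, many more compounds: no real correlation.
r_large = fit_r(n_compounds=200, n_descriptors=4)

print(f"r with 5 compounds:   {r_small:.3f}")
print(f"r with 200 compounds: {r_large:.3f}")
```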
Example of a QSAR
[Structure: 3-X, 4-Y-disubstituted N,N-dimethyl-α-bromophenethylamine]
Anti-adrenergic Activity and Physicochemical Properties of 3,4-disubstituted N,N-dimethyl-α-bromophenethylamines
π = lipophilicity parameter
σ+ = Hammett sigma+ (for benzylic cations)
Es(meta) = Taft's steric parameter
Example of a QSAR...
m-X   p-Y    π      σ+     Es(meta)   log(1/C) obs   Calc. log(1/C) a   Calc. log(1/C) b
H     H     0.00   0.00     1.24        7.46             7.82               7.88
F     H     0.13   0.35     0.78        7.52             7.45               7.43
H     F     0.15  -0.07     1.24        8.16             8.09               8.17
Cl    H     0.76   0.40     0.27        8.16             8.11               8.05
Cl    F     0.91   0.33     0.27        8.19             8.38               8.34
Br    H     0.94   0.41     0.08        8.30             8.30               8.22
I     H     1.15   0.36    -0.16        8.40             8.61               8.51
Me    H     0.51  -0.07     0.00        8.46             8.51               8.36
Br    F     1.09   0.34     0.08        8.57             8.57               8.51
H     Cl    0.70   0.11     1.24        8.68             8.46               8.60
Me    F     0.66  -0.14     0.00        8.82             8.78               8.65
H     Br    1.02   0.15     1.24        8.89             8.77               8.94
Cl    Cl    1.46   0.51     0.27        8.89             8.75               8.77
Br    Cl    1.64   0.52     0.08        8.92             8.94               8.94
Me    Cl    1.21   0.04     0.00        8.96             9.15               9.08
Cl    Br    1.78   0.55     0.27        9.00             9.06               9.11
Me    Br    1.53   0.08     0.00        9.22             9.46               9.43
H     I     1.26   0.14     1.24        9.25             9.06               9.26
H     Me    0.52  -0.31     1.24        9.30             8.87               8.98
Me    Me    1.03  -0.38     0.00        9.30             9.56               9.47
Br    Br    1.96   0.56     0.08        9.35             9.25               9.29
Br    Me    1.46   0.10     0.08        9.52             9.35               9.33

a Calculated with QSAR equation a. b Calculated with QSAR equation b.
Example of a QSAR...
QSAR Equation a (using 2 variables):
$\log(1/C) = 1.151\,\pi - 1.464\,\sigma^{+} + 7.817$   (n = 22; r = 0.945)
QSAR Equation b (using 3 variables):
$\log(1/C) = 1.259\,\pi - 1.460\,\sigma^{+} + 0.208\,E_s(\text{meta}) + 7.619$   (n = 22; r = 0.959)
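The two equations can be checked against the tabulated data directly. The sketch below copies the descriptor and observed-activity columns for the 22 compounds, evaluates equations a and b, and computes the correlation between observed and calculated values. It assumes the flattened columns line up compound by compound in the order listed; the array and function names are illustrative.

```python
import numpy as np

# Descriptor columns for the 22 compounds, in table order.
pi = np.array([0.00, 0.13, 0.15, 0.76, 0.91, 0.94, 1.15, 0.51, 1.09, 0.70, 0.66,
               1.02, 1.46, 1.64, 1.21, 1.78, 1.53, 1.26, 0.52, 1.03, 1.96, 1.46])
sigma_plus = np.array([0.00, 0.35, -0.07, 0.40, 0.33, 0.41, 0.36, -0.07, 0.34, 0.11, -0.14,
                       0.15, 0.51, 0.52, 0.04, 0.55, 0.08, 0.14, -0.31, -0.38, 0.56, 0.10])
es_meta = np.array([1.24, 0.78, 1.24, 0.27, 0.27, 0.08, -0.16, 0.00, 0.08, 1.24, 0.00,
                    1.24, 0.27, 0.08, 0.00, 0.27, 0.00, 1.24, 1.24, 0.00, 0.08, 0.08])
observed = np.array([7.46, 7.52, 8.16, 8.16, 8.19, 8.30, 8.40, 8.46, 8.57, 8.68, 8.82,
                     8.89, 8.89, 8.92, 8.96, 9.00, 9.22, 9.25, 9.30, 9.30, 9.35, 9.52])


def equation_a(pi, sigma_plus):
    """Two-variable QSAR: lipophilicity and sigma+."""
    return 1.151 * pi - 1.464 * sigma_plus + 7.817


def equation_b(pi, sigma_plus, es_meta):
    """Three-variable QSAR: adds the steric parameter of the meta substituent."""
    return 1.259 * pi - 1.460 * sigma_plus + 0.208 * es_meta + 7.619


calc_a = equation_a(pi, sigma_plus)
calc_b = equation_b(pi, sigma_plus, es_meta)

r_a = np.corrcoef(observed, calc_a)[0, 1]
r_b = np.corrcoef(observed, calc_b)[0, 1]

print("first compound (X = Y = H):", round(calc_a[0], 2), round(calc_b[0], 2))
print(f"r (equation a) = {r_a:.3f}, r (equation b) = {r_b:.3f}")
```

Adding the third descriptor raises r only slightly, which is the kind of trade-off between fit quality and number of variables that the methodology section warns about.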
Neolignans
Descriptors Used
Log P: values obtained from the hydrophobic parameters of the substituents.
Surface area (A), molecular volume (V), log of the partition coefficient (Log P) and hydration energy (HE): evaluated with the molecular modeling package HyperChem 5.0.
Partial atomic charges (Qn) and bond orders (Ln): derived from the electrostatic potential.
Energies of the HOMO (H) and LUMO (L) frontier orbitals.
Hardness (η): obtained from the equation $\eta = (E_{LUMO} - E_{HOMO})/2$.
Mulliken electronegativity (χ): calculated from the equation $\chi = -(E_{HOMO} + E_{LUMO})/2$.
Other electronic properties: total energy (ET), heat of formation (ΔHf), ionization potential (IP), dipole moment (μ) and polarizability (POL), whose values were obtained from the molecular orbital program Ampac 5.0.
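The two orbital-derived descriptors are simple functions of the frontier orbital energies. The sketch below evaluates both formulas given above; the HOMO and LUMO energies are hypothetical values in eV, used only to show the arithmetic.

```python
def hardness(e_homo, e_lumo):
    """Hardness: (E_LUMO - E_HOMO) / 2."""
    return (e_lumo - e_homo) / 2.0


def mulliken_electronegativity(e_homo, e_lumo):
    """Mulliken electronegativity: -(E_HOMO + E_LUMO) / 2."""
    return -(e_homo + e_lumo) / 2.0


# Hypothetical frontier orbital energies in eV (not from the paper).
E_HOMO, E_LUMO = -9.0, -1.0

print("hardness:", hardness(E_HOMO, E_LUMO))                       # 4.0
print("electronegativity:", mulliken_electronegativity(E_HOMO, E_LUMO))  # 5.0
```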
Antifungal QSAR
$\log(1/C) = -2.85 - 0.38\,HE - 1.45\,Q_{1'}$
F=29.63, R2=0.86, Q2=0.80, SEP=0.
where:
F is the Fisher test for significance of the equation,
R2 is the general correlation coefficient,
Q2 is the predictive capability, and
SEP is the standard error of prediction.
A.A.C. Pinheiro, R.S. Borges, L.S. Santos, C.N. Alves, Journal of Molecular Structure: THEOCHEM, Vol 672, pp 215-219 (2004).
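To make the difference between R2 and Q2 concrete, the sketch below computes both for a two-descriptor linear model. It assumes Q2 is obtained by leave-one-out cross-validation (Q2 = 1 - PRESS/SS_total), which is a common definition; the slides do not state which procedure the authors used. The data are synthetic: 12 made-up compounds generated from the equation's coefficients plus noise, not the values from the paper.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares coefficients for y = X @ b + intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([X, np.ones(X.shape[0])]) @ coef


def r_squared(X, y):
    """Fraction of variance explained when fitting all compounds."""
    resid = y - predict(fit_linear(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def q_squared_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fit without it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_linear(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic data: two stand-in descriptors and noisy activities.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))
y = -2.85 - 0.38 * X[:, 0] - 1.45 * X[:, 1] + rng.normal(scale=0.3, size=12)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")
```

Q2 is never larger than R2 for a least-squares model, because each left-out compound is predicted without having contributed to the fit. A Q2 close to R2, as reported above, suggests the model is not merely memorizing the training set.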
New Neolignans
CoMFA of Testosterone
Blue: electronegative groups enhance binding; red: electronegative groups reduce binding.
Green: bulky groups enhance binding; yellow: bulky groups reduce binding.