Recent Advances in Multidimensional QSAR (4D-6D): A Critical Review


Mini-Reviews in Medicinal Chemistry, 2014, 14, 35-55

Manoj G. Damale1, Sanjay N. Harke1, Firoz A. Kalam Khan2, Devanand B. Shinde3 and Jaiprakash N. Sangshetti*2

1 Department of Bioinformatics, MGM’s Institute of Biosciences and Technology, Aurangabad (MS) India-431003; 2 Y.B. Chavan College of Pharmacy, Dr. Rafiq Zakaria Campus, Rauza Baugh, Aurangabad (MS) India-431001; 3 Department of Chemical Technology, Dr. B.A.M. University, Aurangabad (MS) India-431004

Abstract: The quantitative structure-activity relationship (QSAR) study is among the most cited and reliable computational techniques, used for decades to relate a substituent’s physicochemical properties to biological activity. The concept of QSAR has developed step by step from 0D to 2D. These models suffer from various limitations, which led to the development of 3D-QSAR. A large body of literature is available on the utility of 3D-QSAR for drug design. The three-dimensional properties of molecules, together with their non-covalent interactions, serve as an important tool in the selection of the bioactive conformation of compounds. With this view, 3D-QSAR has been explored through different advancements such as CoMFA, CoMSA, CoMMA, etc. Some reports are also available highlighting the limitations of 3D-QSAR. To overcome these limitations, more advanced QSAR approaches such as 4D-, 5D- and 6D-QSAR have evolved. In this review we focus on the present and future of more predictive QSAR models. The review highlights the basics of 3D- to 6D-QSAR and mainly emphasizes the advantages of each dimension over the previous one. It covers almost all recent reports on these multidimensional QSAR approaches, which are new paradigms in drug discovery.

Keywords: Biological activity, Molecular descriptors, Multidimensional QSAR, Physicochemical property, QSAR.

INTRODUCTION

Rational drug design starts with the discovery of a lead molecule, either by a trial-and-error process or by screening a library of compounds [1]. Nowadays, quantitative structure-activity relationship (QSAR) analysis is mostly used in high-throughput screening of combinatorial libraries of small chemical compounds and, further, to check the activity of a diverse set of designed small compounds [2]. QSAR is a knowledge-based method in which a statistical prediction model is built that relates biological activity to molecular descriptors. The aim of a QSAR study is to evaluate biological activity by computational means, mostly in order to reduce the failure rate in the drug development process [3]. Historically, the aim of QSAR studies was to predict the specific biological activity of a series of test compounds. Nowadays the main objective is to predict the biological activity of in silico-designed compounds on the basis of already synthesized compounds [4]. In a QSAR study, molecules are characterized by molecular descriptors, which are mostly calculated from physicochemical properties of the ligand molecule such as logP, pKa, molecular weight, logD, molecular refractive index, molecular surface area, molecular interaction field, etc. The constructed mathematical model, which expresses the association between molecular descriptors and biological activity, is validated internally and externally in order to assess the predictive power of the QSAR model (Fig. 1). These models are interpreted by various methods such as pattern recognition, machine learning and artificial intelligence.

The first structure-activity relationship study was conducted in the late 1860s, when Crum-Brown and Fraser studied different alkaloids. They demonstrated that alkylation of the basic nitrogen of the ring system yields quaternary ammonium compounds, which differ from the parent basic amines and show a significant change in biological action. Since then a variety of quantitative structure-activity relationship studies have been reported to predict the cytotoxic, depressant and antibacterial activity of chemical compounds [5-7]. Thousands of QSAR equations have been formulated to validate and elucidate QSAR hypotheses about the mechanism of action of drugs at the molecular level and to give a more complete understanding of physicochemical phenomena such as hydrophobicity. In 1962 Hansch and Muir published their brilliant study on the 2D structure-activity relationships of plant growth regulators and their dependency on Hammett constants and hydrophobicity [2]. The present review covers recent developments in the field of QSAR.

SCHEME OF QSAR STUDY

*Address correspondence to this author at the Dr. Rafiq Zakaria Campus, Y.B. Chavan College of Pharmacy, Aurangabad-431001 (M.S.), India; Tel:/Fax: +91-240-23801129; E-mail: [email protected]
© 2014 Bentham Science Publishers

The QSAR model studies are mostly carried out in three different steps.


Fig. (1). Schematic overview of the QSAR process.

i. Understanding and Selection of Potential Molecular Descriptors from Set of Biologically Active Conformers

Understanding and selecting potential molecular descriptors from a set of biologically active conformers is the most critical step in QSAR model generation, as it helps to understand the nature of the molecular descriptors prior to the actual construction of the QSAR model. This mostly helps to reduce unnecessary error in the study data. The specified properties of the chemical compounds are used to select potential molecular descriptors of several types: physicochemical, quantum-chemical, geometrical and topological (Table 1). Once the potential molecular descriptors have been selected, the necessary biological information can be obtained from them. One of the earliest approaches, selection by manual inspection, consisted of drawing a 2D plot of the important molecular descriptors of the bioactive conformers. Several methods have been developed since then, but the first and most important computational method was cluster analysis, developed by Hansch, which made it easier to select compounds bearing diverse substituents [7]. The selection of relevant molecular descriptors is covered under the 1D-QSAR model.

Table 1. List of desirable attributes of molecular descriptors for use in QSAR studies.

Sr. No.   Desirable Features Associated with Descriptors
1.        Structural interpretation
2.        Show good correlation with at least one property
3.        Preferably allow for the discrimination of isomers
4.        Applicable to local structure
5.        Generalizable to “higher” descriptors
6.        Independence
7.        Simplicity
8.        Not to be based on properties
9.        Not to be trivially related to other descriptors
10.       Allow for efficient construction
11.       Use familiar structural concepts
12.       Show the correct size dependence
13.       Show gradual change with gradual change in structures
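Two of the attributes in Table 1, independence and not being trivially related to other descriptors, are often checked in practice through pairwise correlations. The following is a minimal sketch of such a filter, not a procedure taken from the review: the function names, the 0.95 threshold and the descriptor values are assumptions chosen purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for a constant descriptor

def drop_redundant(descriptors, threshold=0.95):
    """Keep a descriptor only if |r| with every already-kept one is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical descriptor values for five compounds (illustrative numbers only).
descriptors = {
    "mol_weight": [120.0, 134.0, 148.0, 162.0, 176.0],
    "n_carbon": [8.0, 9.0, 10.0, 11.0, 12.0],  # collinear with mol_weight in this toy set
    "logP": [1.2, 0.8, 2.5, 1.9, 3.1],
}
selected = drop_redundant(descriptors)
print(selected)
```

In this toy set the carbon count carries no information beyond molecular weight, so only one of the two is retained. The order in which descriptors are visited decides which member of a correlated pair survives, which is a design choice rather than a rule.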

1D-QSAR

Various parameters are used to select the potential molecular descriptors that define specific molecular properties of a conformer, such as electronic, hydrophobic and steric constraints.

a. Electronic Constraints

The main aim behind the calculation of electronic effects is to learn about the inter- and intramolecular interactions that contribute significantly to biological action. The constants commonly studied in the QSAR equation are the Hammett constants, together with quantum chemical indices such as the energies of the lowest unoccupied molecular orbital and the highest occupied molecular orbital, and polarizability.

Hammett was the first to study the electronic nature of chemical compounds, using the ionization of benzoic acid in water to determine the free energy change (ΔG). Various substitutions at the meta and para positions with electron-withdrawing and electron-donating groups were studied. The analysis showed that electron-withdrawing groups favor ionization, whereas electron-donating groups disfavor it. From this observation one can draw a meaningful correlation between the change in the electronic nature of the substituent and the change in the free energy.
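For orientation, the relationship that Hammett derived from these measurements is usually written in the form below. The symbols are the conventional ones and are not defined in the text above: K_X and K_H are the ionization constants of the substituted and unsubstituted benzoic acid, σ is the substituent constant, and ρ is the reaction constant (taken as 1 for benzoic acid ionization in water at 25 °C).

```latex
\log\frac{K_X}{K_H} = \rho\,\sigma,
\qquad
\Delta G^{\circ} = -2.303\,RT\,\log K
```

Because the standard free energy is proportional to log K, σ can be read as the substituent-induced change in the free energy of ionization. This is why such correlations are called linear free energy relationships.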

As QSAR methodology has developed most intensively with the help of computational methods, the electronic nature of chemical compounds is now studied through the wave function, using quantum chemical descriptors and semi-empirical methods. The quantum chemical descriptor approach uses quantities such as net atomic charges, highest occupied molecular orbital/lowest unoccupied molecular orbital (HOMO-LUMO) energies, frontier orbital electron densities and superdelocalizabilities, and correlates them with biological activity.

b. Hydrophobic Constraints

The hydrophobicity of a compound refers to its physicochemical behavior, mostly in connection with the solvent. The first QSAR study of hydrophobic properties was conducted by Hansch on plant growth regulators. Hydrophobicity is studied in many areas, for example ligand-receptor interaction, solvent interaction in detergency, coagulation, membrane permeation and many more. The hydrophobic nature of a solute is mostly studied by measuring the partition coefficient P, the ratio of the concentration of the solute in a non-polar solvent (typically n-octanol) to its concentration in a polar solvent (water). The shake-flask method is commonly used to measure P, which is reported as a logP value typically ranging from -3 to 6. Because manual measurement of logP is inconsistent, it has largely been replaced by automated calculation with ClogP.

c. Steric Constraints

The steric effect of a molecule arises because a chemical compound occupies a certain space, as a result of its charge or its specific shape or size. Steric character is an important factor in the transport of a chemical moiety across biomembranes. R.W. Taft was the first to study the steric effect of a compound, which he termed Es. Other important parameters that address the steric effect more effectively are molar refraction (MR) and molecular volume. Molar refraction is derived from the refractive index and measures the overall bulk of a compound. However, several studies report that molar refraction cannot distinguish between some shapes of alkyl substituents; hence, molar refraction has been replaced by STERIMOL parameters. The STERIMOL parameters give an overall account of the dimensions of a compound, such as the length of the substituent or the bond angle between substituents, and describe the steric effect along several fixed axes.

The 3D steric property can be studied for both the substituent and the parent compound through parameters such as bond length and bond order. When the length of the substituent and the length of the bond between the parent and the substituent are both included in the study, the compound can be studied three-dimensionally. The molecular volume of a compound reflects its bulkiness and affects the transport of the moiety across the cellular membrane [2].

ii. Analysis of Potential Molecular Descriptors in the Context of Analysis of Activity

The correlation of biological activity with physicochemical properties is made manually by forming a linear relationship between them [8]. The effects of the Hammett and Taft constants on biological activity are studied. A number of equations are generated to correlate the activity, taking into account the ratio between the number of descriptors and the size of the dataset. From the set of molecular descriptors selected, the most informative subsets can then be used for the study.
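The linear relationship between physicochemical descriptors and activity described under step ii can be sketched as an ordinary least-squares fit. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the two descriptors (logP and a Hammett-type sigma), the coefficient values and the function names are hypothetical, and the activities are synthetic so that the fit can be verified.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Least-squares fit via the normal equations; the intercept is returned last."""
    X = [list(r) + [1.0] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical (logP, sigma) pairs for six compounds. The activities are synthetic,
# generated from log(1/C) = 0.5*logP + 1.2*sigma + 2.0, so the fit can be checked.
rows = [(0.5, -0.17), (1.0, 0.00), (1.5, 0.23), (2.0, 0.78), (2.5, -0.27), (3.0, 0.06)]
activity = [0.5 * lp + 1.2 * s + 2.0 for lp, s in rows]
coef_logp, coef_sigma, intercept = fit_mlr(rows, activity)
print(round(coef_logp, 3), round(coef_sigma, 3), round(intercept, 3))
```

With real, noisy data the coefficients would not be recovered exactly, and the number of compounds should comfortably exceed the number of descriptors for the fit to be meaningful.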

iii. Mapping the Specified Value Obtained for Each Descriptor and Feeding as Independent Variable to Correlate With Its Biological Activity

The analyzed sets of molecular descriptors are mapped by either linear or nonlinear mapping techniques. The quantified value of each descriptor is used as an independent variable, with activity expressed as a function of these values. In many cases the mapping methods use information about a training set to obtain the optimal function [7].

2D-QSAR

The 2D-QSAR study is mostly based on the specific molecular fragments that constitute the chemical compound. The descriptors used include constitutional, topological, topological polar surface area, electrostatic and quantum-chemical, geometrical and molecular fingerprint properties of the chemical compound [9, 10].

a. Constitutional Descriptors

These descriptors reflect the constitution of a chemical compound without reference to its connectivity or geometry. They include molecular weight; the numbers of atoms, hydrogens, carbons, halogens, oxygens and nitrogens; the number of ring systems; and the numbers of bonds, such as single, double, triple and aromatic bonds, among many others [7].

b. Topological Descriptors

Topological descriptors mostly deal with the arrangement of atoms in a chemical compound and encode information such as the orientation of internal bonds, molecular size, shape, branching and the presence of heteroatoms [7]. The topology of a compound is represented as a 2D graph of nodes and edges. Several indices are used to express the molecular connectivity of a compound; they are categorized as topochemical and topostructural indices. These indices include the Wiener index, the Randic index, the Balaban J index, the Schultz index, the Kier and Hall index, the Galvez topological charge index, the BCUT eigenvalue index, the E-state index and so on [7, 11]. Together, these indices describe the overall connectivity, average distance, average valence, net charge transfer, bond polarizability, and the electronic and topological organization of a molecule, with atoms as nodes and bonds as edges. Topological indices are also very helpful in determining structure and substructure in chemical database mining, where structural similarity and diversity are measured in a set of structural data using precise computational algorithms [12].
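As a concrete instance of a topological index computed on a graph of nodes and edges, the Wiener index is the sum of shortest-path distances (counted in bonds) over all pairs of atoms in the hydrogen-suppressed graph. The sketch below is a minimal illustration; the adjacency-list encoding and the function name are assumptions made for the example.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives topological distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end

# Hydrogen-suppressed carbon skeletons as adjacency lists.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), wiener_index(isobutane))
```

The two isomers share the same constitutional descriptors (four carbons) but differ in their Wiener index, which shows how a topological index captures branching.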

c. Topological Polar Surface Area

The topological polar surface area (TPSA) is a convenient and widely used descriptor in the ADME prediction of chemical compounds. TPSA can also be used to measure the relative propensity of ligand molecules for polar interactions with a specific receptor molecule, although it is less commonly exploited for this purpose. Methods such as multiple linear regression analysis have been employed to study TPSA. Analysis of the linear regression coefficients shows which polar fragments in a compound favor or disfavor the biological activity [13].

d. Quantum Chemical Descriptors

Quantum chemistry, in combination with recently developed efficient computational algorithms, is used to carry out quantum mechanical calculations. These calculations mostly focus on the electronic and geometrical properties of molecules, and for this reason electrostatic and quantum chemical descriptors are usually studied in combination. The quantum chemical descriptors most often used in QSAR studies are atomic charges, molecular orbital energies, frontier orbital densities, atom-atom polarizability, molecular polarizability, dipole moment and polarity indices, total energy and many more. Quantum chemical methods are fast and accurate, although they mostly neglect the solvent effect; the results obtained can be directly correlated with biological activity [7].

e. Geometrical Descriptors/Molecular Fingerprint Descriptors

Molecular fingerprint descriptors are used to represent the molecular properties of a compound. A molecular fingerprint can be calculated by two methods: one is based on a fragment dictionary and the other is the hash method (binary bit string). In the hash method, each bit corresponds to a pattern that represents a characteristic of the molecule, such as a structural fragment, the connectivity of the molecule or its pharmacophoric nature. Each bit is encoded in binary format with a specific value. These molecular fingerprints are mostly used to search for molecules “similar” to a query molecule; for example, the Daylight Chemical Information Systems fingerprint is a unique subgraph search algorithm that works on a path-based approach. The algorithm learns from a training set of molecular fingerprints, mapping each unique connection path (subgraph) to a large bit string (maximum of eight bit) [7]. In 1982, Molecular Design Limited (MDL) started a program for the storage and retrieval of chemical reaction information. MDL creates keys, a compact set of fingerprints, in which predefined observations are used to create a pattern-matching fingerprint. The predefined search patterns are learned from the large dataset of chemical compounds in the MDL databank (320 keys from their 966 sets). Barnard Chemical Information Systems (BCI) offers another predictive fingerprint model, which works by first predicting the fingerprint and then implementing it as the presence or absence of certain molecular fragments. Typically, BCI generates a number of key fingerprints, each corresponding to a feature of the molecule such as a nearest-neighbor atom pair, a bonded sequence of atoms or a specific ring fragment [14].

1D vs. 2D-QSAR

1D-QSAR is the “classical” form of QSAR. It mostly focuses on macroscopic properties of chemical compounds, representing them in a 1D linear form, for example as molecular formulas [15]. Predefined variables are calculated by hand or with a pocket calculator, following the classical approach given by Hansch. A number of additive properties and indicator variables are used to construct the QSAR model [2, 16].

2D-QSAR is superior to the classical approach [17]. Topologically encoded information is enough to construct the model. A limited number of additive and indicator variables are used to construct the QSAR model [7]. The encoded information related to molecular architecture is easy and fast to calculate [12]. 2D-QSAR is a knowledge-based approach that requires some constitutional information about the compound when the model is constructed [18]. Linear and nonlinear methods such as multiple linear regression, GA, GA-PLS and PLS are used to build the QSAR model [19]. Stereochemical information about the compound is not required, as the model is built on 2D properties of compounds. It tends to give a better value for the correlation coefficient than for actual QSAR model prediction. The Organization for Economic Co-operation and Development (OECD) has set principles for the validation of QSAR models, and 2D-QSAR satisfies all of them except mechanistic interpretation. 2D-QSAR also suffers from problems such as a lack of interpretability or recognition ability in the search for active compounds. It is also limited in predicting the stereochemistry of the training dataset selected in the study [20].

3D-QSAR

Three-dimensional quantitative structure-activity relationships (3D-QSAR) focus broadly on those properties of the atoms in a compound that are represented as descriptors and that correspond mainly to the spatial representation of the molecule [21]. In the early 1980s, a novel approach to structure-activity relationship study was put forward, in which the molecular properties of chemical compounds are studied in a 3D grid box. The calculated molecular properties are subsequently correlated with biological activity, using a technique called DYLOMMS (dynamic lattice-oriented molecular modeling system) [22]. These properties are intrinsic to the compounds, arise from their molecular framework, and are mainly geometrical, electrostatic and quantum chemical in nature. They were first studied by Hansch et al. using a multilinear regression equation. The biological activity of a chemical compound is mostly predicted on the basis of detailed information about the receptor and ligand molecules. In many cases, however, the 3D structure of the receptor is not known, and the indirect method of 3D-QSAR is then mostly followed. The indirect approach of 3D-QSAR is based on information about the ligand molecules, such as the molecular alignment of atoms, pharmacophores, volume or fields, which is used to generate a virtual receptor [23].
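The idea of sampling molecular properties in a 3D grid box can be sketched by placing a charged probe at each lattice point and recording a steric (Lennard-Jones) and an electrostatic (Coulombic) energy there. This is a simplified illustration under stated assumptions, not the implementation of DYLOMMS or of any particular field-based program: the function names, the Lennard-Jones parameters, the grid extent, the truncation value and the two-atom “molecule” are all hypothetical.

```python
from itertools import product
from math import dist

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), conventional conversion factor

def probe_energies(atoms, point, probe_charge=1.0, epsilon=0.1, r_min=3.5, cutoff=30.0):
    """Return (steric, electrostatic) energies of a probe placed at `point`.

    atoms: list of ((x, y, z), partial_charge). epsilon and r_min are
    illustrative Lennard-Jones parameters, not taken from any force field.
    """
    steric = electrostatic = 0.0
    for xyz, charge in atoms:
        r = max(dist(xyz, point), 1e-6)  # avoid division by zero on top of an atom
        ratio = r_min / r
        steric += epsilon * (ratio ** 12 - 2 * ratio ** 6)
        electrostatic += COULOMB_K * probe_charge * charge / r
    # Truncate extreme values so that points inside atoms do not dominate.
    return min(steric, cutoff), max(-cutoff, min(electrostatic, cutoff))

def field_descriptors(atoms, lo=-4.0, hi=4.0, step=2.0):
    """Evaluate the probe on a regular lattice; one descriptor pair per grid point."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return {p: probe_energies(atoms, p) for p in product(axis, repeat=3)}

# Hypothetical two-atom "molecule" with opposite partial charges.
atoms = [((0.0, 0.0, 0.0), -0.4), ((1.2, 0.0, 0.0), 0.4)]
grid = field_descriptors(atoms)
print(len(grid), grid[(4.0, 4.0, 4.0)])
```

Each molecule in an aligned dataset would yield one such table of values, and the columns (grid point by field type) would then serve as the descriptors correlated with activity.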


A. 3D-QSAR DESCRIPTORS

The 3D-QSAR approach is an advanced, well-established and widely exploited technique for structure-activity relationship study. It focuses on the prediction of biological activities from the 3D properties of lead compounds by means of a series of linear and nonlinear predictive methods. The 3D-QSAR approach is simpler than the traditional structure-activity relationship study. The major objective of a 3D-QSAR study is to improve the activity of the lead compound by optimization and structural modification [24].

The first step in a 3D-QSAR study is to collect or design representative starting 3D structures of the ligand molecules and then to refine them on the basis of geometry and energy values [7]. The methods most often used for energy optimization are semi-empirical, molecular mechanics and quantum mechanics methods [10]. In the next step a database of conformers of the lead molecules is created. The molecules are flexible and can exist in multiple conformations, all of which are included in the QSAR study. From these multiple conformers, the bioactive conformers are searched for and selected for the particular study. The bioactive conformer is selected by knowledge-based methods, experimental or theoretical, that measure the binding affinity towards the receptor molecule. The selected bioactive conformers are then aligned uniformly using a computational tool. Once all the conformers are aligned, 3D properties of the molecules, such as steric and electrostatic properties, are calculated by placing a probe at the different points of a lattice. At each probe point the molecular structure is characterized by a set of numbers called descriptors [24]. These numbers mostly represent physicochemical and biological properties of a molecule. Descriptors can be calculated with (alignment-dependent) or without (alignment-independent) the alignment of the bioactive conformers [7].

i. Alignment Dependent Descriptor Methods

Several methods for calculating 3D descriptors require molecular alignment prior to the calculation. These methods calculate the descriptors by mapping receptor atoms, ligand atoms or receptor-ligand complexes. The alignment-dependent methods include Comparative Molecular Field Analysis (CoMFA), Comparative Molecular Similarity Indices Analysis (CoMSIA), Genetically Evolved Receptor Modeling (GERM), Comparative Binding Energy Analysis (COMBINE), Adaptation of the Fields for Molecular Comparison (AFMoC), Hint Interaction Field Analysis (HIFA) and Comparative Residue Interaction Analysis (CoRIA) [7, 25, 26].

a. Comparative Molecular Field Analysis (CoMFA)

Comparative Molecular Field Analysis (CoMFA) helps in building a quantitative relationship between molecular fields and activity. This molecular field-based method was developed by Cramer in 1988 [4]. It is more selective than the traditional classical QSAR methods. The first CoMFA study was carried out on steroids [27]. It mostly focuses on ligand properties (steric and electrostatic) and on favorable and unfavorable ligand-receptor interactions. As discussed above, CoMFA is an alignment-dependent descriptor method. All aligned ligands are placed in an energy grid, and the energy is calculated by placing a probe at each lattice point. The energy calculated at each grid point has electrostatic (Coulombic) and steric (van der Waals) components. These values serve as descriptors for further analysis and are correlated with biological activity using a linear regression method such as partial least squares (PLS). The PLS results serve as an important signal for identifying favorable and unfavorable electrostatic and steric potentials and help to correlate them with biological activity [24].

b. Comparative Molecular Similarity Indices Analysis (CoMSIA)

Comparative Molecular Similarity Indices Analysis (CoMSIA) is a more recent modification of CoMFA. The two approaches are similar, except that in CoMSIA molecular similarity is calculated additionally [22]. CoMFA depends heavily on the alignment of molecules, which may lead to errors through alignment sensitivity and in the interpretation of electrostatic and steric potentials. To calculate the CoMSIA fields, Gaussian potentials are used; they are much 'softer' than the CoMFA functions. A regular energy grid box is constructed and similar probes are placed throughout the grid lattice. In addition, a solvent-dependent molecular entropic term, which defines the hydrophobic term, is included in the study. To analyze the properties of the atoms in the dataset, a common probe is placed and the similarity at each grid point is calculated. The calculation mostly covers steric, electrostatic, hydrophobic and hydrogen-bonding properties. All of these properties are calculated at regularly spaced grid points, each corresponding to a particular descriptor, and they are important in the correlation with biological activity [24].

c. Genetically Evolved Receptor Modeling (GERM)

Genetically Evolved Receptor Modeling (GERM) is a theoretical knowledge-based method. A 3D structure of the receptor active site is constructed by homology modeling when no experimental structure from X-ray crystallography or NMR spectroscopy is available [28]. As an initial step in GERM 3D-QSAR, a reasonable structure-activity series is selected and the bioactive conformers are aligned. All the aligned conformers are enclosed in the receptor active site, which is represented as a shell of atoms. The atoms of the shell are assigned explicit atom types (aliphatic H, aliphatic C, polar H) matched to those found in receptor active sites. Initially, the shell is built from uniform spheres, modeled as aliphatic carbon atoms, placed around the aligned training set. The positions of the model aliphatic carbon atoms relative to the aligned training set are adjusted so that maximum van der Waals interaction is obtained. Once these positions have been identified, each of them can be occupied by any other atom type or by no atom. In practice, any combination of atom types can be placed on the spheres, so a very large number of models can be generated. To deal with this problem and to place possible conformers into the active site of the receptor, a genetic algorithm is used, in which possible conformers of the ligand molecules are generated with a suitable docking program. The intermolecular interaction between ligand and receptor is calculated with an advanced force field such as CHARMM, mostly as electrostatic and van der Waals terms. The calculated binding energy is then correlated with biological activity [29]. The GERM technique is best suited to the de novo drug design process, where the above-mentioned approach is followed [30].

d. Comparative Binding Energy Analysis (COMBINE)

This is an empirical knowledge-based method in which experimentally determined receptor-ligand complexes are used to derive molecular properties that are correlated with biological activity [31, 32]. As the name suggests, the technique mostly focuses on the free energy of binding of the ligand molecules, which is calculated using a molecular mechanics force field. The series of complexes is examined for intermolecular interactions, electrostatic and van der Waals, per residue in the active site. In an alternative approach, ligands are divided into similar fragments, each ligand molecule is built with a non-essential dummy fragment incorporated into it, and the intermolecular interactions are calculated. The energy is calculated for all pairs of atoms between the receptor active-site residues and the ligand using a distance-based method. The significant descriptors are retained and the others are eliminated from the study data. A statistical technique such as PLS is then used to generate the QSAR model and to quantify the energy interactions most important for activity prediction [24].

e. Adaptation of the Fields for Molecular Comparison (AFMoC)

This is a very recently developed QSAR technique put forward by Klebe et al. [4].
It is also called “inverted CoMFA” and is derived from a knowledge-based scoring function (DrugScore). The methodology of AFMoC is similar to that of CoMFA and CoMSIA, with the additional advantage that the protein environment is included in the study. Protein-specific potential fields are generated in the binding site and used for the prediction of binding affinity [24, 33]. The overall methodology of AFMoC is discussed below in three steps.

i. Potential Field Calculation and Ligand Alignment

DrugScore is a newly developed knowledge-based scoring function based on distance-dependent pair potentials. Atom-by-atom pairwise potentials are calculated between the ligand and protein environments. These potential values are calculated with a suitable probe at each intersection of the grid constructed around the binding pocket.

ii. Interaction Field Calculations

The potential field map generated for the ligand-receptor complexes is used as an interaction field to calculate atom-type- and distance-dependent interactions (3D Gaussian functions) for each atom at each grid point.


iii. Making Correlation between Interaction Field Value Calculations and Binding Affinity Prediction Theoretically calculated interaction field values are correlated with experimentally determined biological affinity for surrogated ligand molecules using PLS analysis [24, 34]. f. Hint Interaction Field Analysis (HIFA) It is a newly developed program used to calculate empirical hydrophobic interaction and extension of CoMFA. As a result of the introduction of hydrophobicity calculation in CoMFA, the predicative power of it for QSAR model has increased. It calculates key hydrophobic features, which are atom-based analogs of the fragment constant. It uses the already-published data to predict hydrophobic field interaction. The methodology of HIFA is to calculate hydrophobic field interaction in the same manner as that of CoMFA by aligning the ligands and then placing them into a grid, followed by the interpretation of the net sum of hydrophobic interaction. g. Comparative Residue Interaction Analysis (CoRIA) Comparative Residue Interaction Analysis is a recent invention in the field of 3D-QSAR studies. There are several advanced modifications in CoRIA methodology like reverseCoRIA (rCoRIA) and mixed-CoRIA (mCoRIA) [35]. The main emphasis of CoRIA study is to calculate and analyze the receptor–ligand complex and thereafter predict the binding affinity of the complex. The binding energies in the form of non-bonded interactions like van der Waals and Coulombic which describe thermodynamic events involved in ligand binding to receptor, are calculated. These interactions are correlated with biological activities using the G/PLS analysis method, which is an advancement of the PLS method which additionally covers several variables like lipophilicity, molar refractivity, surface area, molecular volume, Jurs descriptors and strain energy [36-38]. ii. 
Alignment-Independent Descriptor Methods
Conventional alignment-based methods have several limitations: they are time consuming, they can introduce user bias, and the alignment may affect the sensitivity of the resulting model. To overcome these limitations, a class of methods has been adopted that is independent of alignment and is not affected by rotation or translation of the molecule. Methods in this category include Comparative Molecular Moment Analysis (CoMMA), COMPASS, Holo-QSAR (HQSAR), Weighted Holistic Invariant Molecular Descriptors (WHIM), Comparative Spectral Analysis (CoSA) and Grid Independent Descriptors (GRIND).

a. Comparative Molecular Moment Analysis (CoMMA)
CoMMA is an alignment-independent descriptor method. It focuses on the 3D (spatial) arrangement of molecular fragments and calculates molecular moments with respect to the center of mass, the center of dipole and the center of charge. The CoMMA descriptors are zero-order descriptors (molecular weight and moments of inertia with respect to the center of mass), a first-order descriptor (magnitude of the dipole moment (μ)) and a second-order descriptor (quadrupole moment (Q)) with respect to the molecular charge. In addition, the dipole moment components along the axes (x, y, z) and the displacement d for the center of dipole and the center of charge are calculated using the principal inertial axes. All of these are calculated relative to an initial reference frame that is superimposed on the center of dipole. Each value obtained from the principal inertial axes for the center of mass, the center of dipole and the center of charge is used as a descriptor. Finally, these descriptors are correlated with biological activity using PLS analysis [7, 39].

b. Compass
QSAR studies often use nonlinear correlation methods because they generally predict activity more accurately than linear methods. Compass works on the concept of an artificial neural network. It focuses on features such as the molecular surface and the selection of conformations for alignment. Selecting the proper ligand shape is important because ligands are flexible and may adopt many conformations with only slight changes in geometry. The conformations are aligned and the corresponding similar parts are identified. Because the conformations can be aligned in many ways, the bioactive conformation is selected automatically and an alignment model is created for each conformer, which helps in predicting biological activity. The Compass algorithm is divided into three phases. The first phase, pose alignment, aligns the different conformations to find the bioactive one. The second phase, bioactive model construction, selects the bioactive conformation and constructs a statistical model from the properties of molecular features; a quantitative relationship is established between molecular surface properties and biological activity.
The third phase, molecular display, displays the relationship between molecular properties and biological activity, which helps in molecular modeling [39-41].

c. Holo-QSAR (H-QSAR)
H-QSAR is a QSAR technique developed by Heritage and Lowis (1997). It focuses on molecular fragments and explores the chemical and biological data of chemical compounds. Each molecule is broken into unique fragments, which are then used to form a hologram. The molecular fingerprint is defined as a pattern of fragments placed in a predefined order in an array. These fingerprints describe the nature and type of the molecular fragments, while the three-dimensional properties of the molecule are used to define hybridization and chirality. Each molecular fragment has its own physicochemical properties, which are correlated with the corresponding biological activities by PLS analysis in order to construct the H-QSAR model [42].

d. Weighted Holistic Invariant Molecular Descriptors (WHIM)
Weighted Holistic Invariant Molecular Descriptors (WHIM) is a recently developed 3D-QSAR technique that differs slightly from the conventional approach.

Mini-Reviews in Medicinal Chemistry, 2014, Vol. 14, No. 1


It is mostly used to represent the 3D descriptor properties of a chemical structure in the form of indices. These indices contain information about the ligand structure in terms of size, shape, symmetry and atom distribution. They are calculated from the Cartesian coordinates (x, y, z) of the energy-minimized ligand structure. A weighting scheme is applied to these minimum-energy structures to describe them within a unitary conceptual framework. The indices are then used to search for a suitable QSAR model. G-WHIM and MS-WHIM are modifications of the WHIM method. In G-WHIM (Grid-Weighted Holistic Invariant Molecular) the descriptors are calculated using a grid: the coordinates of each grid point are set and the probe interaction energy is calculated at each point. MS-WHIM (Molecular Surface Weighted Holistic Invariant Molecular) focuses on theoretical descriptors that capture information such as the size, shape and electrostatic distribution of a molecule [4, 7].

e. Comparative Spectral Analysis (CoSA)
Comparative Spectral Analysis (CoSA) has been developed recently and, apart from a few applications, has not yet been widely explored. In this technique, molecular spectroscopy is used to derive the three-dimensional molecular descriptors of chemical compounds for a 3D-QSAR study, and the molecular spectra are used to predict the biological activity of three-dimensional structures. The spectroscopic methods used are mainly proton (1H) NMR, carbon (13C) NMR, IR and mass spectrometry. The spectroscopic data are converted into matrix values with an appropriate tool and then correlated with biological activity by PLS analysis. A comparative study of CoSA and CoMFA was carried out to evaluate the predictive nature of CoSA, and CoSA gave better correlation values than CoMFA [43].

f. 
Grid Independent Descriptors (GRIND)
Grid Independent Descriptors (GRIND) is the first method developed as an alternative to methods such as CoMFA. The basic approach of GRIND and CoMFA is the same: both are grid-based methods in which a probe is placed at the points of a grid lattice around the three-dimensional structure of the macromolecular complex. Non-bonded interactions such as electrostatic, steric and van der Waals interactions are also calculated. The technique has several advantages: it uses a “DRY probe” to calculate hydrogen bond donor (HBD), hydrogen bond acceptor (HBA) and hydrophobic interactions, and it uses a softer (smoother) potential function to calculate van der Waals interactions. The value at each probing point can be correlated with biological activity using multivariate regression such as PLS analysis [7, 44].

B. STATISTICAL APPROACHES FOR SELECTION OF RELEVANT MOLECULAR DESCRIPTORS
Structure activity relationship studies focus on the various chemical characteristics of chemical data and on retrieving a desired set of information from them. To do this, a wide spectrum of data analysis methods based on statistical correlation is used. These data analysis methodologies are mostly meant for the recovery of primary and secondary information.


This information concerns the dependent and independent variables in a correlation study, i.e. activity (y-variable) and molecular descriptors (x-variable) [24].

i. Linear Regression Analysis
Linear regression analysis (LRA) is the first type of regression analysis and is widely applied as a practical predictive method for relating dependent and independent variables (i.e. x and y). Simple linear regression (SLR) can be expressed by the following equation:

y = a + bx

where a is the intercept constant and b is the regression coefficient; x is a molecular descriptor (one or more in number), also called the explanatory variable, and y is the dependent variable, which in a QSAR study corresponds to activity. The values of the x and y variables are fed to the above equation and linear regression analysis is applied for predictive analysis [24].

ii. Multiple Linear Regressions (MLR)
LRA is used to predict the relationship between the variables in the SLR equation, whereas MLR is used to determine the quantitative relationship between them. MLR is also referred to as the linear free energy relationship method and is thus an extension of SLR [45]. In SLR, the two-dimensional relationship between x and y is defined by the SLR equation, and the aim of fitting it is to find the values of a and b that give the best prediction of y from x. Several methods are available for assessing the correlation, such as the Student t-test, the standard deviation and the multiple correlation coefficient, or independent methods such as the leave-one-out method. In MLR analysis, the relationship between the x and y variables is expressed by modifying the SLR equation and adding several new terms to it.
The MLR equation is as follows:

y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_m x_m + e

where b_0 is the intercept, b_1, ..., b_m are the regression coefficients to be estimated, and e is the residual error, which quantifies the deviation from the linear relationship and is minimized when the regression is fitted. The significance of the correlation is judged by calculating the correlation coefficient and cross-validating it [24].

iii. Multivariate Data Analysis
The chemical data used in QSAR analysis are usually multidimensional in nature: the features of a chemical compound are defined by many data components [46]. Multivariate techniques are used specifically to reduce the number of components within the data. The techniques used are principal component analysis (PCA), partial least squares analysis (PLS), genetic algorithms (GA) and genetic algorithm-partial least squares analysis (GA-PLS). The data are represented as a matrix whose rows and columns represent the properties and features of the chemical compounds [24].
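As a minimal sketch of the MLR equation and the leave-one-out check mentioned above, the following Python example fits y = b_0 + b_1 x_1 + b_2 x_2 by ordinary least squares with NumPy. The descriptor values, the activities and the helper names (fit_mlr, predict_mlr, loo_q2) are invented for illustration and are not taken from any study cited in this review. The cross-validated q2 is computed here as 1 - PRESS/SS using the overall mean activity, which is one common convention among several.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bm*xm; returns [b0, b1, ..., bm]."""
    A = np.column_stack([np.ones(len(X)), X])  # leading column of ones gives the intercept b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted activity for each row of descriptor values in X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def loo_q2(X, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop compound i from the training set
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical data: six compounds, two descriptors (x1, x2).
X = np.array([[0.5, 1.0], [1.2, 0.3], [2.0, 2.5],
              [2.8, 1.1], [3.5, 3.0], [4.1, 0.7]])
# Activities generated from an exact linear rule, so the fit should recover it.
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

coef = fit_mlr(X, y)   # estimated [b0, b1, b2]
q2 = loo_q2(X, y)      # cross-validated q2 for the same data
print("coefficients:", coef)
print("LOO q2:", q2)
```

With real, noisy activity data the residual term e is non-zero and q2 falls below the fitted r2; a large gap between the two is usually taken as a sign of over-fitting.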


iv. Partial Least Squares (PLS)
PLS is an improved QSAR model prediction technique introduced by Herman Wold and Svante Wold, and it is the most commonly used technique in QSAR model analysis. It yields a more attractive QSAR model because it can handle more realistic and complicated SAR data for biological activity. It is also called the projection to latent structures method. A large number of descriptors is transformed into a small number of new orthogonal terms, called latent variables, and these latent variables are used to describe the dependent variable. The main aim of the technique is to form a relationship between two matrices (features and property). SIPLS is the most commonly used algorithm in the PLS technique. PLS is mostly used as the predictive model in 3D-QSAR.

v. Genetic Function Approximation (GFA)
Genetic function approximation (GFA) is a computational algorithm derived from the G/SPLINES algorithm developed by Rogers [46]. Its advantages over the conventional approaches to QSAR model prediction are that it builds models with higher predictability and can address problems that are not solved by standard regression methods. It can also generate multiple models and select multiple features for model construction. It includes higher-order polynomial and spline functions, with the main focus on creating many nonlinear models. The initial models are evaluated using the natural selection hypothesis, after which an automatic method is applied to remove outliers. The GFA method uses the LOF (lack of fit) score to measure the error of each QSAR model. Each model is routinely evaluated for fitness and for the number of features required to construct an accurate model. The method also provides checks for outliers and over-fitted data [47].

vi. 
Pattern Recognition
Pattern recognition is a method mainly focused on the classification of data [24]. The basic difference between the classical QSAR approach and pattern recognition is that the latter uses a large number of variables. Pattern recognition methods study the relationships between these variables, such as how diverse or how close they are. The most commonly employed pattern recognition methods are cluster analysis, Artificial Neural Networks (ANN) and k-Nearest Neighbor (k-NN) [24, 48, 49].

vii. Cluster Analysis
Cluster analysis is the basic method for classifying data and is also called the statistical pattern recognition method or the distance-based approach: the data are clustered into groups according to their proximity to one another. The method involves four key steps: generating the key features of each chemical data set, calculating the similarity or diversity within the data set, clustering the data using a clustering program, and finally selecting one representative from each cluster. The similarity between the data in each class is calculated using a similarity coefficient, and each chemical compound is represented using a binary set of descriptors. Most clustering techniques are non-overlapping, and the non-overlapping methods are divided into two classes: hierarchical and non-hierarchical [50].

viii. Artificial Neural Networks (ANN)
Artificial neural networks (ANN), developed by Teuvo Kohonen, are widely used in the QSAR approach because they are transparent and easily interpretable. An ANN is a data analysis and data processing technique modeled on the function of the biological nervous system. Its architecture resembles that of a biological nervous system in which the neurons are connected to one another, except that each neuron is artificial. Each neuron is a processing unit connected to the others and working in parallel, which is why ANNs are mostly used to design parallel computational systems. An ANN consists of a series of layers arranged as a network, and each network consists of nodes and edges. The first layer is the input layer, which takes the input fed by the user; the second is the middle layer, also called the hidden layer, where the data are analyzed (in QSAR these are mostly independent variables such as properties of molecular fragments); and the last layer is the output layer [51].

ix. k-Nearest Neighbor (k-NN)
k-Nearest Neighbor (k-NN) is one of the simplest and most widely exploited pattern recognition methods. Here k is a small positive integer, and the objective of the method is to classify an object (i.e. a chemical compound). k-NN uses the Euclidean distance metric to classify data by proximity. In a QSAR study, a compound is assigned by the majority vote of its nearest neighbors, as judged by a similarity index computed from the molecular descriptors. In practice, the Euclidean distance metric is used to calculate the similarity index between the chemical compounds in the dataset, i.e. between the training compounds and the unknown entity [51].
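The k-NN idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the descriptor values, the class labels ("active"/"inactive") and the helper names (knn_classify, loo_accuracy) are hypothetical, the vote is a simple unweighted majority, and leave-one-out accuracy is used here as a stand-in for the cross-validated statistic used to compare values of k.

```python
from collections import Counter

import numpy as np

def knn_classify(train_X, train_labels, query, k=3):
    """Assign `query` to the majority class among its k nearest training compounds."""
    train_X = np.asarray(train_X, dtype=float)
    # Euclidean distance from the query compound to every training compound
    dists = np.linalg.norm(train_X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(train_X, train_labels, k):
    """Fraction of compounds classified correctly when each is left out in turn."""
    n = len(train_labels)
    hits = 0
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        pred = knn_classify([train_X[j] for j in rest],
                            [train_labels[j] for j in rest],
                            train_X[i], k)
        hits += pred == train_labels[i]
    return hits / n

# Hypothetical training set: two descriptors per compound, two activity classes.
descriptors = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
               [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
labels = ["inactive", "inactive", "inactive", "active", "active", "active"]

# Classify an unknown compound and compare candidate values of k.
print(knn_classify(descriptors, labels, [2.1, 2.0], k=3))
for k in (1, 3):
    print(k, loo_accuracy(descriptors, labels, k))
```

An odd k is used so that a two-class vote cannot tie. For continuous activities, the same neighbor search can be kept and the vote replaced by an average of the neighbors' activities.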
The distance calculation helps determine a suitable value of the integer k, and the resulting values are used to classify the unknown compound in the dataset according to the majority vote of its neighbors. The technique is validated by leave-one-out (LOO) cross-validation, and the model with the highest cross-validated correlation coefficient (r2 and q2) is retained [52].

C. LIMITATIONS OF 3D-QSAR
3D-QSAR is a novel approach compared with 2D-QSAR, but it still has drawbacks. It cannot handle very large amounts of chemical data (thousands of chemical structures), for which high-throughput virtual screening methods are used instead [53]. Identifying the active conformation of flexible compounds in a study set is critical, as is the subsequent specification of the molecular alignment used to construct the 3D-QSAR model [54]. The intermolecular interactions of receptors also need to be studied, because different types of interaction take place at different sites; these intermolecular interactions of the molecules or receptors are called the interaction pharmacophore. The topological descriptors used in 3D-QSAR work on the same connectivity principle as in 2D-QSAR. The 3D orientation of the ligands also needs to be studied, as it may affect how often the correct bioactive conformation is selected [55].


2D vs. 3D-QSAR
The primary goal of 3D-QSAR is to establish the relationship between biological activity and the spatial properties of chemical compounds, such as steric, electrostatic and lipophilic properties. 3D-QSAR is mostly applied to a series of chemical compounds to identify their molecular pharmacophoric features, by analyzing and optimizing those features that can increase biological activity [24]. 3D-QSAR studies focus on how a change in the 3D structural features of a chemical moiety changes the biological activity. The structure activity relationship is usually presented in graphical form for easy understanding, which makes the method particularly attractive [22]. Structurally diverse sets of compounds are more easily studied than with the traditional 2D-QSAR approach [17]. The prime difference between 2D- and 3D-QSAR lies in the use of statistically more robust methods for the selection of molecular descriptors, such as simple linear regression, multiple linear regression (MLR), principal component analysis (PCA), principal component regression (PCR), PLS analysis, GFA, cluster analysis, artificial neural networks and the k-nearest neighbor method, which help in the quantitative prediction of a diverse set of 3D properties of chemical compounds [19].

0D-3D QSAR
The simplest way to represent a chemical compound is its molecular formula, i.e. its constitutional properties. Pérez-Garrido et al. [89] described a multiple linear regression QSAR model of cyclodextrin (CD). The objective of the study was to establish a correlation between the CD binding constant and substituent changes in the CD structure. A free software package called DRAGON was used to study 0D- to 3D-QSAR. In total, 233 chemical compounds were studied by calculating 1600 molecular descriptors, which were further divided into 0D, 1D, 2D and 3D descriptors.
The descriptors were separated using a clustering technique called k-means cluster analysis. The most relevant variables were selected with a genetic algorithm, and the resulting QSAR models satisfied the internal predictive accuracy criteria under cross-validation. A further study on external data used a genetic simulation approach for cross-validation, which helps in the statistical prediction of external data. There was significant variation among the 0D, 1D and 2D models, but a good correlation was found between the CD binding constant and substituent changes for the hydrophobic and steric (3D) descriptors.

4D-QSAR
Recent studies suggest that 3D-QSAR techniques such as CoMFA have limitations in predictive quality. Addressing them requires improving the description or representation of the molecules, the alignment of the compounds and the statistics used in activity prediction [57-58]. To overcome these limitations, a new QSAR approach (4D-QSAR) has evolved in which the modeling statistics are improved, giving better predictive quality. Recent advances in QSAR have produced two approaches, namely receptor independent (RI-QSAR) and receptor dependent (RD-QSAR) [59]. These approaches involve a number of steps, such as the generation of multiple conformations, alignment and the consideration of multiple sub-structure groups [60]. Determining the free energy of binding is very important in receptor-ligand interaction, since binding involves a loss of free energy and the exclusion of some solvent molecules from the active site of the receptor. These factors contribute significantly to the measured binding affinity [61]. All these events in the active site of the receptor lead to a change in topology, a phenomenon commonly called induced fit. The topological change in turn alters features such as hydrophobic, hydrophilic, electrostatic, dielectric, steric and solvent-accessibility properties [62]. The induced fit and the flexibility of the receptor binding pocket with respect to an individual ligand topology are intensively studied in multidimensional QSAR, which quantifies all such parameters in order to build a sufficiently predictive QSAR model.

The fourth dimension of QSAR analysis is also called “ensemble sampling” [55]. This new dimension of quantitative structure activity relationship study mainly addresses some of the problems raised in 3D-QSAR studies. 4D-QSAR is an extension of molecular shape analysis (MSA) and was introduced by Hopfinger et al. [62, 63]. They found that the 4D-QSAR model was more robust and more predictive than conventional 3D-QSAR approaches such as CoMFA. The main focus of the approach is to screen libraries of bioactive conformations to examine their structure activity relationship and to establish their relationship to biological activity. The principle behind a 4D-QSAR study is structure-based drug design (SBDD), in which issues such as ligand conformational flexibility and the exploration of multiple alignments are addressed [55].
Several parameters are used in 4D-QSAR analysis, such as the grid cell size (s), the molecular dynamics simulation of the reference molecules (R), the temperature (T), the size of the initial ensemble sampling (Es), the number of alignments (Na) and the number of descriptors in the initial basis set (Nd) [55, 62, 64-67]. As discussed above, there are two main approaches.

A. RECEPTOR INDEPENDENT 4D-QSAR
Receptor independent 4D quantitative structure activity relationship study has a significant impact on rational drug design. RI 4D-QSAR is mostly applied when the researcher wants either to find the pharmacophoric features of the ligand molecules or to find the projected changes in ligand structure [67]. The ultimate aim of RI 4D-QSAR is to obtain the maximum structural information from the structure activity relationship study. The advantages of RI over RD are that it can construct pharmacophoric features for a limited number of substituents, provide a rational basis for substituent placement on a scaffold, and yield a pharmacophoric model that can be used as an initial filter in virtual screening. RI 4D-QSAR has been implemented successfully in studies of TMPKmt inhibitors and isoniazid derivatives [67]. There are ten principal steps involved in RI 4D-QSAR.


i. Initiation of Reference Grid for 3D Models of Training Set
This is the foundation step in RI 4D-QSAR and is analogous to CoMFA in 3D-QSAR: a reference grid box is specified around the 3D structures of the training set. The grid is one of the parameters of an RI 4D-QSAR study. In 4D-QSAR, the initial 3D structures are the starting point for the conformational ensemble sampling of the training set. The training set conformations with minimum free energy and a common torsion angle are selected. This step also provides the reference points for the 4D-QSAR analysis.

ii. Selection of Interaction Pharmacophore Elements (IPE)
Each atom in each molecule is classified into one of five to six different classes: all atoms of the molecule (IPE)a, polar atoms (IPE)p, nonpolar atoms (IPE)n, hydrogen bond donors (IPE)hbd, hydrogen bond acceptors (IPE)hba, user-defined IPE types (IPE)x, and aromatic carbon and hydrogen. This classification helps to analyze and understand the different interactions involved at each pharmacophoric site.

iii. Creation of Conformational Ensemble Profile (CEP)
Conformational ensemble sampling is carried out on the training data set, which helps to find the active conformation of each training set molecule. Molecular dynamics simulation (MDS) is an advanced approach routinely used to create the ensemble for each training set molecule. The CEP uses Boltzmann sampling techniques. The objective of this step can be achieved with either a systematic or a stochastic conformational search technique: a large number of conformers are explored and the correct conformational state is selected. As mentioned above, Boltzmann sampling is commonly used because of the following advantages:
1) It is independent of the sampling size.
2) Different starting states lead to the same sample distribution.
3) Different sampling schemes lead to the same state of optimized three-dimensional structures.
4) The average rate of change in energy with change in state is almost zero.

iv. 
Selection of the Trial Alignment
The molecular alignment of the training data set is a major problem in a 4D-QSAR study and can be solved by rapidly evaluating trial alignments. This rapid evaluation is done by a searching and sampling operation analogous to the CEP. In general, it is achieved by designing the RI 4D-QSAR algorithm so that alignment analysis is decoupled from conformational analysis, allowing the conformations to be analyzed rapidly and the effect of the molecular alignment on the molecular descriptors to be investigated. The CEP of each compound in the training data set is evaluated for molecular alignment, which largely determines the significance of the 4D-QSAR model. The molecular alignment produces a unique model of every compound in the grid box. This results in an occupancy distribution for the given CEP.

v. Construction of Grid Cell Occupancy Profile (GCOP) and Calculation of Grid Cell Occupancy Descriptors (GCOD)
Each conformation is placed in the reference cubic lattice of grid cells, with the cell spacing set according to the trial alignment. The GCOP is calculated for each compound based on the five to six classes of interaction pharmacophore elements. These IPEs are used in the trial alignment of the 4D-QSAR descriptors. The occupancy of each grid cell is recorded after the atoms of each IPE have been aligned in the grid, which yields a unique set of values for each IPE. These sets are called grid cell occupancy descriptors (GCODs).

Three types of grid cell occupancy are calculated for each IPE. The absolute occupancy (Ao) takes into account the Cartesian coordinates of the grid cell, where i, j, k define the grid cell, and the time t over which the ensemble is generated for the atoms of each IPE. The joint occupancy (Jo) is measured by placing the reference compound (R) into the grid. Lastly, the self-occupancy is measured relative to the grid cell occupancy of the reference compound (R). There is no official guideline on which occupancy descriptors should be included in or left out of a particular study. Reference compounds may be helpful or may introduce bias into the 4D-QSAR model, although this bias may help to steer the model towards template-based properties. The reference compound is chosen from the highly potent members of the series, so that the model is strongly influenced by features associated with potent activity. A number of studies suggest that joint occupancy descriptors should be used when the study set contains a small group of compounds, whereas absolute occupancy descriptors are useful for a large number of compounds.

vi. 
PLS Analysis to Reduce Number of GCODs Against the Biological Activity Measures
A large number of grid cell occupancy descriptors is generated for several reasons: the extensive set of trial alignments, the number of grid cells, the five to six IPEs and the three possible occupancy types. The data reduction step in 4D-QSAR is similar to that in CoMFA, with the slight difference that the complete set of grid cell occupancy descriptors is included in the study. The CEPs of the all-atom IPE are generated in the grid cells, and the location of the molecular shape with respect to the IPE is calculated using a plot of Boltzmann averages. In this plot, the joint occupancy (Jo) is measured by mapping the grid cell location onto a single location index m; the plot is also known as the molecular shape spectrum (MSS). The difference in biological activity between two compounds is reflected in the difference between their molecular shape spectra. In this step, the location of such IPE CEPs in the grid cells is identified. The MSS may depend on several factors, such as the grid cell size and the coordinate positioning of the compound, and this dependence can be evaluated by shifting the coordinates of the compounds or by changing the cell size. PLS regression analysis is used to remove unnecessary GCODs by establishing the relationship between the experimental biological activity and the occupancy values of the GCODs. The values obtained from the PLS regression are related quantitatively to the 4D-QSAR model by giving a specific weight to each GCOD, and the quantitative relationship between small groups of selected GCODs is represented graphically. In general small groups (