Nucleosomal DNA property database


Vol. 15 nos 7/8 1999 Pages 582-592

BIOINFORMATICS Nucleosomal DNA property database

Victor G. Levitsky, Mikhail P. Ponomarenko, Julia V. Ponomarenko, Anatoly S. Frolov and Nikolay A. Kolchanov Laboratory of Theoretical Genetics, Institute of Cytology & Genetics, 630090, Lavrentieva 10, Novosibirsk, Russia Received on November 30, 1998; revised on March 11, 1999; accepted on April 22, 1999

Abstract

Motivation: Chromatin structure plays a crucial role in proper gene functioning. Therefore, it is very important to investigate nucleosomal DNA properties and to recognize nucleosome positioning sequences in the genome. Nevertheless, applying different sequence analysis methods separately is insufficient for a complete description of nucleosomal DNA. One of the most probable reasons for this is the weakness of nucleosome positioning signals. The present paper offers a set of methods to reveal the most important nucleosomal DNA characteristics and to show a common pattern of nucleosome site properties.

Results: A complex approach was used to determine the conformational and physicochemical properties that are most significant for nucleosome binding site description. An integrated database of nucleosomal DNA properties has been compiled. This database comprises different sections for the description of DNA characteristics. Revealing significant DNA characteristics allows the classification of various samples of site sequences and the generation of programs for site recognition.

Availability: The current version of the database is available at http://wwwmgs.bionet.nsc.ru/system/BDNAvideo/. C-code of the recognition program may be found in the section FEATURE. WWW-available programs for testing arbitrary sequences are accessible at http://wwwmgs.bionet.nsc.ru/Programs/bDNA/NA_bDNA.htm/. The links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.

Contact: [email protected]

Introduction

Chromatin structure strongly influences the transcription of eukaryotic genes. The nucleosome, an elementary repeating subunit of chromatin, may collaborate with numerous regulatory proteins to provide proper regulation of gene expression (for a review, see van Holde, 1989). The nucleosome core particle (histone octamer) is composed of a central tetramer (H3/H4)2 flanked by two H2A/H2B dimers. The nucleosome is composed of a histone octamer and 146 bp of DNA that is wrapped around it as a left-handed superhelix. The linker histone H1, which binds approximately 20 bp, completes the second turn of the superhelix and organizes a subnucleosomal particle, the chromatosome. In chromatin, neighboring nucleosome particles are connected by a linker DNA segment of variable length.

Both transcription initiation and elongation require perturbation of chromatin structure. In this connection, cases in which nucleosome location is strictly confined are of special interest. These cases may be considered as rotational or translational positioning. The first refers to the orientation of the DNA helix relative to the core particle surface, so that some phased minor or major groove regions face the octamer surface. The second, translational positioning, means that the nucleosome has a precisely defined position along the DNA sequence. A positioned nucleosome is often found near gene regulatory regions; hence, it may restrict the accessibility of DNA, or bring two DNA pieces close to each other (Wolffe, 1994).

The nucleosome positioning signal dispersed in the eukaryotic genome may be considered as a specific chromatin code, one of many codes contained in genomic DNA (Trifonov, 1997). That code may be superimposed on the other codes, but it appears to be necessary for proper DNA segregation and chromatin packaging. The sequence periodicities of eukaryotic DNA that reflect its deformational anisotropy were first considered as factors facilitating chromatin folding (Trifonov and Sussman, 1980). This was strongly supported by the prominent periodicities in the distribution of the di- and trinucleotides revealed for nucleosomal DNA sequences (Satchwell et al., 1986). A multi-alphabet consensus search reveals that the nucleosomal pattern is of a very degenerate nature (Ulyanov and Stormo, 1995). Multiple sequence alignment shows that the main part of that signal is created by a specially phased positional frequency distribution of the AA and TT dinucleotides (Ioshikhes et al., 1996).
A variety of methods were developed for investigating nucleosome positioning (Calladine and Drew, 1986; Staffelbach et al., 1994). Different approaches were proposed for the study of nucleosomal DNA conformation (Fitzgerald et al., 1994; Sivolob and Kharpunov, 1995; Ponomarenko et al., 1997b). Crystal structure analysis revealed that the path of nucleosomal DNA is kinked at several points and irregularly curved (Luger et al., 1997). All these investigations of nucleosomal DNA conformation confirm that there exists a special pattern of DNA helix curvature facilitating core particle binding. Finally, many approaches confirm that the nucleosome site pattern consists of two regions, 50–60 bp in length, of increased bending specificity, which are divided by the central 15–20 bp region of the site, where DNA is more indifferent to bending.

Owing to its ubiquitous and degenerate nature, the nucleosome signal probably has no obvious consensus and is most likely defined by the recognition of a particular DNA conformation. In general, signals of nucleosome positioning appear to be very weak; hence, a combination of different methods may enable them to be localized more reliably. In this connection, the present research is devoted to the analysis of various conformational and physicochemical parameters (38 sets in total) in order to find significant DNA features. To investigate nucleosomal DNA, a database of experimentally defined nucleosome sites was used (Ioshikhes and Trifonov, 1993). Our paper classifies significant nucleosomal DNA peculiarities and reveals their hierarchy. The compilation of the revealed DNA characteristics is presented in the Nucleosomal DNA property database. On the basis of the revealed significant nucleosomal DNA characteristics, programs for nucleosome site recognition were generated. These programs are available on the WWW and may be used for chromatin structure research and genome study.

© Oxford University Press 1999

System and methods

The Nucleosomal DNA property database comprises several sections (Figure 1). The database SAMPLES contains site sequences. The database PROPERTY includes the sets of context-dependent conformational and physicochemical properties. The database PROFILES presents nucleosomal DNA property profiles with their extrema and linear trend descriptions. The database FEATURES consists of knowledge on the most significant DNA properties and the recognition programs generated from this knowledge.

Fig. 1. The scheme of the Nucleosomal DNA property database.

The Nucleosomal DNA Database (Ioshikhes and Trifonov, 1993) was used for extraction of 130 nucleosome site sequences of 200 bp in length from EMBL. It is available at http://www.embl-heidelberg.de/Services/index.html, section Molecular Biology Databases, catalog nucleosomal_dna. The retrieved site set consists of sequences of different species located in various genome regions. Owing to the heterogeneity of the site set, it was subdivided. Five individual samples, each based on a specific genome region or gene type, were generated. An additional sample of stable nucleosome binding sites was also used (Widlund et al., 1997). These sequences, available at http://130.241.160.80/sequence/, are localized at the centromeric regions of mouse metaphase chromosomes. After adjusting to a common length of 108 bp, they were compiled into a separate sample. Table 1 describes the resulting six nucleosome site samples. The sequences of all samples were aligned relative to the footprint center. Because of the internal symmetry of the nucleosome, all site sequences were analyzed in both possible orientations. The sequences of all samples used are available at http://wwwmgs.bionet.nsc.ru/Dbases/NSamples/auto1.exe (SAMPLES database).

Thirty-eight dinucleotide parameter sets compiled in the PROPERTY database, at http://wwwmgs.bionet.nsc.ru/systems/BDNAVideo/, section DNA Conformational and Physicochemical Parameters, were considered as DNA properties. The PROPERTY database contains a description and literature reference for every property. Some properties are illustrated graphically. The majority of the considered DNA properties were described earlier (Kolchanov et al., 1998).
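The analysis of every site in both orientations can be illustrated with a short sketch. It is not the authors' code: the function names `reverse_complement` and `both_orientations` are hypothetical, and the sequences are assumed to contain only the letters A, C, G and T.

```python
# Minimal sketch: analyze each site sequence in both orientations.
# Function names are illustrative assumptions, not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def both_orientations(sample):
    """Return every sequence of the sample followed by its reverse complement."""
    oriented = []
    for seq in sample:
        oriented.append(seq.upper())
        oriented.append(reverse_complement(seq))
    return oriented


if __name__ == "__main__":
    print(both_orientations(["AACG"]))
```

Because the sequences are aligned relative to the footprint center, the reverse complement of a centered site stays centered, so both orientations can share the same position axis.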

Table 1. Nucleosome site samples used for analysis

Sample number   Sample name                                     No. of sequences   Sequence length (bp)
1               Moderate repeats (rRNA, tRNA, histone genes)    24                 200
2               Satellite DNA                                   14                 200
3               Virus SV40 genome                               24                 200
4               5′ and 3′ region sequences of unique genes      26                 200
5               Coding region sequences of unique genes         42                 200
6               Stable mouse sequences                          87                 108


The database PROFILES contains profiles of DNA properties and descriptions of their extrema and linear trends. To construct a profile, various sliding window sizes were applied. The best suited window size was evaluated by comparing the sample profile with the profile constructed on random sequences with the same dinucleotide content as that of the site. The selected profile is included in the database entry. The profiles were then checked for linear trends, and the resulting data are also compiled in the database. The database FEATURES includes the data on the most significant DNA properties and the programs for site recognition generated on the basis of this knowledge. For this purpose, average property values were considered for a certain site region. The distributions of these values for the sets of nucleosome sites and of random sequences were compared by several criteria. Afterwards, a fixed utility value was assigned to each comparison, and these values were compared.
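The text does not say how the random sequences with matching dinucleotide content were produced. One common choice, shown here purely as an assumption, is a first-order Markov chain trained on the site sample; it reproduces the dinucleotide content only on average, not exactly. The name `markov_control` and its parameters are hypothetical.

```python
# Sketch under stated assumptions: random control sequences drawn from a
# first-order Markov chain estimated on the site sample. This is a swapped-in
# technique; the paper does not specify its own generator.
import random
from collections import Counter, defaultdict


def markov_control(sample, length, seed=0):
    """Generate one random sequence whose expected dinucleotide content follows the sample."""
    rng = random.Random(seed)
    starts = Counter(seq[0] for seq in sample)   # first-letter counts
    transitions = defaultdict(Counter)           # transitions[a][b] = count of dinucleotide ab
    for seq in sample:
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1

    letters, weights = zip(*starts.items())
    out = [rng.choices(letters, weights)[0]]
    while len(out) < length:
        followers = transitions[out[-1]]
        if not followers:                        # letter never followed by anything: restart
            followers = starts
        letters, weights = zip(*followers.items())
        out.append(rng.choices(letters, weights)[0])
    return "".join(out)


if __name__ == "__main__":
    print(markov_control(["ACGTTGCA", "GGATCC"], 20, seed=1))
```

If exact preservation of dinucleotide counts is required, a dinucleotide-preserving shuffle would be needed instead; that is not shown here.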

Algorithm

Input data for algorithm execution are:

1. A sample of $N$ nucleotide sequences $\{S\} = \{S_1, \ldots, S_n, \ldots, S_N\}$ of length $L$ bp such that $S_n = X_1 X_2 \ldots X_L$, where $1 \le n \le N$ and $X$ denotes a nucleotide type.

2. For the 16 dinucleotide types $\{D_1, \ldots, D_{16}\}$, 38 sets of dinucleotide parameters $d^{(k)} = \{d_1^{(k)}, \ldots, d_{16}^{(k)}\}$, where $1 \le k \le 38$, were used.

Profiles section

The first step of the algorithm is the calculation of the position-specific dinucleotide frequency matrix $F_{i,j}$, where $i$ denotes a position along a sequence ($1 \le i \le L - 1$) and $j$ denotes a dinucleotide type ($1 \le j \le 16$). This matrix was calculated as follows:

$$F_{i,j} = \frac{1}{N} \sum_{n=1}^{N} f_{i,j}^{(n)}, \quad \text{so that for every } i: \quad \sum_{j=1}^{16} F_{i,j} = 1$$
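The frequency matrix can be sketched in a few lines. This is an illustration, not the authors' implementation: the alphabetical ordering of the 16 dinucleotides (the mapping of $D_1, \ldots, D_{16}$ to AA, AC, ..., TT) and the names `DINUCLEOTIDES`, `INDEX` and `frequency_matrix` are assumptions.

```python
# Sketch of the position-specific dinucleotide frequency matrix F[i][j].
# The dinucleotide ordering AA, AC, ..., TT is an assumption for illustration.
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: j for j, d in enumerate(DINUCLEOTIDES)}


def frequency_matrix(sample):
    """Return F as a list of L-1 rows with 16 frequencies each (0-based positions)."""
    n_seq = len(sample)
    length = len(sample[0])
    if any(len(seq) != length for seq in sample):
        raise ValueError("all sequences must have the same length L")
    matrix = [[0.0] * 16 for _ in range(length - 1)]
    for seq in sample:
        seq = seq.upper()
        for i in range(length - 1):
            # f = 1 for the dinucleotide observed at position i, 0 for the other 15
            matrix[i][INDEX[seq[i:i + 2]]] += 1.0 / n_seq
    return matrix


if __name__ == "__main__":
    F = frequency_matrix(["ACGT", "ACGA"])
    print(F[2][INDEX["GT"]], sum(F[2]))
```

Each row sums to 1 because exactly one dinucleotide is observed at each position of each sequence.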

Here $f_{i,j}^{(n)}$ is the observed frequency of dinucleotide $D_j$ in the $i$th position of the $n$th sequence of the sample, i.e. $f_{i,j}^{(n)} = 1$ if the dinucleotides $X_i X_{i+1}$ and $D_j$ are the same; otherwise, $f_{i,j}^{(n)} = 0$. To compare peculiarities of different sizes, the similar matrix $F^{(W)}$ was calculated with several dinucleotide averaging window sizes $W = 1, \ldots, W_{\max}$ as follows:

$$F_{i,j}^{(1)} = F_{i,j} = \frac{1}{N} \sum_{n=1}^{N} f_{i,j}^{(n)}, \qquad F_{i,j}^{(W)} = \frac{1}{W} \sum_{r=0}^{W-1} F_{i+r,j}^{(1)} \qquad (1)$$

The matrix $F_{i,j}^{(W)}$ and the parameter set $d^{(k)}$ produce the profile $\{P_i^{(k)}(W)\}$ of property $k$ and window size $W$ as follows:

$$P_i^{(k)}(W) = \sum_{j=1}^{16} \left( F_{i,j}^{(W)} \, d_j^{(k)} \right) \qquad (2)$$

A value of 10 bp was used as the upper limit of the window size $W_{\max}$. Profile selection was executed by calculating a quality ratio $R = Q_{site} / Q_{random}$, where $Q$ depends on the optimized profile window size and the indexes denote the samples of site sequences and of random sequences with the same dinucleotide content as that of the site sequences. The dependence of the quality function $Q$ on the window size $W$ was taken as the ratio of the profile span to its standard deviation:

$$Q^{(k)}(W) = \frac{p_{\max}^{(k)}(W) - p_{\min}^{(k)}(W)}{s^{(k)}(W)} \qquad (3)$$

Here $p_{\max}^{(k)}(W)$ and $p_{\min}^{(k)}(W)$ denote the maximum and minimum profile values calculated with window size $W$, and $s^{(k)}(W)$ is the standard deviation of the profile $\{P_i^{(k)}(W)\}$, $1 \le i \le L - W$.

The profile with the best suited window size was included in the database entry. This profile was then examined for significant extreme values. The Student's criterion was used to evaluate the significance level. A list of profile extrema with a significance level α < 0.02 was included in the database entry. The least squares method was used to find linear trends. The trend significance level was estimated by the Fisher–Snedecor criterion. Only linear trends of significance level α < 0.05 were compiled in the PROFILES database. The trend for the $k$th DNA property carrying points $\{X_i\}$ on the interval $[X_{start}; X_{end}]$ is presented in the following form:

$$Y^{(k)}(X_i) = Y_{mean}^{(k)} + K^{(k)} \cdot (X_i - X_{center}) \qquad (4)$$

Here, $Y_{mean}^{(k)}$ is the mean value of the property over the region $[X_{start}; X_{end}]$, $K^{(k)}$ is the respective slope coefficient and $X_{center} = (X_{start} + X_{end})/2$ is the trend center position. If two trends (1 and 2) of a single profile overlap partially, then the following restriction is applied:

$$\left| X_{center}^{(2)} - X_{center}^{(1)} \right| \ge 1 + \frac{\left( X_{end}^{(2)} - X_{start}^{(2)} \right) + \left( X_{end}^{(1)} - X_{start}^{(1)} \right)}{2} \qquad (5)$$

This means that the distance between the centers of the two trends should be greater than the average trend length. The profiles of conformational and physicochemical DNA parameters, their significant extremum points and linear trends are described in the PROFILES database.
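The trend description of equation (4) and the restriction of equation (5) can be sketched as follows. This is a simplified illustration: it assumes evenly spaced points (so that the least-squares line passes through the trend center), it omits the Fisher–Snedecor significance test, and the names `fit_trend` and `trends_compatible` are hypothetical.

```python
# Sketch of a least-squares linear trend in the form of equation (4) and of
# the center-distance restriction of equation (5). Significance testing is
# not included. Names are illustrative assumptions.


def fit_trend(xs, ys):
    """Return (y_mean, slope, x_center) for points assumed evenly spaced in x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    x_center = (xs[0] + xs[-1]) / 2.0   # (X_start + X_end) / 2
    return y_mean, slope, x_center


def trends_compatible(region1, region2):
    """Check equation (5) for two trend regions given as (start, end) pairs."""
    (start1, end1), (start2, end2) = region1, region2
    center1 = (start1 + end1) / 2.0
    center2 = (start2 + end2) / 2.0
    average_length = ((end2 - start2) + (end1 - start1)) / 2.0
    return abs(center2 - center1) >= 1 + average_length


if __name__ == "__main__":
    print(fit_trend([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))
    print(trends_compatible((-44.5, -16.5), (16.5, 44.5)))
```

For evenly spaced points the mean of the x values equals the trend center, which is why the fitted line can be written as the region mean plus the slope times the offset from the center.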

Features section

An algorithm used to find nucleosomal DNA contextual features was developed earlier for functional sites (Kel et al., 1993). Let us consider the $k$th DNA parameter set for a single sequence $S$ of the sample. The mean value of the $k$th parameter over the region $[a; b]$ ($1 \le a \le b \le L$) is calculated as follows:

$$P_{k,a,b}(S) = \frac{1}{b - a} \sum_{i=a}^{b-1} P_k(X_i X_{i+1}) \qquad (6)$$
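Equation (6) can be sketched directly. The property table used below is a toy one (1.0 for AA, 0.0 for every other dinucleotide) chosen only to make the arithmetic visible; it is not a parameter set from the PROPERTY database, and the name `region_mean` is hypothetical. The sketch requires a < b, because the formula divides by b − a.

```python
# Sketch of equation (6): mean dinucleotide property over the region [a; b],
# with 1-based positions as in the text. The toy property is an assumption.
from itertools import product


def region_mean(seq, prop, a, b):
    """Average prop[X_i X_{i+1}] over i = a, ..., b - 1 (1-based positions)."""
    if not 1 <= a < b <= len(seq):
        raise ValueError("region must satisfy 1 <= a < b <= L")
    seq = seq.upper()
    # seq[i - 1:i + 1] is the dinucleotide X_i X_{i+1} in 1-based numbering
    steps = [prop[seq[i - 1:i + 1]] for i in range(a, b)]
    return sum(steps) / (b - a)


if __name__ == "__main__":
    toy = {"".join(p): 0.0 for p in product("ACGT", repeat=2)}
    toy["AA"] = 1.0   # hypothetical values, for illustration only
    print(region_mean("AAAC", toy, 1, 4))
```

In the example the region covers the steps AA, AA and AC, so the mean is two thirds.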

Applying equation (6) to the site sequence set $\{S\}$ at fixed $k$, $a$ and $b$ yields the distribution $P_{k,a,b}\{S\}$ for the site. Similarly, the distribution $P_{k,a,b}\{R\}$ is generated for random sequences $\{R\}$ with the same nucleotide frequencies as in the real sequences. The difference between the distributions $P_{k,a,b}\{S\}$ and $P_{k,a,b}\{R\}$ is tested for significance by using six statistical criteria. Each criterion was tested on 100 subsets $\{S_n\}$ and $\{R_n\}$ ($1 \le n \le 100$), randomly retrieved from $\{S\}$ and $\{R\}$, respectively. If the difference between the distributions $P_{k,a,b}\{S_n\}$ and $P_{k,a,b}\{R_n\}$ is significant by the $m$th criterion ($1 \le m \le 6$), then a positive value between 0 and 1 is assigned to the weight $U_{mn}(P_{k,a,b})$; otherwise, a negative value between –1 and 0 is assigned. Hence, the total number of weights $\{U_{mn}(P_{k,a,b})\}$ is 6 × 100 = 600. The generalized difference between $P_{k,a,b}\{S\}$ and $P_{k,a,b}\{R\}$ is the mean of the 600 weights:

$$U(P_{k,a,b}) = \frac{1}{600} \sum_{m=1}^{6} \sum_{n=1}^{100} U_{mn}(P_{k,a,b}) \qquad (7)$$
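The averaging in equation (7) can be sketched as below. The six statistical criteria are not named in the text, so the single criterion used here (`mean_shift_criterion`, a crude two-standard-error test returning +1 or –1) is a hypothetical stand-in, as are the subset size and the function names.

```python
# Sketch of equation (7): the utility is the mean of the weights obtained from
# several criteria applied to random subsets. The criterion shown is a
# hypothetical placeholder, not one of the six criteria used in the paper.
import random
import statistics


def mean_shift_criterion(site, control):
    """Return +1.0 if the subset means differ by more than two standard errors, else -1.0."""
    diff = abs(statistics.fmean(site) - statistics.fmean(control))
    se = (statistics.pvariance(site) / len(site)
          + statistics.pvariance(control) / len(control)) ** 0.5
    return 1.0 if diff > 2.0 * se else -1.0


def utility(site_values, control_values, criteria, n_subsets=100, subset_size=20, seed=0):
    """Mean weight over len(criteria) * n_subsets comparisons (600 for six criteria)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_subsets):
        site_subset = rng.sample(site_values, subset_size)
        control_subset = rng.sample(control_values, subset_size)
        for criterion in criteria:
            weights.append(criterion(site_subset, control_subset))
    return sum(weights) / len(weights)


if __name__ == "__main__":
    site = [10.0 + 0.01 * i for i in range(50)]
    control = [0.01 * i for i in range(50)]
    print(utility(site, control, [mean_shift_criterion] * 6))
```

In practice the six entries of `criteria` would be different tests, each returning a graded weight in [–1; 1] rather than only the two extreme values.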

Thus, the calculated value $U(X_{k,a,b})$ is the integral characteristic of the discriminating ability of the feature $X_{k,a,b}$, i.e. of the mean value $P_{k,a,b}$ of the $k$th property over the region $[a; b]$. It is called a utility and has two important features:

$U(X_{k,a,b}) < 0$ implies that ‘$X_{k,a,b}$ falls short of significance’ (8)

$U(X_{k,a,b}) > U(X_{q,c,d}) \ge 0$ implies that ‘$X_{k,a,b}$ is more significant than $X_{q,c,d}$’ (9)

Note that the highest value of $U(X_{k,a,b})$ points to the best, in terms of utility, B-DNA feature $X_{k,a,b}$ of the site. Each conformational feature $X_{k,a,b}$ with $U(X_{k,a,b}) < 0$ is discarded by decision (8). If any two features $X_{k,a,b}$ and $X_{q,c,d}$ correlate, the feature $X_{q,c,d}$ with the lower value of $U(X_{q,c,d})$ is discarded by decision (9).
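Decisions (8) and (9) amount to a filtering step, which can be sketched as a greedy pass over the features in order of decreasing utility. How correlation between two features is measured is not specified in the text, so it is passed in here as a caller-supplied predicate; the feature names and utilities in the example are made up for illustration.

```python
# Sketch of decisions (8) and (9): drop features with negative utility, then
# keep a feature only if it does not correlate with a better one already kept.
# The greedy order and the `correlated` predicate are assumptions.


def select_features(utilities, correlated):
    """utilities: dict feature -> U; correlated(f, g) -> bool. Returns kept features, best first."""
    candidates = [f for f, u in utilities.items() if u >= 0]      # decision (8)
    candidates.sort(key=lambda f: utilities[f], reverse=True)
    kept = []
    for feature in candidates:
        # decision (9): of two correlating features the lower-utility one is discarded
        if not any(correlated(feature, better) for better in kept):
            kept.append(feature)
    return kept


if __name__ == "__main__":
    toy_utilities = {"A": 0.9, "B": 0.8, "C": 0.6, "D": -0.2}    # made-up values
    toy_pairs = {frozenset(("A", "B"))}                          # A and B correlate
    print(select_features(toy_utilities, lambda f, g: frozenset((f, g)) in toy_pairs))
```

With these toy inputs, D is removed by decision (8) and B by decision (9), leaving A and C.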

Implementation

An entry of the PROFILES database is exemplified by the description of the DNA property twist, theoretically calculated by Sklenar (Karas et al., 1996), for 5′ and 3′ region nucleosome sites. The entry header (Figure 2a) contains the name of the site sample in the field SD and the link to the DNA sequences in the SAMPLES database in the field LD. The list of links to the individual property entries (profile identifiers) and the names of the properties are included in the field PI. The detailed profile description is found in the separate property entry (Figure 2b). The best selected window size (9 bp) is presented in the field PW; the average profile value 36.54 and the standard deviation 0.12 are shown in fields PA and PD, respectively. The numbers of significant extrema (two maxima and two minima) are included in field PN. Every extremum is described in the next fields ET–EP. Each extremum is supplied with its profile value, position and significance level. This part of the entry contains two links to figures (fields FG), which are also shown in Figure 3a and b. The first figure explains why this particular profile window size was selected and the second shows the profile itself.

Fig. 2. The entry of the PROFILES database for the DNA property twist (Karas et al., 1996) and the sample of 5′ and 3′ gene region nucleosome sites. (a) Entry header, which presents the site sample and the DNA property list. (b) Individual property entry.
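The window selection reported in field PW can be sketched from equations (1)–(3). Because equation (2) is linear in the frequency matrix, the profile for window size W equals the W = 1 profile averaged over W consecutive positions, which is what the sketch uses. Choosing the window that maximizes R, using the population standard deviation, and the function names are assumptions; the text only says that the best suited window was selected.

```python
# Sketch of window selection by the quality ratio R = Q_site / Q_random.
# Assumptions: the best window is the one with the largest R, and s is the
# population standard deviation of the profile.
import statistics


def window_profile(profile_w1, w):
    """Average a W = 1 profile over windows of w consecutive positions."""
    return [sum(profile_w1[i:i + w]) / w for i in range(len(profile_w1) - w + 1)]


def quality(profile):
    """Equation (3): profile span divided by its standard deviation."""
    spread = statistics.pstdev(profile)
    return (max(profile) - min(profile)) / spread if spread > 0 else 0.0


def best_window(site_w1, random_w1, w_max=10):
    """Return (W, R) with the largest quality ratio among W = 1, ..., w_max."""
    best_w, best_r = None, float("-inf")
    limit = min(w_max, len(site_w1), len(random_w1))
    for w in range(1, limit + 1):
        q_random = quality(window_profile(random_w1, w))
        if q_random == 0:
            continue                      # ratio undefined for a flat random profile
        r = quality(window_profile(site_w1, w)) / q_random
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


if __name__ == "__main__":
    print(window_profile([1, 2, 3, 4], 2))
    print(quality([0.0, 2.0]))
```

The W = 1 profiles passed to `best_window` would come from equation (2) applied to the site sample and to the random sample, respectively.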

Fig. 3. Charts linked to the PROFILES database entry for the DNA property twist (Karas et al., 1996) and the sample of 5′ and 3′ gene region nucleosome sites. (a) Window selection: dependence of the window quality function Q(W) on window size. (b) Profile with the best selected window size. For each point, the standard deviation calculated over the individual sequence profiles is given. The average value computed over all points of the profile is also shown. (c) Profile of the minimal window size (one dinucleotide) and the revealed significant linear trends (significance level α < 0.05).

Table 2. The best conformational and physicochemical features which appeared to be significant for all six samples of nucleosome sites

Probability of contact with a nucleosome core (Satchwell and Travers, 1989), units: %

Sample name          Region [a; b]   Utility U   Averaged mean for SITE   Averaged mean for RANDOM
Repeats              –50; 50         0.98        12.48 ± 0.70             11.28 ± 0.50
Satellites           –71; 71         0.845       12.30 ± 0.74             11.27 ± 0.42
Virus                –62; 62         0.999       12.94 ± 0.36             11.28 ± 0.44
5′ and 3′ regions    –86; 86         0.993       12.53 ± 0.35             11.27 ± 0.38
Coding regions       –26; 26         0.66        12.27 ± 0.75             11.31 ± 0.71
Stable               –45; 45         0.675       12.03 ± 1.17             11.28 ± 0.55

Tilt for DNA–protein complex (Suzuki et al., 1996), units: degree

Sample name          Region [a; b]   Utility U   Averaged mean for SITE   Averaged mean for RANDOM
Repeats              –90; 90         0.992       0.94 ± 0.06              0.78 ± 0.05
Satellites           –92; 92         0.935       0.95 ± 0.03              0.78 ± 0.06
Virus                –71; 71         0.915       0.95 ± 0.08              0.78 ± 0.06
5′ and 3′ regions    –95; 95         0.936       0.92 ± 0.07              0.78 ± 0.05
Coding regions       –96; 96         0.635       0.87 ± 0.08              0.78 ± 0.05
Stable               –51; 51         0.655       0.54 ± 0.24              0.78 ± 0.13

Twist [theoretically calculated by Sklenar (Karas et al., 1996)], units: degree

Sample name          Region [a; b]   Utility U   Averaged mean for SITE   Averaged mean for RANDOM
Repeats              –85; 85         0.87        36.47 ± 0.26             36.14 ± 0.24
Satellites           –93; 93         0.903       36.51 ± 0.16             36.14 ± 0.22
Virus                –79; 79         0.945       36.63 ± 0.25             36.14 ± 0.24
5′ and 3′ regions    –97; 97         0.905       36.61 ± 0.15             36.14 ± 0.21
Coding regions       –78; 78         0.89        36.58 ± 0.29             36.14 ± 0.24
Stable               –53; 53         0.634       36.50 ± 0.70             36.13 ± 0.29

Table 3. The significant conformational and physicochemical features revealed for a sample of nucleosome sites from 5′ and 3′ gene regions

Property name                                   Units       Region [a; b]   Utility U   Averaged mean for SITE   Averaged mean for RANDOM
Probability of contact with a nucleosome core   %           –86; 86         0.993       12.53 ± 0.35             11.27 ± 0.38
Tilt for DNA–protein complex                    degree      –95; 95         0.936       0.92 ± 0.07              0.78 ± 0.05
Twist                                           degree      –97; 97         0.905       36.61 ± 0.15             36.14 ± 0.21
Enthalpy change                                 kcal/mol    –66; 66         0.807       –8.20 ± 0.35             –8.64 ± 0.19
Free energy change                              kcal/mol    –90; 90         0.806       –1.48 ± 0.12             –1.61 ± 0.05
Entropy change                                  cal/mol/K   –76; 76         0.78        –21.71 ± 0.65            –22.64 ± 0.42
Propeller twist                                 degree      –70; 70         0.701       –13.33 ± 0.58            –12.53 ± 0.31
Free DNA roll                                   degree      –31; 31         0.65        1.07 ± 0.40              0.65 ± 0.42
Rise                                            degree      –26; 26         0.634       3.47 ± 0.06              3.52 ± 0.05
Clash strength                                  angstrom    –24; 24         0.627       1.00 ± 0.10              1.10 ± 0.07

The trends are presented in the database entry in the following way. Borders of the trend region are included in the field GX. For the DNA property twist for the 5′ and 3′ region nucleosome sites, two significant trends were found, located within the [–44.5; –16.5] and [16.5; 44.5] site regions. The average property value for this region (twist equaling 36.54°) enters the field GA. For the right trend, the slope coefficient is 0.026°/bp and its confidence interval at the 0.05 level is 0.022°/bp. These values are given in fields GK and GI, respectively. The finally compiled trend formula (4) constitutes field GF. The last field, FG, contains a link to the figure that displays all revealed trends and the profile of the minimal window size (W = 1 bp, one dinucleotide). An illustration for the DNA property twist for 5′ and 3′ region nucleosome sites is given in Figure 3c.

Figure 4 contains the profiles and all the significant linear trends for the conformational DNA property bend. The profile examples shown in Figure 4 indicate that profile trends may be very useful for a collective presentation of nucleosomal DNA. In the central 10 bp of the site, there are breaks or small falls of bend. In the intervals [–50; –10] and [+10; +50], the bend tends to decrease towards the edges of the site. In the regions [–90; –70] and [+70; +90], which enter the linker DNA, the bend angle tends to increase away from the site center.

Fig. 4. Profiles of the DNA property bend and corresponding trends with significance level α < 0.05 for all six considered nucleosome site samples. Window size is equal to one dinucleotide. Bend angle unit is degree.

The presentation of significant results in the database FEATURES may be found elsewhere (Ponomarenko et al., 1997a). Consider again the DNA property twist for 5′ and 3′ gene region nucleosome sites. In this example (Table 2), the sites within the region [–97; 97] relative to the center of the site appeared to differ significantly from the sample of random sequences in the mean values of the DNA property. The utility of this feature is U = 0.905. The averaged value of the twist over the region was 36.610 ± 0.154 for the site sequences and 36.138 ± 0.211 for the random sequences. The distributions of the mean twist calculated over the considered region for the real and random sequences are compared in Figure 5a. Note the right-shift of the distribution for the real sites compared to that for the random sequences. This result confirms the previous observation (Ponomarenko et al., 1997a).

The database FEATURES entry contains an automatically generated C-code of a computer program calculating the value of the considered parameter from a DNA sequence. There are also links to the executable program constructing the profiles of the significant conformational and physicochemical features along an arbitrary DNA sequence. Recognition programs apply the list of all significant site properties revealed, so that the user may choose any of them for site recognition or apply them all to calculate the mean recognition by all significant site properties. The distributions of the mean value calculated by all significant properties for 5′ and 3′ gene region nucleosome sites and random sequences are compared in Figure 5b.

Fig. 5. Histograms of the mean values of the revealed significant features of the nucleosome sites (black columns) and the random sequences (white columns). (a) DNA property twist (Karas et al., 1996) for the sample of 5′ and 3′ gene region nucleosome sites. (b) All significant features found for the sample of 5′ and 3′ gene region nucleosome sites used for discrimination. (c) All significant features found for the united set of nucleosome sites (the whole set includes samples 1–5 from Table 1).

The information accumulated in the database FEATURES may be useful for studying nucleotide sequences of interest. The database FEATURES also contains the link to the executable program. Via this link, a user may input a DNA sequence from the known databases or files and obtain a profile of a DNA feature of interest. Consider the significant feature revealed for 5′ and 3′ gene region nucleosome sites for the parameter twist on the region [–97; 97] relative to the site center. The value profile of this feature along the sequence of the human pS2 gene (EMBL: X05030; HSPS2G1) with experimentally determined nucleosome binding sites (Sewack and Hansen, 1997) is shown in Figure 6a. The mean recognition profile for this gene is given in Figure 6b. Figure 6 shows that the sequence has the highest feature values in the two regions of nucleosome binding as compared to the internucleosomal and neighboring regions.

To assess the predictive capability of our program and compare it with other analogous programs, we compiled two sets of human regulatory sequences. The first set contains 53 promoter sequences extracted from EPD (region [–600; +600] relative to the transcription start). The second set consists of donor splicing site sequences (region [–400; +400] relative to the exon–intron boundary). We expect that both sets have special patterns of the average profiles of the nucleosome site recognition function. The constructed average profile for promoters is shown in Figure 7a. Both profile sections remote from the transcription start (distance >400 bp) are characterized by high values of the recognition function. Within the upstream region [–200; +1] there is a steep fall, and in the downstream region [+100; +400] a less steep growth of the function is observed. The minimal value of the profile is located downstream of the transcription start point (region [+1; +100]). The average profile for the splicing site is given in Figure 7b. This profile suggests that introns, which lack the burden of the genetic code, may bind nucleosomes better than exons. This supports the earlier assumption about the role of introns in chromatin folding (Solovyev and Kolchanov, 1985).
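The profile of a feature along an arbitrary sequence, and its average over a set of aligned sequences, can be sketched as below. The recognition function of the actual programs is not defined in the text, so this sketch simply slides the region [–h; +h] along the sequence and reports the raw mean of equation (6) at each center. The property values are toy ones (1.0 for AA, 0.0 otherwise), and the names `feature_profile` and `average_profile` are hypothetical.

```python
# Sketch: slide a region of half-width h along a sequence and report the mean
# dinucleotide property at each center, then average profiles over sequences.
# The toy property and the use of the raw mean as a score are assumptions.
from itertools import product


def feature_profile(seq, prop, half_width):
    """Mean property over [c - h; c + h] for every center c where the region fits."""
    seq = seq.upper()
    steps = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    span = 2 * half_width                 # 2h + 1 nucleotides contain 2h dinucleotide steps
    prefix = [0.0]
    for value in steps:
        prefix.append(prefix[-1] + value)  # running sums give each window in O(1)
    return [(prefix[s + span] - prefix[s]) / span for s in range(len(steps) - span + 1)]


def average_profile(profiles):
    """Position-wise mean of equally long, aligned profiles."""
    return [sum(column) / len(column) for column in zip(*profiles)]


if __name__ == "__main__":
    toy = {"".join(p): 0.0 for p in product("ACGT", repeat=2)}
    toy["AA"] = 1.0   # hypothetical values, for illustration only
    print(feature_profile("AAAACCCC", toy, 1))
```

For the twist feature discussed above, `half_width` would be 97 and `prop` would hold the twist values from the PROPERTY database; averaging the resulting profiles over promoters aligned at the transcription start gives a curve of the kind shown in Figure 7a.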

Fig. 6. Nucleosome site recognition by the revealed significant features. The sequence of the human pS2 gene (EMBL: X05030; HSPS2G1) with two experimentally determined nucleosome binding sites (Sewack and Hansen, 1997) is used. To generate the program, the sample of nucleosome sites from 5′ and 3′ gene regions was taken as the training set of nucleosomal DNA sequences. Arrows indicate the approximate experimentally defined nucleosome site centers. (a) Recognition by the significant DNA property twist (Karas et al., 1996). (b) Mean recognition over all significant DNA properties found.

Fig. 7. Nucleosome site recognition profiles for human regulatory regions (averaged by sets of sequences). (a) Promoter region (53 sequences from EPD; profiles show positions [–500; +500] relative to the transcription start). (b) Splice site region (50 sequences of >800 bp in length from EMBL aligned relative to the exon–intron boundary; every sequence has an exon in the right half).

Discussion

Six samples of nucleosomal DNA sequences from the database SAMPLES were considered. Thirty-eight conformational and physicochemical dinucleotide parameters from the database PROPERTY were examined. For every sample, the profiles of all properties were constructed, and all significant extrema and linear trends were described in the database PROFILES. In addition, each sample of sites is characterized by a specific set of significant conformational and physicochemical features in the knowledge database FEATURES.

Table 2 presents the following most significant features: probability of contact with a nucleosome core, tilt for DNA–protein complex and twist. These properties appeared to be significant for all six considered sets of nucleosome sites. The location of the region within the site, the averaged property values over this region and in random sequences, and the utilities are also indicated in the table. Among the other valuable nucleosomal DNA properties are the following: clash strength, enthalpy change, entropy change, free energy change and propeller twist. These properties are significant for almost all site samples. Table 3 shows the most significant features with the highest utilities for the set of 5′ and 3′ region nucleosome sites. This sample is of particular interest because of the important role of chromatin structure in the regulatory gene regions.

The database integration performed in the current work is very valuable for the investigation of site characteristics. For example, consider again the 5′ and 3′ gene region nucleosome sites and the DNA property twist. The database FEATURES states that this property is one of the most important for the sites. The twist distribution is shifted to the right for the real sites compared to that for the random sequences (Figure 5a). The database PROFILES reveals the fine structure of the twist value distribution along the site sequences. The selected twist profile (Figure 3b) has a pair of minima at positions –20.5 and +20.5 (see database entry in Figure 2) and a pair of maxima at positions –41.5 and +41.5. The profile (Figure 3b) shows that the central region of the site [–30; +30] has the lowest twist in comparison with the two neighboring regions [–50; –30] and [+30; +50], which have extremely high twist. Additional information is given by the two revealed significant trends (Figure 3c), which are located in the regions [–44.5; –16.5] and [+16.5; +44.5]. Such a complex presentation of the property profile helps to understand the pattern of the property twist for the sample of sites.

The output data of the generated program for nucleosome site recognition (Figure 6) show that the revealed significant site features may be useful for investigating the affinity of DNA to the nucleosome. This is strongly confirmed by the comparison of the distributions of the features for real and random sequences. To check this idea, the sets of nucleosome sites were united in a set comprising five samples (1–5 in Table 1) and consisting of 130 sequences. Comparison of the real and random distributions calculated by all revealed significant features (Figure 5c) for the united site set indicates that the recognition program gives reliable results.

In future, we are planning to develop our complex representation of nucleosome site sequences and to present the results in the integrated database Nucleosomal DNA property. The analysis scheme described above is useful for the identification of significant DNA conformation features, together with the classification and assessment of the utilities of various context-dependent DNA parameter sets for a sample of sequences.

Acknowledgements We are grateful to Dr E.N.Trifonov for helpful discussions and critical comments. The work was supported by the Russian Foundation for Basic Research, the Russian Human Genome Program, Russian State Committee on Science and Technology and Integrated Program of Siberian Department of the Russian Academy of Sciences.

References

Calladine,C.R. and Drew,H.R. (1986) Principles of sequence-dependent flexure of DNA. J. Mol. Biol., 192, 907–918.
Fitzgerald,D.J., Dryden,G.L., Bronson,E.C., Williams,J.S. and Anderson,J.N. (1994) Conserved patterns of bending in satellite and nucleosome positioning DNA. J. Biol. Chem., 269, 21303–21314.
Ioshikhes,I. and Trifonov,E.N. (1993) Nucleosomal DNA sequence database. Nucleic Acids Res., 21, 4857–4859.
Ioshikhes,I., Bolshoy,A., Derenshteyn,K., Borodovsky,M. and Trifonov,E.N. (1996) Nucleosome DNA sequence pattern revealed by multiple alignment of experimentally mapped sequences. J. Mol. Biol., 262, 129–139.
Karas,H., Knuppel,R., Schulz,W., Sklenar,H. and Wingender,E. (1996) Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput. Appl. Biosci., 12, 441–446.
Kel,A.E., Ponomarenko,M.P., Likhachev,E.A., Orlov,Yu.L., Ischenko,I.V., Milanesi,L. and Kolchanov,N.A. (1993) SITEVIDEO: a computer system for functional site analysis and recognition. Investigation of the human splice sites. Comput. Appl. Biosci., 9, 617–627.
Kolchanov,N.A., Ponomarenko,M.P., Ponomarenko,J.V., Podkolodnyi,N.L. and Frolov,A.S. (1998) Functional sites of pro- and eukaryotic genomes: computer modeling and predicting activity. Mol. Biol. (Mosk.), 32, 255–267.
Luger,K., Mader,A.W., Richmond,R.K., Sargent,D.F. and Richmond,T.J. (1997) Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 389, 251–260.
Ponomarenko,M.P., Ponomarenko,J.V., Kel,A.E. and Kolchanov,N.A. (1997a) Search for DNA conformational features for functional sites. Investigation of the TATA box. Pacif. Symp. Biocomput., 2, 340–351.
Ponomarenko,M.P., Savnikova,L.K., Ponomarenko,J.V., Kel,A.E., Titov,I.I. and Kolchanov,N.A. (1997b) Modeling TATA-box sequences in eukaryotic genes. Mol. Biol. (Mosk.), 31, 726–732.
Satchwell,S.C. and Travers,A.A. (1989) Asymmetry and polarity of nucleosomes in chicken erythrocyte chromatin. EMBO J., 9, 229–238.
Satchwell,S.C., Drew,H.R. and Travers,A.A. (1986) Sequence periodicities in chicken nucleosome core DNA. J. Mol. Biol., 191, 659–675.
Sewack,G.F. and Hansen,U. (1997) Nucleosome positioning and transcription-associated chromatin alterations on the human estrogen-responsive pS2 promoter. J. Biol. Chem., 272, 31118–31129.
Sivolob,A.V. and Kharpunov,S.N. (1995) Translational positioning of nucleosomes on DNA: the role of sequence-dependent isotropic DNA bending stiffness. J. Mol. Biol., 247, 918–931.
Solovyev,V.V. and Kolchanov,N.A. (1985) The eucaryotic genes exon-intron structure can be determined by the nucleosomes organisation of the chromatin and related characteristics of gene expression regulation. Dokl. Akad. Nauk SSSR, 284, 232–237.
Staffelbach,H., Koller,T. and Burks,C. (1994) DNA structural patterns and nucleosome positioning. J. Biomol. Struct. Dyn., 12, 301–325.
Suzuki,M., Yagi,N. and Finch,J.T. (1996) Role of base-backbone and base-base interactions in alternating DNA conformations. FEBS Lett., 379, 148–152.
Trifonov,E.N. (1997) Genetic level of DNA sequences is determined by superposition of many codes. Mol. Biol. (Mosk.), 31, 759–767.
Trifonov,E.N. and Sussman,J.L. (1980) The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc. Natl Acad. Sci. USA, 77, 3816–3820.
Ulyanov,A.V. and Stormo,G.D. (1995) Multi-alphabet consensus algorithm for identification of low specificity protein–DNA interactions. Nucleic Acids Res., 23, 1434–1440.
Van Holde,K.E. (1989) Chromatin. Springer-Verlag, New York.
Widlund,H.R., Cao,H., Simonsson,S., Magnusson,E., Simonsson,T., Nielsen,P.E., Kahn,J.D., Crothers,D.M. and Kubista,M. (1997) Identification and characterization of genomic nucleosome-positioning sequences. J. Mol. Biol., 267, 807–817.
Wolffe,A.P. (1994) Nucleosome positioning and modification: chromatin structures that potentiate transcription. Trends Biochem Sci., 19, 240–244.
