DNA walks;
Visibility graphs;
DNA sequences;
Network analysis;
Sequence composition;
CHAOS GAME REPRESENTATION;
SHORT-RANGE CORRELATIONS;
ESSENTIAL GENES;
TIME-SERIES;
SIGNATURE;
SEQUENCES;
GENOMES;
ENCODE;
DOI:
10.1016/j.physa.2023.129043
Chinese Library Classification (CLC) number:
O4 [Physics];
Discipline classification code:
0702;
Abstract:
DNA walks are mathematical representations of DNA sequences that have been used to study the long-range organization of DNA. Visibility graphs are network representations of time series that give access to features usually difficult to analyze on the time series level alone. Here we combine the two approaches and introduce the concept of DNA visibility graphs. The main goal of our investigation is to calibrate this new tool by analyzing DNA visibility graphs for various random DNA sequences.

Our results show that the method is robust with respect to the arbitrary choice of step up for pyrimidine (C or T) and step down for purine (A or G), and varies predictably with respect to sequence composition (notably the pyrimidine content, pC + pT). Short-range correlations cause changes in the topology of the visibility graphs, with more profound changes occurring near a balanced composition point pC + pT ∼ 0.5.

As a first illustration of the applicability of DNA visibility graphs, we study the full genome of the bacterium Escherichia coli and compare DNA visibility graphs for short DNA sequences from different domains of life. These initial results show an approximate power law for the degree distribution of the E. coli DNA visibility graph and systematic differences in the clustering coefficients and other topological quantities of DNA visibility graphs between bacteria and eukaryotic species.

Our findings suggest that DNA visibility graphs can be a powerful tool for studying the properties of DNA and may provide new insights into its structure and function.

© 2023 Elsevier B.V. All rights reserved.
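
The construction described in the abstract can be illustrated with a minimal sketch. It assumes the walk takes unit steps (+1 for C or T, -1 for A or G) and that the graph uses the standard natural visibility criterion, in which two points of the walk are linked when the straight line between them passes strictly above every intermediate point. The abstract states neither detail explicitly, and the function names `dna_walk`, `visibility_edges`, and `degrees` are hypothetical, chosen only for this illustration.

```python
def dna_walk(seq):
    """Cumulative DNA walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G).

    Unit step size is an assumption; the abstract only fixes the directions.
    """
    steps = {"C": 1, "T": 1, "A": -1, "G": -1}
    walk, height = [], 0
    for base in seq.upper():
        height += steps[base]  # raises KeyError on symbols other than A, C, G, T
        walk.append(height)
    return walk


def visibility_edges(y):
    """Edges (i, j), i < j, of the natural visibility graph of the series y.

    Points i and j are linked if every intermediate point k lies strictly
    below the line joining (i, y[i]) and (j, y[j]). The comparison is
    cross-multiplied so that integer walks are handled exactly.
    This brute-force version is O(n^3) and only suits short sequences.
    """
    n = len(y)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                (y[k] - y[i]) * (j - i) < (y[j] - y[i]) * (k - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


def degrees(n, edges):
    """Degree of each of the n nodes, given an edge set."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    walk = dna_walk("TAAT")
    edges = visibility_edges(walk)
    print(walk)
    print(sorted(edges))
    print(degrees(len(walk), edges))
```

Quantities such as the degree distribution mentioned in the abstract could then be read off the `degrees` output. For genome-scale sequences a faster visibility algorithm would be needed in place of this brute-force loop.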