# RNA duplex melting

## Introduction

This example shows how our code can be used to calculate melting temperature of an RNA duplex (8-mer) using the Virtual Move Monte Carlo (VMMC) algorithm. The corresponding files are located in the repository in subdirectory EXAMPLES/RNA_DUPLEX_MELT

## Setting the order parameters and weights

The example uses umbrella sampling in order to make the sampling more efficient. In order to obtain a correct estimate of the melting temperature, the simulation has to sample many times the transitions between unbonded and bonded states. The sampling is aided by assigning weights to a particular state, specified for this example in file wfile.txt , where the first column specifies the value of order parameter (number of native bonds in the duplex in our case) and second column specifies the weight assigned to the state:

```0 8.
1 16204
2 1882.94
3 359.746
4 52.5898
5 15.0591
6 7.21252
7 2.2498
8 2.89783
```

The weights are typically chosen first by an educated guess, and then by running the simulations and adapting the weights manually to ensure that all relevant states are sampled. The order parameters, in this case base pairs, are specified in a file which is in our example called op.txt:

```{
order_parameter = bond
name = all_native_bonds
pair1 = 0, 15
pair2 = 1, 14
pair3 = 2, 13
pair4 = 3, 12
pair5 = 4, 11
pair6 = 5, 10
pair7 = 6, 9
pair8 = 7, 8
}
```

which specifies what base pairs count towards the order parameters. In our example, it specifies all complementary base pairs. The nucleotides are numbered from 0 to N-1, where N is the total number of nucleotides in the simulation. In our example, we have two complementary strands with eight nucleotides each, so in total sixteen nucleotides. The nucleotides are numbered from 3' end to the 5' end, so the first 3' nucleotides of the first strand (with index 0) is complementary to the first 5' nucleotide on the second strand (which has index 15). The system can have between 0 to 8 bonds formed, and for each possible value of an order parameter, a corresponding weight is assigned in the wfile.txt.

We need to specify the weight file and ordere parameter file in the input file, which is done by including the following options:

```op_file = op.txt
weights_file = wfile.txt
```

The program will now count the number of time the simulation spends in the respective states (i.e. how much time the simulation spends in states with 0 bonds, 1 bond, ..., 8 bonds). In order to be able to calculate the melting temperature, one is interested in the ratio of the number states where the system is in duplex state (i.e. has at least 1 bonds) and the number of states when the system has 0 bonds. We also need to know this ratio for a range of temperatures, so that we can interpolate them and find when is the ratio equal to 2, which is the definition of melting temperature when the finite size effects are taken into account (see discussion in [this paper] for details). One possibility is to run a series of simulations, each at different temperature. However, it is more efficient to use histogram reweighting in order to calculate the respective yields at different temperatures by extrapolating the contribution of states visited in one simulation. This is achieved by specifying

```extrapolate_hist =  52C, 54C, 56C, 58C, 60C, 62C, 64C, 66C, 68C, 70C
```

in the input file. Now, the algorithm will use histogram reweighting to extrapolate the number of occupied states to the specified temperatures.

## Data evaluation and estimation of the melting temperature

The simulation, after executing

```oxDNA input
```

will produce, among other files, last_hist.dat. Also note that the values of the order parameter are printed out in energy.dat file during the simulation. The contents of the last_hist.dat file are regularly updated as the simulation keeps running (the values are updated with each simulation step, but only written to file as often as specified by print_conf option). Note that simulation needs to run long enough to properly sample transitions between all states. For the case of the 8-mer, one needs at least about ${\displaystyle 10^{9}}$ iterations with the appropriate weights in order to estimate melting temperature within 1-2K precision. An example of the contents of the last_hist.dat file can look for this example like

```#t = 803480511; extr. Ts: 0.108383 0.10905 0.109717 0.110383 0.11105 0.111717 0.112383 0.11305 0.113717 0.114383
0 7.82092e+09 9.77615e+08 1.65144e+10 9.03838e+09 5.04243e+09 2.86621e+09 1.65915e+09 9.77615e+08 5.86057e+08 3.57265e+08 2.21364e+08 1.39341e+08
1 1.14272e+08 7052.11 146764 76907.7 41117.3 22416.8 12456.8 7052.11 4065.34 2385.22 1423.65 863.974
2 6.08557e+07 32319.5 1.18642e+06 555020 264888 128915 63949.7 32319.5 16633.5 8713.56 4644.05 2517.02
3 9.12706e+07 253708 1.56321e+07 6.5964e+06 2.83907e+06 1.24575e+06 557034 253708 117652 55524.5 26656.4 13012.5
4 1.13301e+08 2.15444e+06 2.16511e+08 8.29258e+07 3.23788e+07 1.28829e+07 5.22111e+06 2.15444e+06 904787 386567 167955 74178.8
5 2.53979e+08 1.68655e+07 2.68639e+09 9.39074e+08 3.34511e+08 1.21377e+08 4.48457e+07 1.68655e+07 6.45377e+06 2.51192e+06 994075 399852
6 7.74418e+08 1.07371e+08 2.68578e+10 8.58849e+09 2.79688e+09 9.27231e+08 3.12831e+08 1.07371e+08 3.74782e+07 1.32996e+07 4.79646e+06 1.75748e+06
7 1.03628e+09 4.60608e+08 1.75626e+11 5.16622e+10 1.54698e+10 4.71386e+09 1.46119e+09 4.60608e+08 1.4761e+08 4.80752e+07 1.59081e+07 5.34659e+06
8 2.5904e+09 8.93909e+08 5.0908e+11 1.38277e+11 3.82229e+10 1.07489e+10 3.07421e+09 8.93909e+08 2.64187e+08 7.93341e+07 2.41997e+07 7.49618e+06
```

Where the first line specifies the simulation time at which the file was saved and temperatures to which the states were extrapolated (specified in simulation units, they correspond to the ones specified extrapolate_hist option. The first column specifies the value of the order parameter (from 0 to 8 in our example) and the second column specifies the number of iterations that the simulation spend in that state. Note that the simulation is biased, i.e. some states are more likely to be visited because they have larger weight on them. The third column shows the unbiased time for each value of order parameter, which is obtained by dividing the number in the second column by the weight assigned to that order parameter value. The fourth and further columns correspond to the unbiased number of states extrapolated to the respective temperatures. From the ratios of the number of bonded states (1 to 8 bonds) to the number of unbonded states (0 bonds), one can calculate the yields at the respective temperatures and obtain melting temperature by interpolating them and finding for which temperature the ratio is 2. For convenience, a script is provided to do the interpolation:

```estimate_Tm.py  last_hist.dat
```

and output, for the example last_hist.dat specified above, is

``` 52.00 0.9774084 0.8590862
54.00 0.9566703 0.8086249
56.00 0.9185418 0.7432626
58.00 0.8521952 0.6613256
60.00 0.7470068 0.5632429
62.00 0.6024042 0.4531267
64.00 0.4380124 0.3397566
66.00 0.2868094 0.2352155
68.00 0.1723566 0.1503401
70.00 0.0977175 0.0897356
## Tm =  61.1492313255
```

Where the first column specifies the temperature, second column is the yield of duplexes in the simulation box and third column is the finite-size effect correct yield of duplexes. The melting temperature corresponds to the yield (with finite-size correction) equal to 0.5.