miniMDS
miniMDS is a tool for inferring and plotting 3D structures from normalized Hi-C data, using partitioned MDS, a novel approximation to multidimensional scaling (MDS). It produces a single 3D structure from a Hi-C BED file, representing an ensemble average of chromosome conformations within the population of cells. Using parallelization, it is able to process high-resolution data quickly with limited memory requirements. The human genome can be inferred at kilobase-resolution within several hours on a desktop computer. Standard MDS results in inaccuracies for sparse high-resolution data, but miniMDS focuses on local substructures to achieve greater accuracy. miniMDS also supports inter-chromosomal structural inference. Together with Mayavi, miniMDS produces publication-quality images and gifs.
|| citation || download || manual | installation | usage ||
Citation
miniMDS: 3D structural inference from high-resolution Hi-C data
L Rieber, S Mahony
Bioinformatics (2017) 33 (14):i261-i266
[Open Access Article].
Download
Open source code (released under the MIT license) is available from: https://github.com/seqcode/miniMDS.
Manual
Installation
Prerequisites:
- python 2.7
- numpy
- scikit-learn
- pymp
- mayavi (optional; for plotting)
- ImageMagick (optional; for creating gifs)
- scipy (optional; for creating figures from paper)
- matplotlib (optional; for creating figures from paper)
Testing
Please run test.sh (in the scripts directory) and report any issues.
TLDR
python minimds.py -l [path to low-res BED] -o [output path] [path to high-res BED]
Usage
Input file format
miniMDS uses intra- or inter-chromosomal BED files as input. Data must be normalized prior to use (for example, using https://bitbucket.org/mirnylab/hiclib).
Format:
chrA bin1_start bin1_end chrB bin2_start bin2_end normalized_contact_frequency
Example – chr22 intra-chromosomal data at 10-Kbp resolution:
chr22 16050000 16060000 chr22 16050000 16060000 12441.5189291 ...
Intra-chromosomal miniMDS
Intra-chromosomal analysis is performed using minimds.py.
To view help:
python minimds.py -h
By default, standard MDS (not partitioned MDS) is used:
python minimds.py GM12878_combined_22_100kb.bed
However, this will not offer the benefits of miniMDS and is not recommended.
Structures are not saved by default. Use the -o option with the path where you want to save the structure.
python minimds.py -o GM12878_combined_22_100kb_structure.tsv GM12878_combined_22_100kb.bed
Structures are saved to tsv files. The header contains the name of the chromosome, the resolution, and the starting genomic coordinate. Each line in the file contains the point number followed by the 3D coordinates (with “nan” for missing data).
Example – chr22 at 10-Kbp resolution:
chr22 10000 16050000 0 0.589878298045 0.200029092422 0.182515056542 1 0.592088232028 0.213915817254 0.186657230841 2 nan nan nan ...
To run partitioned MDS, you must have a normalized BED file at a lower resolution than the BED file you want to infer. For example, to use a 100-Kbp-resolution BED file to aid in the inference of a 10-Kbp-resolution file:
python minimds.py -l GM12878_combined_22_100kb.bed -o GM12878_combined_22_10kb_structure.tsv GM12878_combined_22_10kb.bed
The resolution you choose for the low-res file depends on your tradeoff between speed and accuracy. Lower resolutions are faster but less accurate. For now, the high resolution must be a factor of the low resolution. For example, a 500-Kb-resolution file can be used to infer a 100-Kb-resolution structure, but a 250-Kb-resolution file cannot.
Other parameters (optional)
Controlling the number of partitions
The miniMDS algorithm creates partitions in the high-resolution data and performs MDS on each partition individually. A greater number of partitions can increase speed but also reduce accuracy. On the other hand, for very sparse data a greater number of partitions can actually increase accuracy. If your output appears “clumpy”, increase the number of partitions.
The number of partitions cannot be set directly because partitions are created empirically to maximize clustering of the data. However, the degree of clustering of the data can be tweaked with the following parameters:
-m: minimum partition size (as a fraction of the data). Default = 0.05 -p: smoothing parameter (between 0 and 1). Default = 0.1
Make these parameters smaller to increase the number of partitions. For very high resolution data (such as 5-Kbp), m=0.01 and p=0.01 is recommended:
python minimds.py -l GM12878_combined_22_100kb.bed -o GM12878_combined_22_5kb_structure.tsv -m 0.01 -p 0.01 GM12878_combined_22_5kb.bed
You can limit the maximum RAM (in Kb) used by any given partition using -R (default = 32000):
python minimds.py -l GM12878_combined_22_100kb.bed -o GM12878_combined_22_5kb_structure.tsv -R 50000 GM12878_combined_22_5kb.bed
Number of threads
miniMDS uses multithreading to achieve greater speed. By default, 3 threads are requested, because this is safe for standard 4-core desktop computers. However, the number of threads used will never exceed the number of processors or the number of partitions, regardless of what is requested. You can change the number of requested threads using -n.
For example, to run miniMDS with four threads:
python minimds.py -l GM12878_combined_22_100kb.bed -o GM12878_combined_22_10kb_structure.tsv -n 4 GM12878_combined_22_10kb.bed
Classical MDS
Classical MDS (cMDS), also called principal coordinates analysis, is a variant of MDS that is faster under certain circumstances. The miniMDS tool supports cMDS but NOT with partitioned MDS. Use the –classical option.
python minimds.py --classical GM12878_combined_22_10kb.bed
This mode is mainly used for testing.
Inter-chromosomal miniMDS
Inter-chromosomal analysis is performed using minimds_inter.py
To view help:
python minimds_inter.py -h
The usage of minimds_inter.py is similar to minimds.py, however inter-chromosomal files are required in addition to intra-chromosomal. To avoid entering filenames separately for each chromosome, you must name your files using a standard format.
Intra-chromosomal format:
{prefix}_{ChrA}_{resolution}{kb or Mb}.bed
Example:
GM12878_combined_22_100kb.bed
Inter-chromosomal format:
{prefix}_{ChrA}_{ChrB}_{resolution}{kb or Mb}.bed
where A is before B in:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
Example:
GM12878_combined_21_22_100kb.bed
Enter the prefix and resolution of the inter-chromosomal and intra-chromosomal files, respectively:
python minimds_inter.py [inter-chromosomal file prefix] [intra-chromosomal file prefix] [inter-chromosomal resolution] [intra-chromosomal resolution]
For example, if your files are stored in the directory data:
python minimds_inter.py data/GM12878_combined_interchromosomal data/GM12878_combined_intrachromosomal 1000000 10000
Because of the challenges of inter-chromosomal inference, it is recommended that a resolution no greaer than 1-Mbp be used for inter-chromosomal data.
By default, partitioned MDS is not performed. To perform partitioned MDS on each intra-chromosomal structure, use the option -l followed by the resolution of the low-res intra-chromosomal files. (It is assumed that the naming of these files is otherwise identical to that of the high-res intra-chromosomal files.)
python minimds_inter.py -l 100000 data/GM12878_combined_interchromosomal data/GM12878_combined_intrachromosomal 1000000 10000
This will perform partitioned MDS on each of the intra-chromosomal structures at 10-Kbp resolution and then assemble the chromosomes into a whole-genome structure using 1-Mbp-resolution inter-chromosomal data. Remember that structures are not saved by default.
Other parameters (optional)
All of the parameters from minimds.py are also available for minimds_inter.py
Specifying chromosomes
By default, minimds_inter.py uses all human chromosomes other than Y. You can specify chromosomes using the option -c.
To perform interchromosomal analysis on chromosomes 1 and 8:
python minimds_inter.py -l 100000 -c 1 8 data/GM12878_combined_interchromosomal data/GM12878_combined_intrachromosomal 1000000 10000
Note: it is often necessary to use this option if you are using a genome other than human, so that it won’t search for chromosomes that don’t exist.
Plotting
Read a structure into a Cluster object:
cluster = data_tools.clusterFromFile(path)
Example:
cluster = data_tools.clusterFromFile("GM12878_combined_22_100kb_structure.tsv")
Create an interactive 3D plot in Mayavi. (Mayavi allows you to rotate the image and save a view.)
plotting.plot_cluster_interactive(cluster, color=(1,0,0), radius=None)
By default, the radius is the to-scale radius of heterochromatin.
Multiple clusters can be plotted simultaneously:
chroms = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X)
clusters = [data_tools.clusterFromFile("GM12878_combined_{}_100kb_structure.tsv".format(chrom) for chrom in chroms)]
plotting.plot_clusters_interactive(clusters)
plotting.py has 23 built-in colors designed to be as different to the human eye as possible. By default, these colors are used when plotting multiple clusters. You can also specify a list of colors:
chroms = (1, 2)
clusters = [data_tools.clusterFromFile("GM12878_combined_{}_100kb_structure.tsv".format(chrom) for chrom in chroms)]
plotting.plot_clusters_interactive(clusters, colors=[(1,0,0), (0,0,1)])
The radius can also be specified, as above.
The option cut creates a cross-section of the plot. For example, this is useful for viewing the interior of the nucleus.
chroms = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X)
clusters = [data_tools.clusterFromFile("GM12878_combined_{}_100kb_structure.tsv".format(chrom) for chrom in chroms)]
plotting.plot_clusters_interactive(clusters, cut=True)
A plot can be saved as a gif:
plotting.plot_cluster_gif(cluster, outname, color=(1,0,0), radius=None, increment=10)
A smaller value of increment will lead to a smoother gif.
Multiple clusters can also be plotted in a single gif:
plotting.plot_clusters_gif(clusters, outname, colors=default_colors, radius=None, increment=10)