As the size and complexity of high-dimensional cytometry data continue to expand, comprehensive computational tools that can scale to large datasets are required. Furthermore, popular clustering and dimensionality reduction tools alone are insufficient for scalable or reproducible analysis across batches, experiments, or different cytometry technologies. Here we present Spectre, an R package designed to facilitate data analysis workflows that simplify and streamline data manipulation and annotation, population identification (clustering, classification), and dimensionality reduction (t-SNE, UMAP) in high-dimensional cytometry data. Strategic implementation of batch-alignment, data-integration, and cell-type classification tools allows for the integrated analysis of multiple experiments, as well as a reproducible system for rapid and repeated cell-type identification in large datasets.
The Spectre package was constructed on the basis of the CAPX workflow in R. The simple, clear, and modular design of analysis workflows allows these tools to be used by informaticians and laboratory scientists alike. In addition to high-dimensional cytometry datasets, we have also developed functions for the spatial analysis of high-dimensional imaging datasets, such as those generated by Imaging Mass Cytometry. Critically, the fundamental data structures used within Spectre, along with the implementation of classifiers, allow for the scalable analysis of very large high-dimensional datasets. Along with the various R packages used within Spectre, we would like to acknowledge the cytofkit and Seurat R packages for providing inspiration for elements of the package design.
Many existing computational tools store data in a custom format, such as the flowFrame or SingleCellExperiment object, which provides excellent field-specific structuring of single-cell data. In Spectre, data management and operations are performed using the data.table format, an extension of R's base data.frame, provided by the data.table package. This table-like structure organises cells (rows) against cellular features or metadata (columns), and allows for the high-speed processing, manipulation (subsetting, filtering, etc.), and plotting of large datasets, as well as fast reading/writing of large CSV files. Rather than storing analysis outputs (clusters, dimensionality reduction values, annotations, etc.) in separate areas of a custom data format, Spectre simply adds new columns to the existing data.table. The simplicity of this data structure facilitates extremely fast and simple filtering/subsetting by data.table, as every cell (row) contains all of the information relevant to that cell, such as cellular expression, samples/groups, clusters/populations, and dimensionality reduction coordinates.
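To make this concrete, the sketch below builds a toy cell-by-marker data.table and shows how analysis results can be appended as extra columns and how rows can be filtered and summarised at speed. It is a minimal, generic illustration using the data.table package only, not Spectre's own functions; the column names (Sample, Group, FlowSOM_cluster) and all values are hypothetical.

```r
library(data.table)

## Toy dataset: each row is a cell, each column a marker or piece of metadata.
## All names and values here are invented for illustration.
set.seed(42)
cells <- data.table(
  Sample = rep(c("Mock_01", "Virus_01"), each = 5000),
  Group  = rep(c("Mock", "Virus"), each = 5000),
  CD45   = rnorm(10000, 5, 1),
  Ly6G   = rnorm(10000, 3, 1),
  CD11b  = rnorm(10000, 4, 1)
)

## Analysis outputs (here, placeholder cluster labels) are simply appended as
## new columns on the same table, rather than stored in a separate slot of a
## custom object; dimensionality reduction coordinates are added the same way.
cells[, FlowSOM_cluster := sample(1:10, .N, replace = TRUE)]

## Fast filtering/subsetting: every row carries all information for that cell.
cluster3_virus <- cells[FlowSOM_cluster == 3 & Group == "Virus"]

## Summaries by group and cluster are one-liners in data.table syntax.
summary_tab <- cells[, .(mean_CD11b = mean(CD11b), n_cells = .N),
                     by = .(Group, FlowSOM_cluster)]

## Fast CSV writing (and reading with fread) of large tables.
fwrite(cells, "cells.csv")
# cells <- fread("cells.csv")
```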
## Clustering and dimensionality reduction strategies for large datasets

Whilst clustering tools such as FlowSOM scale well to large datasets, dimensionality reduction approaches such as t-SNE and UMAP do not, as they incur lengthy computing time, excessive memory usage, and significant crowding effects that inhibit their utility. Whilst some improvements to runtime (FIt-SNE) and plot crowding (opt-SNE) have been made, scalability and plot-crowding limitations persist. As dimensionality reduction tools are primarily used to visualise cellular data and clustering results, we plot a subset of the clustered data, which addresses scalability and retains legibility. By using proportional subsampling from each sample, the relative number of cells from each cluster in each sample can be preserved in a smaller dataset, allowing for interpretable analysis via dimensionality reduction.
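As an illustration of this strategy, the sketch below takes an already clustered data.table (as in the earlier example), draws the same fraction of cells from every sample so that the relative contribution of each sample, and of the clusters within it, is approximately preserved, and computes UMAP coordinates only for that subset. This is a generic sketch using data.table and the uwot package rather than Spectre's own wrapper functions; the marker column names and the 10% fraction are assumptions made for demonstration.

```r
library(data.table)
library(uwot)

## Toy clustered dataset (hypothetical names/values), as in the previous sketch:
## one row per cell, marker columns, plus sample and cluster annotations.
set.seed(42)
cells <- data.table(
  Sample          = rep(c("Mock_01", "Virus_01"), each = 5000),
  CD45            = rnorm(10000, 5, 1),
  Ly6G            = rnorm(10000, 3, 1),
  CD11b           = rnorm(10000, 4, 1),
  FlowSOM_cluster = sample(1:10, 10000, replace = TRUE)
)

## Proportional subsampling: keep the same fraction of cells from every sample,
## so the relative composition of the full dataset is preserved in the subset.
frac <- 0.10
sub <- cells[, .SD[sample(.N, ceiling(.N * frac))], by = Sample]

## Run UMAP on the marker columns of the subset only.
markers  <- c("CD45", "Ly6G", "CD11b")
umap_res <- umap(as.matrix(sub[, ..markers]))

## Append the coordinates as new columns for plotting, e.g. coloured by the
## cluster labels that were computed on the full dataset.
sub[, UMAP_1 := umap_res[, 1]]
sub[, UMAP_2 := umap_res[, 2]]
```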