An unsupervised feature selection method is proposed for the analysis of high-dimensional data sets. The Least Square Error (LSE) of approximating the complete data set from a reduced feature subset is proposed as the quality measure for feature selection. Guided by this LSE criterion, a feature selection algorithm is developed to find the feature subset with the lowest LSE. The algorithm, named KLS-FS, supports non-linear feature selection through a kernel representation. An incremental LSE computation is designed to accelerate the selection process and thereby improve the scalability of KLS-FS to high-dimensional data sets. The superiority of the proposed algorithm in terms of preserving principal data structures, learning performance in classification and clustering applications, and robustness is demonstrated using various real-life data sets of different sizes and dimensions.
Liu R, Rallo R, Cohen Y. (2011) Unsupervised Feature Selection using Incremental Least Squares. International Journal of Information and Decision Making, 10(6):967-987
- PUBMED ID: N/A
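The LSE criterion in the abstract above can be illustrated with a minimal sketch. It assumes a plain linear least-squares reconstruction and a naive greedy forward search. It is not the KLS-FS algorithm itself: the kernel representation and the incremental LSE update from the paper are omitted, and the function names `lse` and `greedy_lse_selection` are hypothetical.

```python
import numpy as np


def lse(X, subset):
    """Squared error of reconstructing every column of X from the columns in `subset`."""
    S = X[:, subset]
    # Least-squares coefficients mapping the selected features to all features.
    coef, *_ = np.linalg.lstsq(S, X, rcond=None)
    residual = X - S @ coef
    return float(np.sum(residual ** 2))


def greedy_lse_selection(X, k):
    """Greedily pick k feature indices, each step adding the feature that lowers the LSE most.

    Assumption: a simple forward search that refits from scratch at every step,
    not the incremental computation described in the paper.
    """
    X = X - X.mean(axis=0)  # center the features before fitting
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: lse(X, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: features 2 and 3 are linear combinations of features 0 and 1.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
X_demo = np.column_stack([a, b, a + b, a - 2 * b])
chosen = greedy_lse_selection(X_demo, 2)
```

In this toy data every feature lies in a two-dimensional subspace, so any two selected features should reconstruct the full matrix with near-zero error. Refitting at every step is costly for many features, which is the cost the paper's incremental computation is designed to reduce.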
Self-Organizing Map Analysis of Toxicity-related Cell Signaling Pathways for Metal and Metal Oxide Nanoparticles
The response of a murine macrophage cell line exposed to a library of seven metal and metal oxide nanoparticles was evaluated via High Throughput Screening (HTS) assay employing luciferase-reporters for ten independent toxicity-related signaling pathways. Similarities of toxicity response among the nanoparticles were identified via Self-Organizing Map (SOM) analysis. This analysis, applied to the HTS data, quantified the significance of the signaling pathway responses (SPRs) of the cell population exposed to nanomaterials relative to a population of untreated cells, using the Strictly Standardized Mean Difference (SSMD). Given the high dimensionality of the data and relatively small data set, the validity of the SOM clusters was established via a consensus clustering technique. Analysis of the SPR signatures revealed two cluster groups corresponding to (i) sublethal pro-inflammatory responses to Al2O3, Au, Ag, SiO2 nanoparticles possibly related to ROS generation, and (ii) lethal genotoxic responses due to exposure to ZnO and Pt nanoparticles at a concentration range of 25-100 μg/mL at 12 h exposure. In addition to identifying and visualizing clusters and quantifying similarity measures, the SOM approach can aid in developing predictive quantitative-structure relations; however, this would require significantly larger data sets generated from combinatorial libraries of engineered nanoparticles.
Rallo R, France B, Liu R, Nair S, George S, Damoiseaux R, Giralt F, et al. (2011) Self-Organizing Map Analysis of Toxicity-related Cell Signaling Pathways for Metal and Metal Oxide Nanoparticles. Environmental Science and Technology, 45(4): 1695-1702
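The SSMD mentioned in the abstract above compares a treated population with an untreated one. A minimal sketch follows, assuming the common definition for two independent groups (difference of means divided by the square root of the sum of the sample variances). The abstract does not state which estimator the paper used, and the function name `ssmd` and the example values are hypothetical.

```python
import numpy as np


def ssmd(treated, control):
    """Strictly Standardized Mean Difference for two independent groups.

    Assumption: SSMD = (mean_t - mean_c) / sqrt(var_t + var_c),
    with unbiased sample variances (ddof=1).
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    mean_diff = treated.mean() - control.mean()
    spread = np.sqrt(treated.var(ddof=1) + control.var(ddof=1))
    return mean_diff / spread


# Illustrative (made-up) reporter readings for treated and untreated wells.
score = ssmd([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

Larger absolute values indicate a stronger response relative to the untreated population, and the sign gives the direction of the change.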