
Unsupervised Feature Selection using Incremental Least Squares

An unsupervised feature selection method is proposed for the analysis of high-dimensional data sets. The Least Square Error (LSE) of approximating the complete data set from a reduced feature subset is proposed as the quality measure for feature selection. Guided by this LSE criterion, a feature selection algorithm is developed to find the feature subset with the lowest LSE. The algorithm, named KLS-FS, supports nonlinear feature selection through a kernel representation. An incremental LSE computation is designed to accelerate the selection process and thereby improve the scalability of KLS-FS to high-dimensional data sets. The superiority of the proposed algorithm is demonstrated on various real-life data sets of different sizes and dimensionalities, in terms of preserving the principal data structures, learning performance in classification and clustering applications, and robustness.
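The LSE criterion and the idea of an incremental update can be illustrated with a small, linear sketch. It assumes the data are a NumPy array `X` (samples × features) and that "approximating the complete data set" means the least-squares projection of all centered features onto the span of the selected ones. It uses greedy forward selection with a Gram-Schmidt-style residual update as the incremental step. The function name `greedy_lse_feature_selection` and the `tol` parameter are hypothetical. This is not the authors' KLS-FS: the kernel representation is omitted, and the paper's exact incremental scheme may differ.

```python
import numpy as np


def greedy_lse_feature_selection(X, k, tol=1e-12):
    """Greedily pick up to k columns of X whose span best reconstructs all of X.

    Hypothetical helper: an illustrative *linear* sketch of LSE-guided
    unsupervised selection, NOT the authors' KLS-FS (no kernel is used).
    Returns (selected column indices, LSE after each pick).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array (samples x features)")
    n_features = X.shape[1]
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")

    # Residual of every centered feature after projecting out the span of the
    # selected features; nothing is selected yet, so it is just centered X.
    R = X - X.mean(axis=0)
    threshold = tol * float((R ** 2).sum())  # assumed relative tolerance
    selected, lse_path = [], []

    for _ in range(k):
        G = R.T @ R                  # G[j, i] = r_j . r_i
        diag = np.diag(G).copy()     # squared residual norm of each feature
        usable = diag > threshold    # skip features that are already explained
        usable[selected] = False
        if not usable.any():
            break                    # no remaining feature adds a new direction

        # Drop in total squared error if feature j is added next:
        #   ||R||_F^2 - ||R_new||_F^2 = sum_i (r_j . r_i)^2 / ||r_j||^2
        gains = np.full(n_features, -np.inf)
        gains[usable] = (G[usable] ** 2).sum(axis=1) / diag[usable]
        j = int(np.argmax(gains))

        # Incremental step: project the new unit direction q out of every
        # residual column instead of re-solving the least-squares problem.
        q = R[:, j] / np.sqrt(diag[j])
        R = R - np.outer(q, q @ R)

        selected.append(j)
        lse_path.append(float((R ** 2).sum()))  # current LSE
    return selected, lse_path


if __name__ == "__main__":
    # Hypothetical toy data: column 3 is the sum of columns 0 and 1.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 3))
    X = np.column_stack([A[:, 0], A[:, 1], A[:, 2], A[:, 0] + A[:, 1]])
    sel, lse = greedy_lse_feature_selection(X, 3)
    print("selected:", sel)
    print("LSE after each pick:", lse)
```

In the toy example, three columns suffice to span all four, so the last LSE value should be close to zero. Updating the residual matrix `R` in place of re-fitting a least-squares model for every candidate subset is what keeps each step cheap in this sketch.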

Liu R, Rallo R, Cohen Y (2011). Unsupervised Feature Selection using Incremental Least Squares. International Journal of Information Technology & Decision Making, 10(6):967-987


Classification Nano-SAR development for Cytotoxicity of Metal Oxide Nanoparticles

A classification-based cytotoxicity nanostructure–activity relationship (nanoSAR) is presented, based on a set of nine metal oxide nanoparticles to which transformed bronchial epithelial cells (BEAS-2B) were exposed over a range of concentrations (0.375–200 mg L⁻¹) for exposure times of up to 24 h. The nanoSAR is developed using cytotoxicity data from a high-throughput screening assay; these data were processed to identify and label toxic events (in terms of propidium iodide uptake by BEAS-2B cells) versus nontoxic events relative to an unexposed control cell population. Starting from a set of fourteen intuitive but fundamental physicochemical nanoSAR input parameters, several models were identified with a classification accuracy above 95%. The best-performing model achieved 100% classification accuracy in both internal and external validation. This model is based on three descriptors (atomization energy of the metal oxide, period of the nanoparticle metal, and nanoparticle primary size) in addition to the nanoparticle volume fraction in solution. Notwithstanding the success of the present modeling approach with a relatively small nanoparticle library, it is important to recognize that a significantly larger data set would be needed to expand the applicability domain and to increase the confidence in and reliability of data-driven nanoSARs.

Liu R, Rallo R, George S, Ji Z, Nair S, Nel AE, Cohen Y (2011). Classification Nano-SAR development for Cytotoxicity of Metal Oxide Nanoparticles. Small, 7(8):1118-1126