Accurate and efficient data-driven psychiatric assessment using machine learning
Uses machine learning on a database of over 4,000 children to build short, accurate questionnaires that screen for mental health conditions and learning disabilities — using just 3 to 27 questions instead of the usual 100+.
Abstract
Background
Accurate assessment of mental disorders and learning disabilities is essential for timely intervention. Machine learning and feature selection techniques have the potential to improve the accuracy and efficiency of mental health assessments. However, little research has explored the use of large transdiagnostic datasets or the application of these techniques to developing brief, question-based assessments. This study applies machine learning and feature selection techniques to a large transdiagnostic dataset with a high number of assessment items, and creates a tool for constructing streamlined, efficient, and effective assessments from existing data.
Methods
The Healthy Brain Network dataset (n = 4,136 at the time this study was conducted) contains over 1,000 questionnaire items. A two-stage feature selection approach with Elastic Net models was applied to this dataset to identify optimal, parsimonious item subsets for assessing various disorders and symptoms, as well as custom test-based outcome measures for learning disabilities. Model performance was then compared to that of existing assessments through rigorous cross-validation.
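The abstract does not spell out what the two stages are, so the following is only a minimal sketch of one plausible reading, assuming scikit-learn and synthetic data in place of the Healthy Brain Network items. Stage 1 ranks items by the absolute coefficients of an elastic-net-penalized logistic regression; stage 2 picks the smallest top-ranked subset whose cross-validated AUC is within a tolerance of the best. The function names (`make_model`, `rank_items`, `select_subset`), the hyperparameters, and the tolerance rule are hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_model():
    # Elastic-net-penalized logistic regression on standardized items.
    # l1_ratio and C are illustrative values, not the study's settings.
    return make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=5000),
    )


def rank_items(X, y):
    # Stage 1 (assumed): order items by absolute coefficient, largest first.
    model = make_model().fit(X, y)
    coefs = np.abs(model[-1].coef_.ravel())
    return np.argsort(coefs)[::-1]


def select_subset(X, y, ranking, max_items=27, tolerance=0.01, seed=0):
    # Stage 2 (assumed): score the top-k items for k = 1..max_items with
    # 5-fold cross-validated AUC, then keep the smallest k whose AUC is
    # within `tolerance` of the best AUC observed.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    aucs = np.array([
        cross_val_score(make_model(), X[:, ranking[:k]], y,
                        cv=cv, scoring="roc_auc").mean()
        for k in range(1, max_items + 1)
    ])
    k_opt = int(np.argmax(aucs >= aucs.max() - tolerance)) + 1
    return ranking[:k_opt], aucs


# Synthetic stand-in: 400 "participants", 40 "items", 5 of them informative.
X, y = make_classification(n_samples=400, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)
ranking = rank_items(X, y)
subset, aucs = select_subset(X, y, ranking, max_items=10)
print("selected item indices:", subset.tolist())
print("cross-validated AUC at that size:", round(float(aucs[len(subset) - 1]), 3))
```

For brevity the ranking here is computed once on all the data, which leaks information into the cross-validated scores; an unbiased estimate would repeat the ranking inside each training fold.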
Results
Machine learning models using parsimonious item subsets significantly outperformed traditional assessments (p = 0.004). Models for specific learning disorders achieved AUC values up to 0.855. Importantly, restricting analysis to non-proprietary assessment items did not significantly reduce performance.
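For readers less familiar with the AUC metric reported above, it can be read as the probability that a randomly chosen case with the condition receives a higher model score than a randomly chosen case without it, with ties counted as half. The sketch below computes it directly from that definition on a four-case toy example; the helper name `auc_from_scores` and the toy numbers are illustrative only and are not data from the study.

```python
import numpy as np


def auc_from_scores(labels, scores):
    # AUC as the fraction of (positive, negative) pairs in which the
    # positive case scores higher; tied pairs count as half a win.
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 0.75
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.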
Discussion
This study demonstrates the feasibility of using existing datasets to create efficient, effective assessment tools for mental disorders and learning disabilities. Our open-source, modular software architecture facilitates adaptation to diverse datasets, though external validation remains necessary before clinical implementation. The ability to achieve strong performance using only non-proprietary items supports the development of accessible assessment tools.
@article{false,
title = {Accurate and Efficient Data-Driven Psychiatric Assessment Using Machine Learning},
author = {Konishcheva, Kseniia and Leventhal, Bennett L. and Koyama, Maki and Panda, Sambit and Vogelstein, Joshua T. and Milham, Michael P. and Lindner, Ariel B. and Klein, Arno},
year = 2026,
month = jan,
journal = {BMC Medical Informatics and Decision Making},
volume = {26},
pages = {40},
issn = {1472-6947},
doi = {10.1186/s12911-025-03329-5},
}