"Depression is one of the most common mental disorders, with devastating consequences for both the individual and society, so we are developing a new, more objective diagnostic method that could become accessible to everyone in the future," says Rytis Maskeliūnas, a professor at KTU and one of the authors of the invention.
Scientists argue that while most diagnostic research for depression has traditionally relied on a single type of data, the new multimodal approach can provide better information about a person’s emotional state.
Impressive accuracy using voice and brain activity data
This combination of speech and brain activity data achieved an impressive 97.53 per cent accuracy in diagnosing depression, significantly outperforming alternative methods. "This is because the voice adds data to the study that we cannot yet extract from the brain," explains Maskeliūnas.
According to Musyyab Yousufi, a KTU PhD student who contributed to the invention, the choice of data was carefully considered: "While it is believed that facial expressions might reveal more about a person’s psychological state, this is quite easily falsifiable data. We chose voice because it can subtly reveal an emotional state, with the diagnosis affecting the pace of speech, intonation, and overall energy."
In addition, unlike electrical brain activity (EEG) or voice data, the face can, to a certain extent, directly reveal the severity of a person’s condition. "But we cannot violate patients’ privacy, and also, collecting and combining data from several sources is more promising for further use," says the professor.
Maskeliūnas emphasises that the EEG dataset used in the study was obtained from the Multimodal Open Dataset for Mental Disorder Analysis (MODMA), as the research group represents computer science and not the medical science field.
MODMA EEG data was collected and recorded for five minutes while participants were awake, at rest, and with their eyes closed. In the audio part of the experiment, the patients participated in a question-and-answer session and several activities focused on reading and describing pictures to capture their natural language and cognitive state.
AI will need to learn how to justify the diagnosis
The collected EEG and audio signals were transformed into spectrograms, allowing the data to be visualised. Special noise filters and pre-processing methods were applied to make the data noise-free and comparable, and a modified DenseNet-121 deep-learning model was used to identify signs of depression in the resulting images. Each image reflected signal changes over time: the EEG spectrograms showed waveforms of brain activity, and the audio spectrograms showed frequency and intensity distributions.
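To illustrate what turning a signal into a spectrogram involves, here is a minimal sketch in Python using NumPy. It is not the researchers' pipeline: the function name `log_spectrogram`, the frame length, the hop size, the 250 Hz sampling rate, and the synthetic 10 Hz test signal are all assumptions chosen for illustration, and no noise filtering is applied.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (freqs, times, spec) where spec is a log-power spectrogram.

    Hypothetical helper for illustration; frame_len and hop are assumed values.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame (real-input FFT keeps non-negative frequencies).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_power = 10.0 * np.log10(power + 1e-12)  # decibel scale; epsilon avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, log_power.T  # rows are frequency bins, columns are frames


# Synthetic stand-in for a recorded channel: a 10 Hz sine sampled at 250 Hz.
sample_rate = 250
t = np.arange(1000) / sample_rate
demo_signal = np.sin(2 * np.pi * 10.0 * t)

freqs, times, spec = log_spectrogram(demo_signal, sample_rate)
peak_freq = freqs[spec.mean(axis=1).argmax()]  # strongest frequency on average
```

Each column of `spec` describes the frequency content of one short time slice, which is what allows the array to be rendered as an image and passed to an image classifier.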
The model included a custom classification layer trained to split the data into two classes: healthy or depressed people. The classification results were evaluated, and the accuracy of the application was then assessed.
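As a rough illustration of how accuracy is assessed for a two-class problem like this one, the sketch below computes accuracy alongside sensitivity and specificity from true and predicted labels. The function name `binary_metrics`, the label encoding (1 for depressed, 0 for healthy), and the example labels are assumptions made up for this example; they are not the study's data or evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for 0/1 labels.

    Assumed encoding for illustration: 1 = depressed, 0 = healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # depressed, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # depressed, missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for eight hypothetical recordings.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity next to accuracy shows whether errors fall mainly on missed cases or on false alarms, which a single accuracy figure cannot reveal.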
In the future, this AI model could speed up the diagnosis of depression, or even make it remote, and reduce the risk of subjective evaluations. This requires further clinical trials and improvements to the programme. However, Maskeliūnas adds that the latter aspect of the research might raise some challenges.
"The main problem with these studies is the lack of data because people tend to remain private about their mental health matters," he says.
Another important aspect mentioned by the professor of the Department of Multimedia Engineering is that the algorithm needs to be improved in such a way that it is not only accurate but also provides information to the medical professional on what led to this diagnostic result. "The algorithm still has to learn how to explain the diagnosis in a comprehensible way," says Maskeliūnas.
According to the professor, similar requirements are becoming common because of the growing demand for AI solutions that directly affect people in areas such as healthcare, finance, and the legal system.
This is why explainable artificial intelligence (XAI), which aims to explain to the user why the model makes certain decisions and to increase their trust in the AI, is now gaining momentum.
The article "Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121" was published in the journal Brain Sciences.