The AI tool developer SensiML announced this week the launch of a site to collect audio samples of cough data from a broad swath of people—from healthy individuals to those with non-COVID respiratory conditions and patients diagnosed with COVID-19.
But why cough data? Coughing is a symptom in a majority of COVID-19 patients, and studies have shown that coughing is a main mechanism of social spreading of the virus.
A subsidiary of QuickLogic, the company will use its AI solution, the SensiML™ Analytics Toolkit, to generate predictive model code that can detect the presence of COVID-19 in audio cough samples. The more data gathered, the better the quality of the results, which means the company needs people like you to provide samples.
Even if its efficacy is proven, the application is not intended to be used as a clinical diagnostic. Rather, it will serve as a decision support tool for screening individuals, in combination with other available measures such as temperature screening for fever symptoms.
A pre-diagnostic screening tool
The concept of using AI for pre-diagnostic screening of cough acoustic samples has been studied and validated in recently published academic research out of Michigan State University, University of Oklahoma, Carnegie Mellon University, and University of Cambridge, among others. These early studies show strong promise, suggesting that the pathomorphology of COVID-19 in the respiratory system is distinct from what is seen in healthy individuals or even in other respiratory diseases.
“At SensiML, our expertise is not in medical pathology. It is in the implementation of algorithms for supervised machine learning, particularly in resource-constrained mobile and embedded IoT devices where efficient preprocessing, feature extraction, and classification are demanded,” said Chris Rogers, CEO, SensiML. “We trust and seek to build upon the insights and conclusions coming from academia in this application and wish to contribute our part in providing autonomous algorithms capable of running in a variety of smartphone, mobile and portable IoT edge devices that can be of benefit for screening for COVID-19.”
The valuable input of medical professionals will also be important as the data is collected and analyzed. Rogers explained that the crowdsourced data collection initiative is just one vector for acquiring data for AI model development. Both crowdsource-labeled data and expert-labeled data will be used to train, test, and validate the model as the initiative progresses.
Rogers said that although he is not yet at liberty to share details on the specific organizations SensiML is partnering with, the company is undertaking this effort as part of a larger initiative that involves other private technology companies, academia, and health institutions. “Our plans are to make the dataset we collect through this initiative available to both our partners and other research efforts under an open-source license,” he added.
Although the team has access to data from COVID-19 subjects, Rogers said it is just as important to gather data from otherwise healthy subjects—something they’re betting on their public data collection site to deliver. Less than a day after the site launched, close to 100 samples had been uploaded.
“The number of subjects required is something of an unknown until we run trial analysis and determine the level of variability when presented with other known conditions like bronchitis, COPD, asthma, and other respiratory conditions,” Rogers said. “But, initially, we are shooting for several hundred examples within each labeled class to give us a good first assessment of data sufficiency needs.”
What about fakers?
One might wonder if a faked cough, which never fooled a savvy grade-school teacher, can provide any meaningful data. “Interestingly, the research indicates that even ‘fake’ or voluntarily induced coughs exhibit the acoustic artifacts needed for accurate classification,” Rogers said. “Thus, if so, the practicality of such a device for screening workers seems quite plausible. Whether such a system could be spoofed with pre-recorded samples is really a question of implementation, and it’s one of the reasons we believe local real-time assessment from ML algorithms running locally on the IoT device has a great deal of merit.”
Background noise when collecting audio samples could also be problematic, but Rogers said there are several ways to address it.
“The most obvious is in the protocol for real-world data collection and minimizing the potential for noise errors by collecting samples that are as clean as is practical. Recognizing that may or may not always be possible, especially in noisy workplace environments, we can design robustness in from the outset by training models with various noise additions and good DSP filtering as part of the pre-processing in the algorithm,” he said. “In the end, one of the purposes of the study and data collection exercise is to address such factors using the methods we have available within the toolkit.”
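To make the idea of "training models with various noise additions" concrete, here is a minimal sketch of one common way to do it: mixing a noise recording into a clean sample at a chosen signal-to-noise ratio. It is an illustration only, not SensiML's method or toolkit API. The function names, the synthetic sine-wave stand-in for a cough, and the SNR levels are all assumptions made for the example.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not part of any SensiML API.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(clean)  # silent noise track: nothing to add
    # SNR_dB = 20 * log10(rms_clean / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixed):
    """SNR of a mix, computed from the known clean signal."""
    residual = [m - c for c, m in zip(clean, mixed)]
    return 20 * math.log10(rms(clean) / rms(residual))


# Synthetic stand-ins (assumptions): a 440 Hz tone in place of a cough
# recording, and Gaussian noise in place of workplace background noise.
rng = random.Random(0)
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# One augmented copy per noise level; each would keep the original label.
augmented = {snr: mix_at_snr(clean, noise, snr) for snr in (20, 10, 0)}
```

Each augmented copy keeps the label of the clean sample it came from, so a model trained on the expanded set sees the same class under progressively noisier conditions.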
To date, SensiML has completed the first phase of the project, which involved the creation of a robust cough detection algorithm. The next step will be to proceed with cough classification modeling. “We will be scaling up our data collection efforts to curate a broader dataset and welcome help in getting the word out that we need more crowdsourced data samples,” said Rogers. “As we progress and gain access to additional data, we will have several iterations of analysis refining the cough recognition models.”
The biggest challenge of this initiative is not the technology, nor is it the data collection efforts. Rather, it is time.
“We have every reason to believe the technology can be applied and that AI running on IoT endpoint devices is achievable with the technology we can deliver from SensiML,” said Rogers. “What we don’t have is the luxury of time to develop a viable screening tool through ordinary (proprietary) product development methods. It requires a highly collaborative approach with contributions from many stakeholders.”
If you would like to assist in this research effort, please visit the crowdsourcing site SensiML has set up to submit an audio clip of your cough.