Lack Of High-Quality Training Data Impedes AI Advances

Artificial intelligence systems may one day perform a plethora of human tasks, but the third annual Data Scientist Survey conducted by CrowdFlower finds that the work to get there is tedious and time-consuming. The survey of approximately 200 data scientists found that the jobs they hate the most, the ones that consume the bulk of their time, are cleaning, labeling, and categorizing data. They spend 500% more time cleaning, labeling, and categorizing data than they spend mining the data. In fact, those surveyed said they spend double the amount of time on these laborious tasks as they spend creating and building algorithms.

 

The reason for this is twofold. First, the lack of high-quality training data is the single biggest reason AI systems fail, according to the results of the survey. It is so critical that respondents said they'd rather break their leg than delete their training data. Second, data scientists have concerns about the integrity of the training data and worry that, if they aren't careful, the wrong training data could bias an AI system, because that data could be influenced by human prejudices around things such as religion, race, or gender.

As AI systems increasingly enter the mainstream, their usefulness is often defined by the quality of the training data used. While a machine can process complex mathematical equations or structured data in milliseconds, training data teaches a machine how to handle more abstract tasks, such as flagging inappropriate content or distinguishing between objects in images. Higher-quality initial training data improves the accuracy of an algorithm's initial output, but ongoing training data is required to keep improving the algorithm's results.

 

To view the full report, visit http://crwdflr.com/2oMPCzh

For more information, visit http://www.crowdflower.com
