Underspecification is the failure to specify in enough detail, according to Wiktionary. In machine learning and AI, underspecification of the training setup can result in vastly different predictions for edge cases, even from very similar models.
In some instances with artificial neural networks, even the same model trained from different starting points can produce vastly different predictions when tested on edge cases, according to a preprint from a large group of Google researchers, including lead authors Alexander D’Amour, Katherine Heller, and Dan Moldovan.
According to the researchers, “ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures.”
Artificial neural networks can be trained to make accurate predictions without knowing how variables interact or the constraints of these variables.
The training and evaluation sets are usually drawn from the same sample population. So, when that population does not include edge cases, predictions on such anomalous data can vary drastically.
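A deliberately tiny toy model (my illustration, not an experiment from the paper) makes the phenomenon concrete. If two input features are identical everywhere in the training data, infinitely many weight vectors fit equally well; gradient descent from different starting points lands on different ones, and they only disagree once an edge case pulls the features apart:

```python
import numpy as np

# Two features that are perfectly collinear in the training data,
# so any weights with w1 + w2 == 2 fit the targets exactly.
t = np.linspace(0.0, 1.0, 50)
X_train = np.stack([t, t], axis=1)   # x1 == x2 on every training example
y_train = 2.0 * t

def fit(w_init, lr=0.5, steps=2000):
    """Plain gradient descent on mean squared error."""
    w = np.array(w_init, dtype=float)
    for _ in range(steps):
        err = X_train @ w - y_train
        w -= lr * 2.0 * X_train.T @ err / len(t)
    return w

w_a = fit([0.0, 0.0])    # one starting point  -> converges to (1, 1)
w_b = fit([2.0, -2.0])   # another start, same model -> converges to (3, -1)

x_edge = np.array([1.0, 0.0])  # an edge case where the features finally differ
# Both fits are near-perfect on the training data, yet their edge-case
# predictions diverge: w_a @ x_edge is about 1.0, w_b @ x_edge about 3.0.
```

Both runs would look equally good under held-out evaluation drawn from the same population, which is exactly why underspecification slips past standard testing.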
Another potential issue can occur when the unusual cases are not as infrequent as the sample predicts.
For example, pictures of skin cancer taken under controlled clinical conditions are likely to be much cleaner and less pixelated than those snapped in a busy doctor’s office. So while a predictive model trained on clean clinical data can make good predictions about similarly clean images, its predictions on real-world, pixelated images would likely be less accurate.
Speech recognition in the real world can have a similar issue if training is done without background noise. Suppose you have the TV blaring, and you try to turn off the lights with Alexa. Alexa will have difficulty distinguishing the words of your commands from the noise of the TV.
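One common mitigation is to augment the training audio with background noise before the model ever sees it. A minimal sketch (my illustration, not any actual Alexa pipeline), assuming signals are plain 1-D NumPy sample arrays:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix background noise into a speech signal at a target SNR (in dB).

    Scaling the noise to a chosen signal-to-noise ratio is a standard
    augmentation for making speech models robust to noisy rooms.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that speech_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Training on mixtures at several SNR levels exposes the model to the "TV blaring" condition it will face after deployment.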
Including edge cases in the training and testing sets is one solution, but what if you don’t know what the edge cases look like? Then you can’t deliberately include them, or even tell whether they are already present in the sample.
Pixelated images and background noise can be anticipated in real-world scenarios. However, other edge cases may be less obvious, or entirely unknown.
Say, for example, you were on the front lines of the first infections and deaths from COVID-19. Any model estimating its spread and death rate would have been flawed if you didn’t know that asymptomatic carriers make up a large portion of cases. So any real-world forecast you could make at the time would most likely have been very inaccurate.
Not knowing what fraction of infections are asymptomatic was a major shortfall of early projections. Asymptomatic carriers of COVID-19 are not an edge case; they are extremely common.
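A little arithmetic, with hypothetical numbers chosen purely for illustration (not real COVID-19 figures), shows how badly a missing variable skews the estimate:

```python
# Hypothetical counts for illustration only -- not real COVID-19 data.
deaths = 50
confirmed_symptomatic = 1_000

# The naive fatality rate divides deaths by *confirmed* cases: 5.0%.
naive_fatality_rate = deaths / confirmed_symptomatic

# If, unknown to the modeler, 3 undetected asymptomatic infections exist
# per confirmed case, the true denominator is four times larger,
# and the infection fatality rate drops to 1.25%.
asymptomatic_per_confirmed = 3
true_cases = confirmed_symptomatic * (1 + asymptomatic_per_confirmed)
true_fatality_rate = deaths / true_cases
```

The model isn’t merely imprecise; it is off by a factor equal to one plus the unmeasured ratio, which no amount of fitting to the observed cases can correct.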
Including pixelated images in the samples may help, but it does not address all the other ways images can vary, such as framing and lighting. A representative sample for training and testing should have such variations built in.
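For completeness, here is a minimal sketch of one such variation, pixelation, produced by block-averaging and re-expanding an image (my illustration, assuming a 2-D grayscale array whose sides are divisible by the factor):

```python
import numpy as np

def pixelate(img, factor):
    """Crudely pixelate an image by block-averaging, then re-expanding.

    img: 2-D grayscale array; both dimensions must be divisible by `factor`.
    """
    h, w = img.shape
    # Group pixels into (factor x factor) blocks and average each block.
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    coarse = blocks.mean(axis=(1, 3))
    # Repeat each coarse value back up to the original resolution.
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
```

Applying transforms like this (and analogous ones for lighting or framing) to a fraction of the training set is one way to build the needed variation into the sample.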
Additionally, the variables examined in the training sample may not include the primary variables that have a causal relationship with the dependent variable being predicted. The sampled variables should capture the causal relationships among the factors; otherwise, predictive accuracy will suffer.
According to a recent article on Medium, “The path forward for robust AI will always depend on good explanatory models that capture the relevant causality of the system being predicted. Advanced AI should not be driven naively by curve fitting, but rather by relevance realization.”
D’Amour, A., Heller, K., Moldovan, D., et al. (Nov. 6, 2020). Underspecification Presents Challenges for Credibility in Modern Machine Learning. arXiv. https://arxiv.org/pdf/2011.03395.pdf
Perez, C. E. (Nov 17, 2020). Deep Learning Underspecification and Causality. Medium. https://medium.com/intuitionmachine/deep-learning-underspecification-and-causality-bf762f118780