Credit: Pixabay/CC0 Public Domain
About a decade ago, deep-learning models started achieving superhuman results on all sorts of tasks, from beating world-champion board game players to outperforming doctors at diagnosing breast cancer.
These powerful deep-learning models are usually based on artificial neural networks, which were first proposed in the 1940s and have become a popular type of machine learning. A computer learns to process data using layers of interconnected nodes, or neurons, that mimic the human brain.
As the field of machine learning has grown, artificial neural networks have grown along with it.
Deep-learning models are now often composed of millions or billions of interconnected nodes in many layers that are trained to perform detection or classification tasks using vast amounts of data. But because the models are so enormously complex, even the researchers who design them don't fully understand how they work. This makes it hard to know whether they are working correctly.
For instance, maybe a model designed to help physicians diagnose patients correctly predicted that a skin lesion was cancerous, but it did so by focusing on an unrelated mark that happens to frequently occur when there is cancerous tissue in a photo, rather than on the cancerous tissue itself. This is known as a spurious correlation. The model gets the prediction right, but it does so for the wrong reason. In a real clinical setting where the mark does not appear on cancer-positive images, it could result in misdiagnoses.
With so much uncertainty swirling around these so-called "black-box" models, how can one unravel what's going on inside the box?
This puzzle has led to a new and rapidly growing area of study in which researchers develop and test explanation methods (also called interpretability methods) that seek to shed some light on how black-box machine-learning models make predictions.
What are explanation methods?
At their most basic level, explanation methods are either global or local. A local explanation method focuses on explaining how the model made one specific prediction, while global explanations seek to describe the overall behavior of an entire model. A global explanation is often produced by developing a separate, simpler (and hopefully understandable) model that mimics the larger, black-box model.
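To make the idea of a simpler model that mimics a black box concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `black_box` is a made-up stand-in for an opaque model, and the surrogate is a single-threshold rule (a "decision stump") chosen to agree with the black box as often as possible. The agreement rate, often called fidelity, hints at how far the simple explanation can be trusted.

```python
import random

def black_box(x):
    # Hypothetical opaque model: a nonlinear score over two inputs.
    a, b = x
    score = 3.0 * a * a + 0.3 * b * (1.0 - b) - 0.9
    return 1 if score > 0 else 0

def fit_stump(inputs, labels):
    """Find the one-feature rule 'x[j] > t' that best matches the labels."""
    best = None
    for j in range(len(inputs[0])):
        for t in sorted({x[j] for x in inputs}):
            preds = [1 if x[j] > t else 0 for x in inputs]
            agree = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or agree > best[2]:
                best = (j, t, agree)
    return best

random.seed(0)
samples = [(random.random(), random.random()) for _ in range(500)]
# The surrogate is trained on the black box's outputs, not on ground truth.
bb_labels = [black_box(x) for x in samples]
feature, threshold, fidelity = fit_stump(samples, bb_labels)
print(f"surrogate rule: x[{feature}] > {threshold:.3f} (fidelity {fidelity:.2%})")
```

In this toy case the stump recovers that the first input dominates the decision. For a real deep-learning model, a surrogate this simple would usually have much lower fidelity, which is the difficulty with global explanations.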
But because deep-learning models work in fundamentally complex and nonlinear ways, developing an effective global explanation model is particularly challenging. This has led researchers to turn much of their recent focus to local explanation methods instead, explains Yilun Zhou, a graduate student in the Interactive Robotics Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) who studies models, algorithms, and evaluations in interpretable machine learning.
The most popular types of local explanation methods fall into three broad categories.
The first and most widely used type of explanation method is known as feature attribution. Feature attribution methods show which features were most important when the model made a specific decision.
Features are the input variables that are fed to a machine-learning model and used in its prediction. When the data are tabular, features are drawn from the columns in a dataset (they are transformed using a variety of techniques so the model can process the raw data). For image-processing tasks, on the other hand, every pixel in an image is a feature. If a model predicts that an X-ray image shows cancer, for instance, the feature attribution method would highlight the pixels in that specific X-ray that were most important for the model's prediction.
Essentially, feature attribution methods show what the model pays the most attention to when it makes a prediction.
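One simple way to compute such an attribution is occlusion: replace one feature at a time with a neutral baseline value and record how much the model's score drops. The sketch below assumes a hypothetical scoring function, feature names, and weights, all invented for illustration. They are set so that the model leans on a watermark rather than on the lesion.

```python
def model_score(x):
    # Hypothetical model; the weights are invented to show spurious reliance.
    return (0.1 * x["lesion_size"]
            + 0.2 * x["lesion_irregularity"]
            + 0.7 * x["watermark"])

def occlusion_attribution(score_fn, x, baseline):
    """Attribute to each feature the score drop when it is set to baseline."""
    full_score = score_fn(x)
    attributions = {}
    for name in x:
        occluded = dict(x)
        occluded[name] = baseline[name]
        attributions[name] = full_score - score_fn(occluded)
    return attributions

image_features = {"lesion_size": 0.6, "lesion_irregularity": 0.5, "watermark": 1.0}
baseline = {name: 0.0 for name in image_features}  # assumed neutral value
attr = occlusion_attribution(model_score, image_features, baseline)
top_feature = max(attr, key=attr.get)
print(top_feature, attr)
```

Here the largest attribution lands on the watermark, which is the kind of warning sign a practitioner would look for. Other attribution methods exist, and the choice of baseline is itself an assumption that can change the result.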
"Using this feature attribution explanation, you can check to see whether a spurious correlation is a concern. For instance, it will show if the pixels in a watermark are highlighted or if the pixels in an actual tumor are highlighted," says Zhou.
A second type of explanation method is known as a counterfactual explanation. Given an input and a model's prediction, these methods show how to change that input so it falls into another class. For instance, if a machine-learning model predicts that a borrower would be denied a loan, the counterfactual explanation shows what factors need to change so her loan application is accepted. Perhaps her credit score or income, both features used in the model's prediction, need to be higher for her to be approved.
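The loan example can be sketched as a small search problem. The code below assumes a hypothetical points-based approval rule in place of a trained model, and it treats one 10-point credit-score step as costing the same as one 1,000-unit income step. Both are illustrative assumptions. It then looks for the cheapest change that flips the decision.

```python
from itertools import product

def approve(credit_score, income):
    """Hypothetical loan model: a simple points rule, not a real lender's."""
    return credit_score + income // 200 >= 900

def nearest_counterfactual(credit_score, income):
    """Brute-force search for the lowest-cost change that gets approval."""
    best = None
    for credit_steps, income_steps in product(range(21), range(51)):
        new_credit = credit_score + 10 * credit_steps
        new_income = income + 1000 * income_steps
        if approve(new_credit, new_income):
            cost = credit_steps + income_steps  # assumed equal cost per step
            if best is None or cost < best[0]:
                best = (cost, new_credit, new_income)
    return best

applicant = (620, 40000)
cost, cf_credit, cf_income = nearest_counterfactual(*applicant)
print(f"approved if credit score were {cf_credit} and income {cf_income}")
```

For this applicant the search suggests raising the credit score to 700 with income unchanged. A different cost function would favor a different counterfactual, so the notion of the "nearest" change always depends on how distance is defined.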
"The good thing about this explanation method is it tells you exactly how you need to change the input to flip the decision, which could have practical usage. For someone who is applying for a mortgage and didn't get it, this explanation would tell them what they need to do to achieve their desired outcome," he says.
The third category of explanation methods is known as sample importance explanations. Unlike the others, this method requires access to the data that were used to train the model.
A sample importance explanation will show which training sample a model relied on most when it made a specific prediction; ideally, this is the most similar sample to the input data. This type of explanation is particularly useful if one observes a seemingly irrational prediction. There may have been a data entry error that affected a particular sample that was used to train the model. With this knowledge, one could fix that sample and retrain the model to improve its accuracy.
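A nearest-neighbor classifier makes this idea easy to see, because the training sample it relies on is simply the closest one. The sketch below uses a made-up training set in which one sample carries a suspicious label. It is an illustration under those assumptions and does not attempt the more involved estimates needed for deep networks.

```python
import math

# Hypothetical training set: (features, label) pairs.
training_data = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.9), "benign"),
    ((4.0, 4.2), "malignant"),
    ((4.1, 3.9), "malignant"),
    ((1.1, 1.1), "malignant"),  # possible data-entry error
]

def predict_with_evidence(x):
    """1-nearest-neighbor prediction plus the index of the sample used."""
    idx = min(range(len(training_data)),
              key=lambda i: math.dist(x, training_data[i][0]))
    return training_data[idx][1], idx

label, idx = predict_with_evidence((1.1, 1.15))
print(f"predicted {label} because of training sample {idx}: {training_data[idx]}")

# After inspecting the sample and judging its label wrong, fix it and re-check.
training_data[idx] = (training_data[idx][0], "benign")
fixed_label, _ = predict_with_evidence((1.1, 1.15))
```

The surprising "malignant" prediction traces back to the fifth sample. Once its label is corrected, the same input is classified as benign.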
How are explanation methods used?
One motivation for developing these explanations is to perform quality assurance and debug the model. With more understanding of how features impact a model's decision, for instance, one could identify that a model is working incorrectly and intervene to fix the problem, or toss the model out and start over.
Another, more recent, area of research is exploring the use of machine-learning models to discover scientific patterns that humans haven't uncovered before. For instance, a cancer-diagnosing model that outperforms clinicians could be faulty, or it could actually be picking up on hidden patterns in an X-ray image that represent an early pathological pathway for cancer and that were either unknown to human doctors or thought to be irrelevant, Zhou says.
It's still very early days for that area of research, however.
Words of warning
While explanation methods can sometimes be useful for machine-learning practitioners when they are trying to catch bugs in their models or understand the inner workings of a system, end users should proceed with caution when trying to use them in practice, says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in CSAIL.
As machine learning has been adopted in more disciplines, from health care to education, explanation methods are being used to help decision makers better understand a model's predictions so they know when to trust the model and use its guidance in practice. But Ghassemi warns against using these methods in that way.
"We have found that explanations make people, both experts and nonexperts, overconfident in the ability or the advice of a specific recommendation system. I think it is very important for humans not to turn off that internal circuitry asking, 'let me question the advice that I am
given,'" she says.
Other recent work also shows that explanations make people overconfident, she adds, citing recent studies by Microsoft researchers.
Far from a silver bullet, explanation methods have their share of problems. For one, Ghassemi's recent research has shown that explanation methods can perpetuate biases and lead to worse outcomes for people from disadvantaged groups.
Another pitfall of explanation methods is that it is often impossible to tell if the explanation method is correct in the first place. One would need to compare the explanations to the actual model, but since the user doesn't know how the model works, this is circular logic, Zhou says.
He and other researchers are working on improving explanation methods so they are more faithful to the actual model's predictions, but Zhou cautions that even the best explanation should be taken with a grain of salt.
"In addition, people generally perceive these models to be human-like decision makers, and we are prone to overgeneralization. We need to calm people down and hold them back to really make sure that the generalized model understanding they build from these local explanations are balanced," he adds.
Zhou's most recent research seeks to do just that.
What's next for machine-learning explanation methods?
Rather than focusing on providing explanations, Ghassemi argues that the research community needs to do more work to study how information is presented to decision makers so they understand it, and that more regulation needs to be put in place to ensure machine-learning models are used responsibly in practice. Better explanation methods alone aren't the answer.
"I have been excited to see that there is a lot more recognition, even in industry, that we can't just take this information and make a pretty dashboard and assume people will perform better with that. You need to have measurable improvements in action, and I'm hoping that leads to real guidelines about improving the way we display information in these deeply technical fields, like medicine," she says.
And in addition to new work focused on improving explanations, Zhou expects to see more research related to explanation methods for specific use cases, such as model debugging, scientific discovery, fairness auditing, and safety assurance. By identifying fine-grained characteristics of explanation methods and the requirements of different use cases, researchers could establish a theory that would match explanations with specific scenarios, which could help overcome some of the pitfalls that come from using them in real-world settings.