Information Planes and Decompositions
Disciplines
Computer Sciences (95%); Mathematics (5%)
Keywords
Information Theory,
Neural Networks,
Efficient AI Implementations,
Partial Information Decomposition,
Information Bottleneck
Neural networks, the proverbial black boxes of artificial intelligence, are increasingly permeating our lives thanks to advances in language modelling, speech recognition, and computer vision. While their remarkable, often super-human performance simplifies our private and professional lives, they suffer from two significant shortcomings. On the one hand, while the elementary operations of neural networks are mathematically straightforward, large-scale models exhibit complex behavior during and after training that is still not fully understood. Indeed, the scientific community is continuously proposing hypotheses and approaches that aim to explain why neural networks are so successful, and a clear picture is emerging only very slowly. Because of this and some reported catastrophic failures, neural network models are considered to lack transparency and trustworthiness. On the other hand, especially the recently proposed large language models (e.g., GPT, Claude, and similar systems) consume tremendous amounts of energy during both training and use, leading to substantial carbon emissions and non-negligible consumption of freshwater for cooling. With these models becoming increasingly popular, the environmental footprint of AI is bound to rise.

This project aims to address both of these shortcomings. First and foremost, the project shall investigate the behavior of neural networks during and after training using methods from information theory, a mathematical theory for the processing, transmission, and storage of information. More concretely, the planned work shall help settle the long-standing debate about whether neural networks that inherently remove irrelevant information achieve better performance. Furthermore, the project will investigate how information is duplicated across the many elements of neural networks and how this redundancy can be reduced by novel training strategies.
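The question of whether networks benefit from discarding irrelevant information is commonly formalized via the information bottleneck principle. As a brief sketch in standard notation (the symbols X, Y, T, and β are conventional and not taken from the abstract itself): for an input X, a label Y, and an internal representation T, one considers the objective

```latex
% Information bottleneck objective (standard formulation):
% X = input, Y = label, T = internal representation, beta > 0
% trades compression of X against preservation of label-relevant
% information.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
% The "information plane" plots the coordinates (I(X;T), I(T;Y))
% of each layer over the course of training.
```

Here I(·;·) denotes mutual information; a network whose layers reduce I(X;T) while retaining I(T;Y) is one that "removes irrelevant information" in the sense debated in the literature.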
Second, toward the end of the project, it will be evaluated how the obtained insights can be used to reduce the environmental footprint of future neural network systems. This may be achieved by removing redundant elements or by ensuring efficient information compression during training. Thus, while the primary goal of the project is to shed light on the inner workings of neural networks and thus improve their trustworthiness, it also aims to leverage these insights to make artificial intelligence more sustainable.
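Tracing such information-plane trajectories in practice requires estimating mutual information from samples. A minimal, hypothetical sketch (the function name, bin count, and toy data are illustrative choices, not part of the project) using the classic plug-in binning estimator:

```python
import numpy as np

def mutual_information_binned(x, t, n_bins=30):
    """Estimate I(X; T) in bits by discretizing both variables.

    A simple plug-in estimator: histogram the joint distribution,
    then compute I = sum p(x,t) * log2[ p(x,t) / (p(x) p(t)) ].
    Binning estimators are biased, but they are the standard entry
    point for information-plane analyses.
    """
    x = np.asarray(x).ravel()
    t = np.asarray(t).ravel()
    joint, _, _ = np.histogram2d(x, t, bins=n_bins)
    p_xt = joint / joint.sum()                # joint p(x, t)
    p_x = p_xt.sum(axis=1, keepdims=True)     # marginal p(x)
    p_t = p_xt.sum(axis=0, keepdims=True)     # marginal p(t)
    mask = p_xt > 0                           # avoid log(0)
    return float(
        (p_xt[mask] * np.log2(p_xt[mask] / (p_x @ p_t)[mask])).sum()
    )

# Toy example: T is a near-copy of X, Z is independent of X.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
t = x + 0.1 * rng.normal(size=10_000)   # high I(X; T)
z = rng.normal(size=10_000)             # I(X; Z) near zero
print(mutual_information_binned(x, t))
print(mutual_information_binned(x, z))
```

Note the design trade-off: finer bins reduce discretization error but inflate the positive bias of the estimate, which is one reason more refined estimators are an active research topic.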
- Technische Universität Graz - 100%
- Franz Pernkopf, Technische Universität Graz, associated research partner
- Asja Fischer, Ruhr-Universität Bochum - Germany
- Artemy Kolchinsky, Universitat Pompeu Fabra Barcelona - Spain
- Pedro Mediano, Imperial College London - United Kingdom