LSTM for Uniform Credit Assignment to Deep Networks
Disciplines
Computer Sciences (100%)
Keywords
- Long Short-Term Memory, LSTM, Neural Networks, Recurrent Neural Networks, Machine Learning, Deep Learning
LSTM (Long Short-Term Memory) networks are learning systems that can be understood as a special type of recurrent artificial neural network with an explicit memory architecture consisting of so-called memory cells and gates that control the flow of information into and out of the memory cells. LSTMs scan sequences of inputs (e.g. text, biological sequences) sequentially and automatically learn to recognize and memorize particular inputs or patterns of inputs that occur throughout the sequences by storing them in their memory cells. Recently, LSTM networks have emerged as the best-performing technique in speech and language processing: recent conferences in these fields, such as the flagship conference ICASSP, have been dominated by LSTM-based approaches, and recent benchmark records were achieved with LSTM, often by major IT companies like Google, IBM, Microsoft, and Baidu.

The success of LSTM networks stems from their capability to perform uniform credit assignment to inputs, that is, LSTM allows for treating all inputs on the same level. For example, when a sentence is processed, the first word may be as important for learning as the last word. Uniform credit assignment considers all input information equally, no matter where it is located in the input sequence. If learning is biased towards the most recent inputs, sub-optimal solutions are often obtained.

In this project, we want to go beyond uniform credit assignment to simple inputs like words. The goal of this project is to develop LSTM networks and corresponding learning techniques for uniform credit assignment to so-called deep networks which pre-process complex inputs such as images, speech, or chemical compounds. Such networks can be applied to the classification of actions in videos, where single frames may not convey sufficient information. This also includes photo series that show the same object from different angles, with the aim to extract features that are not visible in single images. High-content imaging of cells in drug design is another application, in which a high-resolution image is split into multiple sub-images that are presented sequentially to the classification system. A further application is to predict the toxicity of a mixture of chemical compounds (e.g. a soil sample), where an unknown number of chemical structures is presented sequentially to the network.

The new architectures and new approaches to LSTM-based uniform credit assignment to deep networks will be applied to the following tasks and will be benchmarked and tested on corresponding data sets: 1) video activity recognition and video description; 2) classification of large images which are split into sub-images; 3) classification of mixtures of compounds with an unknown number of components, where the components are sequentially presented to the LSTM network.
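To make the architectural idea concrete, the following is a minimal sketch (assuming PyTorch; class and parameter names such as FrameEncoder, SequenceClassifier, and feat_dim are illustrative and not taken from the project's code) of an LSTM that aggregates a sequence of complex inputs, each first mapped to a feature vector by a deep convolutional encoder, as in video activity recognition or sub-image classification:

```python
# Minimal sketch (PyTorch assumed): an LSTM assigns credit over a sequence of
# complex inputs, each pre-processed by a deep (convolutional) encoder.
# All names and hyperparameters below are illustrative.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Deep network that turns one complex input (e.g. an image or sub-image)
    into a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),             # global pooling -> (B, 64, 1, 1)
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):                        # x: (B, 3, H, W)
        h = self.conv(x).flatten(1)              # (B, 64)
        return self.proj(h)                      # (B, feat_dim)

class SequenceClassifier(nn.Module):
    """Encode every element of a sequence with the deep encoder, then let an
    LSTM memorize and combine the evidence before classifying the sequence."""
    def __init__(self, n_classes, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, frames):                   # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(feats)                # (B, T, hidden_dim)
        return self.head(out[:, -1])             # classify from the last memory state

# Usage: 8 video frames (or sub-images of one large image) per example.
model = SequenceClassifier(n_classes=10)
logits = model(torch.randn(2, 8, 3, 64, 64))     # shape (2, 10)
```

Because the LSTM stores evidence from early and late sequence positions alike in its memory cells, training signals reach all inputs of the sequence, which is the uniform credit assignment exploited here.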
Reinforcement learning (RL) is an area of machine learning in which an agent has to learn to interact with an environment such that it achieves a goal. For example, the agent can be a deep artificial neural network that has to learn how to play a computer game and maximize its score (= "reward") in the game. The field of RL has recently gained increased public attention, with success stories such as AlphaGo, an RL program developed by DeepMind/Google that beat the world's best player in the board game Go, and OpenAI Five, an RL program developed by OpenAI that plays the multiplayer online battle arena video game Dota 2 competitively. However, despite this progress, RL still struggles with real-world tasks and more complex strategy games such as StarCraft II.

In this project we identified one of the fundamental limitations of current RL methods and provided a solution to overcome it. We showed that current methods suffer severely from long delays between actions and the resulting rewards, yet real-world tasks and strategy games typically include exactly such delays. After analyzing this problem for current RL methods, we derived a novel method, RUDDER, which removes the delays in rewards. It does so by training a deep learning model to predict the accumulated reward at the end of completed state-action sequences; in other words, it predicts the outcome of a sequence by looking at the complete sequence. This is a supervised learning task and can be solved using, for example, long short-term memory networks. By doing so, the model learns which actions cause the reward. We then apply contribution analysis to determine the contribution of each action to the model's prediction. If the model is able to predict the accumulated reward at the end of the sequence, these contributions correspond to the contributions of the actions to the accumulated reward. We can therefore replace the original reward with these contributions, effectively moving (= "redistributing") delayed rewards to the actions that caused them. This redistributed reward without delays can then be used to train an RL agent, resulting in exponential speed-ups for training.

As an intuitive example, consider learning how to play the piano. Current RL approaches can be compared to telling the student only at the end of the piece how good or bad the performance was. RUDDER, however, moves the feedback from the end of the piece directly to the points in the piece where the student played well or poorly. The student gets immediate feedback after doing something good or bad, which makes learning much easier.
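As a concrete illustration, here is a minimal sketch (assuming PyTorch; ReturnPredictor, redistribute_rewards, and all hyperparameters are illustrative, not the project's implementation) of the core idea: an LSTM is trained to predict the return of completed state-action sequences, and, as one simple form of contribution analysis, differences of its step-wise predictions serve as redistributed, delay-free rewards:

```python
# Minimal sketch of RUDDER-style reward redistribution (PyTorch assumed).
# An LSTM predicts the final return of a state-action sequence; differences of
# consecutive predictions are used as redistributed rewards. Names and
# hyperparameters are illustrative.
import torch
import torch.nn as nn

class ReturnPredictor(nn.Module):
    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, seq):                      # seq: (B, T, obs_dim) state-action features
        out, _ = self.lstm(seq)
        return self.head(out).squeeze(-1)        # (B, T): predicted return after each prefix

def redistribute_rewards(model, seq):
    """Contribution analysis via prediction differences: how much each step
    changes the predicted final return becomes that step's new reward."""
    with torch.no_grad():
        pred = model(seq)                        # (B, T)
    first = pred[:, :1]                          # contribution of the first step
    diffs = pred[:, 1:] - pred[:, :-1]           # contribution of each later step
    return torch.cat([first, diffs], dim=1)      # (B, T) redistributed rewards

# Training the predictor is plain supervised learning on completed episodes.
model = ReturnPredictor(obs_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seq = torch.randn(4, 50, 16)                     # 4 episodes of length 50
final_return = torch.randn(4)                    # observed episode returns
loss = ((model(seq)[:, -1] - final_return) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()

new_rewards = redistribute_rewards(model, seq)   # per-step rewards summing to the prediction
```

The redistributed per-step rewards then replace the environment's original delayed reward when training the RL agent, so that feedback arrives immediately after the actions that caused it.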
- Universität Linz - 100%
Research Output
- 1815 Citations
- 18 Publications
- 3 Datasets & models
- 1 Software
Publications
- 2021, Journal Article: Kratzert F, "A note on leveraging synergy in multiple meteorological data sets with deep learning for rainfall–runoff modeling", Hydrology and Earth System Sciences, pp. 2685-2703. DOI 10.5194/hess-25-2685-2021
- 2020, Other: Mayr A, "Additional file 1 of Industry-scale application and evaluation of deep learning for drug target prediction". DOI 10.6084/m9.figshare.12154023
- 2020, Other: Mayr A, "Additional file 1 of Industry-scale application and evaluation of deep learning for drug target prediction". DOI 10.6084/m9.figshare.12154023.v1
- 2019, Journal Article: Kratzert F, "Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning", Water Resources Research, pp. 11344-11354. DOI 10.1029/2019wr026065
- 2019, Book Chapter: Hofmarcher M, "Visual Scene Understanding for Autonomous Driving Using Semantic Segmentation", Springer Nature, pp. 285-296. DOI 10.1007/978-3-030-28954-6_15
- 2021, Journal Article: Adler T, "Quantum Optical Experiments Modeled by Long Short-Term Memory", Photonics, p. 535. DOI 10.3390/photonics8120535
- 2019, Preprint: Kratzert F, "Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling", pp. 1-32. DOI 10.5194/hess-2019-368
- 2019, Preprint: Kratzert F, "Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets". DOI 10.48550/arxiv.1907.08456
- 2019, Conference Proceeding Abstract: Gillhofer M, "A GAN based solver of black-box inverse problems", NeurIPS 2019 Workshop on Solving Inverse Problems with Deep Networks
- 2019, Conference Proceeding Abstract: Arjona-Medina J, "RUDDER: Return Decomposition for Delayed Rewards", Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
- 2019, Preprint: Adler T, "Quantum Optical Experiments Modeled by Long Short-Term Memory". DOI 10.48550/arxiv.1910.13804
- 2019, Journal Article: Kratzert F, "Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets", Hydrology and Earth System Sciences, pp. 5089-5110. DOI 10.5194/hess-23-5089-2019
- 2019, Other: Klotz D, "Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling". DOI 10.13140/rg.2.2.18385.48487
- 2019, Preprint: Kimeswenger S, "Detecting cutaneous basal cell carcinomas in ultra-high resolution and weakly labelled histopathological images". DOI 10.48550/arxiv.1911.06616
- 2017, Preprint: Heusel M, "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". DOI 10.48550/arxiv.1706.08500
- 2020, Journal Article: Sturm N, "Industry-scale application and evaluation of deep learning for drug target prediction", Journal of Cheminformatics, p. 26. DOI 10.1186/s13321-020-00428-5
- 2018, Journal Article: Mayr A, "Large-scale comparison of machine learning methods for drug target prediction on ChEMBL", Chemical Science, pp. 5441-5451. DOI 10.1039/c8sc00148k
- 2018, Preprint: Arjona-Medina J, "RUDDER: Return Decomposition for Delayed Rewards". DOI 10.48550/arxiv.1806.07857
Datasets & models
- 2019, Database/Collection of data (public access): "Industry-scale Application and Evaluation of Deep Learning for Drug Target Prediction". DOI 10.5281/zenodo.3559987
- 2019, Database/Collection of data (public access): "Industry-scale Application and Evaluation of Deep Learning for Drug Target Prediction". DOI 10.5281/zenodo.3239499
- 2019, Database/Collection of data (public access): "Industry-scale Application and Evaluation of Deep Learning for Drug Target Prediction". DOI 10.5281/zenodo.3239498