
Research

J. G. Zilly, R. K. Srivastava, J. Koutník and J. Schmidhuber
International Conference on Machine Learning (ICML), 2017.
Abstract Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with “deep” transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Geršgorin’s circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis we propose Recurrent Highway Networks, which extend the LSTM architecture to allow step-to-step transition depths larger than one. Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character.
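To make the recurrence concrete, here is a minimal numpy sketch of one RHN time step with transition depth L, using the coupled carry gate (c = 1 - t) described in the paper; variable names and shapes are illustrative, not the authors' code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rhn_step(x, s, Wh, Wt, Rh, Rt, bh, bt):
    # One time step of a Recurrent Highway Network with transition
    # depth L = len(Rh). The input x enters only at the first
    # micro-layer; deeper micro-layers keep refining the state s
    # through gated highway updates within the same time step.
    for l in range(len(Rh)):
        in_h = Wh @ x if l == 0 else 0.0
        in_t = Wt @ x if l == 0 else 0.0
        h = np.tanh(in_h + Rh[l] @ s + bh[l])   # candidate state
        t = sigmoid(in_t + Rt[l] @ s + bt[l])   # transform gate
        s = h * t + s * (1.0 - t)               # coupled carry: c = 1 - t
    return s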
Download
F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M. M. Bronstein
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Abstract Deep learning has achieved a remarkable performance breakthrough in several fields, most notably in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures currently produce state-of-the-art performance on a variety of image analysis tasks such as object detection and recognition. Most deep learning research has so far focused on dealing with 1D, 2D, or 3D Euclidean-structured data such as acoustic signals, images, or videos. Recently, there has been an increasing interest in geometric deep learning, attempting to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications from the domains of network analysis, computational social science, and computer graphics. In this paper, we propose a unified framework that generalizes CNN architectures to non-Euclidean domains (graphs and manifolds) and learns local, stationary, and compositional task-specific features. We show that various non-Euclidean CNN methods previously proposed in the literature can be considered as particular instances of our framework. We test the proposed method on standard tasks from the realms of image, graph, and 3D shape analysis and show that it consistently outperforms previous approaches.
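The core operation can be sketched for a single node as follows; this is a hedged illustration of a mixture-of-Gaussians patch operator, where the kernel count, diagonal covariances, and all names are assumptions of the sketch rather than the paper's exact parameterization.

import numpy as np

def monet_aggregate(x_neigh, u_neigh, mu, sig, W):
    # x_neigh: (m, d) features of a node's m neighbors
    # u_neigh: (m, p) pseudo-coordinates of the corresponding edges
    # mu, sig: (K, p) means and diagonal widths of K Gaussian kernels
    # W:       (K, d, d_out) learnable weights, one matrix per kernel
    out = 0.0
    for k in range(len(mu)):
        # soft-assign each neighbor to kernel k in pseudo-coordinate space
        w = np.exp(-0.5 * np.sum(((u_neigh - mu[k]) / sig[k]) ** 2, axis=1))
        out = out + (w[:, None] * x_neigh).sum(axis=0) @ W[k]
    return out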
Download
W. Jaśkowski, O. R. Lykkebø, N. E. Toklu, F. Trifterer, Z. Buk, J. Koutník and F. Gomez
The NIPS '17 Competition: Building Intelligent Systems (First Place), 2017.
Abstract This paper describes the approach taken by the NNAISENSE Intelligent Automation team to win the NIPS ’17 “Learning to Run” challenge involving a biomechanically realistic model of the human lower musculoskeletal system.
Download
M. Ciccone, M. Gallieri, J. Masci, C. Osendorfer, and F. Gomez
Neural Information Processing Systems (NeurIPS), 2018.
Abstract This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReLU units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.
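A minimal sketch of the unrolling idea; the state and input dimensions, step size, and stopping rule are illustrative, and the paper's stability conditions on A are not enforced here.

import numpy as np

def nais_block(u, A, B, b, eta=0.1, tol=1e-4, max_unroll=200):
    # One NAIS-Net-style block: the block input u re-enters at every
    # unrolled stage (the non-autonomous skip connection), and the
    # unroll stops once the state settles, giving a pattern-dependent
    # processing depth near the input-dependent equilibrium.
    x = np.zeros(A.shape[0])
    for _ in range(max_unroll):
        x_new = x + eta * np.tanh(A @ x + B @ u + b)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

With A chosen contractive (for instance, eigenvalues with negative real part), the loop converges to the unique input-dependent equilibrium that the theorem guarantees for tanh units.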
Download
D. Ha and J. Schmidhuber
Neural Information Processing Systems (NeurIPS), 2018.
Abstract A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io
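The controller part of this design is deliberately tiny; the sketch below shows a single linear controller over the concatenated latent and hidden state, trained by a toy evolution strategy. The rollout inside the learned world model is replaced by a placeholder fitness, and all sizes are made up.

import numpy as np

rng = np.random.default_rng(0)
Z, H, A = 8, 16, 2                    # toy latent, hidden, action sizes
n_params = A * (Z + H) + A            # controller weights plus biases

def act(params, z, h):
    # World-Models-style controller: one linear layer on [z, h]
    W = params[:A * (Z + H)].reshape(A, Z + H)
    b = params[A * (Z + H):]
    return np.tanh(W @ np.concatenate([z, h]) + b)

def fitness(params):
    # placeholder for a rollout inside the learned world model;
    # a real implementation would step the VAE + MDN-RNN here
    z, h = rng.normal(size=Z), rng.normal(size=H)
    return -np.sum(act(params, z, h) ** 2)

# minimal (mu, lambda) evolution strategy standing in for CMA-ES
mean, std = np.zeros(n_params), 0.5
for _ in range(20):
    pop = mean + std * rng.normal(size=(64, n_params))
    scores = np.array([fitness(p) for p in pop])
    mean = pop[np.argsort(scores)[-8:]].mean(axis=0)   # average the elite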
Download
F. Lattari, M. Ciccone, M. Matteucci, J. Masci, and F. Visin
2018 DAVIS Challenge on Video Object Segmentation - IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Abstract We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that can quickly adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task for supervised approaches, which would need to be retrained. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features self-adapting to the object of interest via conditional affine transformations. This approach is simple, can be trained end-to-end, and does not necessarily require extra training steps at inference time. Our method shows competitive results on DAVIS2016 with respect to state-of-the-art approaches that use online fine-tuning, and outperforms them on DAVIS2017. ReConvNet also achieved promising results in the DAVIS Challenge 2018, finishing in 10th place.
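The conditional affine transformation can be read as FiLM-style per-channel modulation; here is a hedged sketch in which the shapes and the way the conditioning embedding is produced are assumptions.

import numpy as np

def conditional_affine(feat, cond, Wg, Wb):
    # feat: (c, h, w) feature maps of the segmentation network
    # cond: (k,) embedding of the object of interest
    # Wg, Wb: (c, k) maps producing per-channel scale and shift
    gamma = Wg @ cond
    beta = Wb @ cond
    return gamma[:, None, None] * feat + beta[:, None, None]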
Download
A. Zeyer, K. Irie, R. Schlüter, H. Ney
Interspeech, 2018.
Abstract Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model. Index Terms: attention, end-to-end, speech recognition.
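Shallow fusion itself amounts to a one-line combination of per-subword scores during beam search; in this sketch the LM weight 0.3 is a placeholder, not the value used in the paper.

import numpy as np

def shallow_fusion(log_p_att, log_p_lm, lam=0.3):
    # combine attention-model and external LM log-probabilities
    # over the subword vocabulary for one beam-search step
    return log_p_att + lam * log_p_lm

The pretraining scheme would, analogously, start training with a large time-reduction factor (aggressive temporal pooling of the encoder) and lower it in stages as training progresses.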
Download
J. Svoboda, J. Masci, F. Monti, M.M. Bronstein, and L. Guibas
International Conference on Learning Representations (ICLR), 2018.
Abstract Deep learning systems have become ubiquitous in many aspects of our lives. Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), and, more importantly, a necessary step towards designing novel and more advanced architectures built on new computational paradigms rather than marginally improving on existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. This results in a form of non-local forward propagation in the model, where latent features are conditioned on the global structure induced by the graph, making the networks up to 3 times more robust to a variety of white- and black-box adversarial attacks than conventional architectures, with almost no drop in accuracy.
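A hedged sketch of the peer aggregation for one sample: each feature vector is replaced by a softmax-weighted average of its K nearest features pooled from peer samples. The real layer uses learned graph attention; the distance measure, temperature, and names here are illustrative.

import numpy as np

def peer_regularize(x, peers, K=5, tau=1.0):
    # x:     (n, d) feature vectors of one sample (e.g. pixels)
    # peers: (m, d) feature vectors pooled from peer samples
    d2 = ((x[:, None, :] - peers[None, :, :]) ** 2).sum(-1)   # (n, m)
    idx = np.argsort(d2, axis=1)[:, :K]                       # K nearest peers
    out = np.empty_like(x)
    for i in range(len(x)):
        w = np.exp(-d2[i, idx[i]] / tau)                      # attention weights
        out[i] = (w[:, None] * peers[idx[i]]).sum(0) / w.sum()
    return out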
Download
W. Byeon, Q. Wang, R. K. Srivastava, and P. Koumoutsakos
European Conference on Computer Vision (ECCV), 2018.
Abstract Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction on three challenging real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, separation of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.
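The blind-spot argument is easiest to see in a toy form: gather context from four scan directions so every pixel's full past is covered, then blend. Plain cumulative sums stand in for the recurrent PMD-LSTM units here, so this is only the aggregation pattern, not the model.

import numpy as np

def directional_contexts(frame):
    # four directional sweeps over a (h, w) frame; each output pixel
    # summarizes everything 'behind' it in that scan direction
    down = np.cumsum(frame, axis=0)
    up = np.cumsum(frame[::-1], axis=0)[::-1]
    right = np.cumsum(frame, axis=1)
    left = np.cumsum(frame[:, ::-1], axis=1)[:, ::-1]
    return np.stack([down, up, right, left])

def blend(contexts, w):
    # blending unit: softmax-weighted combination of the directions
    w = np.exp(w) / np.exp(w).sum()
    return np.tensordot(w, contexts, axes=1)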
Download
L. Kidziński et al. (co-authored by all challenge participants)
arXiv
Abstract In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with the goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.
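Frame skipping, one of the heuristics common to many entries, amounts to a thin wrapper around the environment; this generic gym-style sketch is not any particular team's code.

class FrameSkip:
    # repeat each chosen action for `skip` simulator steps and
    # accumulate the reward, trading control granularity for speed
    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total, done = 0.0, False
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total += reward
            if done:
                break
        return obs, total, done, info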
Download
P. Shyam, W. Jaśkowski, and F. Gomez
International Conference on Machine Learning (ICML), 2019.
Abstract Efficient exploration is an unsolved problem in Reinforcement Learning, which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.
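The novelty signal is the disagreement between ensemble predictions; the paper derives its measure from the Bayesian view (a Jensen-Shannon-style divergence), while this sketch uses total variance as a simple stand-in, with toy linear "models" as placeholders.

import numpy as np

def disagreement(models, s, a):
    # utility of taking action a in state s for active exploration:
    # variance across the ensemble's next-state predictions
    preds = np.stack([m(s, a) for m in models])
    return preds.var(axis=0).sum()

# toy ensemble: five randomly initialized linear forward models
rng = np.random.default_rng(0)
models = [lambda s, a, M=rng.normal(size=(3, 5)): M @ np.concatenate([s, a])
          for _ in range(5)]
print(disagreement(models, np.zeros(3), np.ones(2)))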
Download
J. E. Lenssen, C. Osendorfer, and J. Masci
arXiv
Abstract This paper presents an end-to-end differentiable algorithm for anisotropic surface normal estimation on unstructured point-clouds. We utilize graph neural networks to iteratively infer point weights for a plane fitting algorithm applied to local neighborhoods. The approach retains the interpretability and efficiency of traditional sequential plane fitting while benefiting from a data-dependent deep-learning parameterization. This results in a state-of-the-art surface normal estimator that is robust to noise, outliers and point density variation and that preserves sharp features through anisotropic kernels and a local spatial transformer. Contrary to previous deep learning methods, the proposed approach does not require any hand-crafted features while being faster and more parameter efficient.
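At the core of the method is a weighted plane fit; below is a minimal numpy version of one iteration. In the paper the weights are re-predicted by a graph neural network each round; here they are simply given.

import numpy as np

def weighted_normal(points, weights):
    # points: (n, 3) local neighborhood, weights: (n,) per-point weights;
    # the normal is the eigenvector of the weighted covariance matrix
    # with the smallest eigenvalue
    c = (weights[:, None] * points).sum(0) / weights.sum()
    d = points - c
    cov = (weights[:, None, None] * (d[:, :, None] * d[:, None, :])).sum(0)
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    return eigvec[:, 0]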
Download
J. Svoboda, A. Anoosheh, C. Osendorfer and J. Masci
arXiv
Abstract This paper introduces a neural style transfer model to conditionally generate a stylized image using only a set of examples describing the desired style. The proposed solution produces high-quality images even in the zero-shot setting and allows for greater freedom in changing the content geometry. This is thanks to the introduction of a novel Peer-Regularization Layer that recomposes style in latent space by means of a custom graph convolutional layer aiming at separating style and content. Contrary to the vast majority of existing solutions, our model does not require any pre-trained network for computing perceptual losses and can be trained fully end-to-end with a new set of cyclic losses that operate directly in latent space. An extensive ablation study confirms the usefulness of the proposed losses and of the Peer-Regularization Layer, with qualitative results that are competitive with respect to the current state-of-the-art even in the challenging zero-shot setting. This opens the door to more abstract and artistic neural image generation scenarios and easier deployment of the model in production.
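One way to picture a cyclic loss in latent space: decode content and style codes into an image, re-encode it, and penalize any drift of the recovered codes. The toy encoder below exactly inverts the toy decoder, so the loss is zero by construction; with learned networks it becomes a training signal. All of this is an illustration, not the paper's architecture.

import numpy as np

rng = np.random.default_rng(0)
D = 8                                            # toy latent size
M = rng.normal(size=(2 * D, 2 * D)) / np.sqrt(2 * D)

def dec(z_c, z_s):
    # toy generator mixing content and style codes into an 'image'
    return M @ np.concatenate([z_c, z_s])

def enc(img):
    # toy encoder splitting the image back into content/style codes
    z = np.linalg.solve(M, img)
    return z[:D], z[D:]

def latent_cycle_loss(z_c, z_s):
    z_c2, z_s2 = enc(dec(z_c, z_s))
    return np.mean((z_c2 - z_c) ** 2) + np.mean((z_s2 - z_s) ** 2)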
Download
T. Willi, J. Masci, J. Schmidhuber and C. Osendorfer
arXiv
Abstract We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs can learn dynamical patterns from sequential data and deal with non-stationarity. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation and favourable runtime with respect to Gaussian Processes. We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability.
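A hedged sketch of the conditional-independence structure: each context subsequence is summarized NP-style by mean-aggregating pair encodings, and a simple recurrence carries a slow latent state across subsequences. The names, the plain-RNN update, and the deterministic decoder are all simplifications of the model.

import numpy as np

def rnp_predict(subseqs, x_star, We, A, B, Wd):
    # subseqs: list of context subsequences, each an (m_i, 2) array
    # of (x, y) pairs; x_star: query input (1-d array)
    r = np.zeros(A.shape[0])
    for seq in subseqs:
        # NP step: permutation-invariant summary of one subsequence
        c = np.mean(np.tanh(seq @ We.T), axis=0)
        # recurrent step: slow latent state linking the subsequences
        r = np.tanh(A @ r + B @ c)
    # deterministic decoder standing in for the predictive distribution
    return Wd @ np.concatenate([r, x_star])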
Download
A. Quaglino, M. Gallieri, J. Masci and J. Koutník
arXiv
Abstract This paper proposes the use of spectral element methods for fast and accurate training of Neural Ordinary Differential Equations (ODE-Nets). This is achieved by expressing their dynamics as truncated series of Legendre polynomials. The series coefficients, as well as the network weights, are computed by minimizing the weighted sum of the loss function and the violation of the ODE-Net dynamics. The problem is solved by coordinate descent that alternately minimizes, with respect to the coefficients and the weights, two unconstrained sub-problems using standard backpropagation and gradient methods. The resulting optimization scheme is fully time-parallel and results in a low memory footprint. Experimental comparison to standard methods, such as backpropagation through explicit solvers and the adjoint technique, on training surrogate models of small and medium-scale dynamical systems shows that it is at least one order of magnitude faster at reaching a comparable value of the loss function. The corresponding testing MSE is one order of magnitude smaller as well, suggesting improved generalization.
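In the simplest case, the spectral idea reduces to least squares over series coefficients. Below is a toy for scalar linear dynamics x' = a·x on [-1, 1], where the Legendre coefficients are chosen to minimize the dynamics residual at collocation points plus a weighted initial-condition term; the paper additionally learns the network weights by coordinate descent, none of which is shown here.

import numpy as np
from numpy.polynomial import legendre as Leg

a, x0, K = -2.0, 1.0, 12                    # dynamics, initial value, series length
t = np.cos(np.linspace(0.0, np.pi, 40))     # collocation points in [-1, 1]
I = np.eye(K)
P = Leg.legval(t, I).T                      # P[i, k]  = P_k(t_i)
dP = Leg.legval(t, Leg.legder(I)).T         # dP[i, k] = P_k'(t_i)

# dynamics residual rows (x' - a*x)(t_i) = 0, plus a weighted row for x(-1) = x0
A_ls = np.vstack([dP - a * P, 10.0 * Leg.legval(-1.0, I)[None, :]])
b_ls = np.concatenate([np.zeros(len(t)), [10.0 * x0]])
coef, *_ = np.linalg.lstsq(A_ls, b_ls, rcond=None)

# spectral accuracy: compare against the exact solution x(t) = x0 * exp(a*(t+1))
print(np.max(np.abs(Leg.legval(t, coef) - x0 * np.exp(a * (t + 1)))))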
Download