Research

Jan Svoboda, Asha Anoosheh, Christian Osendorfer, Jonathan Masci
CVPR 2020
Abstract

This paper introduces a neural style transfer model to generate a stylized image conditioning on a set of examples describing the desired style. The proposed solution produces high-quality images even in the zero-shot setting and allows for more freedom in changes to the content geometry. This is made possible by introducing a novel Two-Stage Peer-Regularization Layer that recombines style and content in latent space by means of a custom graph convolutional layer. Contrary to the vast majority of existing solutions, our model does not depend on any pre-trained networks for computing perceptual losses and can be trained fully end-to-end thanks to a new set of cyclic losses that operate directly in latent space and not on the RGB images. An extensive ablation study confirms the usefulness of the proposed losses and of the Two-Stage Peer-Regularization Layer, with qualitative results that are competitive with respect to the current state of the art using a single model for all presented styles. This opens the door to more abstract and artistic neural image generation scenarios, along with simpler deployment of the model.

Download
Jan Eric Lenssen, Christian Osendorfer, Jonathan Masci
CVPR 2020
Abstract

This paper presents an end-to-end differentiable algorithm for robust and detail-preserving surface normal estimation on unstructured point-clouds. We utilize graph neural networks to iteratively parameterize an adaptive anisotropic kernel that produces point weights for weighted least-squares plane fitting in local neighborhoods. The approach retains the interpretability and efficiency of traditional sequential plane fitting while benefiting from adaptation to data set statistics through deep learning. This results in a state-of-the-art surface normal estimator that is robust to noise, outliers and point density variation, preserves sharp features through anisotropic kernels and equivariance through a local quaternion-based spatial transformer. Contrary to previous deep learning methods, the proposed approach does not require any hand-crafted features or preprocessing. It improves on the state-of-the-art results while being more than two orders of magnitude faster and more parameter efficient.

Download
Neuromemetic Evolutionary Optimization
Paweł Liskowski, Krzysztof Krawiec, Nihat Engin Toklu
Parallel Problem Solving from Nature (PPSN), 2020
Abstract
ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution
Nihat Engin Toklu, Paweł Liskowski, Rupesh Kumar Srivastava
Parallel Problem Solving from Nature (PPSN), 2020
Abstract
Simone Pozzoli, Marco Gallieri, Riccardo Scattolini
IFAC World Congress, 2020
Abstract

The use of recurrent neural networks to represent the dynamics of unstable systems is difficult due to the need to properly initialize their internal states, which in most of the cases do not have any physical meaning, consequent to the non-smoothness of the optimization problem. For this reason, in this paper focus is placed on mechanical systems characterized by a number of degrees of freedom, each one represented by two states, namely position and velocity. For these systems, a new recurrent neural network is proposed: Tustin-Net. Inspired by second-order dynamics, the network hidden states can be straightforwardly estimated, as their differential relationships with the measured states are hardcoded in the forward pass. The proposed structure is used to model a double inverted pendulum and for model-based Reinforcement Learning, where an adaptive Model Predictive Controller scheme using the Unscented Kalman Filter is proposed to deal with parameter changes in the system.

Download
Saeed Saremi
arXiv, May 2020
Abstract

Inspired by recent developments in learning smoothed densities with empirical Bayes, we study variational autoencoders with a decoder that is tailored for the random variable Y=X+N(0,σ2Id). A notion of smoothed variational inference emerges where the smoothing is implicitly enforced by the noise model of the decoder; “implicit”, since during training the encoder only sees clean samples. This is the concept of imaginary noise model, where the noise model dictates the functional form of the variational lower bound (σ), but the noisy data are never seen during learning. The model is named σ-VAE. We prove that all σ-VAEs are equivalent to each other via a simple β-VAE expansion: (σ2)(σ1,β), where β=σ22/σ21. We prove a similar result for the Laplace distribution in exponential families. Empirically, we report an intriguing power law KLσν for the learned models and we study the inference in the σ-VAE for unseen noisy data. The experiments were performed on MNIST, where we show that quite remarkably the model can make reasonable inferences on extremely noisy samples even though it has not seen any during training. The vanilla VAE completely breaks down in this regime. We finish with a hypothesis (the XYZ hypothesis) on the findings here.

Download
Saeed Saremi, Rupesh Srivastava
arXiv, May 2020
Abstract

Smoothing classifiers and probability density functions with Gaussian kernels appear unrelated, but in this work, they are unified for the problem of robust classification. The key building block is approximating the energy function of the random variable Y=X+N(0,σ2Id) with a neural network which we use to formulate the problem of robust classification in terms of xˆ(Y), the Bayes estimator of X given the noisy measurements Y. We introduce empirical Bayes smoothed classifiers within the framework of randomized smoothing and study it theoretically for the two-class linear classifier, where we show one can improve their robustness above the margin. We test the theory on MNIST and we show that with a learned smoothed energy function and a linear classifier we can achieve provable 2 robust accuracies that are competitive with empirical defenses. This setup can be significantly improved by learning empirical Bayes smoothed classifiers with adversarial training and on MNIST we show that we can achieve provable robust accuracies higher than the state-of-the-art empirical defenses in a range of radii. We discuss some fundamental challenges of randomized smoothing based on a geometric interpretation due to concentration of Gaussians in high dimensions, and we finish the paper with a proposal for using walk-jump sampling, itself based on learned smoothed densities, for robust classification.

Download
Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník
International Conference on Learning Representations (ICLR), 2020
Abstract

This paper proposes the use of spectral element methods \citep{canuto_spectral_1988} for fast and accurate training of Neural Ordinary Differential Equations (ODE-Nets; \citealp{Chen2018NeuralOD}) for system identification. This is achieved by expressing their dynamics as a truncated series of Legendre polynomials. The series coefficients, as well as the network weights, are computed by minimizing the weighted sum of the loss function and the violation of the ODE-Net dynamics. The problem is solved by coordinate descent that alternately minimizes, with respect to the coefficients and the weights, two unconstrained sub-problems using standard backpropagation and gradient methods. The resulting optimization scheme is fully time-parallel and results in a low memory footprint. Experimental comparison to standard methods, such as backpropagation through explicit solvers and the adjoint technique \citep{Chen2018NeuralOD}, on training surrogate models of small and medium-scale dynamical systems shows that it is at least one order of magnitude faster at reaching a comparable value of the loss function. The corresponding testing MSE is one order of magnitude smaller as well, suggesting generalization capabilities increase.

Download
Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutnik, Mark Cannon
International Conference on Learning Representations (ICLR), 2020
Abstract

This paper proposes a differentiable linear quadratic Model Predictive Control (MPC) framework for safe imitation learning. The infinite-horizon cost is enforced using a terminal cost function obtained from the discrete-time algebraic Riccati equation (DARE), so that the learned controller can be proven to be stabilizing in closed-loop. A central contribution is the derivation of the analytical derivative of the solution of the DARE, thereby allowing the use of differentiation-based learning methods. A further contribution is the structure of the MPC optimization problem: an augmented Lagrangian method ensures that the MPC optimization is feasible throughout training whilst enforcing hard constraints on state and input, and a pre-stabilizing controller ensures that the MPC solution and derivatives are accurate at each iteration. The learning capabilities of the framework are demonstrated in a set of numerical studies.

Download
Pierluca D’Oro, Wojciech Jaśkowski
arXiv, April 2020
Abstract

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor’s Jacobian matrix with the gradient of the critic w.r.t. input actions. However, instead of gradients, the critic is, typically, only trained to accurately predict expected returns, which, on their own, are useless for policy optimization. In this paper, we propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients, which explicitly learns the action-value gradient. MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning, leading to a critic tailored for policy improvement. On a set of MuJoCo continuous-control tasks, we demonstrate the efficiency of the algorithm with respect to model-free and model-based state-of-the-art baselines.

Download
Giorgio Giannone, Asha Anoosheh, Alessio Quaglino, Pierluca D’Oro, Marco Gallieri, Jonathan Masci
arXiv, April 2020
Abstract

Event-based cameras are novel, efficient sensors inspired by the human vision system, generating an asynchronous, pixel-wise stream of data. Learning from such data is generally performed through heavy preprocessing and event integration into images. This requires buffering of possibly long sequences and can limit the response time of the inference system. In this work, we instead propose to directly use events from a DVS camera, a stream of intensity changes and their spatial coordinates. This sequence is used as the input for a novel \emph{asynchronous} RNN-like architecture, the Input-filtering Neural ODEs (INODE). This is inspired by the dynamical systems and filtering literature. INODE is an extension of Neural ODEs (NODE) that allows for input signals to be continuously fed to the network, like in filtering. The approach naturally handles batches of time series with irregular time-stamps by implementing a batch forward Euler solver. INODE is trained like a standard RNN, it learns to discriminate short event sequences and to perform event-by-event online inference. We demonstrate our approach on a series of classification tasks, comparing against a set of LSTM baselines. We show that, independently of the camera resolution, INODE can outperform the baselines by a large margin on the ASL task and it’s on par with a much larger LSTM for the NCALTECH task. Finally, we show that INODE is accurate even when provided with very few events.

Download
Program Synthesis as Latent Continuous Optimization: Evolutionary Search in Neural Embeddings
Paweł Liskowski, Krzysztof Krawiec, Nihat Engin Toklu, Jerry Swan
The Genetic and Evolutionary Computation Conference (GECCO), 2020
Abstract
Mayank Mittal, Marco Gallieri, Alessio Quaglino, Seyed Sina Mirrazavi Salehian, Jan Koutník
arXiv, Feb 2020
Abstract

This paper presents Neural Lyapunov MPC, an algorithm to alternately train a Lyapunov neural network and a stabilising constrained Model Predictive Controller (MPC), given a neural network model of the system dynamics. This extends recent works on Lyapunov networks to be able to train solely from expert demonstrations of one-step transitions. The learned Lyapunov network is used as the value function for the MPC in order to guarantee stability and extend the stable region. Formal results are presented on the existence of a set of MPC parameters, such as discount factors, that guarantees stability with a horizon as short as one. Robustness margins are also discussed and existing performance bounds on value function MPC are extended to the case of imperfect models. The approach is tested on unstable non-linear continuous control tasks with hard constraints. Results demonstrate that, when a neural network trained on short sequences is used for predictions, a one-step horizon Neural Lyapunov MPC can successfully reproduce the expert behaviour and significantly outperform longer horizon MPCs.

Download
Marco Gallieri, Seyed Sina Mirrazavi Salehian, Nihat Engin Toklu, Alessio Quaglino, Jonathan Masci, Jan Koutník, Faustino Gomez
Neural Information Processing Systems (NeurIPS) workshop on Safety and Robustness in Decision Making, 2019
Abstract

Control applications present hard operational constraints. A violation of these can result in unsafe behavior. This paper introduces Safe Interactive Model Based Learning (SiMBL), a framework to refine an existing controller and a system model while operating on the real environment. SiMBL is composed of the following trainable components: a Lyapunov function, which determines a safe set; a safe control policy; and a Bayesian RNN forward model. A min-max control framework, based on alternate minimisation and backpropagation through the forward model, is used for the offline computation of the controller and the safe set. Safety is formally verified a-posteriori with a probabilistic method that utilizes the Noise Contrastive Priors (NPC) idea to build a Bayesian RNN forward model with an additive state uncertainty estimate which is large outside the training data distribution. Iterative refinement of the model and the safe set is achieved thanks to a novel loss that conditions the uncertainty estimates of the new model to be close to the current one. The learned safe set and model can also be used for safe exploration, i.e., to collect data within the safe invariant set, for which a simple one-step MPC is proposed. The single components are tested on the simulation of an inverted pendulum with limited torque and stability region, showing that iteratively adding more data can improve the model, the controller and the size of the safe region.

Download
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber
NeurIPS Deep Reinforcement Learning Workshop, 2019
Abstract

Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), that solves RL problems primarily using supervised learning techniques. Many of its main principles are outlined in a companion report [34]. Here we present the first concrete implementation of UDRL and demonstrate its feasibility on certain episodic learning problems. Experimental results show that its performance can be surprisingly competitive with, and even exceed that of traditional baseline algorithms developed over decades of research.

Download
Jürgen Schmidhuber
arXiv, December 2019
Abstract

We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input observations as commands, mapping them to actions (or action probabilities) through SL on past (possibly accidental) experience. UDRL generalizes to achieve high rewards or other goals, through input commands such as: get lots of reward within at most so much time! A separate paper [61] on first experiments with UDRL shows that even a pilot version of UDRL can outperform traditional baseline algorithms on certain challenging RL problems. We also introduce a related simple but general approach for teaching a robot to imitate humans. First videotape humans imitating the robot’s current behaviors, then let the robot learn through SL to map the videos (as input commands) to these behaviors, then let it generalize and imitate videos of humans executing previously unknown behavior. This Imitate-Imitator concept may actually explain why biological evolution has resulted in parents who imitate the babbling of their babies.

Download
Saeed Saremi
arXiv, October 2019
Abstract

Consider a feedforward neural network ψ:dd such that ψf, where f:d is a smooth function, therefore ψ must satisfy jψi=iψj pointwise. We prove a theorem that a ψ network with more than one hidden layer can only represent one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward, where two backward paths and a weight-tying matrix play the key roles. We then present the alternative, the implicit parametrization, where the neural network is ϕ:d and ϕf; in addition, a “soft analysis” of ϕ gives a dual perspective on the theorem. Throughout, we come back to recent probabilistic models that are formulated as ϕf, and conclude with a critique of denoising autoencoders.

Download
Marek Wydmuch, Michał Kempka, Wojciech Jaśkowski
IEEE Transactions on Games 2019, arXiv September 2018
Abstract

This paper presents the first two editions of Visual Doom AI Competition, held in 2016 and 2017. The challenge was to create bots that compete in a multiplayer deathmatch in a first-person shooter game Doom. The bots had to make their decisions solely based on visual information, i.e., a raw screen buffer. To play well, the bots needed to understand their surroundings, navigate, explore, and handle the opponents at the same time. These aspects, together with the competitive multiagent aspect of the game, make the competition a unique platform for evaluating the state-of-the-art reinforcement learning algorithms. This paper discusses the rules, solutions, results, and statistics that give insight into the agents’ behaviors. Best performing agents are described in more detail. The results of the competition lead to the conclusion that, although reinforcement learning can produce capable Doom bots, they still are not yet able to successfully compete against humans in this game. This paper also revisits the ViZDoom environment, which is a flexible, easy to use, and efficient three-dimensional platform for research for vision-based reinforcement learning, based on a well-recognized first-person perspective game Doom.

Download
Giorgio Giannone, Saeed Saremi, Jonathan Masci, Christian Osendorfer
NeurIPS Bayesian Deep Learning and PGR Workshops, 2019
Abstract

We extend the framework of variational autoencoders to represent transformations explicitly in the latent space. In the family of hierarchical graphical models that emerges, the latent space is populated by higher order objects that are inferred jointly with the latent representations they act on. To explicitly demonstrate the effect of these higher order objects, we show that the inferred latent transformations reflect interpretable properties in the observation space. Furthermore, the model is structured in such a way that in the absence of transformations, we can run inference and obtain generative capabilities comparable with standard variational autoencoders. Finally, utilizing the trained encoder, we outperform the baselines by a wide margin on a challenging out-of-distribution classification task.

Download
Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer
NeurIPS Bayesian Deep Learning Workshop 2019
Abstract

We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation, and favorable runtime with respect to Gaussian Processes (GPs). We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability.

Download
A. Quaglino, M. Gallieri, J. Masci and J. Koutník
arXiv, June 2019
Abstract

This paper proposes the use of spectral element methods for fast and accurate training of Neural Ordinary Differential Equations (ODE-Nets). This is achieved by expressing their dynamics as truncated series of Legendre polynomials. The series coefficients, as well as the network weights, are computed by minimizing the weighted sum of the loss function and the violation of the ODE-Net dynamics. The problem is solved by coordinate descent that alternately minimizes, with respect to the coefficients and the weights, two unconstrained sub-problems using standard backpropagation and gradient methods. The resulting optimization scheme is fully time-parallel and results in a low memory footprint. Experimental comparison to standard methods, such as backpropagation through explicit solvers and the adjoint technique, on training surrogate models of small and medium-scale dynamical systems shows that it is at least one order of magnitude faster at reaching a comparable value of the loss function. The corresponding testing MSE is one order of magnitude smaller as well, suggesting generalization capabilities increase.

Download
T. Willi, J. Masci, J. Schmidhuber and C. Osendorfer
arXiv, June 2019
Abstract

We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs can learn dynamical patterns from sequential data and deal with non-stationarity. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional independence among subsequences of the time series. Our theoretically grounded framework for stochastic processes expands the applicability of NPs while retaining their benefits of flexibility, uncertainty estimation and favourable runtime with respect to Gaussian Processes. We demonstrate that state spaces learned by RNPs benefit predictive performance on real-world time-series data and nonlinear system identification, even in the case of limited data availability.

Download
J. Svoboda, A. Anoosheh, C. Osendorfer and J. Masci
arXiv, June 2019
Abstract

This paper introduces a neural style transfer model to conditionally generate a stylized image using only a set of examples describing the desired style. The proposed solution produces high-quality images even in the zero-shot setting and allows for greater freedom in changing the content geometry. This is thanks to the introduction of a novel Peer-Regularization Layer that recomposes style in latent space by means of a custom graph convolutional layer aiming at separating style and content. Contrary to the vast majority of existing solutions our model does not require any pre-trained network for computing perceptual losses and can be trained fully end-to-end with a new set of cyclic losses that operate directly in latent space. An extensive ablation study confirms the usefulness of the proposed losses and of the Peer-Regularization Layer, with qualitative results that are competitive with respect to the current state-of-the-art even in the challenging zero-shot setting. This opens the door to more abstract and artistic neural image generation scenarios and easier deployment of the model in production.

Download
J. E. Lenssen, C. Osendorfer, and J. Masci
arXiv, April 2019
Abstract

This paper presents an end-to-end differentiable algorithm for anisotropic surface normal estimation on unstructured point-clouds. We utilize graph neural networks to iteratively infer point weights for a plane fitting algorithm applied to local neighborhoods. The approach retains the interpretability and efficiency of traditional sequential plane fitting while benefiting from a data-dependent deep-learning parameterization. This results in a state-of-the-art surface normal estimator that is robust to noise, outliers and point density variation and that preserves sharp features through anisotropic kernels and a local spatial transformer. Contrary to previous deep learning methods, the proposed approach does not require any hand-crafted features while being faster and more parameter efficient.

Download
P. Shyam, W. Jaśkowski, and F. Gomez
International Conference on Machine Learning (ICML), 2019
Abstract

Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.

Download
L. Kidziński et al. (co-authored by all challenge participants)
arXiv, February 2019
Abstract

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.

Download
W. Byeon, Q. Wang, R. K. Srivastava, and P. Koumoutsakos
European Conference on Computer Vision (ECCV), 2018
Abstract

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry pre- dictions. We identify an important contributing factor for imprecise pre- dictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context- aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it us- ing blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction on three challenging real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, sepa- ration of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.

Download
J. Svoboda, J. Masci, F. Monti, M.M. Bronstein, and L. Guibas
International Conference on Representation Learning (ICLR), 2018
Abstract

Deep learning systems have become ubiquitous in many aspects of our lives. Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. This results in a form of non-local forward propagation in the model, where latent features are conditioned on the global structure induced by the graph, that is up to 3 times more robust to a variety of white- and black-box adversarial attacks compared to conventional architectures with almost no drop in accuracy.

Download
A. Zeyer, K. Irie, R. Schlüter, H. Ney
Interspeech, 2018
Abstract

Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help the convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model. Index Terms: attention, end-to-end, speech recognition.

Download
F. Lattari, M. Ciccone, M. Matteucci, J. Masci, and F. Visin
2018 DAVIS Challenge on Video Object Segmentation – IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Abstract

We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that is able to fast adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task for supervised approaches that would need to be retrained. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features self-adapting to the object of interest via conditional affine transformations. This approach is simple, can be trained end-to-end and does not necessarily require extra training steps at inference time. Our method shows competitive results on DAVIS2016 with respect to state-of-the art approaches that use online fine-tuning, and outperforms them on DAVIS2017. ReConvNet shows also promising results on the DAVIS-Challenge 2018 winning the 10-th position.

Download
D. Ha and J. Schmidhuber
Neural Information Processing Systems (NeurIPS), 2018
Abstract

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model’s extracted features are fed into compact and simple policies trained by evolution, achieving state of the art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. Interactive version of this paper is available at https://worldmodels.github.io

Download
M. Ciccone, M. Gallieri, J. Masci, C. Osendorfer, and F. Gomez
Neural Information Processing Systems (NeurIPS), 2018
Abstract

This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReL units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.

Download
W. Jaśkowski, O. R. Lykkebø, N. E. Toklu, F. Trifterer, Z. Buk, J. Koutník and F. Gomez
The NIPS ’17 Competition: Building Intelligent Systems (First Place), 2017
Abstract

This paper describes the approach taken by the NNAISENSE Intelligent Automation team to win the NIPS ’17 “Learning to Run” challenge involving a biomechanically realistic model of the human lower musculoskeletal system.

Download
F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M. M. Bronstein
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Abstract

Deep learning has achieved a remarkable performance breakthrough in several fields, most notably in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures currently produce state-of-the-art performance on a variety of image analysis tasks such as object detection and recognition. Most of deep learning research has so far focused on dealing with 1D, 2D, or 3D Euclidean-structured data such as acoustic signals, images, or videos. Recently, there has been an increasing interest in geometric deep learning, attempting to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications from the domains of network analysis, computational social science, or computer graphics. In this paper, we propose a unified framework allowing to generalize CNN architectures to non-Euclidean domains (graphs and manifolds) and learn local, stationary, and compositional task-specific features. We show that various non-Euclidean CNN methods previously proposed in the literature can be considered as particular instances of our framework. We test the proposed method on standard tasks from the realms of image-, graph- and 3D shape analysis and show that it consistently outperforms previous approaches.

Download
J. G. Zilly, R. K. Srivastava, J. Koutník and J. Schmidhuber
International Conference on Machine Learning (ICML), 2017
Abstract

Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with “deep” transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Gersgorin’s circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis we propose Recurrent Highway Networks, which extend the LSTM architecture to allow step-to-step transition depths larger than one. Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character.

Download

Tell us what you do

Lugano
Piazza Molino Nuovo 17
6900, Switzerland
Austin
1224 East 12th St., suite 313 Texas, 78702, USA
trusted by innovators