(2017. 6) Noisy Network Exploration

  • Submitted on 2017. 6

  • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

Simple Summary

Introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ϵ-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games.

  • Optimism in the face of uncertainty is a common exploration heuristic in reinforcement learning.

  • A single change to the weight vector can induce a consistent, and potentially very complex, state-dependent change in policy over multiple time steps.

  • The variance of the perturbation is a parameter that can be considered as the energy of the injected noise. These variance parameters are learned using gradients from the reinforcement learning loss function, alongside the other parameters of the agent.
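
In the paper's notation, each noisy linear layer replaces the deterministic y = wx + b with

$$
y = (\mu^{w} + \sigma^{w} \odot \varepsilon^{w})\,x + \mu^{b} + \sigma^{b} \odot \varepsilon^{b}
$$

where the means μ and noise scales σ are learned by gradient descent, while the noise samples ε are resampled rather than trained.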

  • NoisyNet

    • a randomised value function whose functional form is a neural network, with weights and biases perturbed by a parametric function of the noise.

    • while such perturbation-based algorithms are quite generic and apply to any type of parametric policy (including neural networks), they are usually not data efficient and require a simulator to allow many policy evaluations.

  1. Independent Gaussian noise:

    • the noise applied to each weight and bias is independent; each noisy linear layer therefore carries pq + q noise variables (for p inputs to the layer and q outputs).

  2. Factorised Gaussian noise:

    • two noise vectors are sampled, one with the length of the layer input (p) and one with the length of the output (q). The function f(x) = sgn(x)·√|x| is applied element-wise to both vectors, and their outer product gives the noise matrix added to the weights, cutting the noise variables down to p + q (see the code sketch below).
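
A minimal PyTorch sketch of such a noisy linear layer with factorised Gaussian noise (the layer structure and the σ₀-based initialisation follow the paper's description; the class name NoisyLinear and all other details are illustrative, not the authors' code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learned factorised Gaussian noise (sketch)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable means (mu) and noise scales (sigma) for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers are resampled, never trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        self.reset_parameters(sigma0)
        self.reset_noise()

    def reset_parameters(self, sigma0):
        # Initialisation for factorised noise: mu ~ U[-1/sqrt(p), 1/sqrt(p)],
        # sigma = sigma0 / sqrt(p).
        bound = 1.0 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma0 / math.sqrt(self.in_features))
        self.bias_sigma.data.fill_(sigma0 / math.sqrt(self.in_features))

    @staticmethod
    def _scaled_noise(size):
        # f(x) = sgn(x) * sqrt(|x|), applied element-wise.
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Factorised noise: sample p + q values instead of pq + q.
        eps_in = self._scaled_noise(self.in_features)
        eps_out = self._scaled_noise(self.out_features)
        self.weight_eps.copy_(torch.outer(eps_out, eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x):
        # y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)
        weight = self.weight_mu + self.weight_sigma * self.weight_eps
        bias = self.bias_mu + self.bias_sigma * self.bias_eps
        return F.linear(x, weight, bias)
```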

  • Experiments

    • Deep Q-Networks (DQN) and Dueling.

    • Asynchronous Advantage Actor Critic (A3C).
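
In the DQN and Dueling variants, ϵ-greedy action selection is then replaced by acting greedily on the noisy network while the noise is resampled (e.g. before each action); a hedged usage sketch, assuming a q_network built from the NoisyLinear layers above:

```python
# Exploration sketch: no epsilon schedule; the stochastic policy
# comes from resampling the learned parametric noise.
for module in q_network.modules():
    if isinstance(module, NoisyLinear):
        module.reset_noise()
q_values = q_network(state)        # state: a batch of observations
action = q_values.argmax(dim=-1)   # greedy w.r.t. the noisy Q-values
```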
