REINFORCEMENT LEARNING IN ROBOTICS

Author (Year)

  1. INTRODUCTION

Recently, endowing robots with the ability to solve unknown tasks by autonomously interacting with the environment, rather than relying entirely on hand-coded programs, has become a significant area of research. One of the most feasible technical routes to such abilities is reinforcement learning (RL). RL is a process of learning through trial and error, in which an agent explores both the environment and the robot's own body.

The primary motivation for using RL to endow robots with new skills is that it lets them learn tasks that are hard to demonstrate physically or to program directly, such as jumping, lifting heavy weights, or moving very fast. RL also enables goal optimization, especially in hard problems with no analytical formulation or known closed-form solution. This is achieved by manipulating cost variables, for example minimizing the energy used to perform a task or finding the fastest gait. Through RL, learning agents can adapt a skill to a new, previously unseen version of a task, such as transferring a walking gait from flat ground to a slope, or generalizing a task to previously unseen parameter values. RL can also handle control problems that are difficult for conventional control methods, since the control goal can be specified indirectly as a term in a reward function rather than explicitly as the result of a control action.
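
To make this concrete, the sketch below shows how such a control goal can be encoded purely as reward terms rather than as an explicit control law. The state fields and weights are illustrative assumptions, not taken from any particular system.

```python
import numpy as np

def gait_reward(forward_velocity, joint_torques, fell_over,
                w_speed=1.0, w_energy=0.005, fall_penalty=100.0):
    """Illustrative reward for learning a fast, energy-efficient gait.

    The control goal (move fast, spend little energy, stay upright) is
    expressed only through reward terms, never as an explicit control law.
    All weights are hypothetical tuning constants.
    """
    energy_cost = np.sum(np.square(joint_torques))  # penalize actuator effort
    reward = w_speed * forward_velocity - w_energy * energy_cost
    if fell_over:
        reward -= fall_penalty  # terminal penalty discourages unstable gaits
    return reward

# Example: a step moving at 1.2 m/s with moderate torques
print(gait_reward(1.2, np.array([2.0, -1.5, 0.8]), fell_over=False))
```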

RL has already demonstrated human-level control; for instance, AlphaGo defeated one of the world's strongest Go players in 2016. One of the most distinctive features of RL is tabula rasa learning, that is, learning without any prior knowledge of the task. This very feature, however, hinders the widespread practical adoption of RL, since learning from scratch is time-consuming and expensive. Providing learning agents with expert knowledge about the task is therefore an active area of study, even though it appears to conflict with RL's tabula rasa nature. Expert knowledge can be presented in different forms, including symmetry in the task's state-action space and shaped rewards. When such knowledge is acquired in a different task, its reuse is referred to as transfer learning.

While all these aspects are enablers of autonomous robotic systems, several challenges make RL in the real world more difficult than RL in research settings, and these challenges form the basis of the proposed paper. One of the main challenges facing RL is sim-to-real transfer, in which policies are trained in a simulated environment before being deployed on a physical robot.

  2. PROBLEM STATEMENT

Over the last few years, researchers have demonstrated that their RL policies can operate on physical robot platforms. In most cases, however, this is limited to small tasks that do not require extensive manual adjustment to achieve the set goals. For robots that use visual sensing in particular, it is challenging to realistically simulate the images the robot is likely to encounter, which creates a sim-to-real gap. Several researchers have managed to reduce this gap using visual domain adaptation [1, 4], but such methods require a large number of unlabeled images from the real world, which can be very costly to collect.

Another method that has been used to bridge the sim-to-real gap is domain randomization [2]. This method trains robots on widely varying sensory inputs, forcing the input-processing layers of the network to extract semantically relevant features in a way that is agnostic to superficial properties of the image, such as its textures or the way shadows are cast from a constant light source. In essence, the network then extracts the same information from real-world images, which simply present another variation of the input. However, applying this randomization to the input of a learning algorithm can make the task harder than necessary, since the algorithm has to model arbitrary changes in the visual domain while simultaneously learning the dynamics of the task. Research into domain randomization [3, 5] also indicates that some RL algorithms, such as DDPG and A3C, can be destabilized by such randomization during training.
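
The following minimal sketch illustrates one domain-randomization draw before rendering a training image. The simulator handle and its methods (`sim.objects()`, `set_texture`, `jitter_pose`, and so on) are hypothetical placeholders for whatever API a real simulator exposes, not a real library.

```python
import random

# Hypothetical simulator handle; the method names below are illustrative
# assumptions, not a real simulator API.
def randomize_scene(sim, texture_pool, rng=random.Random(0)):
    """Apply one domain-randomization draw before rendering a training image.

    The goal is that input layers learn features invariant to textures,
    lighting, and camera pose, so real images act as just another variation.
    """
    for obj in sim.objects():                      # tray, floor, arm segments, ...
        obj.set_texture(rng.choice(texture_pool))  # random surface appearance
    sim.light.set_direction((rng.uniform(-1, 1),
                             rng.uniform(-1, 1),
                             -1.0))                # random light direction
    sim.light.set_color((rng.uniform(0.5, 1.0),) * 3)
    sim.camera.jitter_pose(max_translation=0.05,   # small random camera shifts
                           max_rotation_deg=5.0)
    return sim.render()                            # one randomized observation
```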

In [1], the authors present a multi-task learning architecture consisting of a single shared network with a global feature pool and a soft-attention module that enables learning of task-specific, feature-level attention. The proposed paper targets vision-based, closed-loop robotic grasping, where a robotic arm must pick up different unseen objects using simulation and as little real-world data as possible. Despite being an important area of application in robotics, grasping is an exceptionally challenging problem, since the grasping system is expected to pick up previously unseen objects successfully. This cannot be achieved by merely memorizing grasps that work well for a given instance; it requires generalization and extrapolation from an innate understanding of geometry and physics. The problem is made even more difficult by the fact that the system must also handle domain shift in the distribution of the objects themselves.
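
As a rough illustration of the soft-attention idea in [1], the sketch below gates a shared feature pool with a learned mask; the layer sizes and the exact mask network are assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Minimal sketch of per-task soft attention over a shared feature pool,
    in the spirit of [1]; layer sizes are illustrative assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        # A small sub-network produces a (0, 1) mask per channel and location.
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, shared_features: torch.Tensor) -> torch.Tensor:
        # Element-wise gating selects a task-specific subset of the features.
        return shared_features * self.mask(shared_features)

shared = torch.randn(2, 64, 32, 32)      # features from the shared backbone
grasp_feats = TaskAttention(64)(shared)  # task-specific view of the pool
print(grasp_feats.shape)                 # torch.Size([2, 64, 32, 32])
```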

To address these challenges in vision-based closed-loop grasping, the proposed paper suggests using a Randomized-to-Canonical Adaptation Network (RCAN) to cross the reality gap by translating real-world images into their equivalent simulated versions without using labeled real-world data. This is achieved by exploiting domain randomization in a unique way: adapting from a heavily randomized scene to an equivalent non-randomized, canonical version. This method allows a robotic grasping algorithm to be trained in a pre-defined canonical version of the simulator; at deployment, the RCAN model converts real-world images into the canonical domain on which the grasping algorithm was trained. With the aim of nearly doubling performance, the RCAN model will be used together with a grasping algorithm based on QT-Opt, an RL algorithm built on deep Q-learning [6].
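
The deployment path this implies is simple: real images are translated into the canonical style before they ever reach the policy, which is never retrained on real data. The sketch below assumes both networks are already trained; the names are hypothetical.

```python
import torch

def act_in_real_world(rcan_generator, qtopt_policy, real_image):
    """Sketch of the deployment path described above: real pixels are first
    translated into the canonical simulation style, then fed to a policy
    trained entirely in that canonical simulator. `rcan_generator` and
    `qtopt_policy` are assumed to be pretrained networks (hypothetical here).
    """
    with torch.no_grad():
        canonical_image = rcan_generator(real_image)  # real -> canonical sim
        action = qtopt_policy(canonical_image)        # policy never sees real pixels
    return action
```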

  3. RELATED WORK

Robotic grasping is a well-explored area of study; it has traditionally been solved analytically, using 3D meshes of objects to compute the stability of a grasp against external wrenches [7, 8, 9]. Such approaches assume that similar objects will be encountered at test time, so that point clouds of the test objects can be matched. In their work, the authors of [7] developed a method for predicting a grasp function, that is, a grasp-quality score over all candidate grasp poses. They generated synthetic training data using physics simulation and depth-image simulation, and trained a CNN to map a depth image onto this grasp function.
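
As a loose illustration of the kind of mapping described in [7], the sketch below passes a depth image through a small CNN that outputs a dense grasp-quality map; the architecture here is an assumption, not the one used by the authors.

```python
import torch
import torch.nn as nn

class GraspFunctionCNN(nn.Module):
    """Sketch of mapping a depth image to a dense grasp-quality map, in the
    spirit of [7]; the exact architecture is an illustrative assumption."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one quality logit per location
        )

    def forward(self, depth_image: torch.Tensor) -> torch.Tensor:
        return self.net(depth_image)  # (B, 1, H, W) grasp-quality scores

depth = torch.randn(1, 1, 64, 64)        # synthetic depth image, as in sim
scores = GraspFunctionCNN()(depth)
best = scores.flatten(1).argmax(dim=1)   # pick the highest-scoring grasp pose
```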

Simulation-to-real transfer involves learning skills in simulation before transferring them to the real world, thereby reducing the need for expensive real-world data collection. However, transferring such skills is not as simple as it sounds, owing to the visual and dynamic differences between the simulated and real worlds [6]. In [10], Saxena et al. investigated transferring skills from simulation to robotic manipulation, using rendered objects to learn a vision-based grasping model. In another study, Rusu et al. introduced progressive neural networks to help adapt an existing deep RL policy, trained from pixels in simulation, to the real world for a reaching task [11].

Another standard tool that has been used in computer vision for years is data augmentation. In a bid to overcome over-fitting, Krizhevsky et al. used cropping, flipping, and photometric variations of input images when training AlexNet [12]. Others have explored randomized simulated environments specifically for sim-to-real transfer in grasping and similar manipulation tasks, using images with random textures, lighting, and camera positions. This helps ensure that the resulting algorithm is invariant to differences between the domains and can be used in the real world.
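
These three augmentation families are straightforward to reproduce; a minimal sketch using torchvision's standard transforms is shown below (the crop size and jitter magnitudes are arbitrary choices).

```python
from PIL import Image
import numpy as np
import torchvision.transforms as T

# The same three augmentation families used for AlexNet [12]: random crops,
# horizontal flips, and photometric (color) variation.
augment = T.Compose([
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
])

image = Image.fromarray(np.uint8(np.random.rand(256, 256, 3) * 255))
augmented = augment(image)  # a new random variant on every call
```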

Visual domain adaptation has also been used to allow a machine learning model trained with samples from a source domain to generalize to a target domain, using existing but unlabeled target data. Such adaptation can be categorized into feature-level adaptation and pixel-level adaptation. Feature-level adaptation learns domain-invariant features between the source and target domains, while pixel-level adaptation re-stylizes images from the source domain to make them look like images from the target domain. Pixel-level domain adaptation is closely related to image-to-image translation, which, unlike unpaired adaptation, learns re-stylization from matching pairs of examples from both domains, a more manageable task.
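
The practical difference shows up in the loss: with matched pairs, re-stylization reduces to direct per-pixel regression. A minimal sketch, assuming some image-to-image generator network (hypothetical here):

```python
import torch
import torch.nn.functional as F

def paired_translation_loss(generator, source_batch, target_batch):
    """Sketch of the supervised signal available to image-to-image translation:
    when source/target images come in matching pairs, re-stylization reduces
    to a direct regression loss. `generator` is any image->image network
    (hypothetical here); unpaired pixel-level adaptation must instead rely
    on adversarial objectives, because no such target pairing exists.
    """
    translated = generator(source_batch)
    return F.l1_loss(translated, target_batch)  # per-pixel supervision from pairs
```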

In their paper, James and Johns use deep Q-learning and 3D simulation to train a 7-DOF robotic arm on a control task in an end-to-end manner, without any prior knowledge [6]. Images of the virtual environment are rendered and passed through a network that produces motor actions. The authors used intermediate rewards to encourage exploration of different states of the 3D environment [6]; these intermediate rewards guide the policy toward higher-reward states. The paper provides a strong foundation that the proposed paper can build on for end-to-end, highly scalable robot learning through RL.
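
A tabular sketch of the underlying Q-learning update is given below, with the intermediate (shaped) reward entering through `r`; the tabular setting is a simplification of the deep Q-network used in [6].

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. Intermediate (shaped) rewards enter
    through `r`, pulling the policy toward higher-reward states long before
    the final task reward arrives. The tabular setting is a simplification
    of the deep Q-network used in [6].
    """
    td_target = r + gamma * np.max(Q[s_next])  # bootstrap from best next action
    Q[s, a] += alpha * (td_target - Q[s, a])   # move estimate toward target
    return Q

Q = np.zeros((10, 4))                          # 10 states, 4 motor actions
Q = q_update(Q, s=0, a=2, r=0.1, s_next=1)     # small intermediate reward
```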

  4. PROPOSED METHODOLOGY

The proposed method for addressing the sim-to-real gap is the Randomized-to-Canonical Adaptation Network (RCAN). The method uses an image-conditioned generative adversarial network to transform images from a randomized simulated environment into their equivalent canonical simulated versions. Once trained, the generator can also transform real-world images into images resembling those from the canonical simulation environment. This makes it possible to train an RL algorithm (QT-Opt) entirely in simulation and then use the generator to ensure that the trained policy works in real-world situations. In essence, the proposed method involves three main domains: the randomized simulation domain, the canonical simulation domain, and the real-world domain.

The expected final results will be achieved in several steps: RCAN data generation, which provides the paired examples needed to learn the translation; RCAN training, which learns to transform randomized sim images into canonical sim images with matching semantics; and real-world grasping with QT-Opt. The randomization process will apply randomly selected textures from a set of images to the tray, the graspable objects, the arm segments, and the floor. In addition, positions, directions, colors, and lighting will be randomized. Randomizing the position and size of the arm and tray increases the diversity of scene configurations beyond those encountered during normal robot operation in QT-Opt training. When creating each snapshot, the same transformation will be applied to both the canonical and the randomized scene, ensuring that their semantics still match.

Although a canonical environment can be defined in different ways, the proposed paper will apply uniform colors to the background, tray, and arm, while keeping the texture of each randomized object in place. This preserves the identity of the object and opens up possibilities for instance-specific grasping in future work. Each of the arm's links will be colored differently and independently to aid tracking. The light source will also be fixed in the canonical version, so the network must learn some geometric structure in order to re-render shadows with the correct shape and direction. The expected result of this study is to show that RCAN is superior to learning a grasping algorithm directly with domain randomization.
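
Because the randomized and canonical images render the same scene state, RCAN's generator can be trained with direct per-pixel supervision alongside an adversarial term. The sketch below illustrates one plausible form of such a loss; the loss weights and the exact combination are assumptions rather than a definitive formulation.

```python
import torch
import torch.nn.functional as F

def rcan_generator_loss(generator, discriminator, randomized_img, canonical_img,
                        lambda_pix=1.0, lambda_adv=0.1):
    """Sketch of the paired training signal described above. The randomized
    and canonical images depict the *same* scene state, so the generator gets
    direct per-pixel supervision plus an adversarial term. The loss weights
    are illustrative assumptions.
    """
    fake_canonical = generator(randomized_img)
    # Paired reconstruction: same scene, different appearance.
    pixel_loss = F.l1_loss(fake_canonical, canonical_img)
    # Adversarial term pushes outputs onto the canonical image manifold.
    logits = discriminator(fake_canonical)
    adv_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return lambda_pix * pixel_loss + lambda_adv * adv_loss
```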

  5. REFERENCES

[1] Liu, S., Johns, E. and Davison, A.J., 2019. End-to-end multi-task learning with attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1871-1880).

[2] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W. and Abbeel, P., 2017, September. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 23-30). IEEE.

[3] Zhang, J., Tai, L., Xiong, Y., Liu, M., Boedecker, J. and Burgard, W., 2019. VR-goggles for robots: Real-to-sim domain adaptation for visual control. IEEE Robotics and Automation Letters.

[4] Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J. and Quillen, D., 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, 37(4-5), pp.421-436.

[5] Matas, J., James, S. and Davison, A.J., 2018. Sim-to-real reinforcement learning for deformable object manipulation. arXiv preprint arXiv:1806.07851.

[6] James, S. and Johns, E., 2016. 3D simulation for robot arm control with deep Q-learning. arXiv preprint arXiv:1609.03759.

[7] Johns, E., Leutenegger, S. and Davison, A.J., 2016, October. Deep learning a grasp function for grasping under gripper pose uncertainty. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4461-4468). IEEE.

[8] Prattichizzo, D., Trinkle, J.C., Siciliano, B. and Khatib, O., 2008. Springer Handbook of Robotics.

[9] Rodriguez, A., Mason, M.T. and Ferry, S., 2012. From caging to grasping. The International Journal of Robotics Research, 31(7), pp.886-900.

[10] Saxena, A., Driemeyer, J. and Ng, A.Y., 2008. Robotic grasping of novel objects using vision. The International Journal of Robotics Research, 27(2), pp.157-173.

[11] Rusu, A.A., Vecerik, M., Rothörl, T., Heess, N., Pascanu, R. and Hadsell, R., 2016. Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286.

[12] Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
