Learning-Based Object Grasping - Lecture 06
Hochschule Bonn-Rhein-Sieg
2023
Dr. Alex Mitrevski
Summary
This lecture gives an overview of learning-based object grasping for robotic arms. It covers a primer on object grasping and grasp synthesis, the factors that affect synthesis, the different learning paradigms and data sources, and two concrete frameworks (Dex-Net and CAGE). The presentation is part of a winter semester 2023/24 course at Bonn-Rhein-Sieg University of Applied Sciences.
Full Transcript
Learning-Based Object Grasping: An Overview
Dr. Alex Mitrevski
Master of Autonomous Systems
Winter semester 2023/24

Structure
▶ Object grasping primer
▶ Learning-based grasping
▶ A closer look at concrete learning-based grasping frameworks

Object Grasping Primer

What is Object Grasping?
▶ Informally, object grasping is the problem of picking up an object with a robotic hand
▶ The problem can be defined as that of finding an end-effector pose $p_G^* = (t_G^*, R_G^*)$ that ensures a stable object grasp (the grasped object will remain in the gripper)
▶ Here, $t_G^*$ is the grasp center point and $R_G^*$ is the gripper’s (wrist) orientation
▶ Often, a grasp is additionally parameterised by an approach vector $a$
▶ More generally, the problem is that of determining positions and applied forces for the individual gripper fingers; let us denote such a grasp candidate by $C^*$

Grasp Synthesis
▶ Grasp synthesis is an optimisation process that generates grasp candidates which satisfy a set of desired grasp quality metrics
▶ Formally, a grasp synthesis procedure takes an object model $O$ and a gripper description $Q$ (consisting of finger joint positions and forces/torques) and generates a grasp candidate $C^*$ that optimises the quality metrics
▶ Grasp synthesis procedures are often sampling-based: $n$ grasp candidates $C_i$, $1 \le i \le n$, are generated and scored based on the desired quality metrics; the candidate $C^*$ with the highest score is then selected for grasping

Grasp Properties
Grasp quality properties:
▶ Dexterity: the finger configuration should satisfy one or more dexterity measures, such as avoiding singularities
▶ Equilibrium: the forces and moments acting on the grasped object must add up to 0
▶ Stability: the grasped object must return to equilibrium once any external disturbances are removed
▶ Dynamic behaviour: the behaviour of the grasped object as a result of applied fingertip forces should follow a desired dynamic profile

Grasp Quality Metrics
▶ There are a variety of quality metrics that can be used to evaluate grasps
▶ Such metrics are typically used in the evaluation of grasp candidates proposed by grasp synthesis procedures

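The sampling-based synthesis procedure from the primer (generate n candidates, score them with a quality metric, select the best one) can be sketched in a few lines of Python. This is only a planar toy under stated assumptions: the `GraspCandidate` class, the `toy_quality` metric (which simply prefers grasp centers close to the object centroid), and all parameter values are hypothetical stand-ins, not metrics or models from the lecture.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspCandidate:
    """Planar stand-in for a grasp pose (t_G, R_G): center point and wrist angle."""
    x: float
    y: float
    theta: float  # wrist orientation in radians


def toy_quality(c: GraspCandidate, centroid: tuple[float, float]) -> float:
    """Hypothetical quality metric: higher when the grasp center is near the centroid."""
    return -math.hypot(c.x - centroid[0], c.y - centroid[1])


def synthesise_grasp(centroid, n=200, extent=0.1, seed=0):
    """Sample n candidates C_i around the object, score them, return the best one."""
    rng = random.Random(seed)
    candidates = [
        GraspCandidate(
            x=centroid[0] + rng.uniform(-extent, extent),
            y=centroid[1] + rng.uniform(-extent, extent),
            theta=rng.uniform(-math.pi / 2, math.pi / 2),
        )
        for _ in range(n)
    ]
    # Select the candidate C* with the highest score.
    return max(candidates, key=lambda c: toy_quality(c, centroid))


best = synthesise_grasp(centroid=(0.5, 0.2))
print(best)
```

In a real pipeline the scoring function would be one of the grasp quality metrics mentioned above, and its cost is what makes analytical synthesis slow when many candidates are evaluated.
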
Factors Affecting Grasp Synthesis
▶ In general, there are various factors that influence how grasp synthesis is performed
▶ Knowledge of an object (or lack thereof) is one such factor: if an object is known, that knowledge can be exploited in the synthesis
▶ Synthesis also depends on the input modality used to identify objects and the types of object features that are utilised by the candidate generation procedure
▶ The type of robotic hand for which grasps are generated also influences how grasp candidates are generated
▶ Task information can also be useful to incorporate, since it can constrain the set of valid grasp candidates

Challenges With Analytical Grasp Synthesis
▶ Reliance on object models: most analytical methods rely on given (geometric and physical) object models; this makes it difficult to use them for unknown objects
▶ Reliance on simulations: grasp quality metrics are often evaluated in simulation, but simulated metrics do not necessarily translate well to the real world
▶ Slow synthesis: the evaluation of grasp quality metrics can be computationally expensive, which contributes to a slow synthesis process
▶ Inability to use prior experiences: analytical approaches are unable to use prior experiences to guide the synthesis process; every grasping instance is treated independently

Learning-Based Grasping

Learning for Object Grasping
▶ The objective of learning for grasping is to replace, partially or completely, analytical grasping methods
▶ There are various ways in which this can be performed, depending on what the desired learning outcome is
▶ Learning outcomes: grasp candidate classifier, grasp synthesis model, grasping policy

Grasp Candidate Classification
▶ One way in which learning can be applied in the context of grasping is to train a model that can be used to score grasp candidates based on their quality
▶ In this case, a labelled dataset is needed that contains ground-truth metrics as labels
▶ During online application, a set of grasp candidates needs to be generated, all of which can be scored by the learned model
▶ This strategy does not fully replace the analytical grasp pipeline (analytical methods can still be used during grasp generation), but it performs the grasp evaluation with a learned model
▶ Analytical methods are sometimes used for generating the data labels

A Multitude of Learning-Based Grasp Evaluation Methods
▶ Grasp quality evaluation can be performed based on various input modalities and using different evaluation models
▶ An overview of some recent methods is given in the table on the right

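To make the grasp candidate classification idea concrete, the sketch below trains a small scoring model on labelled data and then uses it online to rank candidates. It is a minimal illustration under stated assumptions, not a method from the lecture: the model is plain logistic regression trained by batch gradient descent, each candidate is described by two made-up features normalised to [0, 1], and the labels come from a hypothetical rule (`x[0] + x[1] < 1.0`) standing in for an analytically computed ground-truth metric.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_scorer(features, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights (w, b) by batch gradient descent."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(candidate, w, b):
    """Estimated probability that a candidate is a good grasp."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, candidate)) + b)


# Offline: labelled dataset (labels from a hypothetical analytical rule).
rng = random.Random(0)
train_x = [(rng.random(), rng.random()) for _ in range(200)]
train_y = [1 if x[0] + x[1] < 1.0 else 0 for x in train_x]
w, b = train_scorer(train_x, train_y)

# Online: candidates come from some generator and are scored by the learned model.
candidates = [(0.9, 0.8), (0.1, 0.2), (0.5, 0.6)]
best = max(candidates, key=lambda c: score(c, w, b))
print(best, round(score(best, w, b), 3))
```

The split mirrors the strategy described above: candidate generation stays outside the learned model, and only the evaluation step is replaced.
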
Learning-Based Grasp Synthesis
▶ Another way to apply learning for grasping is to learn a model that can be used to identify suitable grasping candidates
▶ In this case, learning is typically done by mapping visual (or multimodal) object features to a grasp quality estimate
▶ The online application of such a model involves feature estimation and then mapping those features through the learned model so that grasp candidates are found
▶ The use of analytical grasp quality metrics is less common in this case; labels are typically simpler and denote whether a point represents a valid grasp or not
A. Saxena, J. Driemeyer, and A. Y. Ng, “Robotic grasping of novel objects using vision,” International Journal of Robotics Research, vol. 27, no. 2, pp. 157–173, 2008.

Grasp Policies
▶ The previous two techniques generate grasp candidates or evaluate candidates, but grasping can also be performed using a learned policy, without explicit grasp hypotheses
▶ In this case, a visuomotor policy for object grasping is learned
▶ During online application, the policy can be directly applied without extracting explicit grasp candidates
▶ Such a policy can be trained with a sparse reward (reward only on successful grasping) or with a shaped reward (where the shaping could potentially use analytical grasping metrics)

Techniques Used in Learning-Based Grasping
▶ The previously mentioned learning problems can be solved using any of the usual learning paradigms
▶ Supervised learning appears to be the predominant learning paradigm in the literature
▶ A variety of supervised learning techniques are used in particular for feature extraction, for instance PointNet++ and DGCNN for extracting features from point cloud data
▶ Existing databases of 3D object models, such as ShapeNet, as well as point cloud datasets, such as Semantic3D, have been applied in this context

Learning Data Sources
▶ Data sources: learning from demonstration, learning by trial and error, learning from labelled data

Learning from Labelled Data
▶ Supervised learning is the most common strategy applied in the context of grasping
▶ A variety of models can be learned here, for instance:
  ▶ A grasp candidate evaluation model
  ▶ A database of grasp samples
  ▶ A grasp sampling model
▶ Since obtaining enough labelled data for learning can be challenging, synthetic data are sometimes used for learning

Learning from Demonstrations
▶ Demonstrations represent another potential data source for grasp learning
▶ The objective in this case is to obtain a small set of successful grasps that can be used directly for execution or for further grasp model learning
▶ Demonstrations can be used as an independent data source or can supplement data used for supervised learning

Learning by Trial and Error
▶ If neither labelled data nor human demonstrations are available, a robot can collect its own data for learning grasping models
▶ The nature of the collected data would differ depending on whether a grasp synthesis model or a grasp policy is learned
▶ If the collected data includes a mapping between object identities and attempted grasps, efficient object-specific grasping models can be acquired

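Learning by trial and error with a sparse reward can be illustrated with a very small simulation. The sketch below is an assumption-laden toy rather than a technique named in the lecture: it uses an epsilon-greedy multi-armed bandit over four discretised gripper angles, keeps separate statistics per object identity, and replaces real grasp attempts with a hypothetical table of success probabilities (`TRUE_SUCCESS`).

```python
import random
from collections import defaultdict

ANGLES_DEG = [0, 45, 90, 135]  # discretised gripper orientations (assumption)

# Hypothetical simulated success probabilities standing in for real grasp attempts.
TRUE_SUCCESS = {
    "mug": {0: 0.2, 45: 0.3, 90: 0.9, 135: 0.3},
    "box": {0: 0.9, 45: 0.2, 90: 0.3, 135: 0.2},
}


def attempt_grasp(obj: str, angle: int, rng: random.Random) -> int:
    """Sparse reward: 1 only if the (simulated) grasp succeeds, else 0."""
    return 1 if rng.random() < TRUE_SUCCESS[obj][angle] else 0


def learn_by_trial_and_error(trials=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    successes = defaultdict(int)  # (object, angle) -> successful grasps
    attempts = defaultdict(int)   # (object, angle) -> attempted grasps

    def success_rate(obj, angle):
        n = attempts[(obj, angle)]
        return successes[(obj, angle)] / n if n else 0.0

    for _ in range(trials):
        obj = rng.choice(list(TRUE_SUCCESS))
        if rng.random() < epsilon:
            angle = rng.choice(ANGLES_DEG)  # explore a random angle
        else:
            angle = max(ANGLES_DEG, key=lambda a: success_rate(obj, a))  # exploit
        attempts[(obj, angle)] += 1
        successes[(obj, angle)] += attempt_grasp(obj, angle, rng)

    # Object-specific grasp model: best observed angle per object identity.
    return {obj: max(ANGLES_DEG, key=lambda a: success_rate(obj, a))
            for obj in TRUE_SUCCESS}


learned = learn_by_trial_and_error()
print(learned)
```

Because the statistics are keyed by object identity, the collected data directly yields an object-specific model; a shaped reward would replace the 0/1 return of `attempt_grasp` with a graded value.
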
A Closer Look at Concrete Learning-Based Grasping Frameworks

Dexterity Network (Dex-Net)
▶ Dex-Net is a convolutional neural network-based grasp quality evaluation model
▶ The earlier Dex-Net versions are defined for parallel-jaw grippers; the latest version can also be used with suction grippers
▶ Dex-Net’s parallel-jaw grasps are parameterised by:
  ▶ pixel coordinates (determine a grasp position considering a top-down object view)
  ▶ gripper depth (grasping height)
  ▶ gripper orientation
J. Mahler et al., “Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics,” in Proc. Robotics: Science and Systems (RSS), 2017.
J. Mahler et al., “Learning ambidextrous robot grasping policies,” Science Robotics, vol. 4, no. 26, p. eaau4984, 2019.

Grasp Quality Convolutional Neural Network (GQ-CNN)
▶ The main element behind Dex-Net is a Grasp Quality Convolutional Neural Network (GQ-CNN)
▶ The network takes as input a grasp candidate represented as an aligned depth image and the gripper depth, and outputs an estimate of the grasp success probability
▶ An important aspect of GQ-CNN is that it is trained using synthetically generated data
▶ The synthetic dataset is created using 3D object models
▶ Grasp candidates for training are generated by sampling latent variables from a graphical model
▶ For grasp samples in the training data, a success metric (probability of force closure) is evaluated in order to generate a label for the candidate
J. Mahler et al., “Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics,” in Proc. Robotics: Science and Systems (RSS), 2017.

Context-Aware Grasping
▶ The CAGE model is a deep neural network that evaluates grasp candidates, calculating a likelihood that the candidate would lead to a successful grasp
▶ The model uses information about the task context, which combines multiple elements:
  ▶ semantic task information (one-hot task and object state encodings)
  ▶ affordance estimation for a grasp candidate point
  ▶ point material information
▶ The network is based on a wide-and-deep architecture in which:
  ▶ the wide component processes the task context
  ▶ the deep component combines the task context, an object embedding, and optional additional features
▶ Training is done with a negative log-likelihood loss
W. Liu, A. Daruna and S. Chernova, “CAGE: Context-Aware Grasping Engine,” in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), 2020, pp. 2550-2556.

Next Lecture: Sim-to-Real Transfer

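As a rough illustration of the wide-and-deep scoring and the negative log-likelihood loss described for CAGE, the sketch below runs one forward pass and evaluates the loss for a single candidate. It is not the authors' implementation: the layer sizes, the hand-picked weights in `params`, the three-element context vector, and the two-dimensional object embedding are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def wide_and_deep_score(context, obj_embedding, params):
    """Return an estimated grasp success probability for one candidate."""
    # Wide part: a single linear layer applied directly to the task context.
    wide_logit = dot(params["wide_w"], context)
    # Deep part: one ReLU hidden layer over task context + object embedding.
    deep_in = list(context) + list(obj_embedding)
    hidden = [max(0.0, dot(row, deep_in)) for row in params["deep_w1"]]
    deep_logit = dot(params["deep_w2"], hidden)
    # Both parts are summed before the sigmoid.
    return sigmoid(wide_logit + deep_logit + params["bias"])


def nll_loss(p: float, label: int) -> float:
    """Negative log-likelihood of a binary success label under probability p."""
    return -math.log(p if label == 1 else 1.0 - p)


# Hand-picked illustrative weights; in practice they would be learned.
params = {
    "wide_w": [0.5, -0.5, 0.2],
    "deep_w1": [[0.1, 0.0, 0.3, 0.4, -0.2], [-0.3, 0.2, 0.0, 0.1, 0.5]],
    "deep_w2": [0.7, -0.4],
    "bias": 0.0,
}
context = [1.0, 0.0, 1.0]    # assumed one-hot style task / object-state encoding
obj_embedding = [0.2, -0.1]  # assumed 2-D object embedding

p = wide_and_deep_score(context, obj_embedding, params)
print(round(p, 3), round(nll_loss(p, 1), 3))
```

Training would adjust `params` to minimise the summed `nll_loss` over labelled grasp candidates; the wide path lets the context influence the score directly, while the deep path models its interaction with the object embedding.
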