Visual Pressure Estimation and Control for Soft Robotic Grippers
2022
Patrick Grady, Jeremy A. Collins, Samarth Brahmbhatt, Christopher D. Twigg, Chengcheng Tang, James Hays, Charles C. Kemp
Summary
This paper presents a visual pressure estimation and control method for soft robotic grippers. A neural network (VPEC-Net) infers the pressure a gripper applies to a flat surface from a single RGB image taken by an external camera. The authors demonstrate the method on precision manipulation tasks, such as grasping small, low-profile objects, using a Stretch RE1 mobile manipulator.
Full Transcript
Visual Pressure Estimation and Control for Soft Robotic Grippers

Patrick Grady1, Jeremy A. Collins1, Samarth Brahmbhatt2, Christopher D. Twigg3, Chengcheng Tang3, James Hays1, Charles C. Kemp1

arXiv:2204.07268v2 [cs.RO] 9 Aug 2022

Fig. 1. Left: Given a single RGB input image, VPEC-Net estimates the pressure applied by a soft gripper to a flat surface. Middle Left and Middle Right: Pressure images are shown overlaid on the input image. VPEC-Net outputs a pressure image with a pressure estimate for each pixel of the input image. The estimated pressure image closely matches the ground truth pressure image obtained with a planar pressure sensing array. Right: Using visual servoing with estimated pressure images, a robot grasps small objects, including a penny.

Abstract— Soft robotic grippers facilitate contact-rich manipulation, including robust grasping of varied objects. Yet the beneficial compliance of a soft gripper also results in significant deformation that can make precision manipulation challenging. We present visual pressure estimation & control (VPEC), a method that infers pressure applied by a soft gripper using an RGB image from an external camera. We provide results for visual pressure inference when a pneumatic gripper and a tendon-actuated gripper make contact with a flat surface. We also show that VPEC enables precision manipulation via closed-loop control of inferred pressure images. In our evaluation, a mobile manipulator (Stretch RE1 from Hello Robot) uses visual servoing to make contact at a desired pressure; follow a spatial pressure trajectory; and grasp small low-profile objects, including a microSD card, a penny, and a pill. Overall, our results show that visual estimates of applied pressure can enable a soft gripper to perform precision manipulation.

I. INTRODUCTION

High compliance helps soft robotic grippers conform to the environment and apply low forces during contact, but it also results in significant deformation that makes precise motions more difficult to achieve. For precision manipulation tasks, small position errors due to deformation can lead to failure. For example, tasks using fingertips to press a small button, flip a small switch, or pick up a small object depend on pressure being applied with precision.

One approach to precisely apply pressure is to explicitly model the mechanics of the gripper. Rigid-body models enable precise control of rigid grippers, but result in errors when applied to soft grippers. For example, sliding a soft gripper's fingertips across a surface can result in large deviations from the gripper's undeformed geometry. Soft-body models can represent the compliant geometry of soft grippers, but are more complex than rigid-body models and depend on quantities that can be difficult to measure, such as external forces and internal strain. Embedded sensors in soft grippers can make relevant measurements, but increase hardware complexity and often alter gripper mechanics. For both rigid and soft grippers, an explicit model of the gripper needs to be related to the environment to model applied pressure, which often involves additional sensors and calibration.

We present a novel approach that circumvents these modeling and instrumentation complexities by using an external camera to directly estimate the pressure applied by a soft gripper to the world. Our approach relies on two key insights. First, many manipulation tasks only depend on the pressure applied by the gripper to the world, rather than the gripper's detailed state. For these tasks, directly estimating and controlling pressure applied to the world is sufficient for task success. Second, the pressure applied to the world by a soft gripper can be directly estimated by the gripper's visible deformation. This takes advantage of high compliance, since larger deformations are more easily observed by an external camera.

[Footnote] 1 Patrick Grady, Jeremy A. Collins, James Hays, and Charles C. Kemp are with the Institute for Robotics and Intelligent Machines at the Georgia Institute of Technology (GT). 2 Samarth Brahmbhatt is with Intel Labs. 3 Christopher D. Twigg and Chengcheng Tang are with Meta Reality Labs. This work was supported in part by NSF Award #2024444. Code, data, and models are available at https://github.com/Healthcare-Robotics/VPEC. Charles C. Kemp is an associate professor at GT. He also owns equity in and works part-time for Hello Robot Inc., which sells the Stretch RE1. He receives royalties from GT for sales of the Stretch RE1.
Our method, visual pressure estimation & control (VPEC), uses a convolutional neural network, VPEC-Net, to infer a 2D pressure map overlaid on the input RGB image from an external camera (see Figure 1). In other words, contact locations and pressure are estimated in the image space, with an estimated pressure for each pixel in the image. A control loop achieves pressure objectives in the image space, enabling a robot to precisely control pressure applied to the world and thereby grasp a small object observed by the camera. In addition to gripper deformation, VPEC-Net has the potential to use other information, such as cast shadows and motion blur.

For this paper, we consider contact with a horizontal plane, which is a common surface relevant to manipulation. To construct a training dataset, we hand-operated a tendon-driven soft gripper and a pneumatic soft gripper to make contact with a high-resolution planar pressure sensor. We capture these interactions with four RGB cameras and use camera extrinsics to project the pressure sensor data onto the RGB images, creating a labeled dataset for training VPEC-Net. We collected approximately one hour of data for each gripper, yielding a dataset of 650K frames. Our contributions include the following:
- VPEC: An algorithm that infers pressure applied by a soft gripper to a planar surface using a single RGB image.
- Precision Manipulation with VPEC: Evaluations in which a mobile manipulator with a soft gripper achieves pressure objectives via closed-loop control and grasps small objects, including a washer and a coin.
- Release of dataset, trained models, and code: We will release our core methods online to support replication of our work.

II. RELATED WORK

Our work builds on prior efforts to visually infer pressure applied by human hands. We use the same neural network architecture, but apply it to inference and control of soft robotic grippers.

We evaluate our approach with the task of precision manipulation on a flat surface. Our method controls the pressure applied by a soft gripper to the surface to grasp small objects. Similarly, humans often slide their fingertips across flat surfaces when picking up small objects, which has inspired robotic grasping methods. Work on grasping with soft end effectors has focused on larger objects than we consider. Grasping smaller objects tends to be more sensitive to gripper deformation, such as deformation due to sliding while in contact.

Prior work has used internal sensors to infer contact and pressure based on deformation of the gripper's surface, changes to gripper vibration, deflection of the gripper's compliant joints, and changes to the gripper's motion. We expect that our method can use similar information by observing a soft gripper with an external camera. For this paper, estimation only uses a single image, which can have motion blur but lacks information available in sequences of images.

Research on image-based force estimation has focused on inferring force and torque applied by a rigid tool to a deformable object. Early work inferred grip force for a microgripper. Cross-modal research has used vision to predict the output of robot-mounted tactile sensors. Our approach relies on soft grippers that visibly deform to infer the output of tactile sensors mounted to the world.

Our method uses visual servoing to achieve pressure objectives in the camera's image space. Prior work has demonstrated visual control of a robot's arm relative to flat surfaces based on shadows. Marker-based visual servoing has achieved precise in-hand manipulation with a soft gripper. Visual object tracking has enabled precision insertions with a soft gripper. Our system enables a soft gripper to grasp small, low-profile objects from a flat surface.

Our system can be thought of as using a virtual tactile sensor array mounted to the world with which it attempts to achieve pressure objectives. As such, our approach is similar to tactile servoing with real tactile sensor arrays. Notably, our system reports inferred pressure for each pixel of the input RGB image. This enables our system to directly relate pressure and vision. For example, in our grasping evaluation, the robot uses the RGB image to find the centroid of the target object, which determines key pressure objectives.

III. VISUAL PRESSURE ESTIMATION

In this section, we describe the grippers used and the capture process to create our dataset. We also describe the network architecture and training procedure of VPEC-Net.

A. Selected Grippers

For a vision-based approach to successfully estimate contact and pressure, there must be visual cues to indicate the presence of these quantities. As a result, we train and test VPEC-Net with soft robotic grippers. These grippers are compliant and deform when in contact with an object. We select two different models of grippers that are examples from common classes of grippers used by researchers.

Tendon-Actuated Gripper: The first gripper we consider is the Stretch Compliant Gripper that comes with the Stretch RE1 mobile manipulator by Hello Robot Inc. This gripper has suction-cup-shaped soft rubber fingertips supported by spring steel flexures that bend when the gripper closes. To close the gripper, an actuator uses a tendon to pull the inner flexures. In a user study, a similar commercially available grabber tool was found to be adept at manipulating various household objects. During the collection of the dataset (Sec. III-B), we used a hand-operated version of this gripper.

The gripper displays several visual cues that may indicate pressure when grasping. The rubber fingertips visibly deform to match the contours of the grasped object or when in contact with a surface, and the steel flexures bend when in contact with a surface (Figure 2).

Pneumatic Gripper: The second gripper we consider is a pneumatic gripper sold by SoftGripping GmbH. The gripper is made of a flexible silicone and contains hollow cavities for pressurized air. When inflated, the sides of the finger expand asymmetrically, resulting in the fingers closing. Due to its silicone construction, the gripper is soft, and the entire finger deforms when in contact (Figure 2). Additionally, as the pressure in the finger cavity increases, the exterior of the gripper expands.
Fig. 2. Left: The two grippers used to train and test VPEC-Net. In the top images, the grippers are hovered in mid-air, and in the bottom images, the grippers are pressed against the surface. Notice the deflection in the pneumatic gripper and the deformation in the tips of the tendon-actuated gripper. Right: We use hand-operated grippers to make contact with a high-resolution pressure sensing array to collect training data. During robotic manipulation experiments, we command a Hello Robot Stretch RE1 robot to pick up a variety of small objects from the table.

B. Data Capture Setup

We built a custom data capture rig to collect RGB images with synchronized ground truth pressure. The rig uses aluminum framing to rigidly support a pressure sensor and cameras. The parts of the rig visible to cameras are covered with a white vinyl covering to provide a consistent visual background.

To record pressure data, we use a Sensel Morph sensor. The Morph is a flat pressure sensor with an active area of 23×13 cm and features a grid of 185×105 individual force-sensitive resistor (FSR) elements. The sensor produces high-resolution pressure data at approximately 100 Hz.

Four Azure Kinect cameras are mounted at different locations around the cage to observe the pressure sensor and gripper from a variety of viewpoints. The RGB feed from the cameras is captured in 1080p at 30 Hz. The capture rig additionally has two lights mounted to provide illumination, which can be turned on or off in any combination. Bright lighting reduces the effect of motion blur, but during fast motions, some blurring is still visible.

The cameras and pressure sensor are calibrated before each recording session using a specialized fiducial board. The board uses ChArUco markers on the top for localization in camera space, while pins on the bottom push into the edges of the pressure sensor, allowing consistent positioning.

C. Data Capture Protocol

While our work is targeted toward robotic grippers, we operate the grippers by hand for data collection (Figure 2). Collecting data with a human operator allows for efficient capture of a diverse dataset including a wide range of pressure levels, orientations, grasp styles, and speeds. The grippers are mounted on a handle 60 cm in length to allow a person to operate the gripper easily. In Section V, we show that a network trained on this data can be used to control the position of a robot-actuated gripper.

We designed a capture protocol to systematically collect data from the grippers. We studied actions where the gripper makes contact with the surface of the planar pressure sensor. Our data is divided into three classes of actions: make contact, where both fingers of the gripper are lowered onto the surface; slide, where the gripper is translated along the surface; and close gripper, where the gripper is closed while in contact with the surface. We additionally collect no contact data, where the gripper is held just above the surface without making contact, to provide adversarial training data.

Data collection is further divided by the amount of force applied, the lighting configuration, the speed of the human operator, and the approach angle of the gripper. At least 30 seconds of data is collected for each combination of parameters, where the operator approaches the sensor and performs multiple grasps. Between individual grasps, the operator varied the translation and angle of the gripper. We record 32 actions each with 3 lighting conditions, resulting in 96 individual sequences for each gripper. Approximately 1 hour of data is collected for each gripper.

We randomly remove 20% of the sequences to create a held-out test set.
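Because the sensor is planar, its readings can be mapped into each calibrated camera view with a single homography before being overlaid on the RGB frames (the projection used to build the training labels, described further in Sec. III-D). A minimal sketch of this projection with OpenCV is given below; the function name, the corner ordering, and the assumption that the sensor's four corner pixel locations come from the fiducial-board calibration are illustrative, not the released implementation.

```python
import cv2
import numpy as np

def warp_pressure_to_image(pressure, sensor_corners_px, image_shape):
    """Project a planar pressure array into camera image space with a homography.

    pressure          : (105, 185) array of readings from the flat sensor
                        (rows x columns of the 185x105 FSR grid).
    sensor_corners_px : (4, 2) pixel coordinates of the sensor's four corners in
                        the RGB image, e.g. recovered from the fiducial-board
                        calibration, ordered to match the source corners below.
    image_shape       : (height, width) of the RGB image.
    """
    rows, cols = pressure.shape
    # Corners of the pressure array in sensor (taxel) coordinates:
    # top-left, top-right, bottom-right, bottom-left.
    src = np.float32([[0, 0], [cols - 1, 0], [cols - 1, rows - 1], [0, rows - 1]])
    dst = np.float32(sensor_corners_px)
    H = cv2.getPerspectiveTransform(src, dst)
    # Resample the pressure data onto the image grid; pixels outside the sensor stay 0.
    return cv2.warpPerspective(
        pressure.astype(np.float32), H, (image_shape[1], image_shape[0])
    )
```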
D. Network Architecture

We develop VPEC-Net, a neural network to estimate pressure in image space. The network uses a single RGB image as input and produces an estimated pressure for each input pixel. To generate ground truth pressure data for this approach, the data measured by the pressure sensor is warped into image space using a homography transform. This allows directly overlaying pressure information onto the image.

VPEC-Net's architecture takes inspiration from image-to-image translation neural networks used in the semantic segmentation literature. For an input RGB image I, a pressure image P̂ = f(I) is estimated. The network uses an encoder-decoder architecture with skip connections. An SE-ResNeXt50 network with weights from pretraining on ImageNet is used for the encoder, and an FPN network is used for the decoder.

The task of pressure estimation is framed as a classification problem. The pressure range is divided into 8 discrete bins placed evenly in logarithmic space, including an additional zero pressure bin. For each pixel in the output pressure image, the network classifies which pressure bin the pixel should reside in. The network is trained with a cross-entropy loss. We tested various output representations and found that this outperformed a direct regression of a pressure scalar.

Images from the cameras are cropped to extend slightly past the edges of the pressure sensor. VPEC-Net is trained for 600k iterations using the Adam optimizer. The learning rate is initially set at 1e−3, which drops to 1e−4 after 100k iterations. During training, several types of augmentations are used, including flips, random rotations, translations, scaling, brightness, and contrast changes.
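The description above maps onto an off-the-shelf encoder-decoder. A minimal sketch of such a setup is shown below using the open-source segmentation_models_pytorch package (an FPN decoder on an ImageNet-pretrained SE-ResNeXt50 encoder) with per-pixel classification over log-spaced pressure bins; the bin edges, pressure range, crop size, and class count here are assumptions for illustration, not the released training configuration.

```python
import numpy as np
import torch
import segmentation_models_pytorch as smp

NUM_BINS = 8                       # eight log-spaced pressure bins, plus one zero-pressure class
LOW_PA, HIGH_PA = 100.0, 10_000.0  # assumed pressure range covered by the bins
bin_edges = np.logspace(np.log10(LOW_PA), np.log10(HIGH_PA), NUM_BINS)

# FPN decoder on an SE-ResNeXt50 encoder pretrained on ImageNet; one output
# channel per pressure class so every pixel gets a bin prediction.
model = smp.FPN(
    encoder_name="se_resnext50_32x4d",
    encoder_weights="imagenet",
    in_channels=3,
    classes=NUM_BINS + 1,
)
criterion = torch.nn.CrossEntropyLoss()

def pressure_to_labels(pressure_pa: np.ndarray) -> np.ndarray:
    """Discretize a ground-truth pressure image (Pa) into per-pixel class labels.
    Label 0 covers pressure below the lowest edge (no contact); labels
    1..NUM_BINS are the log-spaced bins."""
    return np.digitize(pressure_pa, bin_edges)

# One illustrative training step (random tensors stand in for a real batch;
# real training adds augmentation, the Adam optimizer, and a learning-rate drop).
rgb = torch.randn(2, 3, 448, 448)                        # cropped input images
labels = torch.randint(0, NUM_BINS + 1, (2, 448, 448))   # per-pixel bin labels
logits = model(rgb)                                      # (2, NUM_BINS + 1, 448, 448)
loss = criterion(logits, labels)
loss.backward()
```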
Fig. 3. Left: Examples of pressure estimation for the tendon-actuated gripper. Right: Examples of pressure estimation for the pneumatic gripper. Input Image Column: The input RGB image used by VPEC-Net to infer pressure. Estimated Pressure Column: The estimated pressure image overlaid on the input image. Ground Truth Pressure Column: The ground truth pressure measurements from a pressure sensing array overlaid on the input image.

TABLE I: RESULTS OF VISUAL PRESSURE ESTIMATION

Method                     Temporal Acc.   Contact IoU   Volumetric IoU   MAE
Tendon-Actuated Gripper    95.9%           73.8%         58.2%            5.3 Pa
Pneumatic Gripper          95.1%           63.3%         52.0%            9.7 Pa

IV. EVALUATION OF VISUAL PRESSURE ESTIMATION

To evaluate the performance of VPEC-Net, we perform evaluations on the held-out test set. We use a variety of evaluation metrics similar to prior work to quantify pressure estimation accuracy.

a) Temporal Accuracy: To evaluate the temporal accuracy of pressure estimates, if any pressure pixel is above a threshold of 1.0 kPa, the frame is marked as containing contact. Temporal Accuracy measures the consistency between the presence of ground truth and estimated contact.

b) Contact IoU: To determine the spatial and temporal accuracy of pressure estimates, binary contact images are generated by thresholding pressure at the same value used for temporal accuracy. The ground truth contact image and estimated contact image are compared to calculate intersection over union (IoU).

c) Volumetric IoU: To assess the magnitude of pressure estimates, we extend the Contact IoU to Volumetric IoU. This views pressure images as 3D volumes, with the height of the volume proportional to the quantity of pressure. The metric calculates the intersection over union of the two volumes and returns a percentage.

IoU_vol = ( Σ_{i,j} min(P_{i,j}, P̂_{i,j}) ) / ( Σ_{i,j} max(P_{i,j}, P̂_{i,j}) )    (1)

d) MAE: To quantify the error in pressure in physical units, the mean absolute error is calculated across each pixel. As most of the pixels in the dataset contain zero pressure, the MAE is low compared to the peak pressure observed in the dataset.
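These four metrics can be computed directly from pairs of ground-truth and estimated pressure images. A minimal sketch is below, assuming pressure images as NumPy arrays in pascals and the 1.0 kPa contact threshold from the text; the function names are illustrative.

```python
import numpy as np

CONTACT_THRESHOLD_PA = 1000.0  # 1.0 kPa contact threshold

def temporal_accuracy(gt_seq, est_seq):
    """Fraction of frames where the presence of estimated contact matches ground truth.
    gt_seq, est_seq: (T, H, W) pressure sequences in Pa."""
    gt_contact = (gt_seq > CONTACT_THRESHOLD_PA).any(axis=(1, 2))
    est_contact = (est_seq > CONTACT_THRESHOLD_PA).any(axis=(1, 2))
    return float((gt_contact == est_contact).mean())

def contact_iou(gt, est):
    """IoU of the thresholded (binary) contact regions for one frame."""
    gt_c, est_c = gt > CONTACT_THRESHOLD_PA, est > CONTACT_THRESHOLD_PA
    union = np.logical_or(gt_c, est_c).sum()
    return float(np.logical_and(gt_c, est_c).sum() / union) if union > 0 else 1.0

def volumetric_iou(gt, est):
    """Eq. (1): treat each pressure image as a volume and intersect the two volumes."""
    denom = np.maximum(gt, est).sum()
    return float(np.minimum(gt, est).sum() / denom) if denom > 0 else 1.0

def mae(gt, est):
    """Mean absolute pressure error over all pixels, in Pa."""
    return float(np.abs(gt - est).mean())
```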
A. Results

We train one network for each gripper and measure performance on the held-out test set. The results are reported in Table I. We also provide qualitative examples of the network pressure prediction in Figure 3.

Generally, VPEC-Net performs well at estimating pressure from a single image. Our approach can accurately estimate if the gripper is in contact with the surface or not, achieving a temporal accuracy > 95% for both grippers.

We observe that the network trained on the tendon-actuated gripper outperforms the pneumatic gripper in all metrics. We hypothesize that this is because the shape of the pressure distribution created by the tendon-actuated gripper is often simpler (Figure 3). The tendon-actuated gripper also tends to visibly deform in a localized way, while deformation of the pneumatic gripper is less local and occurs across a wide area.

B. Limitations

While the network successfully reconstructs pressure on a flat surface, our dataset does not include objects with curved surfaces or unseen textures and only includes a limited set of action classes.

V. ROBOTIC CONTROL OF PRESSURE

We evaluated VPEC-Net with robotic manipulation tasks involving pressure objectives and precision grasping. We first show that VPEC-Net can be used to modulate the pressure applied to a surface and increase the spatial accuracy of a compliant robot. We then show how our approach can be used to pick up small objects (penny, screw) that require precision manipulation. For all experiments, we used a Stretch RE1 mobile manipulator with its stock gripper.

Fig. 4. Force estimates over time are visualized for make contact and slide sequences of the tendon-actuated gripper in the test set. While VPEC-Net displays some amount of error in the quantity of force exerted, it accurately captures the onset and termination of contact.

A. Making Contact with a Desired Pressure

VPEC-Net can be used to regulate the amount of force a gripper exerts while in contact with a surface. We perform an experiment where the robot is commanded to exert a specified amount of normal force by lowering the gripper to make contact with a pressure sensor.

The pressure estimated by VPEC-Net is integrated with respect to area on the surface to acquire a total force estimate (Eqn. 2). The robot uses a simple bang-bang controller to modulate force with the surface by adjusting the height of the gripper using the Stretch RE1's lift joint.

F̂ = ∫ f(I) dA    (2)

Each trial begins with the robot placed 3-5 cm above the pressure sensor with VPEC-Net running on a single camera at a rate of 12 Hz. Once the pressure estimates indicate that the target force has been achieved, the ground truth force is measured with the pressure sensor. We conduct a total of 60 trials, with 10 trials being recorded for each force level ranging from 0 to 5 N (Figure 5).

VPEC-Net can accurately estimate force at higher levels. However, it tends to underestimate forces in the range of 1 to 2 N, near the boundary of contact. This may be due to differences between the manually operated gripper used for training and the robotic gripper used for testing.

Fig. 5. We use a simple controller to achieve a target force, applying feedback from VPEC-Net's visual pressure estimation. The actual force is measured using the pressure sensor and matches the target value well at higher force levels.
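The controller in this experiment needs only the total force from Eq. (2) and small height adjustments of the lift joint. A minimal sketch of such a bang-bang loop is below; the `estimate_pressure_image` and `move_lift_by` callables, the per-pixel surface area, the step size, and the dead-band are all assumptions for illustration, not the authors' controller.

```python
import time
import numpy as np

PIXEL_AREA_M2 = 1.0e-6   # assumed surface area represented by one image pixel (m^2)
STEP_M = 0.002           # assumed lift-joint increment per control step (2 mm)
DEADBAND_N = 0.25        # assumed tolerance around the target force

def estimated_total_force(pressure_image_pa):
    """Eq. (2): integrate the estimated pressure over the surface area it covers."""
    return float(np.sum(pressure_image_pa) * PIXEL_AREA_M2)

def servo_to_force(estimate_pressure_image, move_lift_by, target_force_n, rate_hz=12.0):
    """Bang-bang normal-force control: lower the gripper when the estimated force
    is too low, raise it when too high, until the estimate is inside the dead-band.

    estimate_pressure_image : callable returning the current estimated pressure image (Pa).
    move_lift_by            : callable commanding a relative lift-joint motion in meters
                              (positive up, negative down).
    """
    while True:
        force = estimated_total_force(estimate_pressure_image())
        error = target_force_n - force
        if abs(error) <= DEADBAND_N:
            return force
        # Lower to increase force, raise to decrease it.
        move_lift_by(-STEP_M if error > 0 else STEP_M)
        time.sleep(1.0 / rate_hz)
```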
B. Following a Spatial Pressure Trajectory

Due to the inherent compliance of soft grippers, the precise pose of the gripper can be difficult to control. This is especially true when in contact with a surface, as deformation and friction with the surface can cause the gripper to stick and slip. Figure 6a shows that the gripper has significant deflection in response to an external disturbance. Additionally, to move the gripper laterally, the Stretch RE1 drives on a carpeted floor with its differential drive mobile base, which can result in movement variations and inaccurate positioning due to wheel slip and other phenomena.

We show that an open-loop controller accumulates significant error while executing a trajectory in contact (Figure 6b, blue). The robot gripper was rotated and lowered to a constant height such that one fingertip was in contact with the surface. The robot was then commanded to move in an 8 cm square path. The true path of the gripper was measured by calculating the center of pressure detected by the pressure sensor.

To achieve a higher accuracy, we use an image-based visual servoing (IBVS) controller that leverages the image space pressure estimates from VPEC-Net (Figure 6b, green). The error function E(t) uses the position of the maxima in the estimated pressure image, M(t), and a desired target position in image space, T:

E(t) = [e_x, e_y]^T = [T_x − M_x(t), T_y − M_y(t)]^T    (3)

This error is transformed into robot actuator commands q̇ with the image Jacobian J and a gain λ: q̇(t) = λ J⁺ E(t), where J⁺ is the pseudo-inverse. Because we observe both the target and gripper contact location in the same image, our controller is 'endpoint closed-loop' and robust to inaccuracies in J.

Fig. 6. Left: The fingertips of the tendon-actuated gripper deflect 4 cm when subjected to 5 N of lateral force due to deformation of the gripper. Middle: When commanded to trace a square path while in contact with a flat surface (red), a gripper using open-loop control accumulates significant error (blue). Feedback control using image space pressure estimates reduces tracking error (green). Right: In left to right order, this image shows the objects from our grasping evaluation: AA battery, penny, washer, bottle cap, microSD card, small green pill, tape roll, large pill, screw, cable segment, and binder clip.

C. Grasping Small Low-Profile Objects

To demonstrate the real-world value of VPEC, we perform grasping trials with a range of objects (Figure 6c), including very thin objects. Humans typically grasp these small objects by using their fingers to first make contact with the surface near the object, then slide their fingertips to close around the object. We take inspiration from this approach and design a robot control algorithm to grasp objects while maintaining contact with the surface.

The robot must autonomously approach the object, grasp it, and pick it up without dropping it for 5 seconds (Figure 7). Trials where any of the robot's actuators exceed their torque limits are marked as failures. We remove the pressure sensor from our capture rig (Figure 2) during grasping experiments, providing the robot with a larger workspace and demonstrating that VPEC-Net can generalize to a new surface. We conduct 10 grasping attempts for each object. The object is reset to a random position and orientation after each trial.

Our system used a simple color thresholding algorithm to find the centroid of the object in the image. The robot starts with the gripper positioned above the surface and is lowered until pressure above a threshold is reported by VPEC-Net. The normal force exerted by the gripper is continuously controlled to maintain a set force (Sec. V-A).

We then perform visual servoing (Sec. V-B) to grasp the object. Our algorithm attempts to navigate the mean position of the two local pressure maxima produced by the gripper fingertips to the object centroid in image space. Once the average fingertip position is within a fixed radius of the object centroid, the gripper is closed and lifted.
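One step of the servoing loop described above combines Eq. (3) with the pseudo-inverse of the image Jacobian. A minimal NumPy sketch is below, assuming a 2×2 Jacobian that maps two actuator velocities (e.g., base translation and arm extension) to pixel motion and a crude two-maxima extraction of the fingertip contacts; the helper names and the suppression-window heuristic are illustrative, not the released controller.

```python
import numpy as np

def fingertip_midpoint(pressure_image, min_separation_px=20):
    """Mean pixel position of the two local pressure maxima made by the fingertips.
    Simple approximation: take the global maximum, zero out a window around it,
    then take the next maximum."""
    p = pressure_image.copy()
    r0, c0 = np.unravel_index(np.argmax(p), p.shape)
    p[max(0, r0 - min_separation_px): r0 + min_separation_px,
      max(0, c0 - min_separation_px): c0 + min_separation_px] = 0
    r1, c1 = np.unravel_index(np.argmax(p), p.shape)
    return 0.5 * np.array([r0 + r1, c0 + c1], dtype=float)

def ibvs_step(pressure_image, target_px, J, gain=0.5):
    """One image-based visual servoing update.

    target_px : desired (row, col) position T in the image, e.g. the object centroid.
    J         : 2x2 image Jacobian mapping actuator velocities to pixel motion.
    Returns the actuator velocity command q_dot = gain * pinv(J) @ E(t), with
    E(t) = T - M(t) as in Eq. (3)."""
    M = fingertip_midpoint(pressure_image)
    error = np.asarray(target_px, dtype=float) - M
    return gain * np.linalg.pinv(J) @ error
```

In the grasping controller described above, this step would simply be repeated until the fingertip midpoint falls within the fixed radius of the object centroid, after which the gripper closes and lifts.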
Fig. 7. Left to Right: Grasping a 1 mm thick microSD card. t=0s: The centroid of the object, shown as a green circle, is estimated using the RGB input image. t=7s: The robot makes contact with the uninstrumented tabletop to achieve a desired pressure and estimate its location. Fingertip contact results in two ellipsoidal contact pressure regions. t=19s: The robot moves the estimated fingertip pressure regions to the centroid of the object. t=25s and t=29s: The gripper closes while maintaining a desired pressure on the surface to grasp and then pick up the microSD card.

D. Grasping Results

We find that VPEC allows the robot to accomplish precision grasping using images from a single RGB camera. The robot is able to grasp all 11 objects in our set (Table II), and achieves an average success rate of 93%. The robot is also able to maintain contact with the surface, allowing the visual servoing controller to accurately track the position of the robot in image space and modulate normal force to grasp thin objects on a flat surface.

Failures during grasping experiments can be attributed to a few causes. As our dataset was collected without distractor objects present, when objects are placed in the camera's field of view, the network occasionally estimates pressure near the object in image space. This extra pressure estimate may cause the gripper to lift off the surface. We also find that the network may occasionally overestimate the gripper pressure, also causing the gripper to be in inconsistent contact with the surface. In very rare cases, pressure is underestimated, causing the gripper to be driven into the surface such that the motor torque limits are exceeded and the trial is stopped. We would expect additional training data to increase robustness and alleviate these issues.

TABLE II: OBJECT DIMENSIONS AND GRASPING RESULTS

Object             Dims. L×W×H     Grasp Successes/Trials
Washer             10×10×1 mm      9/10
Small Green Pill   10×6×6 mm       10/10
Large Pill         21×8×8 mm       9/10
MicroSD Card       15×11×1 mm      8/10
Cable Segment      82×4×4 mm       10/10
Penny              19×19×1.5 mm    9/10
Bottle Cap         30×30×13 mm     9/10
AA Battery         50×14×14 mm     9/10
Binder Clip        25×24×19 mm     9/10
Screw              32×9×9 mm       10/10
Tape Roll          36×36×13 mm     10/10

VI. CONCLUSION

We present VPEC, a method to visually estimate pressure from changes in the appearance of a soft gripper. We demonstrate that a trained model can accurately estimate pressure for two gripper designs: a tendon-actuated gripper and a pneumatic gripper. These pressure estimates can be used to perform closed-loop control of a robot to maintain a desired pressure, accurately trace a trajectory, and successfully manipulate small objects. Our results suggest that visual estimation of pressure is a promising approach for soft robotic grippers.