Post on 07-Aug-2021
Image Processing-Based Object Recognition and Manipulation with a 5-DOF Smart Robotic Arm through a Smartphone Interface Using Human Intent Sensing
Haiming Gang (hg1169@nyu.edu)
Background
• Some manipulators follow fixed built-in commands
• Some manipulators follow human commands
• Require many sensors or markers
• Low intelligence
• Difficult to participate in the daily life of humans
Solution
• Adjustable manipulator: adjusts itself to varying requirements and environments
• Easy-to-use manipulator: fewer sensors and markers
• Smart manipulator, a human partner: helps people finish tasks based on their daily activity data
System diagram
Structure of arm robot (base to gripper)
• Base
• Shoulder pan joint
• Shoulder lift joint
• Elbow joint
• Wrist joint
• Finger
How does it work?
1. The mobile phone or tablet receives image data from the camera according to the target object
2. The mobile phone or tablet processes the images frame by frame
3. The mobile phone or tablet sends the position information of the target object and the robot
4. The Raspberry Pi solves the inverse kinematics equations to get the joint angles and sends these data to the ArbotiX-M microcontroller
5. The ArbotiX-M converts the joint angles to electrical signals and drives the robot to pick up the target object
6. The robot places the target object at a fixed, preset position
Mobile phone or tablet receives image data from the camera according to the target object
• The camera connects to the Raspberry Pi via a USB cable and is normally kept open
• The Raspberry Pi streams image frames to the mobile phone via mjpg-streamer (wireless transmission)
• The user chooses the object by tapping a button in the interface, or the phone chooses a preset object automatically according to the user's running pace (step-count algorithm on the iPhone 5, motion coprocessor on newer models)
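The receiving side of this step can be sketched as follows. The host name and port are assumptions (the slides do not give them); `8080` and `/?action=stream` are mjpg-streamer's defaults, and OpenCV can open an HTTP MJPEG URL directly:

```python
# Sketch of the client side reading the Raspberry Pi's mjpg-streamer feed.

def stream_url(host, port=8080):
    """Build the default mjpg-streamer MJPEG URL for a given host."""
    return "http://%s:%d/?action=stream" % (host, port)

if __name__ == "__main__":
    import cv2  # OpenCV's VideoCapture can open an HTTP MJPEG stream

    cap = cv2.VideoCapture(stream_url("raspberrypi.local"))
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # ... hand the frame to the recognition pipeline ...
```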
Mobile phone or tablet processes the images frame by frame
• Object recognition with a Haar feature classifier (10 FPS)
• Obtain the position information between the target object and the manipulator (1 FPS)
• The marker's position information can be saved on the phone, so the phone does not need to recompute it on every use
Target object classes: Box, Beer, Toothpaste, Pump, Cup
Haar cascade
Initially, the algorithm needs many positive images (images of faces) and negative images (images without faces) to train the classifier. Features must then be extracted from them. For this, the Haar features shown in the image below are used. They are just like convolutional kernels: each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
For example, consider the image below. The top row shows two good features. The first focuses on the property that the eye region is often darker than the nose and cheeks; the second relies on the eyes being darker than the bridge of the nose. A Haar feature thus reflects local changes in the image's gray levels.
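The feature computation described above becomes cheap with an integral image (summed-area table), which lets any rectangle sum be read in constant time. A minimal sketch of a two-rectangle edge feature on plain nested lists:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[:y][:x] (exclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_edge_feature(ii, x, y, w, h):
    """Two-rectangle vertical edge feature: sum under the black (left)
    half minus sum under the white (right) half, as described above."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

With the integral image, each feature costs a handful of lookups regardless of rectangle size, which is what makes evaluating thousands of Haar features per window feasible.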
Haar Cascade Classifier training samples

Object       Positive images   Negative images
Box          700               3500
Beer         550               2750
Toothpaste   700               3500
Pump         700               3500
Cup          450               2250
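These per-object sample counts feed a cascade training run. A command-line sketch using OpenCV's `opencv_traincascade` tool for the Box classifier; the file paths, window size, and stage count are assumptions, and only the 700/3500 sample counts come from the table above:

```shell
# Sketch: train the "Box" cascade from the table above.
# Paths, -w/-h and -numStages are assumptions, not from the slides.
opencv_traincascade -data box_classifier \
    -vec box_samples.vec -bg box_negatives.txt \
    -numPos 700 -numNeg 3500 \
    -numStages 20 -w 24 -h 24 -featureType HAAR
```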
Distinguishing conditions
[Figure: sample detections for the five objects, panels (a)–(e)]
• Box: B > G > R
• Beer: G > B, G > R, G < 40 and B > 30
• Toothpaste box: R > G > B
• Pump: R > G, R > B and R < 27
• Cup: R > B > G
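The rules above can be sketched as a small classifier over a region's mean BGR colour. Note the rules overlap (a dark red region can satisfy both the Pump and Toothpaste conditions), so this sketch checks the more specific rules first; that ordering is an assumption, not stated in the slides:

```python
def classify_bgr(b, g, r):
    """Classify an object from its mean BGR colour using the
    distinguishing conditions above. More specific rules (Beer, Pump)
    are tested first because the conditions overlap."""
    if g > b and g > r and g < 40 and b > 30:
        return "Beer"
    if r > g and r > b and r < 27:
        return "Pump"
    if b > g > r:
        return "Box"
    if r > g > b:
        return "Toothpaste"
    if r > b > g:
        return "Cup"
    return None  # colour matches no rule
```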
Obtain position information
1. Features2D + homography to find the marker
2. The HoughLines function to find the lines' equations, then use those equations to compute the intersection points
3. An affine transformation to get the bird's-eye view and compute the pixel distance between the marker and the object
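Step 2 intersects lines returned in the (ρ, θ) normal form that `cv2.HoughLines` uses, i.e. x·cos θ + y·sin θ = ρ. A small solver sketch for one pair of lines:

```python
import math

def hough_intersection(rho1, theta1, rho2, theta2, eps=1e-9):
    """Intersect two lines given in Hough normal form
    x*cos(theta) + y*sin(theta) = rho (the form cv2.HoughLines returns).

    Returns (x, y), or None when the lines are (near-)parallel.
    """
    # 2x2 linear system; its determinant is sin(theta2 - theta1).
    det = math.sin(theta2 - theta1)
    if abs(det) < eps:
        return None
    x = (rho1 * math.sin(theta2) - rho2 * math.sin(theta1)) / det
    y = (rho2 * math.cos(theta1) - rho1 * math.cos(theta2)) / det
    return x, y
```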
Affine transformation
• Perspective transform
• Distance measurement
• Fixed height of the object
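In practice the perspective transform to the bird's-eye view is a 3×3 homography, which OpenCV computes from four point correspondences with `cv2.getPerspectiveTransform`. The sketch below only applies a given homography and measures the resulting pixel distance; the matrix values in the test are placeholders, not calibration data from the slides:

```python
import math

def warp_point(H, x, y):
    """Apply a 3x3 homography H (e.g. from cv2.getPerspectiveTransform)
    to pixel (x, y), including the perspective divide."""
    wx = H[0][0] * x + H[0][1] * y + H[0][2]
    wy = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return wx / w, wy / w

def birdseye_distance(H, p_mark, p_object):
    """Pixel distance between marker and object in the bird's-eye view."""
    x1, y1 = warp_point(H, *p_mark)
    x2, y2 = warp_point(H, *p_object)
    return math.hypot(x2 - x1, y2 - y1)
```

With the marker's known physical size, this pixel distance converts to a metric distance; the fixed, known object height is what lets a single overhead-style view resolve the object's base position.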
[Figure: bird's-eye view showing the robot, the marker, and the bottom of the object in the global (x, y) coordinate frame]
The iOS device publishes the position information to the robot
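The slides do not specify the transport between the iOS device and the Raspberry Pi, so the following is only one plausible sketch: the position serialised as JSON and sent over UDP, with a hypothetical host and port:

```python
import json
import socket

def encode_position(x, y, z):
    """Serialise a target position as a JSON datagram payload."""
    return json.dumps({"x": x, "y": y, "z": z}).encode("utf-8")

def publish_position(x, y, z, host="192.168.1.50", port=9000):
    """Send the position to the robot controller over UDP.

    UDP/JSON and the host/port defaults are assumptions for
    illustration; the slides do not state the actual protocol.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(encode_position(x, y, z), (host, port))
    finally:
        sock.close()
```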
Improvements
• Faster image processing
• More accurate recognition algorithm
• More functions combined with the mobile phone:
a) Set an alarm to pick up a preset object
b) Use the user's habits to choose different target objects at different times
Demo
Thank you! Questions?