Robots: (i) How do they see what we see?
Wall-E (no introduction needed for this wonderful robot)


Sight gives us a wonderful depiction of the world we live in and provides stimulus for motor control, mood, depth, relative positioning, the density of objects, and more. Now, when does sight begin? It begins when light enters your cornea and hits the retina, where photoreceptors convert the light to an electrical signal for your optic nerve to carry to the visual cortex (in your brain), which renders the image you see. According to recent studies, vision occurs in your brain in at least three different processing systems [1]:

  • First, where shapes/edges/corners are processed
  • Second, which mainly handles colours
  • Third, which handles movement, location, and spatial organization

Further studies show that all this gathered information depends far more on the gradients/intensity of light than on colour. From there, the next logical step is to perceive information according to matching criteria and cluster it accordingly, to understand and comprehend the relationships between or within those clusters. Now, this is how we see, but how do robots see? What are the essential components that make such a system useful for the functioning of robots? To answer this, I am writing a three-part series of articles: (i) based on sight, (ii) based on perception, and (iii) based on localization and pre-navigation.
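
As a rough illustration of how much structure lives in intensity gradients alone, here is a minimal sketch (assuming OpenCV and NumPy are installed, and using a hypothetical image file `scene.jpg`) that discards colour and computes the gradient magnitude of a frame:

```python
import cv2
import numpy as np

# Load an image and drop colour: keep only intensity (grayscale).
# "scene.jpg" is a placeholder path for illustration.
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Sobel filters approximate the horizontal and vertical intensity gradients.
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

# Gradient magnitude: strong responses line up with edges and corners,
# roughly the kind of structure early vision is thought to rely on.
magnitude = cv2.magnitude(gx, gy)

cv2.imwrite("gradients.png", np.clip(magnitude, 0, 255).astype(np.uint8))
```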

Why do robots need sight?

First, let's answer this fundamental question by tracing back to why we require sight. We need it because it makes decision-making easier when reacting to a situation or observing our environment (which in turn helps with navigating). The same goes for robots. Now, if the goal is purely to navigate around, you don't strictly need vision on the robot; there are other ways to perceive the surroundings. But sight/vision helps stitch together various key factors that depend on one another. We can narrow those key factors down to: (i) observation, (ii) depth estimation, (iii) obstacle detection, localization, and recognition, and (iv) navigation.

What makes robots see?

The answer is pretty straightforward: cameras. There are plenty of camera sensors available that provide visual information similar to how we comprehend the scenes around us (light → electrical signal → image → data). There are also plenty of resources comparing how cameras and our eyes achieve the same result [2]. But a camera's insight ends with a 2D (RGB/grayscale) representation of a 3D scene. The processing needs to happen elsewhere to perceive something useful from it, such as depth. There is no direct way to recover depth just from the output of a monocular RGB camera. This is where stereo cameras come into the fray.
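
To make that concrete, here is a minimal sketch (assuming OpenCV is installed and a webcam enumerates as device 0) showing that a monocular camera hands you nothing more than a 2D grid of colour values:

```python
import cv2

# Open the first attached camera (device index 0 is an assumption;
# your robot's camera may enumerate differently).
cap = cv2.VideoCapture(0)

ok, frame = cap.read()
if ok:
    # The entire "sight" of a monocular camera is this array:
    # height x width x 3 colour channels, with no depth information.
    print("Frame shape:", frame.shape)   # e.g. (480, 640, 3)
    print("Centre pixel:", frame[frame.shape[0] // 2, frame.shape[1] // 2])

cap.release()
```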


Stereo cameras are much like our eyes: they rely on epipolar geometry [3] to find correspondences between two viewing frames and triangulate depth. The output of this type of sensor is extremely useful for navigation and spatial awareness.
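
As a rough sketch of that idea (assuming OpenCV is installed and you already have a rectified left/right image pair, here called `left.png` and `right.png`, plus illustrative focal length and baseline values), block matching gives a disparity map that converts to depth via Z = f·B/d:

```python
import cv2
import numpy as np

# Rectified left/right images (placeholder file names for illustration).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Simple block matcher: searches along epipolar lines (image rows, once
# rectified) for matching patches and reports the pixel shift (disparity).
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point to pixels

# Illustrative calibration values -- in practice these come from your
# camera's calibration, not from this snippet.
focal_px = 700.0      # focal length in pixels
baseline_m = 0.12     # distance between the two lenses in metres

# Triangulation: depth Z = f * B / disparity (valid where disparity > 0).
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]
print("Median depth of valid pixels (m):", np.median(depth_m[valid]))
```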



Along those lines, lidars [5], sonars, ToF (time-of-flight) cameras, and IR (infrared) cameras with a projector are all good choices when depth is of the essence in enabling your robot to see. Similar techniques are common in other mammals like dolphins, some whales, and bats (a concept called echolocation [4], where waves are emitted and the transmit/receive delay is used to estimate obstacles' positions).
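
The core of all these ranging techniques is the same round-trip timing idea. Here is a tiny sketch (the timing values are made up purely for illustration) of converting a transmit/receive delay into distance, for both a sonar pulse and a lidar pulse:

```python
# Round-trip ranging: the pulse travels to the obstacle and back,
# so distance = (propagation speed * elapsed time) / 2.

SPEED_OF_SOUND_AIR = 343.0      # m/s, roughly, at room temperature
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_echo(round_trip_s: float, speed: float) -> float:
    """Distance to the reflecting obstacle, given the round-trip time."""
    return speed * round_trip_s / 2.0

# Illustrative numbers only:
print(range_from_echo(0.01, SPEED_OF_SOUND_AIR))   # sonar echo after 10 ms -> ~1.7 m
print(range_from_echo(66e-9, SPEED_OF_LIGHT))      # lidar return after 66 ns -> ~9.9 m
```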


In most cases, two or more sensors are clubbed together to achieve a purpose or solve a problem, because each sensor is better at something than the others and accomplishes some functionality inherently rather than through processing. You would have heard of the Kinect [6] when it launched as an accessory for the Xbox, but hardly anyone predicted it would be used for anything else. It became one of the most widely used research tools for computer vision and robot perception. The sensor itself has an RGB camera, an IR projector and camera, an LED, and a microphone array; sensors of this type, with depth perception built in, are called RGB-D sensors. There are a few others, such as the Bumblebee stereo camera, the Intel RealSense, and the Stereolabs ZED camera, which are quite handy for enabling sight in your robot.
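
An RGB-D sensor pairs every colour pixel with a depth reading, which makes it straightforward to lift pixels into 3D. Here is a minimal sketch (the intrinsics below are illustrative, not a real Kinect calibration, and the depth image is assumed to be in millimetres) of back-projecting a depth map into a point cloud with the pinhole model:

```python
import numpy as np

# Illustrative pinhole intrinsics (focal lengths and principal point, in pixels).
# Real values come from the sensor's factory calibration.
fx, fy = 525.0, 525.0
cx, cy = 319.5, 239.5

def depth_to_points(depth_mm: np.ndarray) -> np.ndarray:
    """Back-project an (H, W) depth image in millimetres into an (N, 3) point cloud in metres."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0   # mm -> m
    x = (u - cx) * z / fx                      # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]            # drop pixels with no depth reading

# Fake 480x640 depth frame, everything 1.5 m away, just to show the shapes involved.
cloud = depth_to_points(np.full((480, 640), 1500, dtype=np.uint16))
print(cloud.shape)   # (307200, 3)
```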


Our everyday robot vacuum cleaners use similar technology, albeit more conventionally, to achieve what they do. However, the perception and navigation side of it is quite compute-intensive, and I will explain it later in this series.



How do you enable sight in robots?

It is all well and good that you can plug those cameras into your laptop or desktop and get vital information about the surroundings they are scanning and observing. However, it only becomes useful for a robot if the solution is portable: the computer the sensors plug into has to be dispatched out there, somewhere remote or away from your reach, mounted on fabricated parts that can locomote (walk, crawl, or ride on wheels) with the help of interconnected electronic circuitry.


These portable computers have grown more robust over the years. It started bare-metal, with a microcontroller alongside various microprocessors and peripheral enablers like motor controllers, crystal oscillators (or something with clock capabilities), LED/LCD screens, voltage regulators, and a power system. Explaining each system and its operation is a discipline of study on its own and would really make this article more verbose (and boring! unless you like that sort of thing, 😂). The most important piece here is the microcontroller, though nowadays an entire OS-enabled NUC (Next Unit of Computing) is usually preferred (unless you are from the space industry 😅). This acts as the robot's brain, ingesting vision data from the connected sensors and running additional smarts on top of it for perception. The most commonly used boards are the Raspberry Pi, Intel NUCs, the Nvidia Jetson, etc.
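
As a toy sketch of what that "brain" does at its simplest (assuming OpenCV is available on the board and a USB camera enumerates as device 0; the processing and action steps here are stand-ins), it boils down to a loop of grab a frame, process it, act on it:

```python
import cv2

# Minimal sense -> process -> act loop: the skeleton of what runs on the
# robot's on-board computer.
cap = cv2.VideoCapture(0)   # device index 0 is an assumption

try:
    for _ in range(100):    # bounded loop instead of while True, for the example
        ok, frame = cap.read()
        if not ok:
            break
        # Stand-in "perception": reduce the frame to intensities.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        brightness = gray.mean()
        # Stand-in "action": a real robot would command motors here.
        print(f"mean brightness: {brightness:.1f}")
finally:
    cap.release()
```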

Conclusion

I haven't explained how the image itself is computed or how pixel-level processing works; I will cover a brief introduction to that in the perception article, which is to follow. We saw what enables sight in robots and the various ways and methods of enabling it. Next, we will look at the output of these sensors and how the data is processed for robots to discern the environment around them.

References:

[1] https://www.brainfacts.org/thinking-sensing-and-behaving/vision/2012/vision-processing-information

[2] https://letstalkscience.ca/educational-resources/stem-in-context/eye-vs-camera

[3] https://en.wikipedia.org/wiki/Epipolar_geometry

[4] https://www.britannica.com/science/echolocation

[5] https://au.mathworks.com/discovery/lidar.html

[6] https://en.wikipedia.org/wiki/Kinect
