Scene Understanding for Multi-Purpose AMRs


Robots have been developing rapidly and are now able to handle complex tasks autonomously. Going forward, robots are expected to perform new tasks alongside humans in a changing landscape, where these tasks must be carried out intelligently under unpredictable circumstances. To address this technology gap, the Digital Services Lab (DSL) introduced Scene Understanding - a collection of artificial intelligence (AI) models that can be integrated into Autonomous Mobile Robots (AMRs).

Scene Understanding makes use of AI algorithms to provide visual feedback to AMRs. The visual capabilities offered by Scene Understanding open new possibilities for more insightful decisions by AMRs. For example,

  1. Enabling AMRs to avoid obstacles such as human crowds and hazardous zones.
  2. Allowing AMRs to be deployed for UV-light disinfection applications that require people detection as a safety measure.

The current version of Scene Understanding consists of the following technologies:

  • Human Detection

    DSL’s liveness detection model, embedded in the Human Detection module, allows AMRs to quickly differentiate in-person humans from figures on posters. This feature helps AMRs operate safely by ensuring no humans are in the vicinity before starting UV-light disinfection.

  • Face Mask Detection

    DSL’s face mask detection model is able to detect a wide variety of mask types and viewing angles.  Furthermore, the model is pre-trained with images taken in poor lighting conditions and at varying resolutions.  This enables the model to yield better accuracy in real-world use cases.

  • Human Distance Estimation     

    To estimate the distance between humans, PoseNet is used to detect body keypoints - similar to the algorithms used in motion-sensing video game controllers.  Torso length is then used to estimate the distance between people.  This model works well even when detecting people who are seated.
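The idea above can be sketched with a simple pinhole-camera model. This is an illustrative reconstruction, not the Scene Understanding source: the focal length and principal point reuse the demo page's preset intrinsics (Fx=Fy=813, centerX=302), while the 0.5 m average torso length and the function names are assumptions.

```python
import math

# Preset intrinsics from the demo page; torso length is an assumption.
FOCAL_PX = 813.0      # focal length in pixels (Fx = Fy = 813)
CENTER_X = 302.0      # principal point x-coordinate in pixels
TORSO_LEN_M = 0.5     # assumed average torso length in metres

def depth_from_torso(torso_px: float) -> float:
    """Camera-to-person depth from the apparent torso length in pixels."""
    return FOCAL_PX * TORSO_LEN_M / torso_px

def person_position(x_px: float, torso_px: float) -> tuple:
    """(lateral, depth) position in metres, back-projected from pixels."""
    z = depth_from_torso(torso_px)
    x = (x_px - CENTER_X) * z / FOCAL_PX
    return x, z

def distance_between(p1, p2) -> float:
    """Euclidean distance between two (x_px, torso_px) detections."""
    (x1, z1), (x2, z2) = person_position(*p1), person_position(*p2)
    return math.hypot(x1 - x2, z1 - z2)
```

Using torso length rather than full body height is what keeps the estimate stable for seated people, since the torso remains visible and roughly constant in size.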

  • Lift (Elevator) Entering Procedures

    This module uses video feeds to detect humans and pets.  For safety reasons, lifts are dedicated to AMRs only after being left idle for N seconds. There are two different procedures for AMRs to enter lifts:
    • Outside the lift - AMRs ensure no humans or pets are inside the lift before boarding
    • Inside the lift - AMRs ensure no humans or pets are inside the lift before closing lift doors
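The two checks above can be summarised in a short sketch. This is illustrative only: `IDLE_SECONDS` stands in for the configurable "N seconds", and `lift_is_clear` is a hypothetical stand-in for the human/pet detection result.

```python
IDLE_SECONDS = 10  # assumed value for "N seconds"

def lift_dedicated(last_activity_ts: float, now: float) -> bool:
    """A lift is dedicated to AMRs only after idling for N seconds."""
    return now - last_activity_ts >= IDLE_SECONDS

def may_board(lift_is_clear: bool, last_activity_ts: float, now: float) -> bool:
    """Outside the lift: board only if the lift is idle and no humans or pets are inside."""
    return lift_dedicated(last_activity_ts, now) and lift_is_clear

def may_close_doors(lift_is_clear: bool) -> bool:
    """Inside the lift: close doors only if no humans or pets remain inside."""
    return lift_is_clear
```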


We invite Technology Partners and Developers in the robotics sector to augment and enhance their current technological offerings with our Scene Understanding AI models. Interested parties are welcome to experiment with integrating Scene Understanding into their own AMRs. The benefits are as follows:

  • Complimentary royalty-free source code with approved Technology Licensing Agreement (TLA)
  • Easy and straightforward deployment with Docker
  • Support for ARM64 on NVIDIA edge GPUs
  • Rigorous testing for all modules
  • Stringent release cycle to ensure memory consistency and low latency

Scene Understanding-enhanced AMRs may be used in the following applications:

  • COVID-19 Safety Precautions
    • Enforce wearing of face mask through Face Mask Detection
    • Ensure social distancing through Human Distance Estimation
    • Operate UV light disinfection equipment through Human Detection

  • Infrastructure
    • Provide in-building delivery services through Lift Entering Procedures, which enable AMRs to use elevators within high-rise buildings, hotels or malls to deliver parcels, luggage or shopping bags


Scene Understanding Demos
  • Human Detection

    This demo uses machine learning to detect and count humans within a camera frame.  Successfully detected humans are indicated in blue.

    Click here to access the demo site.
  • Face Mask Detection

    This demo uses machine learning to detect human faces wearing masks within a camera frame.  A detected face wearing a face mask is indicated in green.  A face detected without a face mask is indicated in red.

    Click here to access the demo site.
  • Human Distance Estimation

    This demo calculates the distances among detected humans.
    • The demo page currently uses the preset parameters Fx=813, Fy=813 and centerX=302.
    • Results may vary if different camera specifications are used.
    • Camera calibration may be adjusted to yield more accurate estimates.

    Click here to access the demo site.
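The calibration note above can be illustrated with a minimal pinhole-model sketch: the same detection yields different depth estimates under different focal lengths, which is why the preset intrinsics should be replaced with your camera's calibrated values. The 0.5 m torso length and the 120-pixel measurement are assumptions for illustration; Fy=813 is the demo's preset.

```python
def estimated_depth(fy: float, torso_px: float, torso_m: float = 0.5) -> float:
    """Pinhole-model depth: proportional to focal length, inverse to pixel size."""
    return fy * torso_m / torso_px

# The same 120-pixel torso under two different focal lengths:
preset = estimated_depth(813.0, 120.0)    # demo preset Fy=813 -> about 3.39 m
other = estimated_depth(1000.0, 120.0)    # longer focal length -> about 4.17 m
```

Because the estimate scales linearly with Fy, an uncalibrated camera biases every distance by the same factor.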

Scene Understanding Components

Scene Understanding is a suite of technologies licensed under IMDA’s technology licensing terms. Technology Partners are welcome to contact the Digital Services Lab for the relevant licences to access the source code.

  • Human Detection
  • Face Mask Detection
  • Human Distance Estimation

Click here (166.31KB) for more information.


1. What are the terms of use for Scene Understanding?

If you are keen to obtain the source code for Scene Understanding, please reach out to us at the email address provided below. The transfer of the source code will require a Technology Licensing Agreement (TLA) with IMDA.  For more information on terms & conditions, kindly refer to the form here (199.69KB).

2. Is there a fee to pay in order to use Scene Understanding?

No. The source code is provided free of charge with an approved TLA.

3. What kind of technology license is offered by Digital Services Lab?

The technology license is perpetual, non-exclusive and sub-licensable.

4. What are the obligations of the licensee?

By signing the Technology Licensing Agreement, the licensee commits to using, commercialising, and propagating the technologies.

5. How do the Scene Understanding modules perform?

All modules undergo rigorous testing and are benchmarked against well-known models. You may find more information on performance details here.

6. How to deploy the Scene Understanding modules?

All the modules are containerised using Docker, so deployment is easy and straightforward and can be executed with a few commands. There is also support for ARM64 deployment, specifically for edge deployment on NVIDIA GPUs.



For further enquiries on Scene Understanding for Multi-Purpose AMRs, please contact

Last updated on: 12 Jan 2021