Robotic Personal Assistant

Autonomous robot with computer vision, SLAM, and conversational AI capabilities

Project Lead | Robotics Club, IIT Madras | 2020 - 2021
Team Size: 8 members
Hardware: NVIDIA Jetson Nano, Arduino, ROS-compatible robot chassis

Project Overview

Developed an autonomous robotic personal assistant capable of navigating indoor environments, recognizing objects and people, understanding natural language commands, and providing product recommendations. The system integrates computer vision, SLAM, conversational AI, and e-commerce APIs into a cohesive platform.

Core Capabilities

1. Computer Vision Module

Real-time Face & Emotion Detection:

  • Implemented using OpenCV with Haar feature-based cascade classifiers
  • Detects multiple faces simultaneously in a video stream
  • Classifies emotions: happy, sad, angry, neutral, surprised
  • Runs at 15-20 FPS on the Jetson Nano (see the sketch below)
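
A minimal sketch of this detection loop, assuming OpenCV's bundled frontal-face Haar cascade; the emotion classifier is a separately trained model, so it appears only as a hypothetical emotion_model in a comment:

import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect all faces; scaleFactor and minNeighbors trade recall
    # against the 15-20 FPS budget on the Jetson Nano
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Hypothetical emotion classifier applied to each face crop:
        # label = emotion_model.predict(gray[y:y + h, x:x + w])
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()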

Object Detection & Identification:

  • Deployed YOLO (You Only Look Once) for real-time object detection
  • Custom-trained on household items and common objects
  • Anchor boxes as bounding-box priors for objects of varying scale and aspect ratio
  • Non-maximum suppression (NMS) to filter overlapping detections
  • Intersection over Union (IoU) to score bounding-box overlap for suppression and evaluation (see the sketch below)
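
A short sketch of the NMS step, written against NumPy arrays of [x1, y1, x2, y2] boxes; this is the standard greedy algorithm, not the project's exact code:

import numpy as np

def iou(box, boxes):
    # Vectorized IoU of one box against many ([x1, y1, x2, y2] format)
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop any remaining
    # box whose IoU with it exceeds the threshold, and repeat
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = int(order[0])
        keep.append(i)
        overlap = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlap < iou_thresh]
    return keep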

2. Navigation & Mapping

ORB-SLAM2 with Dynamic Environment Handling:

  • Integrated ORB-SLAM2 for visual odometry and mapping
  • Enhanced with a probabilistic deep learning model to filter dynamic objects (see Key Algorithms below)
  • Maintains stable maps despite moving people and objects
  • Achieves robust localization in challenging indoor scenarios

ROS Integration:

  • Used RViz for real-time visualization of robot state and map
  • Simulated and tested in Gazebo before hardware deployment
  • Implemented path planning for autonomous navigation on top of the ROS Navigation Stack (see the sketch below)
  • Multi-environment mapping: stores and switches between multiple saved maps
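
For illustration, a minimal rospy client that sends a goal to the Navigation Stack's move_base action server; this is the standard ROS 1 pattern, with placeholder coordinates rather than the project's actual waypoints:

import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

# Connect to the move_base action server provided by the Navigation Stack
rospy.init_node("send_nav_goal")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

# Build a goal pose in the map frame (coordinates are placeholders)
goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 2.0
goal.target_pose.pose.position.y = 1.5
goal.target_pose.pose.orientation.w = 1.0  # no rotation

client.send_goal(goal)
client.wait_for_result()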

3. Conversational AI & Recommendations

Intent-based Chatbot:

  • Natural language understanding for user commands
  • Context-aware conversation management
  • Integration with robot actions such as navigation and object finding (a toy intent matcher is sketched below)
  • Voice input/output for hands-free interaction
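
A toy stand-in for the intent classifier, using keyword matching in place of the project's NLU model; the intent names and trigger phrases are invented for the example:

# Map trigger phrases to intents; the real system used a trained
# NLU model rather than keyword lookup
INTENTS = {
    "navigate": ["go to", "take me to", "move to"],
    "find_object": ["find", "where is", "look for"],
    "recommend": ["recommend", "suggest", "deal on"],
}

def classify_intent(utterance):
    text = utterance.lower()
    for intent, phrases in INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "fallback"  # hand off to clarification dialogue

print(classify_intent("Find me a good laptop under $1000"))  # find_object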

Product Recommendation System:

  • Web scraping Amazon for product data (prices, reviews, ratings)
  • Collaborative filtering for personalized recommendations (sketched below)
  • Conversational interface: “Find me a good laptop under $1000”
  • Real-time price comparison and deal alerts
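
A compact sketch of item-based collaborative filtering on a toy user-product rating matrix; the data and the predict helper are illustrative, not the project's recommender:

import numpy as np

# Rows = users, columns = products; 0 means "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Cosine similarity between product columns
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / (np.outer(norms, norms) + 1e-9)

def predict(user, item):
    # Average the user's existing ratings, weighted by how similar
    # each rated product is to the target product
    rated = R[user] > 0
    return R[user, rated] @ sim[item, rated] / (sim[item, rated].sum() + 1e-9)

print(predict(1, 1))  # predicted rating of user 1 for product 1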

Technical Implementation

Hardware Setup

  • Processing: NVIDIA Jetson Nano (GPU acceleration for vision tasks)
  • Sensors: RGB-D camera, IMU, wheel encoders
  • Actuators: Differential drive motors controlled via Arduino
  • Power: Custom battery management system

Software Stack

  • Vision: OpenCV, PyTorch (YOLO), TensorFlow
  • SLAM: ORB-SLAM2, ROS Navigation Stack
  • Simulation: Gazebo, RViz
  • ML Frameworks: PyTorch, TensorFlow
  • Languages: Python (primary), C++ (performance-critical components)

Key Algorithms

Dynamic Object Filtering:

# Probabilistic model for static vs. dynamic feature classification
# (sketch: extract_orb_features and dl_model are project components,
# so their internals are not shown here)
def filter_dynamic_objects(frame, dl_model, threshold=0.5):
    # Extract ORB feature descriptors from the current frame
    features = extract_orb_features(frame)      # shape (N, D)

    # Predict per-feature probability of lying on a dynamic object
    dynamic_probs = dl_model.predict(features)  # shape (N,)

    # Keep only features likely to belong to static structure, so SLAM
    # tracks walls and furniture rather than moving people
    static_features = features[dynamic_probs < threshold]

    return static_features

Multi-Environment Mapping:

  • Save map on user command
  • Load the appropriate map based on GPS/WiFi fingerprinting (see the sketch below)
  • Seamless switching without re-initialization
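
One way the map-selection step could look, matching the current WiFi scan against a stored fingerprint per map; the BSSIDs, RSSI values, and map names below are all made up:

# Each saved map keeps a WiFi fingerprint: {BSSID: typical RSSI in dBm}
SAVED_MAPS = {
    "lab.yaml": {"aa:bb:cc:01": -42, "aa:bb:cc:02": -60},
    "lounge.yaml": {"aa:bb:cc:03": -38, "aa:bb:cc:02": -55},
}

def match_map(current_scan):
    # Score each map by mean squared RSSI difference over shared BSSIDs;
    # the map with the lowest score is loaded for localization
    def distance(stored):
        shared = set(stored) & set(current_scan)
        if not shared:
            return float("inf")
        return sum((stored[b] - current_scan[b]) ** 2 for b in shared) / len(shared)
    return min(SAVED_MAPS, key=lambda name: distance(SAVED_MAPS[name]))

print(match_map({"aa:bb:cc:01": -45, "aa:bb:cc:02": -62}))  # lab.yaml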

Challenges & Solutions

  • Challenge: Dynamic environments confusing SLAM
    Solution: Integrated a DL model to identify and filter dynamic objects
  • Challenge: Limited computational resources
    Solution: Optimized models and used TensorRT for inference acceleration
  • Challenge: Real-time performance requirements
    Solution: Parallel processing pipelines and efficient data structures
  • Challenge: Integration of multiple subsystems
    Solution: Robust ROS architecture with well-defined interfaces

Demonstrations & Results

Performance Metrics:

  • Face detection: 95% accuracy, 18 FPS
  • Object detection: 88% mAP, 12 FPS
  • SLAM localization error: < 5 cm in a typical indoor environment
  • Navigation success rate: 92% (reaching target without collision)

User Study:

  • Tested with 25 users over 2 weeks
  • 4.2/5 average satisfaction rating
  • Most appreciated features: object finding, natural conversation

Future Enhancements

  • Integration of LLMs for more natural conversations
  • Improved manipulation capabilities (robotic arm)
  • Multi-robot coordination for larger spaces
  • Cloud connectivity for expanded product database
  • Privacy-preserving face recognition

Impact

This project served as:

  • Educational platform for Robotics Club members to learn ROS, computer vision, and system integration
  • Competition entry at inter-college robotics competitions (2nd place at RoboFest 2021)
  • Foundation for subsequent research in autonomous systems