Robotics Notes

1. Core Concepts

This section introduces the fundamental definition of a robot and its core characteristics. Understanding these foundational ideas is the first step to analyzing any robotic system, from a simple automated arm to a complex autonomous vehicle.

What is a Robot?

A robot is a programmable machine designed to perform a series of actions autonomously or semi-autonomously. It interacts with the physical world through a continuous cycle of Sensing, Thinking (Computation), and Acting.

Types of Robots

Robots come in many forms, depending on their application and environment. Each type is optimized for specific tasks.

2. Hardware: Sensors & Actuators

A robot's hardware defines its ability to perceive and interact with the world. This section covers the "senses" (sensors) and "muscles" (actuators) of a robot. Selecting the right hardware is a critical design trade-off based on the task and environment.

Sensors: The Robot's Senses

Actuators: The Robot's Muscles

Actuators are the components that convert control signals into physical motion, allowing a robot to move, lift, rotate, grasp, or otherwise interact with its surroundings. The choice of actuator depends on the required precision, power, speed, and environmental conditions of the task.

Electric Actuators

Electric actuators are the most common type in robotics due to their precision, cleanliness, and ease of control. They convert electrical energy into mechanical motion.

DC Motors

Working Principle: DC (Direct Current) motors convert electrical energy into mechanical energy through the interaction of magnetic fields. When current flows through a coil (armature) placed in a magnetic field, it experiences a force that causes it to rotate continuously.

Control: Speed and direction are typically controlled by varying the input voltage or using Pulse Width Modulation (PWM).

Pros: Simple to control, high speed, continuous rotation, relatively inexpensive.
Cons: Less precise position control without external feedback (e.g., encoders), can be noisy, brushes wear out in brushed DC motors.
Applications: Driving wheels in mobile robots, fans, pumps, continuous rotation applications where precise angular positioning is not critical.

Servo Motors

Working Principle: A servo motor is a closed-loop system consisting of a DC motor, a gear reduction unit, a position sensor (potentiometer or encoder), and a control circuit. It receives a control signal (PWM) and rotates to a specific angular position, maintaining that position even under varying loads.

Control: Controlled by the width of a PWM pulse, which dictates the desired angle. The internal feedback loop continuously adjusts the motor to reach and hold the commanded position.

Pros: Precise angular position control, high torque at low speeds, compact.
Cons: Limited range of rotation (typically 0-180°; continuous-rotation servos remove the limit but give up position control), can "hunt" for position, more complex than simple DC motors.
Applications: Robotic arms (joint actuators), pan-tilt camera systems, grippers, steering mechanisms.
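
As a rough illustration of the pulse-width control described above, the sketch below maps a target angle to a pulse width. The 1.0 to 2.0 ms pulse range, the 50 Hz frame rate, and the function names are assumptions typical of hobby servos, not values from these notes; a real device's datasheet takes precedence.

```python
def angle_to_pulse_ms(angle_deg, min_ms=1.0, max_ms=2.0, max_angle=180.0):
    """Linearly map a servo angle to a pulse width in milliseconds.

    min_ms, max_ms and max_angle are assumed, hobby-servo style defaults.
    """
    # Clamp the request to the servo's mechanical range.
    angle = max(0.0, min(max_angle, angle_deg))
    return min_ms + (max_ms - min_ms) * angle / max_angle


def duty_cycle(pulse_ms, frame_hz=50.0):
    """Fraction of each PWM frame during which the signal is high."""
    period_ms = 1000.0 / frame_hz
    return pulse_ms / period_ms


pulse = angle_to_pulse_ms(90.0)   # mid-range angle
duty = duty_cycle(pulse)          # share of the assumed 20 ms frame
```

Under these assumptions the servo's internal feedback loop, not the caller, is responsible for reaching and holding the commanded angle; the code only produces the command.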

Stepper Motors

Working Principle: Stepper motors divide a full rotation into a number of equal steps. They move one step at a time by energizing specific coil windings in a sequence. This allows for very precise open-loop position control.

Control: Controlled by sending a sequence of electrical pulses to the motor coils. Each pulse causes the motor to rotate by one step.

Pros: Excellent open-loop position accuracy (no feedback needed for basic operation), high holding torque when stationary, robust.
Cons: Can lose steps under heavy loads or high speeds, lower torque at high speeds, consumes power even when stationary, can be noisy.
Applications: 3D printers, CNC machines, plotters, precision positioning systems, camera sliders.
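
The open-loop positioning idea can be sketched as a simple pulse count. The 200 steps per revolution (1.8° per step) and the microstepping parameter are illustrative assumptions, not figures from these notes.

```python
def steps_for_angle(angle_deg, steps_per_rev=200, microsteps=1):
    """Return (pulse count, angle actually reached) for a requested rotation.

    steps_per_rev=200 is an assumed, commonly seen value (1.8 degrees per step).
    """
    step_angle = 360.0 / (steps_per_rev * microsteps)
    steps = round(angle_deg / step_angle)   # only whole steps can be commanded
    achieved = steps * step_angle
    return steps, achieved


steps, achieved = steps_for_angle(90.0)      # a quarter turn
fine_steps, _ = steps_for_angle(90.0, microsteps=16)
```

Because there is no feedback, the controller simply assumes every pulse produced a step; if the motor stalls or skips, the counted position and the real position diverge.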

Comparison of Electric Actuators

Aspect	DC Motor	Servo Motor	Stepper Motor
Primary Use	Continuous rotation, speed control	Precise angular positioning	Precise step-by-step positioning
Control Type	Open-loop (speed), Closed-loop (position with encoder)	Closed-loop (internal feedback)	Open-loop (position)
Precision	Low (without feedback)	High	High (for steps)
Speed	High	Moderate	Low to Moderate
Torque	Variable (depends on load)	High at low speeds	High holding torque, drops at speed
Cost	Low	Moderate	Moderate

Other Actuator Types

Hydraulic Actuators: Use incompressible fluid (oil) under pressure to generate high forces and torques. They offer high power density and are suitable for heavy-duty applications like excavators and large industrial robots. However, they can be messy, require pumps and reservoirs, and offer less precise control compared to electric motors.
Pneumatic Actuators: Use compressed air to generate linear or rotary motion. They are known for being fast, simple, and clean, often used for on/off actions like opening/closing grippers or simple pressing tasks. Their drawbacks include less precise control, the need for compressors and air reservoirs, and potential noise.

3. Control Systems

Control systems are the brainstem of a robot, translating high-level goals into low-level actions. This section explores how robots regulate their behavior, comparing simple systems with more advanced, adaptive ones, and dives into the most common controller in robotics: the PID.

Open-Loop vs. Closed-Loop Control

Theoretical Overview

A Control System is a mechanism that manages or regulates the behavior of other devices to achieve a desired output.

Open-Loop Control Systems operate without feedback. Actions are based purely on preset commands, meaning they cannot correct errors or adapt to disturbances. They are simple and cost-effective but less accurate.

Closed-Loop Control Systems (feedback systems) continuously monitor the output via sensors and compare it to the desired setpoint. The difference (error) is used to adjust control actions, making them accurate, robust, and adaptive to dynamic environments, though more complex and costly. Negative feedback is predominantly used to reduce error and stabilize the system.

Comparison Summary

Aspect	Open-Loop	Closed-Loop
Feedback	No	Yes
Accuracy	Less	More
Adaptability	Low	High
Complexity	Simple	Complex
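
To make the comparison concrete, here is a minimal simulation sketch. The first-order plant dy/dt = -y + u + disturbance, the gain, and all numeric values are assumptions chosen for illustration only.

```python
def simulate(closed_loop, setpoint=1.0, disturbance=-0.3, kp=5.0,
             steps=2000, dt=0.01):
    """Euler simulation of an assumed plant: dy/dt = -y + u + disturbance."""
    y = 0.0
    for _ in range(steps):
        if closed_loop:
            # Preset command plus a correction proportional to the measured error.
            u = setpoint + kp * (setpoint - y)
        else:
            # Preset command only; the output is never measured.
            u = setpoint
        y += dt * (-y + u + disturbance)
    return y


open_loop_output = simulate(closed_loop=False)    # settles near 0.70
closed_loop_output = simulate(closed_loop=True)   # settles near 0.95
```

With the same disturbance, the open-loop output settles well short of the setpoint, while feedback shrinks the error by a factor of 1 + kp. The remaining offset is the steady-state error that a purely proportional correction leaves behind.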

PID Controller Tuning

The aim of tuning is to reach the setpoint quickly with minimal overshoot and oscillation. Each gain has a primary effect on the system response:

Proportional (Kp): Reduces rise time

Integral (Ki): Eliminates steady-state error

Derivative (Kd): Reduces overshoot

PID Control Equation

The total PID output $u(t)$ is a sum of the proportional, integral, and derivative terms:

$u(t) = K_p \cdot e(t) + K_i \cdot \int_0^t e(\tau)d\tau + K_d \cdot \frac{de(t)}{dt}$

Where $e(t)$ is the error (setpoint minus measured output), $K_p$ is proportional gain, $K_i$ is integral gain, and $K_d$ is derivative gain.
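
A discrete-time version of this equation might look like the sketch below. The class name, the rectangular integration, the backward-difference derivative, and the test plant dy/dt = -y + u are all assumptions made for illustration; the error is taken as setpoint minus measurement.

```python
class PID:
    """Minimal discrete PID sketch (no anti-windup, no derivative filtering)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt                    # approximates the integral term
        if self.prev_error is None:
            derivative = 0.0                           # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def run(pid, setpoint=1.0, steps=3000, dt=0.01):
    """Drive an assumed first-order plant (dy/dt = -y + u) and return the final output."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y, dt)
        y += dt * (-y + u)
    return y


p_only = run(PID(2.0, 0.0, 0.0))      # settles near 2/3: steady-state error remains
full_pid = run(PID(2.0, 1.0, 0.05))   # settles near 1.0: the integral term removes the offset
```

On this particular plant the proportional-only controller stalls below the setpoint, while adding the integral term brings the output to the setpoint. Real controllers usually add integral clamping and a filtered derivative, which this sketch omits.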

Understanding PID Components

The Proportional (P) Term ($K_p \cdot e(t)$) provides an immediate response to the current error. Increasing $K_p$ makes the system respond faster but can increase overshoot and degrade stability. It typically leaves a small steady-state error.

The Integral (I) Term ($K_i \cdot \int_0^t e(\tau)d\tau$) addresses accumulated past errors. Its primary role is to eliminate steady-state error. Increasing $K_i$ helps remove offsets but can increase overshoot and settling time, potentially leading to instability.

The Derivative (D) Term ($K_d \cdot \frac{de(t)}{dt}$) anticipates future error trends by reacting to the rate of change of error. Increasing $K_d$ reduces overshoot and improves settling time, thereby enhancing stability. However, it is sensitive to measurement noise.

PID Tuning Methods

Manual Tuning (Trial-and-Error): Start with $K_i=0, K_d=0$. Increase $K_p$ until oscillations begin. Then, add $K_i$ to eliminate steady-state error, and finally adjust $K_d$ to reduce overshoot and improve settling.

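Ziegler-Nichols Method: An empirical method where, with $K_i=0$ and $K_d=0$, $K_p$ is increased until the output oscillates with constant amplitude. The gain at that point is the ultimate gain ($K_u$) and the oscillation period is the ultimate period ($T_u$). These two values are then used with a predefined table to calculate starting PID gains, which usually need further fine-tuning.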

Controller Type	Kp	Ki	Kd
P	$0.5K_u$	–	–
PI	$0.45K_u$	$1.2K_p/T_u$	–
PID	$0.6K_u$	$2K_p/T_u$	$K_pT_u/8$
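
The table translates directly into a small helper. The function name and the dictionary layout are illustrative choices, and the example values of $K_u$ and $T_u$ are made up.

```python
def ziegler_nichols(ku, tu, kind="PID"):
    """Return gains from the classic Ziegler-Nichols table for ultimate gain ku and period tu."""
    if kind == "P":
        return {"kp": 0.5 * ku, "ki": 0.0, "kd": 0.0}
    if kind == "PI":
        kp = 0.45 * ku
        return {"kp": kp, "ki": 1.2 * kp / tu, "kd": 0.0}
    if kind == "PID":
        kp = 0.6 * ku
        return {"kp": kp, "ki": 2.0 * kp / tu, "kd": kp * tu / 8.0}
    raise ValueError(f"unknown controller type: {kind}")


# Hypothetical measurement: sustained oscillation at Ku = 10 with a 2 s period.
gains = ziegler_nichols(10.0, 2.0)   # kp = 6.0, ki = 6.0, kd = 1.5
```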

Cohen-Coon Method: Suitable for first-order plus time-delay (FOPTD) systems, using an open-loop step response to estimate process gain, time constant, and dead time, from which PID parameters are derived.

4. Robot Kinematics

Kinematics is the geometry of motion. This section explores how we can predict a robot's end-effector position from its joint angles (Forward Kinematics) and, more challengingly, how we can determine the required joint angles to reach a specific target (Inverse Kinematics).

Degrees of Freedom (DOF) & Joints

Degrees of Freedom (DOF) refers to the number of independent movements (translations and rotations) a robot can execute. In 3D space, a rigid body has 6 DOF (3 translational, 3 rotational). The DOF dictates a robot's dexterity and its ability to position and orient its end-effector. A manipulator generally needs at least 6 DOF to place its end-effector at an arbitrary position and orientation in 3D space.

Robot movements are facilitated by different types of joints:

Revolute Joint (R): Permits rotation around a fixed axis. Variable is an angle ($\theta$). Example: elbow joint.
Prismatic Joint (P): Allows linear translation along a fixed axis. Variable is a displacement ($d$). Example: hydraulic piston.

Gruebler-Kutzbach Criterion (for DOF)

For Spatial Mechanisms: $DOF = 6(N - 1 - J) + \sum_{j=1}^{J} f_j$

For Planar Mechanisms: $DOF = 3(N - 1 - J) + \sum_{j=1}^{J} f_j$

Where $N$ = number of links (including base), $J$ = number of joints, $f_j$ = DOF of the $j$-th joint.
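
As a quick worked check of the criterion, the sketch below evaluates it for a few textbook mechanisms. The function name and the chosen examples are illustrative assumptions.

```python
def gruebler_dof(num_links, joint_dofs, spatial=True):
    """Gruebler-Kutzbach mobility.

    num_links  -- N, number of links including the base
    joint_dofs -- list with the DOF f_j of each joint (its length is J)
    spatial    -- True uses the factor 6, False (planar) uses 3
    """
    factor = 6 if spatial else 3
    num_joints = len(joint_dofs)
    return factor * (num_links - 1 - num_joints) + sum(joint_dofs)


# Planar four-bar linkage: 4 links, 4 revolute joints
four_bar = gruebler_dof(4, [1, 1, 1, 1], spatial=False)   # 3*(4-1-4) + 4 = 1

# Planar 2-link arm: base + 2 links, 2 revolute joints
two_link = gruebler_dof(3, [1, 1], spatial=False)         # 3*(3-1-2) + 2 = 2

# Spatial 6R serial arm: base + 6 links, 6 revolute joints
six_r = gruebler_dof(7, [1] * 6, spatial=True)            # 6*(7-1-6) + 6 = 6
```

For any open serial chain, N - 1 equals J, so the first term vanishes and the DOF is simply the sum of the joint DOFs.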

Forward Kinematics (FK)

Forward Kinematics (FK) is the process of calculating the position and orientation (pose) of a robot's end-effector when the values of its joint parameters are known. This is often summarized as converting "Angles to Position." FK is deterministic: for every valid set of input joint parameters, there is a unique and predictable output pose.

Homogeneous Transformation Matrices (HTM)

HTMs are 4x4 matrices that represent both rotation and translation within a single structure, simplifying kinematic computations by allowing chaining of multiple transformations through matrix multiplication.

$T = \begin{bmatrix} R_{3\times3} & d_{3\times1} \\ 0_{1\times3} & 1 \end{bmatrix}$

Where $R$ is a 3x3 rotation matrix and $d$ is a 3x1 translation vector.
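
The chaining property can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and rotations are restricted to the z axis to keep the example short.

```python
import math


def htm_rot_z(angle_rad, translation):
    """4x4 homogeneous transform: rotation about z by angle_rad, then translation d."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(t, point):
    """Apply a homogeneous transform to a 3D point."""
    v = [point[0], point[1], point[2], 1.0]
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))


# Frame A: rotated 90 degrees about z and shifted by (1, 0, 0) relative to the base.
t_base_a = htm_rot_z(math.pi / 2, (1.0, 0.0, 0.0))
# Frame B: shifted by (2, 0, 0) relative to frame A, no rotation.
t_a_b = htm_rot_z(0.0, (2.0, 0.0, 0.0))

# Chaining: the origin of frame B, expressed in base coordinates.
t_base_b = matmul(t_base_a, t_a_b)
origin_b = transform_point(t_base_b, (0.0, 0.0, 0.0))   # approximately (1, 2, 0)
```

The 2-unit offset along frame A's x axis shows up along the base y axis because frame A is rotated by 90 degrees; a single matrix product accounts for both the rotation and the translation.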

Denavit-Hartenberg (DH) Convention

The Denavit-Hartenberg (DH) Method is a standardized procedure to compute Forward Kinematics for serial-link robots. It involves assigning coordinate frames to links, extracting four DH parameters, and chaining DH transformation matrices.

The four DH Parameters for each joint/link pair are:

$\theta_i$ (Joint angle): Rotation about the $z_{i-1}$ axis (variable for revolute joints, constant for prismatic).
$d_i$ (Link offset): Translation along the $z_{i-1}$ axis (variable for prismatic joints, constant for revolute).
$a_i$ (Link length): Distance from the $z_{i-1}$ axis to the $z_i$ axis, measured along the $x_i$ axis.
$\alpha_i$ (Link twist): Angle between the $z_{i-1}$ axis and the $z_i$ axis, measured about the $x_i$ axis.

DH Transformation Matrix

$T_{i-1}^i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\ 0 & \sin\alpha_i & \cos\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}$

The final pose of the end-effector relative to the base is the chained product $T_0^n = T_0^1 \cdot T_1^2 \cdot \ldots \cdot T_{n-1}^n$.
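
A minimal sketch of this procedure, using the DH matrix exactly as written above, is shown below. The 2-link planar arm, its link lengths, and the function names are assumptions chosen for illustration.

```python
import math


def dh_matrix(theta, d, a, alpha):
    """DH transform between consecutive frames, in the classic convention used above."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_kinematics(dh_rows):
    """Chain the DH matrices of all joints, starting from the identity."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, a, alpha in dh_rows:
        pose = matmul(pose, dh_matrix(theta, d, a, alpha))
    return pose


# Hypothetical planar arm: two revolute joints, both links 1.0 long.
# Each row is (theta, d, a, alpha).
arm = [
    (math.radians(90.0), 0.0, 1.0, 0.0),
    (math.radians(-90.0), 0.0, 1.0, 0.0),
]
pose = forward_kinematics(arm)
x, y = pose[0][3], pose[1][3]   # end-effector position, approximately (1, 1)
```

The last column of the resulting matrix holds the end-effector position and the upper-left 3x3 block holds its orientation.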

Inverse Kinematics (IK)

Inverse Kinematics (IK) is the process of determining the joint parameters required for a robot's end-effector to reach a desired position and orientation ("Position to Angles"). It is generally more complex than FK.

Challenges in Inverse Kinematics

Multiple Solutions: A single target pose can have several joint configurations.
No Solution: Target is outside the robot's reachable workspace.
Infinite Solutions (Redundancy): For robots with more DOF than task dimensions.
Nonlinear Equations: Requires complex mathematical solvers.

Inverse Kinematics Solution Methods

Geometric (Trigonometric) Method: Uses laws of sines/cosines, best for simple planar robots.
Algebraic Method: Symbolically inverts FK equations, suitable for low-DOF robots.
Numerical (Iterative) Methods: Use Jacobian inverse or optimization; handles complex geometries and redundancy but requires initial guess and may not always converge.
Learning-Based Methods: Employ ML (e.g., neural networks) to learn solutions, fast inference but can have accuracy/generalization issues.
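
The geometric method can be sketched for a planar 2-link arm using the law of cosines. The function names, the `flip` flag for selecting the second solution, and the link lengths are illustrative assumptions.

```python
import math


def ik_two_link(x, y, l1, l2, flip=False):
    """Joint angles (theta1, theta2) placing a planar 2-link arm's tip at (x, y).

    Returns None when the target is outside the reachable workspace.
    flip selects the mirrored elbow configuration (the second solution).
    """
    r2 = x * x + y * y
    cos_t2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)   # law of cosines
    if abs(cos_t2) > 1.0:
        return None
    t2 = math.acos(cos_t2)
    if flip:
        t2 = -t2
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2), l1 + l2 * math.cos(t2))
    return t1, t2


def fk_two_link(t1, t2, l1, l2):
    """Forward kinematics of the same arm, used here to verify the IK result."""
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))


solution_a = ik_two_link(1.0, 1.0, 1.0, 1.0)              # one elbow configuration
solution_b = ik_two_link(1.0, 1.0, 1.0, 1.0, flip=True)   # the mirrored one
unreachable = ik_two_link(3.0, 0.0, 1.0, 1.0)             # None: beyond l1 + l2
```

The example shows two of the challenges listed earlier: the same target yields two valid joint configurations, and a target beyond the arm's reach yields none.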

5. Robot Navigation & SLAM

Navigation is about getting from point A to point B. This involves not just finding a path (Path Planning) but also figuring out where you are in the first place, especially in an unknown environment (SLAM - Simultaneous Localization and Mapping).

Path Planning: A* vs. RRT

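A* is a search-based algorithm: it expands the nodes of a discretized map (a grid or graph) in order of cost-so-far plus a heuristic estimate of the remaining cost. RRT (Rapidly-exploring Random Tree) is a sampling-based algorithm: it grows a tree from the start by repeatedly sampling random points in free space and extending toward them. Both find a path from the start to the goal while avoiding obstacles.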

Other Path Planning Algorithms

Beyond A* and RRT, other important search-based algorithms include:

Breadth-First Search (BFS): BFS explores a graph level by level, ensuring that all nodes at a given depth are visited before moving on to nodes at the next depth. It uses a queue data structure to manage the order of node visits.
- How it works: Start at the root (or a chosen node) and explore all its immediate neighbors. Then, for each of those neighbors, explore their unvisited neighbors, and so on.
- Optimality: Guarantees the shortest path in terms of number of edges, which is the true shortest path only when the graph is unweighted (all edges have equal cost).
- Use Case: Finding the shortest path in a grid where all movements have the same cost.
Dijkstra's Algorithm: Dijkstra's algorithm finds the shortest paths from a single source node to all other nodes in a graph, given that edge weights are non-negative. It uses a priority queue to efficiently select the unvisited node with the smallest known distance from the source.
- How it works: It maintains a set of visited nodes and a set of unvisited nodes, iteratively selecting the unvisited node with the smallest distance and updating the distances of its neighbors.
- Optimality: Guarantees the shortest path in weighted graphs with non-negative edge weights.
- Use Case: Finding the most efficient route on a map where different road segments have varying travel times (weights).
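
The difference between the two algorithms shows up on a small weighted graph. The sketch below is a minimal version of each; the graph, node names, and function names are made up for illustration.

```python
import heapq
from collections import deque


def bfs_hops(graph, source):
    """Fewest edges from source to every reachable node (edge weights are ignored)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, _weight in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def dijkstra(graph, source):
    """Lowest total weight from source to every reachable node (non-negative weights)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry, a shorter route was found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical road map: node -> list of (neighbor, travel cost).
roads = {
    "A": [("B", 1), ("C", 5)],
    "B": [("C", 1)],
    "C": [("D", 1)],
    "D": [],
}
hops = bfs_hops(roads, "A")    # C is 1 edge away (direct road)
costs = dijkstra(roads, "A")   # C costs 2 via B, cheaper than the direct road (5)
```

BFS prefers the direct A to C edge because it counts edges, while Dijkstra prefers the detour through B because it sums weights.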

A* Heuristics

Manhattan Distance: $h = |x_{current} - x_{goal}| + |y_{current} - y_{goal}|$, suitable for grid-based robots restricted to 4-directional motion.
Euclidean Distance: $h = \sqrt{(x_{current} - x_{goal})^2 + (y_{current} - y_{goal})^2}$, suitable for robots that can move in any direction.
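
A compact A* sketch on a 4-connected grid with the Manhattan heuristic is shown below. The grid encoding ('#' for obstacles), the (row, column) coordinates, and the function names are assumptions for illustration; the function returns only the path length to stay short.

```python
import heapq


def manhattan(a, b):
    """Manhattan distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar(grid, start, goal):
    """Number of moves on the shortest 4-connected path, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    cost = {start: 0}                              # g: cheapest known cost to each cell
    heap = [(manhattan(start, goal), start)]       # ordered by f = g + h
    while heap:
        _f, current = heapq.heappop(heap)
        if current == goal:
            return cost[current]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue                           # outside the map
            if grid[nxt[0]][nxt[1]] == "#":
                continue                           # obstacle
            new_cost = cost[current] + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                heapq.heappush(heap, (new_cost + manhattan(nxt, goal), nxt))
    return None


# Hypothetical map: a wall in column 2 with a gap in the bottom row.
grid = [
    "..#.",
    "..#.",
    "....",
]
moves = astar(grid, (0, 0), (0, 3))                      # 7: down, across the gap, back up
blocked = astar(["..#.", "..#.", "..#."], (0, 0), (0, 3))   # None: the wall is closed
```

On a 4-connected grid with unit move costs the Manhattan distance never overestimates the remaining cost, which is what allows A* to return a shortest path here.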

The SLAM Paradox

SLAM solves the "chicken-and-egg" problem: to build a map, you need to know where you are, but to know where you are, you need a map. It's a continuous cycle of prediction and correction.

SLAM Approaches

Various algorithms address the SLAM problem:

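Extended Kalman Filter (EKF-SLAM): Maintains a single Gaussian state estimate (mean and covariance) by linearizing the motion and measurement models; works well for small maps and mildly nonlinear problems.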
Particle Filter (FastSLAM): Uses multiple hypotheses (particles) to handle non-linearities and multi-modal uncertainty, more scalable to larger maps.
Graph-Based SLAM: Builds a graph of poses and constraints, using optimization for consistent mapping, common in large-scale applications.

SLAM State Estimation Formulas

Prediction (Motion Update): $x_t = x_{t-1} + \Delta x$

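Landmark position update: $x_{landmark} = x_{robot} + r \cdot \cos(\theta_{robot} + \phi)$ and $y_{landmark} = y_{robot} + r \cdot \sin(\theta_{robot} + \phi)$, where $r$ is the measured range to the landmark and $\phi$ is the measured bearing relative to the robot's heading.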
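
The two formulas can be sketched as plain functions. This is noise-free bookkeeping only, not a full SLAM filter; the pose layout (x, y, theta), the function names, and the y-component (the sine counterpart of the x formula) are assumptions added for illustration.

```python
import math


def predict(pose, delta):
    """Motion update: add the odometry increment to the previous pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    return (x + dx, y + dy, theta + dtheta)


def landmark_position(pose, r, phi):
    """World position of a landmark seen at range r and bearing phi from the robot."""
    x, y, theta = pose
    return (x + r * math.cos(theta + phi),
            y + r * math.sin(theta + phi))


pose = predict((0.0, 0.0, 0.0), (1.0, 2.0, math.pi / 2))   # robot now at (1, 2), facing +y
landmark = landmark_position(pose, 2.0, -math.pi / 2)      # 2 m to the robot's right: (3, 2)
```

In a real SLAM system both steps also propagate uncertainty, and a correction step adjusts the pose and landmark estimates when a known landmark is observed again.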

6. Robot Perception

Perception is how a robot understands its environment from raw sensor data. This process is hierarchical, moving from low-level geometric maps to high-level semantic understanding, often powered by AI and machine learning.

Semantic Mapping & Knowledge Representation

Beyond basic geometric maps, robots can build Semantic Maps by combining metric maps with meaningful labels (e.g., "kitchen", "table"). This often involves scene segmentation and object classification. Knowledge representation structures (T-box for class hierarchies, A-box for individuals) help robots reason about their environment at a higher, more abstract level.

7. Robot Operating System (ROS)

ROS is a flexible framework for writing robot software. It provides tools and libraries to help build robot applications, enabling modularity and distributed computing.

ROS Architecture & Concepts

Master: Coordinates communication between nodes and provides name registration (ROS 1 only; ROS 2 replaces it with decentralized DDS discovery).
Nodes: Individual executable processes that perform computation (e.g., sensor driver, motor controller).
Topics: Asynchronous, unidirectional communication using a publisher-subscriber model.
Messages: Data structures defined in .msg files, used for communication over topics/services.
Services: Synchronous request/response communication, defined in .srv files.
Parameter Server: Shared dictionary for configuration variables, accessible by any node at runtime.
ROS Bags: Files used to record and replay topic messages for debugging and analysis.
ROS Launch Files: XML files to start multiple ROS nodes and set parameters, simplifying deployment (ROS 2 also accepts Python and YAML launch files).

ROS Tools & Simulation

RViz: A powerful 3D visualization tool for sensor data and robot state, crucial for debugging.
Gazebo: A 3D robot simulator seamlessly integrated with ROS, supporting physics-based simulation for testing navigation, manipulation, and perception algorithms in a virtual environment.
Common Tools: rqt_graph (visualize communication), rosnode, rostopic, rosbag, rosparam for introspection and debugging.

ROS1 vs ROS2

ROS1 (e.g., Noetic): Mature, widely adopted, uses custom middleware (TCPROS/UDPROS) coordinated by a central Master.
ROS2 (e.g., Foxy, Humble): Newer, improved security, real-time capabilities, uses DDS middleware, better for production and multi-robot systems.

8. Learning & Human-Robot Interaction (HRI)

Robots are increasingly designed to learn from experience and interact effectively with human users, crucial for adaptability and societal integration.

Reinforcement Learning (RL)

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make optimal decisions by performing actions within an environment to maximize a cumulative reward.

Agent-Environment Loop: The core cycle where the agent performs an action, the environment responds with a new state and a reward signal, and the agent updates its policy (strategy) based on this feedback.
Q-learning: A prominent model-free RL algorithm that learns an action-value function (Q-function), which estimates the expected cumulative reward for taking a specific action in a given state.
Exploration-Exploitation Trade-off: A fundamental challenge in RL involving balancing the act of trying new, potentially better actions (exploration) against utilizing actions known to yield good rewards (exploitation).
Example: A classic example for policy learning in RL is the GridWorld problem, where an agent learns to navigate a grid to reach a goal while avoiding obstacles, based on rewards and penalties.
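
A minimal tabular Q-learning sketch for a GridWorld of this kind is shown below. The 4x4 layout, wall cells, reward of 1 at the goal, hyperparameters, random episode starts, and random tie-breaking are all assumptions chosen for illustration.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
SIZE, WALLS, GOAL = 4, frozenset({(1, 1), (2, 1)}), (3, 3)


def step(state, action):
    """Deterministic environment: blocked moves leave the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in WALLS:
        r, c = state
    reward = 1.0 if (r, c) == GOAL else 0.0
    return (r, c), reward


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    free = [(r, c) for r in range(SIZE) for c in range(SIZE)
            if (r, c) not in WALLS and (r, c) != GOAL]
    q = {(s, a): 0.0 for s in free for a in range(4)}
    for _ in range(episodes):
        state = rng.choice(free)                       # random start cell
        for _ in range(50):
            if rng.random() < epsilon:
                a = rng.randrange(4)                   # exploration
            else:                                      # exploitation, ties broken randomly
                best = max(q[(state, i)] for i in range(4))
                a = rng.choice([i for i in range(4) if q[(state, i)] == best])
            nxt, reward = step(state, ACTIONS[a])
            best_next = 0.0 if nxt == GOAL else max(q[(nxt, i)] for i in range(4))
            # Q-learning update toward reward + discounted best next value.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_rollout(q, start, max_steps=20):
    """Follow the learned greedy policy from start and return the visited cells."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == GOAL:
            break
        a = max(range(4), key=lambda i: q[(state, i)])
        state, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


q_table = train()
path = greedy_rollout(q_table, (0, 0))   # cells visited when acting greedily from the corner
```

The `epsilon` parameter is the exploration-exploitation trade-off in its simplest form: with probability `epsilon` the agent tries a random action, otherwise it uses the action currently believed to be best.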

Human-Robot Interaction (HRI)

Human-Robot Interaction (HRI) is a multidisciplinary field dedicated to studying the interactions between humans and robots. Its goal is to design robots and interfaces that are intuitive, effective, and safe for human collaboration.

Control Interfaces: Focuses on natural and intuitive ways for humans to command robots, including gesture-based control (e.g., hand movements), speech-based control (e.g., voice commands), and haptic feedback.
HRI Design Principles: Guidelines for creating effective and intuitive interactions, often emphasizing predictability, legibility of robot intent, and appropriate social cues.
Ethics, Transparency, and Trust: Critical considerations for societal integration and successful collaboration:
- Ethics: Addresses moral principles in robot design and deployment, including issues of robot autonomy, responsibility for actions, and ensuring human safety.
- Transparency: Refers to the robot's ability to make its internal state, actions, and intentions understandable to humans, fostering clarity and predictability.
- Trust: Signifies the human's ability to rely on robots to perform tasks safely, reliably, and consistently, which is built through transparent behavior and adherence to ethical guidelines.