


Beginner Python Project: Build an Augmented Reality Drawing App Using OpenCV and Mediapipe
In this Python project, we'll create a simple AR drawing app. Using your webcam and hand gestures, you can draw virtually on the screen, customize your brush, and even save your creations!
Setup
To get started, create a new folder and initialize a new virtual environment using:
python -m venv venv
./venv/Scripts/activate
Next up install the required libraries using pip or installer of your choice:
pip install mediapipe
pip install opencv-python
Note
You may have trouble installing mediapipe with latest version on python. As I am writing this blog I am using python 3.11.2. Make sure to use the compatible version on python.
Step 1: Capture Webcam Feed
The first step is to set up your webcam and display the video feed. We'll use OpenCV's VideoCapture to access the camera and continuously display frames.
import cv2 # The argument '0' specifies the default camera (usually the built-in webcam). cap = cv2.VideoCapture(0) # Start an infinite loop to continuously capture video frames from the webcam while True: # Read a single frame from the webcam # `ret` is a boolean indicating success; `frame` is the captured frame. ret, frame = cap.read() # Check if the frame was successfully captured # If not, break the loop and stop the video capture process. if not ret: break # Flip the frame horizontally (like a mirror image) frame = cv2.flip(frame, 1) # Display the current frame in a window named 'Webcam Feed' cv2.imshow('Webcam Feed', frame) # Wait for a key press for 1 millisecond # If the 'q' key is pressed, break the loop to stop the video feed. if cv2.waitKey(1) & 0xFF == ord('q'): break # Release the webcam resource to make it available for other programs cap.release() # Close all OpenCV-created windows cv2.destroyAllWindows()
Did You Know?
When using cv2.waitKey() in OpenCV, the returned key code may include extra bits depending on the platform. To ensure you correctly detect key presses, you can mask the result with 0xFF to isolate the lower 8 bits (the actual ASCII value). Without this, your key comparisons might fail on some systems—so always use & 0xFF for consistent behavior!
Step 2: Integrate Hand Detection
Using Mediapipe's Hands solution, we'll detect the hand and extract the position of key landmarks like the index and middle fingers.
import cv2 import mediapipe as mp # Initialize the MediaPipe Hands module mp_hands = mp.solutions.hands # Load the hand-tracking solution from MediaPipe hands = mp_hands.Hands( min_detection_confidence=0.9, min_tracking_confidence=0.9 ) cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() if not ret: break # Flip the frame horizontally to create a mirror effect frame = cv2.flip(frame, 1) # Convert the frame from BGR (OpenCV default) to RGB (MediaPipe requirement) frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # Process the RGB frame to detect and track hands result = hands.process(frame_rgb) # If hands are detected in the frame if result.multi_hand_landmarks: # Iterate through all detected hands for hand_landmarks in result.multi_hand_landmarks: # Get the frame dimensions (height and width) h, w, _ = frame.shape # Calculate the pixel coordinates of the tip of the index finger cx, cy = int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * w), \ int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * h) # Calculate the pixel coordinates of the tip of the middle finger mx, my = int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x * w), \ int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y * h) # Draw a circle at the index finger tip on the original frame cv2.circle(frame, (cx, cy), 10, (0, 255, 0), -1) # Green circle with radius 10 # Display the processed frame in a window named 'Webcam Feed' cv2.imshow('Webcam Feed', frame) if cv2.waitKey(1) & 0xFF == ord('q'): break # Exit the loop if 'q' is pressed # Release the webcam resources for other programs cap.release() cv2.destroyAllWindows()
Step 3: Track Finger Position and Draw
We’ll track the index finger and allow drawing only when the index and middle fingers are separated by a threshold distance.
We'll maintain a list of co-ordinates of the index fingers to draw on the original frame and every time middle finger is close enough, we'll append None to this co-ordinates array indicating a breakage.
import cv2 import mediapipe as mp import math # Initialize the MediaPipe Hands module mp_hands = mp.solutions.hands hands = mp_hands.Hands( min_detection_confidence=0.9, min_tracking_confidence=0.9 ) # Variables to store drawing points and reset state draw_points = [] # A list to store points where lines should be drawn reset_drawing = False # Flag to indicate when the drawing should reset # Brush settings brush_color = (0, 0, 255) brush_size = 5 cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() if not ret: break frame = cv2.flip(frame, 1) frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) result = hands.process(frame_rgb) # If hands are detected if result.multi_hand_landmarks: for hand_landmarks in result.multi_hand_landmarks: h, w, _ = frame.shape # Get the frame dimensions (height and width) # Get the coordinates of the index finger tip cx, cy = int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * w), \ int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * h) # Get the coordinates of the middle finger tip mx, my = int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x * w), \ int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y * h) # Calculate the distance between the index and middle finger tips distance = math.sqrt((mx - cx) ** 2 + (my - cy) ** 2) # Threshold distance to determine if the fingers are close (used to reset drawing) threshold = 40 # If the fingers are far apart if distance > threshold: if reset_drawing: # Check if the drawing was previously reset draw_points.append(None) # None means no line reset_drawing = False draw_points.append((cx, cy)) # Add the current point to the list for drawing else: # If the fingers are close together set the flag to reset drawing reset_drawing = True # # Draw the lines between points in the `draw_points` list for i in range(1, len(draw_points)): if draw_points[i - 1] and draw_points[i]: # Only draw if both points are valid cv2.line(frame, draw_points[i - 1], draw_points[i], brush_color, brush_size) cv2.imshow('Webcam Feed', frame) if cv2.waitKey(1) & 0xFF == ord('q'): break # Release the webcam and close all OpenCV windows cap.release() cv2.destroyAllWindows()
Step 4: Improvements
- Use OpenCV rectangle() and putText() for buttons to toggle brush size and color.
- Add an option to save the frame.
- Add an eraser tool, use the new co-ordinates to modify draw_points array.
The above is the detailed content of Beginner Python Project: Build an Augmented Reality Drawing App Using OpenCV and Mediapipe. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











Python is easier to learn and use, while C is more powerful but complex. 1. Python syntax is concise and suitable for beginners. Dynamic typing and automatic memory management make it easy to use, but may cause runtime errors. 2.C provides low-level control and advanced features, suitable for high-performance applications, but has a high learning threshold and requires manual memory and type safety management.

To maximize the efficiency of learning Python in a limited time, you can use Python's datetime, time, and schedule modules. 1. The datetime module is used to record and plan learning time. 2. The time module helps to set study and rest time. 3. The schedule module automatically arranges weekly learning tasks.

Python is better than C in development efficiency, but C is higher in execution performance. 1. Python's concise syntax and rich libraries improve development efficiency. 2.C's compilation-type characteristics and hardware control improve execution performance. When making a choice, you need to weigh the development speed and execution efficiency based on project needs.

Is it enough to learn Python for two hours a day? It depends on your goals and learning methods. 1) Develop a clear learning plan, 2) Select appropriate learning resources and methods, 3) Practice and review and consolidate hands-on practice and review and consolidate, and you can gradually master the basic knowledge and advanced functions of Python during this period.

Pythonlistsarepartofthestandardlibrary,whilearraysarenot.Listsarebuilt-in,versatile,andusedforstoringcollections,whereasarraysareprovidedbythearraymoduleandlesscommonlyusedduetolimitedfunctionality.

Python and C each have their own advantages, and the choice should be based on project requirements. 1) Python is suitable for rapid development and data processing due to its concise syntax and dynamic typing. 2)C is suitable for high performance and system programming due to its static typing and manual memory management.

Python excels in automation, scripting, and task management. 1) Automation: File backup is realized through standard libraries such as os and shutil. 2) Script writing: Use the psutil library to monitor system resources. 3) Task management: Use the schedule library to schedule tasks. Python's ease of use and rich library support makes it the preferred tool in these areas.

Key applications of Python in web development include the use of Django and Flask frameworks, API development, data analysis and visualization, machine learning and AI, and performance optimization. 1. Django and Flask framework: Django is suitable for rapid development of complex applications, and Flask is suitable for small or highly customized projects. 2. API development: Use Flask or DjangoRESTFramework to build RESTfulAPI. 3. Data analysis and visualization: Use Python to process data and display it through the web interface. 4. Machine Learning and AI: Python is used to build intelligent web applications. 5. Performance optimization: optimized through asynchronous programming, caching and code
