Computer Vision (Class 10 AI): Pixels, Features, OpenCV & CNNs Explained – CBSE Guide
Welcome to the complete guide for the CBSE Class 10 AI chapter on Computer Vision. This page explains how AI learns to “see,” starting from the basics of pixels, channels, and features. We explore core CV tasks like image classification, object detection, and segmentation. You will learn how to use the OpenCV library for practical operations like blurring and Canny edge detection, and understand the core “convolution” operation. Finally, we’ll see how these blocks build a Convolutional Neural Network (CNN), the brain that powers AI vision. This guide includes Q&As, study notes, and interactive demos to help you excel.
Guide to Class 10 Chapter 10 – AI Computer Vision
Your guide to CBSE AI Foundation concepts: Q&A, FAQs, Explainers, and Diagrams.
1. Learning Outcomes for This Chapter
By the end of this guide, you should be able to:
- Explain the concept of an image as pixels, channels, and features.
- List and describe the core computer vision tasks (classification, detection, segmentation).
- Outline the complete pipeline and main building blocks of a Convolutional Neural Network (CNN).
- Understand the purpose of key tools like OpenCV and the convolution operation.
2. The AI That “Sees” — An Introduction to Computer Vision
In modern technology, it is common to ask: How does a smartphone camera instantly find and focus on a face in a group photo?[1] How does a self-driving car “see” a pedestrian and distinguish them from a lamppost?[2] The answer is Computer Vision, a powerful field of Artificial Intelligence (AI).
Computer Vision (CV) is the domain of AI that enables machines to acquire, process, analyze, and, most importantly, interpret meaningful data from digital images and videos.[2, 3] This field seeks to replicate the complex capabilities of human sight and the “cognitive ability” to understand the visual world.[2] It is important to distinguish this from the simple act of “seeing.” A standard digital camera can acquire visual data (the pixels), but Computer Vision provides the “brain” that understands what that data represents—identifying objects, people, and patterns with a high degree of accuracy.[2, 4]
The applications of this “machine sight” are already integrated into daily life and advanced industries, including:
- Healthcare: Analyzing medical imaging, such as using chest X-rays to aid in the diagnosis of diseases like pneumonia.[1, 2]
- Transportation: Powering autonomous systems, such as self-driving vehicles, to perceive, interpret, and react to their complex environments.[2, 4]
- Security: Driving facial recognition technology for tasks like unlocking a smartphone or identifying individuals in a secure area.[1, 4]
- Manufacturing: Automating visual inspection on assembly lines to perform rapid and accurate defect detection, improving quality control.[2]
3. What Can AI “See”? The Core Tasks of Computer Vision
Computer vision is not a single concept but a field encompassing a wide range of tasks, from simple to incredibly complex.[1] These tasks can be understood as a hierarchy, moving from a general understanding of an image to a highly detailed, pixel-level interpretation. The three most common tasks are classification, detection, and segmentation.
Task 1: Image Classification
Core Question: “What is the main subject of this image?” [5]
This is the simplest task. It processes an entire image and assigns it a single label, like “dog” or “cat”. [4, 6]
Task 2: Object Detection
Core Question: “What objects are in this image, and where are they?”
This is a more complex task. It finds multiple objects and draws a “bounding box” (a rectangle) around each one, with a label for each box. [6, 7]
Task 3: Image Segmentation
Core Question: “What is the exact shape of every object in this image?”
This is the most detailed task. It classifies every single pixel to create a “mask” that shows the exact outline of each object. [7, 8]
Table 10.1: Comparison of Core Computer Vision Tasks
| Task | Core Question | Output | Example |
|---|---|---|---|
| Image Classification | “What is this image about?” | A single label for the entire image. | A photo of a cat is given the label “Cat”. [5] |
| Object Detection | “What objects are in this image and where are they?” | Bounding boxes (rectangles) with labels for multiple objects. | A photo with a box around a person labeled “Person” and a box around a car labeled “Car”. [8] |
| Image Segmentation | “What is the exact shape of everything in this image?” | A pixel-level mask (a precise outline) for each object. | A photo where all pixels belonging to the “Person” are colored red and all “Car” pixels are colored blue. [7] |
4. How a Computer Decodes an Image: Pixels, Channels, and Features
To understand how an AI model interprets an image, one must first understand what an image is from a computer’s perspective. Computers do not “see” a picture; they see a structured grid of numbers.[3]
The Building Block: The Pixel
A digital image is made of thousands or millions of tiny dots called pixels (short for “picture elements”).[10] A pixel is the smallest single point of color in a digital image.
In the simplest case, a grayscale (black and white) image is just a single 2D grid, or one channel.[10] Each pixel in this grid is represented by a single number, typically ranging from 0 (representing pure black) to 255 (representing pure white), with shades of gray in between.
Adding Color: The RGB Channels
Color images are more complex. They are typically formed by combinations of primary colors.[10, 11] The most common standard for computer displays and digital images is the RGB (Red, Green, Blue) color model.[10]
An RGB image is not one grid of numbers, but three 2D grids stacked on top of each other.[10] These are:
- A Red Channel: A 2D grid showing only the intensity of red (0-255) for every pixel.
- A Green Channel: A 2D grid showing the intensity of green (0-255) for every pixel.
- A Blue Channel: A 2D grid showing the intensity of blue (0-255) for every pixel.
The final color of any single pixel is created by combining the three values (one from each channel) at that pixel’s location.[12] For example, a pixel with a value of (Red: 250, Green: 165, Blue: 0) would appear as a bright orange. Thus, a color image with a resolution of 800×600 pixels is actually a 3D block of data with the dimensions 800×600×3.
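To make this concrete, here is a minimal sketch (using the NumPy and OpenCV libraries introduced later, in Section 5) that builds a solid orange image as exactly this kind of 3D block of numbers. One caveat: OpenCV stores its channels in BGR order rather than RGB, so “orange” is written as (0, 165, 255).
# A minimal sketch: build a solid orange image as a 3D NumPy array.
# (Assumes OpenCV and NumPy are installed; OpenCV uses BGR channel order.)
import cv2
import numpy as np
# Create a (height, width, channels) = (600, 800, 3) block of zeros
image = np.zeros((600, 800, 3), dtype=np.uint8)
# Fill every pixel with (Blue: 0, Green: 165, Red: 255), a bright orange
image[:] = (0, 165, 255)
print(image.shape)  # prints (600, 800, 3)
cv2.imshow("Orange", image)
cv2.waitKey(0)
cv2.destroyAllWindows()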
Finding Meaning: Features vs. Pixels
A single pixel’s value (e.g., “150”) is just raw data and not very informative on its own. To understand an image, an AI cannot look at each pixel individually; it must find meaningful patterns in the groups of pixels. These “interesting” patterns are called features.[13, 14]
Features are the “landmarks” within the pixel data.[15] They are a more compact, meaningful representation of the image, reducing the high dimensionality of the raw pixel data.[13, 16] The entire process of computer vision is to move from a meaningless grid of raw pixels to an abstract understanding based on features.
The Simplest Features: Edges and Corners
The most fundamental features an AI model learns to find are edges and corners.
- Edges: These are boundaries where the image’s brightness changes abruptly.[14, 15] They represent the outline of an object or the boundary between two different surfaces.[14]
- Corners: Also called “interest points,” these are locations where edges intersect or where a line sharply changes direction.[14, 15]
A computer, therefore, starts its process of “seeing” by first analyzing the raw pixels to find these simple features. This process forms the foundation for building a more complex understanding of the image content.
5. A Practical Toolkit for Computer Vision: Getting Started with OpenCV
To perform these tasks, developers rely on specialized tools. The most popular and foundational of these is the OpenCV library.
What is OpenCV?
OpenCV stands for Open Source Computer Vision Library.[3, 17] It is a massive, free software toolkit containing over 2,500 optimized algorithms for computer vision and machine learning.[3, 18] Its main purpose is to provide a common infrastructure for CV tasks, with a strong focus on real-time applications.[17, 18]
OpenCV is a fundamental tool for developers because it is open-source (free for both academic and commercial use), computationally efficient, and cross-platform, running on Windows, Linux, Android, and Mac OS.[17, 19] While written in C++, it has user-friendly bindings for many languages, most notably Python, which has made it highly accessible for rapid development and research.[3, 17]
Sample Notebook: Your First CV Program in Python
OpenCV makes it simple to perform basic image operations. The following annotated code provides a step-by-step example of how to read, resize, and display an image—often the first steps in any CV pipeline.
# Step 1: Import the Libraries
import cv2
# Import NumPy, a library for numerical operations that OpenCV uses
import numpy as np
# Step 2: Read an Image
# The cv2.imread() function loads an image from a file as a NumPy array. [3, 20]
# Note: imread() returns None (rather than raising an error) if the file is missing.
image = cv2.imread("my_image.jpg")
assert image is not None, "Image not found: check the file path"
# Step 3: Resize an Image (Best Practice)
# Get the original image's dimensions (height, width)
(original_height, original_width) = image.shape[:2]
# Define a new target width
new_width = 500
# Calculate the ratio to maintain the aspect ratio
ratio = new_width / float(original_width)
new_height = int(original_height * ratio)
# Resize the image while maintaining the aspect ratio
# We use cv2.INTER_AREA for interpolation, which is good for shrinking [23, 24]
resized_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)
# Step 4: Display the Image
# cv2.imshow() displays the image in a new window.
# cv2.waitKey(0) waits indefinitely for a key press.
# cv2.destroyAllWindows() closes the windows. [21, 22]
cv2.imshow("Original Image", image)
cv2.imshow("Resized Image", resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
6. Common OpenCV Operations
Beyond just reading and resizing, OpenCV provides a vast library of functions. Here are three common and fundamental operations used in many CV pipelines: converting to grayscale, blurring, and edge detection.
Operation 1: Convert to Grayscale
Often, color is not necessary to solve a CV problem (like finding edges). Working in grayscale (one channel) instead of color (three channels) reduces the amount of data by two-thirds, making calculations much faster.
The cv2.cvtColor() function is used for this. We pass it the original image and a “color code,” like cv2.COLOR_BGR2GRAY, to specify the conversion.
Operation 2: Blurring (Smoothing)
Images can contain “noise” (random variations in pixel values). This noise can confuse algorithms, especially edge detectors. To reduce noise, we often apply a blur.
Gaussian Blur (using cv2.GaussianBlur()) is a popular technique. It computes a weighted average of each pixel’s neighborhood, giving the most weight to the pixels nearest the center. This creates a smooth, natural blur and removes high-frequency noise.
Operation 3: Canny Edge Detection
This is one of the most popular and effective edge-detection algorithms. The cv2.Canny() function performs a multi-step process to find clean, thin edges.
- It first applies a Gaussian blur to reduce noise.
- It then finds the intensity gradient (direction of change) of the image.
- It suppresses pixels that are not at the “peak” of an edge.
- Finally, it uses two thresholds (min and max) to link strong edge pixels and discard weak ones.
This results in a black-and-white image where white pixels represent the detected edges.
# ... (Continuing from the previous code sample) ...
# Step 5: Convert to Grayscale
# Use cvtColor to change the image from BGR (OpenCV's default) to Grayscale
gray_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2GRAY)
# Step 6: Apply Gaussian Blur
# We provide a (5, 5) kernel size; both dimensions must be odd numbers.
# The '0' tells OpenCV to automatically calculate the standard deviation.
blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)
# Step 7: Detect Edges with Canny
# We provide two thresholds: a min (100) and a max (200).
# Gradient values above 200 are definitely edges.
# Gradient values below 100 are definitely not.
# Values between 100 and 200 are kept only if they connect to a "definite" edge.
canny_edges = cv2.Canny(blurred_image, 100, 200)
# Step 8: Display the new images
cv2.imshow("Grayscale", gray_image)
cv2.imshow("Blurred", blurred_image)
cv2.imshow("Canny Edges", canny_edges)
# Wait for a key press and close all windows
cv2.waitKey(0)
cv2.destroyAllWindows()
7. The Engine of Sight: How Convolution Works
In Section 4, it was established that AI needs to find features (like edges). The core mathematical operation that allows an AI to find these features is called convolution.[25, 26] This is the fundamental operation in modern computer vision.
The Intuitive View: A “Filter” for Finding Features
A convolution can be understood as a “feature detector.” The process involves two things:
- The Input Image (the 3D grid of pixels).
- A Kernel or Filter (a small matrix of numbers, e.g., 3×3).[25, 27]
The kernel’s numbers (or weights) are designed to detect a specific pattern. The process works by “sliding” this kernel over every patch of the input image, from left-to-right, top-to-bottom.[26, 28]
At each position, the computer performs an element-wise multiplication between the kernel’s 3×3 grid and the 3×3 patch of pixels it is on top of. All nine results are then summed to produce a single number. This single number becomes one pixel in a new image, which is called a Feature Map or Activation Map.[26]
Kernels as Feature Detectors
The “magic” of convolution is that different numbers in the kernel matrix will detect different features.[26]
- A Blur Kernel might have all 1/9 values. This averages the surrounding pixels, resulting in a blurred image.[29]
- A Sharpen Kernel will have a large positive number in the center and negative numbers around it. This emphasizes the center pixel’s difference from its neighbors, sharpening the image.[25]
- An Edge Detection Kernel (like a Sobel filter) will have positive numbers on one side and negative numbers on the other.[29] When this kernel is over a flat, single-color region, the positive and negative numbers cancel out, resulting in a 0 (black). But when it slides over an edge (a sharp change from dark to light), the multiplications result in a large positive number (white), thus “activating” and highlighting the edge.[27]
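As an illustration (not part of the chapter’s official code), the sketch below applies a sharpen kernel and a vertical edge kernel using OpenCV’s cv2.filter2D function, which performs this sliding-window operation for any hand-made kernel. (Strictly speaking, filter2D computes correlation rather than convolution, meaning it does not flip the kernel first, but for small examples like these the visual effect is the same.)
# Sketch: applying hand-made kernels with cv2.filter2D
# (Assumes "my_image.jpg" exists; the kernel values below are illustrative.)
import cv2
import numpy as np
image = cv2.imread("my_image.jpg", cv2.IMREAD_GRAYSCALE)
assert image is not None, "Image not found: check the file path"
# Sharpen kernel: a large positive center surrounded by negatives
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)
# Vertical edge kernel: positive on one side, negative on the other
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=np.float32)
# ddepth=-1 keeps the output in the same data type as the input
sharpened = cv2.filter2D(image, -1, sharpen_kernel)
edges = cv2.filter2D(image, -1, edge_kernel)
cv2.imshow("Sharpened", sharpened)
cv2.imshow("Vertical Edges", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()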
Interactive Drawing: Convolution Demo
[Interactive demo: a 3×3 vertical edge-detector kernel slides over a 5×5 image, displaying the input image and kernel, the output feature map, and the current output value. The output value is high only when the kernel sits over the edge.]
The Numeric View: A Step-by-Step Calculation
Let’s walk through a simple numerical example of 2D convolution.
Step 0: The Setup
Input Image =
[ 10 10 10 100 ]
[ 10 10 10 100 ]
[ 10 10 10 100 ]
[ 10 10 10 100 ]
Kernel =
[ 1 0 -1 ]
[ 1 0 -1 ]
[ 1 0 -1 ]
Step 1: Calculate Output (0, 0)
Place the 3×3 kernel over the top-left 3×3 patch of the image.
- (10 x 1) + (10 x 0) + (10 x -1) = 0
- (10 x 1) + (10 x 0) + (10 x -1) = 0
- (10 x 1) + (10 x 0) + (10 x -1) = 0
- Sum = 0 + 0 + 0 = 0
The top-left pixel of our output Feature Map is 0. This is because the kernel was over a “flat” area.
Step 2: Calculate Output (0, 1)
Slide the kernel one pixel to the right.
- (10 x 1) + (10 x 0) + (100 x -1) = -90
- (10 x 1) + (10 x 0) + (100 x -1) = -90
- (10 x 1) + (10 x 0) + (100 x -1) = -90
- Sum = -90 + -90 + -90 = -270
The kernel has found a strong vertical edge, so the output value is very high (in magnitude).
Final Result:
After sliding the kernel over all possible positions, the 4×4 input and 3×3 kernel will produce a 2×2 Feature Map.
Feature Map =
[ 0 -270 ]
[ 0 -270 ]
This new “image” is smaller, but it has transformed the pixel data into feature data, highlighting where the vertical edge is.
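This worked example can be checked with a few lines of NumPy (a minimal sketch of the sliding-and-summing loop; real libraries use heavily optimized versions of the same idea):
# Sketch: verifying the 4x4 convolution example by hand in NumPy
import numpy as np
image = np.array([[10, 10, 10, 100]] * 4)   # the 4x4 input from Step 0
kernel = np.array([[1, 0, -1]] * 3)         # the 3x3 vertical edge kernel
out_h = image.shape[0] - kernel.shape[0] + 1   # 4 - 3 + 1 = 2
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w), dtype=int)
# Slide the kernel over every valid position, multiply element-wise, and sum
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)
print(feature_map)  # [[   0 -270]
                    #  [   0 -270]]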
8. The “Brain” That Sees: Building a Convolutional Neural Network (CNN)
The convolution operation is the “engine” of sight, but it needs a “brain” to control it. This is the Convolutional Neural Network (CNN).
Cross-Link: What is a Neural Network?
Before defining a CNN, one must understand a basic Neural Network (NN). A Neural Network is a type of machine learning model inspired by the web of neurons in the human brain.[31] It is built from layers of interconnected “artificial neurons” (nodes).[32, 33, 34] These networks learn from data. During training, they process thousands of examples and continuously adjust the “weights”—the strength of the connections between neurons—to learn how to map inputs (like an image) to correct outputs (like a label).[33, 35]
The CNN Pipeline: An Assembly Line for Classification
A Convolutional Neural Network (CNN or ConvNet) is a specialized type of neural network designed to be highly effective for grid-like data, such as images.[34, 36, 37]
A CNN can be visualized as an “assembly line” for classification, divided into two main stages.[38, 39] The typical pipeline follows a repeating pattern:
Input -> [Conv -> ReLU -> Pool] -> [Conv -> ReLU -> Pool] -> [Flatten] -> [FC] -> Output
Stage 1: Feature Extraction (The “Eyes”)
This first stage of the network is responsible for finding and summarizing features, moving from raw pixels to abstract patterns.
Convolutional Layer
The “Feature Finder”. Uses a learnable kernel to find edges, curves, and textures. [29, 42]
ReLU Layer
The “Activator”. Introduces non-linearity by setting all negative values to 0. [44, 45]
Pooling Layer
The “Summarizer”. Downsamples (shrinks) the feature map, often by taking the max value. [48, 50]
(This [Conv → ReLU → Pool] block repeats multiple times)
Stage 2: Classification (The “Brain”)
This second stage takes the abstract features and makes a final, reasoned decision.
Flatten Layer
Converts the 3D feature maps into one long 1D vector of numbers. [37]
Fully Connected (FC) Layer
The “Decision Maker”. Every neuron is connected to all inputs. It learns global patterns from the features. [51, 52]
Output Layer
Gives the final probabilities for each class (e.g., 95% “Cat”, 5% “Dog”).
Detailed Block Descriptions
Block 1: The Convolutional Layer (The Feature Finder)
This is the core building block of the CNN.[36, 41] It performs the convolution operation. However, there is a key difference from the manual example in Section 7: in a CNN, the kernel’s values (weights) are not pre-set. They are learned during training.[29, 42] The CNN automatically learns which features (edges, curves, colors, textures) are important to find in order to solve its problem.[42, 43] The first layers learn simple features (like edges), and deeper layers combine these to learn complex features (like shapes, eyes, or wheels).[26, 41]
Block 2: The ReLU Layer (The Activator)
Immediately after each convolution, the resulting feature map is passed through a ReLU (Rectified Linear Unit) layer.[40, 41] ReLU is an activation function whose job is to introduce non-linearity into the network.[44, 45] Its rule is extremely simple: f(x) = max(0, x).[44, 45]
In practice, this layer takes the feature map and sets all negative pixel values to zero, while leaving all positive values unchanged.[44] This is vital because the relationships that define real-world objects are non-linear. ReLU allows the network to learn these complex patterns and helps the network train faster.[46, 47]
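In code, ReLU is a single element-wise operation; here is an illustrative NumPy sketch with made-up values:
# Sketch: ReLU applied to a tiny feature map (illustrative values)
import numpy as np
feature_map = np.array([[-5, 12],
                        [ 3, -8]])
# ReLU rule f(x) = max(0, x), applied element-wise
activated = np.maximum(0, feature_map)
print(activated)  # [[ 0 12]
                  #  [ 3  0]]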
Block 3: The Pooling Layer (The Summarizer)
After the ReLU function, a Pooling Layer is often applied.[36, 41] The job of this layer is downsampling, which means making the feature maps smaller while retaining the most important information.[43, 48, 49]
The most common type is Max Pooling.[48, 50] It works by sliding a small window (e.g., 2×2) over the feature map and outputting only the maximum value from that window.[48, 50] This has two key benefits:
- Efficiency: It dramatically reduces the size of the data, which means less computation for the following layers.[48, 50]
- Robustness (Translation Invariance): By capturing only the most prominent feature signal (the max value), the network becomes less sensitive to the feature’s exact location. This helps the model recognize an object (e.g., a cat) even if it is slightly moved or shifted in the frame.[50]
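A minimal NumPy sketch of 2×2 Max Pooling (with illustrative values) makes the “summarizing” effect visible; each 2×2 window collapses to its single largest value:
# Sketch: 2x2 Max Pooling on a 4x4 feature map (illustrative values)
import numpy as np
feature_map = np.array([[1, 3, 2, 0],
                        [5, 6, 1, 2],
                        [7, 2, 9, 4],
                        [0, 1, 3, 8]])
h, w = feature_map.shape
# Split the map into non-overlapping 2x2 windows, then keep each window's max
pooled = feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [7 9]]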
The Classification Block
This second stage of the network is responsible for taking all the extracted features and making a final decision.
Block 4: The Fully Connected (FC) Layer (The Decision Maker)
After several [Conv → ReLU → Pool] blocks, the network has generated many small, abstract feature maps. These are flattened into a single, long 1D vector of numbers.[37]
This vector is then fed into a Fully Connected (FC) Layer.[36, 41] “Fully connected” means that every neuron in this layer is connected to every value in the flattened vector from the previous step.[42, 51, 52]
This is the “reasoning” part of the network.[53] While the convolutional layers performed local feature detection (finding an edge here, a curve there), the FC layer performs global integration.[51] It looks at the combination of all the features and learns to make a decision.[38, 54] For example, it learns that if the “pointy ear” feature, “whisker” feature, and “fur texture” feature are all strongly activated, then the probability of the image being a “Cat” is very high.
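To tie the whole pipeline together, here is a minimal sketch of the assembly line written with the Keras API (Keras is an assumption for illustration; the chapter itself only introduces OpenCV). The input size of 64×64 and the two output classes are also illustrative choices; the layer names map directly onto the blocks described above.
# A minimal CNN sketch: Input -> [Conv -> ReLU -> Pool] x2 -> Flatten -> FC -> Output
# (Assumes TensorFlow/Keras is installed; the sizes and class count are illustrative.)
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),               # a 64x64 RGB image
    layers.Conv2D(16, (3, 3), activation="relu"),  # Conv + ReLU: the feature finder
    layers.MaxPooling2D((2, 2)),                   # Pool: the summarizer
    layers.Conv2D(32, (3, 3), activation="relu"),  # deeper layers find complex features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # 3D feature maps -> one 1D vector
    layers.Dense(64, activation="relu"),           # Fully Connected: the decision maker
    layers.Dense(2, activation="softmax"),         # Output: probabilities, e.g. Cat vs Dog
])
model.summary()  # prints the shape of the data after each block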
9. Chapter Summary and Next Steps
Chapter Summary
This chapter explored the journey of how AI learns to “see.”
- Computer Vision is the AI field for interpreting visual data, with tasks ranging from Image Classification (what is it?), to Object Detection (where is it?), to Image Segmentation (what is its exact shape?).
- Computers see images as 3D grids of numbers called pixels, organized in channels (e.g., Red, Green, Blue). The goal is to find features (like edges and corners) within this data.
- OpenCV is the key software library for performing practical CV operations like reading and resizing images.
- The convolution operation is the mathematical “engine” that uses a “kernel” (or filter) to slide over an image and create Feature Maps, which highlight where specific features are located.
- A Convolutional Neural Network (CNN) is the “brain” that automates this process. It uses Conv layers to find features, ReLU layers to add non-linearity, Pooling layers to summarize features, and Fully Connected (FC) layers to make a final decision.
Cross-Link: How Do We Know It’s Working?
After a CNN is trained, it makes predictions. But how do we know if those predictions are correct? We must evaluate the model’s performance.[55] This leads to the “Evaluation” chapter. The most common metric for this is Accuracy.[55, 56]
- Definition: Accuracy measures the proportion of correct predictions a model makes compared to the total number of predictions.[56, 57, 58]
- Formula: Accuracy = (Number of Correct Predictions) / (Total Number of Predictions).[56]
- Example: If we test our model on 100 new images and it correctly classifies 87 of them, the model’s accuracy is 87% or 0.87.[56] A perfect score is 1.0.[58]
While simple to understand, accuracy can be misleading, especially on imbalanced datasets.[59, 60] For example, if a model is trained to detect a rare disease that only appears in 1% of X-rays, a “lazy” model that always predicts “no disease” will be 99% accurate. However, this model is 100% useless because it fails to find the one positive case that matters. This limitation is why other evaluation metrics, such as Precision and Recall, are essential for understanding a model’s true performance.[55, 56, 61]
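A tiny sketch (with made-up labels) shows how the “lazy” model earns a misleading score:
# Sketch: accuracy on an imbalanced dataset (hypothetical labels)
true_labels = ["no disease"] * 99 + ["disease"]   # only 1 positive case in 100
lazy_predictions = ["no disease"] * 100           # the model always says "no disease"
correct = sum(t == p for t, p in zip(true_labels, lazy_predictions))
accuracy = correct / len(true_labels)
print(accuracy)  # 0.99, yet the single "disease" case was missed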
10. Quick Q&A (Test Your Knowledge)
Q: What is the smallest unit of a digital image called?
Hint: It’s short for “picture element”.
A pixel. It is the smallest single point of color in a digital image.
Q: What is the main difference between Image Classification and Object Detection?
Hint: One just says “what,” the other says “what and where”.
Image Classification gives a single label to the entire image (e.g., “Cat”). Object Detection finds multiple objects in the image and draws a “bounding box” around each one, giving each box a label (e.g., “a box around the cat, labeled ‘cat’”).
Q: What are the three channels in a standard RGB color image?
Hint: The answer is in the name “RGB”.
Red, Green, and Blue. A color image is made of three stacked 2D grids, one for the intensity of each of these colors.
Q: What is the mathematical operation used to find features like edges?
Hint: It involves sliding a “kernel” over the image.
Convolution. A kernel (or filter) is slid across the image, and at each step, an element-wise multiplication and sum is performed to create a “Feature Map”.
Q: What are the two main parts of a CNN pipeline?
Hint: First it finds things, then it decides what they are.
1. Feature Extraction: Using Conv, ReLU, and Pooling layers to find patterns.
2. Classification: Using Fully Connected (FC) layers to make a final decision based on those features.
Q: What does OpenCV stand for, and what is it used for?
Hint: It’s a library for…
Open Source Computer Vision Library. It’s a free toolkit of algorithms used to perform practical, real-time CV tasks like reading, resizing, and analyzing images.
Q: Why do we often convert images to grayscale in CV?
Hint: It’s about simplicity and speed.
Converting to grayscale (1 channel) from color (3 channels) reduces the data by two-thirds. This makes all calculations much faster and simpler. For many tasks, like edge detection, color information is not necessary.
Q: What is the purpose of the Canny Edge Detector?
Hint: It’s considered a very “clean” way to find outlines.
The Canny algorithm (using cv2.Canny) is a popular, multi-step process to find edges. It blurs the image to reduce noise and then uses two thresholds to find and connect strong, thin edges while ignoring weak or isolated pixels.
11. Frequently Asked Questions (FAQs)
Q: Why does a CNN need a ReLU layer?
A: Real-world objects are complex and “non-linear”. A basic convolution is a linear operation (just multiplication and addition). The ReLU (Rectified Linear Unit) layer introduces non-linearity by setting all negative values to zero. This allows the network to learn much more complex patterns than just simple lines or gradients.
Q: What is the difference between a Convolutional layer and a Pooling layer?
A: A Convolutional layer learns to find features. Its kernel has weights that are updated during training. Its goal is to transform the data into a “feature map”.
A Pooling layer (like Max Pooling) does not learn. It just follows a simple rule (e.g., “take the maximum value in this 2×2 window”). Its goal is not to find features, but to downsample (shrink) the feature map to make the network more efficient and robust.
Q: What is an imbalanced dataset, and why does it make accuracy misleading?
A: An imbalanced dataset is one where one class is very rare (e.g., 99% “Normal” X-rays, 1% “Disease” X-rays). A “lazy” model can get 99% accuracy by just predicting “Normal” every time. This high accuracy score looks good, but the model is useless because it completely fails to find the 1% of cases that matter. It has learned nothing.
Q: What is the difference between cv2.GaussianBlur and cv2.Canny?
A: They are often used together but have different goals. cv2.GaussianBlur is for smoothing. Its purpose is to blur the image to reduce random “noise.” cv2.Canny is for edge detection. Its purpose is to find the outlines of objects. In fact, Canny uses a Gaussian blur as its very first step to reduce noise *before* it looks for edges.
12. Key Study Notes
- Computer Vision (CV): AI field to interpret images and videos.
- Pixel: The smallest dot of color in an image.
- RGB: The (Red, Green, Blue) channels that form a color image. An 800×600 image is an 800×600×3 grid of numbers.
- Feature: A meaningful pattern in the pixels (e.g., an edge, corner, texture).
- Classification: One label for the whole image (e.g., “Dog”).
- Detection: Multiple labels with bounding boxes (e.g., “a box around the dog”).
- Segmentation: A pixel-perfect outline for every object.
- OpenCV: A free software library for CV tasks.
- cv2.cvtColor: OpenCV function to change color spaces (e.g., to grayscale).
- cv2.GaussianBlur: OpenCV function to blur an image and reduce noise.
- cv2.Canny: OpenCV function for high-quality edge detection.
- Convolution: The operation of sliding a kernel (filter) over an image to create a Feature Map.
- CNN: Convolutional Neural Network. The “brain” model for CV.
- Conv Layer: Learns and finds features.
- ReLU Layer: Adds non-linearity (sets negative numbers to 0).
- Pooling Layer: Summarizes and shrinks the feature map (e.g., Max Pooling).
- Fully Connected (FC) Layer: The final “decision-making” part that looks at all features.
- Accuracy: (Correct Predictions) / (Total Predictions). Can be misleading.