Module 6 — Computer Vision | AI & Robotics Training

Topic 01 · Pictures as numbers

How computers “see” images

To a computer, a photo is not a memory; it is a grid of numbers. Vision AI learns patterns in those numbers, much as you learned that a banana is yellow and curved.

💡 From Module 3: Machine learning still means “learn from examples.” Here, the examples are photos with labels (cat, scratch, ripe, empty shelf).

What is a pixel? Zoom in on any digital photo and you see tiny coloured squares. Each square is a pixel. The computer stores a number (or a few numbers) for each pixel — how bright it is, and what colour.

Colour in simple terms: Many images use three numbers per pixel — how much red, green, and blue (RGB). Mix those and you get the colour you see on screen.
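To make "a photo is a grid of numbers" concrete, here is a minimal sketch in plain Python. The 2×2 image and the `brightness` helper are made up for illustration, and the plain average is a simplification; real libraries usually weight the three colours differently.

```python
# A tiny 2x2 colour image: each pixel is (red, green, blue), each value 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # top row: pure red, pure green
    [(0, 0, 255), (255, 255, 0)],    # bottom row: pure blue, yellow (red + green)
]

height = len(image)       # number of rows
width = len(image[0])     # number of pixels in each row


def brightness(pixel):
    """Average the three colour values into one rough brightness number."""
    r, g, b = pixel
    return (r + g + b) // 3


# Turn the colour grid into a grid with one number per pixel.
gray = [[brightness(p) for p in row] for row in image]

print(width, height)   # 2 2
print(gray)            # [[85, 85], [85, 170]]
```

Yellow comes out brighter than the pure colours because two of its three values are at full strength.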

Humans vs computers: You recognise a friend’s face in a crowd using memory and context. A computer starts with raw numbers and must learn which number patterns mean “face,” “car,” or “crack in the wall.”

Three main jobs vision AI can do:

Classification — one label for the whole image (“cat” or “dog”).
Detection — find each object and draw a box around it.
Segmentation — label every pixel (this pixel is road, this pixel is person).
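One way to tell the three jobs apart is to look at the shape of the answer each one gives back. The sketch below uses made-up results for one tiny 4×3 photo; the labels, box format, and `box_area` helper are assumptions for illustration, not the output of any particular tool.

```python
# Hypothetical results for one photo that is 4 pixels wide and 3 pixels tall.

# Classification: a single label for the whole image.
classification = "cat"

# Detection: one entry per object, each with a label and a box given as
# (left, top, right, bottom) in pixel coordinates.
detection = [
    {"label": "cat", "box": (0, 0, 2, 2)},
    {"label": "dog", "box": (2, 1, 4, 3)},
]

# Segmentation: one label per pixel, so the answer has the same
# shape as the image (3 rows of 4 pixels).
segmentation = [
    ["cat",   "cat",   "floor", "floor"],
    ["cat",   "cat",   "dog",   "dog"],
    ["floor", "floor", "dog",   "dog"],
]


def box_area(box):
    """Number of pixels covered by a detection box."""
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


cat_pixels = sum(row.count("cat") for row in segmentation)
print(len(detection), "objects found")
print(box_area(detection[0]["box"]), "pixels in the cat box")
print(cat_pixels, "pixels labelled cat")
```

The amount of labelling work grows in the same order: one word per photo, then one box per object, then one label per pixel.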

Image basics — size and file types

Resolution is pixels wide × tall (e.g. 1920×1080). More pixels = more detail and more work for the computer.

Grayscale = one brightness number per pixel. Factories often use it to count dark blobs on a light tray.
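The "more work for the computer" point can be checked with simple arithmetic. This sketch assumes one byte (a value from 0 to 255) per colour number and ignores compression; the function names are made up for illustration.

```python
def pixel_count(width, height):
    """Total number of pixels in the image."""
    return width * height


def raw_bytes(width, height, channels):
    """Uncompressed size, assuming one byte per number per pixel."""
    return width * height * channels


w, h = 1920, 1080
print(pixel_count(w, h))     # 2073600 pixels
print(raw_bytes(w, h, 3))    # 6220800 bytes as RGB (three numbers per pixel)
print(raw_bytes(w, h, 1))    # 2073600 bytes as grayscale (one number per pixel)
```

Grayscale needs a third of the numbers, which is one reason it is popular for fast counting jobs.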

JPEG, PNG: Common image formats. JPEG gives smaller files for photos by discarding some fine detail; PNG keeps every pixel exactly, so it suits sharp edges and screenshots.

Figure — Pick one job per project; do not do all three on day one.

Who labels training photos?

Who	Good for	Watch out
Students	Leaves, demos	Agree on rules
Factory workers	Scratch vs OK	Tired eyes; rotate
Experts	Medical outlines	Privacy, law

Example — barcode vs vision: Barcode is exact if present. Vision handles crumpled labels or fruit with no sticker.

Figure — The computer never “sees” like you do — it reads number patterns.

Figure — Zoom in enough and every picture looks like coloured squares.

Vision jobs — types and what they are used for

Job type	What you get	Used for	Example
Classification	One label for whole image	Sort photos, quality OK / not OK	“Beach” search in phone gallery
Detection	Boxes around each thing	Count people, find cars, track ball	Security camera person overlay
Segmentation	Colour each pixel by type	Road vs sidewalk, tumour outline	Self-driving research maps
Face detection	Find where faces are	Camera focus, blur background	Phone portrait mode
Face recognition	Match face to identity	Unlock device (needs care + consent)	Face unlock on phone
OCR (read text)	Text from image	Scans, receipts, signs	Deposit a cheque in banking app
Pose / gesture	Where body joints are	Fitness apps, sign language research	Dance game scoring

What training data looks like

To teach vision AI, people collect labelled images:

Photo of an apple + label “apple”.
Photo of factory part + label “scratch” or “OK”.
Thousands of examples — more variety usually helps (different lighting, angles, backgrounds).

If you only train on bright sunny photos, the system may fail on dark rainy photos. That is not “stupid AI” — it is missing examples.

Example — phone gallery search: You type “beach” and see beach photos. The phone learned what beach scenes look like from many labelled images — sand, sky, water patterns — not from reading the word on the photo.

Example — school science fair: Students photograph leaves and label them “oak,” “maple,” “birch.” A simple classifier can guess new leaf photos if lighting is similar to training photos.

Try it · Pick one vision job for a school cafeteria

Figure — Same as Module 3: learn from labelled examples.

Transfer learning — why it saves time

A model trained on millions of general photos already knows edges and textures. You then train only a small new final layer (the “head”) for your own labels, such as bruised apple or rust spot, with hundreds of photos, not millions.

Bad labels teach bad lessons

Mistake	Result
“Dog” on cat photo	Confuses both
Only sunny photos	Fails at night
Same photo in train and test	Fake high score
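The last row of the table can be checked automatically. The sketch below fingerprints each file with a hash and reports test photos that also sit in the training set. The file names and byte strings are stand-ins for real image files, and this approach only catches exact copies, not near-duplicates such as a resized or re-saved photo.

```python
import hashlib


def fingerprint(image_bytes):
    """Short, repeatable ID for a file's exact contents."""
    return hashlib.sha256(image_bytes).hexdigest()


def find_leaks(train, test):
    """Return names of test photos whose bytes also appear in the training set."""
    train_hashes = {fingerprint(data) for data in train.values()}
    return sorted(
        name for name, data in test.items()
        if fingerprint(data) in train_hashes
    )


# Stand-in data: in practice these would be the bytes read from image files.
train = {"apple_01.jpg": b"\x10\x20\x30", "apple_02.jpg": b"\x11\x22\x33"}
test = {"apple_99.jpg": b"\x10\x20\x30", "apple_98.jpg": b"\x55\x66\x77"}

leaks = find_leaks(train, test)
print(leaks)   # ['apple_99.jpg']
```

Any photo in the list should be removed from one of the two sets before you trust the test score.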

Try it · Rules vs camera for “empty chair”?

Would a motion sensor be simpler? What breaks the camera approach?

Would you use classification, detection, or segmentation? What would you label in photos? Who checks mistakes before acting?

Topic 02 · Learning from pixels

How image AI learns (simple idea)

Special programs called convolutional neural networks (CNNs) slide small windows over the photo. They learn edges first, then shapes, then whole objects, like learning letters, then words, then sentences.

You do not need the math. The important ideas are:

Show the network many labelled photos.
It guesses the label and checks if it was wrong.
It slowly adjusts internal settings to do better next time.
Repeat until guesses are good enough on new photos it never saw.
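The four steps above can be sketched with a perceptron, a far simpler learner than the image networks described here, but one that follows the same guess, check, adjust loop. Everything in this example is an assumption for illustration: the "photos" are just four brightness values between 0 and 1, and the labels are 1 for bright and 0 for dark.

```python
# Labelled training "photos": four pixel values each, label 1 = bright, 0 = dark.
train = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.1, 0.2, 0.0, 0.3], 0),
    ([0.8, 0.9, 0.9, 1.0], 1),
    ([0.2, 0.1, 0.3, 0.0], 0),
]
# New photos the model never sees during training.
unseen = [
    ([1.0, 0.9, 0.8, 0.9], 1),
    ([0.0, 0.1, 0.2, 0.1], 0),
]

weights = [0.0, 0.0, 0.0, 0.0]   # the "internal settings"
bias = 0.0
learning_rate = 0.1


def predict(pixels):
    """Guess the label from a weighted sum of the pixel values."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0


for epoch in range(20):
    mistakes = 0
    for pixels, label in train:
        error = label - predict(pixels)      # 0 when the guess was right
        if error != 0:
            mistakes += 1
            # Nudge each setting a little in the direction that fixes the error.
            for i, p in enumerate(pixels):
                weights[i] += learning_rate * error * p
            bias += learning_rate * error
    if mistakes == 0:                        # every training guess was right
        break

correct = sum(predict(pixels) == label for pixels, label in unseen)
print(correct, "of", len(unseen), "unseen photos guessed correctly")
```

The final check uses photos kept out of training, which matches the last step in the list.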

Transfer learning (shortcut): Start from a model already trained on millions of general photos (cats, cars, chairs). Then teach it your smaller job — “bruised apple” vs “good apple” — with fewer pictures of your own.

Things that hurt vision AI: dark rooms, blur, shiny reflections, hidden objects, and labels that disagree (one person says “OK,” another says “defect”).

Figure — Early layers see simple parts; later layers combine them into meaning.

Figure — Like studying with practice questions, then taking a new exam.

Common data problems — and what to do

Problem	What goes wrong	What helps
Too few photos	Near-random guesses on new images	Collect more; use transfer learning
All photos look the same	Works in lab, fails in real room	Add night, blur, different angles
Wrong labels	AI learns the wrong lesson	Two people label; spot-check
Class imbalance	99% “OK” → always says OK	Collect more defect photos; measure fairly
Leaky test set	Near-duplicates in train and test inflate the score	Split by time or camera, not only at random
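The class imbalance row is easy to show with numbers. In this sketch a lazy "model" always answers OK on 100 parts where only one is a defect. The data and function names are made up; recall is one common way to "measure fairly", meaning the share of real defects that were caught.

```python
# 99 good parts and 1 defect, and a model that always says "OK".
truth = ["OK"] * 99 + ["defect"]
guesses = ["OK"] * 100


def accuracy(truth, guesses):
    """Share of all guesses that were right."""
    return sum(t == g for t, g in zip(truth, guesses)) / len(truth)


def recall(truth, guesses, target):
    """Share of the real `target` items that the model actually caught."""
    caught = [g for t, g in zip(truth, guesses) if t == target]
    return sum(g == target for g in caught) / len(caught)


print(accuracy(truth, guesses))           # 0.99
print(recall(truth, guesses, "defect"))   # 0.0
```

A 99% score looks excellent, yet the model misses every defect, so accuracy alone is not enough on lopsided data.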

When rules are enough: Bright room, fixed camera, same object every time. Threshold counting may beat a big neural network.

When AI helps: Objects vary in size, colour, or background. Many labelled photos teach patterns that rules cannot capture by hand.

When humans stay boss: Medical, legal, policing. AI suggests; a human decides and explains to the person affected.

Example — fruit sorting belt: A camera over the conveyor takes a photo of each apple. A network trained on thousands of labelled images flags bruises. A puff of air or a gate pushes bad apples aside. Humans still spot-check random trays.

Example — document scanner app: OCR reads account numbers from a cheque photo. The app checks format (right number of digits) before sending — vision plus simple rules.
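The "vision plus simple rules" idea in the scanner example can be sketched as a format check on whatever text the OCR step returns. The 8-digit length and the function name are assumptions for illustration; real account formats differ between banks and countries.

```python
import re


def looks_like_account_number(text, digits=8):
    """True only if `text` is exactly `digits` digits 0-9 and nothing else."""
    return re.fullmatch(rf"[0-9]{{{digits}}}", text) is not None


print(looks_like_account_number("12345678"))   # True
print(looks_like_account_number("1234S678"))   # False: OCR misread a 5 as S
print(looks_like_account_number("1234567"))    # False: one digit missing
```

A failed check is a good moment to ask the user to retake the photo instead of sending a wrong number.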

Topic 03 · Classic picture tricks

Before AI: blur, edges, and filters

Long before big neural networks, engineers used simple picture steps. Many factory and medical systems still use them because they are fast, cheap, and easy to explain.

Why simple steps still matter: If lighting is controlled (same lamp, same distance), you can count white pills or measure a screw head without training a huge model. You can explain every step to an inspector.

Typical pipeline (classic vision):

Take photo with a fixed camera.
Maybe blur to remove speckle noise.
Adjust brightness if needed.
Find edges or turn black/white (threshold).
Count blobs or measure width in pixels.
Convert pixels to millimetres using a ruler in the scene.
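The pipeline above can be sketched on a single row of brightness values taken across a bright part on a dark background. All numbers here are invented for illustration, including the 128 cutoff and the ruler scale (10 mm covering 20 pixels). Libraries such as OpenCV offer ready-made versions of these steps for full images.

```python
def blur_row(row):
    """3-pixel moving average; edge pixels reuse their nearest neighbour."""
    padded = [row[0]] + row + [row[-1]]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(row) + 1)
    ]


def threshold_row(row, cutoff):
    """Turn each value into 1 (bright) or 0 (dark)."""
    return [1 if value > cutoff else 0 for value in row]


def widest_run(bits):
    """Length of the longest unbroken run of 1s."""
    best = current = 0
    for bit in bits:
        current = current + 1 if bit else 0
        best = max(best, current)
    return best


# One scan line: a single noisy speckle (250), then a part 4 pixels wide.
row = [10, 10, 250, 10, 10, 200, 210, 205, 200, 10, 10, 10]

bits = threshold_row(blur_row(row), cutoff=128)
width_px = widest_run(bits)

# Ruler in the scene: assume a known 10 mm mark spans 20 pixels.
mm_per_pixel = 10 / 20
width_mm = width_px * mm_per_pixel

print(bits)        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(width_mm)    # 2.0
```

The blur spreads the lone speckle out until it falls below the cutoff, while the wider part survives, which is why blurring often comes before thresholding.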

Figure — Same flow in many factories: prepare image → measure → decision.

Figure — Pills on tray: white blobs after threshold.

Lighting first: Fix the lamp before buying a bigger model.

Fixed camera: Bolt it down. Any shake ruins the measurement.

OpenCV: A free, open-source library for blur, edge detection, and thresholding, widely used in industry and teaching.

Simple filter types — what they do

Filter / step	What it does	Used for	Example
Blur	Smooths tiny speckles	Cleaner image before counting	Remove camera noise on grey tray
Sharpen	Makes edges crisper	See fine cracks (careful — also sharpens noise)	Inspecting metal surface
Edge finder	Highlights outlines	Measure width, find shape	Is the screw head round?
Brightness / contrast	Darken or brighten	Same rule morning and afternoon	Factory belt under changing sun
Threshold	Black and white only	Count white blobs	Pills on dark tray
Crop / resize	Cut or shrink image	Faster processing; focus on belt only	Ignore factory background
Colour filter	Keep one colour range	Find red defects on grey part	Tomato sorting by redness
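As one example from the table, a colour filter can be sketched as a yes/no test on each pixel's RGB values. The limits used here (red at least 150, green and blue at most 100) are made-up starting points that would need tuning under real lighting.

```python
def is_red(pixel, min_red=150, max_other=100):
    """True when the pixel is strongly red and weak in green and blue."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def red_mask(image):
    """Grid of 1s and 0s marking which pixels passed the colour test."""
    return [[1 if is_red(p) else 0 for p in row] for row in image]


# A mostly grey part with two reddish defect pixels.
part = [
    [(120, 120, 120), (125, 122, 121), (200, 40, 35)],
    [(118, 119, 121), (210, 50, 45), (122, 120, 119)],
]

mask = red_mask(part)
red_pixels = sum(sum(row) for row in mask)
print(mask)         # [[0, 0, 1], [0, 1, 0]]
print(red_pixels)   # 2
```

The result is a black and white image again, so the same counting steps used after a threshold apply here too.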

Rules vs AI — when to pick which

Situation	Often best choice	Why
Same camera, same lighting, same object	Rules + threshold	Fast, cheap, easy to audit
Objects vary in look or background	Trained vision AI	Hard to write rules for every case
Must justify every decision to a regulator or court	Rules first; AI as helper	“Pixel count > 500” is clearer than “layer 7 said so”
Phone app for billions of photos	Big pre-trained AI	Scale and variety need deep learning

Example — counting pills: Pills are white circles on a dark tray. A high-contrast photo + threshold turns the image black and white. Software counts white blobs. If count is 30, tray is complete. No neural network needed if the tray always looks the same.
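The pill example can be sketched as threshold, then count groups of touching white pixels. The tiny tray below holds 4 "pills" so the grid stays readable; the brightness values, the 128 cutoff, and the expected count are all assumptions for illustration.

```python
def count_blobs(binary):
    """Count groups of touching 1-pixels (up, down, left, right neighbours)."""
    rows, cols = len(binary), len(binary[0])
    seen = set()
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and (r, c) not in seen:
                blobs += 1                  # found a new, unvisited blob
                stack = [(r, c)]
                seen.add((r, c))
                while stack:                # visit every pixel joined to it
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs


# Grayscale photo of a dark tray (about 20) with bright pills (about 230).
tray = [
    [20, 230, 235, 25, 20, 240, 22],
    [22, 228, 232, 20, 21, 238, 20],
    [20, 21, 20, 22, 20, 20, 21],
    [225, 230, 20, 20, 236, 234, 20],
]

binary = [[1 if value > 128 else 0 for value in row] for row in tray]
pills = count_blobs(binary)
expected = 4
print(pills, "pills:", "tray complete" if pills == expected else "check tray")
```

Two pills that touch would merge into one blob, which is one way this simple rule can break.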

Example — reading a dial: Edge detection finds the needle outline; geometry measures the angle. A fixed camera and good lighting matter more than a fancy model.
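The geometry step in the dial example can be sketched once the needle's pivot and tip are known as pixel positions. Finding those two points is the edge detection part and is not shown. The dial range used below (0 to 180 degrees covering values 0 to 10) is a hypothetical gauge, not a real product.

```python
import math


def needle_angle_degrees(pivot, tip):
    """Needle angle measured clockwise from straight up, from 0 up to 360."""
    dx = tip[0] - pivot[0]
    dy = pivot[1] - tip[1]      # image y grows downward, so flip it
    return math.degrees(math.atan2(dx, dy)) % 360


def dial_value(angle, min_angle, max_angle, min_value, max_value):
    """Map an angle on the dial face to the reading it points at."""
    fraction = (angle - min_angle) / (max_angle - min_angle)
    return min_value + fraction * (max_value - min_value)


pivot = (100, 100)   # pixel position of the needle's centre (x, y)
tip = (150, 100)     # pixel position of the needle's tip, pointing right

angle = needle_angle_degrees(pivot, tip)
reading = dial_value(angle, min_angle=0, max_angle=180, min_value=0, max_value=10)
print(round(angle, 1), round(reading, 1))   # 90.0 5.0
```

If the camera moves, the pivot position changes and every reading shifts, which is why the fixed camera matters so much here.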

Discuss · Could a canteen count sandwiches with only thresholding?

What would have to stay the same every day? What would break the system?

Topic 04 · Real uses

Where you already meet vision AI

You use computer vision more than you think — unlocking your phone, scanning homework, filtering selfies, and getting alerts when a camera sees a person. The ideas from Topics 1–3 show up in all of them.

Same loop everywhere: Camera captures image → software finds patterns → something happens (unlock, beep, stop belt, draw box on screen).

Good uses save time, catch defects, or help people with disabilities (text-to-speech on signs).

Risky uses need extra care: face recognition in public, emotion guessing in hiring, fully automatic medical decisions.

Where vision AI shows up — types and uses

Where	What vision does	Why people use it	Example product
Phone	Face unlock, photo search, filters, QR scan	Convenience, fun, security	Gallery search, portrait mode
Car	Backup lines, lane hints, sign read	Help driver see danger	Reversing camera, lane assist
Factory	Spot scratches, wrong parts, missing cap	Quality 24/7	Camera over conveyor belt
Shop	Recognise items at self-checkout	Faster queues	Camera above basket
Home security	Person / animal / package alert	Notify owner, not watch 24/7	Video doorbell
Farming	Weed vs crop, ripeness, pest damage	Target spraying, less waste	Drone over field
Healthcare (with doctor)	Highlight region on scan	Second pair of eyes	X-ray assist — doctor decides
Accessibility	Describe scene, read text aloud	Help blind or low-vision users	Phone “describe image” feature

Figure — Same idea as a robot from Module 4: see, think, act.

Uses that need extra care

Use	Risk if wrong	What responsible teams do
Face recognition in public	Wrong person accused; privacy harm	Consent, law check, human review, audit logs
Emotion AI in hiring	Unfair rejection; pseudo-science	Often avoided; humans interview
Medical diagnosis from the image alone	Missed disease	Doctor makes final call; regulated testing
Security “weapon” detection	False alarm, bias	Test on diverse data; human verifies alert

Careful uses: Face recognition in public spaces and medical images need extra testing, consent, and human review. “The AI said so” is not enough for life-changing decisions.

Vision in your Module 8 project

Idea	Vision job	Keep small
Recycling bins	Classify material	~50 photos per class
Plant health	OK vs wilted leaf	Same camera distance
Parking slot	Car vs empty	One camera angle
Bottle cap	Cap missing?	Rules may be enough

With IoT (Module 5): Camera → vision says defect → MQTT message → chart or belt stop. Draw the full chain on your poster.
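The middle of that chain, turning a vision result into a message and deciding whether to stop the belt, can be sketched without any hardware. The topic name, field names, and 0.9 confidence limit are hypothetical, and the actual sending is left to whichever MQTT client your Module 5 setup uses.

```python
import json

# Hypothetical topic name; use whatever your own broker setup defines.
TOPIC = "factory/belt1/vision"


def defect_message(camera_id, label, confidence):
    """Pack one vision result into a JSON string ready to publish."""
    payload = {"camera": camera_id, "label": label, "confidence": confidence}
    return json.dumps(payload, sort_keys=True)


def should_stop_belt(message, min_confidence=0.9):
    """Receiving side: stop only on a confident defect report."""
    data = json.loads(message)
    return data["label"] == "defect" and data["confidence"] >= min_confidence


message = defect_message("cam-01", "defect", 0.95)
print(TOPIC, message)
print(should_stop_belt(message))   # True
```

Keeping the confidence in the message lets the chart show uncertain cases even when the belt keeps running.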

Example — shelf tidiness: Photo each hour; “aligned / messy” for display only — not for grading people.

Example — helmet check demo: Detect head region; check helmet visible. Needs consent if filming people.

One vision task per project — not detect + OCR + segment at once.
Hold back photos for honest testing.
Say when the system refuses: too dark, too blurry.
Link to Module 7 ethics for faces and medical images.
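The "say when the system refuses" point can be sketched as two quick checks on a grayscale photo before any vision step runs. The limits (brightness below 40, sharpness below 10) and the sharpness measure itself are simple assumptions for illustration and would need tuning for a real camera.

```python
def mean_brightness(gray):
    """Average of all pixel values (0 = black, 255 = white)."""
    pixels = [value for row in gray for value in row]
    return sum(pixels) / len(pixels)


def sharpness(gray):
    """Average jump between side-by-side pixels; low means blurry or flat."""
    jumps = [
        abs(row[i + 1] - row[i])
        for row in gray
        for i in range(len(row) - 1)
    ]
    return sum(jumps) / len(jumps)


def check_photo(gray, min_brightness=40, min_sharpness=10):
    """Return 'ok' or a refusal message the user can act on."""
    if mean_brightness(gray) < min_brightness:
        return "refuse: too dark"
    if sharpness(gray) < min_sharpness:
        return "refuse: too blurry"
    return "ok"


dark = [[5, 6, 5], [6, 5, 6]]
blurry = [[100, 102, 104], [101, 103, 105]]
clear = [[20, 200, 30], [210, 25, 190]]

print(check_photo(dark))     # refuse: too dark
print(check_photo(blurry))   # refuse: too blurry
print(check_photo(clear))    # ok
```

A clear refusal is more useful on a poster than a confident wrong answer.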

Example — video doorbell: Camera sees motion. Small AI on device asks “person or car?” If person, send phone alert. Cloud stores clip if you pay for storage. You decide whether to open the door — vision only informs you.

Example — factory cap check: Camera looks down at bottles. Vision detects missing cap or wrong label colour. Belt stops automatically; operator removes bad bottle. Classic rules or AI both work if lighting is stable.

Try it · List one helpful and one risky use of face recognition

Explain why in one sentence each. Who should be allowed to override the system?

Module Wrap-Up

Key Takeaways

Module 6 in short: photos are grids of numbers; AI learns patterns to recognise things.

01 A digital photo is a grid of coloured dots (pixels) stored as numbers.
02 Classification = what is in the whole picture; detection = what and where.
03 Face unlock and photo tags use the same broad idea: find patterns in pixels.
04 Blur and edge tools are simple steps before harder AI tasks.
05 More good training photos usually helps — but lighting and angles still matter.
06 OCR reads text from images (scanned pages, signs).
07 Vision helps cars, hospitals, farms, and phones — each needs care and testing.
08 Using faces in public raises privacy and fairness questions worth discussing.

📚 Further reading:
• OpenCV documentation — docs.opencv.org
• PyTorch vision tutorials — pytorch.org/vision
• ImageNet & modern CV history — peer surveys on arXiv (search “ImageNet deep learning survey”)
