FOR BEGINNERS

New to AI? Start here.

Never trained an AI model before? No problem. This page explains the whole idea in plain language — what labeling is, what training is, and exactly how to start. No jargon.

STEP ONE — THE IDEA

What is labeling?

Think of teaching a child.

Imagine showing a child photos and saying “this is a car,” “this is a dog.” After enough examples, the child just knows. Labeling is exactly that: you show the computer images and point out what's in them.

In practice

You draw a box around each object in an image and give it a name (“car,” “person,” “helmet”). Do this for a batch of images and you’ve built a labeled dataset.
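To make "a box plus a name" concrete, here is a minimal sketch of how one label could be stored. It assumes the plain-text layout commonly used for YOLO datasets (class number, then box center, width and height as fractions of the image size). Whether Annotation Studio exports exactly this layout is an assumption, and the names `Box` and `to_yolo_line` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One labeled object: a class name plus the pixel corners of its box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_line(box: Box, image_w: int, image_h: int, class_names: list[str]) -> str:
    """Turn a pixel box into 'class_id x_center y_center width height'.

    All four box values are divided by the image size, so they land
    between 0 and 1. (Assumed export layout, not confirmed by the tool.)
    """
    class_id = class_names.index(box.label)
    x_center = (box.x_min + box.x_max) / 2 / image_w
    y_center = (box.y_min + box.y_max) / 2 / image_h
    width = (box.x_max - box.x_min) / image_w
    height = (box.y_max - box.y_min) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


classes = ["car", "person", "helmet"]
# A person whose box spans the middle of a 640x480 photo.
line = to_yolo_line(Box("person", 160, 120, 480, 360), 640, 480, classes)
print(line)  # 1 0.500000 0.500000 0.500000 0.500000
```

One such line per object, one text file per image, is all a labeled dataset really is under this assumption.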

The tool for this: Annotation Studio

STEP TWO — THE IDEA

What is training?

Now the child learns on its own.

After the child has seen thousands of labeled examples, it can recognize a car it has never seen before. Training is when the computer studies your labeled images and learns the patterns — so it can find those objects in brand-new images by itself.

In practice

You feed your labeled dataset to the trainer, it runs for a while, and out comes a “model”: a file that can detect your objects automatically.
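Since the goal is to recognize images the model has never seen, a common practice is to set a few labeled images aside and use them only for checking. The sketch below shows that idea. This page does not say how Model Training handles this internally, so the function name `split_dataset` and the 20% hold-out are assumptions for illustration.

```python
import random


def split_dataset(image_names, val_fraction=0.2, seed=0):
    """Shuffle the image names and hold out a fraction for checking.

    Returns (train_names, val_names). The fixed seed keeps the split
    the same every time the function runs.
    """
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    n_val = max(1, round(len(names) * val_fraction))
    return names[n_val:], names[:n_val]


images = [f"img_{i:03d}.jpg" for i in range(10)]
train, val = split_dataset(images)
print(len(train), len(val))  # 8 2
```

The model learns from `train` only, and `val` plays the role of the "brand-new images."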

The tool for this: Model Training

THE BIG PICTURE

The whole journey, in 4 steps

1. Collect images

Gather photos of the things you want the AI to recognize.

2. Label them

Draw boxes and name the objects. Use Annotation Studio.

3. Train a model

Let the computer learn from your labels. Use Model Training.

4. Use your model

Your trained model now detects objects on its own. Deploy it anywhere.

GOING DEEPER

The settings that actually matter

When you train, a few numbers control how the learning happens. You don't have to set them — Model Training picks smart defaults — but here's what they mean so the screen makes sense.

Epochs

how many times

One epoch = the model looks at your entire set of images once. It needs several passes to really learn — like re-reading a book. Too few epochs and it barely learns; too many and it just memorizes your exact images and gets worse on new ones (that's called overfitting).
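One common way to spot "too many epochs" is to watch the error on images the model was not trained on and stop once it has not improved for a while. The sketch below illustrates that logic. The loss numbers are made up, and `best_epoch` and `patience` are hypothetical names, not settings confirmed for Model Training.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss.

    The scan stops early once `patience` epochs in a row have passed
    without beating the best loss seen so far.
    """
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            break
    return best_idx + 1


# Made-up example: the error drops, then creeps back up (overfitting).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(best_epoch(losses))  # 4
```

Here the model was at its best after epoch 4, and the later passes only made it worse on new images.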

Batch size

how many at once

How many images the model looks at before it updates what it has learned. Bigger batches train more smoothly but need more GPU memory (VRAM). Model Training caps this to fit your card automatically.
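Epochs and batch size together decide how many times the model updates what it has learned. A minimal sketch of that arithmetic follows. The function name `update_steps` is made up for illustration.

```python
import math


def update_steps(num_images, batch_size, epochs):
    """Total number of learning updates over a whole training run."""
    # The last batch of an epoch may be smaller, so round up.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


print(update_steps(1000, 16, 50))  # 3150
print(update_steps(1000, 32, 50))  # 1600
```

Doubling the batch size roughly halves the number of updates, while each update looks at twice as many images and needs more memory.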

Image size

how much detail

Every image is resized to a square (commonly 640×640) before training. Bigger means the model sees finer detail — but trains slower and needs more memory.
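The sketch below shows why a bigger image size costs more, and one common way a photo can be fitted into a square. It assumes the photo is shrunk to fit and the leftover space is padded, which is typical for YOLO-style pipelines but not confirmed for Model Training. Both function names are hypothetical.

```python
def fit_to_square(width, height, target=640):
    """Scale a photo so its longer side equals `target`, keeping its shape.

    Returns (new_width, new_height, pad_width, pad_height), where the
    pad values are the pixels of filler needed to complete the square.
    """
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    return new_w, new_h, target - new_w, target - new_h


def pixel_cost_ratio(size_a, size_b):
    """How many times more pixels a size_a square has than a size_b square."""
    return (size_a / size_b) ** 2


print(fit_to_square(1920, 1080))    # (640, 360, 0, 280)
print(pixel_cost_ratio(1280, 640))  # 4.0
```

Going from 640 to 1280 doubles each side but quadruples the pixel count, which is why memory use and training time climb so quickly.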

Don't want to think about any of this? You don't have to. Hit Start with the defaults — they're tuned to just work.

CHOOSING YOUR MODEL

Which model should you train?

Why only YOLO?

Model Training uses YOLO, and only YOLO, because it is one of the most widely used object-detection model families: fast, accurate, and proven on real projects. We include the YOLO versions listed below so you can pick the one that fits you.

The versions

YOLO keeps improving. Each newer version is generally a better balance of speed and accuracy than the one before it.

v8

The stable classic — widely used and reliable.

v9 / v10

Newer refinements — better accuracy, faster detection.

v11 (Recommended)

A refined all-rounder — efficient and accurate.

YOLO26 (Recommended)

The newest generation — our top pick for the best results today.

Our recommendation: go with v11 or YOLO26 — they give the best results for most people.

The sizes — Nano to Extra-Large

Each version also comes in sizes. Same brain, different capacity: bigger sizes are more accurate but need more powerful hardware and train slower.

| Size | Speed | Accuracy | Best for |
| --- | --- | --- | --- |
| Nano (n) | Fastest | Lowest | Real-time speed · small images (~320px) |
| Small (s) | Fast | Good | Fast detection · ~320–640px |
| Medium (m) | Balanced | High | Balanced all-rounder · ~640px |
| Large (l) | Slower | Higher | Small objects & detail · ~640–1280px |
| Extra-Large (x) | Slowest | Highest | Max accuracy · fine detail · ~1280px |

How do I pick the size for my dataset?

There's no single right answer — it depends on four things. The biggest one is how much data you actually have.

How many images do you have?

Hundreds → Nano or Small. A few thousand → Medium. Tens of thousands → Large or Extra-Large. A big model trained on a tiny dataset just memorizes it (overfitting), so small data wants a small model.

How hard is the task?

A few clear, different classes (car vs. person) → a small size handles it fine. Many similar classes, or tiny objects in the frame → a bigger size sees more.

How strong is your GPU?

Low VRAM → smaller model + smaller image size. Strong GPU → you can afford a bigger model and higher resolution.

Do you need speed?

Real-time or many images per second → Nano / Small. Running offline where accuracy matters more → size up.

Rule of thumb: more data + harder task + stronger GPU → go bigger. When in doubt, start at Medium + v11, test it, and only move up a size if you need more accuracy.
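The four questions above can be sketched as a tiny helper that suggests a size letter. This is only an illustration of the rule of thumb. The exact cutoffs (1,000 and 10,000 images) and the name `suggest_size` are assumptions, not something Model Training does for you.

```python
SIZES = ["n", "s", "m", "l", "x"]  # Nano, Small, Medium, Large, Extra-Large


def suggest_size(num_images, hard_task=False, low_vram=False, need_speed=False):
    """Suggest a model size letter from the four questions above."""
    # Start from the amount of data (cutoffs are assumed for illustration).
    if num_images < 1_000:
        idx = 1  # hundreds of images: Small (Nano is also reasonable)
    elif num_images < 10_000:
        idx = 2  # a few thousand: Medium
    else:
        idx = 3  # tens of thousands: Large
    # A harder task pushes up; weak hardware or a need for speed pushes down.
    if hard_task:
        idx += 1
    if low_vram:
        idx -= 1
    if need_speed:
        idx -= 1
    # Keep the result inside the available sizes.
    return SIZES[max(0, min(idx, len(SIZES) - 1))]


print(suggest_size(500))                                  # s
print(suggest_size(5_000))                                # m
print(suggest_size(50_000, hard_task=True))               # x
print(suggest_size(300, low_vram=True, need_speed=True))  # n
```

Treat the answer as a starting point: train once, test the result, and move up a size only if you need more accuracy.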

READY?

How to start

Two tools, used in order. Both run on your own machine.

Step 1

Annotation Studio

Label your images. Draw boxes, name objects, export a clean dataset.

Open Annotation Studio

Step 2

Model Training

Feed your labeled data in and train a working model — no terminal, no Python.

Open Model Training

That’s the whole idea.

Label → Train → Use. Start with the first tool and follow the steps inside it.