TOEIC Speaking Overview: 11 Tasks, 20 Minutes, and the Hierarchical Rubric

You sit down at a computer terminal, put on the headset, and within two minutes you're reading a paragraph aloud. Forty-five seconds later you're reading another. Then a picture appears and you have 45 seconds to prepare a 30-second description. The microphone captures every pause, every mispronounced consonant, every intonation rise and fall. Nineteen minutes from now, the test ends. Your certificate will report a 0-200 score plus two qualitative descriptors — Pronunciation Low/Medium/High, Intonation and Stress Low/Medium/High — that follow you onto your resume for the next two years.

TOEIC Speaking is the most compressed productive-skill test of any major workplace English certificate. Eleven tasks, twenty minutes, five task families, two separate descriptor axes on the certificate, and a rubric structure that layers more criteria onto later tasks as they get harder. Candidates who prepare for Speaking by "practicing speaking in general" often land a score in the 110-130 range and don't know why — their pronunciation is fine, their vocabulary is fine, but their rubric-to-task mapping is wrong.

Understanding the test's structure — which tasks test what, how the rubric accumulates, and where the descriptors come from — is the precondition for preparing efficiently.

What TOEIC Speaking Actually Looks Like

TOEIC Speaking is one of two computer-delivered tests in the Speaking & Writing family (TOEIC Writing is the other). It's taken at an authorized test center on a provided terminal with a headset. You do not speak to a human interviewer — your responses are recorded and later scored by ETS-certified human raters. There is no pause button, no rewind, no second take.

Feature	TOEIC Speaking
Total tasks	11
Total test time	~20 minutes
Delivery	Computer with headset microphone
Scoring	Human raters (ETS-certified)
Score scale	0-200 (10-point increments)
Additional descriptors	Pronunciation (Low/Medium/High), Intonation & Stress (Low/Medium/High)
Results turnaround	Up to 14 business days
Validity	2 years

The 11 tasks are grouped into five task families, ordered roughly by increasing cognitive demand.

Task	Questions	Prep	Speak	Score Scale
Read a Text Aloud	Q1-2	45 s	45 s	0-3
Describe a Picture	Q3-4	45 s	30 s	0-3
Respond to Questions	Q5-7	3 s	15-30 s	0-3
Respond Using Information Provided	Q8-10	45 s (study) + 3 s	15-30 s	0-3
Express an Opinion	Q11	45 s	60 s	0-5
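To see how much of the roughly 20 minutes is actual on-the-clock work, the table can be transcribed into a small data structure and summed. This is a minimal sketch, not an official timing breakdown: the `TaskFamily` class and its field names are hypothetical, the 15/15/30-second split for Q8-10 is an assumption borrowed from Q5-7 (the table only says 15-30 s), and the raw maximum assumes each question's rubric score is simply added up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskFamily:
    name: str
    questions: int        # number of questions in the family
    shared_study_s: int   # one-off study time shared by the whole family
    prep_s: int           # preparation time before each question
    speak_s: tuple        # speaking window for each question, in order
    max_score: int        # top of the per-question rubric scale


FAMILIES = [
    TaskFamily("Read a Text Aloud", 2, 0, 45, (45, 45), 3),
    TaskFamily("Describe a Picture", 2, 0, 45, (30, 30), 3),
    TaskFamily("Respond to Questions", 3, 0, 3, (15, 15, 30), 3),
    # ASSUMPTION: same 15/15/30 split as Q5-7; the table only gives "15-30 s".
    TaskFamily("Respond Using Information Provided", 3, 45, 3, (15, 15, 30), 3),
    TaskFamily("Express an Opinion", 1, 0, 45, (60,), 5),
]


def timed_seconds(family: TaskFamily) -> int:
    """Study time + per-question prep + all speaking windows for one family."""
    return (family.shared_study_s
            + family.prep_s * family.questions
            + sum(family.speak_s))


total_questions = sum(f.questions for f in FAMILIES)
total_timed_s = sum(timed_seconds(f) for f in FAMILIES)
# ASSUMPTION: unweighted sum of per-question maxima.
max_raw_points = sum(f.max_score * f.questions for f in FAMILIES)

print(total_questions, total_timed_s, max_raw_points)
```

Under these assumptions the timed prep and speaking windows add up to 618 seconds, a little over 10 minutes. The rest of the session is presumably taken up by directions and transitions between tasks.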

How the Scoring Actually Works

Each task is scored on a small-integer rubric (0-3 for Q1-10, 0-5 for Q11). The raw scores across all 11 tasks are combined and scaled to the 0-200 reported range. The conversion table is proprietary to ETS; raw scores are not published. A zero on a task means the response was blank, off-topic, or unintelligible; full marks means the response met every criterion at the task's expected level.

Two crucial structural features:

The rubric is hierarchical. Q1-2 is scored on just two criteria: Pronunciation and Intonation/Stress. Q3-4 adds Grammar, Vocabulary, and Cohesion on top of those two. Q5-7 adds Relevance and Completeness on top of all previous criteria. Q8-10 keeps the same seven criteria as Q5-7. Q11 adds Opinion Support (reasons, details, examples) and is scored on a larger 0-5 scale because the task permits more variation in quality.

In other words, later tasks are scored on more dimensions than earlier tasks. A response that would score a 3 on Q1 (good pronunciation, good intonation) would not automatically score a 3 on Q3 — it also needs grammatical accuracy, appropriate vocabulary, and cohesive organization.

Pronunciation and Intonation/Stress are reported separately. Your certificate shows a Low/Medium/High descriptor for each. These are not just components of the overall 0-200 score — they're independent reports of your speech quality, visible to anyone who sees the certificate. Low pronunciation alongside a score of 150 tells an employer something different than High pronunciation alongside a 150.
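The way criteria accumulate across tasks can be sketched as a running list. The criterion names come from the description above; the `LAYERS` structure and the `cumulative_criteria` helper are illustrative names, not anything ETS publishes.

```python
# Each entry lists only the criteria that a task family ADDS to what came before.
LAYERS = [
    ("Q1-2", ["Pronunciation", "Intonation/Stress"]),
    ("Q3-4", ["Grammar", "Vocabulary", "Cohesion"]),
    ("Q5-7", ["Relevance", "Completeness"]),
    ("Q8-10", []),  # same criteria set as Q5-7
    ("Q11", ["Opinion Support"]),
]


def cumulative_criteria(layers):
    """Return {task: full list of criteria in force for that task}."""
    in_force = []
    result = {}
    for task, added in layers:
        in_force = in_force + added
        result[task] = list(in_force)
    return result


CRITERIA = cumulative_criteria(LAYERS)
for task, criteria in CRITERIA.items():
    print(f"{task}: {len(criteria)} criteria -> {', '.join(criteria)}")
```

Reading the output top to bottom gives 2, 5, 7, 7, and 8 criteria, which is why a response style that earns full marks on Q1 can still fall short on Q3 or Q11.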

The Five Task Families in Detail

Q1-2: Read a Text Aloud (45s prep + 45s speak)

You see a short text on screen — typically a workplace announcement, advertisement, or narrative (50-80 words). You have 45 seconds to read it silently and prepare. You have 45 seconds to read it aloud.

Rubric: Pronunciation (word-level sound accuracy) and Intonation/Stress (sentence-level rhythm and emphasis). Nothing else. Grammar and vocabulary are not scored because the text is provided — you're not composing.

This is the task that most directly feeds the Low/Medium/High descriptors on your certificate, so performance here is visible to every employer who reads it.
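The text length and the speaking window together imply a reading pace. As a quick worked calculation using only the 50-80 word range and the 45-second window stated above (the helper name `words_per_minute` is just for illustration):

```python
def words_per_minute(words: int, seconds: int) -> float:
    """Pace needed to finish `words` words in exactly `seconds` seconds."""
    return words * 60 / seconds


slow = words_per_minute(50, 45)   # shortest text, full window
fast = words_per_minute(80, 45)   # longest text, full window
print(f"{slow:.0f}-{fast:.0f} wpm")
```

Even the longest text works out to fewer than two words per second, so the window leaves room for clear articulation and natural pauses if you use all of it.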

Q3-4: Describe a Picture (45s prep + 30s speak)

A photograph appears on screen. You have 45 seconds to prepare and 30 seconds to describe it. The picture is typically a workplace or everyday scene with enough content to support 5-6 descriptive sentences.

Rubric: Pronunciation, Intonation/Stress, plus Grammar, Vocabulary, and Cohesion (connecting ideas smoothly). Five criteria.

Thirty seconds is deliberately tight. You cannot describe everything. Prioritization — opening with setting, hitting the main figures, adding one detail about atmosphere or inferred context — is where scoring separates.

Q5-7: Respond to Questions (3s prep + 15-30s speak)

You are told you're participating in a simulated phone interview or market survey. The first prompt sets the scenario ("Imagine an American marketing firm is doing research about shopping habits in your country. You have agreed to participate..."). Three questions follow, each with 3 seconds of preparation and 15-30 seconds to answer. Q5 and Q6 are shorter (15 seconds); Q7 is longer (30 seconds).

The questions often escalate: Q5 asks a simple factual question, Q6 asks for a preference or experience, Q7 asks for an explanation or opinion with more depth.

Rubric: all prior criteria plus Relevance and Completeness. You must answer the question asked, not a related question, and your answer must actually address the full prompt.

Q8-10: Respond Using Information Provided (45s study + 3s prep + 15-30s speak)

A written document appears on screen — a conference schedule, an agenda, a meeting itinerary, a class syllabus. You have 45 seconds to read and study it. Then three simulated phone questions arrive, each with 3 seconds of preparation and 15-30 seconds to answer. You answer using information from the document.

Rubric: same as Q5-7. Relevance and Completeness are weighted heavily because the information source is provided — failure to transfer it accurately is a direct rubric hit.

This task tests written-to-oral conversion. You're reading a schedule, holding the relevant piece in working memory, and saying it naturally in a sentence — not reading it verbatim.

Q11: Express an Opinion (45s prep + 60s speak)

A workplace or everyday topic appears ("Do you agree or disagree that companies should allow employees to work from home?"). You have 45 seconds to prepare and 60 seconds to deliver a supported opinion.

Rubric: expanded to 0-5 and includes everything from prior criteria plus Opinion Support — reasons, details, examples. A response that states an opinion but doesn't support it caps at 2. A response with reasons but no specific examples caps at 3-4. A 5 requires opinion + at least two supporting reasons + at least one concrete example + cohesive organization.

Q11 is the one task where preparation-time planning makes the biggest visible difference. The 45 seconds should produce a mental outline: thesis + two reasons + one example per reason.
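The caps described above can be turned into a rough self-check for practice responses. This is a sketch of this article's rules of thumb, not the official ETS rubric: the function name `q11_ceiling` is hypothetical, the "3-4" cap is represented by its upper bound, and the handling of cases the article does not spell out (one reason only, or weak organization) is an assumption.

```python
def q11_ceiling(reasons: int, examples: int, cohesive: bool) -> int:
    """Highest Q11 score (0-5) a response could plausibly reach, per the rules of thumb."""
    if reasons == 0:
        # Opinion stated but not supported.
        return 2
    if reasons >= 2 and examples >= 1 and cohesive:
        # Opinion + two reasons + a concrete example + cohesive organization.
        return 5
    # Reasons without examples cap at 3-4; the upper bound is used here.
    # ASSUMPTION: other partial cases (one reason, weak cohesion) get the same cap.
    return 4


print(q11_ceiling(reasons=0, examples=0, cohesive=True))
print(q11_ceiling(reasons=2, examples=0, cohesive=True))
print(q11_ceiling(reasons=2, examples=1, cohesive=True))
```

The function returns a ceiling, not a prediction: pronunciation, grammar, and the other criteria can still pull the actual score lower.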

Proficiency Level Descriptors (What 0-200 Actually Maps To)

The 0-200 score maps to eight proficiency levels, each with a qualitative description. Employers and placement officers read the numeric score alongside these descriptors to understand what a candidate can actually do.

Level	Score Range	What It Means
8	190-200	Connected, sustained discourse appropriate to workplace. Highly intelligible throughout.
7	160-180	Effective workplace communication with only minor weaknesses.
6	130-150	Partial success expressing opinions; sometimes difficult to understand.
5	110-120	Limited success with opinions; inconsistent on basic questions.
4	80-100	Unsuccessful explaining opinions; minimal language use.
3	60-70	Can state opinions with difficulty; cannot support them.
2	40-50	Cannot state or support opinions; difficult to understand.
1	0-30	Left significant portions unanswered.

Corporate score thresholds cluster at Level 6 (130-150) for customer-facing roles and Level 7 (160-180) for international-rotation or consulting roles. Very few jobs require Level 8. Knowing your target level shifts the preparation calculus — hitting 150 is a very different game from hitting 190.
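For anyone tracking practice scores against a target level, the table above can be expressed as a simple lookup. This is a minimal sketch based only on the ranges listed in this article; `LEVEL_FLOORS` and `level_for` are illustrative names.

```python
# (lowest score in the level, level number), highest level first.
LEVEL_FLOORS = [
    (190, 8), (160, 7), (130, 6), (110, 5),
    (80, 4), (60, 3), (40, 2), (0, 1),
]


def level_for(score: int) -> int:
    """Map a reported 0-200 score (10-point increments) to a proficiency level."""
    if not 0 <= score <= 200 or score % 10 != 0:
        raise ValueError("TOEIC Speaking scores run 0-200 in steps of 10")
    for floor, level in LEVEL_FLOORS:
        if score >= floor:
            return level
    raise AssertionError("unreachable: 0 is the lowest floor")


print(level_for(150), level_for(160), level_for(190))
```

Because scores move in 10-point steps, the jump from 150 to 160 is a single increment on paper but a full level change on the descriptor scale.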

The Low/Medium/High Descriptors on Your Certificate

TOEIC Speaking is the only test in the TOEIC family that reports qualitative descriptors separately from the numeric score. Your certificate shows:

Pronunciation: Low / Medium / High
Intonation & Stress: Low / Medium / High

These descriptors correspond roughly to rubric-level performance on Q1-2 (Read a Text Aloud), the task designed specifically to isolate these two dimensions. A candidate who scores 150 with High/High descriptors reads as stronger than a candidate who scores 150 with Low/Medium — even though the numeric score is the same.

For candidates targeting customer-facing or client-side roles, the descriptors can matter more than the numeric score. An employer filtering for "comfortable on English phone calls" will weight Pronunciation and Intonation/Stress descriptors directly, because those are the dimensions that determine whether a customer will understand you.

Pronunciation Low/Medium/High is driven by word-level accuracy: are individual consonants and vowels clearly articulated, are word stresses correct, and do speech sounds match the English sound inventory rather than bleeding through from a first language. Intonation/Stress Low/Medium/High is driven by sentence-level rhythm: do sentences rise and fall naturally, are content words emphasized, do questions sound like questions and statements sound like statements.

How to Prepare by Task Family, Not by "Speaking" in General

A general "speaking practice" plan — casual conversation, random prompts — usually plateaus candidates at 120-140. The test is compartmentalized, and preparation should match.

For Q1-2 (Read Aloud): Daily read-aloud drills of 50-80-word workplace texts, recording and self-reviewing for word stress and sentence intonation. Target: natural rhythm, no monotone, no robotic pauses.

For Q3-4 (Describe a Picture): Build a template (location → main figures → actions → one inference) and practice on 20-30 pictures with a 30-second timer. Target: finish 5-6 sentences in the time limit with visible structure.

For Q5-7 (Respond to Questions): Collect a question bank of simulated survey prompts and drill them with only 3 seconds of thinking time. Target: start speaking within 2 seconds, no extended silent pauses mid-answer.

For Q8-10 (Respond with Info): Practice reading and summarizing schedules/agendas. Target: find the requested piece of info in under 5 seconds, convert it into a grammatical sentence.

For Q11 (Express an Opinion): Drill the thesis + two reasons + one example structure, timed at 45s prep and 60s delivery. Target: never run out of content before the 60 seconds ends; never run out of time before finishing the structure.

This five-part split — rather than undifferentiated speaking practice — is the single biggest shift that turns a 130 into a 160.

Common Mistakes to Drop

Treating Q1-2 as easy warm-up. Q1-2 is the task that feeds the Pronunciation and Intonation/Stress descriptors on your certificate. Underpreparing here costs you qualitative credibility even if the numeric score looks fine.

Underusing preparation time. You get 45 seconds to prepare for each of Q1-2, Q3-4, and Q11, plus 45 seconds to study the document before Q8-10. Use all of it. Skipping prep and starting early costs you structure, not time.

Overrunning the response window. The microphone cuts off. Running over means your final sentence doesn't exist. Practice with a strict timer.

Memorizing templates word-for-word. Raters hear hundreds of responses and recognize canned openings. Use structural templates (opening → body → close) but vary the surface language.

Ignoring intonation drills. Candidates who have decent pronunciation but flat intonation land a High/Low split on descriptors, which undercuts the certificate's perceived quality. Intonation is a distinct skill that requires distinct drilling.

How Speaking Feeds the Full S&W Profile

TOEIC Speaking is one half of the S&W certificate (Writing is the other, also scored 0-200, also delivered on a computer). Some employers require both; many require only one. Speaking takes more preparation time and causes more anxiety than Writing, which is why candidates often skip it even when it would strengthen their profile. For client-facing roles (hospitality, sales, international consulting), a Speaking score is worth more than a comparable Listening & Reading (L&R) score because it's direct evidence of the skill the job actually uses.

On ExamRift, TOEIC Speaking practice is organized by task family with dedicated drill modes for all 11 tasks. Responses are captured, timed, and evaluated by the AI scoring engine against the rubric criteria for that specific task (Pronunciation, Intonation/Stress, Grammar, Vocabulary, Cohesion, Relevance, Completeness, Opinion Support). The dashboard tracks your performance across dimensions, not just overall score, so you can see whether Pronunciation is trailing Intonation or whether Q11 opinion support is capping your Speaking score below 160. The hierarchical rubric is how the test actually works; preparation that mirrors the rubric is how the score actually climbs.

Ready to prepare for TOEIC Speaking task by task? Practice on ExamRift with dedicated drills for all 11 tasks, timed prep-and-speak windows, and AI-evaluated feedback against the actual rubric dimensions raters use.