TOEIC Speaking Q1-2 Read Aloud: The Pronunciation + Intonation Dual-Axis Rubric

Forty-five seconds to prepare. The text on screen is an eighty-word airport announcement. You read it silently, eyes flicking across the page, thinking "I know these words." The audio prompt plays and you begin. Your pronunciation is accurate — every consonant clear, every vowel the right sound. But you read every sentence at the same pitch, every clause at the same pace, with tiny pauses in the wrong places. The rater scores you 3 on Pronunciation and 1 on Intonation and Stress. The descriptor line on your certificate reads: Pronunciation High, Intonation & Stress Low. An employer reading that split wonders why.

Q1-2 of TOEIC Speaking is a compact task family (two items, 90 seconds of speaking in total), yet it carries the most visible certificate consequence. The two dimensions scored here, Pronunciation and Intonation/Stress, are the only rubric criteria that appear as standalone Low/Medium/High descriptors on your score certificate. A candidate who prepares for Speaking as a whole but neglects the dual-axis structure of Q1-2 can end up with a certificate that undersells their ability. The problem is not a low overall score; it is a descriptor line that tells employers the wrong story.

Understanding what each axis actually measures, and how the rubric scores them independently, is how Q1-2 becomes a reliable 3/3 opening rather than a point leak on the easiest task on the test.

What Read Aloud Actually Looks Like

Q1 and Q2 are the first two items on TOEIC Speaking. Each presents a short written text on screen. You have 45 seconds of preparation time and 45 seconds to read the text aloud. The texts are typically 50-80 words and come from one of three common workplace genres.

Feature	Read Aloud (Q1-2)
Task count	2
Prep time each	45 seconds
Speak time each	45 seconds
Text length	~50-80 words
Text types	Announcements, advertisements, narrative paragraphs
Rubric	Pronunciation (0-3), Intonation & Stress (0-3)
Other criteria?	No grammar, vocabulary, cohesion, or opinion support scored
Total minutes	~3 minutes of the 20-minute test

Three text types recur across TOEIC Speaking Q1-2:

Announcements — public address or workplace PA content ("Attention all passengers on Flight 403...," "Ladies and gentlemen, the conference will reconvene at two o'clock...")
Advertisements — promotional copy for workplace services or products ("Looking for a reliable catering service for your next office event?...")
Narratives — short informational paragraphs ("Samantha Lee was recently promoted to Regional Director...")

Genre matters because intonation patterns differ. An announcement uses authoritative, clear-statement rhythm. An advertisement uses enthusiastic, persuasive rhythm. A narrative uses natural-storytelling rhythm. Reading all three with the same delivery costs you Intonation/Stress points.

The Dual-Axis Rubric Nobody Explains Clearly

Most Speaking preparation materials say "pronunciation and intonation are scored" and leave it there. The actual rubric is more specific — and the two dimensions are scored independently.

Axis 1: Pronunciation (0-3)

Pronunciation measures word-level sound accuracy: are individual consonants and vowels articulated clearly, is each word recognizable to a native ear, and is word stress placed on the correct syllable.

Pronunciation scoring, compressed:

3 — All words are intelligible. Any accent influence is minor and doesn't impede comprehension.
2 — Occasional mispronunciations, but most words remain clear. Listener may need minor effort.
1 — Frequent mispronunciations. Listener must work to follow.
0 — Pronunciation is so inaccurate that significant portions are unintelligible.

What drives Pronunciation specifically:

Consonant accuracy: /θ/ vs /s/ (think/sink), /v/ vs /w/ (vine/wine), /l/ vs /r/ (light/right), and final consonants that speakers of many first languages tend to drop
Vowel accuracy: short /ɪ/ vs long /iː/ (sit/seat), /æ/ vs /ɛ/ (bat/bet), reduced schwa in unstressed syllables
Word stress: PHO-to-graph, pho-TOG-ra-pher, and pho-to-GRAPH-ic each stress a different syllable. Wrong syllable stress on polysyllabic words reliably costs Pronunciation points, especially on Latinate words where the stress pattern is hard to predict from spelling (de-VEL-op, e-CON-o-my, cat-e-GOR-i-cal-ly)
Reduction patterns: function words like "to," "for," "can," "and" become "tuh," "fer," "kn," "n" in natural speech; reading every function word as stressed breaks natural flow

Axis 2: Intonation & Stress (0-3)

Intonation and Stress measures sentence-level rhythm and emphasis: does speech rise and fall naturally across clauses, are content words emphasized and function words de-emphasized, does the overall prosody sound like English or like recitation.

Intonation/Stress scoring, compressed:

3 — Natural pitch movement across sentences, appropriate stress on content words, cohesive prosodic grouping.
2 — Mostly natural, occasional flat or mis-stressed passages.
1 — Frequently monotone or mis-stressed. Rhythm impedes naturalness.
0 — Complete absence of natural prosody. Word-by-word recitation.

What drives Intonation/Stress specifically:

Sentence-final intonation contour: statements fall at the end; yes/no questions rise; WH-questions fall; unfinished clauses rise slightly before resuming
Content-word stress: nouns, verbs, adjectives, adverbs carry main stress; articles, prepositions, auxiliaries are de-stressed
Rhythm and pacing: English is usually described as stress-timed, meaning the gaps between stressed syllables feel roughly even regardless of how many unstressed syllables fall between them
Phrase grouping: natural speech groups words into short chunks (often around 5-8 syllables) separated by brief pauses, not one long breath or word-by-word staccato
Emphasis for new information: the word that carries the "point" of a sentence gets extra stress ("The meeting is on Tuesday, not Monday.")
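For self-study, the content-word and phrase-grouping rules above can be sketched as a small Python marker that uppercases likely stressed words and splits a text at punctuation. This is a rough practice aid under stated assumptions, not a model of how raters or any scoring engine work: the `FUNCTION_WORDS` set and the `mark_stress` function are hypothetical, the word list is deliberately short, and punctuation is treated as the only pause point, so longer clauses still need manual grouping and abbreviations such as "Mr." will split incorrectly.

```python
import re

# Hypothetical, deliberately short list of English function words
# (articles, prepositions, auxiliaries, pronouns, conjunctions).
# A real list would be longer; this one is for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "for", "of", "in", "on", "at", "by", "from",
    "with", "and", "or", "but", "is", "are", "was", "were", "be", "has",
    "have", "will", "can", "you", "your", "we", "our", "it", "its",
}


def mark_stress(text: str) -> list[str]:
    """Split text into phrase groups at punctuation and uppercase content words."""
    groups = []
    # Commas and sentence-final punctuation are treated as pause points.
    for chunk in re.split(r"[,.!?;:]", text):
        words = chunk.split()
        if not words:
            continue  # skip empty pieces, e.g. the one after the final period
        # Content words are marked as stressed (UPPERCASE);
        # function words stay lowercase (de-stressed).
        marked = [w.lower() if w.lower() in FUNCTION_WORDS else w.upper() for w in words]
        groups.append(" ".join(marked))
    return groups


if __name__ == "__main__":
    for group in mark_stress("Attention all passengers on Flight 403, the gate has changed."):
        print("|", group)
```

On the sample announcement it produces two groups, "ATTENTION ALL PASSENGERS on FLIGHT 403" and "the GATE has CHANGED", which you can then read aloud with a short pause between groups and the capitalized words stressed.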

Why the Two Axes Matter Independently

A candidate can score 3 on Pronunciation and 1 on Intonation/Stress by reading individual words correctly but delivering them in a flat monotone. The certificate then reports Pronunciation: High, Intonation & Stress: Low.

A candidate can also score 1 on Pronunciation and 3 on Intonation/Stress by having a heavy first-language accent on individual sounds but delivering natural English rhythm and emphasis. The certificate reports Pronunciation: Low, Intonation & Stress: High.

Most candidates land a matched pair (both High, both Medium, both Low). The split cases typically flag one of two problems: over-trained on vocabulary pronunciation but under-trained on prosody (high/low split), or native-like rhythm absorbed from listening exposure but phoneme-level sound errors from limited production practice (low/high split). Both are correctable, but you have to know which one is happening.

How to Use the 45-Second Preparation Window

The 45-second prep is not "read silently." It's a structured process. Four things to do in order:

1. Silent read for comprehension (5-10 seconds). Understand what the text is — announcement? ad? narrative? — and what the overall message is. This frames your delivery tone.

2. Mark sentence boundaries and clause breaks (10 seconds). Mentally (you can't mark on screen) note where each sentence ends, where commas suggest short pauses, where clauses group. These are your pause points.

3. Identify content words for stress (10-15 seconds). Which nouns, verbs, adjectives carry the sentence's meaning? Those get stress. Scan each sentence for the "point" word that would be underlined if someone were emphasizing it.

4. Rehearse the first two sentences silently or at whisper volume (whatever time remains, usually 10-15 seconds). The opening sets the rater's impression. Don't start cold: have the opening's rhythm loaded before the record light comes on.

Candidates who skip steps 2-3 often deliver sentences with stress on function words (articles, prepositions) because their eye landed there. Candidates who skip step 4 stumble on the first few words, which is exactly where the rater is forming the initial impression.

Sentence Intonation Patterns Worth Drilling

Four core patterns cover the large majority of Q1-2 sentences.

Falling statement. Most declarative sentences fall in pitch across the last content word and drop at the end. "The conference will reconvene at two o'clock." — pitch rises slightly on "reconvene," carries into "two," and falls on "o'clock."

Rising yes/no question. Questions that expect yes or no rise in pitch on the final content word. "Will you be attending the meeting?" — rise on "meeting."

Falling WH-question. Questions with who/what/when/where/why/how fall, unlike yes/no questions. "When does the flight depart?" — fall on "depart."

List rise-rise-rise-fall. A list pattern rises on each item except the last, which falls. "Our services include marketing, advertising, graphic design, and public relations." — rise on "marketing," "advertising," "design," fall on "relations."

A single 50-80-word Q1-2 passage often contains several of these patterns, sometimes all four. Recognizing which sentence demands which pattern is the core intonation skill.
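Matching a sentence to one of the four patterns can be approximated with a few punctuation and first-word checks. The Python sketch below uses a hypothetical helper, `intonation_pattern`, that is not part of the test or any tool. It assumes well-punctuated text with a serial comma before the last list item, and it will misread some sentences (for example, a statement with several commas and a closing "and" that is not a list), so treat its labels as prompts to check rather than answers.

```python
# Question words that usually signal a falling WH-question.
WH_WORDS = {"who", "what", "when", "where", "why", "how", "which", "whose", "whom"}


def intonation_pattern(sentence: str) -> str:
    """Return a rough label for which core intonation pattern a sentence needs."""
    s = sentence.strip()
    words = s.split()
    first_word = words[0].lower().strip(",") if words else ""
    if s.endswith("?"):
        # WH-questions usually fall; other questions are treated as yes/no (rising).
        if first_word in WH_WORDS:
            return "falling WH-question"
        return "rising yes/no question"
    # Heuristic: two or more commas plus ", and" / ", or" suggests a list
    # (assumes a serial comma before the last item).
    if s.count(",") >= 2 and (", and " in s or ", or " in s):
        return "list: rise on each item, fall on the last"
    return "falling statement"


if __name__ == "__main__":
    examples = [
        "The conference will reconvene at two o'clock.",
        "Will you be attending the meeting?",
        "When does the flight depart?",
        "Our services include marketing, advertising, graphic design, and public relations.",
    ]
    for sentence in examples:
        print(f"{intonation_pattern(sentence):45} {sentence}")
```

On the four example sentences from this section, it returns the falling statement, rising yes/no question, falling WH-question, and list labels, in that order.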

Text-Type Specifics: What Each Genre Rewards

The three text types Q1-2 draws from each have distinct delivery patterns. Matching your delivery to the genre can directly lift your Intonation/Stress score.

Announcement. Public-address style. Authoritative, measured pace, clear enunciation. Opening lines often begin with "Attention...," "Ladies and gentlemen...," or "Good morning...." Intonation is slightly flatter than conversational, with firm falls at sentence ends. Content words carry clear emphasis. Useful rule of thumb: imagine you are reading over a loudspeaker, not in a conversation.

Advertisement. Persuasive, warmer, slightly faster. Sentences include descriptive adjectives that deserve emphasis (high-quality, award-winning, convenient, affordable). Give key selling points extra stress and livelier pitch movement, and deliver invitations ("Come visit us today!," "Don't miss this opportunity!") with enthusiasm that stops short of cartoonish. Overly flat ads read as dismissive; over-enthusiastic ads read as insincere.

Narrative. Informational, story-like, natural conversational rhythm. Often profiles a person or event ("Samantha Lee was recently promoted..."). The pace is slightly slower than in an advertisement, with natural variation in pitch as the story moves through time markers (first, then, recently, next month). Emphasis falls on the new information in each sentence, typically the last content word of the final clause.

Genre-identify in the first 10 seconds of your 45-second prep, and let the identification shape your delivery register.

L1-Specific Pronunciation Patterns Worth Knowing

Candidates from different first-language (L1) backgrounds fall into different, predictable pronunciation traps. Knowing your L1's pattern lets you target drills efficiently.

L1 Japanese. /l/ vs /r/ (light/right, play/pray), /θ/ vs /s/ (think/sink), /v/ vs /b/ (very/berry), tense-lax vowel distinctions (seat/sit, beat/bit). Syllable-final consonants are often followed by an added vowel (book-u instead of book), which breaks rhythm.

L1 Mandarin / Cantonese. Final consonants often dropped or unreleased (take pronounced tay), /l/ and /n/ mergers in some dialects, vowel length distinctions, word stress (Mandarin is tone-based, so English stress-timing requires active retraining).

L1 Korean. /f/ and /p/ mergers, /v/ and /b/ mergers, /z/ and /dʒ/ mergers (zoo pronounced joo), confusion between /r/ and /l/, and consonant clusters often split with intrusive vowels (street pronounced suh-street).

L1 Spanish / Portuguese. /iː/ vs /ɪ/ (sheep/ship), /v/ vs /b/ (in Spanish, the letters b and v represent the same sound), initial /s/ + consonant often gets a leading vowel (school pronounced eschool), and a tapped or trilled r substituted for the English /r/.

L1 Hindi / other South Asian languages. /v/ and /w/ mergers, retroflex /t/ and /d/ substituted for alveolar, syllable-timed rather than stress-timed rhythm (every syllable given roughly equal weight, which flattens English prosody).

You don't need to eliminate every L1 feature — a recognizable non-native accent can still score Pronunciation 3 as long as individual words remain intelligible. But targeting the two or three sounds that most frequently cause comprehension friction pays off per minute of drill more than anything else.

Common Pitfalls That Drop You to a 1 or 2

Monotone delivery. Reading every sentence at the same pitch. The most common cause of an Intonation/Stress score of 1. Fix: exaggerate pitch movement during practice until it feels "too much"; on playback it will usually sound normal.

Staccato word-by-word pacing. Pausing after every single word rather than grouping by phrase. Sounds like a robot reading. Fix: mark phrase groups in prep and breathe through them.

Run-on delivery. The opposite problem: reading through punctuation with no pauses. Sounds like panicked reading. Fix: honor commas with a short pause and periods with a slightly longer one.

Wrong stress on multi-syllable words. DE-ve-lop instead of de-VEL-op. PHO-to-graph-ic instead of pho-to-GRA-phic. Fix: during prep, scan for every word of 3+ syllables and mentally check its stress pattern.

Inflating the first syllable. A nervous-reader tell. Every sentence starts at high pitch with stress on the first word regardless of meaning. Fix: deliberately start sentences at a neutral or slightly low pitch and let the natural content-word stresses do the work.

Reading everything as an announcement. Applying announcement-style formality to an advertisement text or a narrative. Fix: genre-identify in prep and match the tonal register.

What Actually Works: The Daily Drill

Q1-2 improvement is driven largely by practice volume. Two core drills, done daily for about 10 minutes, compound fast, and a third, targeted drill helps with specific first-language sound gaps.

Shadowing. Listen to a native-speaker reading of a short text (news clip, podcast intro, audiobook paragraph) and read along out loud, matching rhythm, stress, and intonation. Shadow the same 30-60-second clip five times in a row, not five different clips once each. The repetition is the point.

Record-and-review. Take a practice Q1-2 text. Prepare for 45 seconds. Record your 45-second read. Listen back with a transcript. Mark every place where you mis-stressed, mis-pronounced, or went flat. Do the same text again. Compare the two recordings.

Five record-and-review passes on one text, then move to a new text. One text deeply reviewed beats ten texts read once each.

Targeted pronunciation drills. For candidates with a specific first-language pattern (L1 Japanese often needs /l/-/r/ and /θ/-/s/; L1 Spanish often needs /iː/-/ɪ/ and final consonants; L1 Mandarin often needs final consonants and tense-lax vowel pairs), 5-10 minutes of minimum-pair practice per day closes sound-level gaps that record-and-review alone doesn't catch.

How Q1-2 Sets Up the Rest of Speaking

A strong Q1-2 performance does three things for your overall score. First, the descriptors stamped on your certificate are primarily set here — High/High descriptors look good to any employer reading the certificate, regardless of the numeric score. Second, Q1-2 is the task where pronunciation and intonation are isolated; doing them well at baseline makes every later task easier, because you're not fighting foundational clarity problems while also trying to describe a picture or support an opinion. Third, Q1-2 warms up your voice and your rhythm for the harder tasks that follow — candidates who bomb Q1-2 often stay tentative into Q3-4, compounding the damage.

Q1-2 is the task family where preparation pays the cleanest return per hour of practice. Unlike Q11 (where opinion support, grammar, and vocabulary interact), Q1-2 asks only two questions — do words sound right, does rhythm sound right — and both are independently trainable.

How ExamRift Trains Read Aloud

On ExamRift, TOEIC Speaking Q1-2 practice is structured around the dual-axis rubric. Every practice item includes the text, a 45-second silent prep timer, a 45-second recording window, and post-response evaluation on two separate scales: Pronunciation (word-level accuracy, syllable stress, phoneme errors flagged) and Intonation/Stress (pitch contour analysis, content-word stress, phrase grouping). The dashboard tracks the two axes independently, so you can see whether you're landing a matched High/High or a lopsided High/Low descriptor pattern. The practice bank includes all three text genres (announcement, advertisement, narrative) with genre-appropriate delivery models from native speakers with American, British, Canadian, and Australian accents.

Q1-2 should be the opening pair that anchors your certificate with High/High descriptors. With a structured 45-second prep, record-and-review drilling, and attention to the two axes the rubric actually scores, Read Aloud becomes exactly that.

Ready to lock in High/High descriptors on your TOEIC Speaking certificate? Practice Q1-2 on ExamRift with dual-axis scoring, phoneme-level feedback, and shadowing drills built around the actual text genres on the test.