TOEIC Part 1 Photographs: The 6-Question Warm-Up and Its Three Distractor Families

You open the test booklet, see a photograph of a child with a wheelbarrow, and confidently wait for the audio. Four statements play. You pick (B) — "He's moving a wheelbarrow" — because the wheelbarrow is the obvious main object. The correct answer was (A) "He's shoveling some soil." You just lost a point on the easiest section of the TOEIC.

This kind of loss separates test-takers who score 900 from those who stall at 850. Part 1 is the shortest, easiest, and most predictable section on the entire TOEIC L&R. It is also the section where strong candidates leak points — not because the English is difficult, but because the distractor design is engineered to punish inattention.

Understanding how those distractors work is the difference between a Part 1 that reliably delivers 6 out of 6 and one that quietly drops 1-2 points before you've even warmed up.

The Format: Six Photographs, One Correct Statement Each

TOEIC Listening Part 1 is the opening section of the Listening test. It consists of 6 photographs, each printed in the test booklet and accompanied by four statements that are played aloud. You hear each set of four statements exactly once. You do not see the statements on paper — only the photograph.

| Feature | Detail |
| --- | --- |
| Number of questions | 6 |
| Statements per question | 4 (A, B, C, D) |
| Statements printed in booklet? | No (audio only) |
| How many times played? | Once |
| Audio length per question | ~8 seconds of statements |
| Time to mark answer | ~5 seconds between items |
| Accents | American, British, Canadian, Australian |
| Total section time | ~3-4 minutes |

Your job is to choose the statement that best describes the photograph. Notice that word — best. Multiple statements may be partially accurate. Only one is the best description.

There is no time pressure inside Part 1 itself. The audio paces you at ~13 seconds per question, and you cannot go back. This is the most relaxed section of the entire 2-hour test, which is precisely why many candidates treat it too casually.

Why Part 1 Matters Despite Being "Easy"

Part 1 is worth 6 questions out of 100 in the Listening section — only 6% of your Listening score by raw count. It is tempting to dismiss it as a warm-up, but this view is wrong for three reasons.

First, every point counts at the top of the scale. Scaled scoring on TOEIC L&R is non-linear. The difference between a raw 96 and a raw 100 in Listening can be the difference between 480 and 495 on the scaled score. Dropping 1-2 points on Part 1 is often what keeps 900+ candidates from reaching 950+.

Second, Part 1 is your pacing warm-up. The first six questions set your listening rhythm for the remaining 94. If you spend Part 1 distracted or second-guessing, you will carry that unsteadiness into Part 2 (25 questions) and Part 3 (39 questions), where a single moment of inattention costs you far more.

Third, Part 1 is the only section where perfection is genuinely achievable through preparation. Part 3 and Part 4 contain enough genuine difficulty that even native speakers occasionally miss items. Part 1 does not. Every distractor pattern has been cataloged. If you know what to look for, 6/6 is a reasonable goal every time.

The Three Major Distractor Families

TOEIC Part 1 distractors fall into three recurring families. Once you can name them, you can spot them in real time as the audio plays.

Family 1: Similar-Sound Distractors

These are statements that contain a word acoustically similar to a correct description, but with a completely different meaning. The test-taker hears the beginning of the word, matches it to something they see in the picture, and commits to the answer before the sentence finishes.

Classic pairs that have appeared in published TOEIC material:

  • writing vs riding — a person holding a pen versus a person on a horse
  • copies vs coffee — office paperwork versus a mug
  • walking vs working — someone in motion versus someone at a desk
  • shopping vs chopping — retail versus a kitchen
  • drawer vs door — furniture versus a room entrance
  • reading vs leading — an open book versus a group in motion

The defense against similar-sound traps is to listen to the entire statement. Do not commit on the first content word you hear. The TOEIC deliberately front-loads statements with words that match the picture, then flips them with a different verb or object. If you lock in on the first syllable, you will walk into the trap. This is especially dangerous for candidates who process English audio by catching high-content nouns and inferring the rest; Part 1 punishes that strategy directly.

Family 2: Half-Right Descriptions

These are statements that mention something genuinely present in the picture but describe it incorrectly. The half that is right pulls you in. The half that is wrong is the trap.

Typical patterns:

  • Right object, wrong action — "He's eating lunch" when the picture shows someone setting a table (food and dining are visible, but no eating is happening)
  • Right person, wrong location — "The woman is sitting at a desk" when she is actually standing next to the desk
  • Right action, wrong object — "She's putting on a hat" when she's putting on a pair of shoes
  • Right setting, wrong number — "The men are walking in the park" when only one person is walking and another is seated

Half-right distractors exploit a basic cognitive shortcut: when we hear language that partially matches what we see, our brain tends to confirm the match rather than verify every element. On a TOEIC statement, every content word — subject, verb, object, modifier — must match the picture. A statement that is 75% accurate is 100% wrong.
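The all-or-nothing rule can be sketched in a few lines of Python. This is only an illustration: the slot names (subject, number, verb, setting) and the `score_statement` helper are assumptions made for this example, not part of any TOEIC material. It scores the "men are walking in the park" distractor against a photo in which only one man is walking.

```python
def score_statement(statement: dict[str, str], photo: dict[str, str]) -> tuple[float, bool]:
    """Return (fraction of elements that match, whether ALL elements match)."""
    matches = [photo.get(slot) == value for slot, value in statement.items()]
    return sum(matches) / len(matches), all(matches)


# Hypothetical hand-coded reading of the photo: one man walking in a park.
photo = {"subject": "man", "number": "one", "verb": "walking", "setting": "park"}

# "The men are walking in the park": only the number is wrong.
statement = {"subject": "man", "number": "several", "verb": "walking", "setting": "park"}

fraction, is_correct = score_statement(statement, photo)
print(f"{fraction:.0%} of elements match; acceptable answer: {is_correct}")
```

Three of the four elements match, yet the verdict is still a rejection, because only a statement with every element correct counts.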

The classic example from published TOEIC material: a picture of a woman near a television with a power cord in her hand. The four statements:

  • (A) A woman is putting on a pair of shoes.
  • (B) A woman is dusting a television screen.
  • (C) A woman is watching television.
  • (D) A woman is plugging a power cord into an outlet.

(B) and (C) are the half-right traps. A television is in the picture, so "dusting a television screen" and "watching television" both contain matching vocabulary. But she is neither dusting nor watching — she is plugging in the cord. (D) is correct because every element is accurate: woman, plugging, power cord, outlet.

Family 3: Voice and Tense Errors

This is the most subtle distractor family and the one that trips up candidates who are otherwise strong listeners. The statement uses a grammatically correct structure that is close to what is happening in the picture — but the voice (active vs passive) or tense is wrong for what the image shows.

The core patterns:

  • Passive continuous vs passive perfect: "The documents are being signed" suggests an ongoing action by a human, while "The documents have been signed" suggests a completed state. A picture showing signed documents stacked on a desk (no hand in frame) would favor the second, not the first.
  • Present continuous vs simple present: "He is sitting in a chair" describes an action in progress; "Chairs are arranged around a table" describes a state. Both can be grammatically fine, but only one matches the photograph.
  • Present continuous vs present perfect: "She is opening the door" requires motion visible in the frame; "The door has been opened" describes the resulting state.

"The shelves are being stocked" sounds plausible in a warehouse photo — but if no person is visible performing the stocking action, the correct statement will usually describe a state ("Boxes are stacked on the shelves") rather than an ongoing action.

The rule of thumb: present continuous requires a visible, in-progress action by the subject of the sentence. If you can't see it happening, the statement is probably wrong.
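As a rough sketch, the rule of thumb works like an elimination filter. The `Statement` class, its `form` tag, and the `survives_rule` function below are hypothetical names invented for this illustration, and the tagging of each sentence is done by hand. Note that the rule only eliminates options; it does not confirm that a surviving statement is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    form: str  # "continuous" = action in progress, "state" = result or arrangement


def survives_rule(statement: Statement, action_visible: bool) -> bool:
    """Rule of thumb only: a continuous form needs a visible, in-progress action."""
    if statement.form == "continuous":
        return action_visible
    return True  # state descriptions are not eliminated by this rule


# Warehouse photo with nobody in the frame (assumed for illustration).
action_visible = False
statements = [
    Statement("The shelves are being stocked.", "continuous"),
    Statement("Boxes are stacked on the shelves.", "state"),
]

survivors = [s.text for s in statements if survives_rule(s, action_visible)]
print(survivors)
```

With no visible action, only the state description survives, which matches the warehouse example above.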

Secondary Traps Worth Knowing

Beyond the three major families, a handful of smaller trap types account for most remaining Part 1 losses.

Overly specific vocabulary. A statement uses a precise technical term that looks impressive but describes the wrong thing — "She's operating a forklift" in a photo where she's simply standing near one. Professional-sounding vocabulary is not a guarantee of correctness.

Absent-feature distractors. The statement describes something plausible for the setting but not actually visible. A conference-room picture might attract "The projector is being turned on" even if no projector is in the frame. If you can't see it, it isn't there.

Inference traps. "They're about to leave" or "He's going to sit down" require you to predict an action rather than describe one. TOEIC correct answers almost never rely on prediction — they describe what is visibly happening right now.

Relative position errors. "The man is standing behind the woman" when he's actually next to her. Prepositions (behind, beside, across from, between) are common trap surfaces because they pass quickly in audio.

People Photos vs Object/Scene Photos

Part 1 photographs fall into two broad categories, each with different distractor patterns.

People photos (typically 4 of the 6 questions) show one or more humans performing an action. Correct answers usually use a present continuous verb — is shoveling, is plugging in, is reaching for. Distractors tend to come from the half-right and voice/tense families. Identify the single most specific action the subject is performing and wait for a statement that names exactly that action.

Object and scene photos (typically 2 of the 6 questions) show workplaces, equipment, or streets without a clear human subject. Correct answers tend to use passive voice or stative descriptions — are stacked, has been left, are arranged, is mounted on. Similar-sound distractors are more common here because there's no obvious action anchor to compare against.

Recognizing which type of photo you're looking at in the first two seconds primes your ear for the right grammatical pattern in the audio.

The 30-Second Photo-Scan Habit

Part 1 gives you only ~1.5 seconds of silence between the directions ending and the first audio starting. But you can scan the photographs during the directions, which last about 30 seconds. A disciplined scan of all six photos gives you an enormous advantage.

What to look for in each photo:

  1. Who is the subject? One person, multiple people, or no people at all.
  2. What is the single most specific action? If a person is present, what one verb captures what they're doing? Reaching, writing, pointing, pouring.
  3. What is the specific object? Not just "tool" but "wrench." Not "vehicle" but "forklift." Specific nouns beat general ones in correct answers.
  4. What is the setting? Office, warehouse, outdoor, kitchen, airport. Setting narrows the vocabulary field.
  5. What is NOT in the picture? Any obvious absence helps you rule out absent-feature distractors before the audio starts.

The photo-scan habit costs nothing and turns Part 1 into a prediction game. By the time the audio starts, you already have candidate sentences in your head. The audio's job is just to confirm or correct you.

Vocabulary Clusters Worth Drilling

Across test dates, Part 1 photographs draw on a predictable range of workplace and daily-life settings. Building active vocabulary in these clusters, meaning quick retrieval and not just recognition, pays off directly.

| Cluster | Typical Vocabulary |
| --- | --- |
| Office supplies | stapler, file cabinet, drawer, binder, printer, photocopier, in-tray, swivel chair, cubicle, whiteboard |
| Outdoor scenes | pavement, curb, awning, bench, fountain, railing, streetlamp, pedestrian crossing, shrub, lawn |
| Industrial / warehouse | forklift, pallet, crate, conveyor belt, hard hat, safety vest, loading dock, ladder, toolbox, machinery |
| Travel | boarding pass, carry-on, overhead bin, ticket counter, luggage cart, gate, escalator, platform, terminal, lobby |
| Food service | tray, apron, counter, register, menu, utensil, pitcher, napkin, condiment, display case |

Beyond nouns, Part 1 rewards active control of action verbs in present continuous: reaching, bending, leaning, crouching, stacking, wiping, folding, hanging, kneeling, gripping, adjusting, aligning. Most test-takers recognize these in writing but hesitate to identify them in real-time audio. Passive recognition is not enough.

A Worked Example

Consider a photograph of a child at a gravel pile, holding a shovel, with a wheelbarrow nearby. The audio plays:

  • (A) He's shoveling some soil.
  • (B) He's moving a wheelbarrow.
  • (C) He's cutting some grass.
  • (D) He's planting a tree.

Run the distractor analysis:

  • (A) — Action visible (shoveling) + object correct (soil/gravel). Candidate.
  • (B) — Half-right: wheelbarrow is in the picture, but he is not moving it.
  • (C) — Absent feature: no grass-cutting action.
  • (D) — Absent feature: no tree being planted.

Correct: (A). The lesson: the most specific action verb that matches what's visibly happening wins. The wheelbarrow being in the picture is irrelevant if no one is moving it.
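The same analysis can be written out as a small classifier. This is a minimal sketch under stated assumptions: the photo is reduced by hand to a set of visible objects and visible (verb, object) actions, the gravel pile is treated as "soil" as in the analysis above, and the `Option` class and `classify` function are hypothetical names used only here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    verb: str
    obj: str


# Hand-coded reading of the photo (an assumption, not automatic image analysis).
visible_actions = {("shoveling", "soil")}
visible_objects = {"shovel", "soil", "wheelbarrow"}


def classify(option: Option) -> str:
    """Label an option using the distractor families from this article."""
    if (option.verb, option.obj) in visible_actions:
        return "candidate"  # action and object are both visible
    if option.obj in visible_objects:
        return "half-right"  # object is visible, action is not
    return "absent feature"  # neither the object nor the action is visible


options = [
    Option("A", "shoveling", "soil"),
    Option("B", "moving", "wheelbarrow"),
    Option("C", "cutting", "grass"),
    Option("D", "planting", "tree"),
]

results = {o.label: classify(o) for o in options}
for label, verdict in results.items():
    print(f"({label}) {verdict}")
```

Only (A) comes out as a candidate; (B) is flagged as half-right because the wheelbarrow is visible but nobody is moving it.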

What 6-for-6 Looks Like in Practice

Candidates who reliably get all six Part 1 questions correct share a few habits:

  1. They scan all six photos during the directions, not during the audio.
  2. They wait for every statement to finish — no matter how confident they feel after word three.
  3. They mentally name the verb tense and voice expected for each photo (active vs passive, continuous vs stative) before the audio starts.
  4. They do not go back. Once the next audio starts, the previous question is closed. Second-guessing Part 1 while Part 2 is playing is the fastest way to lose points on Part 2.

None of these habits require advanced English. They require attention, pattern recognition, and trust in the strategy.

How ExamRift Trains Part 1 Skills

On ExamRift, TOEIC Listening Part 1 practice is built around the distractor families described above. Every practice item includes the photograph, four spoken statements with native-accent audio (American, British, Canadian, Australian), and a post-answer supplement that labels each distractor by family — similar-sound, half-right, voice/tense, absent feature — so you can see why each wrong option was wrong.

Each practice set includes vocabulary supplements drawn from the photograph's setting (office, warehouse, outdoor, travel, food service), and the dashboard tracks which distractor family accounts for your most frequent errors, letting you target your specific weakness rather than grinding through generic sets.
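A similar tally is easy to keep by hand with any practice material. The sketch below is a hypothetical example, not ExamRift's implementation: the `error_log` entries and the `weakest_family` function are invented for illustration, and each missed question is assumed to have been labeled with one distractor family.

```python
from collections import Counter
from typing import Optional

# One entry per missed practice question (made-up sample data).
error_log = [
    "half-right",
    "voice/tense",
    "half-right",
    "similar-sound",
    "half-right",
    "absent feature",
]


def weakest_family(errors: list[str]) -> Optional[tuple[str, int]]:
    """Return the most frequent distractor family and its miss count."""
    if not errors:
        return None  # nothing missed yet
    return Counter(errors).most_common(1)[0]


print(weakest_family(error_log))
```

In this sample log, half-right descriptions account for three of the six misses, so that family would be the one to drill first.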

Part 1 should be the reliable 6-for-6 section that anchors your Listening score. With systematic exposure to all three distractor families and the 30-second photo-scan habit, it becomes exactly that.


Ready to turn Part 1 into the easiest perfect score on the TOEIC? Practice TOEIC Listening Part 1 on ExamRift and learn to spot every distractor family before the audio finishes playing.