Troubleshooting Methods

Universal troubleshooting loop

Troubleshooting in networking is a controlled reduction of possibilities. You are not trying commands at random. You are building evidence, eliminating entire categories of failure, and converging on the smallest possible fault domain.

At any moment, you should be able to answer:

What is the current scope (who… what… where…)?
What have I proven works (and therefore can be deprioritized)?
What is the next test and what will it eliminate?

Every structured method uses the same loop; methods differ only in where you start and how you traverse.
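As a rough illustration of that loop, the sketch below tracks the three answers (scope, what is proven, what the next test eliminates) as plain data. The class name, the fault-domain labels, and the single-fault assumption are all illustrative choices, not part of any standard tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Investigation:
    """Tracks scope, proven-good tests, and the fault domains still in play."""
    scope: str
    candidates: Set[str]
    proven_good: List[str] = field(default_factory=list)

    def record(self, test: str, passed: bool, covers: Set[str]) -> None:
        """Apply one test result (assumes a single underlying fault).

        A pass clears every domain the test exercises.
        A fail means the fault sits inside the domains the test exercises.
        """
        if passed:
            self.candidates -= covers
            self.proven_good.append(test)
        else:
            self.candidates &= covers


if __name__ == "__main__":
    inv = Investigation(
        scope="one host on one VLAN cannot reach the web app",
        candidates={"physical", "switching", "routing", "policy", "dns", "application"},
    )
    inv.record("ping default gateway", passed=True, covers={"physical", "switching"})
    inv.record("TCP connect to server port 443", passed=False,
               covers={"physical", "switching", "routing", "policy"})
    print(sorted(inv.candidates))  # ['policy', 'routing']
```

Each call to `record` answers "what will this test eliminate?" before the next test is chosen; here two tests shrink six candidate domains to two.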

What "define the problem" really means

Lock down scope immediately:

Impact: single host, subnet/VLAN, site, WAN, global
Symptom type:
- No connectivity (hard down)
- Intermittent (flaps, drops, timeouts)
- Performance (latency, loss, throughput)
- Reachability vs. name resolution vs. application (which stage actually fails)
Reproducibility: consistent, time-based, load-based
Recent change: config change, code upgrade, cabling, circuit, policy push

A locked-down scope keeps you from wasting time on irrelevant layers or devices.
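One way to force yourself to fill in every scope item is to write the problem statement as a structured record. The sketch below is only an illustration: the type names and fields are hypothetical and simply mirror the checklist above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Impact(Enum):
    SINGLE_HOST = "single host"
    SUBNET_VLAN = "subnet/VLAN"
    SITE = "site"
    WAN = "WAN"
    GLOBAL = "global"


class Symptom(Enum):
    HARD_DOWN = "no connectivity"
    INTERMITTENT = "intermittent"
    PERFORMANCE = "performance"


class FailingStage(Enum):
    REACHABILITY = "reachability"
    NAME_RESOLUTION = "name resolution"
    APPLICATION = "application"


@dataclass(frozen=True)
class ProblemStatement:
    """A scoped problem description; every field except recent_change is required."""
    impact: Impact
    symptom: Symptom
    failing_stage: FailingStage
    reproducibility: str                 # e.g. "consistent", "time-based", "load-based"
    recent_change: Optional[str] = None  # None means no change is known

    def summary(self) -> str:
        change = self.recent_change or "none known"
        return (
            f"{self.symptom.value} ({self.failing_stage.value}) "
            f"affecting {self.impact.value}; "
            f"reproducibility: {self.reproducibility}; recent change: {change}"
        )


if __name__ == "__main__":
    ps = ProblemStatement(
        impact=Impact.SUBNET_VLAN,
        symptom=Symptom.INTERMITTENT,
        failing_stage=FailingStage.REACHABILITY,
        reproducibility="load-based",
        recent_change="policy push",
    )
    print(ps.summary())
```

Because the required fields have no defaults, the record cannot be created until impact, symptom, failing stage, and reproducibility have all been decided.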

Map tests to OSI layers

When you run a test, know what layer(s) it validates.

Key engineering point: A successful test does not prove everything is fine. It proves only that this specific thing worked under these conditions. For example, a ping can succeed while TCP 443 fails due to ACL/policy; a TCP handshake can succeed while large transfers fail due to MTU/PMTUD issues.
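To make the ping-versus-TCP example concrete, here is a minimal sketch of a Layer 4 check from a host, using only Python's standard `socket` module. The function name and the default timeout are illustrative assumptions. Note what it validates: a completed TCP handshake to one port from this source at this moment, and nothing about payload size, TLS, or the application.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    False covers several distinct failures: name resolution error,
    connection refused, unreachable network, or timeout (often a silent
    drop by an ACL or firewall). Tell them apart with further tests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror, ConnectionRefusedError, and timeouts are all OSError subclasses.
        return False
```

A call such as `tcp_port_open("192.0.2.10", 443)` (a documentation address standing in for the real server) returning False while ping to the same address succeeds points at L4 policy or the service itself rather than at L1-L3.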

Structured troubleshooting

1) Top-Down (start at L7, go down)

When to use

User reports an application symptom (web app or email is unreachable, a DNS name fails to resolve, authentication fails)
Lower-layer checks already come back clean (links up, routing stable)
You suspect DNS, proxy, TLS, app policy, server-side, or L4 policy
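The top-down walk can be sketched as an ordered list of checks, application layer first, that stops at the first check that passes. Everything in this sketch is hypothetical: the check names are placeholders, and the lambdas stand in for real tests you would run yourself.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def top_down(checks: List[Check]) -> Tuple[Optional[str], Optional[str]]:
    """Walk checks ordered from the application layer downward.

    Returns (lowest_failing, first_passing):
      - (None, name): the top check passed, so the symptom was not reproduced.
      - (fail, ok):   the fault sits in what 'fail' adds on top of 'ok'.
      - (fail, None): every check failed; keep going below the last layer tested.
    Checks below the first passing one are not run.
    """
    lowest_failing: Optional[str] = None
    for name, run in checks:
        if run():
            return lowest_failing, name
        lowest_failing = name
    return lowest_failing, None


if __name__ == "__main__":
    # Stubbed results for illustration; replace each lambda with a real test.
    checks: List[Check] = [
        ("HTTPS request", lambda: False),
        ("TLS handshake", lambda: False),
        ("TCP connect to 443", lambda: True),
        ("ping server IP", lambda: True),
    ]
    print(top_down(checks))  # ('TLS handshake', 'TCP connect to 443')
```

With these stubbed results, the TCP connect succeeds and the TLS handshake does not, so the layers at and below TCP are deprioritized and attention goes to TLS (certificates, protocol versions, inspection devices).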

How it narrows the scope
