What to Do When a Site Is Down

Use alert details, reports, response times, and status codes to quickly decide whether the issue is the website, network, DNS, application, or monitor configuration.

Start with the alert details.

A down alert is your starting point. Review the monitor name, check type, location, response code, response time, and the time the failure began before making changes.

Look for patterns before reacting.

Compare recent checks, locations, response times, and status codes. A single symptom can point to several different causes depending on what changed.

First goal: determine whether the service is actually unavailable. Check the alert details, open the monitor report, and compare what Spark Uptime saw with what you can reproduce from your own browser, server, or network tools.

Begin with a quick confirmation

When a site is marked down, avoid guessing. Confirm the scope of the issue first. A site that is down from multiple locations usually needs faster escalation than a site that only failed from one path, one network, or one specific check type.

1. Read the alert carefully

Review the affected monitor, the check type, the detected status, the location that reported the problem, and the time the down event started.

2. Open the monitor report

Look at recent checks, response times, response codes, and the event timeline. Reports help separate a temporary spike from a sustained outage.

3. Test the target yourself

Open the website, API URL, IP address, port, or keyword page from your own connection. If possible, also test from a server or network outside your office or home connection.

4. Compare the failure type

A timeout, refused connection, DNS failure, SSL/TLS failure, HTTP error, or missing keyword each points to a different part of the stack.
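If you want to reproduce the check from your own machine, a short script can fetch the URL once and report which failure type it hit. The following is a minimal sketch using only the Python standard library. The function names `probe` and `classify_failure` and the label strings are illustrative assumptions, and this is not how Spark Uptime itself performs checks.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised during a probe to a coarse failure type."""
    # HTTPError means the server answered with a 4xx or 5xx status.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http_{exc.code}"
    # urllib wraps lower-level errors in URLError; unwrap to the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ssl.SSLError):
        return "tls_error"
    if isinstance(exc, socket.gaierror):
        return "dns_failure"
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    return "other"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL once (scheme required) and return 'ok' or a failure type."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except Exception as exc:
        return classify_failure(exc)
```

Run `probe("https://example.com/")` from your own connection and again from an outside server. If the two results differ, the problem is more likely in the network path than in the service itself.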

Use the monitor type to narrow the cause

The correct troubleshooting path depends on what kind of monitor reported the down event. Spark Uptime supports website, ping, port, and keyword checks, and each type answers a different availability question.

Website monitor

Use this for homepages, stores, dashboards, and API URLs where HTTP or HTTPS availability matters. Check the HTTP status code, redirects, SSL/TLS behavior, and whether the web server is responding.

Ping monitor

Use this to test whether an IPv4 or IPv6 address responds to ICMP. A failed ping can indicate packet filtering, routing issues, host downtime, or an IP that intentionally blocks ping.

Port monitor

Use this to verify whether a specific TCP service port is accepting connections. If the port is closed or timing out, check the service daemon, firewall, listener, and upstream routing.

Keyword monitor

Use this when the page must contain exact case-sensitive content. If the page loads but the keyword is missing, check the page body, application output, caching, template changes, and redirects.
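To make the difference between a port check and a keyword check concrete, here is a rough sketch of what each one tests. The function names and default timeout are assumptions chosen for illustration; Spark Uptime's own implementation may differ.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS errors, and unreachable hosts.
        return False


def keyword_present(body: str, keyword: str) -> bool:
    """Exact, case-sensitive substring match against the page body."""
    return keyword in body
```

Note that `keyword_present("<h1>Order Status</h1>", "order status")` is false because the capitalization differs, which is a common reason a keyword monitor reports down while the page looks fine in a browser.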

Interpret common failure signals

The alert and report details usually reveal which layer needs attention. Use the signals below to decide what to inspect first.

Timeout: The target did not respond in time. Check server load, firewall rules, upstream provider issues, overloaded application workers, DNS resolution delays, or network reachability.
Connection refused: The host was reachable, but the service was not accepting the connection. Check whether the web server, API process, or service bound to the monitored port is running.
DNS failure: The hostname could not be resolved or returned unexpected DNS behavior. Check nameservers, records, DNSSEC, recent DNS changes, TTL, and whether the correct hostname is being monitored.
HTTP 4xx: The server responded, but the request was rejected or not found. Check authentication rules, WAF blocks, changed URLs, missing files, permission rules, and redirect behavior.
HTTP 5xx: The web server or application returned an error. Check application logs, PHP or app worker errors, database connectivity, upstream proxy errors, and recent deployments.
SSL/TLS error: HTTPS could not be validated or completed. Check certificate expiration, hostname coverage, chain files, intermediate certificates, protocol support, and recent certificate changes.
Keyword missing: The monitored page loaded, but Spark Uptime did not find the exact case-sensitive keyword. Check page content, capitalization, dynamic rendering, logged-in-only content, and caching/CDN changes.
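If you keep a runbook in code, the list above can be expressed as a simple lookup. This is only a sketch: the signal names are hypothetical labels, and each entry lists just the first few checks from the corresponding item above.

```python
# Hypothetical signal labels mapped to the first things to inspect.
FIRST_CHECKS = {
    "timeout": ["server load", "firewall rules", "upstream provider issues"],
    "connection_refused": ["web server", "API process", "service bound to the monitored port"],
    "dns_failure": ["nameservers", "records", "DNSSEC"],
    "http_4xx": ["authentication rules", "WAF blocks", "changed URLs"],
    "http_5xx": ["application logs", "app worker errors", "database connectivity"],
    "tls_error": ["certificate expiration", "hostname coverage", "chain files"],
    "keyword_missing": ["page content", "capitalization", "dynamic rendering"],
}


def signal_from_status(status_code: int) -> str:
    """Group an HTTP status code into the 4xx or 5xx signal used above."""
    if 400 <= status_code <= 499:
        return "http_4xx"
    if 500 <= status_code <= 599:
        return "http_5xx"
    return "no_http_error"


def first_checks(signal: str) -> list[str]:
    """Return the starting checklist for a signal, or an empty list if unknown."""
    return FIRST_CHECKS.get(signal, [])
```

For example, `first_checks(signal_from_status(503))` starts with the application logs, while a 404 starts with authentication rules and WAF blocks.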

Check response times before and during the event

Response time trends can show whether the site failed suddenly or degraded first. A rising response time before the down event may point to database pressure, slow upstream services, overloaded workers, heavy traffic, or network congestion.

Sudden drop

A sudden transition from normal response times to down can indicate a deploy, firewall rule, DNS change, service restart, certificate issue, or provider outage.

Gradual slowdown

A slow rise before failure often suggests resource exhaustion, application queue buildup, database latency, or a dependency becoming unstable.

Location pattern

If only one region is affected, investigate routing, CDN behavior, DNS answers, regional firewall rules, or provider transit issues.

Repeated flapping

Frequent down and recovery events can indicate intermittent load, unstable deployments, rate limiting, health check blocking, or a service near capacity.
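If you export response times from a report, the sudden, gradual, and flapping patterns above can be told apart with a few lines of code. The sketch below is one possible heuristic, not a Spark Uptime feature. The `recent` window of 3 checks and the `ratio` of 2.0 are arbitrary assumptions you would tune for your own service.

```python
from statistics import median


def onset_pattern(times_ms: list[float], recent: int = 3, ratio: float = 2.0) -> str:
    """Classify response times recorded before a down event.

    times_ms holds successful checks in chronological order, ending with the
    last success before the failure. Returns "gradual" if the last `recent`
    checks are at least `ratio` times slower than the earlier baseline,
    "sudden" otherwise, or "unknown" when there is too little history.
    """
    if len(times_ms) <= recent:
        return "unknown"
    baseline = median(times_ms[:-recent])
    tail = median(times_ms[-recent:])
    return "gradual" if tail >= ratio * baseline else "sudden"


def count_flaps(states: list[str]) -> int:
    """Count up-to-down transitions in a chronological list of check states."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev == "up" and cur == "down")
```

A "gradual" result suggests starting with resource and dependency checks, while a "sudden" result suggests looking at recent changes first. A high flap count over a short period matches the repeated flapping pattern.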

Review recent changes

Most availability issues are easier to diagnose when you connect the alert time to recent changes. Compare the time the down event began with deployments, DNS edits, firewall changes, certificate renewals, server updates, CDN changes, redirects, or payment/provider maintenance.

1. Check deploy history

Look for application releases, plugin changes, theme edits, dependency updates, configuration changes, or restarts near the alert time.

2. Check DNS and CDN changes

Confirm the hostname still points to the correct destination and that CDN, proxy, cache, or firewall settings are not blocking monitor traffic.

3. Check server health

Review CPU, memory, disk, network, database, web server, and application logs. A monitor alert often appears after the underlying system is already under stress.
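If you keep a change log with timestamps, lining it up against the alert time can be automated. The sketch below assumes a hypothetical log of `(timestamp, description)` pairs and a two-hour lookback window; both are illustrative choices, and all timestamps must share one time zone.

```python
from datetime import datetime, timedelta


def changes_before(alert_time: datetime,
                   changes: list[tuple[datetime, str]],
                   window: timedelta = timedelta(hours=2)) -> list[tuple[datetime, str]]:
    """Return changes made within `window` before the alert, newest first."""
    recent = [c for c in changes if alert_time - window <= c[0] <= alert_time]
    return sorted(recent, key=lambda c: c[0], reverse=True)


# Hypothetical change log for illustration only.
change_log = [
    (datetime(2024, 5, 1, 9, 0), "DNS record edited"),
    (datetime(2024, 5, 1, 13, 45), "application release"),
    (datetime(2024, 5, 1, 14, 20), "firewall rule added"),
]
suspects = changes_before(datetime(2024, 5, 1, 14, 30), change_log)
```

The change closest to the alert time comes first in the result, which is usually the best place to start, though an older change can still be the cause if its effect was delayed (for example, a DNS edit waiting on TTL expiry).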

Confirm the monitor configuration

If the site appears healthy but the monitor still reports down, review the monitor settings. Confirm the URL, protocol, hostname, IP address, port, and keyword match what you actually want Spark Uptime to check.

Website monitors: Confirm the URL is correct and that redirects lead to a reachable page. Spark Uptime can monitor a URL with or without http:// or https://.
Ping monitors: Confirm the monitored IPv4 or IPv6 address is correct and that the host is expected to respond to ping.
Port monitors: Confirm the port number is correct and that the service is listening publicly on that port.
Keyword monitors: Confirm the keyword is present in the page response exactly as entered, including capitalization.
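The checks above can be approximated with a small sanity test before you dig deeper. This is a sketch only: the `monitor` dictionary layout (`type`, `target`, `port`, `keyword`) is a hypothetical structure invented for the example and does not reflect Spark Uptime's actual settings format or API.

```python
import ipaddress
from urllib.parse import urlsplit


def config_problems(monitor: dict) -> list[str]:
    """Return a list of obvious configuration problems for one monitor."""
    problems = []
    kind = monitor.get("type")
    target = monitor.get("target", "")
    if kind in ("website", "keyword"):
        # Accept targets with or without a scheme by adding one for parsing.
        url = target if "://" in target else "http://" + target
        try:
            hostname = urlsplit(url).hostname
        except ValueError:
            hostname = None
        if not hostname:
            problems.append("URL has no hostname")
    if kind == "keyword" and not monitor.get("keyword"):
        problems.append("keyword is empty")
    if kind == "ping":
        try:
            ipaddress.ip_address(target)
        except ValueError:
            problems.append("target is not a valid IPv4 or IPv6 address")
    if kind == "port":
        port = monitor.get("port")
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append("port must be an integer from 1 to 65535")
    return problems
```

An empty list only means the settings are well formed. You still need to confirm that the target is the one you intend to monitor and that the host is expected to answer that kind of check.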

Communicate if customers may be affected

If the issue affects real users, publish a clear update on your status page or internal incident channel. Keep the message simple: what is affected, when it started, what users may experience, and that investigation is underway.

Good incident updates reduce confusion. Avoid speculation. Use plain language, update as facts change, and post a recovery message once the monitor confirms the service is back online.

After recovery

Once the monitor returns to an operational state, use the report to understand duration, response time changes, and whether the recovery was stable. If the issue was caused by configuration, capacity, DNS, SSL/TLS, or application changes, document the cause and prevention steps.
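Duration and stability are easy to compute once you have the event times and the checks recorded since recovery. The following sketch assumes you have copied those values out of the report by hand. The thresholds (5 consecutive checks, 1000 ms) are placeholder assumptions, not recommended values.

```python
from datetime import datetime, timedelta


def outage_duration(down_at: datetime, up_at: datetime) -> timedelta:
    """Time between the start of the down event and the recovery."""
    return up_at - down_at


def recovery_is_stable(checks: list[tuple[str, float]],
                       required: int = 5,
                       max_ms: float = 1000.0) -> bool:
    """Return True if the last `required` checks were all up and fast enough.

    checks is a chronological list of (state, response_ms) pairs recorded
    since the monitor recovered.
    """
    recent = checks[-required:]
    if len(recent) < required:
        return False  # not enough checks yet to judge stability
    return all(state == "up" and ms <= max_ms for state, ms in recent)
```

Waiting for a fixed number of clean checks before declaring recovery helps you avoid posting a recovery message while the service is still flapping.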

Confirm stability

Watch several checks after recovery to make sure the service is not still flapping or responding slowly.

Record the cause

Note what failed, what fixed it, who was impacted, and what should change to prevent the same issue.

Tune monitoring

If needed, adjust monitor type, URL, keyword, port, or alert routing so future alerts are more precise.

Update users

If you posted an incident notice, close the loop with a recovery update and any relevant next steps.