Smart Alert Configuration: Avoiding False Alarms While Catching Real Issues
The Alert Fatigue Problem
Traditional monitoring tools often suffer from one of two extremes:
- Over-alerting: Every minor hiccup triggers a notification, leading to alert fatigue and ignored messages
- Under-alerting: Thresholds set too high miss genuine issues until customers complain
The key is finding the balance: get notified about problems that need attention, nothing more, nothing less.
HITS Scout’s Smart Alert Philosophy
We implement a layered alerting strategy designed to minimize false positives while ensuring real downtime never goes undetected.
Layer 1: Consecutive Failure Threshold
Default setting: Alert after 2 consecutive failures
Why: Single check failures often result from temporary network blips, not actual downtime.
Example timeline:
10:00 - Check succeeds (200 OK)
10:05 - Check fails (timeout) ← No alert yet
10:10 - Check fails again (timeout) ← ALERT SENT
10:15 - Check fails (timeout) ← No additional alert
10:20 - Check succeeds (200 OK) ← Recovery notification sent
You can adjust this threshold:
- 1 failure = Maximum sensitivity (good for critical services with strict SLAs)
- 2 failures = Default (balances responsiveness with false positive reduction)
- 3+ failures = Conservative (when you only care about extended outages)
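The timeline and thresholds above can be sketched as a small state machine. This is an illustrative Python sketch, not HITS Scout's actual implementation; the default threshold of 2 matches the behavior described:

```python
class AlertState:
    """Tracks consecutive failures for one monitor and decides when to notify.

    Illustrative sketch of the Layer 1 logic: alert once after `threshold`
    consecutive failures, stay silent on further failures, and send a single
    recovery notification when the next check succeeds.
    """

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.alerted = False

    def record_check(self, success: bool):
        """Returns 'alert', 'recovery', or None for each check result."""
        if success:
            was_alerted = self.alerted
            self.consecutive_failures = 0
            self.alerted = False
            return "recovery" if was_alerted else None
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerted:
            self.alerted = True
            return "alert"
        return None  # below threshold, or already alerted for this outage
```

Replaying the example timeline (success, fail, fail, fail, success) produces exactly one "alert" on the second failure and one "recovery" when the site comes back.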
Layer 2: Primary URL Verification
When a discovered child link fails, we verify the primary URL before alerting.
Why: Broken internal links (deleted blog posts, moved pages) shouldn’t wake you at 3 AM. Only failures of the primary URL indicate actual downtime.
Example:
Primary URL: https://example.com
Child link fails: https://example.com/old-article-404
Action:
1. Check primary URL: https://example.com
2. If primary succeeds → Log child failure, include in daily report
3. If primary fails → Send immediate alert (site is actually down)
This prevents thousands of false alerts from normal content management activities like:
- Deleting old blog posts
- Restructuring site navigation
- Removing outdated product pages
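The verification step above reduces to a short decision function. This is a minimal sketch of the logic, not HITS Scout's code; `check_url` stands in for a real HTTP request with a timeout:

```python
def handle_child_failure(primary_url: str, child_url: str, check_url) -> str:
    """Decide how to handle a failed child link (illustrative sketch).

    check_url is any callable returning True if the URL responds successfully;
    in a real monitor it would perform an HTTP GET with a timeout.
    """
    if check_url(primary_url):
        # Primary is healthy: the broken child link goes into the daily report.
        return "log_for_daily_report"
    # Primary is also failing: the site itself is down, alert immediately.
    return "send_immediate_alert"
```

With a healthy primary, a 404 on an old article is only logged; the same child failure escalates to an immediate alert only when the primary URL fails too.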
Layer 3: Child Link Reporting
Failed child links don’t disappear—they’re just handled differently:
- Logged in the database with status code and timestamp
- Included in daily/weekly summary reports via email
- Visible in the dashboard with filterable views (all/failed/pending)
- No immediate alerts unless the primary URL also fails
This gives you visibility into broken links without interrupting your day.
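A record like the following could represent one logged child-link failure. The field names are assumptions for illustration, not HITS Scout's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChildLinkFailure:
    """One failed child-link check as it might be stored for reporting.

    Field names are illustrative; the point is that each failure carries a
    status code and timestamp, and never triggers an immediate alert on its own.
    """
    monitor_id: str
    url: str
    status_code: int
    checked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    alerted: bool = False  # child failures alone never page anyone
```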
Alert Channels: Right Message, Right Medium
Not all alerts require the same urgency. HITS Scout supports multiple channels:
Email
Best for: General notifications, daily summaries, low-priority alerts
Configuration:
- Set quiet hours (no emails between 10 PM and 7 AM)
- Choose summary frequency (immediate, hourly, daily)
- Filter by severity (critical only, all failures)
Slack/Discord
Best for: Team notifications, immediate visibility, discussion threads
Configuration:
- Route to specific channels (#monitoring, #incidents)
- Mention specific users or roles (@on-call, @devops)
- Include rich formatting with status codes and response times
Webhooks (Enterprise)
Best for: Custom integrations, PagerDuty, Opsgenie, custom dashboards
Configuration:
- POST JSON payloads to your endpoint
- Include full event context (monitor ID, URL, failure count, region)
- Retry logic with exponential backoff
- Signature verification for security
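Signature verification typically means the sender signs the raw request body and the receiver recomputes the signature with a shared secret. The source doesn't specify HITS Scout's exact scheme, so the following is a common HMAC-SHA256 sketch with an assumed payload shape:

```python
import hashlib
import hmac
import json

def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, body: bytes, received_sig: str) -> bool:
    """Constant-time comparison guards against timing attacks."""
    return hmac.compare_digest(sign_payload(secret, body), received_sig)

# Example payload mirroring the event context listed above (field names assumed):
body = json.dumps({
    "monitor_id": "mon_123",
    "url": "https://example.com",
    "failure_count": 2,
    "region": "us-east-1",
}).encode()
```

On the receiving side, verify before processing: compute the signature over the exact bytes received, and reject the request if it doesn't match the signature header.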
Advanced Alert Rules
Pro and Enterprise plans offer additional configuration:
Time-Based Rules
Set different alert behaviors based on time of day:
Business Hours (9 AM - 6 PM):
- Alert threshold: 1 failure
- Channels: Slack + Email
- Include: All monitors
Off-Hours (6 PM - 9 AM):
- Alert threshold: 3 failures
- Channels: PagerDuty (on-call only)
- Include: Critical monitors only
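The two rule sets above amount to picking a configuration by time of day. A minimal sketch of that selection follows; the dict shape is illustrative, not a HITS Scout API:

```python
from datetime import time

def alert_rules_for(now: time) -> dict:
    """Select alert behavior by time of day (mirrors the example rules above)."""
    business_hours = time(9) <= now < time(18)  # 9 AM - 6 PM
    if business_hours:
        return {"threshold": 1, "channels": ["slack", "email"], "monitors": "all"}
    # Off-hours: higher threshold, page on-call only, critical monitors only.
    return {"threshold": 3, "channels": ["pagerduty"], "monitors": "critical"}
```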
Severity-Based Routing
Route different failure types to appropriate channels:
Critical (Primary URL down):
- Immediate Slack notification
- Email to on-call engineer
- PagerDuty incident creation
Warning (Child link broken):
- Daily email summary
- No Slack ping
Info (Slow response time):
- Weekly report only
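The severity table above is essentially a lookup from failure type to channels. A sketch, with channel names as illustrative placeholders:

```python
# Illustrative routing table matching the severities described above.
ROUTES = {
    "critical": ["slack_immediate", "email_oncall", "pagerduty_incident"],
    "warning": ["daily_email_summary"],
    "info": ["weekly_report"],
}

def route_alert(severity: str) -> list:
    """Map a failure severity to notification channels; unknown severities
    fall back to the low-urgency daily summary rather than paging anyone."""
    return ROUTES.get(severity, ["daily_email_summary"])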
Maintenance Windows
Schedule blackout periods when you don’t want alerts:
- Deployment windows: Silence alerts during known maintenance
- Recurring maintenance: Every Saturday 2-4 AM
- One-time events: Conference talks, demos, testing periods
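A recurring window like "every Saturday 2-4 AM" is a simple weekday-and-hour check. An illustrative sketch (a production version would also handle time zones and windows that cross midnight):

```python
from datetime import datetime

def in_recurring_window(now: datetime, weekday: int, start_hour: int, end_hour: int) -> bool:
    """True if `now` falls inside a weekly maintenance window.

    Example from above: every Saturday 2-4 AM -> weekday=5, start_hour=2, end_hour=4
    (Python weekdays: Monday=0 .. Sunday=6).
    """
    return now.weekday() == weekday and start_hour <= now.hour < end_hour
```

Alerts that fire while this returns True would be suppressed rather than delivered.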
Alert History and Analysis
Every alert event is logged with:
- Timestamp (when failure detected)
- Monitor details (URL, check interval, region)
- Failure context (status code, error message, response time)
- Resolution time (when site recovered)
- Notifications sent (which channels, to whom)
Use this data to:
- Analyze downtime patterns: Are failures clustered around deployments?
- Calculate MTTR: Mean time to recovery for your sites
- Audit alert accuracy: Are you getting too many/too few alerts?
- Compliance reporting: Export SLA uptime metrics
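MTTR from the alert history is just the average gap between failure detection and recovery. A minimal sketch over (detected_at, recovered_at) pairs:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents) -> timedelta:
    """MTTR = average of (recovered_at - detected_at) across resolved incidents.

    `incidents` is a list of (detected_at, recovered_at) datetime pairs,
    e.g. pulled from the alert history described above.
    """
    if not incidents:
        return timedelta(0)
    total = sum((up - down for down, up in incidents), timedelta(0))
    return total / len(incidents)
```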
Common Alert Configuration Mistakes
❌ Alerting on Every Child Link Failure
Problem: Generates hundreds of alerts for normal content management
Fix: Use primary URL verification (default in HITS Scout)
❌ No Threshold (Single Check Failure Alerts)
Problem: Network blips and temporary issues flood your inbox
Fix: Set threshold to 2+ consecutive failures
❌ Same Alerts for All Monitors
Problem: Critical production site and test site treated equally
Fix: Use monitor tags/groups with different alert rules
❌ Ignoring Alert History
Problem: Repeated issues go unaddressed
Fix: Review monthly reports and address recurring failures
❌ Alert Fatigue Leading to Disabled Notifications
Problem: Miss genuine downtime because you turned alerts off
Fix: Adjust thresholds rather than disabling—find the right balance
Getting Started
Default HITS Scout alert configuration works well for most users:
- 2 consecutive failure threshold reduces false positives
- Primary URL verification prevents child link spam
- Email notifications for immediate issues
- Daily summaries for child link reports
As you grow, customize:
- Add Slack/Discord for team visibility
- Set up maintenance windows for planned downtime
- Configure severity-based routing for different monitor tiers
- Use webhooks for PagerDuty/Opsgenie integration
Configure your first monitor →
FAQ
Q: Can I get SMS alerts?
A: Not directly, but you can use webhooks to integrate with Twilio or similar services.
Q: What’s the maximum alert frequency?
A: We rate-limit to one alert per monitor per 5 minutes to prevent spam, regardless of check interval.
Q: Can I test my alert configuration?
A: Yes! Use the “Test Notification” button on each configured channel to send a sample alert.
Q: What happens if my Slack webhook stops working?
A: We’ll fall back to email and notify you that the Slack integration needs attention.
Q: Can different team members get different alerts?
A: Yes on Pro/Enterprise plans. Configure alert routing based on monitor tags and user roles.