Yaltopia-Homes-TGClient/docs/MONITORING_SYSTEM.md
2026-01-08 19:06:12 +03:00

# 📊 Bot Monitoring System - Easy Configuration Guide
The Yaltipia Telegram Bot includes a monitoring system that automatically tracks performance and errors and sends alerts to administrators. This guide walks you through setup in about 5 minutes.
## 🚀 Quick Setup (5 Minutes)
### Step 1: Get Your Chat ID
**Option A: Personal Chat (Recommended for testing)**
1. Start a chat with your bot
2. Send any message to your bot
3. Your Chat ID will be logged in the console (look for `User XXXXXXX:`)
4. Copy this number
**Option B: Group Chat (Recommended for teams)**
1. Create a Telegram group
2. Add your bot to the group
3. Make the bot an admin (required for sending messages)
4. Send any message in the group
5. Look in console logs for the group Chat ID (negative number like `-1001234567890`)
### Step 2: Configure Environment Variables
Add these lines to your `.env` file:
```bash
# REQUIRED: Admin Chat Configuration
ADMIN_CHAT_IDS=YOUR_CHAT_ID_HERE
# OPTIONAL: Advanced Configuration (use defaults if unsure)
MONITORING_TOPIC_ID=5 # For group topics (optional)
HEALTH_CHECK_INTERVAL_MINUTES=30 # How often to check system health
DAILY_REPORT_HOUR=9 # When to send daily reports (24h format)
ERROR_CLEANUP_INTERVAL_HOURS=1 # Memory cleanup interval
```
### Step 3: Test Your Setup
1. Restart your bot
2. You should receive a "🚀 Bot Started" message
3. If you don't receive it, check the troubleshooting section below
## 📋 Configuration Examples
### Example 1: Single Admin (Personal Chat)
```bash
ADMIN_CHAT_IDS=123456789
```
### Example 2: Multiple Admins
```bash
ADMIN_CHAT_IDS=123456789,987654321,555666777
```
### Example 3: Group with Topic (Advanced)
```bash
ADMIN_CHAT_IDS=-1001234567890
MONITORING_TOPIC_ID=5
```
### Example 4: Frequent Monitoring (Every 5 minutes)
```bash
ADMIN_CHAT_IDS=123456789
HEALTH_CHECK_INTERVAL_MINUTES=5
DAILY_REPORT_HOUR=8
```
## 🔧 Environment Variables Explained
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `ADMIN_CHAT_IDS` | ✅ Yes | None | Comma-separated list of Telegram Chat IDs |
| `MONITORING_TOPIC_ID` | ❌ No | None | Topic ID for group chats (advanced) |
| `HEALTH_CHECK_INTERVAL_MINUTES` | ❌ No | 30 | How often to check system health |
| `DAILY_REPORT_HOUR` | ❌ No | 9 | Hour (0-23) to send daily reports |
| `ERROR_CLEANUP_INTERVAL_HOURS` | ❌ No | 1 | How often to clean up error logs |
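
To make the table concrete, here is a minimal sketch of how these variables could be read and validated in Node.js. The function name `loadMonitoringConfig` and the shape of the returned object are assumptions for illustration, not the bot's actual code; only the variable names and defaults come from the table above.

```javascript
// Hypothetical helper: reads the monitoring variables from an env-like object.
function loadMonitoringConfig(env = process.env) {
  // ADMIN_CHAT_IDS is a comma-separated list; tolerate spaces around commas.
  const adminChatIds = (env.ADMIN_CHAT_IDS || "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  if (adminChatIds.length === 0) {
    throw new Error("ADMIN_CHAT_IDS is required");
  }
  for (const id of adminChatIds) {
    // Personal chat IDs are positive integers; group chat IDs are negative.
    if (!/^-?\d+$/.test(id)) {
      throw new Error(`Invalid chat ID: "${id}"`);
    }
  }

  // Parse an optional integer, falling back to the documented default.
  const intOrDefault = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return fallback;
    const value = Number.parseInt(raw, 10);
    if (Number.isNaN(value)) throw new Error(`${name} must be an integer`);
    return value;
  };

  const dailyReportHour = intOrDefault("DAILY_REPORT_HOUR", 9);
  if (dailyReportHour < 0 || dailyReportHour > 23) {
    throw new Error("DAILY_REPORT_HOUR must be between 0 and 23");
  }

  return {
    adminChatIds, // kept as strings, e.g. "-1001234567890"
    topicId: intOrDefault("MONITORING_TOPIC_ID", null),
    healthCheckIntervalMinutes: intOrDefault("HEALTH_CHECK_INTERVAL_MINUTES", 30),
    dailyReportHour,
    errorCleanupIntervalHours: intOrDefault("ERROR_CLEANUP_INTERVAL_HOURS", 1),
  };
}
```

Failing fast on a missing or malformed `ADMIN_CHAT_IDS` surfaces configuration mistakes at startup rather than when the first alert fails to send.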
## 📱 What You'll Receive
### **Startup Notifications**
When your bot starts, you'll get:
```
🚀 Bot Started
✅ Yaltipia Home Client TG Bot is now running
🕐 Started at: 08/01/2026, 09:00:00
Platform: win32
Node.js: v20.19.2
```
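A message like this can be assembled from values that Node.js exposes directly. The sketch below is an assumption about how such text might be built, not the bot's real implementation; `buildStartupMessage` is a hypothetical name, and the `en-GB` locale is chosen only because it produces a day/month/year date similar to the sample.

```javascript
// Hypothetical helper: builds the startup notification text.
function buildStartupMessage(botName, startedAt = new Date()) {
  return [
    "🚀 Bot Started",
    `✅ ${botName} is now running`,
    // Local time in day/month/year order (depends on the server's time zone).
    `🕐 Started at: ${startedAt.toLocaleString("en-GB")}`,
    `Platform: ${process.platform}`, // e.g. win32, linux, darwin
    `Node.js: ${process.version}`,   // e.g. v20.19.2
  ].join("\n");
}
```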
### **Error Alerts** (Instant)
When errors occur, you'll receive detailed alerts:
```
🚨 BOT ERROR ALERT
🚨 URGENCY: INVESTIGATE
📍 Context: text_message_handler
❌ Error: Cannot read property 'id' of undefined
🔢 Error Code: N/A
User ID: 123456789
🔄 Count (this type): 1
Total Errors Today: 3
🕐 Time: 08/01/2026, 14:30:25
RECOMMENDED ACTIONS:
• Check error logs for patterns
• Monitor if error repeats
• Test affected functionality
Next Steps:
1. Check logs for more details
2. Test the affected feature
3. Take corrective action if needed
4. Monitor for resolution
```
### 🏥 **Health Alerts** (When Issues Detected)
System health warnings when problems are detected:
```
🚨 SYSTEM HEALTH ALERT
STATUS: CRITICAL
⚡ URGENCY: IMMEDIATE ACTION REQUIRED
CURRENT SYSTEM STATUS:
• Memory Usage: 520MB
• Error Rate: 15%
• Uptime: 2h 15m
• Total Errors: 5
• Total Messages: 33
🔍 DETECTED ISSUES:
• High memory usage: 520MB
• High error rate: 15%
🔧 RECOMMENDED ACTIONS:
• Check server resources immediately
• Consider restarting the bot if issues persist
• Monitor user complaints
• Check recent error logs for patterns
• Verify API connectivity
Alert Time: 08/01/2026, 16:52:50
Next Steps:
1. Check server logs for details
2. Monitor system for next 15 minutes
3. Take action if issues persist
```
### 📊 **Daily Reports** (Every Morning)
Comprehensive daily reports sent at your configured time:
```
📊 Daily Bot Report
⏱️ Uptime: 24h 15m
Total Users: 45
Messages Processed: 1,234
🔔 Notifications Sent: 89
❌ Errors: 2
💾 Memory Usage: 245MB
🏥 Health: healthy
📅 Date: 08/01/2026
```
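The report time is controlled by `DAILY_REPORT_HOUR`. One way to schedule it, assuming a plain timer is used, is to compute how long to wait until the next occurrence of that hour. `msUntilNextReport` is a hypothetical helper shown only as a sketch; the bot may schedule reports differently.

```javascript
// Milliseconds from `now` until the next occurrence of `hour`:00 in local time.
function msUntilNextReport(hour, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, 0, 0, 0);
  // If that time has already passed (or is right now), use tomorrow instead.
  if (next <= now) next.setDate(next.getDate() + 1);
  return next.getTime() - now.getTime();
}

// Usage sketch (sendDailyReport is a placeholder for your own function):
// setTimeout(sendDailyReport, msUntilNextReport(9));
```

Recomputing the delay after each report, rather than using a fixed 24-hour interval, keeps the schedule aligned with local time.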
### **Shutdown Notifications**
When the bot stops (including Ctrl+C):
```
🛑 Bot Shutdown
❌ Yaltipia Home Client TG Bot is shutting down
🕐 Shutdown at: 08/01/2026, 17:30:00
⏱️ Uptime: 8h 30m
📝 Reason: SIGINT (Ctrl+C)
Messages processed: 1,234
🔔 Notifications sent: 89
❌ Total errors: 2
```
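Catching Ctrl+C relies on Node.js signal events. The following is a minimal sketch under stated assumptions: `notify` stands in for whatever function sends a message to the admin chat, and `formatUptime` and `registerShutdownHandlers` are hypothetical names, not the bot's actual API.

```javascript
// Formats a duration in milliseconds as "8h 30m".
function formatUptime(ms) {
  const totalMinutes = Math.floor(ms / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours}h ${minutes}m`;
}

// Registers one-time handlers that notify admins before the process exits.
// `notify` is an assumed async function that delivers text to the admin chat.
function registerShutdownHandlers(notify, startedAt = Date.now()) {
  for (const signal of ["SIGINT", "SIGTERM"]) {
    process.once(signal, async () => {
      const text = [
        "🛑 Bot Shutdown",
        `⏱️ Uptime: ${formatUptime(Date.now() - startedAt)}`,
        `📝 Reason: ${signal}`,
      ].join("\n");
      try {
        await notify(text);
      } finally {
        process.exit(0); // exit even if the notification fails
      }
    });
  }
}
```

Using `process.once` avoids sending duplicate notifications if the signal is delivered more than once.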
### 💚 **Regular Health Checks** (If Frequent Monitoring Enabled)
For intervals ≤ 2 minutes, you'll get simple status updates:
```
Health: HEALTHY | Memory: 245MB | Uptime: 120min | Messages: 456 | Notifications: 23 | Errors: 1 | Time: 14:30:25
```
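The one-line format above can be sketched as a simple join of labeled fields. The names `usesCompactStatus` and `formatStatusLine`, and the fields of the snapshot object, are assumptions for illustration; only the 2-minute cutoff and the output layout come from this guide.

```javascript
const COMPACT_STATUS_MAX_INTERVAL_MINUTES = 2;

// True when the configured interval is short enough for compact updates.
function usesCompactStatus(intervalMinutes) {
  return intervalMinutes <= COMPACT_STATUS_MAX_INTERVAL_MINUTES;
}

// Builds the single-line status text from a snapshot of current metrics.
function formatStatusLine({ health, memoryMb, uptimeMinutes, messages, notifications, errors, time }) {
  return [
    `Health: ${health.toUpperCase()}`,
    `Memory: ${memoryMb}MB`,
    `Uptime: ${uptimeMinutes}min`,
    `Messages: ${messages}`,
    `Notifications: ${notifications}`,
    `Errors: ${errors}`,
    `Time: ${time}`,
  ].join(" | ");
}
```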
## 🎯 Admin Commands
Once monitoring is configured, admins can use these commands in any chat with the bot:
| Command | Description | Example Response |
|---------|-------------|------------------|
| `/system_status` | Get current system status | Shows uptime, memory, errors, health |
| `/send_report` | Generate manual system report | Same as daily report, on-demand |
| `/notifications_status` | Check notification service status | Shows if notifications are working |
| `/shutdown_bot` | Gracefully shut down the bot | Stops bot with proper cleanup |
### Example System Status Response:
```
System Status
⏱️ Uptime: 5h 23m
👥 Active Users: 12
💬 Messages: 456
🔔 Notifications: 23
❌ Errors: 1
💾 Memory: 234MB
🏥 Health: healthy
📅 Started: 08/01/2026, 09:00:00
```
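Restricting these commands to admins comes down to comparing the sender's ID with the configured list. This sketch assumes the check is made against `ADMIN_CHAT_IDS`; `canRunCommand` is a hypothetical helper and the bot's real check may differ.

```javascript
// Commands from the table above that only admins may run.
const ADMIN_COMMANDS = new Set([
  "/system_status",
  "/send_report",
  "/notifications_status",
  "/shutdown_bot",
]);

// Returns true if the sender may run the command.
// IDs are compared as strings so numeric and string forms both match.
function canRunCommand(command, senderId, adminChatIds) {
  if (!ADMIN_COMMANDS.has(command)) return true; // not an admin command
  return adminChatIds.map(String).includes(String(senderId));
}
```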
## 🔍 What Gets Monitored
### ✅ **Tracked Events**
- User registrations and login attempts
- Message processing and responses
- Notification creation and delivery
- API calls and responses
- System errors and exceptions
- Memory usage and performance metrics
- Failed login attempts (with user details for admin help)
### 🚨 **Error Types Monitored**
- **Critical Errors**: Uncaught exceptions, unhandled promise rejections
- **User Errors**: Message handling failures, authentication issues
- **Network Errors**: API connection failures, timeout issues
- **System Errors**: Memory issues, performance problems
### 📈 **Performance Metrics**
- **Memory Usage**: RSS, Heap Used, External memory
- **Error Rates**: Percentage of messages resulting in errors
- **Response Times**: How quickly the bot responds
- **User Activity**: Message patterns and usage statistics
## 🛠️ Health Monitoring Details
### Automatic Health Checks
The system automatically checks health at your configured interval:
- **Memory Usage**: Alerts if > 500MB
- **Error Rate**: Alerts if > 10% of messages result in errors
- **System Responsiveness**: Monitors for hanging processes
### Health Status Levels
- **🟢 Healthy**: All systems normal, no issues detected
- **🟡 Warning**: Minor issues detected, monitoring recommended
- **🔴 Critical**: Major issues requiring immediate attention
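
The thresholds and status levels above can be combined into a single check. The sketch below uses the documented limits (500MB memory, 10% error rate); how the real bot maps issues to a level is not stated in this guide, so the rule used here (one issue is a warning, two or more is critical) is an assumption, as are the function names.

```javascript
const MEMORY_LIMIT_MB = 500;   // documented memory threshold
const ERROR_RATE_LIMIT = 0.10; // documented error-rate threshold (10%)

// Current resident set size in whole megabytes.
function currentMemoryMb() {
  return Math.round(process.memoryUsage().rss / 1024 / 1024);
}

// Evaluates a metrics snapshot and returns a status plus detected issues.
// Assumption: 0 issues = healthy, 1 issue = warning, 2+ issues = critical.
function evaluateHealth({ memoryMb, totalErrors, totalMessages }) {
  const issues = [];
  if (memoryMb > MEMORY_LIMIT_MB) {
    issues.push(`High memory usage: ${memoryMb}MB`);
  }
  // Guard against division by zero before any messages arrive.
  const errorRate = totalMessages > 0 ? totalErrors / totalMessages : 0;
  if (errorRate > ERROR_RATE_LIMIT) {
    issues.push(`High error rate: ${Math.round(errorRate * 100)}%`);
  }
  const status =
    issues.length === 0 ? "healthy" : issues.length === 1 ? "warning" : "critical";
  return { status, issues, errorRate };
}
```

With the figures from the sample health alert (520MB, 5 errors in 33 messages), this check reports both issues and a critical status.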
### Smart Error Alerting
- **Network Errors**: First alert immediately, then every 30 minutes if persisting
- **Critical Errors**: Always alert immediately
- **Regular Errors**: Alert with 10-minute cooldown to prevent spam
- **Similar Errors**: Grouped together to reduce noise
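
The cooldown rules above can be sketched as a small throttle keyed by error category and type. The category names, the `createAlertThrottle` factory, and the injectable clock are assumptions for illustration; only the 30-minute and 10-minute windows and the "always alert" rule for critical errors come from this guide.

```javascript
// Cooldown per category, in milliseconds.
const COOLDOWN_MS = {
  critical: 0,              // always alert immediately
  network: 30 * 60 * 1000,  // repeat at most every 30 minutes
  regular: 10 * 60 * 1000,  // repeat at most every 10 minutes
};

// Returns a shouldAlert(category, key) function with its own state.
// `now` can be replaced with a fake clock in tests.
function createAlertThrottle(now = () => Date.now()) {
  const lastAlertAt = new Map();
  return function shouldAlert(category, key) {
    const cooldown = COOLDOWN_MS[category] ?? COOLDOWN_MS.regular;
    const id = `${category}:${key}`; // similar errors share one key
    const last = lastAlertAt.get(id);
    const current = now();
    if (last !== undefined && current - last < cooldown) return false;
    lastAlertAt.set(id, current);
    return true;
  };
}
```

Grouping by key means repeated occurrences of the same error share one cooldown window, while a new kind of error still alerts right away.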
## 🔒 Security & Privacy
- **No Sensitive Data**: Passwords and tokens are masked in logs
- **Admin-Only Access**: Only configured admins can use monitoring commands
- **Secure Error Reporting**: User data is protected in error reports
- **Rate Limiting**: Prevents spam of admin notifications
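
As an illustration of masking, the sketch below replaces values whose key names look sensitive before an object is logged. The key pattern and the `maskSensitive` name are assumptions; the bot's actual masking rules are not described in this guide.

```javascript
// Key names treated as sensitive (assumed pattern, adjust to your data).
const SENSITIVE_KEY = /pass(word)?|token|secret|api[_-]?key/i;

// Returns a copy of `value` with sensitive fields replaced by "***".
function maskSensitive(value) {
  if (Array.isArray(value)) return value.map(maskSensitive);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        SENSITIVE_KEY.test(key) ? "***" : maskSensitive(inner),
      ])
    );
  }
  return value; // primitives pass through unchanged
}
```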
## 🚨 Troubleshooting
### Problem: Not Receiving Startup Notification
**Possible Causes & Solutions:**
1. **Wrong Chat ID**
```bash
# Check console logs for your actual Chat ID
# Look for: [ACTIVITY] User 123456789: text_message
ADMIN_CHAT_IDS=123456789 # Use the number from logs
```
2. **Bot Not Added to Group**
- Add bot to your group
- Make bot an admin
- Use the group's Chat ID (negative number)
3. **Invalid Environment Variable**
```bash
# Make sure no spaces around the equals sign
ADMIN_CHAT_IDS=123456789 # ✅ Correct
ADMIN_CHAT_IDS = 123456789 # ❌ Wrong
```
### Problem: Topic Messages Not Working
**Solutions:**
1. **Check Topic ID**
```bash
# Make sure topic exists and ID is correct
MONITORING_TOPIC_ID=5
```
2. **Bot Permissions**
- Bot must be admin in the group
- Bot must have permission to send messages in topics
3. **Remove Topic Configuration**
```bash
# Comment out or remove this line to send to main chat
# MONITORING_TOPIC_ID=5
```
### Problem: Too Many/Too Few Health Checks
**Solutions:**
```bash
# For less frequent monitoring (every 30 minutes)
HEALTH_CHECK_INTERVAL_MINUTES=30
# For more frequent monitoring (every 5 minutes)
HEALTH_CHECK_INTERVAL_MINUTES=5
# For testing (every 1 minute)
HEALTH_CHECK_INTERVAL_MINUTES=1
```
### Problem: Memory Usage Shows NaN
**This is fixed in the current version**, which calculates memory usage correctly. If you still see `NaN`:
1. Make sure you are running the current version
2. Restart your bot
3. Confirm that reports show actual MB values such as "245MB"
### Problem: Bot Permissions Error
**Error Messages & Solutions:**
- `chat not found` → Bot removed from group, re-add it
- `bot was blocked` → User blocked the bot, use different admin
- `not enough rights` → Make bot admin in group
- `topic closed` → Open the topic or remove MONITORING_TOPIC_ID
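
The list above can also be expressed as a lookup that turns an error message into a suggested fix. This is a sketch only: it assumes the error text contains the phrases listed above, and `hintForError` is a hypothetical helper.

```javascript
// Phrase-to-fix table taken from the list above.
const PERMISSION_HINTS = [
  { match: "chat not found", hint: "The bot was removed from the group; re-add it." },
  { match: "bot was blocked", hint: "The user blocked the bot; use a different admin." },
  { match: "not enough rights", hint: "Make the bot an admin in the group." },
  { match: "topic closed", hint: "Open the topic or remove MONITORING_TOPIC_ID." },
];

// Returns a suggested fix for a known error message, or null if unknown.
function hintForError(message) {
  const text = String(message).toLowerCase();
  const entry = PERMISSION_HINTS.find(({ match }) => text.includes(match));
  return entry ? entry.hint : null;
}
```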
## 📞 Getting Help
### Quick Diagnostics
1. **Check Console Logs**: Look for `[MONITOR]` messages
2. **Test Configuration**: Restart bot and look for startup notification
3. **Verify Chat ID**: Send message to bot and check console for your ID
4. **Test Commands**: Try `/system_status` command
### Common Log Messages
- `✅ Test message sent successfully` → Configuration working
- `❌ Test message failed` → Check Chat ID and bot permissions
- `✅ Message sent successfully to -1001234567890 (topic 5)` → Topic working
- `⚠️ Topic 5 not found` → Check topic ID or remove it
## 🚀 Benefits
### For Administrators
- **Proactive Issue Detection**: Know about problems before users report them
- **Performance Insights**: Understand how your bot is performing
- **Error Tracking**: Identify and fix recurring issues quickly
- **Usage Analytics**: See how users interact with your bot
### For Users
- **Better Reliability**: Issues are detected and fixed faster
- **Improved Performance**: System optimization based on monitoring data
- **Faster Support**: Admins have detailed error information when helping
## 🎉 You're All Set!
Once configured:
1. **Restart your bot** to activate monitoring
2. **Look for the startup notification** in your configured chat
3. **Test with `/system_status`** to verify admin commands work
4. **Monitor the health alerts** to keep your bot running smoothly
The monitoring system will now help ensure your bot runs reliably and any issues are quickly identified and resolved!