Production Best Practices
Introduction
In this lesson, we look at best practices for deploying and maintaining the backup system in production.
Error Handling
Standardized Error Objects
typescript
export const errorSnapshotLimitExceeded = {
error: 'Snapshot limit exceeded',
code: 'SNAPSHOT_LIMIT_EXCEEDED',
type: 'VALIDATION_ERROR',
};
export const errorSnapshotLimitByScheduledBackup = {
error: 'Cannot create manual snapshot - would exceed limit for scheduled backup',
code: 'SNAPSHOT_LIMIT_BY_SCHEDULED_BACKUP',
type: 'VALIDATION_ERROR',
};
Error Handling Pattern
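The standardized error objects all share three fields (`error`, `code`, `type`), so the check used in this pattern can be expressed as a reusable type guard. The following is a minimal sketch: the names `KnownError` and `isKnownError` are hypothetical and do not come from the original codebase, and the guard is slightly stricter than a truthiness check because it requires each field to be a string.

```typescript
// Shape shared by the standardized error objects (error, code, type).
interface KnownError {
  error: string;
  code: string;
  type: string;
}

// Hypothetical helper: narrows an unknown caught value to KnownError.
function isKnownError(value: unknown): value is KnownError {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.error === 'string' &&
    typeof candidate.code === 'string' &&
    typeof candidate.type === 'string'
  );
}
```

With a guard like this, a `catch (error: unknown)` block can rethrow known errors unchanged without resorting to `any`.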
typescript
try {
await checkSnapshotLimit(serverId, limit);
await createSnapshot();
} catch (error: any) {
// Check if it's a known error object
if (error?.error && error?.code && error?.type) {
throw error; // Preserve error structure
}
// Handle unexpected errors
logger.error(`Unexpected error: ${error?.message}`);
throw new Error('Failed to create snapshot');
}
Logging Strategy
Winston Logger
typescript
import { logger } from '../../../logger/winston';
// Info logs
logger.info(`✅ Backup created successfully: ${imageId}`);
logger.info(`🔄 Processing backup job: ${resourceId}`);
// Warning logs
logger.warn(`⚠️ Schedule disabled, skipping: ${scheduleId}`);
// Error logs
logger.error(`❌ Backup failed: ${error.message}`);
Log Levels
- Info: Normal operations, status updates
- Warn: Recoverable issues, fallbacks
- Error: Failures, exceptions
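These three levels are ordered by severity. In winston's default (npm-style) levels, `error` has priority 0, `warn` 1, and `info` 2, and a logger emits a message only when its priority is less than or equal to the configured level. The sketch below illustrates that rule without depending on winston; `LEVEL_PRIORITY` and `isEmitted` are hypothetical names used only for illustration.

```typescript
// Priorities of the three levels used in this lesson,
// matching winston's default npm-style ordering.
const LEVEL_PRIORITY = { error: 0, warn: 1, info: 2 } as const;
type LogLevel = keyof typeof LEVEL_PRIORITY;

// Hypothetical helper: true when a message logged at `messageLevel`
// would be emitted by a logger configured at `configuredLevel`.
function isEmitted(configuredLevel: LogLevel, messageLevel: LogLevel): boolean {
  return LEVEL_PRIORITY[messageLevel] <= LEVEL_PRIORITY[configuredLevel];
}
```

For example, a logger configured at `warn` would drop the `info` status updates shown above while still emitting warnings and errors.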
Monitoring
Queue Statistics
typescript
// Monitor queue health
const stats = await backupQueueManager.getQueueStats();
if (stats.waiting > 100) {
// Alert: Too many waiting jobs
}
if (stats.failed > 50) {
// Alert: Too many failed jobs
}
Health Checks
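The health check in this section returns a `healthy` flag and a list of `issues`. How `backupQueueManager.healthCheck()` computes them is not shown in this lesson; one plausible approach is to derive the result from queue statistics using thresholds such as the 100 waiting and 50 failed jobs mentioned under Queue Statistics. The sketch below assumes that approach, and `evaluateQueueHealth` and its types are hypothetical names.

```typescript
interface QueueStats {
  waiting: number;
  failed: number;
}

interface QueueHealth {
  healthy: boolean;
  issues: string[];
}

interface HealthThresholds {
  maxWaiting: number;
  maxFailed: number;
}

// Hypothetical helper: turns raw queue statistics into a health result.
// The default thresholds mirror the values used under Queue Statistics.
function evaluateQueueHealth(
  stats: QueueStats,
  thresholds: HealthThresholds = { maxWaiting: 100, maxFailed: 50 },
): QueueHealth {
  const issues: string[] = [];
  if (stats.waiting > thresholds.maxWaiting) {
    issues.push(`Too many waiting jobs: ${stats.waiting}`);
  }
  if (stats.failed > thresholds.maxFailed) {
    issues.push(`Too many failed jobs: ${stats.failed}`);
  }
  return { healthy: issues.length === 0, issues };
}
```

Keeping the evaluation a pure function makes the thresholds easy to unit test and to tune per environment.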
typescript
// Periodic health check
const health = await backupQueueManager.healthCheck();
if (!health.healthy) {
// Alert: Queue health issues
logger.error(`Queue health issues: ${health.issues.join(', ')}`);
}
Graceful Shutdown
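A shutdown routine that awaits in-flight work can take longer than a process manager is willing to wait, so it is common to bound it with a timeout. The following is a minimal sketch of that idea, not part of the original code: `shutdownWithTimeout` is a hypothetical helper, and the timeout value is left to the caller.

```typescript
// Hypothetical helper: runs `stop`, but gives up waiting after `timeoutMs`.
// Resolves to 'stopped' if `stop` finished in time, 'timed_out' otherwise.
// If `stop` rejects first, the rejection propagates to the caller.
async function shutdownWithTimeout(
  stop: () => Promise<void>,
  timeoutMs: number,
): Promise<'stopped' | 'timed_out'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timed_out'>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });
  try {
    return await Promise.race([
      stop().then(() => 'stopped' as const),
      timeout,
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A signal handler could pass its stop function to this helper and choose the exit code from the result, for instance exiting non-zero on `'timed_out'`.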
Shutdown Handlers
typescript
export async function stopBackupScheduler() {
logger.info('🔄 Stopping backup scheduler...');
// Stop cron jobs
if (schedulerTask) {
schedulerTask.stop();
schedulerTask = null;
}
if (cleanupTask) {
cleanupTask.stop();
cleanupTask = null;
}
// Close queue manager
await backupQueueManager.close();
logger.info('📴 Backup scheduler stopped');
}
// Graceful shutdown handlers
process.on('SIGTERM', async () => {
logger.info('🔄 SIGTERM received, stopping backup scheduler...');
await stopBackupScheduler();
process.exit(0);
});
process.on('SIGINT', async () => {
logger.info('🔄 SIGINT received, stopping backup scheduler...');
await stopBackupScheduler();
process.exit(0);
});
Job Recovery
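The recovery routine in this section calls `shouldReschedule(job)`, which this lesson does not define. One plausible reading, assuming schedules carry the `enabled` and `nextRunAt` fields that appear in the index definitions later in this lesson, is that a job is rescheduled when it is enabled and its next run time has already passed. The sketch below implements only that assumption; the `ScheduleLike` type is hypothetical.

```typescript
// Hypothetical minimal view of a schedule document.
interface ScheduleLike {
  enabled: boolean;
  nextRunAt: Date | null;
}

// Sketch of a possible shouldReschedule: true when the schedule is
// enabled and its next run time is at or before `now`.
function shouldReschedule(job: ScheduleLike, now: Date = new Date()): boolean {
  if (!job.enabled) return false;
  if (job.nextRunAt === null) return false;
  return job.nextRunAt.getTime() <= now.getTime();
}
```

Passing `now` as a parameter keeps the check deterministic in tests.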
Recover Interrupted Jobs
typescript
export async function recoverInterruptedJobs(): Promise<void> {
// Find jobs that were running when app stopped
const interruptedJobs = await BackupSchedule.find({
status: 'running',
});
for (const job of interruptedJobs) {
// Reset status
await BackupSchedule.findByIdAndUpdate(job._id, {
$set: {
status: 'idle',
runningJobId: null,
lastError: null,
}
});
// Reschedule if due
if (shouldReschedule(job)) {
await backupQueueManager.addBackupJob(job, 5000);
}
}
}
Performance Optimization
Batch Operations
typescript
// ✅ DO: Process in batches
const batchSize = 100;
const cursor = BackupSchedule.find({...}).batchSize(batchSize).cursor();
for await (const schedule of cursor) {
// Process schedule
}
// ❌ DON'T: Load all at once
const schedules = await BackupSchedule.find({...}); // May be too large
Efficient Queries
typescript
// ✅ DO: Use indexes
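// Assuming BackupSchedule is a Mongoose model: createIndex is called on its underlying collection
await BackupSchedule.collection.createIndex({ nextRunAt: 1, status: 1 });
await BackupSchedule.collection.createIndex({ enabled: 1, status: 1 });
// ✅ DO: Project only needed fields
BackupSchedule.find({...}, { nextRunAt: 1, status: 1 });
Troubleshooting Guide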
Common Issues
1. Queue Not Processing
bash
# Check Redis connection
redis-cli ping
# Check queue stats
curl http://localhost:3000/api/v1/backup/queue/stats
2. Schedules Not Running
bash
# Check scheduler logs
tail -f logs/backup-scheduler.log
# Check MongoDB schedules (run this query inside mongosh, not the system shell)
db.backupschedules.find({ enabled: true, status: 'idle' })
3. OpenStack Errors
bash
# Check OpenStack credentials
echo $OPENSTACK_ENDPOINT_HCM
echo $OPENSTACK_USERNAME_HCM
# Test authentication
curl -X POST $OPENSTACK_ENDPOINT_HCM/identity/v3/auth/tokens ...
Summary
Key Points
Error Handling
- Standardized error objects
- Preserve error structure
- Clear error messages
Logging
- Use Winston logger
- Appropriate log levels
- Context in logs
Monitoring
- Queue statistics
- Health checks
- Alerting
Graceful Shutdown
- Handle SIGTERM/SIGINT
- Close connections
- Save state
Job Recovery
- Recover interrupted jobs
- Reschedule if needed
- Reset status
Next Steps
You have completed the course!
Go back to:
- Index - Course overview
- Case Study - Executive summary
Last Updated: 2025-01-25
Previous: 08. Snapshot Limits
Back to: Index