Production Best Practices

Introduction

This lesson covers best practices for deploying and maintaining the backup system in production.

Error Handling

Standardized Error Objects

typescript

export const errorSnapshotLimitExceeded = {
  error: 'Snapshot limit exceeded',
  code: 'SNAPSHOT_LIMIT_EXCEEDED',
  type: 'VALIDATION_ERROR',
};

export const errorSnapshotLimitByScheduledBackup = {
  error: 'Cannot create manual snapshot - would exceed limit for scheduled backup',
  code: 'SNAPSHOT_LIMIT_BY_SCHEDULED_BACKUP',
  type: 'VALIDATION_ERROR',
};

Error Handling Pattern

typescript

try {
  await checkSnapshotLimit(serverId, limit);
  await createSnapshot();
} catch (error: any) {
  // Check if it's a known error object
  if (error?.error && error?.code && error?.type) {
    throw error; // Preserve error structure
  }
  // Handle unexpected errors
  logger.error(`Unexpected error: ${error?.message}`);
  throw new Error('Failed to create snapshot');
}
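
The inline check for a known error object can be factored into a reusable type guard. The following is a minimal sketch, not the course's actual implementation: the `StandardError` interface and the `isStandardError` name are assumptions, modeled on the `error`, `code`, and `type` fields of the standardized error objects.

```typescript
// Hypothetical shape shared by the standardized error objects.
interface StandardError {
  error: string; // human-readable message
  code: string; // stable machine-readable identifier
  type: string; // broad category, e.g. 'VALIDATION_ERROR'
}

// Type guard: true only when `value` is an object carrying all three string fields.
function isStandardError(value: unknown): value is StandardError {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.error === 'string' &&
    typeof candidate.code === 'string' &&
    typeof candidate.type === 'string'
  );
}
```

With such a guard, the `catch` block could declare `error: unknown` instead of `any` and rethrow whenever `isStandardError(error)` is true.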

Logging Strategy

Winston Logger

typescript

import { logger } from '../../../logger/winston';

// Info logs
logger.info(`✅ Backup created successfully: ${imageId}`);
logger.info(`🔄 Processing backup job: ${resourceId}`);

// Warning logs
logger.warn(`⚠️ Schedule disabled, skipping: ${scheduleId}`);

// Error logs
logger.error(`❌ Backup failed: ${error.message}`);

Log Levels

Info: Normal operations, status updates
Warn: Recoverable issues, fallbacks
Error: Failures, exceptions
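
Whatever the level, a log line is easier to trace when it carries identifiers such as the schedule or server ID. Below is a minimal sketch of a helper that appends key=value context to a message. `withContext` and `LogContext` are hypothetical names, not part of Winston or of the course code.

```typescript
// Hypothetical context type: flat key/value pairs that are safe to print.
type LogContext = Record<string, string | number | boolean | null | undefined>;

// Appends "key=value" pairs to the message; keys with undefined values are skipped.
function withContext(message: string, context: LogContext): string {
  const pairs = Object.keys(context)
    .filter((key) => context[key] !== undefined)
    .map((key) => `${key}=${String(context[key])}`);
  return pairs.length > 0 ? `${message} (${pairs.join(', ')})` : message;
}
```

A call such as `logger.error(withContext('Backup failed', { scheduleId, attempt }))` would then include both values in the message. Winston also accepts a metadata object as a second argument to `logger.info` and similar methods, which may be preferable when logs are emitted as JSON.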

Monitoring

Queue Statistics

typescript

// Monitor queue health
const stats = await backupQueueManager.getQueueStats();

if (stats.waiting > 100) {
  // Alert: Too many waiting jobs
}

if (stats.failed > 50) {
  // Alert: Too many failed jobs
}
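
The threshold checks above can be collected into a single function that returns the alerts to raise, which keeps the limits configurable and the logic testable. This is a sketch under assumptions: `QueueStats`, `QueueThresholds`, and `findQueueAlerts` are hypothetical names, and only the `waiting` and `failed` fields are taken from the example above.

```typescript
// Only the two fields used in the checks above; the real stats object may have more.
interface QueueStats {
  waiting: number;
  failed: number;
}

// Hypothetical threshold configuration; defaults mirror the values 100 and 50 used above.
interface QueueThresholds {
  maxWaiting: number;
  maxFailed: number;
}

// Returns one message per exceeded threshold; an empty array means no alert.
function findQueueAlerts(
  stats: QueueStats,
  thresholds: QueueThresholds = { maxWaiting: 100, maxFailed: 50 },
): string[] {
  const alerts: string[] = [];
  if (stats.waiting > thresholds.maxWaiting) {
    alerts.push(`Too many waiting jobs: ${stats.waiting} > ${thresholds.maxWaiting}`);
  }
  if (stats.failed > thresholds.maxFailed) {
    alerts.push(`Too many failed jobs: ${stats.failed} > ${thresholds.maxFailed}`);
  }
  return alerts;
}
```

The caller decides what to do with the returned messages, for example logging each one at warn level or forwarding it to an alerting channel.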

Health Checks

typescript

// Periodic health check
const health = await backupQueueManager.healthCheck();

if (!health.healthy) {
  // Alert: Queue health issues
  logger.error(`Queue health issues: ${health.issues.join(', ')}`);
}
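
The snippet above runs a single check. To make it periodic, one option is a small timer-based monitor. The sketch below is an assumption-laden illustration: `startHealthMonitor` and `HealthReport` are hypothetical names, and the `check` callback stands in for a call such as `backupQueueManager.healthCheck()`.

```typescript
// Shape inferred from the example above: a flag plus a list of issue descriptions.
interface HealthReport {
  healthy: boolean;
  issues: string[];
}

// Runs `check` every `intervalMs` milliseconds and reports unhealthy results.
// Returns a function that stops the monitor.
function startHealthMonitor(
  check: () => Promise<HealthReport>,
  onUnhealthy: (report: HealthReport) => void,
  intervalMs: number,
): () => void {
  let running = false; // prevents overlapping checks when one run is slow

  const timer = setInterval(async () => {
    if (running) {
      return;
    }
    running = true;
    try {
      const report = await check();
      if (!report.healthy) {
        onUnhealthy(report);
      }
    } catch (err) {
      // A check that throws is itself treated as a health issue.
      onUnhealthy({ healthy: false, issues: [`health check threw: ${String(err)}`] });
    } finally {
      running = false;
    }
  }, intervalMs);

  return () => clearInterval(timer);
}
```

The returned stop function should be called during shutdown so the timer does not keep the process alive.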

Graceful Shutdown

Shutdown Handlers

typescript

export async function stopBackupScheduler() {
  logger.info('🔄 Stopping backup scheduler...');
  
  // Stop cron jobs
  if (schedulerTask) {
    schedulerTask.stop();
    schedulerTask = null;
  }
  
  if (cleanupTask) {
    cleanupTask.stop();
    cleanupTask = null;
  }
  
  // Close queue manager
  await backupQueueManager.close();
  
  logger.info('📴 Backup scheduler stopped');
}

// Graceful shutdown handlers
process.on('SIGTERM', async () => {
  logger.info('🔄 SIGTERM received, stopping backup scheduler...');
  await stopBackupScheduler();
  process.exit(0);
});

process.on('SIGINT', async () => {
  logger.info('🔄 SIGINT received, stopping backup scheduler...');
  await stopBackupScheduler();
  process.exit(0);
});
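
Two details are worth considering with the handlers above: both signals can arrive in quick succession, and a hanging cleanup would block the exit indefinitely. The following sketch shows one way to guard against both. `createShutdownHandler` and its parameters are hypothetical, and the 10 second default timeout is an arbitrary assumption.

```typescript
// Wraps a cleanup function so that it runs at most once and cannot hang forever.
// `exit` is injected so the handler can be tested without terminating the process.
function createShutdownHandler(
  stop: () => Promise<void>,
  exit: (code: number) => void,
  timeoutMs = 10000,
): () => Promise<void> {
  let shuttingDown = false;

  return async () => {
    if (shuttingDown) {
      return; // a second signal arrived while cleanup is already in progress
    }
    shuttingDown = true;

    // Force a non-zero exit if cleanup does not finish in time.
    const timer = setTimeout(() => exit(1), timeoutMs);
    try {
      await stop();
      clearTimeout(timer);
      exit(0);
    } catch {
      clearTimeout(timer);
      exit(1);
    }
  };
}

// Usage (assumes the stopBackupScheduler function shown above):
// const shutdown = createShutdownHandler(stopBackupScheduler, (code) => process.exit(code));
// process.on('SIGTERM', () => void shutdown());
// process.on('SIGINT', () => void shutdown());
```

Exiting with code 1 on failure or timeout lets a process supervisor distinguish a clean stop from a forced one.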

Job Recovery

Recover Interrupted Jobs

typescript

export async function recoverInterruptedJobs(): Promise<void> {
  // Find jobs that were running when app stopped
  const interruptedJobs = await BackupSchedule.find({
    status: 'running',
  });
  
  for (const job of interruptedJobs) {
    // Reset status
    await BackupSchedule.findByIdAndUpdate(job._id, {
      $set: {
        status: 'idle',
        runningJobId: null,
        lastError: null,
      }
    });
    
    // Reschedule if due
    if (shouldReschedule(job)) {
      await backupQueueManager.addBackupJob(job, 5000);
    }
  }
}
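
The `shouldReschedule` helper is not shown in this lesson. A minimal sketch of what it might look like follows, assuming a schedule is due when it is enabled and its `nextRunAt` is in the past. The `RecoverableSchedule` interface is hypothetical; its field names are borrowed from the queries elsewhere in this lesson.

```typescript
// Hypothetical subset of a schedule document needed for the decision.
interface RecoverableSchedule {
  enabled: boolean;
  nextRunAt: Date | null;
}

// A recovered schedule is re-queued only if it is enabled and already due.
// `now` is a parameter so the function stays deterministic in tests.
function shouldReschedule(job: RecoverableSchedule, now: Date = new Date()): boolean {
  if (!job.enabled || job.nextRunAt === null) {
    return false;
  }
  return job.nextRunAt.getTime() <= now.getTime();
}
```

Schedules that are not yet due are left for the regular scheduler tick to pick up, which avoids running a backup earlier than planned.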

Performance Optimization

Batch Operations

typescript

// ✅ DO: Process in batches
const batchSize = 100;
const cursor = BackupSchedule.find({...}).batchSize(batchSize).cursor();

for await (const schedule of cursor) {
  // Process schedule
}

// ❌ DON'T: Load all at once
const schedules = await BackupSchedule.find({...}); // May be too large

Efficient Queries

typescript

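// ✅ DO: Use indexes
// Assuming BackupSchedule is a Mongoose model, createIndex lives on the underlying collection
await BackupSchedule.collection.createIndex({ nextRunAt: 1, status: 1 });
await BackupSchedule.collection.createIndex({ enabled: 1, status: 1 });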

// ✅ DO: Project only needed fields
BackupSchedule.find({...}, { nextRunAt: 1, status: 1 });

Troubleshooting Guide

Common Issues

1. Queue Not Processing

bash

# Check Redis connection
redis-cli ping

# Check queue stats
curl http://localhost:3000/api/v1/backup/queue/stats

2. Schedules Not Running

bash

# Check scheduler logs
tail -f logs/backup-scheduler.log

# Check MongoDB schedules (run this query inside mongosh, not in the shell)
db.backupschedules.find({ enabled: true, status: 'idle' })

3. OpenStack Errors

bash

# Check OpenStack credentials
echo $OPENSTACK_ENDPOINT_HCM
echo $OPENSTACK_USERNAME_HCM

# Test authentication
curl -X POST $OPENSTACK_ENDPOINT_HCM/identity/v3/auth/tokens ...

Summary

Key Points

Error Handling
- Standardized error objects
- Preserve error structure
- Clear error messages
Logging
- Use Winston logger
- Appropriate log levels
- Context in logs
Monitoring
- Queue statistics
- Health checks
- Alerting
Graceful Shutdown
- Handle SIGTERM/SIGINT
- Close connections
- Save state
Job Recovery
- Recover interrupted jobs
- Reschedule if needed
- Reset status

Next Steps

You have completed the course!

Go back to:

Index - Course overview
Case Study - Executive summary

Last Updated: 2025-01-25
Previous: 08. Snapshot Limits
Back to: Index
