
Backup Schedule System

Introduction

This article covers the backup schedule system: the cron jobs that drive it, the nextRunAt calculation, and the management of status states.


BackupSchedule Model

Schema Structure

typescript
interface IBackupSchedule {
  _id?: string;
  resourceType: 'cloud-server' | 'vps';
  resourceId: string;
  serverId?: string; // OpenStack server ID
  serverName?: string;
  customerEmail: string;
  location?: string; // 'HCM' or 'HNI'
  
  // Schedule configuration
  startHour: number; // 0-23
  intervalDays: number; // 1 (daily), 7 (weekly), 30 (monthly)
  retain: number; // Number of backups to keep
  timezone: string; // 'Asia/Ho_Chi_Minh'
  
  // Template reference
  scheduleTemplateId?: string;
  
  // Status tracking
  status: 'idle' | 'pending' | 'running' | 'completed' | 'failed';
  enabled: boolean;
  
  // Timestamps
  lastRunAt?: Date;
  nextRunAt?: Date;
  startedAt?: Date;
  completedAt?: Date;
  failedAt?: Date;
  
  // Job tracking
  runningJobId?: string;
  retryCount: number;
  lastError?: string;
}
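
The comments in the schema imply a few value ranges (startHour 0-23, a positive intervalDays, a positive retain). A minimal sketch of how those could be checked before saving a schedule is shown below. The ScheduleConfig type and the validateScheduleConfig helper are hypothetical names introduced for illustration only; they are not part of the model above.

```typescript
// Hypothetical helper: field names follow IBackupSchedule, rules follow its comments.
type ScheduleConfig = {
  startHour: number;    // 0-23
  intervalDays: number; // e.g. 1, 7, 30
  retain: number;       // number of backups to keep
  timezone: string;     // IANA name such as 'Asia/Ho_Chi_Minh'
};

function isWholeNumber(n: number): boolean {
  return isFinite(n) && Math.floor(n) === n;
}

// Returns a list of problems; an empty list means the config looks valid.
function validateScheduleConfig(config: ScheduleConfig): string[] {
  const errors: string[] = [];
  if (!isWholeNumber(config.startHour) || config.startHour < 0 || config.startHour > 23) {
    errors.push('startHour must be an integer between 0 and 23');
  }
  if (!isWholeNumber(config.intervalDays) || config.intervalDays < 1) {
    errors.push('intervalDays must be a positive integer');
  }
  if (!isWholeNumber(config.retain) || config.retain < 1) {
    errors.push('retain must be a positive integer');
  }
  if (config.timezone.trim() === '') {
    errors.push('timezone must not be empty');
  }
  return errors;
}

const nightly: ScheduleConfig = {
  startHour: 2,
  intervalDays: 1,
  retain: 7,
  timezone: 'Asia/Ho_Chi_Minh',
};
console.log(validateScheduleConfig(nightly)); // []
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid field at once.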

Cron-Based Scheduler

Scheduler Implementation

typescript
import cron from 'node-cron';
import { backupQueueManager } from './BackupQueueManager';
import { scheduleDueJobs, recoverInterruptedJobs } from '../client/BackupScheduleService';
// logger is the project's logging utility; its import is not shown in this article

export async function startBackupScheduler() {
  // Initialize queue manager
  await backupQueueManager.initialize();
  
  // Recover interrupted jobs from previous restart
  await recoverInterruptedJobs();
  
  // Main scheduler - check for due schedules every hour at minute 5
  const schedulerTask = cron.schedule('5 * * * *', async () => {
    const now = new Date();
    logger.info(`🔄 Backup scheduler tick at ${now.toISOString()}`);
    
    try {
      const queuedJobs = await scheduleDueJobs(now);
      logger.info(`✅ Backup scheduler completed - Queued ${queuedJobs} jobs`);
    } catch (e: any) {
      logger.error(`❌ Backup scheduler error: ${e?.message || e}`);
    }
  });
  
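  // Cleanup task - run daily at 2 AM
  const cleanupTask = cron.schedule('0 2 * * *', async () => {
    logger.info('🧹 Running daily cleanup...');
    try {
      await backupQueueManager.cleanOldJobs();
      // Reset error states for old failed jobs
      // ...
    } catch (e: any) {
      // Without this, a failed cleanup would surface as an unhandled rejection
      logger.error(`❌ Daily cleanup error: ${e?.message || e}`);
    }
  });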
  
  schedulerTask.start();
  cleanupTask.start();
  
  logger.info('✅ Backup scheduler started');
}

Cron Schedule Syntax

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *

Examples:

  • '5 * * * *' - Every hour at minute 5
  • '0 2 * * *' - Every day at 2 AM
  • '0 0 * * 0' - Every week on Sunday at midnight
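
To make the five fields concrete, here is a deliberately simplified matcher. It is only an illustration of how the fields line up with a timestamp: it supports just '*' and plain numbers, combines all fields with AND, and works in UTC so the output does not depend on the machine running it. parseCronFields and matchesCronUtc are hypothetical names; real expressions (ranges, lists, steps) are handled by node-cron itself.

```typescript
interface CronFields {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
}

// Split a 5-field cron expression into named fields.
function parseCronFields(expression: string): CronFields {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error('Expected 5 fields, got ' + parts.length);
  }
  return {
    minute: parts[0],
    hour: parts[1],
    dayOfMonth: parts[2],
    month: parts[3],
    dayOfWeek: parts[4],
  };
}

// Simplification: a field is either '*' (anything) or a single number.
function fieldMatches(field: string, value: number): boolean {
  return field === '*' || Number(field) === value;
}

// Uses UTC getters so the result is the same on every machine.
function matchesCronUtc(expression: string, date: Date): boolean {
  const f = parseCronFields(expression);
  return fieldMatches(f.minute, date.getUTCMinutes())
    && fieldMatches(f.hour, date.getUTCHours())
    && fieldMatches(f.dayOfMonth, date.getUTCDate())
    && fieldMatches(f.month, date.getUTCMonth() + 1)
    && fieldMatches(f.dayOfWeek, date.getUTCDay());
}

const tick = new Date('2025-01-26T00:05:00Z'); // a Sunday, 00:05 UTC
console.log(matchesCronUtc('5 * * * *', tick)); // true
console.log(matchesCronUtc('0 2 * * *', tick)); // false
console.log(matchesCronUtc('0 0 * * 0', tick)); // false: minute is 5, not 0
```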

Calculate nextRunAt

Timezone-Aware Calculation

typescript
import moment from 'moment-timezone';

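// Only the scheduling fields are needed, so a partial schedule can be passed as well
export function calcNextRunAt(
  schedule: Pick<IBackupSchedule, 'startHour' | 'intervalDays' | 'timezone'>
): Date {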
  const tz = schedule.timezone || 'Asia/Ho_Chi_Minh';
  const now = moment.tz(tz);
  
  // Calculate today at startHour
  const today = now.clone()
    .hour(schedule.startHour)
    .minute(0)
    .second(0)
    .millisecond(0);
  
  // If current time is after today's startHour, schedule for next interval
  if (now.isAfter(today)) {
    return today.add(schedule.intervalDays, 'days').toDate();
  }
  
  // Otherwise, schedule for today
  return today.toDate();
}

Example

typescript
// Schedule: daily at 2 AM, timezone: Asia/Ho_Chi_Minh
// Current time: 2025-01-25 10:00:00 (after 2 AM)

const nextRunAt = calcNextRunAt({
  startHour: 2,
  intervalDays: 1,
  timezone: 'Asia/Ho_Chi_Minh'
});
// Result: 2025-01-26 02:00:00

// If current time: 2025-01-25 01:00:00 (before 2 AM)
// Result: 2025-01-25 02:00:00
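
Because calcNextRunAt reads the current time internally, the two cases above are hard to reproduce on demand. The sketch below is a dependency-free variant that takes the current time as a parameter, which makes both cases checkable. It is not the implementation above: calcNextRunAtFixedOffset is a hypothetical name, and it assumes a fixed UTC offset (+7 hours, which holds for Asia/Ho_Chi_Minh because that zone has no daylight saving time) instead of a full timezone database.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;
const MS_PER_DAY = 24 * MS_PER_HOUR;

// Assumption: the schedule's timezone is a fixed offset from UTC (no DST).
function calcNextRunAtFixedOffset(
  startHour: number,
  intervalDays: number,
  now: Date,
  utcOffsetHours: number = 7
): Date {
  // Shift the instant so that UTC getters return local wall-clock fields.
  const local = new Date(now.getTime() + utcOffsetHours * MS_PER_HOUR);

  // Today at startHour:00:00.000 in local wall-clock terms.
  const todayAtStart = Date.UTC(
    local.getUTCFullYear(),
    local.getUTCMonth(),
    local.getUTCDate(),
    startHour
  );

  // Past today's start hour: move forward by one interval, otherwise run today.
  const nextLocal = local.getTime() > todayAtStart
    ? todayAtStart + intervalDays * MS_PER_DAY
    : todayAtStart;

  // Shift back to a real UTC instant.
  return new Date(nextLocal - utcOffsetHours * MS_PER_HOUR);
}

// 2025-01-25 10:00 local time (+07:00) is 03:00 UTC
const afterTwoAm = calcNextRunAtFixedOffset(2, 1, new Date('2025-01-25T03:00:00Z'));
console.log(afterTwoAm.toISOString()); // 2025-01-25T19:00:00.000Z (2025-01-26 02:00 local)

// 2025-01-25 01:00 local time (+07:00) is 2025-01-24 18:00 UTC
const beforeTwoAm = calcNextRunAtFixedOffset(2, 1, new Date('2025-01-24T18:00:00Z'));
console.log(beforeTwoAm.toISOString()); // 2025-01-24T19:00:00.000Z (2025-01-25 02:00 local)
```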

Schedule Due Jobs

Check and Queue Due Schedules

typescript
export async function scheduleDueJobs(now: Date): Promise<number> {
  let queuedJobs = 0;
  
  // Find due schedules
  const cursor = BackupSchedule.find({
    enabled: true,
    status: { $in: ['idle', 'completed', 'failed'] },
    $or: [
      { nextRunAt: { $lte: now } },
      { nextRunAt: null }
    ]
  })
    .sort({ nextRunAt: 1, _id: 1 })
    .cursor();
  
  for await (const schedule of cursor) {
    const tz = schedule.timezone || 'Asia/Ho_Chi_Minh';
    const nowTz = moment.tz(now, tz);
    
    // Check if should run this hour
    const shouldRunThisHour = nowTz.hour() === schedule.startHour;
    
    // Check days since last run
    const lastRun = schedule.lastRunAt 
      ? moment.tz(schedule.lastRunAt, tz) 
      : null;
    const daysSinceLastRun = !lastRun 
      ? Infinity 
      : nowTz.clone().startOf('day').diff(lastRun.clone().startOf('day'), 'days');
    const dueByInterval = daysSinceLastRun >= schedule.intervalDays;
    
    // Skip if not due
    if (!(shouldRunThisHour && dueByInterval)) continue;
    
    // Skip if already running or failed too many times
    if (schedule.status === 'running' || 
        (schedule.status === 'failed' && schedule.retryCount >= 3)) {
      continue;
    }
    
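    // Only mark the schedule as pending if the queue can accept the job;
    // otherwise it would stay 'pending' with no job behind it and the
    // query above would never select it again
    if (!backupQueueManager.isReady()) continue;
    
    // Update status to pending
    await BackupSchedule.findByIdAndUpdate(schedule._id, {
      $set: {
        status: 'pending',
        nextRunAt: calcNextRunAt(schedule)
      }
    });
    
    // Add to queue
    await backupQueueManager.addBackupJob(schedule);
    queuedJobs++;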
  }
  
  return queuedJobs;
}
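
The due check inside the loop combines three conditions: the local hour matches startHour, enough calendar days have passed since lastRunAt, and a failed schedule has not reached the retry cap. The sketch below isolates that decision as a pure function so it can be exercised without a database. isDue, DueCheckInput and toLocalDayAndHour are hypothetical names, and the fixed +7 hour offset is an assumption standing in for the moment-timezone conversion used above.

```typescript
interface DueCheckInput {
  startHour: number;
  intervalDays: number;
  status: 'idle' | 'pending' | 'running' | 'completed' | 'failed';
  retryCount: number;
  lastRunAt?: Date;
}

const MAX_RETRIES = 3; // same cap as in scheduleDueJobs

// Local calendar day number and hour for a fixed UTC offset (assumption: no DST).
function toLocalDayAndHour(date: Date, utcOffsetHours: number): { day: number; hour: number } {
  const shiftedMs = date.getTime() + utcOffsetHours * 3600000;
  return {
    day: Math.floor(shiftedMs / 86400000),
    hour: new Date(shiftedMs).getUTCHours(),
  };
}

function isDue(schedule: DueCheckInput, now: Date, utcOffsetHours: number = 7): boolean {
  // Already queued or in progress
  if (schedule.status === 'pending' || schedule.status === 'running') return false;
  // Failed too many times
  if (schedule.status === 'failed' && schedule.retryCount >= MAX_RETRIES) return false;

  const current = toLocalDayAndHour(now, utcOffsetHours);
  if (current.hour !== schedule.startHour) return false;

  // Never ran before: due as soon as the hour matches
  if (!schedule.lastRunAt) return true;

  const last = toLocalDayAndHour(schedule.lastRunAt, utcOffsetHours);
  return current.day - last.day >= schedule.intervalDays;
}

// Weekly schedule at 2 AM; "now" is 2025-01-26 02:05 local time (+07:00)
const weekly: DueCheckInput = {
  startHour: 2,
  intervalDays: 7,
  status: 'completed',
  retryCount: 0,
  lastRunAt: new Date('2025-01-18T19:05:00Z'), // 2025-01-19 02:05 local
};
console.log(isDue(weekly, new Date('2025-01-25T19:05:00Z'))); // true
```

Comparing calendar days rather than elapsed hours mirrors the startOf('day') comparison in the loop: a run that started a few minutes late does not push the next run back by a day.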

Status States

Status Lifecycle

idle → pending → running → completed → idle
                    ↓
                 failed → (retry) → running
                    ↓
              (max retries) → idle

Status Descriptions

Status     | Description          | Next Action
---------- | -------------------- | ---------------------------------
idle       | Waiting for next run | Schedule checks every hour
pending    | Queued in BullMQ     | Worker picks up job
running    | Backup in progress   | Update on completion/failure
completed  | Backup succeeded     | Reset to idle
failed     | Backup failed        | Retry or reset after max attempts
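
One way to keep status updates consistent with this lifecycle is an explicit transition table that is consulted before every change. The sketch below is an assumption about how that could look, not code from the portal: ALLOWED_TRANSITIONS and canTransition are hypothetical names. Besides the arrows in the lifecycle diagram, it also allows the moves that the code in this article performs (scheduleDueJobs re-queues completed and failed schedules directly, and recovery resets running to idle).

```typescript
type BackupStatus = 'idle' | 'pending' | 'running' | 'completed' | 'failed';

// Each status maps to the statuses it may move to next.
const ALLOWED_TRANSITIONS: Record<BackupStatus, BackupStatus[]> = {
  idle: ['pending'],
  pending: ['running'],
  running: ['completed', 'failed', 'idle'], // 'idle' covers recovery after a restart
  completed: ['idle', 'pending'],           // reset, or re-queued by the scheduler
  failed: ['running', 'pending', 'idle'],   // retry, re-queue, or reset after max retries
};

function canTransition(from: BackupStatus, to: BackupStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

console.log(canTransition('idle', 'pending'));      // true
console.log(canTransition('idle', 'running'));      // false: must be queued first
console.log(canTransition('completed', 'running')); // false
```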

Job Recovery

Recover Interrupted Jobs

typescript
export async function recoverInterruptedJobs(): Promise<void> {
  // Find jobs that were running when app stopped
  const interruptedJobs = await BackupSchedule.find({
    status: 'running'
  });
  
  for (const job of interruptedJobs) {
    // Check if job is still active in queue
    if (job.runningJobId && backupQueueManager.isReady()) {
      const queue = backupQueueManager.getQueue();
      const activeJob = await queue?.getJob(job.runningJobId);
      
      if (activeJob && await activeJob.isActive()) {
        // Job still running, skip
        continue;
      }
    }
    
    // Reset status
    await BackupSchedule.findByIdAndUpdate(job._id, {
      $set: {
        status: 'idle',
        runningJobId: null,
        lastError: null
      }
    });
    
    // Reschedule if due (shouldReschedule is a helper that is not shown in this article)
    const now = new Date();
    if (backupQueueManager.isReady() && shouldReschedule(job, now)) {
      await backupQueueManager.addBackupJob(job, 5000); // 5s delay
    }
  }
}

Best Practices

1. Timezone Handling

typescript
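// ✅ DO: Always use timezone-aware moment
const nowTz = moment.tz(now, schedule.timezone || 'Asia/Ho_Chi_Minh');
const hour = nowTz.hour(); // Hour in the schedule's timezone

// ❌ DON'T: Read local time fields from a plain Date
const serverHour = new Date().getHours(); // Hour in the server's timezone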

2. Status Tracking

typescript
// ✅ DO: Update status at each stage
await BackupSchedule.findByIdAndUpdate(id, {
  $set: { status: 'pending' } // When queued
});
// ... later
await BackupSchedule.findByIdAndUpdate(id, {
  $set: { status: 'running' } // When started
});
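
A related concern is making sure only one caller moves a schedule to 'pending', for example when two scheduler ticks or two app instances see the same due schedule. The sketch below imitates a conditional update with an in-memory store: the status changes only if the record is still in a claimable state. claimForQueue and the store shape are hypothetical; against MongoDB the same idea would be a single update whose filter includes the expected status, which this example does not attempt to show.

```typescript
type Status = 'idle' | 'pending' | 'running' | 'completed' | 'failed';

interface ScheduleRecord {
  id: string;
  status: Status;
}

// Statuses from which a schedule may be queued (same set as the scheduler query).
const CLAIMABLE: Status[] = ['idle', 'completed', 'failed'];

// In-memory stand-in for a conditional update: set status to 'pending'
// only if the record exists and is still claimable. Returns whether it succeeded.
function claimForQueue(store: Record<string, ScheduleRecord>, id: string): boolean {
  const record = store[id];
  if (!record || CLAIMABLE.indexOf(record.status) === -1) return false;
  record.status = 'pending';
  return true;
}

const store: Record<string, ScheduleRecord> = {
  a: { id: 'a', status: 'idle' },
};
console.log(claimForQueue(store, 'a')); // true: first caller wins
console.log(claimForQueue(store, 'a')); // false: already pending
```

A caller would add the job to the queue only when the claim returns true, so a schedule is never queued twice for the same run.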

3. Calculate nextRunAt Early

typescript
// ✅ DO: Calculate nextRunAt when queuing
await BackupSchedule.findByIdAndUpdate(id, {
  $set: {
    status: 'pending',
    nextRunAt: calcNextRunAt(schedule) // Calculate now
  }
});

// ❌ DON'T: Calculate after completion (may miss schedule window)

Summary

Key Points

  1. Cron-Based Scheduler

    • Checks for due schedules every hour at minute 5
    • Daily cleanup at 2 AM

  2. nextRunAt Calculation

    • Timezone-aware
    • Based on startHour and intervalDays

  3. Status Management

    • Track status at each stage
    • Recover interrupted jobs

  4. Due Check Logic

    • Check hour match
    • Check days since last run
    • Skip running jobs and failed jobs that reached the retry limit

Next Steps

In the next article, we will look at queue management.


Last Updated: 2025-01-25
Previous: 03. Snapshot & Backup
Next: 05. Queue Management

Internal documentation for iNET Portal