11.8. Resource Outages

For various reasons, resources can go out of order: machines may fail or may have to get scheduled maintenance, nurses or doctors may get sick or may be out-of-duty or off-shift. Consequently, both planned and unexpected resource outages may have to be modeled.

While human resources being out-of-duty or off-shift can be modeled with the help of service time calendars, scheduled maintenance may be modeled in the form of maintenance activities recurring with a fixed interval. Unexpected resource outages are resource failures.

Modeling Resource Failures

Resource failures can be modeled with the help of two types of events: failure and recovery, and two related random variables: time-to-next-failure and failure time. In this approach, the next resource failure event occurs x time units after a recovery event, where x is obtained by invoking the random variable sampling function timeToNextFailure(). Each failure event triggers a recovery event, which is scheduled with a delay of y time units where y is obtained by invoking the random variable sampling function failureTime().

This simple approach applies both to non-human and human resources.

Further modeling options:

Allow specifying a time-to-first-failure, in addition to the time-to-next-failure.
Allow counting only the busy time of a resource, instead of the total time, for determining its next failure time.

Modeling the Repair of Failed Resources

In activity-based simulation, the basic failure modeling approach can be refined by modeling the repair of a failed non-human resource as an activity whose duration is provided by the random variable repair time and which requires a repair person as a resource, such that the total failure time is the sum of the two random variables repair lead time (the time needed for getting a repair person to start the repair) and repair time.

Then, when a failure event occurs, it triggers a repair activity to start with a delay of x time units and a duration of y time units, where x = repairLeadtime() and y = repairTime(). The next failure event is scheduled to occur z time units after a repair end event, where z is obtained by invoking the random variable sampling function timeToNextFailure().

Modeling the Scheduled Maintenance of Resources

Non-human resources, such as machines, may undergo periodic maintenance for preventing them to fail in the near future. This can be modeled by scheduling periodic maintenance activities every x time units where x = maintenanceIntervall(), along with periodic failure events.

When a maintenance activity starts before the next scheduled failure event has occurred, the simulator has to retract this event from the Future Event List for implementing its prevention by the maintenance activity. The next failure event is scheduled to occur x time units after the maintenance end event, where x is obtained by invoking the random variable sampling function timeToNextFailure().

When a resource fails before its next scheduled maintenance has started, the scheduled maintenance is cancelled (the simulator has to retract the scheduled maintenance start event) and, instead, the failure event triggers a repair activity. Only when the repair activity ends, the next maintenance activity is scheduled with a delay of y time units where y is obtained by invoking the random variable sampling function maintenanceTime().

Further modeling options:

Allow specifying a time-to-first-maintenance, in addition to the maintenance interval time-to-next-maintenance.
Allow specifying the recurrence of maintenance activities with a schedule instead of a (typically fixed) maintenance interval.

Defining General Elements for Modeling Resource Outages

A simulation framework/language should support resource outage modeling by allowing to define for each (non-human) resource object type the three pairs of (typically random variable) time functions introduced above and summarized in the following class diagram.

These pairs of time functions have to be used incrementally: specifying maintenance time functions requires specifying repair time functions, which, in turn, requires specifying failure time functions.

Resource Outage Modeling in AnyLogic

AnyLogic is a state-of-the-art DES modeling tool/framework. It allows defining recurrent failures (not with follow-up recovery events, but only with follow-up repair activities) and maintenance for each Resource Pool. There is the option to use only the busy time of a resource for computing its next failure time. For repair activities, there is no possibility to distinguish between repair lead time and repair time. For maintenance activities, it can be defined which task priority they have and if they are preemptive or not. Neither for repair nor for maintenance activities, performer resources can be assigned.