Factory AI Logo
Back

The Bathtub Curve: How to Master the Three Phases of Asset Failure

Feb 23, 2026

bathtub curve
Hero image for The Bathtub Curve: How to Master the Three Phases of Asset Failure

When a maintenance manager or reliability engineer looks at a "bathtub curve," they aren't looking at a simple chart; they are looking at the lifecycle of their facility’s profitability. At its core, the bathtub curve is a graphical representation of the failure rate of an asset over time. But the real question most professionals are asking is: "How do I use this model to predict when my equipment will fail so I can stop reacting and start optimizing?"

The direct answer is that the bathtub curve allows you to categorize equipment failure into three distinct chronological phases: Infant Mortality (early failure), Normal Life (constant failure rate), and Wear-Out (increasing failure rate). By identifying which phase an asset is currently in, you can move away from "one-size-fits-all" maintenance and instead apply specific strategies—like rigorous commissioning for new assets or predictive monitoring for aging ones—that align with the actual physics of failure.

In 2026, understanding this curve is no longer a theoretical exercise. With the integration of AI-driven sensors and advanced asset management systems, we can now see these curves manifest in real-time data, allowing us to intervene before the "wear-out" phase results in a catastrophic breakdown.

How do I manage the "Infant Mortality" phase without blowing my budget?

The "Infant Mortality" phase, or the early failure period, occurs immediately after an asset is installed or commissioned. During this time, the failure rate is high but rapidly decreasing. This is often counterintuitive to management; why would a brand-new pump or motor fail?

In a modern industrial setting, early failures are typically caused by manufacturing defects, improper installation, poor design, or "out-of-the-box" component flaws. According to research from ReliabilityWeb, up to 40% of new equipment failures can be traced back to improper installation or commissioning.

The Strategy: Precision Commissioning and Burn-in

To manage this phase without overspending, you must shift your focus from "repair" to "verification."

  1. Rigorous Acceptance Testing: Don't just turn the machine on. Use vibration analysis and thermal imaging during the first 50 hours of operation to identify misalignments or electrical imbalances.
  2. The "Burn-in" Period: For critical electronics or sensors, a controlled burn-in period allows components to fail in a safe environment before they are integrated into the main production line.
  3. Standardized Checklists: Use work order software to enforce strict installation protocols. If a technician skips a torque specification or a lubrication step, the asset is effectively pushed back into a high-risk infant mortality zone.

Case Study: The $50,000 Alignment Lesson

Consider a mid-sized chemical processing plant that recently installed ten new 50-HP centrifugal pumps. Within the first 72 hours, three of the pumps suffered bearing seizures. Initially, the team blamed the manufacturer. However, a post-mortem analysis revealed that the pumps were installed with a "soft foot" condition—a common installation error where the machine's feet do not sit flat on the baseplate.

By implementing a mandatory laser alignment verification step in their asset management workflow, the plant eliminated these early-life failures for the remaining seven pumps. The cost of the laser alignment tool was $12,000, but it saved an estimated $50,000 in emergency repair costs and lost production time during that single commissioning cycle. This illustrates that "Infant Mortality" is often a human-induced phase that can be mitigated with precision.

By investing 5-10% more time in the commissioning phase, you can reduce early-life repair costs by as much as 30%. The goal is to "flush out" the defects that are inherent to the manufacturing and installation process so the asset can reach its stable "Normal Life" phase as quickly as possible.

What does the "Constant Failure Rate" phase actually mean for my PM schedule?

Once an asset survives the infant mortality period, it enters the "Normal Life" or "Useful Life" phase. This is the bottom of the bathtub, where the failure rate is relatively low and constant. In this phase, failures are considered "random." They aren't caused by the machine wearing out, nor by installation errors, but by external shocks: power surges, operator errors, or extreme environmental conditions.

The biggest mistake maintenance teams make here is over-scheduling Preventive Maintenance (PM). If the failure rate is truly random, performing a PM every 500 hours won't necessarily prevent a failure that is caused by a random voltage spike. In fact, intrusive PMs during this phase can actually trigger a "mini-bathtub curve" by introducing human error into a stable system.

Transitioning to Condition-Based Monitoring (CBM)

In 2026, the "bottom of the bathtub" is managed through CMMS software that prioritizes Condition-Based Maintenance (CBM) over calendar-based tasks.

  • Monitor the "Stress-Strength" Relationship: Instead of changing oil every six months, use oil analysis to change it only when the lubricating properties degrade.
  • Focus on MTBF: Track your Mean Time Between Failures (MTBF) during this phase. If the MTBF starts to shrink, it’s a signal that your asset is prematurely entering the next phase of the curve.
  • Operator Training: Since failures in this phase are often "random" (caused by external factors), the highest ROI comes from training operators to recognize early signs of distress, such as unusual noises or smells, which are often precursors to these random events.

The Bathtub Curve Decision Framework

To help your team decide which maintenance strategy to apply based on the curve phase, use the following decision framework:

PhasePrimary Failure CauseMaintenance StrategyKey Metric to Watch
Infant MortalityInstallation/Manufacturing DefectsPrecision Commissioning & VerificationFailure Rate (FR)
Normal LifeRandom Events/External ShocksCondition-Based Monitoring (CBM)Mean Time Between Failures (MTBF)
Wear-OutPhysical Degradation/FatiguePredictive Maintenance & ReplacementP-F Interval & Repair Costs

How do I know exactly when the "Wear-Out" phase starts?

The "Wear-Out" phase is the final section of the curve, where the failure rate begins to climb sharply. This is caused by the physical degradation of the asset—friction, fatigue, oxidation, and general "old age." The challenge for any facility manager is identifying the "knee" of the curve—the exact point where the failure rate starts to accelerate.

If you wait too long, you face catastrophic failure and unplanned downtime. If you act too early, you are throwing away perfectly good asset life and wasting capital.

Quantitative Benchmarks for the Wear-Out Transition

Identifying the "knee" requires moving beyond gut feeling and into specific, measurable thresholds. For most industrial rotating equipment, the transition into the wear-out phase can be quantified using these benchmarks:

  1. Vibration Velocity (ISO 10816-3): For a standard motor (Group 1, rigid support), a jump from "Zone B" (Satisfactory) to "Zone C" (Unsatisfactory)—typically exceeding 4.5 mm/s RMS—is a clear indicator that the wear-out phase has begun.
  2. Ultrasound Decibel Increase: A 12-15 dB increase over the established baseline in high-frequency ultrasound (30 kHz) often signals the very beginning of bearing fatigue, long before heat or audible noise occurs.
  3. Oil Debris Analysis: In gearboxes, an increase in the concentration of large ferrous particles (>10 microns) or a sudden spike in ISO Cleanliness Codes (e.g., moving from 18/16/13 to 21/19/16) indicates that the internal surfaces are actively shedding material.

Using Predictive Analytics to Find the "Knee"

This is where AI predictive maintenance becomes essential. By analyzing historical data and real-time sensor inputs, AI can identify subtle shifts in performance that indicate the transition from "Normal Life" to "Wear-Out."

  • Vibration Thresholds: For rotating equipment, a steady increase in the 1x or 2x vibration peaks often signals the beginning of the wear-out phase. For example, in predictive maintenance for pumps, an increase in vibration of just 0.1 inches per second (ips) over a 30-day period can signal that the bearings have entered their final 10% of life.
  • Thermal Signatures: In electrical systems, a gradual rise in operating temperature (even if still within "safe" limits) often precedes the steep climb of the wear-out curve.
  • Economic Life vs. Physical Life: Use your CMMS to track the "Total Cost of Ownership." When the cost of maintaining an aging asset exceeds the amortized cost of a new one, you have reached the economic end of the wear-out phase, regardless of whether the machine is still running.

Is the bathtub curve still relevant in 2026 with AI and IoT?

Some critics argue that the bathtub curve is an oversimplification. They point to the famous Nowlan and Heap study for United Airlines, which found that only 6% of components actually follow the classic bathtub shape. The other 94% follow different patterns, such as "constant probability of failure" or "slowly increasing failure with no distinct wear-out zone."

However, the bathtub curve remains the foundational mental model for Reliability Centered Maintenance (RCM). In 2026, we don't discard the curve; we digitize it.

The Digital Twin Evolution

Modern prescriptive maintenance systems create a "Digital Twin" of the asset. Instead of assuming a theoretical bathtub curve, the system builds a unique curve for that specific machine based on its load, environment, and maintenance history.

  • Real-time Hazard Functions: We no longer look at a static PDF of a curve. We look at a dynamic "Hazard Function" $\lambda(t)$ that updates every hour.
  • Contextual Data: If a conveyor belt is running at 110% capacity in a high-humidity environment, the AI automatically "compresses" the bathtub curve, shortening the Useful Life phase and moving the Wear-Out phase forward in the timeline.

Edge Case: The "No-Wear-Out" Paradox

It is important to note that some assets—specifically modern solid-state electronics and certain high-grade polymers—do not exhibit a traditional wear-out phase. For these assets, the bathtub curve looks more like an "L-shape." They have a high infant mortality rate, but once they reach the "Normal Life" phase, they maintain a constant failure rate until they are rendered obsolete by technology rather than physical failure. Understanding this prevents maintenance teams from performing "preventative" replacements on electronics that are still perfectly functional.

What are the common mistakes when applying this model to real-world assets?

Even with the best software, human error can lead to a misunderstanding of where an asset sits on the curve. Here are the most frequent pitfalls:

  1. Treating All Assets the Same: A lightbulb has a very distinct wear-out phase (it burns out). A complex centrifugal compressor has hundreds of components, each with its own curve. Applying a single "bathtub" strategy to a complex system leads to failure. You must decompose the system into its critical components.
  2. Ignoring the "P-F Interval": The bathtub curve tells you when failures are likely, but the P-F Interval (Point of potential failure to Point of functional failure) tells you how much time you have to react. Misjudging this interval means you might see the wear-out phase coming but still fail to stop the breakdown.
  3. The "Maintenance-Induced Failure" Trap: As mentioned earlier, performing too much maintenance during the "Normal Life" phase can actually create infant mortality conditions. Every time you open a machine, you risk introducing contaminants or misaligning parts.
  4. Data Silos: If your vibration data is in one system and your work order history is in another, you will never see the true shape of your bathtub curve. Integration is the key to visibility.

How do I implement a strategy based on this curve using a CMMS?

To turn the theory of the bathtub curve into a functional maintenance program, you need a structured approach within your CMMS software.

Step 1: Asset Criticality Ranking

Not every asset deserves a deep dive into its failure curve. Use a 5x5 criticality matrix to identify the 20% of assets that cause 80% of your downtime. Focus your curve-mapping efforts here.

Step 2: Establish Baselines

During the "Normal Life" phase, capture baseline data. What is the "normal" temperature, vibration, and power draw? Without a baseline, you cannot identify the "knee" of the wear-out curve.

Step 3: Automate the "Infant Mortality" Guardrails

Set up your CMMS to automatically trigger "Follow-up Inspections" 48 hours, one week, and one month after any major repair or new installation. This forces technicians to check for the early signs of infant mortality failures.

Step 4: Use Weibull Distribution Analysis

For high-level reliability engineering, use the Weibull Distribution—a mathematical formula that describes the bathtub curve. A Weibull shape parameter ($\beta$) of less than 1 indicates infant mortality, $\beta = 1$ indicates random failure, and $\beta > 1$ indicates wear-out. Modern CMMS platforms can calculate these values automatically based on your historical work order data.

Implementation Roadmap: The First 90 Days

If you are starting from scratch, follow this timeline to integrate bathtub curve logic into your operations:

  • Days 1-30 (The Audit): Identify your top 10 most critical assets. Review their past 24 months of work order history. Categorize every failure as "Early Life," "Random," or "Wear-Out."
  • Days 31-60 (The Baseline): Install IoT sensors on these 10 assets. Collect 30 days of "Normal Life" data to establish a statistical baseline for vibration, temperature, and amperage.
  • Days 61-90 (The Integration): Configure your work order software to trigger alerts when sensor data deviates by more than 2 standard deviations from the baseline. This marks your transition from reactive to curve-based predictive maintenance.

What is the ROI of mastering the bathtub curve?

The financial implications of ignoring the bathtub curve are staggering. According to the National Institute of Standards and Technology (NIST), inadequate maintenance practices cost U.S. manufacturers an estimated $222 billion annually.

By mastering the curve, you achieve ROI in three specific areas:

  1. Reduced Capital Expenditure (CapEx): By accurately identifying the wear-out phase, you can extend the life of an asset by 15-20% through targeted interventions, delaying the need for expensive replacements.
  2. Lower Operational Expenditure (OpEx): Eliminating unnecessary PMs during the "Useful Life" phase reduces labor costs and spare parts consumption.
  3. Increased OEE (Overall Equipment Effectiveness): Minimizing both infant mortality and wear-out failures directly increases machine availability and throughput.

In a 24/7 manufacturing environment, moving from a reactive "run-to-fail" model to a curve-based predictive model typically results in a 10x return on investment within the first 18 months of implementation.

Summary: The Future of Failure Modeling

The bathtub curve is more than just a line on a graph; it is a roadmap for asset reliability. In 2026, the most successful maintenance teams are those that recognize the unique challenges of each phase. They use rigorous commissioning to defeat infant mortality, condition-based monitoring to navigate the useful life phase, and predictive analytics to anticipate the wear-out phase.

By integrating these strategies into a robust asset management framework, you don't just react to failures—you master the physics of time and wear that govern your facility.

Tim Cheung

Tim Cheung

Tim Cheung is the CTO and Co-Founder of Factory AI, a startup dedicated to helping manufacturers leverage the power of predictive maintenance. With a passion for customer success and a deep understanding of the industrial sector, Tim is focused on delivering transparent and high-integrity solutions that drive real business outcomes. He is a strong advocate for continuous improvement and believes in the power of data-driven decision-making to optimize operations and prevent costly downtime.