Part 2: Simulate Failures

In this part, you'll simulate failures to see how Temporal handles them. This demonstrates why Temporal is particularly useful for building reliable systems.

The key concept here is durable execution: your workflow's progress is saved after every step. When failures and crashes happen (network issues, bugs in your code, server restarts), Temporal resumes your workflow exactly where it stopped. No lost work, no restarting from the beginning.

What you'll accomplish:

Crash a server mid-transaction and see zero data loss
Inject bugs into code and fix them live

Difficulty: Intermediate

Ready to break some stuff? Let's go.

Experiment 1 of 2: Crash Recovery Test

Temporal is designed with failure in mind. You're about to simulate a server crash mid-transaction and watch the Workflow recover without losing its progress.

The Challenge: Kill your Worker process while money is being transferred. In traditional systems, this would corrupt the transaction or lose data entirely.

What We're Testing

Worker → CRASH → Recovery → Success

Before You Start

Worker is currently stopped

You have terminals ready (Terminal 2 for Worker, Terminal 3 for Workflow)

Web UI is open at http://localhost:8233

What's happening behind the scenes?

The Temporal Server acts like a persistent state machine for your Workflow. When you kill the Worker, you're only killing the process that executes the code - but the Workflow state lives safely in Temporal's durable storage. When a new Worker starts, it picks up exactly where the previous one left off.
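
The idea can be sketched with a toy model in plain Python. This is not the Temporal SDK: the `history` list merely stands in for Temporal's durable storage, and `run_transfer` and `WorkerCrash` are hypothetical names invented for this illustration.

```python
class WorkerCrash(Exception):
    """Stands in for the Worker process being killed."""


def run_transfer(history, crash_before=None):
    """Run the transfer steps, skipping any step already recorded in history.

    `history` plays the role of durable storage: it outlives any single
    call to this function, just as Workflow state outlives a Worker.
    """
    for step in ["withdraw", "deposit"]:
        if step in history:
            continue  # completed before the crash, so it is not run again
        if step == crash_before:
            raise WorkerCrash(f"worker killed before {step}")
        history.append(step)  # record the completed step
    return "Transfer complete"


history = []  # lives outside the "worker", so it survives the crash

try:
    run_transfer(history, crash_before="deposit")
except WorkerCrash:
    pass  # the first worker is gone; history still holds the withdrawal

result = run_transfer(history)  # a new worker resumes from the history
print(history, result)
```

The second call never repeats the withdrawal because the recorded history, not the process's memory, decides what still needs to run.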

This is fundamentally different from traditional applications where process crashes mean lost work.

Instructions

Step 1: Start Your Worker

First, stop any running Worker (Ctrl+C) and start a fresh one in Terminal 2.

Worker Status: RUNNING

Workflow Status: WAITING

Terminal 2 - Worker

Python: python run_worker.py

Go: go run worker/main.go

Java: mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker"

TypeScript: npm run worker

.NET: dotnet run --project MoneyTransferWorker

Ruby: bundle exec ruby worker.rb

Step 2: Start the Workflow

Now in Terminal 3, start the Workflow. Check the Web UI - you'll see your Worker busy executing the Workflow and its Activities.

Worker Status: EXECUTING

Workflow Status: RUNNING

Terminal 3 - Workflow

Python: python run_workflow.py

Go: go run start/main.go

Java: mvn compile exec:java -Dexec.mainClass="moneytransferapp.TransferApp"

TypeScript: npm run client

.NET: dotnet run --project MoneyTransferClient

Ruby: bundle exec ruby starter.rb

Step 3: Simulate the Crash

The moment of truth! Kill your Worker while it's processing the transaction.

Jump back to the Web UI and refresh. Your Workflow is still showing as "Running"!

That's the magic! The Workflow keeps running because Temporal saved its state, even though we killed the Worker.

Worker Status: CRASHED

Workflow Status: RUNNING

The Crash Test

Go back to Terminal 2 and kill the Worker with Ctrl+C

Step 4: Bring Your Worker Back

Restart your Worker in Terminal 2. Watch Terminal 3 - you'll see the Workflow finish up and show the result!

Worker Status: RECOVERED

Workflow Status: COMPLETED

Transaction: SUCCESS

Terminal 2 - Recovery

Python: python run_worker.py

Go: go run worker/main.go

Java: mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker"

TypeScript: npm run worker

.NET: dotnet run --project MoneyTransferWorker

Ruby: bundle exec ruby worker.rb

Mission Accomplished! You just simulated killing the Worker process and restarting it. The Workflow resumed where it left off without losing any application state.

Try This Challenge

Try killing the Worker at different points during execution. Start the Workflow, kill the Worker during the withdrawal, then restart it. Kill it during the deposit. Each time, notice how Temporal maintains perfect state consistency.

Check the Web UI while the Worker is down - you'll see the Workflow is still "Running" even though no code is executing.

Experiment 2 of 2: Live Bug Fixing

The Challenge: Inject a bug into your production code, watch Temporal retry automatically, then fix the bug while the Workflow is still running.

Live Debugging Flow

Bug → Retry → Fix → Success

Before You Start

Worker is stopped

Code editor open with activities.py

Ready to uncomment the failure line

Web UI open to watch the retries

What makes live debugging possible?

Traditional applications lose all context when they crash or fail. Temporal maintains the complete execution history and state of your Workflow in durable storage. This means you can:

Fix bugs in running code without losing progress
Deploy new versions while Workflows continue executing
Retry failed operations with updated logic
Maintain perfect audit trails of what happened and when

This is like having version control for your running application state.

Instructions

Python

This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the deposit() Activity method.

Let your Workflow continue to run but don't start the Worker yet.

Open the activities.py file and swap which call is commented out, so that the deposit() method calls self.bank.deposit_that_fails and raises an error. The file currently looks like this:

activities.py

@activity.defn
async def deposit(self, data: PaymentDetails) -> str:
    reference_id = f"{data.reference_id}-deposit"
    try:
        confirmation = await asyncio.to_thread(
            self.bank.deposit, data.target_account, data.amount, reference_id
        )
        """
        confirmation = await asyncio.to_thread(
            self.bank.deposit_that_fails,
            data.target_account,
            data.amount,
            reference_id,
        )
        """
        return confirmation
    except InvalidAccountError:
        raise
    except Exception:
        activity.logger.exception("Deposit failed")
        raise

Save your changes and switch to the terminal that was running your Worker.

Start the Worker again:

python run_worker.py

Note that you must restart the Worker every time the code changes. You will see the Worker complete the withdraw() Activity method, but it errors when it attempts the deposit() Activity method.

The important thing to note here is that the Worker keeps retrying the deposit() method:

2024/02/12 10:59:09 INFO  No logger configured for temporal client. Created default one.
2024/02/12 10:59:09 INFO  Started Worker Namespace default TaskQueue money-transfer WorkerID 77310@temporal.local@
2024/02/12 10:59:09 Withdrawing $250 from account 85-150.
2024/02/12 10:59:09 Depositing $250 into account 43-812.
2024/02/12 10:59:09 ERROR Activity error. This deposit has failed.
2024/02/12 10:59:10 Depositing $250 into account 43-812.
2024/02/12 10:59:10 ERROR Activity error. This deposit has failed.
2024/02/12 10:59:12 Depositing $250 into account 43-812.

The Workflow keeps retrying using the RetryPolicy specified when the Workflow first executes the Activity.
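
The timestamps in the log above (10:59:09, 10:59:10, 10:59:12) are consistent with an exponential backoff that starts at one second and doubles after each failure. Here is a minimal sketch of such a schedule, assuming an initial interval of 1 second, a backoff coefficient of 2.0, and a 100-second cap. Those values are assumptions for illustration, since the tutorial project may configure its RetryPolicy differently, and `retry_delays` is a hypothetical helper rather than an SDK function.

```python
def retry_delays(attempts, initial=1.0, coefficient=2.0, maximum=100.0):
    """Return the wait in seconds before each of the first `attempts` retries.

    Each wait is the previous one multiplied by `coefficient`,
    capped at `maximum`.
    """
    return [min(initial * coefficient ** n, maximum) for n in range(attempts)]


delays = retry_delays(4)
print(delays)
```

Capping the interval keeps a long-failing Activity from drifting into waits of hours between attempts, which matters when you plan to fix the bug and want the next attempt to arrive soon.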

You can view more information about the process in the Temporal Web UI. Click the Workflow. You'll see more details including the state, the number of attempts run, and the next scheduled run time.

Your Workflow is running, but only the withdraw() Activity method has succeeded. In any other application, you would likely have to abandon the entire process and perform a rollback.

With Temporal, you can debug and resolve the issue while the Workflow is running.

Pretend that you found a fix for the issue. Swap the comments back in the deposit() method in the activities.py file so that it calls self.bank.deposit again, and save your changes.

How can you possibly update a Workflow that's already halfway complete? You restart the Worker.

To restart the Worker, cancel the currently running worker with Ctrl+C, then restart the Worker by running:

python run_worker.py

The Worker starts again. On the next scheduled attempt, the Worker picks up right where the Workflow was failing and successfully executes the updated deposit() Activity method.

Switch back to the terminal where your run_workflow.py program is running, and you'll see it complete:

Transfer complete.
Withdraw: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}
Deposit: {'amount': 250, 'receiver': '43-812', 'reference_id': '1f35f7c6-4376-4fb8-881a-569dfd64d472', 'sender': '85-150'}

Visit the Web UI again, and you'll see the Workflow has completed. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction!
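
What happened in this experiment can be sketched as a toy simulation in plain Python. It does not use the Temporal SDK. The `state` dictionary stands in for the durable Workflow state, each `*_worker` dictionary stands in for the code a Worker process has loaded, and all function names are hypothetical.

```python
def deposit_that_fails(amount):
    """The buggy Activity: always raises."""
    raise RuntimeError("This deposit has failed")


def deposit(amount):
    """The fixed Activity."""
    return f"deposited {amount}"


def attempt_pending(state, activities):
    """Try the pending Activity once with whatever code the worker loaded."""
    state["attempts"] += 1
    name = state["pending"]
    try:
        state["result"] = activities[name](state["amount"])
    except RuntimeError as err:
        state["last_error"] = str(err)  # stay pending; a later attempt follows
        return
    state["completed"].append(name)
    state["pending"] = None


# Durable state: the withdrawal already succeeded, the deposit is pending.
state = {"completed": ["withdraw"], "pending": "deposit", "amount": 250,
         "attempts": 0, "result": None, "last_error": None}

buggy_worker = {"deposit": deposit_that_fails}
attempt_pending(state, buggy_worker)  # fails
attempt_pending(state, buggy_worker)  # fails again

fixed_worker = {"deposit": deposit}   # Worker restarted with the fix
attempt_pending(state, fixed_worker)  # succeeds on the next attempt
print(state["completed"], state["attempts"], state["result"])
```

The withdrawal is never repeated: only the pending step is attempted, and the attempt counter carries across the restart because it lives in the state, not in the Worker.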

Go

This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the Deposit() Activity function. Let your Workflow continue to run but don't start the Worker yet.

Open the activity.go file and switch out the comments on the return statements so that the Deposit() function returns an error:

activity.go

func Deposit(ctx context.Context, data PaymentDetails) (string, error) {
    log.Printf("Depositing $%d into account %s.\n\n",
        data.Amount,
        data.TargetAccount,
    )

    referenceID := fmt.Sprintf("%s-deposit", data.ReferenceID)
    bank := BankingService{"bank-api.example.com"}
    // Uncomment the next line and comment the one after that to simulate an unknown failure
    confirmation, err := bank.DepositThatFails(data.TargetAccount, data.Amount, referenceID)
    // confirmation, err := bank.Deposit(data.TargetAccount, data.Amount, referenceID)
    return confirmation, err
}

Ensure you're calling bank.DepositThatFails.

Save your changes and switch to the terminal that was running your Worker. Start the Worker again:

go run worker/main.go

You will see the Worker complete the Withdraw() Activity function, but it errors when it attempts the Deposit() Activity function. The important thing to note here is that the Worker keeps retrying the Deposit() function:

2022/11/14 10:59:09 INFO  No logger configured for temporal client. Created default one.
2022/11/14 10:59:09 INFO  Started Worker Namespace default TaskQueue TRANSFER_MONEY_TASK_QUEUE WorkerID 77310@temporal.local@
2022/11/14 10:59:09 Withdrawing $250 from account 85-150.
2022/11/14 10:59:09 Depositing $250 into account 43-812.
2022/11/14 10:59:09 ERROR Activity error. This deposit has failed.
2022/11/14 10:59:10 Depositing $250 into account 43-812.
2022/11/14 10:59:10 ERROR Activity error. This deposit has failed.
2022/11/14 10:59:12 Depositing $250 into account 43-812.

The Workflow keeps retrying using the RetryPolicy specified when the Workflow first executes the Activity.

Your Workflow is running, but only the Withdraw() Activity function has succeeded. In any other application, the whole process would likely have to be abandoned and rolled back.

With Temporal, you can debug and fix the issue while the Workflow is running.

Pretend that you found a fix for the issue. Switch the comments back on the return statements of the Deposit() function in the activity.go file and save your changes.

How can you possibly update a Workflow that's already halfway complete? You restart the Worker.

First, cancel the currently running worker with Ctrl+C, then restart the worker:

go run worker/main.go

The Worker starts again. On the next scheduled attempt, the Worker picks up right where the Workflow was failing and successfully executes the newly compiled Deposit() Activity function.

Switch back to the terminal where your start/main.go program is running, and you'll see it complete:

Transfer complete (transaction IDs: W1779185060, D1779185060)

Visit the Web UI again, and you'll see the Workflow has completed. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction.

Java

This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the deposit Activity method.

Try it out by following these steps:

Make sure your Worker is stopped before proceeding, so your Workflow doesn't finish. Switch to the terminal that's running your Worker and stop it by pressing Ctrl+C.

Open the AccountActivityImpl file and modify the deposit method so activityShouldSucceed is set to false.

Save your changes and switch to the terminal that was running your Worker.

Verify the Workflow is running in the Web UI. If finished, restart it using the Maven command.

Start the Worker again:

mvn clean install -Dorg.slf4j.simpleLogger.defaultLogLevel=info 2>/dev/null
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker" -Dorg.slf4j.simpleLogger.defaultLogLevel=warn

Note that you must restart the Worker every time the code changes. You will see the Worker complete the withdraw Activity method, but it errors when it attempts the deposit Activity method.

The important thing to note here is that the Worker keeps retrying the deposit method:

Withdrawing $32 from account 612849675.
[ReferenceId: d3d9bcf0-a897-4326]
Deposit failed
Deposit failed
Deposit failed
Deposit failed

The Workflow keeps retrying using the RetryPolicy specified when the Workflow first executes the Activity.

Your Workflow is running, but only the withdraw Activity method has succeeded. In any other application, you would likely have to abandon the entire process and perform a rollback.

With Temporal, you can debug and resolve the issue while the Workflow is running.

Pretend that you found a fix for the issue. Switch activityShouldSucceed back to true and save your changes.

How can you possibly update a Workflow that's already halfway complete? You restart the Worker.

To restart the Worker, go to the terminal where the Worker is running and cancel the Worker with Ctrl+C. Then restart the Worker by running the following command:

mvn clean install -Dorg.slf4j.simpleLogger.defaultLogLevel=info 2>/dev/null
mvn compile exec:java -Dexec.mainClass="moneytransferapp.MoneyTransferWorker" -Dorg.slf4j.simpleLogger.defaultLogLevel=warn

The Worker starts again. On the next scheduled attempt, the Worker picks up right where the Workflow was failing and successfully executes the newly compiled deposit Activity method:

Depositing $32 into account 872878204.
[ReferenceId: d3d9bcf0-a897-4326]
[d3d9bcf0-a897-4326] Transaction succeeded.

Visit the Web UI again, and you'll see the Workflow has completed. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction!

TypeScript

This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the deposit Activity function. Let your Workflow continue to run but don't start the Worker yet.

Open the activities.ts file and switch out the comments on the return statements so that the deposit function returns an error. Ensure you're calling bank2.depositThatFails.

Save your changes and switch to the terminal that was running your Worker. Start the Worker again:

npm run worker

You will see the Worker complete the withdraw Activity function, but it errors when it attempts the deposit Activity function. The important thing to note here is that the Worker keeps retrying the deposit function:

2023-10-11T19:03:25.778Z [INFO] Worker state changed { state: 'RUNNING' }
Withdrawing $400 from account 85-150.
Depositing $400 into account 43-812.
2023-10-11T19:03:29.445Z [WARN] Activity failed {
  attempt: 1,
  activityType: 'deposit',
  taskQueue: 'money-transfer',
  error: Error: This deposit has failed
}
Depositing $400 into account 43-812.

The Workflow keeps retrying using the RetryPolicy specified when the Workflow first executes the Activity.

You can view more information about the process in the Temporal Web UI. Click the Workflow. You'll see more details including the state, the number of times it has been attempted, and the next scheduled run time.

Traditionally, you're forced to implement timeout and retry logic within the service code itself. This is repetitive and prone to errors. With Temporal, you can specify timeout configurations in the Workflow code as Activity options.

Your Workflow is running, but only the withdraw Activity function has succeeded. In any other application, the whole process would likely have to be abandoned and rolled back.

With Temporal, you can debug and fix the issue while the Workflow is running.

Pretend that you found a fix for the issue. Switch the comments back on the return statements of the deposit function in the activities.ts file and save your changes.

How can you possibly update a Workflow that's already halfway complete? You restart the Worker.

First, cancel the currently running worker with Ctrl+C, then restart the worker:

npm run worker

The Worker starts again. On the next scheduled attempt, the Worker picks up right where the Workflow was failing and successfully executes the newly compiled deposit Activity function.

Switch back to the terminal where your npm run client program is running, and you'll see it complete:

Transfer complete (transaction IDs: W3436600150, D9270097234)

Visit the Web UI again, and you'll see the Workflow has completed. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction.

.NET

This demo application makes a call to an external service in an Activity. If that call fails due to a bug in your code, the Activity produces an error.

To test this out and see how Temporal responds, you'll simulate a bug in the DepositAsync() Activity method.

Let your Workflow continue to run but don't start the Worker yet.

Open the Activities.cs file and switch out the comments on the return statements so that the DepositAsync() method throws an exception:

MoneyTransferWorker/Activities.cs

[Activity]
public static async Task<string> DepositAsync(PaymentDetails details)
{
    var bankService = new BankingService("bank2.example.com");
    Console.WriteLine($"Depositing ${details.Amount} into account {details.TargetAccount}.");

    // Uncomment below and comment out the try-catch block below to simulate unknown failure
    return await bankService.DepositThatFailsAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    
    /*
    try
    {
        return await bankService.DepositAsync(details.TargetAccount, details.Amount, details.ReferenceId);
    }
    catch (Exception ex)
    {
        throw new ApplicationFailureException("Deposit failed", ex);
    }
    */
}

Save your changes and switch to the terminal that was running your Worker.

Start the Worker again:

dotnet run --project MoneyTransferWorker

Note that you must restart the Worker every time the code changes. You will see the Worker complete the WithdrawAsync() Activity method, but it errors when it attempts the DepositAsync() Activity method.

The important thing to note here is that the Worker keeps retrying the DepositAsync() method:

Running worker...
Withdrawing $400 from account 85-150.
Depositing $400 into account 43-812.
Depositing $400 into account 43-812.
Depositing $400 into account 43-812.
Depositing $400 into account 43-812.

The Workflow keeps retrying using the RetryPolicy specified when the Workflow first executes the Activity.

Your Workflow is running, but only the WithdrawAsync() Activity method has succeeded. In any other application, you would likely have to abandon the entire process and perform a rollback.

With Temporal, you can debug and resolve the issue while the Workflow is running.

Pretend that you found a fix for the issue. Switch the comments back in the DepositAsync() method in the Activities.cs file so that the try-catch block is active again, and save your changes.

How can you possibly update a Workflow that's already halfway complete? You restart the Worker.

To restart the Worker, go to the terminal where the Worker is running and cancel the Worker with Ctrl+C, then restart the Worker by running:

dotnet run --project MoneyTransferWorker

The Worker starts again. On the next scheduled attempt, the Worker picks up right where the Workflow was failing and successfully executes the newly compiled DepositAsync() Activity method.

Switch back to the terminal where the MoneyTransferClient program is running, and you'll see it complete:

Workflow result: Transfer complete (transaction IDs: W-caa90e06-3a48-406d-86ff-e3e958a280f8, D-1910468b-5951-4f1d-ab51-75da5bba230b)

Visit the Web UI again, and you'll see the Workflow has completed. You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction!

Ruby

The Withdraw Activity contains a bug, although it hasn't been a problem yet because that statement is currently commented out. You will now uncomment it to expose the bug. You'll see that this causes the Activity to fail, but you'll also see that it retries automatically. More importantly, you'll observe that after you fix the bug, the Workflow Execution that was failing will complete successfully.

Try it out by following these steps:

Whenever you modify the code, you must restart the Worker for the changes to take effect. Press Ctrl+C in the terminal where your Worker is running to stop the Worker so that you can introduce the bug in the next step.
Edit the activities.rb file, uncomment the line in the Withdraw Activity that causes a divide-by-zero error, and save the change.
Start the Worker again by running bundle exec ruby worker.rb
Start a new Workflow Execution by running bundle exec ruby starter.rb
Go to the main page in the Temporal Web UI and click the Workflow Execution you just started to view its detail page. You should see that the Withdraw Activity is failing. Click the Pending Activities tab to see the cause, and then click the History tab to return to the previous view.
Edit the activities.rb file, comment out the line with the divide-by-zero error, and then save the change.
Press Ctrl+C in the terminal where your Worker is running and then run bundle exec ruby worker.rb to start it again. The Worker will now use the code that contains the change you made in the previous step.
You should see that the Workflow Execution completes successfully. This will be visible both in the Web UI and in the terminal where you started the Workflow Execution.

Mission Accomplished! You have just fixed a bug in a running application without losing the state of the Workflow or restarting the transaction!

Try This Challenge

Real-World Scenario: Try this advanced experiment:

Change the retry policy in workflows.py to only retry 1 time
Introduce a bug that triggers the refund logic
Watch the Web UI as Temporal automatically executes the compensating transaction

Question to consider: How would you handle this scenario in a traditional microservices architecture?

Summary: What You Accomplished

Congratulations! You've experienced firsthand why Temporal is a game-changer for reliable applications. Here's what you demonstrated:

What You Learned

Crash-Proof Execution

You killed a Worker mid-transaction and watched Temporal recover seamlessly. Traditional applications would lose this work entirely, requiring complex checkpointing and recovery logic.

Live Production Debugging

You fixed a bug in running code without losing any state. Most systems require you to restart everything, losing all progress and context.

Automatic Retry Management

Temporal handled retries intelligently based on your policy, without cluttering your business logic with error-handling code.

Complete Observability

The Web UI gave you full visibility into every step, retry attempt, and state transition. No more debugging mysterious failures.

Summary

Successfully recovered from a Worker crash

Fixed a bug in a running Workflow

Observed automatic retry behavior

Used the Web UI for debugging

Experienced zero data loss through failures

Advanced Challenges

Try these advanced scenarios:

Mission: Compensating Transactions

Modify the retry policy in workflows.py to only retry 1 time
Force the deposit to fail permanently
Watch the automatic refund execute

Mission objective: Prove that Temporal can handle complex business logic flows even when things go wrong.
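
The compensation pattern behind this mission can be sketched in plain Python. This is an illustration under stated assumptions, not the tutorial's workflows.py: the `transfer` function, the `DepositError` exception, and the in-memory `accounts` dictionary are all hypothetical.

```python
class DepositError(Exception):
    """Raised when the deposit cannot be completed (hypothetical)."""


def transfer(accounts, source, target, amount, deposit):
    """Withdraw, then deposit; refund the source if the deposit fails."""
    accounts[source] -= amount  # withdrawal
    try:
        deposit(accounts, target, amount)
    except DepositError:
        accounts[source] += amount  # compensating transaction (refund)
        return "refunded"
    return "complete"


def failing_deposit(accounts, target, amount):
    raise DepositError("deposit failed permanently")


def working_deposit(accounts, target, amount):
    accounts[target] += amount


accounts = {"85-150": 1000, "43-812": 0}
outcome = transfer(accounts, "85-150", "43-812", 250, failing_deposit)
print(outcome, accounts)
```

Limiting the retry count matters here: with unlimited retries the deposit would keep being attempted and the refund branch would never run.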

Mission: Network Partition Simulation

Start a long-running Workflow
Disconnect your network (or pause the Temporal Server container)
Reconnect after 30 seconds

Mission objective: Demonstrate Temporal's resilience to network failures.

Knowledge Check

Test your understanding of what you just experienced:

Q: Why do we use a shared constant for the Task Queue name?

Answer: Because the Task Queue name connects your Workflow starter to your Worker. If they don't match exactly, your Worker will never see the Workflow tasks, and execution will stall indefinitely.

Real-world impact: This is like having the wrong radio frequency - your messages never get delivered.
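
A toy model in plain Python shows why the shared constant matters. It does not use the Temporal SDK: `queues`, `start_workflow`, and `poll` are hypothetical stand-ins, and the queue name is taken from the Worker log shown earlier.

```python
from collections import defaultdict

# One constant, imported by both the starter and the Worker.
TASK_QUEUE_NAME = "money-transfer"

queues = defaultdict(list)  # stands in for Task Queues on the server


def start_workflow(task_queue, workflow_id):
    """Place a task on the named queue, as a Workflow starter would."""
    queues[task_queue].append(workflow_id)


def poll(task_queue):
    """Take the next task from the named queue, as a Worker would."""
    tasks = queues[task_queue]
    return tasks.pop(0) if tasks else None


start_workflow(TASK_QUEUE_NAME, "transfer-1")
picked = poll(TASK_QUEUE_NAME)  # the Worker sees the task

start_workflow("money-transfre", "transfer-2")  # hard-coded name with a typo
missed = poll(TASK_QUEUE_NAME)  # nothing arrives; the task waits elsewhere
print(picked, missed)
```

No error is raised in the mismatched case, which is what makes a typo hard to spot: the task simply waits on a queue that no Worker polls.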

Q: What happens when you modify Activity code for a running Workflow?

Answer: You must restart the Worker to load the new code. The Workflow will continue from where it left off, but with your updated Activity logic.

Real-world impact: This enables hot-fixes in production without losing transaction state.

Continue Your Learning

Understanding Temporal

Learn core concepts for Temporal apps

Take a Free Course

Enroll in Temporal 101 and 102

Code Exchange

Explore example applications
