Dependency chain modelling

A dependency chain exists when several functions (teams or departments) each need to complete part of a piece of work before it can be fully delivered. Each function completes its part and then passes the work along to the next.

In my last article I showed why your IT requests were likely taking so much longer to service than you expected. But what if you have multiple dependencies chained together? How can you get a rough idea of how long something is going to take when all you have is a disconnected dependency chain rather than an end-to-end view of the system? In this article, I’ll explain how to get a rough statistical estimate of how long something will take when multiple dependencies are linked together. I’ll do this through an R script, but the concept is easily transferable.

First things first, here are the libraries I’m using. I only include them here for completeness.

```r
library(truncnorm)  # rtruncnorm(): samples from a truncated normal distribution
library(ggplot2)    # histograms
library(dplyr)      # the %>% pipe
```

We need to establish some test data before I can get to the (quite simple) method. Feel free to skip the next section if you’d prefer to get straight to the bottom line.

Setting up our dependency chain data

We’ll be modelling IT delays as a function of utilisation and the discrete processing time for each piece of work. So first, let’s generate some utilisation. I’m going to assume a fairly standard range from a healthy-ish 80% up to 100%. It didn’t feel right to pluck every number in this script out of the air, so I wrote a small function to pick a random value in that range.

Setting Utilisation

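```r
# Pick a random whole-number utilisation between lower% and upper%
# (inclusive) and return it as a proportion, e.g. 0.87.
# Note: a draw of 100% (1.0) makes the wait-time formula divide by zero.
get_rand_utilisation <- function(lower = 80, upper = 100) {
  sample(lower:upper, 1) / 100
}
```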

I’m going to assume a rather simplistic view with three functions completing work: analysis, development and test. They don’t talk to each other very well and we don’t have a joined-up view, but we do have an idea of how long things take within each function.

Let’s assign some utilisation.

```r
analysis_util <- get_rand_utilisation()
dev_util <- get_rand_utilisation()
test_util <- get_rand_utilisation()
```

I’ve ended up with:

```
Analysis: 90%
Development: 80%
Test: 83%
```
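Because `get_rand_utilisation()` relies on `sample()`, you’ll almost certainly draw different values. If you want repeatable runs, one minimal option (my suggestion, not part of the original script) is to fix R’s random seed before drawing. The seed value 2023 is arbitrary, and the function is repeated here so the snippet runs on its own.

```r
get_rand_utilisation <- function(lower = 80, upper = 100) {
  sample(lower:upper, 1) / 100
}

# Fixing the seed makes every subsequent random draw repeatable
set.seed(2023)  # arbitrary seed value, chosen for illustration
analysis_util <- get_rand_utilisation()
dev_util <- get_rand_utilisation()
test_util <- get_rand_utilisation()
```

Setting the seed once at the top of the script also makes the later `rtruncnorm()` and `replicate()` draws repeatable, provided the calls run in the same order.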

Creating a cycle time distribution

I’ve chosen some arbitrary values for the mean processing time (effort in days) of each function. Because delays account for such a large share of the total time, I didn’t feel too bad about this unscientific approach.

```r
analysis_effort_days <- 1
dev_effort_days <- 3
test_effort_days <- 2
```

I then defined a function to generate a cycle time distribution. The expected wait time is calculated using the utilisation formula from the previous post: utilisation divided by (1 - utilisation), multiplied by the processing time.
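Written out, with ρ for utilisation, S for processing time in days and W for expected wait (the symbols are my own shorthand), the formula and the figures it gives for the utilisation values drawn above look like this. It is a back-of-the-envelope check rather than part of the script:

$$
\begin{aligned}
W &= \frac{\rho}{1-\rho}\,S \\
W_{\text{analysis}} &= \frac{0.90}{0.10} \times 1 = 9.0 \\
W_{\text{dev}} &= \frac{0.80}{0.20} \times 3 = 12.0 \\
W_{\text{test}} &= \frac{0.83}{0.17} \times 2 \approx 9.76 \\
W_{\text{total}} &\approx 30.8 \text{ days}
\end{aligned}
$$

So before any simulation, roughly 30.8 days of expected waiting sit on top of 6 days of processing effort. The formula also shows why utilisation of exactly 100% breaks the model: the denominator becomes zero.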

This was used to create a distribution of 50 items, truncated below at 1 day. The intention is for the mean to be the total expected elapsed time, i.e. the processing time plus the expected waiting time for that utilisation. Note, though, that the function below passes only the waiting time as the mean, so each sample understates elapsed time by the processing effort. I could have made the data more realistic by choosing an appropriate standard deviation (`rtruncnorm()` defaults to `sd = 1`), but there seemed to be little value in that for this exercise.

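```r
# Sample n cycle times (in days) for one function.
# The mean is the expected wait from the utilisation formula; values are
# truncated below at 1 day and use rtruncnorm()'s default sd of 1.
get_cycle_sample <- function(utilisation_perc, processing_time, n = 50) {
  wait_time <- (utilisation_perc / (1 - utilisation_perc)) * processing_time
  rtruncnorm(n = n, a = 1, mean = wait_time)
}
```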
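If you’d prefer each sample to reflect the full elapsed time described above (processing plus waiting) with a wider spread than `rtruncnorm()`’s default `sd = 1`, a variant might look like the sketch below. `get_cycle_sample_sd()` and its `sd` argument are hypothetical names of my own, not part of the original script, and stopping on 100% utilisation is an assumption about how you’d want that edge case handled.

```r
library(truncnorm)

# Hypothetical variant of get_cycle_sample(): the mean is processing time
# plus expected wait, and the spread is a parameter rather than fixed at 1
get_cycle_sample_sd <- function(utilisation_perc, processing_time,
                                n = 50, sd = 1) {
  # The wait formula divides by zero at 100% utilisation, so refuse it
  stopifnot(utilisation_perc < 1)
  wait_time <- (utilisation_perc / (1 - utilisation_perc)) * processing_time
  # Truncate below at 1 day, as in the original function
  rtruncnorm(n = n, a = 1, mean = processing_time + wait_time, sd = sd)
}
```

For example, `get_cycle_sample_sd(dev_util, dev_effort_days, sd = 4)` would give development samples centred on processing plus waiting time with a noticeably wider spread; the sd of 4 is purely illustrative.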

So now I could create a distribution for each function. I’ve wrapped each sample in a data frame and piped it into a histogram so you can see what we’re working with. Yes, I should have written a function for this.

Analysis Cycle Time Distribution

```r
analysis <- get_cycle_sample(utilisation_perc = analysis_util, processing_time = analysis_effort_days)
data.frame(analysis) %>% ggplot(aes(x = analysis)) + geom_histogram()
```

Development Cycle Time Distribution

```r
dev <- get_cycle_sample(utilisation_perc = dev_util, processing_time = dev_effort_days)
data.frame(dev) %>% ggplot(aes(x = dev)) + geom_histogram()
```

Test Cycle Time Distribution

```r
test <- get_cycle_sample(utilisation_perc = test_util, processing_time = test_effort_days)
data.frame(test) %>% ggplot(aes(x = test)) + geom_histogram()
```
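Taking my own advice, a small helper could replace the three near-identical plotting snippets. This is a sketch: `plot_cycle_histogram()` is a hypothetical name, and `bins = 30` simply makes ggplot2’s default bin count explicit, which also silences its message about picking a bin width.

```r
library(ggplot2)

# Hypothetical helper: histogram of one function's cycle-time sample
plot_cycle_histogram <- function(cycle_times, title) {
  ggplot(data.frame(cycle_time = cycle_times), aes(x = cycle_time)) +
    geom_histogram(bins = 30) +
    labs(title = title, x = "Cycle time (days)", y = "Count")
}

# Usage with the samples created above:
# plot_cycle_histogram(analysis, "Analysis cycle time distribution")
# plot_cycle_histogram(dev, "Development cycle time distribution")
# plot_cycle_histogram(test, "Test cycle time distribution")
```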

Simulating a dependency chain

From here it’s depressingly straightforward, I’m afraid. I simply used R’s `replicate()` function to run a function a set number of times, generating a sample of possible states from our underlying distributions.

The function I used samples one cycle time from each dependency in the chain and sums them to give a total lead time for the entire dependency chain. This is much like plucking a random past work item from each function, repeated 100 times.

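```r
# Draw one historic cycle time from each function and sum them
# to get one possible end-to-end lead time for the whole chain
build_sample <- function() {
  sample(analysis, 1) + sample(dev, 1) + sample(test, 1)
}
# Repeat 100 times to build a distribution of possible lead times
samples <- replicate(100, build_sample())
samples_df <- data.frame(lead_times = samples)
```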

We now have a list of possible futures based on our historic data, so we can see what the distribution looks like and begin to query it for some answers. But let’s note a few things first.

1. The processing effort alone totals 6 days (1 + 3 + 2)
2. We are using realistic utilisation figures; there are many places where 80% utilisation would be a luxury
3. This is a relatively simple context with only 3 disconnected functions

Here’s the distribution of expected lead times for our dependency chain.

```r
samples_df %>% ggplot(aes(x = lead_times)) + geom_histogram()
```

Analysing the dependency chain data

The spread is a little narrow because we didn’t apply sensible standard deviations to the individual distributions, but I hope the point is clear.

We were supposed to have 6 days of effort, but now we’re looking at a mean of over 30 days. If we want 85% confidence of delivering on time (the 85th percentile of the simulated lead times), which we do, it’s around 33 days. Planning to the mean isn’t good enough either: we’d be late roughly half the time.
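To query the simulated lead times directly, a helper along these lines could report the mean, the lead time at a chosen confidence level and the share of runs that finish within the original effort estimate. `summarise_lead_times()`, `level` and `target_days` are hypothetical names of my own; `quantile()` uses R’s default (type 7) interpolation.

```r
# Hypothetical helper: summarise a vector of simulated lead times (days)
summarise_lead_times <- function(lead_times, level = 0.85, target_days = 6) {
  list(
    mean = mean(lead_times),
    # Lead time that `level` of the simulated runs finish within
    percentile = unname(quantile(lead_times, probs = level)),
    # Share of simulated runs that finish within the target
    within_target = mean(lead_times <= target_days)
  )
}

# Usage with the simulated chain above:
# summarise_lead_times(samples_df$lead_times)
```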

Even in this rather simple chain, the 85th-percentile lead time is roughly five and a half times the 6 days of processing effort, with the difference lost to delays. I wonder how many dollars, pounds or euros that adds up to across your entire IT organisation?

Conclusion

This simple example has highlighted the dangers of dependency chains. A single dependency is bad enough, as the previous article showed, but having several linked together is even worse. We only modelled 3 here, but I have seen dependency maps where 15, 20 or even 30 different functions are required to perform work before something can get to a customer. No wonder we have software releases that take months or years to get out of the door.
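The `build_sample()` function earlier hard-codes three functions. For longer chains like those, a hypothetical generalisation could take a list of historic cycle-time vectors, one per function. It indexes with `sample.int()` rather than calling `sample(x, 1)` directly, because `sample()` on a single number `x` draws from `1:x` instead of returning `x`. `simulate_chain()` and `draw_one()` are my own illustrative names.

```r
# Hypothetical generalisation: simulate a chain with any number of stages.
# `stages` is a list of numeric vectors of historic cycle times (days).
simulate_chain <- function(stages, n = 100) {
  # Pick one value at random; safe even when length(x) == 1
  draw_one <- function(x) x[sample.int(length(x), 1)]
  # Each run sums one draw per stage; repeat n times
  replicate(n, sum(vapply(stages, draw_one, numeric(1))))
}

# Usage with the three samples above:
# lead_times <- simulate_chain(list(analysis, dev, test), n = 100)
```

Adding a fourth or fifteenth function is then a matter of appending its sample to the list rather than editing the function body.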