Reference no: EM134010680
Algorithms and Analysis
Graph Algorithms in Action: Modeling a Disease Outbreak
Learning Outcomes
This assessment relates to four learning outcomes of the course which are:
CLO 1: Compare, contrast, and apply the key algorithmic design paradigms: brute force, divide and conquer, decrease and conquer, transform and conquer, greedy, dynamic programming and iterative improvement;
CLO2: Compare, contrast, and apply key data structures: trees, lists, stacks, queues, hash tables and graph representations. Students working on Data Structures Algorithms concepts may find this topic closely aligned with graph representation techniques;
CLO 3: Define, compare, analyse, and solve general algorithmic problem types: sorting, searching, graphs and geometric;
CLO 4: Theoretically compare and analyse the time complexities of algorithms and data structures; and
CLO 5: Implement, empirically compare, and apply fundamental algorithms and data structures to real-world problems.
Overview
Across multiple tasks in this assignment, you will model a disease outbreak in the city of Metropolis as a weighted graph to capture citizen's contact, implement and compare algorithms for tracing the most likely transmission pathways through the population, and design a dynamic programming strategy to allocate scarce antiviral treatment as effectively as possible.
You will address both the structural problem of representing and querying the contact network efficiently, and the algorithmic problem of finding optimal transmission paths and vaccination strategies under real constraints. Some components ask you to critically assess your solutions through both theoretical analysis and controlled empirical experiments, encouraging reflection on the relationship between algorithm design and real-world performance. The assignment emphasises strategic thinking and the ability to communicate algorithmic ideas clearly and effectively. Topics involving Algorithms For Problem Solving are especially relevant to the implementation and analysis requirements described in this project.
Assessment Details
The assignment is broken up into a number of tasks to help you progressively complete the project. Please note that although completing one task can assist with subsequent tasks, each task can be attempted independently. Moreover, the tasks are designed so that even if your implementation does not fully achieve the expected output, you can still discuss and present the theoretical aspects of your algorithmic design through your report.
Task A: Graph Representation
Before the epidemiologists of Metropolis can run any algorithms on the contact network, they need an efficient way to store it. You learn that there are two standard ways to represent a graph: (1) an adjacency matrix; and (2) an adjacency list. The list representation is already implemented for you. Your task is to implement the adjacency matrix representation.
Implementation Task
The current implementation in graph/adjacency list.py stores the contact graph as an adjacency list. You must implement the class in graph/adjacency matrix.py to represent the same graph as an adjacency matrix. Your implementation must expose exactly the same public interface as the matrix, so that all downstream algorithms in Tasks B, C, and D work correctly with either representation by changing a single configuration variable.
Task B: Estimating Infection Risk Over Time
With the contact network built, the epidemiologists of Metropolis need to estimate how dangerous the outbreak will become.
Monte Carlo Simulation
Your task is to compute, for every resident, the probability that they will be infected within a planning horizon of T days. We model the outbreak as a discrete-time process. Each day, every infected resident independently attempts to transmit the virus to each of their neighbours, with the probability of a single-day transmission given by the edge weight wij ∈ (0, 1]. Once infected, a resident remains infectious for the remainder of the horizon.
A Monte Carlo baseline is provided in transmission/monte carlo.py. It simulates the outbreak X times, randomly deciding each day whether transmissions succeed according to the edge weights, and estimates ri,T as the fraction of simulations in which Vi was infected. It converges to the correct answer for large X, but is slow, approximate, and non-deterministic. Algorithm 1 shows its structure - note the nested loops, which will be important in Task C. This section also relates to concepts commonly studied in Data Structures Algorithms coursework.
Algorithm 1 MonteCarlo(G, s, T, X)
Require: Graph G = (V, E); source s; horizon T ; simulations X
Ensure: Estimated risk table where table[t][i] ≈ ri,t
1: for each day t = 1 to T do
2: for each simulation x = 1 to X do
3: Restart from scratch - only s is infected
4: for each day d = 1 to t do
5: for each uninfected resident Vi do
6: for each infected neighbour Vj of Vi do
7: Infect Vi with probability wij
8: table[t][i] ← fraction of simulations where Vi infected
9: return table
Implementation Task
Your task: implement the exact dynamic programming solution in transmission/task b.py using the recurrence above for rt,i (in the box). Your implementation must:
correctly initialise the base cases at t = 0;
fill the risk table row by row, where table[t][i] stores rt,i; and
work correctly with both graph representations from Task A.
Testing your Solution
To run using your dynamic programming solution, change the configuration variable risk solver from "monte carlo" to "task b" in your configuration file. You are provided a test suite to check your solution - to run it, invoke the command python -m tests.task b.
Report Task
Pseudo-code. Write concise pseudo-code for your dynamic programming solution in the style of Algorithm 1.
Limitations of the recurrence. The recurrence is a simplifying approximation of true disease spread - rather than tracking the full joint probability distribution over all possible infection states across the network, which would be exponentially complex, it makes a structural assumption about how transmission pathways interact.
What structural property of our contact network makes the recurrence an approximation rather than an exact calculation?
With reference to Figure 5, explain why the infection risk computed for V1 at t = 3 is only approximate. Identify the specific term that introduces the error and state whether the recurrence over- or under-estimates the true risk.
Identify the class of graphs on which the recurrence gives an exact answer rather than an approximation for rt,i. Justify your answer.
Task C: Analysis of Infection Risk Algorithms
With both algorithms implemented, the epidemiologists of Metropolis need to know which approach to deploy. A large, sparsely connected network (e.g. a city) over a short intervention window and a small, densely connected network (e.g. a submarine crew confined together for months, requiring a long planning horizon) present very different computational challenges - the best algorithm and graph representation for one may perform poorly on the other. In this task, you will theoretically analyse and empirically compare the Monte Carlo baseline and your dynamic programming solution from Task B across both graph representations from Task A.
Students analysing runtime efficiency and complexity comparisons may also refer to Algorithms For Problem Solving resources for additional conceptual support.
Report Task
Your report should include:
Algorithm Complexity Analysis. (2 marks) Strongly justify, with reference to the pseudo-code and the graph representation being used, the theoretical time complexity of all four combinations:
Monte Carlo with an adjacency matrix;
Monte Carlo with an adjacency list;
Dynamic programming with an adjacency matrix; and
Dynamic programming with an adjacency list.
For each combination, identify the specific operation that dominates the runtime and explain how the choice of representation affects its cost.
Empirical Design. Identify which variables are most important when measuring the runtime of the four combinations. Explain which parameters you will vary in your experiments, which you will hold fixed, and why.
Empirical Analysis. Provide a thorough analysis using clearly labeled plots that allow you to compare the runtime of all four combinations.
Reflection. Discuss whether your empirical results align with your theoretical complexity predictions. Make a claim as to which algorithm/representation combination works the best for Metropolis and why. Would your answer change if transmission rates and connections updated daily?
Task D: Antiviral Allocation
The epidemiologists of Metropolis have used the infection risk scores from Task B to assess the risk to every resident. The city's medical authority has now issued a limited supply of C antiviral doses. Vaccinating a resident requires a course of treatment - different residents may require different numbers of doses depending on their age and health status. Your job is to decide who gets vaccinated. You have a limited supply of doses to distribute across the population. Using the infection risk scores computed in Task B as the benefit of vaccination, your goal is to choose which residents to vaccinate so that the total benefit is maximised within the dose budget, with ties broken in favour of the selection using the fewest doses.
Attachment:- Algorithms and Analysis.rar