Reference no: EM134012036
Algorithms and Analysis: Graph Algorithms in Action – Modeling a Disease Outbreak
Learning Outcomes
This assessment relates to four learning outcomes of the course which are:
CLO 1: Compare, contrast, and apply the key algorithmic design paradigms: brute force, divide and conquer, decrease and conquer, transform and conquer, greedy, dynamic programming and iterative improvement.
CLO 2: Compare, contrast, and apply key data structures and algorithms: trees, lists, stacks, queues, hash tables and graph representations.
CLO 3: Define, compare, analyse, and solve general algorithmic problem types: sorting, searching, graphs and geometric problems.
CLO 4: Theoretically compare and analyse the time complexities of algorithms and data structures.
CLO 5: Implement, empirically compare, and apply fundamental algorithms and data structures to real-world problems.
Overview
Across multiple tasks in this assignment, you will model a disease outbreak in the city of Metropolis as a weighted graph to capture citizen contact, implement and compare algorithms for tracing the most likely transmission pathways through the population, and design a dynamic programming strategy to allocate scarce antiviral treatment as effectively as possible.
You will address both the structural problem of representing and querying the contact network efficiently, and the algorithmic problem of finding optimal transmission paths and vaccination strategies under real constraints. Some components ask you to critically assess your solutions through both theoretical analysis and controlled empirical experiments, encouraging reflection on the relationship between algorithms for problem solving and real-world performance. The assignment emphasises strategic thinking and the ability to communicate algorithmic ideas clearly and effectively.
Assessment Details
The assignment is broken up into a number of tasks to help you progressively complete the project. Please note that although completing one task can assist with subsequent tasks, each task can be attempted independently. Moreover, the tasks are designed so that even if your implementation does not fully achieve the expected output, you can still discuss and present the theoretical aspects of your algorithmic design through your report.
Please note that the assignment is broken into 4 tasks. In summary:
Task A: Build the city's contact network into an efficient data structure to power all downstream analysis.
Task B: Calculate every resident's infection risk over time using a precise mathematical approach, identifying who is most in danger.
Task C: Rigorously analyse and compare your risk estimation approach against a baseline method, both theoretically and through real experiments.
Task D: Decide who receives the city's scarce antiviral doses, maximising the benefit to the population within strict supply limits.
Code Base
The assignment codebase is provided via GitHub Classroom - accept the assignment link to have a personal repository automatically created for you. All development must take place in this repository, with your work tracked through regular, meaningful Git commits. Before writing a single line of code, read the README in your repository carefully, as it contains essential setup and configuration instructions. Only modify files explicitly marked for editing - changes outside these sections may break automated testing.
If, upon accepting the assignment, you receive a "Repository Access Issue" error, please look at the inbox associated with the GitHub account you are using - you may need to accept the invitation.
Writing the Report
Templates for the assignment report are available in both Word and Overleaf (LaTeX) formats - use whichever you are most comfortable with. Whilst not required, we strongly recommend using some form of version control for your report writing as well as your code. Tools such as Git, Overleaf's built-in history tracking, or Google Docs' revision history all make it easy to recover earlier drafts and keep a clear record of your progress.
Each Task has a set page limit, with:
Task A: 0 pages (code only)
Task B: 1 page (code and report)
Task C: 2 pages (report only)
Task D: 2 pages (code and report)
Please respect these page limits. Any content that exceeds the page limit for a given task will NOT be marked.
Motivation
The T-Virus has broken out in the city of Metropolis. As the newly appointed Regional Health Director, you have been handed a dossier containing everything the epidemiologists know: a map of every known contact between residents, the probability that each contact transmits the virus in a single day, and the vulnerability of each resident. The clock is ticking—every day the virus spreads further, and your antiviral stockpile is limited.
Your job has four parts:
Build the contact network (Task A) – represent the city's contact relationships as a graph, efficiently enough to run algorithms on in real time (implement graph/adjacency matrix.py).
Estimate infection risk (Task B) – compute the probability that each resident becomes infected over the coming T days, identifying the most at-risk residents before the outbreak grows (implement transmission/task b.py and write report).
Analyse your solution (Task C) – theoretically and empirically compare your algorithm against the provided baseline across different network structures (write report on analysis of Task A and Task B).
Allocate antiviral treatment (Task D) – decide who receives the scarce supply of doses, maximising the benefit to the population within the limits of your stockpile (implement treatment/task d.py and write report).
Figure 2 shows the contact network for a sub-population in Metropolis. Each resident is a node, and each contact relationship is an edge labelled with the daily probability of transmission. Patient zero V0 is already infected—the question is how far the virus will spread, and who should be protected first.
Task A: Graph Representation
Before the epidemiologists of Metropolis can run any algorithms on the contact network, they need an efficient way to store it. You learn that there are two standard ways to represent a graph: (1) an adjacency matrix; and (2) an adjacency list. The list representation is already implemented for you. Your task is to implement the adjacency matrix representation.
Implementation Task
The current implementation in graph/adjacency list.py stores the contact graph as an adjacency list. You must implement the class in graph/adjacency matrix.py to represent the same graph as an adjacency matrix. Your implementation must expose exactly the same public interface as the matrix, so that all downstream algorithms in Tasks B, C, and D work correctly with either representation by changing a single configuration variable.
Testing your Solution
To run using adjacency matrix representation, change the configuration variable graph type from "list" to "matrix" in your configuration file. You are provided a test suite to check your solution—run the command python -m tests.task a.
Report Task
You are not required to write anything in your report for Task A.
Task B: Estimating Infection Risk Over Time
With the contact network built, the epidemiologists of Metropolis need to estimate how dangerous the outbreak will become.
Monte Carlo Simulation
Your task is to compute, for every resident, the probability that they will be infected within a planning horizon of T days. We model the outbreak as a discrete-time process. Each day, every infected resident independently attempts to transmit the virus to each of their neighbours, with the probability of a single-day transmission given by the edge weight wij ∈ (0,1]. Once infected, a resident remains infectious for the remainder of the horizon.
A Monte Carlo baseline is provided in transmission/monte carlo.py. It simulates the outbreak X times, randomly deciding each day whether transmissions succeed according to the edge weights, and estimates ri,T as the fraction of simulations in which Vi was infected. It converges to the correct answer for large X, but is slow, approximate, and non-deterministic.
Implementation Task
Your task is to implement the exact dynamic programming solution in transmission/task b.py using the recurrence above for rt,i. Your implementation must:
correctly initialise the base cases at t = 0;
fill the risk table row by row, where table[t][i] stores rt,i; and
work correctly with both graph representations from Task A.
This task can also be approached using principles from software engineering and efficient Python programming practices to ensure scalability and maintainability.
Testing your Solution
To run using your dynamic programming solution, change the configuration variable risk solver from "monte carlo" to "task b" in your configuration file. You are provided a test suite to check your solution—run the command python -m tests.task b.
Report Task
Pseudo-code. Write concise pseudo-code for your dynamic programming solution in the style of Algorithm 1.
Limitations of the recurrence. The recurrence is a simplifying approximation of true disease spread—rather than tracking the full joint probability distribution over all possible infection states across the network, which would be exponentially complex, it makes a structural assumption about how transmission pathways interact.
What structural property of our contact network makes the recurrence an approximation rather than an exact calculation?
With reference to Figure 5, explain why the infection risk computed for V1 at t = 3 is only approximate. Identify the specific term that introduces the error and state whether the recurrence over- or under-estimates the true risk.
Identify the class of graphs on which the recurrence gives an exact answer rather than an approximation for rt,i. Justify your answer.
Task C: Analysis of Infection Risk Algorithms
With both algorithms implemented, the epidemiologists of Metropolis need to know which approach to deploy. A large, sparsely connected network and a small, densely connected network present very different computational challenges. In this task, you will theoretically analyse and empirically compare the Monte Carlo baseline and your dynamic programming solution from Task B across both graph representations from Task A.
Implementation Task
While you are required to run timing experiments as part of this task, no marks are awarded for the implementation itself.
Report Task (Maximum 2 Pages)
Algorithm Complexity Analysis.
Empirical Design.
Empirical Analysis.
Reflection.
Task D: Antiviral Allocation
The epidemiologists of Metropolis have used the infection risk scores from Task B to assess the risk to every resident. The city's medical authority has now issued a limited supply of antiviral doses. Vaccinating a resident requires a course of treatment—different residents may require different numbers of doses depending on their age and health status. Your job is to decide who gets vaccinated. Using the infection risk scores computed in Task B as the benefit of vaccination, your goal is to choose which residents to vaccinate so that the total benefit is maximised within the dose budget, with ties broken in favour of the selection using the fewest doses.