Case Study – Ordering tasks

Case Scenario: Software Development Lifecycle

Imagine managing a software development project with the following 15 tasks. Each task depends on the completion of one or more preceding tasks, ensuring a logical flow from initiation to deployment and maintenance.

List of Tasks and Dependencies

Task A: Define Project Scope
Task B: Gather Requirements
Task C: Analyze Requirements
Task D: Create Design Specifications
Task E: Develop Frontend
Task F: Develop Backend
Task G: Set Up Development Environment
Task H: Integrate Frontend and Backend
Task I: Develop API Endpoints
Task J: Write Unit Tests
Task K: Perform Integration Testing
Task L: Conduct System Testing
Task M: User Acceptance Testing (UAT)
Task N: Deploy to Production
Task O: Monitor and Maintain

Dependencies Between Tasks

Task A → Task B
Task B → Task C
Task C → Task D
Task D → Task E
Task D → Task F
Task G → Task E
Task G → Task F
Task E → Task H
Task F → Task I
Task I → Task H
Task H → Task J
Task J → Task K
Task K → Task L
Task L → Task M
Task M → Task N
Task N → Task O

We can visually represent it like this:

Problem: Can we write program that takes a set of tasks and dependancies, and create a possible order in which to complete the project assuming we can do only on task at a time? E.g. how do we convert a DAG into a well-ordered list?

To start, we could texually represent the dependencies like this:

A[Define Project Scope] --> B[Gather Requirements]
B --> C[Analyze Requirements]
C --> D[Create Design Specifications]
D --> E[Develop Frontend]
D --> F[Develop Backend]
G[Set Up Development Environment] --> E
G --> F
E --> H[Integrate Frontend and Backend]
F --> I[Develop API Endpoints]
I --> H
H --> J[Write Unit Tests]
J --> K[Perform Integration Testing]
K --> L[Conduct System Testing]
L --> M[User Acceptance Testing (UAT)]
M --> N[Deploy to Production]
N --> O[Monitor and Maintain]

Can we write a program to read in the text above, and produce something like this (a well-ordered list):

A [Define Project Scope] B [Gather Requirements] G [Set Up Development Environment] C [Analyze Requirements] D [Create Design Specifications] E [Develop Frontend] F [Develop Backend] I [Develop API Endpoints] H [Integrate Frontend and Backend] J [Write Unit Tests] K [Perform Integration Testing] L [Conduct System Testing] M [User Acceptance Testing (UAT)] N [Deploy to Production] O [Monitor and Maintain]

Directed Acyclic Graphs (DAGs)

Directed Acyclic Graphs (DAGs) are powerful data structures used to model relationships where direction and the absence of cycles are essential. They are widely applicable in various fields, including computer science, project management, and data processing.

The earlier diagram is a DAG.

1. Understanding DAGs

What is a DAG?

Directed: Each edge has a direction, indicating a relationship from one node to another.
Acyclic: The graph does not contain any cycles, meaning there’s no way to start at a node and return to it by following the directed edges.

Common Uses of DAGs

Task Scheduling: Representing dependencies between tasks.
Version Control Systems: Modeling commits where each commit points to its predecessors.
Data Processing Pipelines: Structuring stages where data flows in one direction.
Expression Trees: Representing mathematical expressions.

2. Example: Task Dependency DAG

Description: A common use of DAGs is to represent tasks with dependencies. For example, in project management, certain tasks must be completed before others can begin.

Consider the following set of tasks on a progect:

A. Data Ingestion
B. Data Cleaning
C. Data Transformation
D. Data Analysis
E. Data Visualization

With the following dependancies:

Data Ingestion –> Data Cleaning
Data Cleaning –> Data Transformation
Data Transformation –> Data Analysis
Data Transformation –> Data Visualization

This could be coded like this in a text file:

A[Data Ingestion] –> B[Data Cleaning]
B –> C[Data Transformation]
C –> D[Data Analysis]
C –> E[Data Visualization]

Assuming you can only do one task as a time, what is an ordering the will work for this project?

Here is another example.

This could be represented by this:

A[Project Planning] –> B[Design Architecture]
A –> C[Obtain Permits]
A –> D[Site Survey]
B –> E[Site Preparation]
C –> E
D –> E
E –> F[Foundation]
F –> G[Framing]
G –> H[Roofing]
G –> I[Electrical Installation]
G –> J[Plumbing Installation]
I –> K[HVAC Installation]
J –> K
K –> L[Interior Finishing]
K –> M[Exterior Finishing]
L –> O[Final Inspection]
M –> O
N[Landscaping] –> O

What data structure can we use to represent such a graph?

Representing a graph

How would you choose which representation to use?

Do you need to use a linked list to represent an adjacency list?

Once we have a representation of a di-graph, how do we convert it into a well ordered list?

Topological Sort

1.1. What is Topological Sort?

Definition: Topological Sort is a linear ordering of vertices in a DAG where for every directed edge uv, vertex u precedes vertex v in the ordering.
Key Characteristics:
- Applicability: Only applicable to DAGs (graphs without cycles).
- Multiple Valid Sorts: A DAG can have multiple valid topological orderings.
- Use Cases: Task scheduling, course prerequisite ordering, build systems, etc.

1.2. Use a Stack

LIFO Property: Stacks operate on a Last-In-First-Out (LIFO) principle, making them ideal for storing vertices in the reverse order of their completion during DFS.
Efficient Storage: As DFS explores deeper into the graph, vertices are pushed onto the stack once all their dependencies have been processed, ensuring the correct topological order when popped.

2. Topological Sort Using DFS and a Stack

2.1. High-Level Overview

Initialization:
- Start with an empty stack.
- Maintain a visited set to track processed vertices.
DFS Traversal:
- Perform DFS on each unvisited vertex.
- For each vertex, recursively visit all its adjacent (dependent) vertices.
Stack Operations:
- After visiting all adjacent vertices of a vertex, push the vertex onto the stack.
- This ensures that vertices with no outgoing edges are at the top of the stack.
Generating the Order:
- Pop all vertices from the stack to get the topological ordering.

2.2. Step-by-Step Explanation

Let’s break down the algorithm with an example DAG:

Example DAG:

A --> B
A --> C
B --> D
C --> D

Topological Sort Steps:

Start with Vertex A:
- Mark A as visited.
- Visit A’s neighbors: B and C.
Visit Vertex B:
- Mark B as visited.
- Visit B’s neighbor: D.
Visit Vertex D:
- Mark D as visited.
- D has no neighbors.
- Push D onto the stack.
Back to Vertex B:
- All neighbors of B are processed.
- Push B onto the stack.
Back to Vertex A:
- Next neighbor: C.
Visit Vertex C:
- Mark C as visited.
- Visit C’s neighbor: D.
- D is already visited.
- Push C onto the stack.
Back to Vertex A:
- All neighbors of A are processed.
- Push A onto the stack.
Finalize the Order:
- Pop vertices from the stack: A, C, B, D.
- Reverse the order to get the topological sort: A, B, C, D or A, C, B, D.

Note: Both A, B, C, D and A, C, B, D are valid topological orders for this DAG.

2.3. Handling Cycles

Cycle Detection: While Topological Sort assumes the input graph is a DAG, it’s essential to detect cycles to prevent incorrect sort orders. If a cycle is detected during DFS, the algorithm should report that a topological sort isn’t possible.

3. Pseudocode for Topological Sort Using DFS and a Stack

Below is the pseudocode for performing a topological sort using DFS and a stack. This pseudocode includes cycle detection to ensure the input graph is a DAG.

3.1. Data Structures Used

Graph Representation: Adjacency List (e.g., AdjList), where each vertex maps to a list of its adjacent vertices.
Visited Set: To keep track of visited vertices (Visited).
Recursion Stack Set: To keep track of vertices in the current DFS path (RecStack) for cycle detection.
Stack: To store the topological order (TopoStack).

3.2. Pseudocode

Function TopologicalSort(Graph):
    Initialize an empty stack TopoStack
    Initialize an empty set Visited
    Initialize an empty set RecStack

    For each vertex V in Graph:
        If V is not in Visited:
            If DFSUtil(V, Visited, RecStack, TopoStack, Graph) is False:
                Return "Cycle detected. Topological Sort not possible."

    Initialize an empty list Order
    While TopoStack is not empty:
        Push TopoStack.pop() to Order

    Return Order

Function DFSUtil(V, Visited, RecStack, TopoStack, Graph):
    Add V to Visited
    Add V to RecStack

    For each neighbor N of V in Graph[V]:
        If N is not in Visited:
            If DFSUtil(N, Visited, RecStack, TopoStack, Graph) is False:
                Return False
        Else If N is in RecStack:
            Return False  // Cycle detected

    Remove V from RecStack
    Push V onto TopoStack
    Return True

3.3. Explanation of the Pseudocode

TopologicalSort Function:
- Initialization:
  - An empty stack TopoStack is used to store the vertices in the order of completion.
  - Visited set keeps track of all visited vertices to prevent re-processing.
  - RecStack set is used for cycle detection by tracking the current path of vertices in the recursion.
- Iterate Over All Vertices:
  - For each vertex V in the graph, if it hasn’t been visited, perform DFS starting from V.
  - If the DFS detects a cycle (DFSUtil returns False), the topological sort isn’t possible.
- Construct the Order:
  - After DFS completes for all vertices, pop all elements from TopoStack to form the topological order Order.
- Return the Order:
  - If no cycles are detected, return the list Order containing the topological sort.
DFSUtil Function:
- Mark Current Vertex:
  - Add vertex V to both Visited and RecStack to indicate it’s being processed.
- Process All Neighbors:
  - For each neighbor N of vertex V:
    - If N hasn’t been visited, recursively perform DFS on N.
    - If the recursive call returns False, propagate the False upwards to indicate a cycle.
    - If N is already in RecStack, a cycle is detected, and False is returned.
- Completion of Vertex Processing:
  - After processing all neighbors, remove V from RecStack as its processing is complete.
  - Push V onto TopoStack to record its completion.
- Return True:
  - Indicate successful processing without detecting a cycle.

4. Example Walkthrough

Let’s walk through the algorithm with the earlier example DAG:

DAG:

A --> B
A --> C
B --> D
C --> D

Graph Representation:

Graph = {
    A: [B, C],
    B: [D],
    C: [D],
    D: []
}

Topological Sort Steps:

Initialize:
- TopoStack: empty
- Visited: empty
- RecStack: empty
- Order: empty
Process Vertex A:
- Mark A as visited and add to RecStack.
- Visit B:
  - Mark B as visited and add to RecStack.
  - Visit D:
    - Mark D as visited and add to RecStack.
    - D has no neighbors; remove D from RecStack and push to TopoStack.
  - Remove B from RecStack and push to TopoStack.
- Back to A, visit C:
  - Mark C as visited and add to RecStack.
  - Visit D:
    - D is already visited and not in RecStack; continue.
  - Remove C from RecStack and push to TopoStack.
- Remove A from RecStack and push to TopoStack.
Finalize Order:
- Pop from TopoStack: A, C, B, D.
- Reverse to get Order: A, B, C, D.

Result:

Topological Order: A, B, C, D

A C++ solution