System Architecture

Overall Workflow

The Code Analysis Agent follows a multi-stage workflow that transforms raw code repositories into queryable knowledge graphs:

  1. Repository Ingestion: When a user provides a GitHub repository URL through the Streamlit interface, the system clones it locally and presents the file structure. Users can then select specific source folders for analysis.

  2. Graph Construction: Selected Python files are processed through a pipeline that:

  • Parses each file to extract classes, functions, and their relationships
  • Uses LLMs to generate natural language descriptions of code components
  • Structures these descriptions into a graph format with typed nodes and edges
  • Uploads the resulting graph to a Neo4j database
  3. Query Processing: Once the graph database is populated, users can submit natural language questions about the codebase. These queries are:
  • Routed to the appropriate agent type via the supervisor
  • Translated into appropriate tool calls (Cypher queries, RAG lookups, etc.)
  • Processed to generate responses and/or visualizations
  • Presented to the user via the Streamlit interface

This workflow enables a seamless transition from raw code to interactive analysis without requiring users to understand the underlying graph structures or query languages.
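The three stages above can be sketched as a minimal pipeline. This is an illustrative outline only: the function and class names (`ingest_repository`, `build_graph`, `answer_query`, `CodeGraph`) are hypothetical stand-ins, and the real system involves Streamlit interaction, LLM calls, and a Neo4j upload at each stage.

```python
from dataclasses import dataclass, field

@dataclass
class CodeGraph:
    nodes: list = field(default_factory=list)   # typed nodes (files, classes, functions)
    edges: list = field(default_factory=list)   # typed relationships between nodes

def ingest_repository(repo_url: str) -> list[str]:
    """Stage 1: clone the repository and return the selected Python files."""
    # Placeholder: a real implementation would run `git clone` and let the
    # user pick source folders through the Streamlit interface.
    repo_name = repo_url.rstrip("/").split("/")[-1]
    return [f"{repo_name}/main.py"]

def build_graph(files: list[str]) -> CodeGraph:
    """Stage 2: parse files and structure LLM descriptions into a graph."""
    graph = CodeGraph()
    for path in files:
        # A real build step would also add class/function nodes and
        # LLM-generated descriptions before uploading to Neo4j.
        graph.nodes.append({"id": path, "type": "File"})
    return graph

def answer_query(graph: CodeGraph, question: str) -> str:
    """Stage 3: route the question through the supervisor and respond."""
    return f"Graph has {len(graph.nodes)} node(s); question was: {question!r}"

files = ingest_repository("https://github.com/example/repo")
graph = build_graph(files)
print(answer_query(graph, "What does main.py do?"))
```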

Agent Router System

The heart of the system’s intelligence is its agent router architecture, implemented using LangGraph. This architecture enables dynamic, state-driven processing of user queries through specialized agents:

```mermaid
flowchart TD
    %% Main nodes
    Q[User Query] --> S[Supervisor]
    S --> D[Decision Logic]

    %% Decision branches
    D -->|Detailed Code Analysis| M[Micro Agent]
    D -->|High-Level Overview| MA[Macro Agent]

    %% Agent tools
    M --> DT[Detailed Tools]
    MA --> ST[Strategic Tools]

    %% Analysis outputs
    DT --> RA[Relationship Analysis]
    ST --> CS[Code Summaries]

    %% Connect the outputs to the final response
    RA --> R[Response to User]
    CS --> R

    %% Styling: orange for user-facing nodes, blue for routing,
    %% green for the micro path, purple for the macro path
    style Q fill:#FFDDC1,stroke:#FF7F50,stroke-width:2px
    style R fill:#FFDDC1,stroke:#FF7F50,stroke-width:2px

    style S fill:#CDE7F0,stroke:#0277BD,stroke-width:2px
    style D fill:#CDE7F0,stroke:#0277BD,stroke-width:2px

    style M fill:#D1E8D0,stroke:#2E8B57,stroke-width:2px
    style DT fill:#D1E8D0,stroke:#2E8B57,stroke-width:2px
    style RA fill:#D1E8D0,stroke:#2E8B57,stroke-width:2px

    style MA fill:#E6D0E8,stroke:#8A2BE2,stroke-width:2px
    style ST fill:#E6D0E8,stroke:#8A2BE2,stroke-width:2px
    style CS fill:#E6D0E8,stroke:#8A2BE2,stroke-width:2px
```

The agent router consists of three primary components:

  1. Supervisor Node: Analyzes user queries to determine whether they require detailed code relationship analysis (micro) or high-level architectural summaries (macro). This node implements a decision-making process based on the query content and current conversation state.

  2. Micro Agent: Specialized for detailed code analysis, this agent has access to tools for:

  • Building and executing Cypher queries against the Neo4j database
  • Retrieving specific relationship information between code components
  • Generating network visualizations of code relationships
  • Providing detailed explanations using RAG-based retrieval
  3. Macro Agent: Focused on high-level architectural understanding, this agent can:
  • Generate natural language summaries of the codebase structure
  • Create Mermaid diagrams showing logical component relationships
  • Provide strategic insights about code organization

Each agent operates within a standardized LangGraph workflow, with structured prompting that guides the LLM toward appropriate tool selection and reasoning.
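The supervisor's micro/macro routing decision can be sketched as a plain routing function. This sketch substitutes a simple keyword heuristic for the LLM-based decision the system actually makes; the hint sets and tie-breaking rule are assumptions for illustration.

```python
# Keyword hints standing in for the supervisor's LLM-based decision logic.
MICRO_HINTS = {"function", "class", "calls", "cypher", "relationship", "depends"}
MACRO_HINTS = {"architecture", "overview", "summary", "structure", "design"}

def route_query(query: str) -> str:
    """Return 'micro' for detailed code analysis, 'macro' for high-level overviews."""
    words = set(query.lower().split())
    micro_score = len(words & MICRO_HINTS)
    macro_score = len(words & MACRO_HINTS)
    # On a tie, default to the micro agent (an arbitrary choice here).
    return "macro" if macro_score > micro_score else "micro"
```

In LangGraph terms, a function of this shape would typically be registered as a conditional edge from the supervisor node, mapping its return value (`"micro"` or `"macro"`) to the corresponding agent node.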

System Diagrams

Data Flow Diagram


The Data Flow Diagram illustrates the comprehensive pipeline through which code is processed, analyzed, and made available for interactive querying. At the foundation, Git repositories provide source code that enters our processing pipeline through a multi-stage transformation:

First, the code parser extracts structural elements from Python files, which are then enriched by the LLM processor that generates semantic descriptions and relationships. This processed information flows into two critical storage components: the Neo4j graph database (storing structured relationships) and the RAG vectorstore (enabling semantic retrieval).
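A structural extraction step of this kind can be implemented with Python's built-in `ast` module. The sketch below is an assumption about the parser's approach, not the project's actual code; the real pipeline would also attach LLM-generated descriptions to each element.

```python
import ast

def extract_structure(source: str) -> dict:
    """Extract class names, function names, and class-method containment edges."""
    tree = ast.parse(source)
    classes, functions, edges = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            classes.append(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    edges.append((node.name, "HAS_METHOD", item.name))
        elif isinstance(node, ast.FunctionDef):
            # Note: ast.walk visits methods too, so they also land in `functions`.
            functions.append(node.name)
    return {"classes": classes, "functions": functions, "edges": edges}

sample = """
class Greeter:
    def greet(self):
        return "hi"

def main():
    return Greeter().greet()
"""
print(extract_structure(sample))
```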

The central LangGraph Router orchestrates the system’s intelligence, dynamically directing queries to the appropriate analysis tools. These tools are divided into micro and macro capabilities: micro tools leverage the Cypher Query Builder to extract precise relationship data from Neo4j, while macro tools generate high-level summaries and architectural diagrams. Both tool sets produce visualizations and textual outputs that are presented through the Streamlit interface.

This architecture enables bidirectional information flow, where user queries trigger specific analysis pathways and results are transformed into appropriate visualizations or textual responses. The color-coded components highlight the distinct functional areas: orange for user-facing interfaces, blue for agent routing logic, and neutral for data processing and storage elements.

Component Diagram


The Component Diagram provides a structural view of the system’s modular architecture, highlighting the key functional units and their interactions. The system is organized into three primary component groups:

The Streamlit App serves as the user-facing layer, containing the Repo Cloner for importing code repositories, the Chat Interface for query input and response display, and the Visualization Panel for rendering interactive graphs and diagrams. This frontend communicates bidirectionally with the LangGraph Router, which acts as the central orchestration mechanism.

The Router intelligently delegates user queries to two specialized tool sets: Micro Tools and Macro Tools. Micro Tools include the Cypher Generator for constructing database queries, the Visualizer for rendering relationship graphs, and the RAG Query Tool for semantic retrieval. Macro Tools consist of the Text Generator for producing high-level summaries and the Mermaid Generator for creating architectural diagrams.
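A Cypher generator in the spirit of the Micro Tools might look like the sketch below. The node label `Function`, relationship type `CALLS`, and property name `name` are assumed schema details; production code should prefer parameterized Cypher queries over string interpolation to avoid injection.

```python
def build_calls_query(function_name: str, depth: int = 1) -> str:
    """Build a Cypher query finding what a function calls, up to `depth` hops."""
    # Crude guard against injection; real code should pass `function_name`
    # as a query parameter instead of interpolating it.
    if not function_name.isidentifier():
        raise ValueError(f"suspicious function name: {function_name!r}")
    return (
        f"MATCH (f:Function {{name: '{function_name}'}})"
        f"-[:CALLS*1..{depth}]->(callee:Function) "
        "RETURN DISTINCT callee.name"
    )

print(build_calls_query("process_file", depth=2))
```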

The Database layer forms the foundation of the system, housing both the Neo4j graph database for structured relationships and the RAG Vectorstore for semantic embeddings. This layer also includes the Database Initializer for setup and Graph Builders for transforming code into graph structures.
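The Graph Builders' transformation from extracted structure to database statements can be sketched as follows. Labels, property names, and the MERGE-per-statement shape are illustrative assumptions; a real builder would batch these through a Neo4j driver session rather than emit raw strings.

```python
def to_merge_statements(nodes: list[dict], edges: list[tuple]) -> list[str]:
    """Turn extracted nodes and edges into Cypher MERGE statements for Neo4j."""
    stmts = []
    for node in nodes:
        # One MERGE per node keeps the upload idempotent.
        stmts.append(f"MERGE (:{node['label']} {{name: '{node['name']}'}})")
    for src, rel, dst in edges:
        stmts.append(
            f"MATCH (a {{name: '{src}'}}), (b {{name: '{dst}'}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
    return stmts

stmts = to_merge_statements(
    [{"label": "Class", "name": "Greeter"}, {"label": "Function", "name": "greet"}],
    [("Greeter", "HAS_METHOD", "greet")],
)
print("\n".join(stmts))
```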

This component architecture emphasizes clean separation of concerns while maintaining efficient communication pathways between modules. The color-coding distinguishes user interface components (orange) from agent and tool components (blue) and data storage components (neutral), reflecting their distinct roles in the overall system.