# Backend Architecture Guide
This document details the Resume Matcher backend architecture, design decisions, and implementation patterns.
## Overview
The backend is a lean, local-first FastAPI application designed for:
- Single-user local deployment
- Multi-provider AI support
- JSON-based storage (no database server required)
- Minimal dependencies
## Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | FastAPI | Async API with automatic OpenAPI docs |
| Database | TinyDB | JSON file storage, zero configuration |
| AI Integration | LiteLLM | Unified API for 100+ LLM providers |
| Doc Parsing | markitdown | PDF/DOCX to Markdown conversion |
| Validation | Pydantic | Request/response schema validation |
## Directory Structure

```
apps/backend/
├── app/
│   ├── __init__.py        # Package version
│   ├── main.py            # FastAPI app entry point
│   ├── config.py          # Pydantic settings (env vars)
│   ├── database.py        # TinyDB wrapper
│   ├── llm.py             # LiteLLM multi-provider wrapper
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── health.py      # Health & status endpoints
│   │   ├── config.py      # LLM configuration endpoints
│   │   ├── resumes.py     # Resume CRUD & improvement
│   │   └── jobs.py        # Job description storage
│   ├── services/
│   │   ├── __init__.py
│   │   ├── parser.py      # Document parsing service
│   │   └── improver.py    # AI resume improvement service
│   ├── schemas/
│   │   ├── __init__.py
│   │   └── models.py      # All Pydantic models
│   └── prompts/
│       ├── __init__.py
│       └── templates.py   # LLM prompt templates
├── data/                  # TinyDB storage directory
│   ├── .gitkeep
│   ├── database.json      # Main data (gitignored)
│   └── config.json        # LLM config (gitignored)
├── pyproject.toml         # Dependencies & build config
├── requirements.txt       # Pip-compatible deps
├── .env.example           # Environment template
└── .gitignore
```
## Core Modules
### 1. Configuration (`app/config.py`)

Uses `pydantic-settings` to load configuration from environment variables:

```python
class Settings(BaseSettings):
    llm_provider: Literal["openai", "anthropic", "openrouter", "gemini", "deepseek", "ollama"]
    llm_model: str
    llm_api_key: str
    llm_api_base: str | None  # For Ollama/custom endpoints
    host: str
    port: int
    data_dir: Path
```

Settings are loaded from:

- Environment variables (highest priority)
- `.env` file
- Defaults
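
A minimal sketch of how this precedence chain composes with `pydantic-settings`; the defaults shown here are illustrative, not the project's actual values:

```python
from pathlib import Path
from typing import Literal

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # .env is read when present; real environment variables override its values
    model_config = SettingsConfigDict(env_file=".env")

    llm_provider: Literal[
        "openai", "anthropic", "openrouter", "gemini", "deepseek", "ollama"
    ] = "openai"
    llm_model: str = "gpt-4o-mini"
    llm_api_key: str = ""
    llm_api_base: str | None = None  # For Ollama/custom endpoints
    host: str = "0.0.0.0"
    port: int = 8000
    data_dir: Path = Path("data")


settings = Settings()  # env var > .env entry > class default, field by field
```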
### 2. Database (`app/database.py`)

TinyDB wrapper providing typed access to collections:

```python
db = Database()

# Tables
db.resumes       # Resume documents
db.jobs          # Job descriptions
db.improvements  # Improvement results

# Operations
db.create_resume(content, content_type, filename, is_master, processed_data)
db.get_resume(resume_id)
db.get_master_resume()
db.delete_resume(resume_id)      # Returns True if deleted, False if not found
db.list_resumes()                # Returns all resumes
db.set_master_resume(resume_id)  # Sets a resume as master
db.create_job(content, resume_id)
db.get_stats()
```
Why TinyDB?
- Pure Python, no server process
- Stores data as readable JSON
- Perfect for local single-user apps
- Easy backup (just copy the file)
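
For illustration, the wrapper pattern might look like the sketch below; the actual `Database` class in `app/database.py` may differ in field names and details:

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path

from tinydb import Query, TinyDB


class Database:
    def __init__(self, path: Path = Path("data/database.json")) -> None:
        self._db = TinyDB(path)
        self.resumes = self._db.table("resumes")
        self.jobs = self._db.table("jobs")
        self.improvements = self._db.table("improvements")

    def create_resume(self, content: str, content_type: str, filename: str,
                      is_master: bool = False,
                      processed_data: dict | None = None) -> str:
        resume_id = str(uuid.uuid4())
        self.resumes.insert({
            "resume_id": resume_id,
            "content": content,
            "content_type": content_type,
            "filename": filename,
            "is_master": is_master,
            "processed_data": processed_data,
            "created_at": datetime.now(timezone.utc).isoformat(),
        })
        return resume_id

    def get_resume(self, resume_id: str) -> dict | None:
        return self.resumes.get(Query().resume_id == resume_id)

    def delete_resume(self, resume_id: str) -> bool:
        # TinyDB returns the list of removed doc IDs; empty means not found
        return bool(self.resumes.remove(Query().resume_id == resume_id))
```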
### 3. LLM Integration (`app/llm.py`)

LiteLLM wrapper for multi-provider support with robust JSON handling:

```python
# Check provider health
health = await check_llm_health(config)

# Text completion
response = await complete(prompt, system_prompt, config)

# JSON completion (with JSON mode, retries, and extraction)
data = await complete_json(prompt, system_prompt, config, retries=2)
```
Key Features:
| Feature | Description |
|---|---|
| JSON Mode | Auto-enables response_format={"type": "json_object"} for supported providers |
| Retry Logic | 2 automatic retries with lower temperature on each attempt (0.1 → 0.0) |
| JSON Extraction | Bracket-matching algorithm handles malformed responses and markdown blocks |
| Timeouts | Configurable per-operation: 30s (health), 120s (completion), 180s (JSON) |
JSON Mode Support:

```python
def _supports_json_mode(provider: str, model: str) -> bool:
    # Supported: openai, anthropic, gemini, deepseek
    # OpenRouter: claude, gpt-4, gpt-3.5, gemini, mistral models
    ...
```
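
The bracket-matching extraction mentioned above could be implemented along these lines; this is a simplified sketch, not the exact helper in `app/llm.py`:

```python
import json


def extract_json(text: str) -> dict:
    """Recover a JSON object from a response wrapped in prose or markdown fences.

    Simplified sketch: scans for the first balanced {...} span. (It ignores
    braces inside string literals, which the real helper would need to handle.)
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced braces in response")
```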
Supported Providers:
| Provider | Model Format | API Key Env Var |
|---|---|---|
| OpenAI | `gpt-4o-mini` | `OPENAI_API_KEY` |
| Anthropic | `anthropic/claude-3-5-sonnet` | `ANTHROPIC_API_KEY` |
| OpenRouter | `openrouter/model-name` | `OPENROUTER_API_KEY` |
| Google Gemini | `gemini/gemini-1.5-flash` | `GEMINI_API_KEY` |
| DeepSeek | `deepseek/deepseek-chat` | `DEEPSEEK_API_KEY` |
| Ollama | `ollama/llama3.2` | None (local) |
### 4. Services
#### Parser Service (`app/services/parser.py`)

Handles document conversion:

```python
# Convert PDF/DOCX to Markdown
markdown = await parse_document(file_bytes, filename)

# Parse Markdown to structured JSON via LLM
structured = await parse_resume_to_json(markdown_text)
```
#### Improver Service (`app/services/improver.py`)

AI-powered resume optimization:

```python
# Extract keywords from job description
keywords = await extract_job_keywords(job_description)

# Score resume against job requirements
score = await score_resume(resume_text, keywords)

# Generate improved resume
improved = await improve_resume(original, job_desc, score, keywords)
```
## API Endpoints

### Health & Status

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/health` | GET | LLM connectivity check |
| `/api/v1/status` | GET | Full app status |
Status Response:

```json
{
  "status": "ready | setup_required",
  "llm_configured": true,
  "llm_healthy": true,
  "has_master_resume": true,
  "database_stats": {
    "total_resumes": 5,
    "total_jobs": 3,
    "total_improvements": 2
  }
}
```
### Configuration

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/config/llm-api-key` | GET | Get current config (key masked) |
| `/api/v1/config/llm-api-key` | PUT | Update LLM config |
| `/api/v1/config/llm-test` | POST | Test LLM connection |
### Resumes

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/resumes/upload` | POST | Upload PDF/DOCX |
| `/api/v1/resumes?resume_id=` | GET | Fetch resume by ID |
| `/api/v1/resumes/list` | GET | List resumes (optionally include master) |
| `/api/v1/resumes/{id}` | PATCH | Update resume JSON |
| `/api/v1/resumes/{id}/pdf` | GET | Download resume PDF |
| `/api/v1/resumes/improve` | POST | Tailor for job |
| `/api/v1/resumes/{id}` | DELETE | Delete resume |
### Jobs

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/jobs/upload` | POST | Store job description |
| `/api/v1/jobs/{id}` | GET | Fetch job by ID |
## Data Flow

### Resume Upload Flow
1. User uploads PDF/DOCX
2. markitdown converts to Markdown
3. (Optional) LLM parses to structured JSON
4. Store in TinyDB with resume_id
5. Return resume_id to frontend
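
A sketch of how the upload handler might wire these steps together (the real route in `app/routers/resumes.py` may differ; imports follow the module layout above):

```python
from fastapi import APIRouter, UploadFile

from app.database import Database
from app.services.parser import parse_document, parse_resume_to_json

router = APIRouter(prefix="/api/v1/resumes", tags=["resumes"])
db = Database()


@router.post("/upload")
async def upload_resume(file: UploadFile):
    raw = await file.read()
    markdown = await parse_document(raw, file.filename)  # markitdown conversion
    structured = await parse_resume_to_json(markdown)    # optional LLM parse
    resume_id = db.create_resume(
        content=markdown,
        content_type="text/markdown",
        filename=file.filename,
        is_master=False,
        processed_data=structured,
    )
    return {"resume_id": resume_id}
```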
### Resume Improvement Flow
1. Frontend sends resume_id + job_id
2. Fetch resume and job from DB
3. Extract keywords from job description (LLM with JSON mode)
4. Generate improved/tailored resume (LLM with JSON mode + retries)
5. Store tailored resume with parent link
6. Return improvement results with resume_preview
Note: Scoring feature was removed in v1 to focus on keyword alignment quality.
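
Sketched as code, the flow might look like this; `db.get_job` and the return shape are assumptions for illustration, and the `score` argument to `improve_resume` is omitted per the note above:

```python
from app.services.improver import extract_job_keywords, improve_resume


async def improve_for_job(db, resume_id: str, job_id: str) -> dict:
    resume = db.get_resume(resume_id)
    job = db.get_job(job_id)  # assumed accessor, mirroring get_resume

    # Step 3: keyword extraction (LLM with JSON mode)
    keywords = await extract_job_keywords(job["content"])

    # Step 4: tailored rewrite (LLM with JSON mode + retries)
    improved = await improve_resume(resume["content"], job["content"], keywords)

    # Step 5: store the tailored resume with a link back to its parent
    new_id = db.create_resume(
        content=improved["markdown"],
        content_type="text/markdown",
        filename=f"tailored-{resume_id}.md",
        is_master=False,
        processed_data={"parent_resume_id": resume_id, **improved},
    )
    # Step 6: return results with a preview for the frontend
    return {"resume_id": new_id, "resume_preview": improved}
```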
### Resume Delete Flow
1. Frontend sends DELETE /api/v1/resumes/{resume_id}
2. Backend calls db.delete_resume(resume_id)
3. TinyDB removes the document from the resumes table
4. Return success message or 404 if not found
5. Frontend clears localStorage if deleting master resume
6. Frontend shows success confirmation dialog
7. Frontend redirects to dashboard
8. Dashboard refreshes resume list on focus
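
The backend half of this flow is small; a sketch of the route (illustrative, the actual handler may differ):

```python
from fastapi import APIRouter, HTTPException

from app.database import Database

router = APIRouter(prefix="/api/v1/resumes", tags=["resumes"])
db = Database()


@router.delete("/{resume_id}")
async def delete_resume(resume_id: str):
    # db.delete_resume returns False when the ID is unknown
    if not db.delete_resume(resume_id):
        raise HTTPException(status_code=404, detail="Resume not found")
    return {"message": "Resume deleted"}
```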
Important Notes:

- The `is_master` flag in the database and `master_resume_id` in localStorage can get out of sync
- Dashboard filters tailored resumes by BOTH the `is_master` flag AND the localStorage master ID
- Dashboard refreshes the resume list when the window gains focus (handles navigation back from the viewer)
## Prompt Templates

Located in `app/prompts/templates.py`:

| Prompt | Purpose |
|---|---|
| `PARSE_RESUME_PROMPT` | Convert Markdown to structured JSON |
| `EXTRACT_KEYWORDS_PROMPT` | Extract requirements from JD |
| `IMPROVE_RESUME_PROMPT` | Generate tailored resume |
| `RESUME_SCHEMA_EXAMPLE` | JSON schema example for structured output |
Prompt Design Guidelines:

- Keep prompts simple: avoid complex escaping like `{{`; use a single `{variable}` for substitution
- Be direct: start with "Parse this..." or "Extract..." rather than lengthy preambles
- Include schema examples: show the expected JSON structure in the prompt
- Explicit output format: end with "Output ONLY the JSON object, no other text"
Example prompt structure:

```python
PARSE_RESUME_PROMPT = """Parse this resume into JSON. Output ONLY the JSON object, no other text.

Example output format:
{schema}

Rules:
- Use "" for missing text fields, [] for missing arrays
- Number IDs starting from 1

Resume to parse:
{resume_text}"""
```
## Configuration

### Environment Variables

```bash
# Required for cloud providers
LLM_PROVIDER=openai                  # openai|anthropic|openrouter|gemini|deepseek|ollama
LLM_MODEL=gpt-4o-mini                # Model identifier
LLM_API_KEY=sk-...                   # API key (not needed for Ollama)

# Optional
LLM_API_BASE=http://localhost:11434  # For Ollama or custom endpoints
HOST=0.0.0.0
PORT=8000
```
### Runtime Configuration

Users can update LLM config via the API without restarting:

```bash
curl -X PUT http://localhost:8000/api/v1/config/llm-api-key \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "model": "claude-3-5-sonnet", "api_key": "sk-ant-..."}'
```

Config is stored in `data/config.json` and takes precedence over env vars.
## Development

### Running Locally

```bash
cd apps/backend

# Install dependencies
uv pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env
# Edit .env with your API key

# Run with auto-reload
uv run uvicorn app.main:app --reload --port 8000

# Or directly
uv run python -m app.main
```
### Adding a New Endpoint

- Create/update router in `app/routers/`
- Add Pydantic models to `app/schemas/models.py`
- Export in `app/schemas/__init__.py`
- Register router in `app/main.py`
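
Put together, a minimal router for a hypothetical `notes` feature would look like this (the feature and names are invented for illustration):

```python
# app/routers/notes.py  (hypothetical example feature)
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/api/v1/notes", tags=["notes"])


class NoteCreate(BaseModel):  # would normally live in app/schemas/models.py
    text: str


@router.post("")
async def create_note(payload: NoteCreate):
    return {"text": payload.text}


# app/main.py would then register it:
# app.include_router(notes.router)
```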
### Adding a New LLM Provider

LiteLLM handles most providers automatically. For custom providers:

- Update `get_model_name()` in `app/llm.py`
- Add env var mapping in `setup_llm_environment()`
- Update the `LLM_PROVIDER` literal in `app/config.py`
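
For example, the model-name mapping might follow the provider-prefix convention shown in the Supported Providers table (a sketch; the real `get_model_name()` may differ):

```python
def get_model_name(provider: str, model: str) -> str:
    # OpenAI models are passed through bare (e.g. "gpt-4o-mini");
    # other providers get a "provider/" prefix (e.g. "ollama/llama3.2")
    if provider == "openai" or model.startswith(f"{provider}/"):
        return model
    return f"{provider}/{model}"
```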
## Dependencies

The backend uses minimal dependencies (9 packages):

- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `python-multipart` - File uploads
- `pydantic` - Data validation
- `pydantic-settings` - Config management
- `tinydb` - JSON database
- `litellm` - Multi-provider LLM
- `markitdown` - Document conversion
- `python-dotenv` - Env file loading
## Error Handling

All endpoints return consistent error responses:

```json
{
  "detail": "Error message here"
}
```
| Status | Meaning |
|---|---|
| 400 | Bad request (validation error) |
| 404 | Resource not found |
| 413 | File too large |
| 422 | Unprocessable (parsing failed) |
| 500 | Server error (LLM failure) |
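
Raising `fastapi.HTTPException` produces the `{"detail": ...}` shape automatically; for example (the size limit here is illustrative, not the project's actual value):

```python
from fastapi import HTTPException

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # illustrative limit


def ensure_upload_size(raw: bytes) -> None:
    if len(raw) > MAX_UPLOAD_BYTES:
        # Serialized by FastAPI as {"detail": "File too large"} with status 413
        raise HTTPException(status_code=413, detail="File too large")
```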