Recruitment AI

AI Resume Ranking System

AI-powered resume screening with TF-IDF vectorization and skill gap analysis

1000+ Resumes Processed

TF-IDF Vectorization

NLP Text Processing

Cosine Similarity Metric

Overview

An intelligent resume screening and ranking system that automates the candidate evaluation process using Natural Language Processing. The system processes over 1000 resumes, extracts relevant skills and experience, and ranks candidates against job descriptions using TF-IDF vectorization and cosine similarity scoring.

Built during the Future Interns ML internship, this project addresses a real recruitment challenge: manually screening hundreds of resumes is time-consuming and inconsistent. Scoring every resume against the same TF-IDF and cosine similarity criteria makes candidate ranking consistent and reproducible, based on skill alignment.

Tech Stack

Python Scikit-learn TF-IDF NLP Pandas NumPy NLTK Cosine Similarity

System Architecture

The pipeline follows a three-stage approach: text extraction and preprocessing, feature vectorization, and similarity-based ranking.

Text Processing: Resume Parsing, Text Cleaning, Tokenization, Stop Word Removal
→ Vectorization: TF-IDF Feature Extraction, N-gram Analysis, Term Weighting
→ Ranking Engine: Cosine Similarity, Score Ranking, Skill Gap Report
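
The text processing stage might look like the following minimal sketch. It uses only the Python standard library. The stop word list and the `strip_suffix` helper are hypothetical stand-ins (the project lists NLTK, whose stop word corpus and stemmers would be the more likely choice), and the function names are illustrative rather than taken from the project's code.

```python
import re

# Tiny illustrative stop word list (an assumption for this sketch); a real
# pipeline would likely use a fuller list such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "is", "are"}


def strip_suffix(token: str) -> str:
    """Crude stand-in for a stemmer: trims a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and apply the crude stemmer."""
    text = text.lower()
    # Keep letters, digits, '+' and '#' so tokens like "c++" and "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text)
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]
```

Calling `preprocess` on both the resume text and the job description keeps the two in the same normalized form before vectorization.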

How It Works

Resume Parsing: Raw resume text is extracted and cleaned. Noise (headers, footers, formatting artifacts) is removed, and the text is normalized through lowercasing, tokenization, stop word removal, and stemming.
Job Description Processing: The target job description goes through the same preprocessing pipeline, so both documents are in a comparable format for vectorization.
TF-IDF Vectorization: Resumes and the job description are transformed into TF-IDF vectors. Term Frequency-Inverse Document Frequency captures not just whether a keyword is present but how important each term is relative to the rest of the corpus.
Cosine Similarity Scoring: Each resume vector is compared against the job description vector using cosine similarity. Because TF-IDF weights are non-negative, this yields a score between 0 and 1, where higher values indicate a closer match to the job requirements.
Ranking & Skill Gap Analysis: Candidates are ranked by similarity score. The system also identifies missing skills (terms present in the job description but absent from the resume) and reports them as a skill gap for each candidate.
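
As a rough illustration of the vectorization, scoring, and skill gap steps, here is a minimal sketch built on Scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The `rank_resumes` function, its dictionary input, and the choice to fit one vocabulary over the resumes plus the job description are assumptions made for illustration, not the project's actual implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_resumes(resumes: dict[str, str], job_description: str):
    """Return (name, score, missing_terms) tuples sorted by descending score.

    `resumes` maps a candidate name to resume text (hypothetical input format).
    """
    names = list(resumes)
    # Fit on resumes plus the job description so all share one vocabulary.
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([resumes[n] for n in names] + [job_description])
    resume_vectors, job_vector = matrix[:-1], matrix[-1:]

    # One similarity score per resume, computed on the sparse TF-IDF matrix.
    scores = cosine_similarity(resume_vectors, job_vector).ravel()

    # Skill gap: terms the job description uses that the resume does not.
    analyze = vectorizer.build_analyzer()
    job_terms = set(analyze(job_description))

    results = []
    for name, score in zip(names, scores):
        missing = sorted(job_terms - set(analyze(resumes[name])))
        results.append((name, float(score), missing))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

In this sketch the skill gap is a plain set difference over analyzed tokens, so it treats every non-stop-word term in the job description as a "skill"; a curated skill vocabulary would give cleaner reports.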

Key Features

Scalable Processing: Handles 1000+ resumes efficiently using vectorized NumPy operations and the sparse matrix representations produced by Scikit-learn's TF-IDF implementation.
Skill Gap Analysis: Beyond ranking, the system identifies which required skills are missing from each candidate's resume, giving recruiters and applicants actionable feedback.
Configurable Matching: TF-IDF parameters (n-gram range, max features, document frequency thresholds) are tunable to adjust matching sensitivity for different job domains.
Objective Scoring: Applies the same mathematical criteria to every candidate, reducing the inconsistency and subjectivity of manual initial screening.
Batch Processing: Processes an entire resume directory at once and outputs a ranked CSV report with a score and skill gap analysis for each candidate.
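
To make the configurable matching and batch processing features more concrete, the sketch below scores every `.txt` file in a directory and writes a ranked CSV with Pandas. The `rank_directory` function, the `.txt` file layout, and the `candidate` and `score` column names are assumptions for illustration; the keyword arguments are passed straight through to Scikit-learn's `TfidfVectorizer`. Skill gap columns are left out to keep the example short.

```python
from pathlib import Path

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_directory(resume_dir, job_text, out_csv, **tfidf_params):
    """Score every .txt file in resume_dir against job_text and write a CSV."""
    paths = sorted(Path(resume_dir).glob("*.txt"))
    if not paths:
        raise FileNotFoundError(f"no .txt resumes found in {resume_dir}")
    texts = [p.read_text(encoding="utf-8") for p in paths]

    # TfidfVectorizer keywords such as ngram_range, max_features, min_df,
    # and max_df can be passed in to tune matching sensitivity.
    params = {"stop_words": "english", **tfidf_params}
    vectorizer = TfidfVectorizer(**params)
    matrix = vectorizer.fit_transform(texts + [job_text])

    # Last row is the job description; all earlier rows are resumes.
    scores = cosine_similarity(matrix[:-1], matrix[-1:]).ravel()

    report = pd.DataFrame({"candidate": [p.stem for p in paths], "score": scores})
    report = report.sort_values("score", ascending=False).reset_index(drop=True)
    report.to_csv(out_csv, index=False)
    return report
```

For example, `rank_directory("./resumes/", job_text, "ranked.csv", ngram_range=(1, 2), max_features=5000)` would also match two-word phrases while capping the vocabulary size.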

# Example: Rank resumes against a job description
from resume_ranker import ResumeRanker

ranker = ResumeRanker()
ranker.load_resumes("./resumes/")
ranker.set_job_description("job_description.txt")
results = ranker.rank()

# Output: Ranked candidates with similarity scores
for candidate in results:
    print(f"{candidate.name}: {candidate.score:.2%}")
    print(f"  Missing skills: {candidate.skill_gaps}")
