VectorForgeML
Production-grade ML framework built from scratch in C++
Overview
A next-generation machine learning framework bridging R's simplicity with C++'s raw performance. Features 9+ algorithms, BLAS/LAPACK hardware acceleration, OpenMP parallelism, and zero-copy R integration via Rcpp.
VectorForgeML was designed to solve a real problem: R is excellent for statistical computing and data exploration, but its interpreted nature creates a performance ceiling for large-scale ML workloads. By pushing all compute-intensive operations into optimized C++ while maintaining an intuitive R interface, VectorForgeML delivers the best of both worlds.
Architecture
The framework is organized into three distinct layers, each responsible for a specific concern. Data flows from the R user interface through the zero-copy Rcpp bridge into the high-performance C++ core.
Algorithms Implemented
Every algorithm is implemented from scratch in C++ with no external ML library dependencies. Each implementation leverages hardware-specific optimizations where applicable.
Linear Regression: OLS via BLAS/LAPACK matrix operations. Hardware-accelerated normal equation solver.
Logistic Regression: Gradient descent with sigmoid activation. Configurable learning rate and epochs.
Ridge Regression: L2-regularized regression via Cholesky decomposition. Prevents overfitting with lambda tuning.
Softmax Regression: Multi-class classification with the Log-Sum-Exp numerical stability trick.
Decision Trees: Recursive partitioning with raw C++ node pointers. Custom node allocation.
Random Forest: Parallelized ensemble via OpenMP. Multi-core tree training with bagging.
K-Nearest Neighbors: Optimized with std::partial_sort for efficient k-selection without a full sort.
PCA: Principal Component Analysis via SVD/eigendecomposition. LAPACK-accelerated.
K-Means Clustering: Lloyd's algorithm with efficient centroid updates and convergence detection.
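The Log-Sum-Exp trick mentioned for Softmax Regression deserves a quick illustration: exp() overflows to infinity for arguments much above 700, so subtracting the maximum logit before exponentiating keeps every intermediate value finite without changing the result. A minimal standalone sketch (plain C++, not the framework's actual implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax via the Log-Sum-Exp trick: shifting all logits by the
// maximum before exponentiating prevents overflow, and the shift
// cancels out when normalizing.
std::vector<double> softmax(const std::vector<double>& logits) {
    double m = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    std::vector<double> out(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - m);  // exponent is <= 0, never overflows
        sum += out[i];
    }
    for (double& p : out) p /= sum;
    return out;
}
```

With logits like {1000, 1000} the naive formulation produces inf/inf = NaN, while the shifted version returns {0.5, 0.5} as expected.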
Technical Deep Dive
Under the hood, VectorForgeML makes deliberate low-level engineering decisions to maximize throughput while maintaining correctness and a clean API surface.
BLAS/LAPACK Integration
All linear algebra operations are routed through BLAS and LAPACK, meaning matrix multiplications, decompositions, and solvers run on hardware-optimized routines rather than naive loops. This single decision delivers order-of-magnitude speedups for algorithms like Linear Regression, Ridge Regression, and PCA.
// Naive O(n^3) matrix multiply
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];

// BLAS: single call, hardware-optimized
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            n, n, n, 1.0, A, n, B, n, 0.0, C, n);
OpenMP Parallelism
Ensemble methods like Random Forest benefit directly from thread-level parallelism. Each tree in the forest is trained independently, making this a naturally parallelizable workload. OpenMP distributes tree training across all available CPU cores with minimal synchronization overhead.
// Parallel tree training in Random Forest
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < n_trees; i++) {
    // Each thread trains one tree
    auto sample = bootstrap_sample(data);
    trees[i] = build_tree(sample, max_depth, min_samples);
}

// Aggregate predictions across trees
auto predictions = majority_vote(trees, X_test);
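One subtlety in parallel bagging: the bootstrap sampler must not share RNG state across threads (std::rand, for instance, is not thread-safe). A common pattern, shown here as a sketch rather than VectorForgeML's actual code, is to give each tree its own deterministically seeded engine:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Bootstrap sampling that is safe under OpenMP: each tree owns a
// deterministically seeded engine, so no RNG state is shared
// between threads and results are reproducible.
std::vector<int> bootstrap_indices(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);                                   // per-tree engine
    std::uniform_int_distribution<int> pick(0, static_cast<int>(n) - 1);
    std::vector<int> idx(n);
    for (auto& i : idx) i = pick(rng);                        // sample with replacement
    return idx;
}
```

Inside the parallel loop, iteration i would call something like bootstrap_indices(n_rows, base_seed + i), making the forest reproducible regardless of thread scheduling.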
Zero-Copy R Bridge
The Rcpp bridge eliminates data serialization overhead by sharing memory directly between R and C++. Rcpp's NumericMatrix wraps the underlying R object, so a raw double pointer can be obtained without copying; even gigabyte-scale datasets cross the language boundary at no transfer cost.
// Zero-copy: R matrix -> C++ pointer
// [[Rcpp::export]]
Rcpp::NumericVector cpp_predict(Rcpp::NumericMatrix X) {
    // Direct pointer access, no copy
    double* ptr = REAL(X);
    int nrow = X.nrow();
    int ncol = X.ncol();
    // Run prediction on raw memory
    return predict_internal(ptr, nrow, ncol);
}
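One caveat worth noting alongside the bridge: R stores matrices in column-major order, so C++ code reading the raw pointer must index accordingly. A minimal sketch (the accessor name is illustrative, not part of the VectorForgeML API):

```cpp
#include <cstddef>

// R matrices are column-major: element (i, j) of an nrow x ncol
// matrix sits at ptr[j * nrow + i] in the flat buffer.
inline double r_mat_at(const double* ptr, std::size_t nrow,
                       std::size_t i, std::size_t j) {
    return ptr[j * nrow + i];
}
```

Iterating columns in the outer loop and rows in the inner loop keeps memory access sequential, which matters once the zero-copy buffer is gigabytes in size.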
Pipeline Architecture
Inspired by scikit-learn's Pipeline API, VectorForgeML provides a composable pipeline system in R that chains preprocessing, training, and evaluation into a single reproducible workflow. This pattern prevents data leakage and ensures consistent transformations across train and test sets.
# Scikit-learn-style pipeline in R
pipeline <- VFPipeline(
  StandardScaler(),
  PCA(n_components = 10),
  RandomForest(
    n_trees = 100,
    max_depth = 12
  )
)

# Fit and predict in one call
pipeline$fit(X_train, y_train)
preds <- pipeline$predict(X_test)
Code Examples
VectorForgeML exposes a clean, high-level R API. All heavy computation happens in C++ behind the scenes.
library(VectorForgeML)

# Load and split data
data <- read.csv("dataset.csv")
split <- train_test_split(data, ratio = 0.8)

# Train Random Forest classifier
model <- RandomForest(
  n_trees = 200,
  max_depth = 15,
  min_samples = 5
)
model$fit(split$X_train, split$y_train)

# Evaluate
preds <- model$predict(split$X_test)
accuracy(split$y_test, preds)
confusion_matrix(split$y_test, preds)
library(VectorForgeML)

# Load and split data
data <- read.csv("housing.csv")
split <- train_test_split(data, ratio = 0.8)

# Train Ridge Regression
model <- RidgeRegression(
  lambda = 0.01
)
model$fit(split$X_train, split$y_train)

# Evaluate
preds <- model$predict(split$X_test)
rmse(split$y_test, preds)
r_squared(split$y_test, preds)
Research Paper
VectorForgeML: A Production-Grade Machine Learning Framework in C++
This paper presents VectorForgeML, a modular machine learning framework built entirely from scratch in C++ with R bindings. The framework implements 9+ supervised and unsupervised algorithms including Linear Regression, Logistic Regression, Ridge Regression, Softmax Regression, Decision Trees, Random Forests, KNN, PCA, and K-Means Clustering. Key architectural decisions include BLAS/LAPACK integration for hardware-accelerated linear algebra, OpenMP for multi-core parallelism, and zero-copy data exchange between R and C++ via Rcpp.
Installation
VectorForgeML is distributed as an R package and can be installed directly from GitHub using the remotes package.
install.packages("remotes")
remotes::install_github("mohd-musheer/VectorForgeML")