Hi, I'm Michal Borek

Machine Learning Engineer & AI Consultant

I build production ML systems, from RAG pipelines and LLM applications to computer vision models. My experience spans end-to-end development, research, and deployment of AI solutions.

About Me

I'm a Machine Learning Engineer graduating from Michigan State University with a degree in Computational Data Science and a Mathematics minor. I specialize in building end-to-end AI systems — from data pipelines and model training to production deployment. My recent work includes a full RAG system for legal document research with citation verification, medical imaging classifiers published at SPIE, and Agentic-AI tooling for high-performance computing.

Core competencies:

  • LLM applications, RAG pipelines, and retrieval systems
  • Computer Vision and medical imaging (published researcher)
  • Full-stack ML: Python, FastAPI, Docker, PostgreSQL, vector databases
  • High-performance computing and distributed systems
  • Production deployment and system architecture

I've contributed to published research at MIDI Lab (medical imaging) and built developer tools at MSU's Institute for Cyber-Enabled Research. I'm focused on building reliable, well-architected AI systems that deliver measurable results.

Education

B.S. Computational Data Science, Minor in Mathematics
Michigan State University
Spring 2026

Looking For

MLE / AI Engineer roles, ML consulting engagements, and research collaborations in NLP, computer vision, or applied AI.

My Skills

Programming Languages

  • Python: 95%
  • SQL: 90%
  • TypeScript: 80%
  • C++: 75%

ML & AI

  • PyTorch: 90%
  • RAG / LLMs: 90%
  • Scikit-Learn: 90%
  • TensorFlow: 75%
  • Pandas / NumPy: 95%
  • Vector DBs: 85%

Infrastructure & Web

  • FastAPI: 90%
  • Docker: 85%
  • Next.js: 80%
  • PostgreSQL: 85%
  • AWS: 70%
  • React: 75%

My Resume

Professional Resume

View my comprehensive resume showcasing my education, experience, projects, and skills in Machine Learning and Data Science. Updated with my latest accomplishments and research work.

Software Engineer Intern at iCER
Research Assistant at MIDI Lab
Computational Data Science Major

Resume PDF

View my complete professional resume

My Projects

End-to-end systems spanning ML pipelines, RAG architectures, and applied research

Legal AI — RAG Research Assistant
Featured

Open-source RAG system for legal and tax document research. Combines PostgreSQL full-text search with Qdrant vector similarity and a local LLM (Llama 3.1 via Ollama) to answer questions about uploaded documents. Every response passes through a citation verification layer that rejects claims not supported by retrieved evidence, which guards against hallucinated answers. Supports multi-turn conversations, a full audit trail, and a structure-aware chunker that understands legal document formatting.

Python, FastAPI, Next.js, RAG, LLM, Qdrant, PostgreSQL, Ollama, Docker, Hybrid Search, Citation Verification
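
As a rough illustration of how hybrid search can merge a keyword ranking with a vector ranking, here is a minimal sketch using reciprocal rank fusion (RRF). This is not the project's code: the choice of RRF, the function name, the constant `k=60`, and the chunk IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each ID receives 1 / (k + rank) from every list it appears in;
    IDs are returned by descending total score (ties broken by ID).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: one list from full-text search, one from vector search.
keyword_hits = ["sec_12", "sec_03", "sec_44"]
vector_hits = ["sec_03", "sec_27", "sec_12"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, which is the usual motivation for fusing by rank rather than by raw scores that live on different scales.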
DarkVision

Computer Vision model for classifying animals in dark, low-visibility images with 92% accuracy using fine-tuned ResNet-18.

Computer Vision, PyTorch, CNN, ResNet-18, Fine-Tuning
Auto Grader

Automated grading system that evaluates student code submissions against test suites, providing instant feedback and scoring.

Python, Unittest, Automation, Linux
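
To show the general idea of grading a submission against a test suite, here is a minimal sketch built on the standard-library `unittest` module. It is an illustration only, not the Auto Grader's implementation: the `grade` function, the case format, and the scoring rule are assumptions.

```python
import io
import unittest


def grade(student_func, cases):
    """Run each (args, expected) case as a unittest and summarize the result."""
    suite = unittest.TestSuite()
    for args, expected in cases:
        # Default arguments bind the current case to this check function.
        def check(args=args, expected=expected):
            actual = student_func(*args)
            if actual != expected:
                raise AssertionError(f"{args}: expected {expected}, got {actual}")

        suite.addTest(unittest.FunctionTestCase(check))

    # Send runner output to a buffer so only the summary is reported.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    total = result.testsRun
    failed = len(result.failures) + len(result.errors)
    passed = total - failed
    score = round(100 * passed / total) if total else 0
    return {"passed": passed, "total": total, "score": score}


# Hypothetical submission: an absolute-value function that mishandles negatives.
def student_abs(x):
    return x


report = grade(student_abs, [((3,), 3), ((-2,), 2), ((0,), 0)])
print(report)
```

Wrong answers count as failures and exceptions raised by the submission count as errors, so a crashing submission lowers the score rather than stopping the grader.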
QSide-Notebook

Browser-based data visualization tool that lets users explore and chart datasets without local environment setup.

JupyterLite, Python, SQL

Research & Publications

Research Publication
April 2025

Ordinal Classification Framework for Multiclass Grading of Pneumoconiosis

SPIE Medical Imaging 2025: Computer-Aided Diagnosis

Published research paper in SPIE Medical Imaging 2025 presenting a novel ordinal classification approach for automated pneumoconiosis severity grading.

This paper presents an ordinal classification framework specifically designed for multiclass grading of pneumoconiosis severity. Our approach addresses the inherent ordinal nature of pneumoconiosis progression stages, providing more accurate and clinically relevant automated assessment compared to traditional classification methods.

Machine Learning, Medical Imaging, Ordinal Classification, Computer Vision, SPIE

Published Paper

Access the publication online

DOI: 10.1117/12.3046353

Authors:

Liu, M., Loveless, I., Huang, Z., Borek, M., Rosenman, K., Alessio, A., Wang, L.

Research Poster
2025

UURAF Research Poster 2025

UURAF 2025 - Michigan State University

Research poster presented at the University Undergraduate Research and Arts Forum (UURAF), showcasing AI-powered pneumoconiosis classification using chest radiographs.

Pneumoconiosis is an occupational lung disease caused by inhaling mineral dust, and chest radiography remains the key screening tool. Although standardization efforts by the ILO and NIOSH, such as the B Reader Certification Program, have improved consistency, challenges like reader variability, a limited number of certified readers, and potential conflicts of interest persist. This study leverages artificial intelligence to objectively classify pneumoconiosis severity on a 4-point scale (0–3) using posterior-anterior chest radiographs from the NIOSH repository. A ResNet framework employing various loss functions (cross-entropy, CORN, CORAL, focal staging, hierarchical, and hierarchical cross-entropy) is explored to enhance diagnostic reliability.

Machine Learning, Medical Imaging, Computer Vision, ResNet, UURAF
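
For readers unfamiliar with ordinal losses, here is a minimal sketch of the threshold encoding that CORAL-style methods commonly use for a 0–3 grade. It illustrates the general idea only and is not taken from the poster or the paper: the function names and the 0.5 threshold are assumptions.

```python
def encode_ordinal(grade, num_classes=4):
    """Turn a grade into num_classes - 1 binary targets: 'is grade > k?'."""
    return [1 if grade > k else 0 for k in range(num_classes - 1)]


def decode_ordinal(probs, threshold=0.5):
    """Recover a grade by counting how many thresholds are exceeded."""
    return sum(1 for p in probs if p > threshold)


# Grade 2 exceeds thresholds 0 and 1, but not threshold 2.
targets = encode_ordinal(2)

# Hypothetical sigmoid outputs from a model with three threshold heads.
predicted = decode_ordinal([0.9, 0.7, 0.2])

print(targets, predicted)
```

Unlike plain cross-entropy, this encoding makes a prediction of grade 3 for a true grade 0 miss three targets, while a prediction of grade 1 misses only one, so errors are penalized by how far off they are.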

Research Poster
January 2025

HPC Agentic-AI Framework for Batch Job Script Validation

iCER MidSURE 2025 - Michigan State University

Research poster presenting an innovative Agentic-AI framework designed to help HPC users validate batch submission scripts using large language models to reduce computational waste.

High-performance computing (HPC) users frequently make errors when writing batch job submission scripts, e.g., syntax errors, references to unavailable software installations or data files, and inappropriate resource requests for the given cluster. These errors generally result in failed submissions or, worse, jobs that fail after spending significant time in a queue or after running for some time on the cluster, which wastes compute cycles and energy and needlessly prolongs the research cycle. This project aims to help HPC users increase efficiency and productivity by employing an HPC-hosted large language model (LLM) as an Agentic-AI framework designed to examine batch submission scripts and advise users on potential errors in syntax, software/file references, and resource allocation prior to submission. We first focus our efforts on the Michigan State University High-Performance Computing Center, using the 'codellama' family of LLMs.

High Performance Computing, Large Language Models, Agentic AI, CodeLlama, Software Engineering, iCER
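
To make the kinds of errors described above concrete, here is a minimal rule-based sketch of a pre-submission check. It is not the poster's framework, which relies on an LLM. The sketch assumes Slurm-style `#SBATCH` directives (the poster does not name a scheduler), and the function names and required-directive list are hypothetical.

```python
import re

# Matches lines such as "#SBATCH --time=04:00:00" or "#SBATCH --mem 8G".
DIRECTIVE = re.compile(r"^#SBATCH\s+--([\w-]+)(?:[=\s]+(\S+))?")


def parse_directives(script_text):
    """Collect scheduler directives from a batch script as {name: value}."""
    directives = {}
    for line in script_text.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match:
            directives[match.group(1)] = match.group(2)
    return directives


def find_issues(script_text, required=("time", "mem")):
    """Return human-readable warnings for a few easily detected problems."""
    issues = []
    if not script_text.startswith("#!"):
        issues.append("missing shebang line")
    directives = parse_directives(script_text)
    for name in required:
        if name not in directives:
            issues.append(f"missing --{name} request")
    return issues


# Hypothetical script that forgets to request memory.
script = """#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
python train.py
"""

issues = find_issues(script)
print(issues)
```

Deterministic checks like these could be passed to a model as context or run alongside it, leaving the LLM to handle problems that simple rules cannot express.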

Get In Touch

Let's Connect

I'm currently looking for new opportunities and collaborations in the field of Machine Learning. Feel free to reach out if you have any questions or just want to say hi!