Software Engineer · Data & AI

Nilanjan Deb
Data Engineer & Software Developer

Software Engineer at Harness working on cloud cost management. Previously built scalable data pipelines and AI-driven analytics at Glance. BITS Pilani alumnus.

About

Who I Am

Currently a Software Engineer at Harness on the CCM (Cloud Cost Management) team, focused on the Billing Data Ingestion Pipeline. Previously spent three years at Glance building large-scale data infrastructure, from real-time streaming pipelines and data lake architectures to AI-powered analytics and internal developer tooling.

I graduated from BITS Pilani with a Bachelor of Engineering in Computer Science, where I built a strong foundation across Data Structures, Algorithms, Operating Systems, Databases, and Networks. That foundation shapes how I approach system design — with an emphasis on correctness, scalability, and performance.

My work spans the full data stack: streaming pipelines with Spark and Kafka, query engines with Trino and Iceberg, AI integrations with LLMs, and cloud cost infrastructure. I care about building systems that are reliable, efficient, and observable.

Data Engineering

Big Data, ETL, Streaming Pipelines

AI & Analytics

LLM Integration, AI-driven Workflows

Full-Stack Dev

APIs, Microservices, Web Platforms

System Design

LLD, HLD, Performance Optimization

Experience

Professional Experience

Software Engineer

Harness

Aug 2025 – Present

Key Work

• Contributing to the CCM (Cloud Cost Management) team at Harness
• Focused on the Billing Data Ingestion Pipeline for cloud cost analytics

Cloud Cost Management

Data Ingestion

Billing Pipelines

Java

Spring Boot

Software Engineer

Glance

Jul 2022 – Jul 2025

98% Latency Reduction

From 1 hour to 1 minute

70% Less Data Onboarding Time

SQL query interface for analysts

30% Cost Reduction

Infrastructure optimization

Key Achievements

• Implemented Spark Streaming with Kafka writing to Iceberg tables, reducing data latency from 1 hour to 1 minute
• Architected and deployed a Trino gateway system, optimizing query performance and resource utilization
• Engineered a SQL query interface for analysts, reducing data onboarding time by 70% and improving metric reporting
• Developed a CDC pipeline using Hudi and Debezium, transforming OLTP tables into OLAP tables for BI analytics
• Integrated OPA (Open Policy Agent) with Trino to enforce row-level filters and column masking, improving data security compliance
• Standardized infrastructure with Helm charts, enabling one-click deployment of Trino, Superset, Hive Metastore, and Airflow
• Designed an A/B testing experimentation platform, driving data-driven product decisions
• Built a distributed job scheduling pipeline with BullMQ and Redis, reducing content onboarding time by 80%
• Developed a content generation pipeline using Celery workers and LLM integration for articles and images
• Deployed Vanna.AI for AI-powered analytics, improving data analyst productivity
• Built an enterprise configuration store with SSO authentication and rule-based ACLs

Apache Spark

Kafka

Trino

Apache Iceberg

Hudi

Debezium

Vanna.AI

LLM

BullMQ

Redis

Celery

Helm

OPA

Airflow

Leadership & Technical Responsibilities

System Design (LLD/HLD)

Team Mentoring

Code Reviews

Cross-functional Coordination

Projects

Featured Projects

Data Engineering & AI

AI-Powered Analytics Platform

Professional

Deployed Vanna.AI to enable natural language querying over the data warehouse, so analysts can query data without writing SQL

• Natural language to SQL
• Integrated with Trino query engine
• Improved analyst productivity

Vanna.AI

LLM

Python

Apache Spark

Trino

Real-time Data Streaming Pipeline

Professional

Replaced hourly batch jobs with Spark Streaming + Kafka, cutting data latency from 1 hour to under 1 minute

• 98% latency reduction
• OLAP-ready Iceberg tables
• CDC with Debezium

Apache Spark

Kafka

Trino

Apache Iceberg

Hudi

Trino Monitoring Stack

Open Source

Production-grade observability for Trino clusters — JMX metrics scraped by Prometheus and visualized in Grafana

• Cluster-level JMX metrics
• Pre-built Grafana dashboards
• Query performance tracking

Trino

JMX

Prometheus

Grafana

Docker

Trino + Apache Ranger Security

Open Source

Fine-grained authorization for Trino using Apache Ranger: table-, column-, and row-level access policies in a Docker-based setup

• Column & row-level policies
• Ranger admin UI
• Reproducible Docker setup

Trino

Apache Ranger

Docker

SQL

Trino + Hive Data Lakehouse

Open Source

End-to-end local data lakehouse setup — Trino query engine over Hive Metastore with MinIO object storage

• Local lakehouse environment
• Hive Metastore integration
• S3-compatible storage

Trino

Hive Metastore

MinIO

Docker Compose

Distributed Systems

Remote Code Execution Engine

Microservice that securely executes arbitrary code in isolated environments with concurrency control and resource limits

• Sandboxed execution
• Concurrent job management
• Resource isolation

Node.js

Docker

Redis

Express.js

Multiplayer Real-time Game Server

In-memory game server for multiplayer Tic-Tac-Toe — real-time state sync across clients with Socket.IO

• Real-time bidirectional events
• In-memory game state
• Room-based matchmaking

Socket.IO

Node.js

React

In-memory state

Food Ordering Platform

Full-stack food ordering app with cart management, restaurant listings, auth, and order tracking

• 56 GitHub stars
• End-to-end full-stack
• Firebase auth & storage

Vue.js

Node.js

MongoDB

Express.js

Firebase

Contact

Get In Touch

Let's Connect

I'm always interested in discussing data engineering challenges, AI innovations, and opportunities to build impactful solutions. Feel free to reach out!

Email

hello@nilanjandeb.com

LinkedIn

linkedin.com/in/nil1729

GitHub

github.com/nil1729

Phone

+91-93664-96119