Software Engineer · Data & AI

Nilanjan Deb

Data Engineer & Software Developer

Software Engineer at Harness working on cloud cost management. Previously built scalable data pipelines and AI-driven analytics at Glance. BITS Pilani alumnus.

Resume

About

Who I Am

I am currently a Software Engineer at Harness on the CCM (Cloud Cost Management) team, where I focus on the Billing Data Ingestion Pipeline. Before that, I spent over three years at Glance building large-scale data infrastructure, from real-time streaming pipelines and data lake architectures to AI-powered analytics and internal developer tooling.

I graduated from BITS Pilani with a Bachelor of Engineering in Computer Science, building a strong foundation in Data Structures, Algorithms, Operating Systems, Databases, and Networks. That foundation shapes how I approach system design, with an emphasis on correctness, scalability, and performance.

My work spans the full data stack: streaming pipelines with Spark and Kafka, query engines with Trino and Iceberg, AI integrations with LLMs, and cloud cost infrastructure. I care about building systems that are reliable, efficient, and observable.

Data Engineering

Big Data, ETL, Streaming Pipelines

AI & Analytics

LLM Integration, AI-driven Workflows

Full-Stack Dev

APIs, Microservices, Web Platforms

System Design

LLD, HLD, Performance Optimization

Experience

Professional Experience

Software Engineer

Harness

Aug 2025 – Present

Key Work

  • Contributing to the CCM (Cloud Cost Management) team at Harness
  • Focused on the Billing Data Ingestion Pipeline for cloud cost analytics
Cloud Cost Management
Data Ingestion
Billing Pipelines
Java
Spring Boot

Software Engineer

Glance

Jul 2022 – Jul 2025

98% Latency Reduction

From 1 hour to 1 minute

70% Onboarding Improvement

SQL query interface for analysts

30% Cost Reduction

Infrastructure optimization

Key Achievements

  • Implemented Spark Streaming with Kafka, writing to Iceberg tables and reducing data latency from 1 hour to 1 minute
  • Architected and deployed a Trino gateway system, optimizing query performance and resource utilization
  • Engineered a SQL query interface for analysts, reducing data onboarding time by 70% and improving metric reporting
  • Developed a CDC pipeline using Hudi and Debezium, replicating OLTP tables into OLAP-ready tables for BI analytics
  • Implemented OPA with Trino for row-level filters and column masking to improve data security compliance
  • Standardized infrastructure with Helm charts, enabling one-click deployment of Trino, Superset, Hive Metastore, and Airflow
  • Designed an A/B testing experimentation platform to support data-driven product decisions
  • Built a distributed job scheduling pipeline with BullMQ and Redis, reducing content onboarding time by 80%
  • Developed a content generation pipeline using Celery workers and LLM integration for articles and images
  • Deployed Vanna.AI for AI-powered analytics, improving data analyst productivity
  • Built an enterprise configuration store with SSO authentication and rule-based ACLs
Apache Spark
Kafka
Trino
Apache Iceberg
Hudi
Debezium
Vanna.AI
LLM
BullMQ
Redis
Celery
Helm
OPA
Airflow

Leadership & Technical Responsibilities

System Design (LLD/HLD)
Team Mentoring
Code Reviews
Cross-functional Coordination

Projects

Featured Projects

Data Engineering & AI

AI-Powered Analytics Platform
Professional

Deployed Vanna.AI to enable natural language querying over the data warehouse, so analysts can query data without writing SQL

  • Natural language to SQL
  • Integrated with Trino query engine
  • Improved analyst productivity
Vanna.AI
LLM
Python
Apache Spark
Trino

Real-time Data Streaming Pipeline
Professional

Replaced hourly batch jobs with Spark Streaming + Kafka, cutting data latency from 1 hour to under 1 minute

  • 98% latency reduction
  • OLAP-ready Iceberg tables
  • CDC with Debezium
Apache Spark
Kafka
Trino
Apache Iceberg
Hudi

Trino Monitoring Stack
Open Source

Production-grade observability for Trino clusters — JMX metrics scraped by Prometheus and visualized in Grafana

  • Cluster-level JMX metrics
  • Pre-built Grafana dashboards
  • Query performance tracking
Trino
JMX
Prometheus
Grafana
Docker
Code

Trino + Apache Ranger Security
Open Source

Fine-grained authorization for Trino using Apache Ranger — table, column, and row-level access policies via Docker

  • Column & row-level policies
  • Ranger admin UI
  • Reproducible Docker setup
Trino
Apache Ranger
Docker
SQL
Code

Trino + Hive Data Lakehouse
Open Source

End-to-end local data lakehouse setup — Trino query engine over Hive Metastore with MinIO object storage

  • Local lakehouse environment
  • Hive Metastore integration
  • S3-compatible storage
Trino
Hive Metastore
MinIO
Docker Compose
Code

Distributed Systems

Remote Code Execution Engine

Microservice that securely executes arbitrary code in isolated environments with concurrency control and resource limits

  • Sandboxed execution
  • Concurrent job management
  • Resource isolation
Node.js
Docker
Redis
Express.js
Code

Multiplayer Real-time Game Server

In-memory game server for multiplayer Tic-Tac-Toe — real-time state sync across clients with Socket.IO

  • Real-time bidirectional events
  • In-memory game state
  • Room-based matchmaking
Socket.IO
Node.js
React
In-memory state
Code

Food Ordering Platform

Full-stack food ordering app with cart management, restaurant listings, auth, and order tracking

  • 56 GitHub stars
  • End-to-end full-stack
  • Firebase auth & storage
Vue.js
Node.js
MongoDB
Express.js
Firebase
Code

Contact

Get In Touch

Let's Connect

I'm always interested in discussing data engineering challenges, AI innovations, and opportunities to build impactful solutions. Feel free to reach out!

Send a Message