Nilanjan Deb
Data Engineer & Software Developer
Software Engineer at Harness working on cloud cost management. Previously built scalable data pipelines and AI-driven analytics at Glance. BITS Pilani alumnus.
About
Who I Am
Currently a Software Engineer at Harness, contributing to the CCM (Cloud Cost Management) team with a focus on the Billing Data Ingestion Pipeline. Previously spent over three years at Glance building large-scale data infrastructure — from real-time streaming pipelines and data lake architectures to AI-powered analytics and internal developer tooling.
I graduated from BITS Pilani with a Bachelor of Engineering in Computer Science, where I built a strong foundation across Data Structures, Algorithms, Operating Systems, Databases, and Networks. That foundation shapes how I approach system design — with an emphasis on correctness, scalability, and performance.
My work spans the full data stack: streaming pipelines with Spark and Kafka, query engines with Trino and Iceberg, AI integrations with LLMs, and cloud cost infrastructure. I care about building systems that are reliable, efficient, and observable.
Data Engineering
Big Data, ETL, Streaming Pipelines
AI & Analytics
LLM Integration, AI-driven Workflows
Full-Stack Dev
APIs, Microservices, Web Platforms
System Design
LLD, HLD, Performance Optimization
Experience
Professional Experience
Harness
Key Work
- Contributing to the CCM (Cloud Cost Management) team at Harness
- Focused on the Billing Data Ingestion Pipeline for cloud cost analytics
Glance
98% Latency Reduction
From 1 hour to 1 minute
70% Onboarding Improvement
SQL query interface for analysts
30% Cost Reduction
Infrastructure optimization
Key Achievements
- Implemented Spark Streaming with Kafka and Iceberg tables, reducing data latency from 1 hour to 1 minute
- Architected and deployed a Trino gateway system, optimizing query performance and resource utilization
- Engineered a SQL query interface for analysts, reducing data onboarding time by 70% and improving metric reporting
- Developed a CDC pipeline using Hudi and Debezium, transforming OLTP tables into OLAP-ready tables for BI analytics
- Implemented OPA with Trino for row-level filters and column masking to improve data security compliance
- Standardized infrastructure with Helm charts, enabling one-click deployment of Trino, Superset, Hive Metastore, and Airflow
- Designed an A/B testing experimentation platform, driving data-driven product decisions
- Built a distributed job scheduling pipeline with BullMQ and Redis, reducing content onboarding time by 80%
- Developed a content generation pipeline using Celery workers and LLM integration for articles and images
- Deployed Vanna.AI for AI-powered analytics, improving data analyst productivity
- Built an enterprise configuration store with SSO authentication and rule-based ACLs
Projects
Featured Projects
Data Engineering & AI
Deployed Vanna.AI to enable natural language querying over the data warehouse, so analysts can query data without writing SQL
- Natural language to SQL
- Integrated with Trino query engine
- Improved analyst productivity
Replaced hourly batch jobs with Spark Streaming + Kafka, cutting data latency from 1 hour to under 1 minute
- 98% latency reduction
- OLAP-ready Iceberg tables
- CDC with Debezium
Production-grade observability for Trino clusters — JMX metrics scraped by Prometheus and visualized in Grafana
- Cluster-level JMX metrics
- Pre-built Grafana dashboards
- Query performance tracking
Fine-grained authorization for Trino using Apache Ranger — table, column, and row-level access policies via Docker
- Column & row-level policies
- Ranger admin UI
- Reproducible Docker setup
End-to-end local data lakehouse setup — Trino query engine over Hive Metastore with MinIO object storage
- Local lakehouse environment
- Hive Metastore integration
- S3-compatible storage
Distributed Systems
Microservice that securely executes arbitrary code in isolated environments with concurrency control and resource limits
- Sandboxed execution
- Concurrent job management
- Resource isolation
In-memory game server for multiplayer Tic-Tac-Toe — real-time state sync across clients with Socket.IO
- Real-time bidirectional events
- In-memory game state
- Room-based matchmaking
Full-stack food ordering app with cart management, restaurant listings, auth, and order tracking
- 56 GitHub stars
- End-to-end full-stack
- Firebase auth & storage
Contact
Get In Touch
Let's Connect
I'm always interested in discussing data engineering challenges, AI innovations, and opportunities to build impactful solutions. Feel free to reach out.
GitHub
github.com/nil1729
Phone
+91-93664-96119