A Data Engineer & AI/ML Engineer by craft.
Building petabyte-scale pipelines, real-time Kafka streaming systems, and ML-powered data products across AWS, GCP, and Azure.
I'm Kiran Paritala, a Data Engineer and AI/ML Engineer with an MS in Computer Science from Cleveland State University (May 2026). I build systems that move data reliably at scale and layer intelligence on top of it. At Airtel, India's largest telecom, I spent nearly three years engineering petabyte-scale ETL pipelines, Medallion Architecture across Azure and GCP, and real-time Kafka streaming systems. During my MS, I built four production-grade data and AI/ML projects: a Water Quality ML Pipeline, a zero-cost RAG chatbot, an autonomous LangGraph AI agent, and a Retail BI pipeline serving Power BI dashboards.
I don't wait for problems to escalate — I look for them before they do.
At Airtel, query costs were quietly climbing across reporting systems. Nobody flagged it. I dug into execution plans, identified missing partitioning and clustering strategies, and rebuilt the table architecture — cutting scan costs by 60–70% without touching a single downstream report.
When I built the Ask Athreya AI agent, the model kept hallucinating column names. The obvious fix was prompt engineering. The real fix was injecting the actual dataset schema into the system prompt at startup — eliminating the entire class of errors in one structural change.
Data problems are rarely about the data. They're about the assumptions built into the system that nobody questioned.
Medallion Architecture IoT pipeline with Isolation Forest anomaly detection (Δ=0.78) and a global LSTM model predicting threshold breaches 1 hour early. 60–70% BigQuery cost reduction on TB-scale data with idempotent MERGE writes and a 5-check quality gate.
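To make the "idempotent MERGE" idea concrete, here is a minimal sketch in plain Python rather than BigQuery SQL. The function name, the key columns, and the sample sensor rows are hypothetical and only illustrate the semantics: rows are matched on a key, matched rows are updated, unmatched rows are inserted, so replaying the same batch leaves the table unchanged.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Table = Dict[Tuple[object, ...], Row]


def merge_rows(target: Table, batch: Iterable[Row], key_cols: Tuple[str, ...]) -> Table:
    """Upsert `batch` into `target`, keyed on `key_cols` (hypothetical helper).

    WHEN MATCHED     -> the existing row is overwritten with the incoming row
    WHEN NOT MATCHED -> the incoming row is inserted
    Re-running the same batch therefore produces the same table.
    """
    merged = dict(target)  # copy so the input table is not mutated
    for row in batch:
        key = tuple(row[col] for col in key_cols)
        merged[key] = dict(row)
    return merged


# Hypothetical sensor readings, keyed on (sensor_id, ts).
KEY = ("sensor_id", "ts")
batch = [
    {"sensor_id": "s1", "ts": "2024-01-01T00:00", "ph": 7.1},
    {"sensor_id": "s1", "ts": "2024-01-01T01:00", "ph": 7.3},
]
first_run = merge_rows({}, batch, KEY)
second_run = merge_rows(first_run, batch, KEY)  # a retry of the same load
```

Because the write is keyed rather than append-only, a retried or duplicated load does not create duplicate rows, which is what makes a pipeline safe to re-run after a failure.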
RAG chatbot that reads any PDF or Word document and answers questions about it using Retrieval-Augmented Generation, running fully locally at zero API cost. The HuggingFace all-MiniLM-L6-v2 embedding model runs on-device with no disk writes. Multi-query retrieval expands vague queries into semantically related variants for better recall on ambiguous inputs. A custom prompt template extracts exact names, dates, and skills from retrieved context rather than returning empty responses.
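The multi-query idea can be sketched roughly as follows. This is an illustration under loud assumptions: the retriever here is a toy keyword matcher standing in for embedding search, and the query variants are written by hand, whereas a real system would typically generate them with a model. All names and sample chunks are hypothetical.

```python
from typing import Callable, List, Sequence

Retriever = Callable[[str, int], List[str]]


def keyword_retriever(chunks: Sequence[str]) -> Retriever:
    """Toy stand-in for vector search: rank chunks by shared lowercase words."""

    def retrieve(query: str, k: int) -> List[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(c.lower().split())), c) for c in chunks]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [chunk for score, chunk in ranked if score > 0][:k]

    return retrieve


def multi_query_retrieve(variants: Sequence[str], retrieve: Retriever, k: int = 2) -> List[str]:
    """Run every query variant and return the de-duplicated union of hits."""
    results: List[str] = []
    for variant in variants:
        for chunk in retrieve(variant, k):
            if chunk not in results:  # keep first occurrence, preserve order
                results.append(chunk)
    return results


# Hypothetical document chunks and hand-written variants of a vague query.
chunks = [
    "The candidate studied computer science at a state university",
    "Skills include Python, Spark and Kafka",
    "Hobbies: hiking",
]
retrieve = keyword_retriever(chunks)
single = retrieve("education", 2)
expanded = multi_query_retrieve(["education", "where studied", "university degree"], retrieve)
```

The vague query alone shares no words with any chunk, while the expanded variants reach the relevant one; the same effect is what improves recall when the matching is done on embeddings instead of keywords.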
AI agent built with LangChain and LangGraph that answers plain-English questions about CSV/Excel files using 4 custom pandas tools selected autonomously. Eliminated column-name hallucination by injecting the actual dataset schema into the system prompt at startup. Multi-turn memory via LangGraph checkpointing so follow-up questions like "list them" work correctly. 34-test pytest suite plus a separate eval-harness that caught a false-positive bug unit tests missed entirely.
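As a rough illustration of the schema-injection fix, the sketch below reads the header row of a CSV and writes the real column names into the system prompt. It uses only the standard library and does not reproduce the agent's actual LangChain or LangGraph code; the function names, prompt wording, and sample data are assumptions.

```python
import csv
import io
from typing import List


def read_columns(csv_text: str) -> List[str]:
    """Return the header row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def build_system_prompt(columns: List[str]) -> str:
    """Build a system prompt that pins the model to the dataset's real columns."""
    schema = "\n".join(f"- {name}" for name in columns)
    return (
        "You answer questions about a tabular dataset.\n"
        "Use only these column names, spelled exactly as listed:\n"
        f"{schema}\n"
        "If a question refers to a column that is not listed, "
        "say so instead of guessing."
    )


# Hypothetical dataset; in practice the header comes from the uploaded file.
sample_csv = "order_id,customer_name,order_total\n1,Asha,120.50\n"
columns = read_columns(sample_csv)
prompt = build_system_prompt(columns)
```

Building the prompt once at startup means every later turn sees the same ground-truth schema, so the model has no reason to invent a column name.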
Daily ETL pipeline ingesting retail order data from PostgreSQL, REST APIs, and S3 CSV files into Databricks. PySpark Medallion Architecture (Bronze→Silver→Gold) with dbt staging and mart models calculating monthly revenue, MoM growth, revenue rank, and top 10 products. Data quality framework covers null checks, duplicate detection, and referential integrity validation. Orchestrated with Apache Airflow to run daily at 6 AM with retry logic and failure alerting. Analytics-ready data is loaded to Snowflake and BigQuery and served via Power BI semantic models.
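The mart metrics named above can be sketched in a few lines. This is a plain-Python illustration, not the dbt models themselves; the function name, output field names, and sample orders are hypothetical. MoM growth is taken as (current - previous) / previous × 100, and rank 1 is the highest-revenue month.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def monthly_metrics(orders: Iterable[Tuple[str, float]]) -> List[Dict[str, object]]:
    """Aggregate (ISO date, amount) orders into monthly revenue, MoM growth, and rank."""
    revenue: Dict[str, float] = defaultdict(float)
    for order_date, amount in orders:
        revenue[order_date[:7]] += amount  # "YYYY-MM-DD" -> "YYYY-MM"

    months = sorted(revenue)
    by_revenue = sorted(months, key=lambda m: revenue[m], reverse=True)
    rank = {month: position + 1 for position, month in enumerate(by_revenue)}

    rows: List[Dict[str, object]] = []
    previous: Optional[float] = None
    for month in months:
        current = revenue[month]
        # No growth figure for the first month or when the previous month was zero.
        growth = None if not previous else round((current - previous) / previous * 100, 2)
        rows.append({
            "month": month,
            "revenue": current,
            "mom_growth_pct": growth,
            "revenue_rank": rank[month],
        })
        previous = current
    return rows


# Hypothetical orders: (order date, order amount).
orders = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 100.0),
    ("2024-02-01", 300.0),
    ("2024-03-10", 150.0),
]
metrics = monthly_metrics(orders)
```

In SQL the same logic would usually be expressed with window functions such as LAG and RANK over the monthly aggregate.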
2+ years across multi-cloud platforms, big-data tooling, and AI/ML frameworks.
Get in touch
Open to full-time Data Engineer & AI/ML Engineer roles. Let's connect and build something that scales.