Kiran Paritala

About Me

Engineer by craft

I'm Kiran Paritala, a Data Engineer and AI/ML Engineer with an MS in Computer Science from Cleveland State University (May 2026). I build systems that move data reliably at scale and layer intelligence on top of it. At Airtel, one of India's largest telecom operators, I spent over two years engineering TB-scale ETL pipelines, Medallion Architecture across Azure and GCP, and real-time Kafka streaming systems. During my MS, I built four production-grade projects spanning data engineering and AI/ML: a Water Quality ML Pipeline, a zero-cost RAG chatbot, an autonomous LangGraph AI agent, and a Retail BI pipeline serving Power BI dashboards.

How I Find and Solve Problems

I don't wait for problems to escalate — I look for them before they do.

At Airtel, query costs were quietly climbing across reporting systems. Nobody flagged it. I dug into execution plans, identified missing partitioning and clustering strategies, and rebuilt the table architecture — cutting scan costs by 60–70% without touching a single downstream report.

When I built the Ask Athreya AI agent, the model kept hallucinating column names. The obvious fix was prompt engineering. The real fix was injecting the actual dataset schema into the system prompt at startup — eliminating the entire class of errors in one structural change.

Data problems are rarely about the data. They're about the assumptions built into the system that nobody questioned.

Projects · Portfolio

Personal Project · AWS · GCP

Water Quality ML Pipeline

1 hr · Predictive Alert
60–70% · BQ Cost Saved
TB+ · Pipeline Scale

Medallion Architecture IoT pipeline with Isolation Forest anomaly detection (Δ=0.78) and a global LSTM model that predicts threshold breaches 1 hour before they happen. Cuts BigQuery costs by 60–70% on TB-scale data; writes use idempotent MERGE statements and pass a 5-check quality gate.

PySpark · TensorFlow · GCP BigQuery · Airflow · LSTM
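The idempotent-write idea can be sketched with Python's built-in `sqlite3`. This is a minimal illustration under stated assumptions, not the project's code: the `readings` table, its columns, and `write_batch` are hypothetical, and because SQLite has no `MERGE` statement the sketch uses SQLite's upsert syntax, which gives the same rerun-safe behavior for a keyed table. In BigQuery the equivalent would be a `MERGE ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT` statement.

```python
import sqlite3

# Hypothetical schema: one row per (sensor_id, reading_ts); names are illustrative only.
DDL = """
CREATE TABLE readings (
    sensor_id  TEXT NOT NULL,
    reading_ts TEXT NOT NULL,
    ph         REAL,
    PRIMARY KEY (sensor_id, reading_ts)
)
"""

# SQLite has no MERGE, so its upsert (INSERT ... ON CONFLICT DO UPDATE, SQLite 3.24+)
# stands in: new keys are inserted, existing keys are overwritten.
UPSERT = """
INSERT INTO readings (sensor_id, reading_ts, ph)
VALUES (?, ?, ?)
ON CONFLICT (sensor_id, reading_ts) DO UPDATE SET ph = excluded.ph
"""


def write_batch(conn: sqlite3.Connection, rows: list[tuple[str, str, float]]) -> None:
    """Write a batch so that replaying it leaves the table unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
batch = [("s1", "2024-01-01T00:00", 7.1), ("s1", "2024-01-01T01:00", 7.3)]
write_batch(conn, batch)
write_batch(conn, batch)  # a retried load: no duplicate rows
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 2
```

Keying the write on a natural key instead of appending is what makes retries safe: an orchestrator can replay the same batch without creating duplicates.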


Personal Project · Local · Zero API Cost

RAG-Reader

$0 · API Cost
Local · Embeddings
Multi-Q · Retrieval

Chatbot that reads any PDF or Word document and answers questions about it using Retrieval Augmented Generation (RAG) at zero API cost, with embeddings computed locally. HuggingFace all-MiniLM-L6-v2 runs on-device with no disk writes. Multi-query retrieval expands vague queries into semantically related variants, improving recall on ambiguous inputs. A custom prompt template instructs the model to extract exact names, dates, and skills from the retrieved context instead of returning empty responses.

LangChain · HuggingFace · Groq · RAG
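Multi-query retrieval can be sketched in plain Python. This is a toy under explicit assumptions: word overlap stands in for the all-MiniLM-L6-v2 embedding search, the query variants are hardcoded where the real system would have the LLM generate them, and `retrieve` / `multi_query_retrieve` are hypothetical names.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy retriever: top-k chunks by word overlap (vector search in a real system)."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return [c for c in ranked[:k] if q & tokens(c)]


def multi_query_retrieve(variants: list[str], chunks: list[str], k: int = 2) -> list[str]:
    """Retrieve for each query variant; return the de-duplicated union in first-seen order."""
    merged: dict[str, None] = {}
    for variant in variants:
        for chunk in retrieve(variant, chunks, k):
            merged.setdefault(chunk, None)
    return list(merged)


chunks = [
    "Work history: data engineer at Airtel",
    "Technical skills: PySpark, Kafka, Airflow",
    "Education: MS in Computer Science",
]
print(retrieve("Tell me about his background", chunks))  # []
# Hardcoded variants; an LLM would generate these from the vague question.
variants = ["work history", "technical skills", "education"]
print(multi_query_retrieve(variants, chunks))  # all three chunks
```

The union step is the point of the technique: each variant can surface different chunks, and de-duplicating keeps the context passed to the model free of repeats.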


Personal Project · LangGraph · AI Agent

Ask Athreya — AI Data Analyst

34 · Pytest Tests
4 · Pandas Tools
Multi-Turn · Memory

AI agent built with LangChain and LangGraph that answers plain-English questions about CSV/Excel files by autonomously choosing among 4 custom pandas tools. Eliminated column-name hallucination by injecting the actual dataset schema into the system prompt at startup. Multi-turn memory via LangGraph checkpointing lets follow-up questions like "list them" resolve correctly. Backed by a 34-test pytest suite plus a separate eval harness that caught a false-positive bug the unit tests missed entirely.

LangChain · LangGraph · Groq Llama 3.3 · pandas
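The schema-injection fix can be sketched with the standard library alone. This is an assumption-laden illustration, not the agent's code: the project works through pandas tools and LangChain, while this hypothetical helper, `build_system_prompt`, reads a CSV header with `csv` and writes the real column names (plus a couple of sample rows) into the prompt text that would be handed to the model at startup. The prompt wording and the `sample_rows` parameter are invented for the example.

```python
import csv
import io
from itertools import islice


def build_system_prompt(csv_text: str, sample_rows: int = 2) -> str:
    """Build a system prompt that pins the model to the file's real column names.

    Hypothetical helper: the wording and the sample_rows default are illustrative.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    samples = list(islice(reader, sample_rows))  # a few rows show value formats
    lines = [
        "You are a data analyst. Refer only to these columns, spelled exactly:",
        ", ".join(header),
        "Sample rows:",
    ]
    lines.extend(", ".join(row) for row in samples)
    return "\n".join(lines)


data = "region,order_count,revenue\nEast,12,1040.50\nWest,9,880.00\nNorth,4,310.25\n"
print(build_system_prompt(data))
```

Because the prompt is generated from the file itself rather than written by hand, it stays correct for whatever dataset the agent loads.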


Personal Project · ETL · Business Intelligence

Retail Product Performance Dashboard

Daily 6 AM · Airflow DAG
5 KPIs · Revenue Tracked
4 Sources · Ingested

Daily ETL pipeline that ingests retail order data from PostgreSQL, REST APIs, and S3 CSV files into Databricks. A PySpark Medallion Architecture (Bronze→Silver→Gold) works with dbt staging and mart models that calculate monthly revenue, month-over-month (MoM) growth, revenue rank, and top 10 products. The data quality framework covers null checks, duplicate detection, and referential integrity validation. Apache Airflow orchestrates the run daily at 6 AM with retry logic and failure alerting. Analytics-ready data is loaded into Snowflake and BigQuery and served through Power BI semantic models.

PySpark · dbt · Airflow · Databricks · Snowflake
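The three check types in the retail pipeline (nulls, duplicates, referential integrity) can be illustrated in plain Python. This is a sketch under assumptions, not the project's framework: the column names, the `quality_failures` function, and the sample rows are hypothetical. In the actual stack the same rules would more naturally live in PySpark or in dbt, whose built-in `not_null`, `unique`, and `relationships` tests cover these cases.

```python
from collections import Counter


def quality_failures(orders: list[dict], product_ids: set[str]) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes.

    Hypothetical column names: order_id, product_id, amount.
    """
    failures = []
    # Null checks on required columns.
    for col in ("order_id", "product_id", "amount"):
        missing = sum(1 for row in orders if row.get(col) is None)
        if missing:
            failures.append(f"null: {missing} row(s) missing {col}")
    # Duplicate detection on the primary key.
    counts = Counter(row["order_id"] for row in orders if row.get("order_id") is not None)
    dupes = sorted(key for key, n in counts.items() if n > 1)
    if dupes:
        failures.append(f"duplicate: order_id {dupes}")
    # Referential integrity: every product_id must exist in the product dimension.
    orphans = sorted({row["product_id"] for row in orders
                      if row.get("product_id") is not None
                      and row["product_id"] not in product_ids})
    if orphans:
        failures.append(f"orphan: product_id {orphans}")
    return failures


orders = [
    {"order_id": 1, "product_id": "P1", "amount": 20.0},
    {"order_id": 2, "product_id": "P9", "amount": 15.0},
    {"order_id": 2, "product_id": "P1", "amount": None},
]
for failure in quality_failures(orders, product_ids={"P1", "P2"}):
    print(failure)
```

Collecting every failure instead of stopping at the first one lets a single alert report all problems in a batch before the run is failed.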


Expertise

Technical stack

2+ years of experience across multi-cloud platforms, big-data tooling, and AI/ML frameworks.

Ingest: Apache Kafka · Kafka Connect · REST APIs · Kafka CDC · MySQL · PostgreSQL

Process: PySpark · Python · Apache Spark · pandas · Databricks · Apache Airflow · AWS Glue

Store: Delta Lake · Apache Iceberg · Snowflake · BigQuery · AWS Redshift · Azure Synapse · Avro

Govern: dbt · Great Expectations · MLflow · DataHub · Terraform · Grafana · Git / GitHub

AI / ML: LangChain · LangGraph · TensorFlow · HuggingFace · scikit-learn · Groq · RAG

Cloud Platforms

AWS (S3, Glue, Redshift) · GCP (BigQuery, Dataflow) · Azure (Synapse, Data Lake Storage Gen2) · Databricks · Power BI · Tableau

Background

Work experience

Graduate Teaching Assistant

Cleveland State University · Cleveland, OH

Jan 2026 – May 2026

→Led weekly SQL and Python lab sessions for 50+ students covering query optimization, complex joins, and real-world data engineering workflows.
→Held office hours diagnosing SQL and Python logic errors; collaborated with the professor to improve overall class performance.

Software Engineer — Data Engineering

Airtel · India

Oct 2021 – Feb 2024

→Engineered TB-scale ETL/ELT pipelines in Python and PySpark that ingested data from MySQL, PostgreSQL, Kafka CDC, and REST APIs into BigQuery, Redshift, and Snowflake.
→Designed a Medallion Architecture (Bronze/Silver/Gold) and cut query scan costs by 60–70% through partitioning, clustering, and Z-ordering.
→Built real-time Kafka streaming pipelines with Spark Structured Streaming for CDC replication into Delta Lake and Apache Iceberg.
→Provisioned cloud infrastructure using Terraform across AWS, GCP, and Databricks — eliminating manual provisioning and reducing environment drift.
→Built Grafana dashboards and Airflow SLA alerts routed to Slack and PagerDuty; participated in on-call incident response.

MS in Computer Science

Cleveland State University · Cleveland, OH

May 2026


Let's build something great.