What is the difference between a model parameter and a hyperparameter?

Parameters are learned from data during training (weights, biases). Hyperparameters are set before training and control the learning process (learning rate, number of layers, batch size). You tune hyperparameters by comparing validation or cross-validation scores across candidate settings; you never set parameters by hand, the optimiser fits them.
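
A minimal sketch of the distinction, assuming a one-feature linear model fitted by plain gradient descent. The function name `fit_line` and the toy data are illustrative only: `w` and `b` are parameters (the loop updates them), while `learning_rate` and `epochs` are hyperparameters (chosen before the loop runs).

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0                 # parameters: learned from the data
    n = len(xs)
    for _ in range(epochs):         # epochs, learning_rate: hyperparameters
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]           # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Changing `learning_rate` or `epochs` changes how training behaves; the values of `w` and `b` are whatever training produces.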

What is transfer learning and when is it most beneficial?

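Transfer learning uses a model pre-trained on a large dataset as the starting point for a new task. Most beneficial when labelled data is scarce, the compute budget is limited, or the source and target domains share low-level features (e.g. ImageNet pre-training reused for medical imaging). A common starting recipe: freeze the early layers, which hold generic features, and fine-tune the last layers; unfreeze more layers if the target domain differs substantially.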

Explain backpropagation. What is it actually computing?

Backpropagation computes the gradient of the loss function with respect to every weight by applying the chain rule. It propagates the error signal backwards from the output layer, reusing each layer's intermediate results so the whole gradient is obtained in a single backward pass. The gradient tells the optimiser (SGD/Adam) how to adjust each weight to reduce the loss.
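
To make the chain rule concrete, here is a small sketch on a hypothetical two-weight network, `y_hat = w2 * tanh(w1 * x)` with squared-error loss. The network, the function names, and the input values are assumptions chosen for illustration; the numerical check at the end is a common way to sanity-test hand-written gradients.

```python
import math


def forward(w1, w2, x, y):
    h = math.tanh(w1 * x)           # hidden activation
    y_hat = w2 * h                  # network output
    loss = (y_hat - y) ** 2         # squared error
    return h, y_hat, loss


def backward(w1, w2, x, y):
    """Gradients of the loss w.r.t. w1 and w2 via the chain rule."""
    h, y_hat, _ = forward(w1, w2, x, y)
    dloss_dyhat = 2 * (y_hat - y)               # d loss / d y_hat
    dloss_dw2 = dloss_dyhat * h                 # y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2                 # error passed back to h
    dloss_dw1 = dloss_dh * (1 - h ** 2) * x     # tanh'(z) = 1 - tanh(z)^2
    return dloss_dw1, dloss_dw2


def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    g1 = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[2] - forward(w1, w2 - eps, x, y)[2]) / (2 * eps)
    return g1, g2


analytic = backward(0.5, -1.2, 0.8, 1.0)
numeric = numerical_grad(0.5, -1.2, 0.8, 1.0)
print(analytic, numeric)
```

The two pairs of values should agree to several decimal places, which is the point: backpropagation is an exact, efficient way to compute what finite differences only approximate.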

What is the vanishing gradient problem? How is it addressed in modern deep learning?

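In deep networks with sigmoid/tanh activations, gradients can shrink exponentially with depth during backprop, because each layer multiplies in an activation derivative of at most 0.25 (sigmoid) or 1 (tanh), so early layers learn very slowly. Solutions: ReLU activations, residual connections (ResNet skip connections), batch normalisation, careful weight initialisation, and gated units (LSTM/GRU) for RNNs. Gradient clipping addresses the opposite problem, exploding gradients.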
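
A rough illustration of the shrinkage, under simplifying assumptions: every layer sees the same pre-activation value and all weights equal 1, so the gradient reaching the first layer is scaled by the product of the activation derivatives alone. The helper names are hypothetical.

```python
import math


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)            # never larger than 0.25


def relu_grad(z):
    return 1.0 if z > 0 else 0.0    # exactly 1 for active units


def gradient_scale(activation_grad, pre_activations):
    """Product of activation derivatives along one path through the network."""
    scale = 1.0
    for z in pre_activations:
        scale *= activation_grad(z)
    return scale


depth = 20
zs = [0.5] * depth                  # assumed pre-activation at every layer
sigmoid_scale = gradient_scale(sigmoid_grad, zs)
relu_scale = gradient_scale(relu_grad, zs)
print(sigmoid_scale, relu_scale)
```

With 20 sigmoid layers the factor is many orders of magnitude below 1, while the ReLU path keeps a factor of 1 as long as the units stay active.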

What is model drift? How do you detect and handle it?

Model drift is the decay of a model's performance after deployment. Two main causes: data drift (the input distribution shifts over time) and concept drift (the relationship between inputs and output changes). Detect with: monitoring prediction score distributions, input feature statistics, and business KPIs. Handle with: scheduled retraining, online learning.
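
One possible way to monitor an input feature or a prediction score is the Population Stability Index (PSI), sketched below. PSI is a choice made for this example, not something the answer above prescribes; the bin count, the `eps` floor, and the toy data are assumptions.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0       # avoid zero width for constant data

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)   # clamp out-of-range values
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


train_scores = [i / 100 for i in range(100)]           # reference window
shifted_scores = [0.5 + i / 200 for i in range(100)]   # live window, shifted up
psi_same = psi(train_scores, train_scores)
psi_shifted = psi(train_scores, shifted_scores)
print(psi_same, psi_shifted)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold is a convention and should be calibrated per feature.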

How would you deploy a machine learning model to production at scale?

Serve via a REST API (FastAPI + uvicorn), containerise with Docker, orchestrate with Kubernetes. For low-latency inference: ONNX Runtime or TensorRT. Use a model registry (MLflow) for versioning. A/B test new models via traffic splitting. Monitor prediction latency and drift.
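
Traffic splitting for an A/B test can be sketched as deterministic hash-based bucketing. This is one common approach, not the only one; the function name `assign_variant`, the salt string, and the 10% candidate share are hypothetical values for illustration.

```python
import hashlib


def assign_variant(user_id, candidate_share=0.1, salt="model-ab-test"):
    """Route a stable fraction of users to the candidate model."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000   # roughly uniform in [0, 1)
    return "candidate" if bucket < candidate_share else "baseline"


assignments = [assign_variant(f"user-{i}") for i in range(10000)]
candidate_fraction = assignments.count("candidate") / len(assignments)
print(candidate_fraction)
```

Hashing the user ID, rather than drawing a random number per request, keeps each user on the same model across requests, which makes the comparison between variants cleaner.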

What is the difference between batch inference and real-time inference?

Batch: run inference on a large dataset offline, results stored (e.g. daily churn predictions). Real-time: single-sample inference on demand (e.g. fraud detection during a transaction). Batch: higher throughput, cheaper. Real-time: low-latency requirement, more infrastructure complexity.

Tell me about an ML model you trained, validated, and deployed. What was the end-to-end pipeline?

Cover all stages: problem framing, data collection/cleaning, feature engineering, model selection, training, evaluation, deployment, monitoring. Highlight one non-trivial decision at each stage. Show you understand the full MLOps lifecycle, not just modelling.

A model performs well in testing (90% accuracy) but poorly in production (65%). What are the likely causes?

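Training-serving skew: features are computed differently at serving time than in training, or the production data distribution differs from the test set. Target leakage: a training feature carried information about the label that is not available at prediction time, which inflates test accuracy. Check feature distributions at serving time against training time. Log and analyse production inputs.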
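
A first diagnostic pass over logged production inputs might look like the sketch below. The function `skew_report`, the feature names, the range-normalised mean comparison, and the 0.25 tolerance are all assumptions for illustration; a real check would usually compare full distributions.

```python
def skew_report(train_rows, serving_rows, tolerance=0.25):
    """Flag features that are absent at serving time or whose mean has moved."""
    report = {}
    for name in train_rows[0].keys():
        train_vals = [r[name] for r in train_rows if r.get(name) is not None]
        serve_vals = [r[name] for r in serving_rows if r.get(name) is not None]
        if not train_vals:
            continue                                  # nothing to compare against
        if not serve_vals:
            report[name] = "missing at serving time"  # possible leakage signal
            continue
        train_mean = sum(train_vals) / len(train_vals)
        serve_mean = sum(serve_vals) / len(serve_vals)
        spread = (max(train_vals) - min(train_vals)) or 1.0
        if abs(serve_mean - train_mean) / spread > tolerance:
            report[name] = f"mean shifted from {train_mean:.2f} to {serve_mean:.2f}"
    return report


train = [
    {"age": 30, "amount": 100.0, "refund_flag": 0},
    {"age": 40, "amount": 200.0, "refund_flag": 1},
    {"age": 50, "amount": 300.0, "refund_flag": 0},
]
serving = [
    {"age": 31, "amount": 10000.0, "refund_flag": None},   # amount in other units
    {"age": 42, "amount": 25000.0, "refund_flag": None},   # flag not yet known
]
report = skew_report(train, serving)
print(report)
```

In this toy data, `amount` looks like a unit mismatch between pipelines (skew) and `refund_flag` is only known after the fact (leakage); `age` passes the check.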

How do you version control ML models and datasets?

Model versioning: MLflow, DVC, or Weights & Biases; track hyperparameters, metrics, and artifacts per experiment. Dataset versioning: DVC or cloud storage with immutable versioned paths. Never overwrite a dataset used in training.
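
The idea of immutable versioned paths can be sketched with a content hash, which is also the general principle behind tools such as DVC. This is a stdlib-only illustration, not how any particular tool is implemented; `snapshot_dataset`, the 12-character version prefix, and the directory layout are assumptions.

```python
import hashlib
import os
import shutil
import tempfile


def snapshot_dataset(src_path, store_dir):
    """Copy a dataset file to a path derived from its SHA-256 content hash."""
    sha = hashlib.sha256()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # read in 1 MiB chunks
            sha.update(chunk)
    version = sha.hexdigest()[:12]
    dest = os.path.join(store_dir, version, os.path.basename(src_path))
    if not os.path.exists(dest):            # an existing version is never rewritten
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src_path, dest)
    return version, dest


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "train.csv")
    store = os.path.join(tmp, "store")
    with open(data, "w") as f:
        f.write("id,label\n1,0\n")
    v1, _ = snapshot_dataset(data, store)
    v1_again, _ = snapshot_dataset(data, store)   # same content, same version
    with open(data, "a") as f:
        f.write("2,1\n")
    v2, _ = snapshot_dataset(data, store)         # changed content, new version
    print(v1, v1_again, v2)
```

Recording the returned version string alongside each experiment's hyperparameters and metrics ties a trained model back to the exact data it saw.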

Amazon Machine Learning Engineer Interview Questions 2026

Preparation guide for Machine Learning Engineer positions at Amazon India. Covers their Online Assessment → Technical × 2 → Bar Raiser process with technical, behavioral, and HR questions.

Interview rounds: 4
Avg. package: 18–45 LPA
Role type: data