What is the difference between classification and regression?

Classification predicts a discrete category (spam/not spam). Regression predicts a continuous value (house price). Logistic regression is classification despite the name — a common exam trap.
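
A minimal sketch of why logistic regression counts as classification: the model computes a continuous probability, and a threshold turns it into a discrete label. The weights, bias, and the 0.5 threshold are illustrative assumptions, not values from a trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Continuous output: estimated probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def predict_label(x, weights, bias, threshold=0.5):
    """Discrete output: the thresholding step is what makes this classification."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0
```

In an interview, the useful point is that the "regression" in the name refers to fitting the log-odds, while the final prediction is a class.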

Explain the bias-variance trade-off. How does it guide model selection?

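Bias: error from wrong or overly simple assumptions (underfitting: model too simple). Variance: error from sensitivity to the particular training set (overfitting: model too complex). Goal: the sweet spot that generalises, so for model selection pick the complexity at which validation error is lowest, not training error. Regularisation accepts a little extra bias in exchange for lower variance.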
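
For squared-error loss the trade-off can be written as a decomposition of expected test error. This is the standard textbook form, shown here as a reference sketch; it assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², and that the expectation is taken over random training sets.

```latex
% Assumes y = f(x) + \varepsilon, E[\varepsilon] = 0, Var(\varepsilon) = \sigma^2.
% \hat{f} is the model fitted on a randomly drawn training set.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why model selection is about balancing them.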

What is cross-validation? Why is it better than a simple train-test split?

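k-fold CV splits the data into k folds and trains k times, each time holding out a different fold for validation and training on the other k-1. Averaging performance across the folds gives a more reliable estimate than a single split, because every sample is used for validation exactly once. Especially important for small datasets.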
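
The procedure can be sketched in plain Python. The helper names `k_fold_indices` and `cross_val_score`, and the `fit`/`score` callables, are hypothetical and chosen for illustration; the sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_val_score(fit, score, X, y, k=5):
    """Train k times, score each model on its held-out fold, return the mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

Here `fit(X, y)` is assumed to return a fitted model and `score(model, X, y)` a number where higher is better.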

What is the difference between L1 (Lasso) and L2 (Ridge) regularisation?

L1 (sum of absolute weights): produces sparse models by driving some weights to exactly 0 — acts as feature selection. L2 (sum of squared weights): shrinks all weights towards 0 but rarely to exactly 0. Use L1 for feature selection, L2 for general regularisation.
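
A small sketch of why L1 yields exact zeros while L2 does not. It assumes the simplified one-weight problems named in the docstrings (which is what each penalty reduces to per coordinate under an orthonormal design); the function names and the example weights are illustrative.

```python
def l1_shrink(w, lam):
    """Minimiser of 0.5*(v - w)**2 + lam*abs(v): soft thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # any weight with |w| <= lam is set to exactly zero


def l2_shrink(w, lam):
    """Minimiser of 0.5*(v - w)**2 + 0.5*lam*v**2: proportional shrinkage."""
    return w / (1.0 + lam)


weights = [3.0, -0.4, 0.2, -2.0]
l1_result = [l1_shrink(w, 0.5) for w in weights]  # [2.5, 0.0, 0.0, -1.5]
l2_result = [l2_shrink(w, 0.5) for w in weights]  # all shrunk, none exactly zero
```

The small weights vanish under L1, which is the feature-selection effect; under L2 they only get smaller.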

What is the confusion matrix? Define precision, recall, and F1 score.

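Confusion matrix: a table of predicted versus actual classes; for binary classification it holds four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Precision = TP/(TP+FP): of all predicted positives, how many are correct. Recall = TP/(TP+FN): of all actual positives, how many did we catch. F1 = 2·Precision·Recall/(Precision+Recall), the harmonic mean of the two. Prioritise precision when false positives are costly; prioritise recall when false negatives are costly.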
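
The definitions translate directly into code. This is a from-scratch sketch for binary labels; the function names are illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with 2 true positives, 1 false positive, and 2 false negatives, precision is 2/3, recall is 1/2, and F1 is 4/7.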

Explain gradient descent and the difference between batch, stochastic, and mini-batch variants.

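Gradient descent: repeatedly update the weights in the direction opposite to the gradient of the loss, scaled by a learning rate. Batch: uses ALL training data per step, giving an exact gradient but slow steps on large datasets. SGD: one sample per step, so each step is cheap but noisy. Mini-batch: a small batch of samples per step (typically 32–256), a compromise between the two and what deep learning uses in practice.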
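
The three variants differ only in batch size, which a single loop can show. This is a minimal sketch for one-feature linear regression with mean squared error; the function name, learning rate, epoch count, and seed are illustrative assumptions.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    batch_size == len(xs) gives batch GD, batch_size == 1 gives SGD,
    anything in between is mini-batch GD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            # Step opposite to the average gradient over the batch.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b
```

On noise-free data generated from y = 2x + 1, both the full-batch and the mini-batch settings should recover a slope near 2 and an intercept near 1.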

What is overfitting? How do you detect it and what techniques prevent it?

Overfitting: the model memorises the training data and performs poorly on unseen data. Detect: training accuracy significantly exceeds validation accuracy; cross-validation makes that gap easier to trust. Prevent: regularisation, dropout (neural nets), early stopping, more training data, feature reduction.
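
Detection and early stopping can both be sketched in a few lines. The function names and the patience value of 3 are assumptions for illustration; the validation losses would come from your own training loop.

```python
def generalisation_gap(train_acc, val_acc):
    """A large positive gap suggests the model is overfitting."""
    return train_acc - val_acc


def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights to keep.

    Stops scanning once validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch
```

With validation losses of 0.9, 0.7, 0.6, 0.62, 0.65, 0.7, training would stop at the sixth epoch and keep the weights from the third (index 2).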

Tell me about a data science project with a measurable business impact.

STAR format (Situation, Task, Action, Result). Quantify impact wherever possible: "reduced churn by 8%", "increased conversion revenue by ₹12 lakh/month." The more specific the number, the more credible the answer.

Your classification model has 95% accuracy but the client is unhappy. What might be wrong?

Class imbalance: if 95% of data is class A, a model that always predicts A gets 95% accuracy without learning anything. Check precision/recall for the minority class. Also check if the metric the client cares about is accuracy at all — it often is not.
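
The accuracy trap is easy to reproduce. This sketch uses synthetic labels (95 of class 0, 5 of class 1) and a "model" that always predicts the majority class; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_for_class(y_true, y_pred, cls):
    """Of all samples that truly belong to `cls`, the fraction predicted as `cls`."""
    actual = sum(t == cls for t in y_true)
    caught = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    return caught / actual if actual else 0.0


y_true = [0] * 95 + [1] * 5   # synthetic, heavily imbalanced labels
y_pred = [0] * 100            # always predict the majority class

print(accuracy(y_true, y_pred))             # 0.95
print(recall_for_class(y_true, y_pred, 1))  # 0.0
```

The headline accuracy looks strong while the minority class is never detected, which is usually the class the client cares about.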

What is feature engineering? Give a concrete example that improved a model.

Feature engineering: creating new input features from existing data. Examples: log-transforming skewed salary data, extracting day-of-week from a timestamp, creating interaction terms. Good features often matter more than algorithm choice.
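
The three examples can be sketched with the standard library alone. The record layout (`salary`, `timestamp`, `age`, `tenure_years`) and the output feature names are hypothetical, chosen only to illustrate the transformations.

```python
import math
from datetime import datetime


def engineer_features(record):
    """Derive new model inputs from one raw record (a dict)."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        # log1p compresses the long right tail of salary and handles 0 safely
        "log_salary": math.log1p(record["salary"]),
        # weekday() returns 0 for Monday through 6 for Sunday
        "day_of_week": ts.weekday(),
        "is_weekend": ts.weekday() >= 5,
        # interaction term: lets a linear model capture a joint effect
        "age_x_tenure": record["age"] * record["tenure_years"],
    }
```

For a record stamped 2024-01-06, a Saturday, `day_of_week` is 5 and `is_weekend` is true.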

Microsoft Data Scientist Interview Questions 2026

Preparation guide for Data Scientist positions at Microsoft India. Covers their Online Assessment → Technical × 3 → Hiring Manager process with technical, behavioral, and HR questions.

Interview rounds: 4
Avg. package: 20–55 LPA
Role type: data