InterviewEra.com

AI-powered mock interviews and resume-aware scoring — built for Indian campus and early-career hiring. Now in private beta.


© 2026 InterviewEra.com. All rights reserved.


data · Experienced

Data Scientist Interview Questions India 2026

Data Scientist interview questions on ML algorithms, feature engineering, model evaluation, and statistical inference.

data role · 12 curated questions · Updated 2026

Data Scientist Interview Questions

Placement-oriented · Updated 2026
  1. What is the difference between classification and regression?

    Technical · Easy

    Tip: Classification predicts a discrete category (spam/not spam). Regression predicts a continuous value (house price). Logistic regression is classification despite the name — a common exam trap.
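
The distinction in the tip can be sketched in a few lines: a regressor returns a continuous number, while logistic regression produces a probability that is thresholded into a discrete label, which is why it counts as classification. The function names, weights, and features below are made-up values for illustration, not a fitted model.

```python
import math

def predict_price(area_sqft, weight=0.05, bias=10.0):
    """Regression: return a continuous value (hypothetical price in lakh)."""
    return weight * area_sqft + bias

def predict_spam(num_links, weight=0.8, bias=-2.0, threshold=0.5):
    """Classification via logistic regression: probability -> discrete label."""
    probability = 1.0 / (1.0 + math.exp(-(weight * num_links + bias)))
    label = "spam" if probability >= threshold else "not spam"
    return label, probability

price = predict_price(1000)      # a continuous output
label, prob = predict_spam(5)    # a discrete label plus its probability
```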

  2. Explain the bias-variance trade-off. How does it guide model selection?

    Technical · Medium

    Tip: Bias: error from wrong assumptions (underfitting, model too simple). Variance: error from sensitivity to the training data (overfitting, model too complex). Goal: the sweet spot that minimises total error on unseen data. Regularisation accepts a little more bias in exchange for lower variance.
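
The trade-off is usually stated through the standard decomposition of expected squared error. The notation is an assumption added here: the data follow y = f(x) + ε with noise variance σ², and f̂ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simpler models tend to raise the first term and lower the second; more flexible models do the opposite. The noise term cannot be reduced by any model choice.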

  3. What is cross-validation? Why is it better than a simple train-test split?

    Technical · Medium

    Tip: k-fold CV splits the data into k folds and trains k times, each time holding out a different fold for validation. Averaging performance across the folds gives a more reliable estimate than a single split, which matters most for small datasets.
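
A minimal sketch of the fold mechanics, using only the standard library. The helper names `k_fold_indices` and `cross_val_score` are hypothetical, and `evaluate` stands in for whatever train-and-score routine you would supply. The sketch does not shuffle; in practice the data is usually shuffled (or stratified) first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_val_score(evaluate, n_samples, k=5):
    """Average a user-supplied evaluate(train, validation) score over k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))  # every sample is validated exactly once
```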

  4. What is the difference between L1 (Lasso) and L2 (Ridge) regularisation?

    Technical · Hard

    Tip: L1 (sum of absolute weights): produces sparse models by driving some weights to exactly 0 — acts as feature selection. L2 (sum of squared weights): shrinks all weights towards 0 but rarely to exactly 0. Use L1 for feature selection, L2 for general regularisation.
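
One way to see why L1 gives exact zeros and L2 does not is the single-weight case, where both penalised problems have closed-form solutions. This is a simplified illustration (one weight, squared-error term centred on the unpenalised value), not a full Lasso or Ridge solver; the function names are hypothetical.

```python
def l1_shrink(weight, lam):
    """Soft-thresholding: minimiser of 0.5*(w - weight)**2 + lam*abs(w)."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0  # weights with magnitude <= lam become exactly zero

def l2_shrink(weight, lam):
    """Minimiser of 0.5*(w - weight)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return weight / (1.0 + lam)

weights = [3.0, 0.4, -0.2, -2.5]
lasso_like = [l1_shrink(w, 0.5) for w in weights]  # small weights drop out
ridge_like = [l2_shrink(w, 0.5) for w in weights]  # all weights shrink, none vanish
```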

  5. What is the confusion matrix? Define precision, recall, and F1 score.

    Technical · Medium

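    Tip: A confusion matrix tabulates predicted against actual classes: TP, FP, FN, TN. Precision = TP/(TP+FP): of all predicted positives, how many are correct. Recall = TP/(TP+FN): of all actual positives, how many did we catch. F1 = 2 × Precision × Recall / (Precision + Recall), the harmonic mean of the two. Prioritise precision when false positives are costly; prioritise recall when false negatives are costly.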
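
The formulas can be checked by hand on a small example. The sketch below counts TP, FP, and FN directly; returning 0.0 when a denominator is zero is a convention assumed here, and the labels are made-up data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # TP=3, FP=1, FN=1
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```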

  6. Explain gradient descent and the difference between batch, stochastic, and mini-batch variants.

    Technical · Hard

    Tip: Gradient descent: update weights opposite to the gradient of loss. Batch: uses ALL training data per step — accurate but slow. SGD: one sample per step — noisy but fast. Mini-batch: k samples per step — best of both, what deep learning uses (k typically 32–256).
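
The three variants differ only in how many samples feed each update, which a single `batch_size` parameter can express. This is a toy sketch on a one-weight model y_hat = w * x with noise-free, made-up data; the learning rate and epoch count are illustrative choices, not recommendations.

```python
import random

def gradient(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, learning_rate=0.01, epochs=200, seed=0):
    """batch_size=len(data): batch GD; 1: SGD; anything in between: mini-batch."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)  # step against the gradient
    return w

data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]  # true w = 3
w_batch = train(data, batch_size=len(data))  # one update per epoch
w_sgd = train(data, batch_size=1)            # eight updates per epoch
w_mini = train(data, batch_size=4)           # two updates per epoch
```

Because this data has no noise, all three runs settle on the same weight. On real data the SGD and mini-batch estimates would keep fluctuating around the optimum unless the learning rate is decayed.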

  7. What is overfitting? How do you detect it and what techniques prevent it?

    Technical · Medium

    Tip: Overfitting: the model memorises the training data and performs poorly on unseen data. Detect: training accuracy significantly exceeds validation accuracy, or cross-validation scores fall well below training scores. Prevent: regularisation, dropout (neural nets), early stopping, more training data, feature reduction, or a simpler model.
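
Early stopping is the easiest of these techniques to show in code. The sketch below picks the epoch with the lowest validation loss and stops once the loss has failed to improve for `patience` consecutive epochs. The function name and the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch

# Training loss keeps falling while validation loss turns upward:
# the widening gap is the overfitting signal described in the tip.
train_losses = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_losses = [0.95, 0.70, 0.55, 0.58, 0.63, 0.70]
best = early_stopping_epoch(val_losses)
```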

  8. Tell me about a data science project with a measurable business impact.

    Behavioral · Medium

    Tip: Use the STAR format (Situation, Task, Action, Result). Quantify impact wherever possible: "reduced churn by 8%", "increased revenue by ₹12 lakh/month." The more specific the number, the more credible the answer.

  9. Your classification model has 95% accuracy but the client is unhappy. What might be wrong?

    Situational · Hard

    Tip: Class imbalance: if 95% of data is class A, a model that always predicts A gets 95% accuracy without learning anything. Check precision/recall for the minority class. Also check if the metric the client cares about is accuracy at all — it often is not.
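
The scenario in the tip is easy to reproduce. The sketch below builds a made-up 95/5 label split and scores a "model" that always predicts the majority class.

```python
y_true = [0] * 95 + [1] * 5   # 95% class 0, 5% class 1 (the class the client cares about)
y_pred = [0] * 100            # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
minority_recall = true_positives / y_true.count(1)
```

Accuracy comes out at 0.95 while recall on the minority class is 0.0: the model never finds a single positive case.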

  10. What is feature engineering? Give a concrete example that improved a model.

    Technical · Medium

    Tip: Feature engineering: creating new input features from existing data. Examples: log-transforming skewed salary data, extracting day-of-week from a timestamp, creating interaction terms. Good features often matter more than algorithm choice.
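
The three examples in the tip can be sketched with the standard library alone. The record layout and field names (`timestamp`, `salary`, `tenure_years`) are hypothetical, chosen only to illustrate each transformation.

```python
import math
from datetime import datetime

def engineer_features(record):
    """Derive new features from a raw record (field names are hypothetical)."""
    timestamp = datetime.fromisoformat(record["timestamp"])
    return {
        "log_salary": math.log1p(record["salary"]),   # compress right-skewed values
        "day_of_week": timestamp.weekday(),           # Monday = 0 ... Sunday = 6
        "is_weekend": int(timestamp.weekday() >= 5),
        "tenure_x_salary": record["tenure_years"] * record["salary"],  # interaction term
    }

raw = {"timestamp": "2026-01-03T14:30:00", "salary": 1200000, "tenure_years": 2}
features = engineer_features(raw)
```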

  11. How do you communicate model results to a non-technical business stakeholder?

    Behavioral · Medium

    Tip: Lead with the business recommendation, not the methodology. Use plain language: "the model correctly identifies 9 out of 10 at-risk customers" beats "recall is 0.9". Use visuals. Avoid model-jargon entirely unless asked.

  12. Python vs R for data science: which do you prefer, and when is R the better choice?

    HR · Easy

    Tip: Python: general-purpose, better ML libraries (sklearn, PyTorch), production-deployment friendly. R: superior for statistical analysis, publication-quality plotting (ggplot2), bioinformatics. In Indian tech companies, Python is overwhelmingly preferred.

Key topics to prepare for Data Scientist interviews

Recruiters test these skill areas specifically.

Python · SQL · ML · Pandas
