Data Scientist Interviews

Contents

  • Skillset
  • Questions
    • Conversation Starters
    • Concepts
    • Classification
    • EDA
    • Model Building

Interviews for Data Scientists: which traits and skills are important for a Data Scientist? Which questions should you be able to answer as a Data Scientist?

Skillset

The following is a typical skillset I expect from a data scientist. Some data scientists will have a different skillset. This is absolutely ok, but I would certainly ask why that is the case.

  • Statistics: A/B Testing, Confidence intervals
  • Programming Languages: Python or R - the following points are only for Python, as I don't know R well enough for them.
  • Exploratory Data Analysis: Pandas, Jupyter Notebooks

Questions

Conversation Starters

  • What are you passionate about?
  • How would you explain an A/B test to an engineer with no statistics background?
  • Do you think Data Science is important? Why so?
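
For the A/B test question, a small worked example is one way to make the idea concrete for an engineer. The following is a minimal sketch, assuming a normal-approximation 95% confidence interval for the difference of two conversion rates. The function name `ab_test_summary` and all numbers are made up for illustration.

```python
import math


def ab_test_summary(conversions_a, visitors_a, conversions_b, visitors_b, z=1.96):
    """Compare two conversion rates with a normal-approximation interval.

    z=1.96 corresponds to a 95% confidence level.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    diff = rate_b - rate_a
    # Standard error of the difference between two independent proportions.
    std_err = math.sqrt(
        rate_a * (1 - rate_a) / visitors_a + rate_b * (1 - rate_b) / visitors_b
    )
    lower, upper = diff - z * std_err, diff + z * std_err
    # If the interval excludes 0, the variants differ at roughly the 5% level.
    return {"diff": diff, "ci": (lower, upper), "significant": lower > 0 or upper < 0}


# Made-up numbers for illustration only.
result = ab_test_summary(
    conversions_a=100, visitors_a=1000, conversions_b=150, visitors_b=1000
)
print(result)
```

The approximation is only reasonable for fairly large samples. For small samples an exact test would be the safer choice.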

Concepts

  • What is the curse of dimensionality? → answer
  • How can you reduce the dimensionality? → PCA, LDA, Auto-Encoders. See Wikipedia for more.
  • Is more data always better?
    • It depends on the quality of your data.
    • It depends on your model.
    • More data also has to be handled (storage, memory, computational power).
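
One facet of the curse of dimensionality can be shown in a few lines: as the dimension grows, the distances from a query point to random points become more and more alike. This is a sketch under the assumption of uniformly distributed points in the unit cube. The helper `relative_contrast` is a hypothetical name, not something from a library.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min of the distances from one query point
    to n_points uniformly random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# A small contrast means "nearest" and "farthest" neighbours are hard to tell apart.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, contrast in contrasts.items():
    print(dim, round(contrast, 3))
```

When the contrast is small, distance-based methods such as nearest-neighbour search lose much of their discriminative power. That is one motivation for dimensionality reduction.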

Classification

  • Which scoring/distance/similarity functions do you know? → Euclidean distance, cosine distance, MSE, MAE, ...
  • How do you deal with imbalanced data? → Oversampling; different error metrics
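
The functions named above are short enough to write down. The following sketch uses plain Python and also adds a naive random oversampling helper. `random_oversample` is a hypothetical name, and duplicating minority samples at random is only one of several oversampling strategies.

```python
import math
import random
from collections import Counter


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of the smaller classes until every class
    is as frequent as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Made-up imbalanced data: 8 samples of class 0, 2 samples of class 1.
samples = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only. Otherwise copies of the same sample can end up in both the training and the test data.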

EDA

  • How can you start EDA?
    • CSV-data: Feature ranges, null-values, covariance
    • Image-data: Eigenfaces, Fisher-Faces, Average image, t-SNE
  • When do you stop EDA?
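
A first pass over CSV data (feature ranges, null values, covariance) might look like the following sketch. It uses only the standard library, and the CSV content, column names and helper names are made up for illustration.

```python
import csv
import io

# Tiny made-up CSV; an empty field stands for a missing value.
RAW = """height,weight
1.60,55
1.75,70
1.80,
1.90,90
"""


def summarize(csv_text):
    """Per column: minimum, maximum and number of missing values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [float(r[column]) for r in rows if r[column] != ""]
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "nulls": len(rows) - len(values),
        }
    return summary


def covariance(csv_text, col_x, col_y):
    """Sample covariance over the rows where both columns are present."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    pairs = [
        (float(r[col_x]), float(r[col_y]))
        for r in rows
        if r[col_x] != "" and r[col_y] != ""
    ]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (len(pairs) - 1)


summary = summarize(RAW)
cov_hw = covariance(RAW, "height", "weight")
print(summary)
print(cov_hw)
```

With Pandas, `DataFrame.describe()`, `DataFrame.isna().sum()` and `DataFrame.cov()` cover the same ground in less code.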

Model Building

This is about building regression models or classifiers.

  • Which models do you know? → Linear Regression, Gradient Boosting, Neural Networks, Random Forests, Decision Trees, ...
  • How do you decide which model to use?
  • How can you improve a model? → page 15, point I1 to I7
  • How can you determine which features are the most important in your model? → answer
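
One model-agnostic way to approach the feature importance question is permutation importance: shuffle one feature column and measure how much the error grows. This is not necessarily the method behind the linked answer. It is a minimal sketch in which `toy_model` is a hypothetical stand-in for a trained model and the data is randomly generated.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE after shuffling each feature column in turn.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled version.
        X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
        importances.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return importances


# Hypothetical "trained" model: uses feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In practice the importance would be computed on held-out data and averaged over several shuffles. scikit-learn offers this as `sklearn.inspection.permutation_importance`.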

Published

Jun 14, 2018
by Martin Thoma

Category

Machine Learning

Tags

  • Data Science
  • Machine Learning
