Interviews for Data Scientists - which traits and skills are important for a Data Scientist? Which questions should you be able to answer as a Data Scientist?
The following is a typical skillset I expect from a data scientist. It might be that there are some data scientists with a different skillset. This is absolutely ok, but I would certainly ask why it is the case.
- Statistics: A/B testing, confidence intervals
- Programming Languages: Python or R - the following points are only for Python, as I don't know R well enough for them.
- Exploratory Data Analysis: Pandas, Jupyter Notebooks
- What are you passionate about?
- How would you explain an A/B test to an engineer with no statistics background?
- Do you think Data Science is important? Why so?
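One way to make the A/B test question concrete for an engineer is a few lines of code. The following is a minimal sketch, assuming a two-proportion z-test on conversion counts and a normal approximation; the function name `ab_test` and the example numbers are made up for illustration, and other tests (for example a chi-squared test) are equally valid choices.

```python
import math
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Compare two conversion rates with a two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Under the null hypothesis both variants share one conversion rate,
    # so the test statistic uses the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval for the difference uses the unpooled standard error.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"diff": diff, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical numbers: 100 of 1000 users convert on A, 130 of 1000 on B.
result = ab_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(result)
```

If the confidence interval for the difference excludes zero, the data is hard to reconcile with "both variants perform the same" at the chosen confidence level.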
- What is the curse of dimensionality? → answer
- How can you reduce the dimensionality? → PCA, LDA, Auto-Encoders. See Wikipedia for more.
- Is more data always better?
- It depends on the quality of your data.
- It depends on your model.
- You also have to be able to handle that amount of data (storage, memory, computational power).
- Which scoring/distance/similarity functions do you know? → Euclidean distance, cosine distance, MSE, MAE, ...
- How do you deal with imbalanced data? → Oversampling; different error metrics
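The scoring and distance functions named above fit in a few lines each. This is a minimal sketch in plain Python, assuming equal-length sequences of numbers and non-zero vectors for the cosine distance; in practice you would more likely reach for NumPy or scikit-learn.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(y_true, y_pred):
    """Mean squared error; penalizes large errors more strongly."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error; less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(euclidean_distance([0, 0], [3, 4]))
print(cosine_distance([1, 0], [0, 1]))
print(mse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

Note that the cosine distance ignores vector length, while the Euclidean distance does not. That difference is a good follow-up question.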
- How can you start EDA?
- CSV-data: Feature ranges, null-values, covariance
- Image-data: Eigenfaces, Fisher-Faces, Average image, t-SNE
- When do you stop EDA?
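For CSV data, the first checks mentioned above (feature ranges, null values, covariance) can be sketched as follows. This version uses only the standard library so it runs anywhere; the sample data and the helper names `load_columns`, `summarize` and `covariance` are made up for illustration. With Pandas, `DataFrame.describe()`, `DataFrame.isna().sum()` and `DataFrame.cov()` give comparable information.

```python
import csv
import io

# Hypothetical sample data; an empty cell stands for a missing value.
SAMPLE_CSV = """age,income
25,30000
32,
47,52000
51,61000
"""


def load_columns(text):
    """Parse CSV text into {column name: [float or None, ...]}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value) if value else None)
    return columns


def summarize(columns):
    """Return the value range and the number of nulls per column."""
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary[name] = {
            "min": min(present),
            "max": max(present),
            "nulls": len(values) - len(present),
        }
    return summary


def covariance(xs, ys):
    """Sample covariance over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (len(pairs) - 1)


columns = load_columns(SAMPLE_CSV)
summary = summarize(columns)
cov_age_income = covariance(columns["age"], columns["income"])
print(summary)
print(cov_age_income)
```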
This is about building regression models or classifiers.
- Which models do you know? → Linear Regression, Gradient Boosting, Neural Network, Random Forests, Decision Trees, ...
- How do you decide which model to use?
- How can you improve a model? → page 15, point I1 to I7
- How can you determine which features are the most important in your model? → answer
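One model-agnostic way to approach the feature importance question is permutation importance: shuffle one feature column, and see how much the error grows. The following is a minimal sketch, assuming a regression setting with MSE as the metric; the toy model `predict` and the data are made up for illustration, and this is only one of several possible answers. scikit-learn ships a more complete version as `sklearn.inspection.permutation_importance`.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only this column, which breaks its link to the target.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model that only uses feature 0.
def predict(row):
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [predict(row) for row in X]
importances = permutation_importance(predict, X, y)
print(importances)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model relies on increases the error. With strongly correlated features the result has to be interpreted with care.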