Data science, sometimes described as data-driven decision making, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge from data in many forms and to make decisions based on that knowledge. It is a complex field, and data scientists need to be familiar with a wide range of topics. We have compiled a list of questions you might face in a data science interview. Let’s start with the basics of data science.
1. What do you know about Data Science?
Data Science is a combination of algorithms, tools, and machine learning methods that help uncover hidden patterns in raw data.
2. How is logistic regression done?
Logistic regression estimates the relationship between the dependent variable (the label we want to predict) and one or more independent variables (features). It does so by estimating probabilities with its underlying logistic function (the sigmoid).
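As a rough illustration of the sigmoid step, here is a minimal Python sketch. The weights and bias are hand-picked placeholder values, not coefficients fitted to any data, and `predict_proba` is just an illustrative helper name.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical, hand-picked coefficients (an assumption for illustration only).
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)  # z = 1.6 - 0.4 - 0.2 = 1.0
label = 1 if p >= 0.5 else 0                  # threshold the probability at 0.5
print(round(p, 3), label)                     # prints: 0.731 1
```

In a real model the weights would be learned from labeled data, typically by maximizing the likelihood of the observed labels.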
3. What is the difference between supervised and unsupervised machine learning?
Supervised learning
Uses known, labeled input data.
Has a feedback mechanism.
The most common supervised learning algorithms are logistic regression, decision trees, and support vector machines.
Unsupervised learning
Uses unlabeled input data.
Has no feedback mechanism.
The most popular unsupervised learning algorithms are hierarchical clustering and k-means.
4. What are the three types of bias that can occur during sampling?
There are three types of biases in the sampling method:
Undercoverage bias
Selection bias
Survivorship bias
5. What are the assumptions of linear regression?
A linear relationship should exist between X and Y.
The features must be independent of each other.
Homoscedasticity: the variance of the output must be constant across different input values.
The distribution of Y given X should be normal, which in practice means the residuals should be approximately normally distributed.
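One informal way to look at these assumptions is to fit a line and inspect the residuals. The sketch below uses made-up data and a hypothetical `fit_line` helper; comparing the residual spread in the lower and upper halves of X is only a crude stand-in for a proper homoscedasticity test.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data, roughly y = 2x + 1 with small noise (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Crude homoscedasticity check: residual spread for small x versus large x.
half = len(xs) // 2
spread_low = max(residuals[:half]) - min(residuals[:half])
spread_high = max(residuals[half:]) - min(residuals[half:])

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("residual spread (low x, high x):", round(spread_low, 3), round(spread_high, 3))
```

If the residuals fan out as X grows, or follow a curve, the constant-variance or linearity assumption is likely violated.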
6. What are the steps involved in creating a decision tree?
Take the entire data set as input.
Calculate the entropy of the target variable and of the predictor attributes.
Calculate the information gain of every attribute.
As the root node, choose the attribute that provides the most information gain.
Repeat the same procedure on each branch until the decision node of every branch is reached.
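The entropy and information-gain steps above can be sketched as follows. The four-row data set and its column names are invented for illustration, and the code only picks the root attribute rather than building a full tree.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Target entropy minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Toy data set, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gains = {attr: information_gain(rows, attr, "play") for attr in ("outlook", "windy")}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(root, gains)
```

Here `outlook` separates the classes perfectly, so it has the highest gain and would become the root; the same calculation would then be repeated on each branch.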
7. Define Selection Bias.
Selection bias is an error that arises when the researcher decides who will be studied. It is usually associated with research in which participants are not selected at random, so it is a result of how the sample is drawn. If selection bias isn’t taken into account, the study’s conclusions may not be valid.
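A small simulation can make the effect concrete. This is a sketch under assumed conditions: the population of scores and the cut-off of 500 are hypothetical, chosen only to show how a non-random selection rule shifts the estimate.

```python
import random
from statistics import mean

# Hypothetical population: one score per person, 1 through 1000.
population = list(range(1, 1001))

# Biased selection: only people with a score above 500 end up in the study.
biased_sample = [score for score in population if score > 500]

# Random selection: every person has the same chance of being chosen.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 200)

print("population mean:   ", mean(population))
print("biased sample mean:", mean(biased_sample))
print("random sample mean:", mean(random_sample))
```

The biased sample overestimates the population mean by a wide margin, while the random sample lands much closer to it.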
8. Explain the difference between Regression and Classification.
Regression
Regression predicts a quantity.
The input data for regression can be either discrete or continuous.
If the input data are ordered with respect to time, the task becomes time series forecasting.
Classification
A classification problem with two classes is called binary classification.
Classification can be divided into multi-class classification and multi-label classification.
In classification we focus more on accuracy, while in regression we focus more on the error term.
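The difference in focus shows up in how the two kinds of model are scored. A minimal sketch with made-up predictions: mean squared error for a regression task and accuracy for a classification task.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted quantities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions whose class label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Regression: predicting a quantity (made-up prices).
prices_true = [200.0, 150.0, 320.0]
prices_pred = [210.0, 140.0, 300.0]
mse = mean_squared_error(prices_true, prices_pred)

# Classification: predicting a category (made-up labels).
labels_true = ["spam", "ham", "spam", "ham"]
labels_pred = ["spam", "ham", "ham", "ham"]
acc = accuracy(labels_true, labels_pred)

print(mse, acc)  # prints: 200.0 0.75
```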
9. Which technique is used to predict categorical responses?
Classification is the technique commonly used to predict categorical responses when mining data sets.
10. Explain the benefits of dimensionality reduction.
Dimensionality reduction is a method of transforming data sets with many dimensions (fields) into data with fewer dimensions that conveys similar information concisely.
This compresses the data and reduces storage space. It also reduces computation time, because fewer dimensions lead to less computing.
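As one concrete example of the idea, the sketch below reduces 2-D points to 1-D with principal component analysis (PCA), which is only one of several dimensionality reduction techniques. The points are made up, and `pca_2d_to_1d` is a hypothetical helper that relies on the closed-form eigenvalue of a 2x2 symmetric matrix, so it only works for two input dimensions.

```python
import math
from statistics import mean


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns the 1-D coordinates and the share of total variance they retain.
    """
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    top = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector, normalised to unit length.
    if b != 0:
        vx, vy = b, top - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    reduced = [x * vx + y * vy for x, y in centered]
    return reduced, top / (a + c)


# Made-up points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
reduced, retained = pca_2d_to_1d(points)
print(len(points[0]), "->", 1, "dimension; variance retained:", round(retained, 4))
```

Each point is now stored as one number instead of two, and because the two original fields were strongly correlated, almost all of the variance survives the reduction.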