The decision tree algorithm belongs to the family of supervised machine learning algorithms. It can be used for both classification and regression problems.
The goal of this algorithm is to create a model that predicts the value of a target variable. To do this, a decision tree uses a tree representation in which each internal node tests an attribute and each leaf node corresponds to a class label.
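As a rough illustration of this representation, here is a minimal sketch of a hand-written tree stored as nested dictionaries. The attribute names, branch values and the classify helper are assumptions made up for this example. This is not the tree learned from the data later in the post.

```python
# A hand-written toy tree (hypothetical values, not learned from data).
# Internal nodes test an attribute; leaves hold a class label (1 = yes, 0 = no).
toy_tree = {
    "attribute": "company",
    "branches": {
        "facebook": 1,  # leaf node
        "google": {     # internal node: tests another attribute
            "attribute": "job",
            "branches": {
                "sales executive": 0,
                "business manager": 1,
            },
        },
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node

print(classify(toy_tree, {"company": "facebook", "job": "sales executive"}))  # 1
print(classify(toy_tree, {"company": "google", "job": "sales executive"}))    # 0
```

A learned decision tree works the same way at prediction time. The difference is that the algorithm chooses the attributes and split points from the training data.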
Let's take the data example below-
We have to predict which combinations of company, job profile and degree have a salary of more than 100K. Here 1 means 'Yes' and 0 means 'No'.
A decision tree will be created like this-
We can see that Google and ABC Pharma employees have salaries below or above 100K depending on the position, while all Facebook employees get more than 100K.
Now let's come to the problem statement-
With the help of Python we can solve this problem. Please find the step-wise code below-
1. Loading the data set
import pandas as pd
df = pd.read_csv("D:/Python Tutorials/Machine Learning/salaries.csv")
df.head(4)
Output:
Selecting the required columns as inputs and target-
# .copy() avoids a SettingWithCopyWarning when new columns are added to inputs later
inputs = df[["company", "job", "degree"]].copy()
inputs.head(4)
target = df["salary_more_then_100k"]
target.head(4)
Output:
As machine learning algorithms only work on numbers, we need to convert the column values into numbers.
We need to import a library for this. Let's have a look at the code-
from sklearn.preprocessing import LabelEncoder
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()
# Use a separate encoder per column so each one keeps its own label mapping
inputs['company_n'] = le_company.fit_transform(inputs["company"])
inputs['job_n'] = le_job.fit_transform(inputs["job"])
inputs['degree_n'] = le_degree.fit_transform(inputs["degree"])
inputs.head(4)
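To see which number each label was given, the fitted encoder's classes_ attribute can be inspected. The sketch below uses a few made-up company values, not the actual contents of salaries.csv. LabelEncoder assigns codes in sorted order of the labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values; the real ones come from salaries.csv.
companies = ["google", "abc pharma", "facebook", "google"]

le = LabelEncoder()
codes = le.fit_transform(companies)

# classes_ holds the sorted unique labels; a label's index is its code.
mapping = {label: code for code, label in enumerate(le.classes_.tolist())}
print(mapping)         # {'abc pharma': 0, 'facebook': 1, 'google': 2}
print(codes.tolist())  # [2, 0, 1, 2]
```

Knowing this mapping matters later, because the model is given these numbers rather than the original text labels when predicting.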
Selecting only the columns having numeric values-
inputs_n = inputs[["company_n", "job_n", "degree_n"]]
Output:
Applying the decision tree algorithm and checking the model score-
Libraries we need to import-
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(inputs_n, target)
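Once a tree is fitted, its learned splits can be printed as text, which helps when a picture of the tree is not available. Below is a minimal sketch using scikit-learn's export_text. The encoded rows are made-up stand-ins for inputs_n and target, not the real data set.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical, already-encoded rows: [company_n, job_n, degree_n]
X = [[2, 2, 0], [2, 2, 1], [2, 0, 0], [2, 0, 1], [1, 2, 0], [1, 0, 1]]
y = [0, 0, 1, 1, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the learned splits as indented text rules.
rules = export_text(model, feature_names=["company_n", "job_n", "degree_n"])
print(rules)
```

Each indented line is a test on one encoded column, and the lines ending in "class:" are the leaf nodes.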
Checking score-
model.score(inputs_n, target)
Output: 1.0
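A score close to 1 means the model classifies nearly all of these rows correctly. Note that this score is computed on the training data itself, so it does not show how well the model handles unseen data.
Now we can predict whether a given company, job profile and degree has a salary of more than 100K-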
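One common way to check performance on unseen rows is to hold part of the data out of training. This is a minimal sketch under assumptions: the encoded rows and labels below are made up for illustration, and the 25% test size is an arbitrary choice.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoded rows [company_n, job_n, degree_n] and labels.
X = [[2, 2, 0], [2, 2, 1], [2, 0, 0], [2, 0, 1], [2, 1, 0], [2, 1, 1],
     [0, 2, 1], [0, 1, 0], [0, 0, 0], [0, 0, 1],
     [1, 2, 0], [1, 2, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Keep 25% of the rows aside; the model never sees them during fit().
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # accuracy on rows used for training
test_score = model.score(X_test, y_test)     # accuracy on held-out rows
print(train_score, test_score)
```

With a data set this small the held-out score varies a lot from split to split, so it should be read as a rough indication only.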
model.predict([[2,2,1]])
Output: array([0], dtype=int64)
Here 2, 2 and 1 are the encoded values of google, sales executive and masters. So for a sales executive at Google with a master's degree, the predicted salary is below 100K.
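Hard-coded numbers are easy to mix up, so the fitted encoders can also be reused to translate text labels into codes at prediction time. The sketch below is self-contained and uses a few made-up rows in place of salaries.csv. The encoders dictionary and the sample values are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows standing in for salaries.csv.
df = pd.DataFrame({
    "company": ["google", "google", "abc pharma", "facebook"],
    "job": ["sales executive", "business manager", "sales executive", "sales executive"],
    "degree": ["masters", "masters", "bachelors", "bachelors"],
    "salary_more_then_100k": [0, 1, 0, 1],
})

# One encoder per column, kept so the same mapping can be reused later.
encoders = {col: LabelEncoder().fit(df[col]) for col in ["company", "job", "degree"]}
X = pd.DataFrame({col + "_n": enc.transform(df[col]) for col, enc in encoders.items()})

model = DecisionTreeClassifier(random_state=0)
model.fit(X, df["salary_more_then_100k"])

# Encode a new sample with the fitted encoders instead of typing numbers by hand.
sample = {"company": "google", "job": "sales executive", "degree": "masters"}
row = pd.DataFrame({col + "_n": encoders[col].transform([sample[col]]) for col in encoders})
prediction = model.predict(row)[0]
print(prediction)  # 0 for this toy data
```

Passing a DataFrame with the same column names used in training also keeps the feature order explicit.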
That's all for the decision tree.
If you want to read more about decision trees, please go through the following link-
Soon I will share more important machine learning algorithms. If you want to learn data science, message me at "goodluckankur@gmail.com".