The Basics of Classification Trees
Classification trees are another method of classifying unknown points into different groups, and they have several advantages over more complex machine learning strategies such as neural networks. In this skill, we'll be taking a look at how classification trees work, as well as building our own algorithm in Jupyter that will create them for us. Let's get started by talking about the basics of classification trees.
Knowledge Check
A region that has a roughly even mixture of points for different groups has ______________.
- Ahigh entropy
- Blow entropy
- Chigh enthalpy
- Dlow enthalpy
Verify your team's readiness — Request a Demo to verify practice assessments, completion reporting, and CSV / SCORM exports on the Team plan.
Finding Possible Splits
Now that we're a little more familiar with how classification trees work and how they're created, let's jump in and start implementing our own algorithm for constructing a classification tree. We start here by seeing how to determine what all the possible splits are so that we can compare them later.
Knowledge Check
When testing splits on a numerical attribute, the most straightforward way is to simply use all the unique values in that column as possible split points.
- A
- B
- C
- D
Verify your team's readiness — Request a Demo to verify practice assessments, completion reporting, and CSV / SCORM exports on the Team plan.
Deciding on the Best Split
Next, let's see how we can decide which of all possible splits is the best one by comparing entropies.
Knowledge Check
The entropy equation uses log base _______
- A2
- B3
- C10
- D16
Verify your team's readiness — Request a Demo to verify practice assessments, completion reporting, and CSV / SCORM exports on the Team plan.
Splitting Recursively
Finally, in order to construct our classification tree, we'll need to repeatedly find the best split in smaller and smaller groups. Let's do it.
Knowledge Check
When we find that a split produces a group where all the datapoints have the same label, we can turn that into a "leaf" in the classification tree
- A
- B
- C
- D
Verify your team's readiness — Request a Demo to verify practice assessments, completion reporting, and CSV / SCORM exports on the Team plan.
Challenge & Solution: Maximum Tree Depth
Now it's time for a challenge! In this challenge, you'll be asked to add a "maximum depth" parameter to the function we created. Watch the video for more information.
And now that you've attempted the challenge, I'll show you how to solve it.
Knowledge Check
Adding a max depth when creating a classification tree prevents underfitting
- A
- B
- C
- D
Verify your team's readiness — Request a Demo to verify practice assessments, completion reporting, and CSV / SCORM exports on the Team plan.
View Transcript
The Basics of Classification Trees
0:00Hi, Sean here and welcome to this skill where we're going to be taking a look
0:04at another
0:05machine learning technique.
0:08And this one is trees.
0:10All right.
0:11Now, specifically, we're going to be taking a look at classification trees.
0:17And these are just a special type of decision tree that's used to classify data
0:24points,
0:25similarly to what we've seen with neural networks, K nearest neighbor, things
0:30like that.
0:31All right.
0:32Now, the basic idea of classification trees is what if we could reduce the
0:37problem of
0:38classifying an unknown data point down to simply a series of questions that
0:44would lead
0:45us to the right result, right?
0:48So let's just say that we're trying to write a program that will predict
0:53whether someone
0:54will qualify for a loan, perhaps, right?
0:58Well, you might ask something like, you know, is there income greater than a
1:03certain amount?
1:05All right.
1:06Here, I'll just write that as income greater than X. And you would have a
1:09specific amount
1:10here.
1:11So if that's true, then you'd go in one direction.
1:14If it's not true, you'd go in another direction.
1:18And at the next steps, you would ask a different question.
1:22Now, this question could be, you know, different in either of these cases.
1:26If the income is greater than X, for example, you might want to know if their
1:31debts are
1:32greater than a certain amount.
1:34And over here, you might want to know, you know, if their income is greater
1:38than X, you
1:38might want to know if they qualify for down payment assistance, for example.
1:44All right.
1:45We'll just say qualify for assistance.
1:51And in each of those cases, you'd split off further and ask different questions
1:56in each
1:56of those until you reach a specific answer, right?
2:00So maybe down here, you get yes, no.
2:03And maybe down here, you get yes and no like so, all right?
2:08So that's the basic idea of a classification tree is again, just by asking a
2:13specific series
2:15of questions that's tailored to the previous answer, right?
2:19We're only asking about debts if the previous answer to this is yes, for
2:23example.
2:23In many cases, we can achieve a very high degree of accuracy in labeling
2:30unknown data
2:31points.
2:32Now, the problem, of course, with classification trees like this comes down to
2:39how do we decide
2:40what questions to ask in the first place, right?
2:43Why did we ask about income first instead of debts first?
2:46Or why do we ask about debts over here instead of qualifying for assistance,
2:51right?
2:51In other words, how do we organize these questions into a classification tree
2:58so that they'll
2:59lead us to the highest possible accuracy, right?
3:03That's really the big question here in working with these classification trees.
3:09And it all comes down to something called entropy and a similar concept here
3:17called information
3:19gain.
3:20I'll just write that as info gain, but these are two concepts that when
3:26combined, we can
3:27use in order to construct our classification trees in the most beneficial way.
3:34All right, so let's discuss what each of these things means.
3:37We'll start off with entropy.
3:39And while you might be familiar with entropy in a physics context, this entropy
3:43is a little
3:44bit different.
3:46Although you will definitely see some similarities.
3:49So up here at the top of this, I've basically just run the same code that we've
3:53been using
3:53in previous skills to create some blobs, which we've assigned group A, B, and C
4:01. And we've
4:01displayed them in a scatter plot, right?
4:05Now essentially what the questions in a decision tree allow us to do is draw
4:12lines either vertically
4:14or horizontally along a specific axis.
4:18Now the only reason that vertical and horizontal are the only two options here
4:23is just because
4:24we're working with only two attributes, but in higher dimensional data sets,
4:30you can obviously
4:30draw different lines than these.
4:34But anyway, let's say that in order to try and tell whether something is in
4:40class A or
4:41not with a classification tree, we might start off by asking, is the X value
4:48greater than,
4:50let's say, negative six.
4:52I drew that line a little bit slanted, but you get the idea.
4:56And then we might ask if the answer is yes, is why greater than, I don't know
5:01about one
5:02or so, right?
5:03So if the answer to both those questions is yes, then the output is yes, it's
5:09in group
5:10A, otherwise the output is no.
5:13So a decision tree or a classification tree for this situation would look like
5:18this.
5:19We would start off by asking, is X greater than negative six?
5:25If so, then we would ask, well, really in this situation, if the answer is no,
5:31the answer
5:32would just always be no, all right, you can have decision trees or
5:36classification trees
5:37that just have that, you know, that sort of short circuit like that and just
5:41lead directly
5:42to no.
5:44But if yes, we would ask another question, right?
5:47We would ask, is why greater than, we'll just say 1.0 there, right?
5:53If the answer is yes here, then the ultimate answer to that would be yes.
6:00All right, otherwise the answer would be no.
6:04So this would be a decision tree here.
6:06But again, we just constructed that by looking at a graph.
6:11In most cases, the dimensionality of the data set, right, the number of
6:15attributes that
6:16a data set has is too high for us to, you know, be able to just construct it by
6:22looking
6:22at a scatter plot.
6:24So here's where entropy comes in.
6:28Let's say that we draw a line right here, all right, and we say that everything
6:34over
6:34here on the right hand side is a, and we say that everything over here on the
6:40left hand
6:41side is not a.
6:43Well what entropy refers to is how well that division actually divides that
6:49data set, right?
6:50So we can see in this case over here with not a, that's 100% correct for all of
6:56the points
6:56inside of here.
6:57So this has a very low entropy.
7:02Entropy refers to how mixed the members of a group are, whereas over here on
7:08the right
7:09hand side of this division, we have, we can see that we basically have almost
7:15even numbers
7:16of elements that are in A and that are not in A in this group.
7:23So this group has very high entropy.
7:27Now the basic idea here, we can actually use this concept to know exactly where
7:33to draw
7:34the correct lines.
7:36What we want to do is draw the lines in just the right places so that we
7:42minimize the entropy
7:44of our groups.
7:45All right, and this makes sense if you think about it because if we were to
7:48have different
7:49groups, maybe groups that are easier to divide and we have one group here that
7:52's maybe group
7:53A and we have group B over here, right?
7:58That's just sort of a cluster of points like that.
8:01Well if we were to draw a line like this, that obviously reduces the entropy of
8:06the overall
8:07group because the new split, both of these groups have very low entropy because
8:13there's
8:13no mixture of different points in each group.
8:20However, if we were to draw that line in a slightly different way, right?
8:24If we were to draw that line through here, both those groups have very high
8:28entropy because
8:30they're a mix of elements from different groups.
8:33And in fact, almost an even mix, which as we saw is the highest entropy.
8:37So this idea of dividing our data set based on what reduces the entropy the
8:46most, that
8:48act of reducing the entropy of our data set is referred to as information gain,
8:55right?
8:56So basically what we want to do is choose our divisions based on which division
9:02gives
9:03us the highest information gain.
9:07So anyway, that's the basic idea of working with classification trees and some
9:12of the
9:12fundamental concepts around constructing these classification trees.
9:18So the next thing that we're going to do is we're actually going to jump in and
9:21implement
9:21this algorithm ourselves.
9:23I understand that this might seem a little bit strange, but one thing that I
9:27love about
9:28classification trees is that this is an algorithm that's actually very easy or
9:34relatively
9:35easy, I suppose, to implement on our own.
9:38And I think it will help you to understand these concepts on a much deeper
9:42level.
9:43So anyway, that's what we're going to be doing next is implementing a
9:46classification
9:47tree algorithm, right?
9:49One that actually will build classification trees for us inside Jupiter.
9:54So I hope this has been informative for you and I'd like to thank you for
9:56viewing.
Finding Possible Splits
0:00All right, so now that we're familiar with the basics of how these
0:03classification trees
0:05work and have some idea of how to create these, let's get started creating this
0:11algorithm
0:12that will create decision trees for us.
0:14So the first thing that we're going to do here, and again, this is where it
0:18will really
0:18help us get a deeper understanding of this process by implementing it ourselves
0:23, the
0:24first thing that we're going to do is figure out how to loop through all the
0:30possible ways
0:32that we could split our data.
0:35It's pretty obvious when we have, let's say, an attribute that has a yes or no
0:40value,
0:41such as drinks coffee, for example, if you're looking at a data set of people,
0:48it's pretty
0:49easy to choose where to split it.
0:51There's only one place you could split it, and that is have all of the people
0:55whose answer
0:56is yes in one group, and all of the people whose answer is no in another group.
1:00However, as you can see, even just from this simple blob data that we generated
1:06here, the
1:07attributes can sometimes be more complex than this, in particular with numeric
1:12attributes,
1:13right, such as X and Y here, these have a range, and that means that there's a
1:19lot of places
1:20that we could draw vertical or horizontal lines here, right?
1:25So how do we decide, you know, what the possibilities even are?
1:30Well, when working with numeric data, it's pretty common to try the split along
1:38each of
1:39the specific values, right, each of the unique values in each dimension, right?
1:45So we might try splitting along the X axis at all the unique X values and along
1:50the Y
1:50axis at all the unique Y values.
1:53And this probably seems like a lot of potential splits.
1:57So in some cases, you know, you might want to reduce that number down a little
2:02bit, but
2:02in our case here, it's not going to be too difficult to actually implement that
2:07.
2:07And it's not going to be too computationally expensive to run it.
2:11So here's what this is going to look like.
2:14First of all, what we're going to do is we're going to define a function here,
2:19and we'll
2:19call that function something like find best split.
2:24Okay, and what this is going to do is it's going to take all of the X values as
2:29well
2:30as all of the Y labels, which we'll just call capital X and lowercase Y.
2:36And we're going to assume that this X, by the way, is a data frame.
2:39That'll just allow us to perform some nice operations that'll make it a little
2:44bit easier
2:45to implement this.
2:47So what we're going to do here is we're going to start off by looking through
2:52all of the
2:52potential columns that are in this data frame.
2:57Right?
2:58So in this case, with our blobs, that's pretty easy.
3:00That's just going to be the X and Y columns.
3:03But again, because we're assuming this is a data frame, we don't even have to
3:06know that
3:06much.
3:07We're going to do this before, and then we'll say feature, or you could call it
3:12attribute
3:12in X dot columns.
3:17What we want to do there is we simply want to determine how many different
3:22values there
3:23are in a specific column.
3:26Right?
3:27So if we're looking at this X axis feature, for example, how many different X
3:33axis values
3:33are there?
3:34Right?
3:35So we're going to have a lot of different points we might want to split our
3:38data set
3:39at.
3:40Now, again, data frames have a pretty nice way of doing this.
3:42And that is we can just say possible splits, we'll call this equals.
3:49And then what we're going to do is say X capital X, that is feature.
3:54Right?
3:55So we're looking at that column, and then we're going to say dot unique, right?
4:00You may recognize this method from a lot of other previous skills.
4:06This gives us all of the unique values that that column contains.
4:10All right.
4:11So now that we know all of the possible columns and values in those columns
4:15that we might
4:16want to split on, all we have to do is really test it out to see how each of
4:22those splits
4:23improves or worsens the entropy.
4:28So here's what this is going to look like.
4:30First of all, we'll try splitting by that value that we just got for that
4:35column.
4:36So what that's going to look like is we're going to say here, we'll just say
4:40four, and
4:41we'll call this threshold in possible splits.
4:46All right.
4:47So we're going to try out each of those possible split points in this column.
4:51What we're going to do is we'll start off by just dividing up the groups based
4:57on that
4:58threshold.
4:59So we'll call this group one, and we'll say equals, and we'll use another nice
5:03feature
5:04here of NumPy and pandas.
5:07We're just going to say X and use that feature.
5:13And we're going to test whether or not it's less than or equal to that
5:17threshold.
5:17Okay.
5:18And then we'll say group two, and that's just going to be sort of the opposite
5:21of that.
5:22So we'll say X feature is greater than whatever that threshold was.
5:28Cool.
5:29So now that we have our two groups, and by the way, this won't give us the
5:32groups themselves,
5:33we need to actually say X and then have that X feature thing in there.
5:39All right.
5:40So we'll say X, there we go, X feature, and then put that in square brackets.
5:46So now that we have our two groups, the next thing that we're going to do is we
5:50're going
5:51to calculate the entropy of both of those groups and see what that gives us.
5:58So here's what that's going to look like.
5:59We're going to say entropy one, and I'll talk about the entropy equation in
6:07more detail
6:08a little bit later.
6:09But for now, we're just going to say something like get entropy, or we'll call
6:14that calculate
6:15entropy.
6:16Okay.
6:17And we're just going to pass group one into there, and then we'll say entropy
6:24two equals
6:25calculate entropy of group two.
6:30All right.
6:32So I'm going to just define this as a stub method here.
6:35We'll say define calculate entropy up here, and then we'll say just group.
6:42And for now I'm just going to return something like zero, right?
6:47Or here, we'll return something like one, perhaps.
6:50All right.
6:51Cool.
6:52So now that we've done that, right, we have the entropy of both of those groups
6:57.
6:57So now we need to combine those two entropies to get the full entropy of that
7:04division,
7:05right?
7:06So the way that this works is we basically just need to multiply the individual
7:10entropies
7:10by the size of each of those splits, right?
7:13If one group just has one item in it, we don't want that to have a huge
7:17influence on the
7:18overall entropy versus if the two groups are having even number roughly of
7:25elements.
7:26So here's what that's going to look like.
7:27We're just going to say, and we'll call this something like total entropy.
7:33And what that's going to look like is we're just going to say equals, and we'll
7:37get the
7:38relative sizes by saying group or a length of group one, that is, divided by
7:45the size
7:46of the total group.
7:47So that's just going to be the length of why there.
7:50And in fact, I think I did something wrong up here.
7:52I think this should be why and why because we want the labels instead of the
7:58actual rows
7:58themselves.
7:59All right.
8:00So we're going to, we have the relative size of group one.
8:04So we're going to multiply that by entropy one.
8:08All right.
8:09So entropy one.
8:10And then we're going to add that to the same thing, but for the second group.
8:15So I'm just going to copy this actually, and we'll paste it over here.
8:19And this is just going to be the size, the relative size that is of group two
8:24times the
8:25entropy of group two.
8:27That'll give us the total entropy.
8:29And what we want to do now is check to see how that total entropy compares to
8:36the other
8:36options that we have available.
8:39Right.
8:40So you can probably imagine that as we go through all of these columns and all
8:43the possible
8:44splits, some of those splits are going to be much better than others, right?
8:48A split that goes like right here is much better than a split that goes like
8:53right through
8:54here, right?
8:55That one doesn't really do much of anything for us.
8:58So what we need to do now is just compare the entropies across multiple
9:04features and possible
9:07splits within that feature to find the best one.
9:10And that's what we're going to do next as well as see how to actually calculate
9:14the entropy
9:14of a group in the first place.
9:16So I hope this has been informative for you, and I'd like to thank you for
9:19viewing.
9:20[BLANK_AUDIO]
Deciding on the Best Split
0:00All right, so at this point, we're going through all of the columns and all of
0:05the possible
0:05split values that we might be able to use in each of those columns.
0:10So the next thing that we're going to need to do is compare the relative entrop
0:15ies of
0:16all of those different options to find out which one is the best one.
0:21Now really the starting point for making this happen is going to be to actually
0:26implement
0:27this calculate entropy function.
0:31And really the only thing you need to know about this is that there's a
0:35specific equation
0:36for determining the entropy of a given division.
0:42And that looks like this.
0:43Don't worry.
0:44I'll show you how to implement this in Python in just a minute if you don't
0:47understand
0:48the mathematical syntax.
0:49But the equation here is that entropy is equal to the negative sum from i
0:59equals 1 to however
1:01many groups we have.
1:03And then under that sum sign, we want to know how many samples or how many data
1:12points
1:13are in each group.
1:15So we want to know, you know, this PI thing is the proportion of data points
1:20belonging
1:21to whatever group i is.
1:25And then we multiply that proportion by log base two of PI.
1:31All right.
1:32So that's just the equation for entropy.
1:34Don't worry too much about that equation.
1:38Just know that by implementing this calculate entropy function with that
1:42equation, that
1:43will allow us to make the best decisions with regards to where to split our
1:48data set.
1:49So here's what this is going to look like.
1:51We're going to change this calculate entropy thing from return one, which
1:55obviously isn't
1:55what we want to.
1:57And we're going to say groups and counts equals, and then we're going to say NP
2:05dot unique.
2:07And we're going to say group, right.
2:10And this group thing here is a potential division.
2:14All right.
2:15And that should give us all of the unique groups there.
2:17But we also want the counts, which we can get nicely enough by saying return
2:21counts as
2:22a keyword arg equals true.
2:25All right.
2:26So at this point, we have the different groups that are in that division, as
2:32well as how
2:32many data points are in each group.
2:36So that's really all that we need in order to implement that equation there.
2:41So what we're going to do is we're just going to say probabilities equals, we
2:47'll say counts
2:48divided by length of group.
2:52And then what we're going to do is we're going to say return negative and we'll
2:55use
2:56the NP dot sum function.
2:59That's very similar to Python's built in some function, but it's optimized for
3:02working with
3:02NumPy arrays.
3:03So we're going to say NP dot sum, and then we'll say probabilities times NP dot
3:12log two.
3:14And then we're going to say probabilities.
3:17And we'll say plus, and you need to do this in order to avoid calling log two
3:21on the value
3:22zero.
3:24We just need to add a very small value here, IE to the negative ninth there
3:29that will prevent
3:30that from happening.
3:31All right.
3:32Cool.
3:33So that's how we calculate the entropy.
3:36That's just the function that we need there, which implements that equation.
3:39So don't worry too much about the details there.
3:41If that doesn't make sense, that's honestly the least fun part of all of this.
3:45But now that we have that, we're actually calculating the entropy for those two
3:49groups
3:49and combining that into a total entropy.
3:52So this really becomes a pretty elementary programming exercise.
3:56We just need to keep track of what the minimum entropy is, and you know, what
4:02feature and
4:02split give us that.
4:04So here's what that's going to look like.
4:06We're going to say up here at the top of this function, we'll say something
4:10like best
4:10feature equals none.
4:13Right.
4:15We don't really know what the best feature is yet.
4:18Then we're going to say best threshold equals none because we don't know that
4:23either yet.
4:24And then we're going to say best entropy and we'll set that to infinity, but we
4:29're going
4:29to convert that to a float here just so that, you know, because infinite
4:34entropy is obviously
4:35going to be the starting point because anything will be less than that.
4:39And then what we're going to do is we're going to say number of features and
4:43that's
4:44going to be equal to X dot shape.
4:46Oops, let's try that again.
4:48There we go.
4:50Index one.
4:51Right. That'll give us the number of features that are in this data set.
4:55Cool.
4:56So all we need to do now is after we've calculated the weighted entropy, right,
5:01or the total
5:02entropy here, we called it, we just need to compare that to the current best
5:07entropy
5:08and replace the best feature and best threshold if it's better, right, if it's
5:13less than the
5:14best entropy that we have currently.
5:16Here's what this is going to look like, we're going to say if total entropy,
5:21all right,
5:22is less than the current best entropy, then what we want to do is we want to
5:28say best
5:28entropy equals total entropy, right, we're replacing that former best number
5:36with the
5:37new best.
5:38And then we're going to say best feature equals feature.
5:43And then we'll say best threshold equals threshold.
5:49And that is all we need to do there.
5:52So at the end of this, after the for loop completes, we just want to return the
5:56best
5:57feature and the best threshold.
6:01Okay.
6:02So that should be all we need to do.
6:05Let's test this function out now, what we're going to do is we're going to just
6:08call find
6:09best split now on the blobs that we created up here.
6:15This will be super informative, but later on, we'll see how to apply this to a
6:20slightly
6:20more involved data set.
6:23But first what we need to do is we need to actually convert the positions here
6:27into a
6:28data frame.
6:29So what that's going to look like is this before we call find best split, we're
6:33going
6:33to need to say df equals pd dot data frame.
6:38And we do need to import pandas up here at the top.
6:40So let's just do that import pandas as pd.
6:45And if we go back down now, we're going to pass the positions in there.
6:50And we'll just have the columns that are in there.
6:53Here we'll say columns equals.
6:55And then what we'll do is we'll say, we'll say feature one.
7:00Right, that's going to be the X there.
7:03And we'll say feature two.
7:07And that should be all we need to do there.
7:09So now that we have that as a data frame, oh, and here we can say data target
7:14or df target
7:15rather, if we want to equals, and we can set the labels there if we want.
7:21So let's just let's actually start off by printing that data frame out.
7:26We'll just say df dot head there.
7:28And sure enough, that gives us a lot of the points from our from our plot up
7:34here, along
7:35with the groups that they belong to.
7:37So we can call this groups instead if we wanted to, I suppose we'll just we'll
7:42just
7:42say group.
7:43There we go.
7:45And that gives us all we need.
7:47Cool.
7:48So let's try this out now.
7:49Now that we have that as an actual data frame, we're going to say find best
7:53split.
7:54And we just need to pass the data frame itself there.
7:58Okay, that's going to be the positions.
8:01And then we'll have the labels as the Y argument there.
8:05And actually, we don't want to add this group there because we don't want our
8:08algorithm
8:09there to try and split on this group column, which it will if we don't remove
8:14that.
8:15So we'll just remove that like so leave the labels intact.
8:18So let's just try running that.
8:21And sure enough, we see that feature two apparently has the best entropy when
8:27we draw
8:27the line at negative 0.2.
8:30So let's actually see what that means.
8:33We're going to go up to here.
8:35And that is around here ish, I believe, right?
8:39So feature two is this axis here.
8:43So it's saying that the best place we can divide it is right about there, which
8:47I would
8:47I think I would agree with.
8:49All right.
8:50Well, whether I agree with it or not, the math has spoken.
8:52So anyway, that is a function for finding the best split.
8:58So you might wonder where we're going to be going from here.
9:01Well, the next thing that we need to actually do is run this find best split
9:07thing repeatedly
9:08on smaller and smaller groups.
9:11So we know the best first split here is this line, but this leaves us now with
9:17two different
9:18regions that we have to determine how to split up further, right?
9:23So in other words, if we draw the line there, that gives us a place in the
9:28decision tree
9:30where we know that if something is greater than this line, it belongs to group
9:35A, just
9:36based on the, you know, just based on the current data that we have here.
9:41But we still need to try and divide up the B and C groups here by drawing
9:47another line,
9:48right?
9:49So that's the next step in this algorithm is actually finding what that next
9:54line is
9:55and keeping track of those usually in something like a Python list.
9:59So that's what we're going to see how to do next.
10:01So I hope this has been informative for you, and I'd like to thank you for
10:04viewing.
Splitting Recursively
0:00All right, so now that we've finished this find best split function, as I said
0:05previously,
0:05the next thing we need to do is actually repeatedly apply this find best split
0:11function to smaller
0:13and smaller groups until we reach the point where we can successfully identify
0:19each and
0:19every point in this graph.
0:22Now in practice, that's probably not the best idea, right?
0:27As you can see, there's quite a bit of overlap here.
0:30So in order to really create a decision tree that would identify each of the
0:35points in
0:36this graph correctly, we would need to do quite a bit of what's known as over
0:41fitting,
0:42right?
0:43In other words, you'd start to see that the lines would be drawn sort of like
0:47this here
0:49in a way that's not really very useful and isn't very general, right?
0:53We want it to be kind of like, you know, maybe like that at best where it
0:58separates them
0:59kind of crudely, but simply cool.
1:02So anyway, here's what this is going to look like.
1:05We just need to create a recursive function here and we're going to call this
1:10something
1:10like we'll call it create classification tree.
1:18But what this is going to do is it's going to take the data that we want to
1:23create classification
1:25tree for.
1:26So this is going to just take x and y as we had before as arguments.
1:32And what it's going to do is it's going to again, recursively call this find
1:38best split
1:39function in order to find splits on smaller and smaller groups.
1:44However, before we can do that, we need to actually decide what we want this
1:48decision
1:49tree to look like.
1:51In theory, it would be possible to represent this thing fully programmatically,
1:55right?
1:55As a series of functions or something like that.
1:59But what we want here really is something reusable so that we could use this
2:03this classification
2:05tree over and over again after a single training.
2:09So what I'm going to recommend here is that we have our classification tree
2:12look a little
2:13bit like this.
2:15We're going to have it represented by a nested Python dictionary that has type
2:22entries.
2:23And this will tell us whether this current part of the tree is a leaf which
2:28contains
2:28an answer to what group that element should be in.
2:32Right.
2:33So we might have type leaf.
2:35And in that case, we would have another entry called something like group.
2:41And that might be, you know, a, for example, or we might have type equal to, we
2:49'll just
2:50call it node.
2:51I suppose you could call it branch as well.
2:53This is, you know, a part of the decision tree where we have another question
2:58that we
2:58want to ask or another split that we want to perform.
3:02So in that case, that might look like this, we might have type node.
3:06But in this case, we'll have the feature we want to test, right?
3:11So we'll call that feature here and that'll contain the name of the feature.
3:15So maybe X or Y or feature one or whatever your feature is called.
3:22And then we'll have the threshold there as well.
3:26So maybe feature one, right, as we just got here, or we'll say feature two, the
3:31threshold
3:32for that is negative zero point two, seven, nine, blah, blah, blah.
3:36Okay.
3:37And finally, depending on the answer to that, we'll want to have a left and
3:41right, you know,
3:44subtree.
3:45So we'll say left and that will actually be a nested structure containing
3:48either a leaf
3:49or another node like this.
3:52All right.
3:53So that's how we're going to represent our tree here.
3:56So we just need to make it so that this function will construct that for us
4:01using this find
4:02best split thing over and over again.
4:04So here's what this is going to look like.
4:06We're going to start off with some of the so-called base cases or stopping
4:11points in
4:11our recursion.
4:13So the first one here is if all of the items in the group are the same, then we
4:20want to
4:21return a leaf node because there's no way we can possibly split that anymore,
4:25right?
4:26So we'll say if and the way we test this is by checking the length of NP dot
4:31unique and
4:32we're going to test why here, those are all the possible labels in this sub
4:38group.
4:38If that's equal to one, then we want to return a type leaf, right?
4:45Because we know, oops, leaf not left, because we know what class this should
4:50belong to.
4:51So we'll say group and that will be whatever that group happens to be.
4:56So that's going to be why and in order to actually get the label, we need to
4:59say dot
5:00I, loc, zero in square brackets.
5:03So all right.
5:04So that's really the main stopping point for our recursion here.
5:08What we're going to do next is we're going to say feature and threshold and we
5:12're going
5:13to use this find best split function that we created up above in order to split
5:19it further.
5:20So here's what that's going to look like.
5:21We'll say feature and threshold equals find best split.
5:27And that's going to be X and Y there that we're going to pass through.
5:32And then we're going to take that feature and threshold and actually split the
5:36data that
5:37we have here, according to it.
5:39So this is going to look pretty familiar to what we did up here when we did
5:43something
5:43like this, we're going to say, and then we'll actually call this left mask and
5:48right mask,
5:50because we'll be using this a few times here, we'll say X feature less than or
5:55equal to
5:56threshold.
5:57All right.
5:58So we're getting the sort of the left side of that split.
6:00And then we'll get the right side of that split by saying right mask equals X
6:05feature
6:05is greater than threshold.
6:09And now that we have those two things, we're going to build the sub trees,
6:13right?
6:14We know that because this stopping point wasn't true, that we actually need to
6:20have a node
6:21that produces a split.
6:24So what we're going to do is we're going to say left tree equals, and we're
6:28going to
6:28call this create classification tree thing recursively.
6:33So create classification tree.
6:36And we're going to call that with X.
6:40And we're going to pass the left mask to it there.
6:43And then we're going to say Y left mask to it.
6:46So we're just passing the sort of X's and Y's for the subgroup here.
6:52And then we're going to do the same thing for our right tree.
6:56We're just going to say create classification.
7:00There we go tree.
7:02And that's going to be X right mask and Y right mask.
7:10And finally, the actual Python dictionary that we want to create for this, we
7:16're going
7:16to say return.
7:18The type for that will be node.
7:22And then we'll say feature, that's going to be whatever feature we decided was
7:27the best
7:27one.
7:29We also need the threshold.
7:30Just like I said earlier, that's going to be whatever threshold we decided on
7:34there.
7:35And then we're going to have the left and right sub trees there by saying left,
7:39that's
7:40going to be our left tree that we created recursively.
7:44And right, that's going to be our right tree that we created recursively.
7:48And that should be all we need to do there.
7:51So that should create our tree for us recursively.
7:55Let's give this thing a try instead of saying find best split for our data
7:59frame and labels,
8:00we're going to say create classification tree and call that on there.
8:06Let's give this a try.
8:07And oops, it looks like something was wrong there.
8:09So what I'm going to do, we need to actually convert the labels here into a
8:14data frame.
8:15What I'm actually going to do is I'm going to say, D F and we'll say labels
8:20like that.
8:21And then I'm going to say equals labels.
8:23All right, that will create a data frame column for those.
8:27And then I'm just going to say labels equals D F labels.
8:32And finally, down here, we're just going to pass that data frame without that
8:38labels
8:38column by saying D F dot drop.
8:42And we're going to drop the labels column there when we pass it in again so
8:46that it
8:47doesn't try and split it on that labels column, which would kind of ruin the
8:51point.
8:52So let's try that again.
8:53And oops, I need to add axis equals one to this dot drop thing there.
8:59Okay, that should work better.
9:01And sure enough, we see that that creates our decision tree.
9:04Now, I'm only going to walk through the first few decisions here, right?
9:10First of all, we see that feature two is up at the top.
9:12So that's the first line that's going to be drawn there.
9:16And recall that that gives us a line right about there.
9:20In fact, it goes basically through this uppermost orange point.
9:25Since again, we use the actual y values of real points in order to decide where
9:31to do
9:31the split.
9:33But let's take a look at the next one here.
9:35We see that the next split is on feature two as well.
9:39And that is negative six point zero five.
9:43So let's just go take a look at where that is exactly.
9:46But sure enough, we see that that's right along here.
9:50So it's trying to draw a horizontal line through groups B and C in order to
9:56divide them.
9:57That's just mathematically what the best split was.
10:00All right, so you can continue on with that.
10:03And I encourage you to take a look at that full decision tree that was
10:06generated.
10:06But one thing that you might notice is that really the, oh, and here you see
10:12that the,
10:13you know, the, the rightmost thing from the first node says that it belongs to
10:17group zero,
10:18which is group A. Because again, once you've drawn that horizontal line that
10:23divides A
10:24from B and C, you know that anything above that is in group A according to the
10:28decision
10:29tree or classification tree that we've created.
10:33As I was saying though, you'll notice that this decision tree is actually quite
10:38deep.
10:38And it's because of this that many times people will set a maximum depth on the
10:44decision tree
10:45that can be constructed, right?
10:47That helps us to avoid overfitting.
10:50And that's something that I'm actually going to give you in the challenge.
10:53So I hope this has been informative for you and I'd like to thank you for
10:56viewing.
10:57[BLANK_AUDIO]
Challenge & Solution: Maximum Tree Depth
0:00All right, so now that we've seen the basics of recursively building a
0:03classification tree
0:04It's time for you to do a challenge that will significantly improve in many
0:09cases
0:10the performance of
0:12the classification tree that you build and that is as I said previously adding
0:17a maximum depth
0:19parameter to this create classification tree
0:24Function so what we would like to be able to do is specify how deep it's
0:28possible for the tree to go
0:30So that you don't get ridiculous little
0:32Choices like what we had down here. So let me try that again. There we go
0:37You know if you look at some of these things down here, you can see that it's
0:41really at the point of
0:42identifying
0:45single points in
0:47Some sort of boundary right and usually while things like that will lead to
0:51better accuracy on the training set
0:54They won't actually improve the real world performance of the tree that you
0:59build
0:59So we usually want to set some sort of maximum depth so that we don't go that
1:04far now
1:05Your task here is going to be to
1:08Stop at a maximum depth and in that case what you'll want to do is simply
1:14return
1:15whatever the majority is of
1:19the items that are in that group right so in other words if you're left with a
1:23group that has a lot of
1:26Elements from both B and C in it right if you have a group here like this
1:32Then you're just going to want to return whatever
1:34Group has the largest number of data points in that area right so in that case
1:41or in this case here
1:42That would be C right because there's four C's and three B's
1:47So that's really all you need to do is add that case and you should be able to
1:52add another
1:53Argument or parameter to this function as well you can call that something like
1:57max depth if you want to and
1:58Well, that is your challenge so feel free to give this a try
2:02Maybe give it you know five to ten minutes to complete and once you've done
2:07that you can feel free to move on to the next video
2:09We're I'll walk you through the solution
2:11So I hope this has been informative for you, and I'd like to thank you for
2:14viewing
2:15[BLANK_AUDIO]
Challenge & Solution: Maximum Tree Depth
0:00All right, well, hopefully you gave this challenge a try.
0:02Let's take a look at the solution.
0:04So the solution here was pretty straightforward.
0:07All we had to do, again, was add this little max depth thing,
0:13or max depth argument to our function.
0:15So here's what that looked like.
0:16I just said max depth.
0:20Oops, there we go.
0:21And I actually set the default value for that keyword
0:24arg to none so that we could say,
0:28we could just leave that off if we didn't want to have
0:30a maximum depth.
0:32From there, all we really had to do
0:34was add another stopping case to our recursive create
0:39classification tree function.
0:41And that looked like this.
0:42We could just say if max depth is not none and then--
0:51actually, we had to add another argument here called depth.
0:57And I said, and depth is greater than or equal to max depth.
1:02Well, in that case, once we've reached the maximum depth,
1:05what we want to do is just return a leaf node with whatever
1:11or whatever the most common label is for the elements
1:15in that group.
1:16So here's what that's going to look like.
1:17We're just going to say return type leaf.
1:21And then we'll say group.
1:23And that's going to be, in this case, y dot mode index 0.
1:29That mode thing gets the most common thing in there.
1:32But you could have done that with a for loop as well,
1:34I suppose, if you wanted.
1:36Cool.
1:36So that is all you really needed to do there,
1:40except, of course, when we recurse,
1:42we want to make sure that we're passing that depth argument
1:46at the right place.
1:47So let's actually flip these things around.
1:49We'll say x, y, and depth.
1:51And we'll set the initial value for that to 0.
1:57Since we're starting at depth 0, and then we'll increment that
2:00as we recurse.
2:02And then all we need to do is when we recurse down here,
2:05we're just going to say depth plus 1 for that depth argument.
2:10So depth plus 1 there as well.
2:12And then we need to pass that maximum depth for maximum depth.
2:17So we'll say max depth, max depth.
2:19And that should be all we need to do there.
2:21So let's maybe try setting a maximum depth here of, say, 3
2:25or 4, something fairly reasonable.
2:28And we can do that by saying create classification tree.
2:32We'll start off by saying max depth.
2:37And we're going to set that equal to--
2:39we'll try 4 here.
2:40Actually, we'll try 3, just so that it's easier to see.
2:43So let's try running this.
2:44And what we should see is that sure enough,
2:46much smaller classification tree, you can adjust this too.
2:51If you wanted to set that to max depth equals 2, which
2:53actually would get you pretty good performance.
2:55That would be where we have a tree that's only two decisions
3:00deep.
3:02And you could also set this to 4 if you wanted to.
3:05And that would give you a significantly larger tree.
3:08So feel free to play around with that.
3:10But that is the solution to the challenge.
3:13And in case you're wondering how to take this classification
3:16tree that we've created and actually apply it to some data,
3:21I was going to leave this up to you as a further challenge.
3:23But I'll show you it because it's pretty straightforward.
3:25What we're going to do is we're just
3:26going to say something like define apply tree.
3:30That's going to take a classification tree as an argument,
3:33as well as a data point that we want to apply it to.
3:39And what this is going to look like
3:40is we're simply going to say if tree type is equal to leaf.
3:48Well, in that case, we just want to return whatever
3:50the group was for that.
3:52So we'll say return tree group.
3:56Otherwise, we want to test the threshold.
3:58So we're going to say if data point, and then we'll say tree,
4:03and we're going to have whatever the feature is here.
4:06So we'll say feature like so is less than
4:09or equal to the corresponding threshold.
4:12So we're going to say tree threshold there.
4:15Then in that case, what we're going to do
4:17is we're going to return and we're actually
4:20going to call this thing recursively by saying apply tree.
4:24And we're going to call it on the data point
4:28with that subtree by saying tree.
4:30And this would be the left subtree there.
4:33And we can do the same thing--
4:35oops, here I forgot a closing parentheses there.
4:38And we can do the same thing here for the right subtree.
4:41We just need to say if data point--
4:44and actually, you could just say else here.
4:45That would be easier, I suppose.
4:47So we'll say else.
4:48In that case, we would want to apply the right subtree.
4:53And that should be all you need to do.
4:54So I'll leave it up to you to test this thing out
4:56and make any further changes or adjustments to it.
5:00But that is the basic solution to the challenge.
5:03So I hope this has been informative for you,
5:05and I'd like to thank you for viewing.
Team training path
Turn this skill into assignable team training
This free skill is a preview of the courses your team can assign, track, and report on with CBT Nuggets.
$749
seat / year