One of the most classical machine learning applications is classification. This could, for example,
be encountered when separating pictures of cats from pictures of dogs. To do this, an engineer picks
a specific algorithm that learns from many examples. But which algorithm is best? That can depend
on many things!
In our project, we are going to build a framework that helps the engineer pick the best algorithm,
based on many different factors.
One of these factors is the so-called decision boundary. This is a region, one could think of a line,
that separates cat pictures from dog pictures. Imagine one has a big pile of mixed-up pictures and
wants to sort them into two piles, one for cats and one for dogs; knowing on which side of the line
an image lies then tells us whether it shows a cat or a dog.
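As a minimal sketch of this idea, consider a linear decision boundary in a two-dimensional feature space. The features and the boundary weights below are purely hypothetical stand-ins (nothing here is learned from data); the point is only that classifying an image reduces to checking which side of the line its feature point falls on.

```python
# A linear decision boundary in 2-D is the line W[0]*x[0] + W[1]*x[1] + B = 0.
# W and B are assumed for illustration, not learned; the two features could be
# hypothetical image measurements such as ear pointiness and snout length.
W = (1.0, -1.0)
B = 0.0

def classify(x):
    """Label a feature point by which side of the boundary it lies on."""
    score = W[0] * x[0] + W[1] * x[1] + B
    return "cat" if score >= 0 else "dog"

print(classify((2.0, 0.5)))  # positive side of the line -> "cat"
print(classify((0.5, 2.0)))  # negative side of the line -> "dog"
```

In practice the weights would be fitted by a learning algorithm; the side-of-the-line check itself stays this simple.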
Sometimes, this region is simple and straight, which makes the job easier. When it is easy, the
engineer can pick a simpler way for the algorithm to learn. Other times, the region might be more
complicated and wavy. In addition, there may sometimes be many regions that perform well, and
sometimes no sensible region exists at all, so that some cat and dog images end up on both sides of
the region. We can say that the decision region has a complexity, that there is a margin (if the region
can be perturbed and still does a good job at classifying), and that there may be noise.
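The notion of a margin can be made concrete with a small sketch. Using the same kind of assumed linear boundary as before (hypothetical points and weights, nothing learned), the margin is the distance from the boundary to the closest correctly classified point; if any point sits on the wrong side, no clean margin exists, which is one way noise shows up.

```python
import math

# Hypothetical labelled feature points: (x1, x2, label), +1 = cat, -1 = dog.
POINTS = [
    (2.0, 0.5, +1),
    (1.5, 0.2, +1),
    (0.5, 2.0, -1),
    (0.3, 1.8, -1),
]

W = (1.0, -1.0)  # assumed boundary W . x = 0 through the origin

def signed_distance(x1, x2):
    """Signed distance from a point to the line W . x = 0 (positive = cat side)."""
    return (W[0] * x1 + W[1] * x2) / math.hypot(W[0], W[1])

def margin(points):
    """Distance from the boundary to the closest correctly classified point.

    Returns None when some point lies on the wrong side, i.e. the boundary
    misclassifies part of the data (a 'noisy' situation with no clean margin).
    """
    dists = [label * signed_distance(x1, x2) for x1, x2, label in points]
    return None if min(dists) < 0 else min(dists)

print(margin(POINTS))                       # small positive margin
print(margin(POINTS + [(2.0, 0.5, -1)]))    # a mislabelled point -> None
```

A large margin means the boundary can be perturbed and still classify everything correctly; a margin of None signals that no boundary of this simple form separates the data cleanly.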
All these factors can affect each other and can make the learning job easier or harder for the
algorithm. Our framework will help engineers understand how these factors interact and how they
can choose the best algorithm for classification.
Naturally, this framework will be beneficial in practical applications, because it helps engineers
avoid suboptimal algorithms, saving considerable time and effort.