Blog


At deepsense.ai, we’re doing our best to make our mark in state‑of‑the‑art data science. For many years, we have been competing in machine learning challenges, gaining both conceptual and technical expertise. Now, we have decided to open source an end‑to‑end image classification sample solution for the ongoing Cdiscount Kaggle competition. In so doing, we believe we’ll encourage data scientists both seasoned and new to compete on Kaggle and test their neural nets.

Introduction

Competing in machine learning challenges is fun, but also a lot of work. Participants must design and implement end‑to‑end solutions, test neural architectures and run dozens of experiments to train deep models properly. But this is only a small part of the story. Strong Kaggle competition solutions have advanced data pre‑ and post‑processing, ensembling and validation routines, to name just a few. At this point, competing effectively becomes really complex and difficult to manage, which may discourage some data scientists from rolling up their sleeves and jumping in. Here at deepsense.ai we believe that Kaggle is a great platform for advanced data scientific training at any level of expertise. So great, in fact, that we felt compelled to open‑source an image classification sample solution to the currently open Cdiscount challenge. Below, we describe what we have prepared.

Image classification sample solution overview

When we say our solution is end‑to‑end, we mean that we started with raw input data downloaded directly from the Kaggle site (in the bson format) and finish with a ready‑to‑upload submit file. Here are the components:

  1. data loader
    1. Keras custom iterator for bson file
    2. label encoder representing product IDs to fit the Keras API
  2. neural network training on n classes and k examples per class. We use the following architectures:
    1. MobileNet (Howard et al. ’17)
    2. Inception v3
    3. ensembles of the models mentioned above
  3. model predictions
    1. single-model prediction
    2. ensembling (by averaging) for multiple models
  4. submit generation

 

For instance, the image classification with MobileNets ensemble would be defined as followings:

 

Watch this video for quick code overview:

What if I want to use my network architecture?

You are encouraged to replace our network with your own. Below you can find a short snippet of code that you simply place in the  models.py file:

Otherwise I would suggest extending BasicKerasClassifier, or  KerasDataLoader with custom augmentations, learning rate schedules and other tricks of your choice.

How to get started?

To start using our pipeline, follow these steps:

  1. download the source code from https://github.com/deepsense-ai/cdiscount-starter
  2. follow the README instructions to run the code
  3. modify this image classification sample solution to fit your needs
  4. have fun competing on Kaggle!
Image classification sample solution - neptune dashboard with Kaggle experiment

Image classification sample solution running in Neptune. Live charts presents log-loss and accuracy for the running experiment.

Final remarks

Feel free to use, modify and run this code for your own purposes. We run multiple of them on Neptune, which you may find useful for managing your experiments. Sign up and get a $100 Google Cloud Platform registration bonus to run your experiments. This small gift will enable you to train a really nice model for free!

Related Posts

About

Jakub is a Data Scientist at deepsense.io. He holds a master’s degree in Physics (University of Silesia) and Finance (University of Economics in Wroclaw). He worked on various data-scientific projects such as facial recognition, cancer detection and classification and text mining of labor market data. He was in the team that won MICCAI 2015 in Munich, MICCAI 2016 in Athens and took 4-6th place in the CIMA Global Business Challenge 2013 in Johannesburg.

Leave a Reply

Your email address will not be published. Required fields are marked *

*


Pin It on Pinterest