At Appsolutely, we are constantly exploring the possibilities of the Salesforce platform and its surrounding ecosystem of APIs and platforms. Earlier this year, a small team set out to experiment with the Einstein Vision APIs and their possible application in Salesforce.
Creating a Vision Model
One of the main challenges was self-inflicted. Einstein Vision supports several types of prediction models. The easiest to implement is an image classification model. Such a model classifies an entire image into one or more categories, e.g. beach, mountain or forest. The only data set such a model needs is a zip file containing sample images.
Einstein Vision comes with some sample image classification models, for example the Scene and Food classifiers.
For our business case, we needed another type of model: object detection. Object detection models do not merely return a general classification of the full image, but instead attempt to find objects inside the image and return the coordinates at which the model thinks each object resides. An example "prediction" is shown in the screenshot below.
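To make the idea of a "prediction with coordinates" concrete, here is a minimal Python sketch of how such a result could be post-processed. The response shape (a "probabilities" list whose entries carry a "label", a "probability" and a "boundingBox" with minX/minY/maxX/maxY) is our understanding of the detect response and should be treated as an assumption. The function name, the threshold and the sample values are purely illustrative.

```python
def to_boxes(response, min_probability=0.5):
    """Turn a detection response into a list of labelled boxes.

    Assumes (not verified against the live API) that each entry in
    response["probabilities"] looks like:
      {"label": ..., "probability": ..., "boundingBox":
          {"minX": ..., "minY": ..., "maxX": ..., "maxY": ...}}
    """
    boxes = []
    for prediction in response.get("probabilities", []):
        # Skip detections the model is not confident about.
        if prediction["probability"] < min_probability:
            continue
        bb = prediction["boundingBox"]
        boxes.append({
            "label": prediction["label"],
            "probability": prediction["probability"],
            # Convert corner coordinates to origin plus size.
            "x": bb["minX"],
            "y": bb["minY"],
            "width": bb["maxX"] - bb["minX"],
            "height": bb["maxY"] - bb["minY"],
        })
    return boxes


# Made-up sample response, for illustration only.
sample_response = {
    "probabilities": [
        {"label": "thermostat", "probability": 0.97,
         "boundingBox": {"minX": 40, "minY": 60, "maxX": 140, "maxY": 210}},
        {"label": "router", "probability": 0.21,
         "boundingBox": {"minX": 300, "minY": 10, "maxX": 380, "maxY": 90}},
    ]
}

detected = to_boxes(sample_response)
```

With the default threshold, only the first sample detection is kept, and its corner coordinates become an origin of (40, 60) with a size of 100 by 150 pixels.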
Creating an object detection model requires, besides the images, a CSV file that denotes the coordinates of the objects within these images. This way, one image may carry many labels, for many objects. Obtaining the exact coordinates of the objects we wanted to recognize in our sample images turned out to be virtually impossible without a tool to make the appropriate measurements.
Searching the web did not turn up any suitable tooling. As developers we cannot allow a lack of tools to stop us, so we decided to build the tool!
The annotation tool, as we call it, allows the user to drag boxes onto the image canvas and provide a label for each box. As shown in the screenshots, the user uploads one or more images and then uses these images to create the annotations. Each annotation is stored in Salesforce and can easily be converted to the format required for Einstein Vision modelling.
Using the tool, we were able to annotate the sample images fairly quickly with the labels required to create and train the Einstein Vision model. As a consequence, we are now able to accurately recognize several home devices ;)
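The conversion step can be sketched roughly as follows, in Python rather than on the Salesforce platform. The sketch assumes the annotations file has one row per image, with the file name in an "image_file" column followed by "box0", "box1", ... columns that each hold a JSON object with label, x, y, width and height. That layout is our reading of the expected format, and the record field names and the function name are hypothetical; check the Einstein Vision documentation before relying on it.

```python
import csv
import io
import json
from collections import defaultdict


def annotations_to_csv(annotations):
    """Build the annotations CSV text from stored annotation records.

    `annotations` is an iterable of dicts with the (hypothetical) keys
    image_file, label, x, y, width and height, one dict per drawn box.
    """
    # Group the boxes by the image they were drawn on.
    boxes_by_image = defaultdict(list)
    for record in annotations:
        box = {
            "label": record["label"],
            "x": record["x"],
            "y": record["y"],
            "width": record["width"],
            "height": record["height"],
        }
        boxes_by_image[record["image_file"]].append(json.dumps(box))

    # The header needs as many box columns as the busiest image has boxes.
    max_boxes = max((len(b) for b in boxes_by_image.values()), default=0)

    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["image_file"] + [f"box{i}" for i in range(max_boxes)])
    for image_file, boxes in boxes_by_image.items():
        # Rows with fewer boxes are simply shorter; no padding is added.
        writer.writerow([image_file] + boxes)
    return out.getvalue()


# Made-up records, for illustration only.
csv_text = annotations_to_csv([
    {"image_file": "a.jpg", "label": "thermostat", "x": 1, "y": 2, "width": 3, "height": 4},
    {"image_file": "a.jpg", "label": "router", "x": 5, "y": 6, "width": 7, "height": 8},
    {"image_file": "b.jpg", "label": "thermostat", "x": 0, "y": 0, "width": 10, "height": 10},
])
```

Quoting every field lets the csv module escape the double quotes inside each JSON box, so the boxes survive a round trip through any standard CSV parser.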
Although creating this proof of concept on the Salesforce platform was a very interesting endeavor, we found several limitations and caveats, leading us to conclude that (parts of) an application of Vision models are better hosted on an external platform, like Heroku or Amazon. Salesforce currently imposes limits that are too strict on the size of the images one can pass to the Einstein APIs.