Cat or Dog? Check to see if my Fast.ai neural network can recognise whether your picture is a Cat or Dog

Integrating the Fast.ai app from Lesson 2 with this Hugo site

October 31, 2024


Test your Cat or Dog image above

Go ahead and try uploading cat and dog images. You can also try any other image and see what the classifier thinks of it!

Creating this post was a two-step process:

1. Create an HTML page through numerous prompts to ChatGPT.
2. Integrate the HTML snippet into a Hugo post with the help of this article.

I asked ChatGPT to generate a description of the HTML page, which makes up the remainder of this post.

Requirements of the HTML Page

The web page serves as a user-friendly interface for cat and dog image classification, with these key requirements:

Image Upload Capability: Users can select multiple images through a file input element.
Image Preview: Selected images appear in a preview panel at a reasonable size (300×300 pixels).
Image Gallery: Thumbnails of all uploaded images are displayed in a horizontally scrollable container.
Classification Results: After processing, the page displays both the primary classification result (“It’s a cat!” or “It’s a dog!”) and the confidence scores for both categories.
Interactive Selection: Users can click on any thumbnail to make it the active image for classification.

All of these features are implemented without requiring a page reload, creating a smooth, application-like experience right in the browser.

HTML Layout Understanding

The HTML structure is organized in a clean, functional way with several key sections:

Style Section: Contains CSS rules for the layout, including:
- Flex-based side-by-side containers for the prediction results and image preview
- Styling for the image thumbnails and the scrollable container
- Visual indicators for the selected image (border highlighting)
- Responsive design elements to ensure proper display across devices
Input Element: A simple file input control that accepts image files with the multiple attribute enabled to allow batch uploads.
Content Containers:
- The side-by-side div creates a two-column layout containing:
  - resultContainer for displaying classification results
  - previewContainer for showing the selected image at a larger size
- imageContainer provides a horizontally scrollable gallery of all uploaded images

The layout keeps the main preview image, the classification results, and the thumbnail gallery clearly separated.

JavaScript Rationale

The JavaScript code handles the page’s dynamic behavior:

Gradio Client Integration:
```
import { Client } from "https://cdn.jsdelivr.net/npm/@gradio/client@1.6.0-beta.3/dist/index.min.js";
```
- The page connects to a Gradio backend (specifically “atomglitch/fastailesson2”) which hosts the trained fastai model
- This allows us to leverage a pre-trained machine learning model without needing complex backend infrastructure
Event-Driven Handling:
- The code uses event listeners to respond to user actions like file selection and thumbnail clicks
- This creates a responsive, app-like feeling in a simple webpage
Duplicate Prevention:
- A Set data structure (selectedFiles) tracks unique file data URLs
- This prevents the same image from being added to the gallery multiple times
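
The duplicate check can be sketched roughly as follows. This is a minimal sketch, not the page's actual code: `selectedFiles` is the name mentioned above, while `addIfNew` is a hypothetical helper, and the sketch assumes each file has already been read into a data URL (for example with `FileReader.readAsDataURL`).

```javascript
// Set of data URLs already shown in the gallery (name taken from the post).
const selectedFiles = new Set();

// Hypothetical helper: returns true if the data URL is new and was recorded,
// false if the same image content was already added.
function addIfNew(dataUrl) {
  if (selectedFiles.has(dataUrl)) {
    return false; // identical data URL means identical bytes and MIME type
  }
  selectedFiles.add(dataUrl);
  return true;
}

// Example: the third upload repeats the first one and is skipped.
const uploads = [
  "data:image/png;base64,AAAA",
  "data:image/png;base64,BBBB",
  "data:image/png;base64,AAAA",
];
const added = uploads.filter(addIfNew);
console.log(added.length); // 2
```

Keying the Set on the data URL means duplicates are detected by image content rather than by file name, so the same picture saved under two names is still added only once.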
Preview Management:
- The setPreviewImage() function handles both updating the preview and triggering the classification
- Visual feedback includes highlighting the selected thumbnail and displaying “Processing…” during classification
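
A function along these lines could look like the sketch below. Only the name `setPreviewImage` comes from the description above. The parameters, the `selected` class name, the `ui` object that bundles the DOM nodes, and the fallback message on failure are all assumptions made for illustration.

```javascript
// Hypothetical sketch of setPreviewImage(). `ui` is assumed to hold:
//   ui.thumbnails - iterable of thumbnail elements
//   ui.preview    - the large preview <img>
//   ui.result     - the element that shows the classification text
// `classify` is any async function that maps a data URL to a display string.
async function setPreviewImage(dataUrl, thumbnail, ui, classify) {
  // Move the highlight to the chosen thumbnail only.
  for (const thumb of ui.thumbnails) {
    thumb.classList.toggle("selected", thumb === thumbnail);
  }

  // Show the image in the preview panel and signal that work has started.
  ui.preview.src = dataUrl;
  ui.result.textContent = "Processing…";

  try {
    ui.result.textContent = await classify(dataUrl);
  } catch (err) {
    console.error("Classification failed:", err);
    ui.result.textContent = "Classification failed"; // assumed fallback text
  }
}
```

Passing `ui` and `classify` in as arguments, instead of looking them up globally, keeps the function easy to exercise with simple stand-in objects.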
Classification Process:
- When a new image is selected, the code:
  1. Converts the image to a blob format
  2. Sends it to the Gradio backend for prediction
  3. Parses and displays the results, showing both the main prediction and confidence scores
  4. Handles errors gracefully with console logging
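
The four steps above can be sketched as follows. This is a hedged illustration rather than the page's actual code: the `/predict` endpoint name, the helper names `formatPrediction` and `classifyImage`, and the payload shape (`label` plus a `confidences` array, as Gradio's Label component typically returns) are assumptions. The client is passed in as a parameter. In the browser it would come from `await Client.connect("atomglitch/fastailesson2")`.

```javascript
// Turn a Label-style payload into the two strings the page displays.
// Assumed shape: { label: "Cat", confidences: [{ label, confidence }, ...] }
function formatPrediction(output) {
  const headline = `It's a ${output.label.toLowerCase()}!`;
  const scores = (output.confidences ?? [])
    .map(({ label, confidence }) => `${label}: ${(confidence * 100).toFixed(1)}%`)
    .join(", ");
  return { headline, scores };
}

// Steps 1-4, with the Gradio client injected so a stub can stand in for it.
async function classifyImage(client, dataUrl) {
  try {
    // 1. Convert the data URL to a Blob (one common way to do it).
    const blob = await (await fetch(dataUrl)).blob();
    // 2. Send it to the backend; the endpoint name is an assumption.
    const result = await client.predict("/predict", [blob]);
    // 3. Parse the first output into display strings.
    return formatPrediction(result.data[0]);
  } catch (err) {
    // 4. Log the error and return null so the caller can carry on.
    console.error("Prediction failed:", err);
    return null;
  }
}

console.log(formatPrediction({
  label: "Cat",
  confidences: [
    { label: "Cat", confidence: 0.982 },
    { label: "Dog", confidence: 0.018 },
  ],
}));
```

Keeping the formatting separate from the network call means the display logic can be checked without reaching the backend.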

This implementation stays simple while covering the features listed above. It shows how a machine learning model can be integrated into a web page, including this blog, with relatively little code, giving end users an intuitive interface to an image classification model.