HETS

Hispanic Educational Technology Services

2026 Best Practices Showcase Evaluation

Utilizing AI to Support Communication with ASD Individuals

General description of the project

This project investigates how artificial intelligence, specifically computer vision (CV) and natural language processing (NLP), can support communication for individuals with autism spectrum disorder (ASD). The system generates simplified, meaningful narratives from user-uploaded images and adapts them to the user’s communication level using a Most-to-Least (MTL) prompting strategy drawn from applied behavior analysis (ABA). The resulting prototype, built in Streamlit and integrated with MC-LLaVA and text-to-speech (TTS), provides an accessible, low-complexity interface suitable for individuals who experience sensory or linguistic overload.
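
For illustration, the following minimal Python sketch shows one way an MTL support level could select prompts and fade toward independence. The prompt wording, function names, and four-level scale are assumptions for this example, not the project's exact implementation.

    # Hypothetical Most-to-Least (MTL) prompt fading for caption generation:
    # higher levels request simpler, more scaffolded narratives, and support
    # steps down toward independence as the user progresses.
    MTL_PROMPTS = {
        3: "Describe this picture in one short, simple sentence.",  # most support
        2: "Describe this picture in two short sentences.",
        1: "Describe this picture in a few plain sentences.",
        0: "Describe this picture naturally.",                      # least support
    }

    def build_prompt(support_level, user_context=""):
        # Pick the prompt matching the current MTL support level.
        base = MTL_PROMPTS.get(support_level, MTL_PROMPTS[0])
        return f"{base} Extra context: {user_context}" if user_context else base

    def fade_support(support_level):
        # Most-to-Least: fade support one step toward independence (level 0).
        return max(0, support_level - 1)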
Early testing demonstrated the feasibility of using a lightweight multimodal model (MC-LLaVA-3B) to produce concise captions with fewer hallucinations than larger alternatives like BLIP-2. The prototype successfully allows image upload, caption generation, audio playback, and prompt-fading controls, offering evidence that AI-assisted narrative support can be implemented in an affordable and user-adaptive way.
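
The captioning step itself can be sketched with the Hugging Face transformers image-to-text pipeline, as below. The BLIP checkpoint shown is a stand-in chosen because its pipeline usage is standard; MC-LLaVA-3B is loaded through model-specific code, so this only illustrates the general shape of the call.

    from PIL import Image
    from transformers import pipeline

    # Generic image-to-text pipeline; the BLIP checkpoint stands in for MC-LLaVA-3B.
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    def caption_image(path):
        image = Image.open(path).convert("RGB")
        outputs = captioner(image)          # [{"generated_text": "..."}]
        return outputs[0]["generated_text"]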
The initiative is cost-effective due to its reliance on open-source tools and compact pre-trained models, minimizing the need for large annotated datasets or expensive infrastructure. Key lessons learned include the importance of realistic scoping (image-only input), selecting models that balance performance and computational cost, and prioritizing accessibility and user adaptability during design.

Technologies

The project integrates multimodal AI models, web technologies, and accessibility tools to generate adaptive image-based narratives for individuals with ASD. We evaluated two vision-language models, BLIP-2 and MC-LLaVA-3B, ultimately finding MC-LLaVA slightly more effective due to its lower computational requirements, reduced hallucination rate, and built-in prompt control. The system is delivered through a Streamlit-based web prototype that enables image upload, caption generation, and user-added context. Text-to-speech (TTS) and Most-to-Least (MTL) prompt-fading controls enhance accessibility and allow captions to be tailored to different communication levels. Together, these technologies form a functional, though still in-progress, user-responsive communication tool.
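
A minimal Streamlit sketch of how these interface pieces fit together appears below. The generate_caption stub and the gTTS speech backend are placeholders assumed for the example, not necessarily the components used in the prototype.

    from io import BytesIO

    import streamlit as st
    from gtts import gTTS  # assumption: one possible TTS backend

    def generate_caption(image_bytes, support_level):
        # Stub standing in for the MC-LLaVA-3B call; returns a fixed caption.
        return "A child is playing with a red ball."

    st.title("Image-to-Narrative Support")
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    support = st.slider("Support level (3 = most prompting)", 0, 3, 3)

    if uploaded is not None:
        st.image(uploaded)
        caption = generate_caption(uploaded.getvalue(), support)
        st.write(caption)
        # Synthesize speech from the caption and play it back in the browser.
        audio = BytesIO()
        gTTS(caption).write_to_fp(audio)
        st.audio(audio.getvalue(), format="audio/mp3")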

Explain project results

Although the system is still in development, the groundwork laid by this project expands opportunities for future research in accessible technology, ASD-focused communication tools, and human-AI interaction. The institution can leverage these accomplishments to support additional student training, research funding, and interdisciplinary collaborations.

Why should it be considered a best practice?

This project represents a best practice because it demonstrates how accessible, open-source AI technologies can be combined to address a real communication need for individuals with ASD without requiring expensive infrastructure. The approach is replicable: it uses publicly available vision-language models, lightweight deployment tools like Streamlit, and evidence-based ABA strategies such as Most-to-Least prompting. This creates a framework that other institutions can adapt for their own accessibility or assistive-technology initiatives.

Highlights of your proposed presentation

This presentation will outline the development of an early-stage web prototype that integrates computer vision and natural language processing to support communication in individuals with autism spectrum disorder (ASD). Key highlights include the use of pre-trained vision-language models (BLIP-2 and MC-LLaVA), the refinement from image-and-video inputs to image-only processing, and the creation of an accessible Streamlit interface with text-to-speech and prompt-fading controls. The presentation will demonstrate how the system generates simplified narratives from user-provided images and how ABA-informed prompting strategies are incorporated to support expressive language.

Lessons learned focus on the practical challenges of implementing large multimodal models in an assistive context, including computational limitations, fine-tuning constraints, and the need for streamlined deployment environments. Building the prototype underscored the importance of accessibility, simplicity, and user-centered design, particularly for individuals who may experience sensory or cognitive overload. These insights will guide future work in improving model performance, expanding evaluation with metrics like CIDEr and CLIPScore, and preparing for user testing with the ASD and AAC communities.
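
Because CLIPScore is one of the planned metrics, a brief sketch of its reference-free form, 2.5 · max(cos(image embedding, caption embedding), 0) (Hessel et al., 2021), follows; the CLIP checkpoint used here is an assumption for illustration.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_score(image_path, caption):
        # Reference-free CLIPScore: 2.5 * max(cosine(image_emb, text_emb), 0).
        image = Image.open(image_path).convert("RGB")
        inputs = processor(text=[caption], images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
            txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                              attention_mask=inputs["attention_mask"])
        cos = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
        return 2.5 * max(cos, 0.0)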




The Evaluation Committee will evaluate submitted proposals based on the following criteria. Each area will be rated on a scale from 1 to 5 (1 = non-satisfactory; 5 = outstanding), for a maximum of 45 points.
