This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built AI Health Companion, an accessibility-focused applet designed to support patients and caregivers with three core modes:
- Visual Aid: Users upload an image, and the app describes the scene in plain language, highlighting important objects and potential hazards. In addition to the text, users can listen to the description as audio, making it even more accessible for people with visual impairments (a sketch of the underlying call follows this list).
- Symptom Recorder: Users record or upload a short audio clip of their symptoms. The app transcribes the speech and summarizes the key symptoms in simple terms.
- Report Simplifier: Users upload a PDF or image of a lab report, and the app provides a plain-language explanation of key information with a glossary of terms. To make the experience more helpful, users can listen to the simplified explanation and download the simplified report as a PDF for sharing or offline use.
Together, these modes address real problems faced by patients with vision loss, elderly users who communicate better verbally, and anyone struggling with complex medical documents.
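For context, here is a minimal sketch of how a mode like Visual Aid might call Gemini through the @google/genai SDK. The prompt text, function name, and surrounding plumbing are illustrative assumptions, not the applet's actual code:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical sketch: send an uploaded image to Gemini and get back
// a plain-language scene description for a low-vision user.
async function describeScene(base64Image: string, mimeType: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType, data: base64Image } },
      {
        text:
          "Describe this scene in plain language for a person with low vision. " +
          "List the important objects and call out any potential hazards.",
      },
    ],
  });
  return response.text ?? "";
}
```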
Key Features
- Visual Aid with audio playback: Scene description plus text-to-speech output.
- Symptom Recorder: Audio-to-text transcription and symptom summarization.
- Report Simplifier with audio + PDF download: Plain-language explanations that can be listened to or saved as a PDF.
- Strict input limits: 5 MB max for images and 2 MB max for PDFs to stay within the Gemini free tier (see the validation sketch after this list).
- Deployed directly from Google AI Studio to Cloud Run for a seamless build-and-deploy pipeline.
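The size limits can be enforced before anything is sent to the API. A minimal client-side check might look like the following (the constants and error strings are assumptions based on the limits described above):

```ts
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // 5 MB cap for images
const MAX_PDF_BYTES = 2 * 1024 * 1024;   // 2 MB cap for PDFs

// Returns an error message for oversized files, or null if the file is acceptable.
function validateUpload(file: File): string | null {
  if (file.type === "application/pdf" && file.size > MAX_PDF_BYTES) {
    return "PDFs must be 2 MB or smaller.";
  }
  if (file.type.startsWith("image/") && file.size > MAX_IMAGE_BYTES) {
    return "Images must be 5 MB or smaller.";
  }
  return null;
}
```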
Demo
🔗 Live Applet on Cloud Run: https://ai-health-companion-390658277222.us-west1.run.app/
🖼️ Screenshots:
👁️ Visual Aid
📝 Symptom Recorder
📊 Report Simplifier
How I Used Google AI Studio
- I created the applet in Google AI Studio’s Build mode, where I designed and refined the prompts for each mode.
- I specified structured JSON outputs directly in the system instructions to ensure reliable parsing (sketched below).
- Once the applet was stable, I deployed it directly to Google Cloud Run with AI Studio’s one-click deployment.
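As a rough illustration of the structured-output point: a system instruction can spell out the expected JSON fields, and the @google/genai SDK's responseMimeType and responseSchema config can enforce the shape. The schema below is an assumed example for the Report Simplifier, not the applet's actual configuration:

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical sketch: ask for a simplified report as machine-parseable JSON.
async function simplifyReport(reportText: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: reportText,
    config: {
      // The system instruction describes the expected JSON fields...
      systemInstruction:
        "You are a medical report simplifier. Respond only with JSON containing " +
        "'summary' (a plain-language explanation) and 'glossary' (term/definition pairs).",
      // ...and the response schema makes that shape enforceable.
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          summary: { type: Type.STRING },
          glossary: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                term: { type: Type.STRING },
                definition: { type: Type.STRING },
              },
            },
          },
        },
      },
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```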
Models Used
- Gemini 2.5 Flash was the default model, chosen for speed and efficiency.
- For more complex reasoning tasks, such as analyzing detailed reports, the applet can optionally use Gemini 2.5 Pro (see the sketch below).
- Both models’ multimodal capabilities were leveraged for image understanding, audio transcription, and text simplification.
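A sketch of how that model choice might be wired up; the flag name is hypothetical:

```ts
// Hypothetical toggle: default to the faster Flash model, and switch to Pro
// only when the user opts in for complex report analysis.
function pickModel(useProForReports: boolean): string {
  return useProForReports ? "gemini-2.5-pro" : "gemini-2.5-flash";
}
```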
Multimodal Features
- Image Understanding (Visual Aid): Gemini processes uploaded images to describe content, list objects, and identify hazards. Added feature: users can listen to the description as speech.
- Audio Understanding (Symptom Recorder): Gemini transcribes patient voice notes and summarizes them into key symptoms (sketched after this list).
- Document + Image Understanding (Report Simplifier): Gemini explains lab reports in everyday language with glossary terms. Added features: users can listen to the simplified explanation and download the result as a PDF.
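To make the Symptom Recorder and audio-playback points concrete, here is a rough sketch. The Gemini call mirrors the image example earlier; the playback uses the browser's Web Speech API, which is one plausible way to implement "listen to the description" (the actual TTS mechanism isn't specified above):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Transcribe a recorded voice note and summarize the symptoms it mentions.
async function summarizeSymptoms(base64Audio: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "audio/webm", data: base64Audio } },
      { text: "Transcribe this recording, then summarize the key symptoms in simple terms." },
    ],
  });
  return response.text ?? "";
}

// Read any generated text aloud with the browser's built-in speech synthesis.
function speakAloud(text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 0.95; // slightly slower than default for clarity
  window.speechSynthesis.speak(utterance);
}
```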
By combining image, audio, and text processing through Gemini 2.5 Flash/Pro, the applet delivers a practical, real-world healthcare companion experience that is lightweight, privacy-friendly, and accessible.