Every Book.
Every Voice.
Free for Everyone.
My father is visually impaired. Getting him access to books — real books, not just audiobooks that happen to exist — has always been a frustrating, expensive, and technically broken experience. So I decided to build the tool I wish existed.
A fully accessible, free, open-source web tool that converts any book — EPUB, PDF, or even a photo of a printed page — into high-quality audio, built from the ground up around the needs of screen reader users.
The Problem
The visually impaired community faces barriers that most of us never think about.
Audiobooks are expensive
Commercial audiobooks cost $15–$30 each. Subscriptions like Audible add up fast. For a retired person on a fixed income, this is simply unaffordable.
Most books don't have audio versions
Publishers produce audio editions mainly for bestsellers. Millions of educational, religious, technical, and cultural books exist only as text, invisible to blind readers.
Existing tools are inaccessible
The irony: many free book converters are themselves unusable by blind users. Cluttered interfaces, no keyboard navigation, no screen reader support.
Arabic content is underserved
Arabic-language audiobooks are rare even in commercial markets. For Arabic-speaking visually impaired users, the options are nearly zero.
How It Works
Four stages from any book format to a ready-to-listen audio file.
Upload
Drag and drop, or use a voice command, to upload an EPUB, PDF, TXT, or even a photo of a printed page. The interface is designed for keyboard-only and screen reader users first.
Extract Text
The backend extracts clean text using a parser matched to the format: PyMuPDF for PDFs, EbookLib for EPUBs, and Tesseract OCR for photos of printed pages.
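The format dispatch in this stage could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual backend: the function names, the extension table, and the use of the pytesseract wrapper around Tesseract are all hypothetical, and the third-party imports are deferred so each library is only needed for the format it handles.

```python
from html.parser import HTMLParser
from pathlib import Path


def extract_txt(path: Path) -> str:
    """Plain text needs no parsing, only decoding."""
    return path.read_text(encoding="utf-8", errors="replace")


def extract_pdf(path: Path) -> str:
    """Pull the text layer from each PDF page with PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(str(path)) as doc:
        return "\n".join(page.get_text() for page in doc)


class _TextOnly(HTMLParser):
    """Collects text nodes and drops tags (a crude sketch: it also keeps
    the contents of <title> and <style> elements)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_epub(path: Path) -> str:
    """Read each EPUB document item with EbookLib and strip its markup."""
    import ebooklib
    from ebooklib import epub

    book = epub.read_epub(str(path))
    chapters = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        parser = _TextOnly()
        parser.feed(item.get_content().decode("utf-8", errors="replace"))
        chapters.append("".join(parser.parts).strip())
    return "\n\n".join(c for c in chapters if c)


def extract_image(path: Path, lang: str = "eng") -> str:
    """OCR a photographed page with Tesseract (assumed: via pytesseract)."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


# Hypothetical extension table; the real project may detect formats differently.
EXTRACTORS = {
    ".txt": extract_txt,
    ".pdf": extract_pdf,
    ".epub": extract_epub,
    ".jpg": extract_image,
    ".jpeg": extract_image,
    ".png": extract_image,
}


def extract_text(path) -> str:
    """Choose an extractor from the file extension and run it."""
    path = Path(path)
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"Unsupported format: {path.suffix or 'no extension'}")
    return extractor(path)
```

Calling `extract_text("book.epub")` would then return the book's text as one string. Arabic pages would need the matching Tesseract language data, for example `lang="ara"`.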
Text-to-Speech
Clean text is sent through a TTS engine. The default is Edge TTS — Microsoft's free, lifelike AI voices — with fallback support for Google Cloud TTS (free tier) and offline Coqui TTS.
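The default-plus-fallback arrangement could look something like the sketch below. It assumes the `edge-tts` Python package for the default engine. The `synthesize` helper, the `Engine` signature, and the voice name are hypothetical choices made for illustration, not the project's confirmed design.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

# An engine takes (text, output_path), writes an audio file, or raises.
Engine = Callable[[str, str], Awaitable[None]]


async def edge_tts_engine(text: str, out_path: str,
                          voice: str = "en-US-AriaNeural") -> None:
    """Default engine: the edge-tts package (requires network access)."""
    import edge_tts  # imported lazily so fallbacks work without it

    await edge_tts.Communicate(text, voice).save(out_path)


async def synthesize(text: str, out_path: str,
                     engines: Sequence[Tuple[str, Engine]]) -> str:
    """Try each engine in order; return the name of the first that succeeds."""
    errors = []
    for name, engine in engines:
        try:
            await engine(text, out_path)
            return name
        except Exception as exc:  # sketch only: real code would catch narrower errors
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("All TTS engines failed: " + "; ".join(errors))
```

A call such as `asyncio.run(synthesize(text, "chapter-01.mp3", [("edge", edge_tts_engine)]))` would try Edge TTS first. Google Cloud TTS and Coqui TTS would each be wrapped in a coroutine with the same `(text, out_path)` signature and appended to the list in fallback order; they are left out here rather than guessed at.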
Listen or Download
The audio is streamed in-browser or downloaded as MP3 chapters. An aria-live announcement tells screen readers the moment the audio is ready — no manual refresh needed.
Built for Accessibility First
Not an afterthought. Every design decision starts with the question: can my father use this on his own?
Semantic HTML
Proper <h1> headings, <button> controls, and <main> and <nav> landmarks, so screen readers like NVDA, JAWS, and VoiceOver can navigate the page naturally.
WAI-ARIA
Every dynamic update uses aria-live="polite". Upload progress, errors, and "audio ready" confirmations are all announced automatically, with no visual cue required.
Full Keyboard Navigation
Tab, Enter, Space — the entire app works without a mouse. Focus states are highly visible (not the browser default) so low-vision users always know where they are.
Voice Commands
Web Speech API integration lets users say "Upload book," "Play audio," or "Download chapter" without touching the keyboard — ideal for users with limited motor control.
Minimalist Interface
One primary action on the homepage. No pop-ups, no modals, no visual clutter. The fewer decisions a user must make, the more confidence they have using it independently.
Arabic & RTL Support
First-class support for Arabic text extraction and Arabic TTS voices — because this project started with an Arabic-speaking user and that should never be a second-class experience.
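One way the backend might decide between right-to-left and left-to-right handling is to count strongly directional characters using Python's standard `unicodedata` module. The sketch below is an assumption about approach, not the project's implementation: the helper names and the default voice names are illustrative, and right-to-left text is not necessarily Arabic (Hebrew would match too), so a real system would want proper language detection.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # Unicode bidi classes for right-to-left letters
LTR_CLASSES = {"L"}        # Unicode bidi class for left-to-right letters


def text_direction(text: str) -> str:
    """Return 'rtl' if right-to-left letters outnumber left-to-right ones."""
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in RTL_CLASSES)
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) in LTR_CLASSES)
    return "rtl" if rtl > ltr else "ltr"


def pick_voice(text: str,
               rtl_voice: str = "ar-EG-SalmaNeural",  # assumed default Arabic voice
               ltr_voice: str = "en-US-AriaNeural") -> str:  # assumed default English voice
    """Choose a TTS voice from the dominant text direction."""
    return rtl_voice if text_direction(text) == "rtl" else ltr_voice
```

The same `text_direction` result could also set the `dir` attribute on the page that displays the extracted text, so the visual layout and the audio voice stay consistent.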
Technical Stack
Entirely open-source, entirely free to run.
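Frontend
Semantic HTML with WAI-ARIA live regions, plus the Web Speech API for voice commands.
Backend
Python with FastAPI. PyMuPDF for PDFs, EbookLib for EPUBs, and Tesseract OCR for photographed pages.
Text-to-Speech
Edge TTS by default, with Google Cloud TTS (free tier) and offline Coqui TTS as fallbacks.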
My father loves reading. He always has. Watching that ability shrink as his vision did — and knowing that the barrier wasn't the words themselves but just the format they came in — made me want to fix it.
This isn't a side project to me. It's a tool I'm building for a real person, with a real need, right now. That changes how I approach every design decision — because I know exactly who I'm designing for, and he'll tell me directly if I got it wrong.
Roadmap
Where the project is now, and where it's going.
Architecture & Tech Research
Evaluated all TTS engines, chose Edge TTS as default, mapped the full processing pipeline.
Accessibility Design Spec
Defined the ARIA structure, keyboard navigation map, and screen reader announcement strategy.
Backend API — In Progress
Building the Python + FastAPI server: file upload endpoint, EPUB/PDF extraction, Edge TTS integration.
Accessible Frontend
Semantic HTML interface with full keyboard navigation, ARIA live regions, and voice command support.
OCR for Physical Books
Tesseract integration so users can photograph printed pages and convert them to audio.
Arabic Voice & RTL Support
Full Arabic TTS voices and right-to-left text handling — the original motivation for the project.
Public Launch & Community Feedback
Open beta with visually impaired users, accessibility audit with NVDA / JAWS / VoiceOver testers.
Try It Now — Free for Everyone
No account. No sign-up. No server. Upload any EPUB, PDF, or TXT and listen instantly in your browser. Or reach out to collaborate.