-
Are there any you could specifically suggest?
-
I know nothing about licenses, but it looks like the base project might be MIT-licensed? Either way, I think Kokoro might be useful for this project. Here's an audio file I generated with it, which of course tells us nothing about responsiveness or CPU load. Here's a link to a GitHub page about it. It's not immediately apparent how to use the backend directly, but I think this project has the right idea. Plus, the am_adam voice is my favorite, and it's the one I used in the linked audio clip.
-
Piper TTS looks promising for this project. However, I've heard that for it to sound better, the text needs to be cleaned and normalized first.
-
I think the more choices we have, the better. Personally, I'm not a huge fan of Piper, but that could just be because I'm used to Eloquence.
-
Hi NVDA Developers and Community,
I'd like to propose exploring the potential integration of modern, high-quality, open-source AI-powered Text-to-Speech (TTS) technology into the NVDA screen reader.
Motivation:
Currently, NVDA utilizes various TTS engines. While functional, many available synthesizers sound robotic and lack the natural intonation of human speech. This can lead to listening fatigue, especially during extended use.
Recent advancements in AI have produced TTS models capable of generating remarkably human-like audio. Leveraging such open-source models could significantly enhance the user experience for NVDA users by providing more natural and pleasant voice output.
Proposal:
Investigate the feasibility of integrating a suitable open-source, high-quality, AI-based TTS engine into NVDA as an alternative or potentially even a future replacement for some existing options. The primary goal is to offer users a much more natural, expressive, and less fatiguing voice.
Considerations / Potential Challenges:
This is an initial idea to spark discussion. Would the community find value in having a more natural, AI-powered voice option? What are the perceived technical hurdles, and are there specific open-source TTS models that the community thinks might be suitable candidates for investigation?
Looking forward to hearing your thoughts!