Hey! Author of the blog here. The current implementation uses llama.cpp GBNF, which allowed for a quick implementation. The biggest value-add at the time was getting the feature out.
With newer work such as outlines and xgrammar coming out, I hope to update the sampling to support more formats, increase accuracy, and improve performance.
Yes! I have checked out guidance, as well as a few others. I am planning to refactor sampling in the near future, which would include improving how grammars are used for sampling. Thanks for sharing!
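For anyone wondering what grammar-based sampling boils down to, here is a minimal sketch in plain Python. It is not how llama.cpp, outlines, or xgrammar implement it; the vocabulary, the "grammar" (just a finite set of accepted strings), and the `toy_model` scorer are all hypothetical stand-ins. It only shows the core idea: at each step, mask out tokens the grammar cannot accept, then pick from what is left.

```python
import math
from typing import Callable, List

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB: List[str] = ["yes", "no", "maybe", '"', "{", "}", "answer", ":", " "]

# Stand-in for a grammar: the finite set of strings we are willing to accept.
ALLOWED_OUTPUTS: List[str] = ['{"answer":"yes"}', '{"answer":"no"}']


def allowed_tokens(prefix: str) -> List[int]:
    """Ids of tokens that keep `prefix` a prefix of some accepted output."""
    return [
        i
        for i, tok in enumerate(VOCAB)
        if any(out.startswith(prefix + tok) for out in ALLOWED_OUTPUTS)
    ]


def mask_logits(logits: List[float], allowed: List[int]) -> List[float]:
    """Set the logit of every disallowed token to -inf."""
    allowed_set = set(allowed)
    return [l if i in allowed_set else -math.inf for i, l in enumerate(logits)]


def constrained_greedy_decode(
    score: Callable[[str], List[float]], max_steps: int = 32
) -> str:
    """Greedy decoding where the grammar filters candidates at every step."""
    text = ""
    for _ in range(max_steps):
        if text in ALLOWED_OUTPUTS:
            break  # the grammar is satisfied, stop generating
        allowed = allowed_tokens(text)
        if not allowed:
            raise ValueError(f"no valid continuation for {text!r}")
        masked = mask_logits(score(text), allowed)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        text += VOCAB[best]
    return text


def toy_model(prefix: str) -> List[float]:
    """Fake model: always prefers 'maybe', then 'no', then 'yes'."""
    prefs = {"maybe": 3.0, "no": 2.0, "yes": 1.0}
    return [prefs.get(tok, 0.0) for tok in VOCAB]


result = constrained_greedy_decode(toy_model)
print(result)  # {"answer":"no"}
```

Left unconstrained, the toy model would emit "maybe" every time; with the mask it is forced into the accepted shape and falls back to its next-best valid choice. Real implementations replace the string-prefix check with a proper grammar or automaton, which is where most of the performance work goes.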
Key Features:
- Generates conversational content from multiple sources (e.g. URLs, YouTube, and PDFs) and modalities (images+text)
- Customizes transcript and audio generation (e.g., style, language, structure, length)
- Provides multi-language support for global content creation
Technical Highlights:
- Flexible LLM integration with LangChain, supporting both cloud-based and local models
- Support for advanced text-to-speech models (OpenAI, ElevenLabs, and Microsoft Edge)
- Seamless CLI and Python package integration for automated workflows
NotebookLM's AI-generated voices remain unparalleled in quality (SoundStorm is awesome!). We would love additional contributors to help build this open source alternative!
I am excited to release Podcastfy.ai: an open-source Python package and CLI tool that transforms multi-modal content into engaging, multi-lingual audio conversations using GenAI. It is akin to Google's NotebookLM but open, programmatic, and customizable. You can simply 'pip install podcastfy' and start using it today!
You can run it on a paper, your CV, a website, or even artwork images, as well as any combination of these!
I was intrigued by Google's newest GenAI product: NotebookLM, especially its “deep dive” podcast feature that converts uploaded content into a two-person AI-generated audio conversation. As Andrej Karpathy put it, "NotebookLM [...] is a re-imagination of the UX of working with LLMs" and I do agree!
While exploring NotebookLM, however, I got a bit frustrated with its UI which added friction to the process, leaving me yearning for more automation and customization options. This sparked a question: Could we replicate the essence of NotebookLM's podcast feature as a customizable API?
To address this, I developed Podcastfy, a weekend project built using Cursor dot com. It is akin to NotebookLM’s podcast feature but open, programmatic, and customizable by anyone.
Key Features:
- Generates conversational content from multiple sources (e.g. URLs, YouTube, and PDFs) and modalities (images+text)
- Customizes transcript and audio generation (e.g., style, language, structure, length)
- Provides multi-language support for global content creation
Technical Highlights:
- Flexible LLM integration with LangChain, supporting both cloud-based and local models
- Support for advanced text-to-speech models (OpenAI, ElevenLabs, and Microsoft Edge)
- Seamless CLI and Python package integration for automated workflows
The Verdict:
While NotebookLM's AI-generated voices remain unparalleled in quality, this project did solve my original problem and showcased the fascinating possibilities of building GenAI products today. It's now live on GitHub, and I'd love for you to check it out and even contribute!
https://www.souzatharsis.com/tamingLLMs/notebooks/structured...