highlanderNJ's comments

highlanderNJ · on Dec 7, 2024

What's the value-add compared to `outlines`?

https://www.souzatharsis.com/tamingLLMs/notebooks/structured...

parthsareen · on Dec 7, 2024

Hey! Author of the blog here. The current implementation uses llama.cpp GBNF which has allowed for a quick implementation. The biggest value-add at this time was getting the feature out.

With newer research such as outlines and xgrammar coming out, I hope to update the sampling to support more formats, increase accuracy, and improve performance.
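For anyone new to the idea, here is a minimal sketch of grammar-constrained sampling in general. It is not llama.cpp's GBNF engine and not the actual implementation: the toy vocabulary, the `allowed_tokens` rule, and every function name are assumptions made up for illustration. The point is only that tokens the grammar forbids are masked out before sampling, so the output always conforms.

```python
import math
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "<eos>"]


def allowed_tokens(generated):
    """Hypothetical grammar: the output must be exactly "yes" or "no"."""
    if not generated:
        return {"yes", "no"}
    return {"<eos>"}


def constrained_sample(logits, generated, rng):
    """Mask tokens the grammar forbids, then sample from what is left."""
    allowed = allowed_tokens(generated)
    masked = [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]
    # Softmax over the masked logits; forbidden tokens get weight 0.
    peak = max(masked)
    weights = [math.exp(logit - peak) for logit in masked]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(logits_fn, rng, max_steps=4):
    """Sample tokens until the grammar allows, and the sampler picks, <eos>."""
    output = []
    for _ in range(max_steps):
        token = constrained_sample(logits_fn(output), output, rng)
        if token == "<eos>":
            break
        output.append(token)
    return output


def biased_logits(_generated):
    # Stand-in "model" that strongly prefers the forbidden token "maybe".
    return [0.0, 0.0, 5.0, 0.0]


result = generate(biased_logits, random.Random(0))
```

Even though the stand-in model prefers "maybe", `result` can only ever be `["yes"]` or `["no"]`, because the mask removes everything else before sampling.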

mwieler · on Dec 8, 2024

Hi, just wanted to say how much I appreciate your work.

I'm curious if you have considered implementing Microsoft's Guidance (https://github.com/guidance-ai/guidance)? Their approach offers significant speed improvements, and I understand speed can sometimes be a shortcoming of GBNF (e.g., https://github.com/ggerganov/llama.cpp/issues/4218).

parthsareen · on Dec 9, 2024

Yes! I have checked guidance out, as well as a few others. I'm planning to refactor sampling in the near future, which would include improving how grammars are used for sampling as well. Thanks for sharing!

highlanderNJ · on Oct 29, 2024

Thanks. The webapp is new and will hopefully make it easier for users to try it out and share feedback so we can improve the underlying package.

Would love your feedback!

highlanderNJ · on Oct 29, 2024

Could we replicate the NotebookLM's podcast feature as a customizable API?

Live demo: https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo
Open Source Python package: https://github.com/souzatharsis/podcastfy

Apache-2.0 license

Key Features:
- Generates conversational content from multiple sources (e.g., URLs, YouTube, and PDFs) and modalities (images+text)
- Customizes transcript and audio generation (e.g., style, language, structure, length)
- Provides multi-language support for global content creation

Technical Highlights:
- Flexible LLM integration with LangChain, supporting both cloud-based and local models
- Support for advanced text-to-speech models (OpenAI, ElevenLabs, and Microsoft Edge)
- Seamless CLI and Python package integration for automated workflows

NotebookLM's AI-generated voices remain unparalleled in quality (SoundStorm is awesome!). We would love additional contributors to help build this open source alternative!

highlanderNJ · on Oct 16, 2024

Awesome, let me know if you run into issues while trying podcastfy; I can help you. Feel free to open an issue at https://github.com/souzatharsis/podcastfy/issues

etewiah · on Oct 16, 2024

Thanks, will do.

highlanderNJ · on Oct 15, 2024

I am excited to release Podcastfy.ai: An open-source Python package and CLI tool that transforms multi-modal content into engaging, multi-lingual audio conversations using GenAI; akin to Google's NotebookLM but open, programmatic, and customizable. You can simply 'pip install podcastfy' and start using it today!

You can run it on a paper, your CV, a website, or even artwork images, as well as any combination of the above!
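As a rough sketch of what a call from Python might look like after installing: the `keep_http_urls` helper and the `make_podcast` wrapper are hypothetical names added here for illustration, and the `generate_podcast(urls=...)` entry point in `podcastfy.client` is assumed from the package README, so check the repo for the current API. Running it for real also needs credentials for whichever LLM and text-to-speech providers are configured.

```python
from urllib.parse import urlparse


def keep_http_urls(sources):
    """Hypothetical helper: keep only well-formed http(s) URLs."""
    kept = []
    for source in sources:
        parsed = urlparse(source)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            kept.append(source)
    return kept


def make_podcast(sources):
    """Hypothetical wrapper around the package's Python entry point."""
    # Imported lazily so this file loads even before podcastfy is installed.
    from podcastfy.client import generate_podcast  # assumed entry point

    # Assumed to return the generated audio file, as in the README example.
    return generate_podcast(urls=keep_http_urls(sources))
```

Calling `make_podcast(["https://example.com/some-article"])` would then hand the filtered URLs to the package; the example URL is a placeholder.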

I was intrigued by Google's newest GenAI product: NotebookLM, especially its “deep dive” podcast feature that converts uploaded content into a two-person AI-generated audio conversation. As Andrej Karpathy put it, "NotebookLM [...] is a re-imagination of the UX of working with LLMs" and I do agree!

While exploring NotebookLM, however, I got a bit frustrated with its UI, which added friction to the process and left me wanting more automation and customization options. This sparked a question: Could we replicate the essence of NotebookLM's podcast feature as a customizable API?

To address this, I developed Podcastfy, a weekend project built using Cursor dot com: akin to NotebookLM's podcast feature but open, programmatic, and customizable by anyone.

Key Features:
- Generates conversational content from multiple sources (e.g., URLs, YouTube, and PDFs) and modalities (images+text)
- Customizes transcript and audio generation (e.g., style, language, structure, length)
- Provides multi-language support for global content creation

Technical Highlights:
- Flexible LLM integration with LangChain, supporting both cloud-based and local models
- Support for advanced text-to-speech models (OpenAI, ElevenLabs, and Microsoft Edge)
- Seamless CLI and Python package integration for automated workflows

The Verdict:

While NotebookLM's AI-generated voices remain unparalleled in quality, this project did solve my original problem and showcased the fascinating possibilities of building GenAI products today. It's now live on GitHub, and I'd love for you to check it out and even contribute!

What would you like to Podcastfy today?

GitHub: https://github.com/souzatharsis/podcastfy