Hello, everyone! 👋 I’m excited to share a summary of my recent conference presentation on scaling AI-powered applications, particularly focusing on my company, SlideSpeak. Over the past few months, we have successfully navigated the challenges of serving millions of users, and I’d like to highlight the key points I discussed, along with some valuable insights.
What is SlideSpeak?
SlideSpeak is an innovative AI-powered platform that automates presentation creation. We enable users to create PowerPoint presentations from documents like Word, PDF, and Excel files, as well as from other data. Our system reads the content of these documents, stores it in our database, and allows users to create presentations efficiently.
We also cater to users who wish to create presentations from scratch. In this case, our AI leverages its understanding of topics to generate relevant content. This dual capability enhances user productivity, particularly for professionals in consulting and education.
Our Growth Journey
Since launching just six months ago, we’ve made remarkable strides. Here are some key metrics that illustrate our growth:
- Over 2 million files uploaded to the platform.
- More than 250,000 monthly active users.
- At peak times, we consume over 2 million LLM tokens per minute.
These numbers reflect not only our product’s effectiveness but also the demand for efficient presentation solutions in today’s fast-paced work environment.
Key Challenges and Lessons Learned
Every startup journey comes with its share of challenges. Here are some key lessons we’ve learned along the way:
- Data Management: Initially, we stored all vector data indefinitely, leading to skyrocketing costs. We transitioned from Pinecone to a self-hosted Postgres vector database, which prompted us to implement a data cleanup strategy that removes unused vectors to optimize performance and reduce expenses (a minimal cleanup sketch follows this list).
- Avoid Overusing LLMs: In the early stages, we relied heavily on LLMs for tasks like language detection. However, we discovered that specialized libraries are often more reliable and efficient for such tasks. This realization led us to streamline our architecture and cut unnecessary LLM calls (see the language-detection sketch below).
- Monitoring and Alerts: Early in our journey, we faced challenges due to a lack of monitoring. We’ve since implemented robust monitoring systems to ensure we can address issues promptly, avoiding service disruptions during peak usage times.
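Here's a minimal sketch of the kind of cleanup job the data-management lesson boils down to. It assumes psycopg2 and hypothetical table and column names (document_embeddings, documents, last_accessed_at); our real schema and retention window differ.

```python
# Hypothetical cleanup job: table/column names and the retention window are
# assumptions for illustration, not SlideSpeak's actual schema.
import psycopg2

RETENTION_DAYS = 90  # assumed retention window


def purge_stale_vectors(dsn: str) -> int:
    """Delete embedding rows whose source document hasn't been touched recently."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            DELETE FROM document_embeddings e
            USING documents d
            WHERE e.document_id = d.id
              AND d.last_accessed_at < NOW() - make_interval(days => %s)
            """,
            (RETENTION_DAYS,),
        )
        return cur.rowcount  # number of vectors removed this run
```

Run on a schedule, even a simple job like this keeps the vector table from growing without bound.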
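And this is the sort of specialized-library swap the second lesson refers to: language detection with a small local library instead of an LLM call. The function below is an illustration using langdetect, not our exact implementation.

```python
# Illustrative replacement for an LLM-based language check, using the
# langdetect library (pip install langdetect); not SlideSpeak's actual code.
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # make detection deterministic across runs


def detect_language(text: str, default: str = "en") -> str:
    """Return an ISO 639-1 language code for the given document text."""
    try:
        return detect(text)
    except Exception:  # langdetect raises on empty or undecodable input
        return default
```

A local call like this replaces what used to be a full LLM round trip per document.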
Our Technical Architecture
I also shared insights into our technical architecture, which is crucial for understanding how we manage scaling:
- Frontend: Built with Next.js and hosted on Vercel, offering a smooth user experience.
- API Layer: Our main API is developed with FastAPI and connects to a Postgres database that manages user-related data.
- Queue Management: We use a queuing system to handle tasks like document ingestion and query processing asynchronously and efficiently (a worker sketch follows below).
- Microservices: Various microservices handle specific tasks, including document conversion and OCR processing.
This architecture allows us to scale effectively while maintaining performance.
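To make the queue layer more concrete, here's a minimal sketch of a queued ingestion task. It assumes Celery with a Redis broker purely for illustration, and the helper functions are placeholders for the conversion/OCR and embedding steps mentioned above.

```python
# Illustrative queue worker: Celery + Redis and the helper functions are
# assumptions for this sketch, not necessarily our production stack.
from celery import Celery

app = Celery("ingestion", broker="redis://localhost:6379/0")


def extract_text(document_id: str) -> str:
    """Placeholder for the document-conversion / OCR microservice call."""
    raise NotImplementedError


def store_embeddings(document_id: str, text: str) -> None:
    """Placeholder for chunking the text and writing vectors to Postgres."""
    raise NotImplementedError


@app.task(bind=True, max_retries=3, default_retry_delay=30)
def ingest_document(self, document_id: str) -> None:
    """Process one uploaded document outside the request/response cycle."""
    try:
        text = extract_text(document_id)
        store_embeddings(document_id, text)
    except Exception as exc:
        raise self.retry(exc=exc)  # back off and retry transient failures
```

The point of the pattern is that uploads can return immediately while ingestion happens out of band, which keeps the API layer responsive under load.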
Scaling LLMs and Vector Databases
Scaling AI applications presents unique challenges. One major hurdle is managing rate limits imposed by LLM providers. For instance, we reached a critical point where we had to migrate from OpenAI to Azure OpenAI to accommodate our growing demand.
We implemented load balancing across deployments in different regions, spreading API calls over separate rate limits so that user requests are processed efficiently. This strategy has proven effective in maintaining service availability during peak times.
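As a simplified illustration of that routing, here's a round-robin over two regional deployments using the Azure OpenAI Python SDK; the endpoints, API version, and deployment name are placeholders, and a production router would also react to rate-limit responses rather than rotating blindly.

```python
# Simplified region rotation: endpoints, API version, and deployment name
# are placeholders; real routing should also handle 429s and failover.
import itertools
import os

from openai import AzureOpenAI

ENDPOINTS = [
    "https://example-eastus.openai.azure.com",
    "https://example-westeurope.openai.azure.com",
]

clients = itertools.cycle(
    [
        AzureOpenAI(
            azure_endpoint=endpoint,
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-01",
        )
        for endpoint in ENDPOINTS
    ]
)


def chat(messages: list[dict]) -> str:
    """Send a chat completion to the next regional deployment in the rotation."""
    client = next(clients)
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```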
When it comes to vector databases, we faced similar scaling issues. Managing a database of this size requires careful planning and optimization, and we employed several strategies, including:
- Creating Indexes: Using HNSW (Hierarchical Navigable Small World) indexes to speed up approximate nearest-neighbor queries.
- Partitioning: This allows us to separate data by file type or other relevant criteria, enhancing query performance.
These methods have significantly improved our query response times and overall system efficiency.
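To give both techniques a concrete shape, here's an illustrative schema assuming the pgvector extension; the table and partition names and the embedding dimension are placeholders, not our exact setup.

```python
# Illustrative pgvector DDL: names, partition values, and the 1536-dim
# embedding size are assumptions, not SlideSpeak's exact schema.
import psycopg2

DDL = """
-- Parent table, list-partitioned by source file type.
CREATE TABLE IF NOT EXISTS document_embeddings (
    id           BIGSERIAL,
    document_id  BIGINT       NOT NULL,
    file_type    TEXT         NOT NULL,
    embedding    VECTOR(1536) NOT NULL,
    PRIMARY KEY (id, file_type)
) PARTITION BY LIST (file_type);

CREATE TABLE IF NOT EXISTS document_embeddings_pdf
    PARTITION OF document_embeddings FOR VALUES IN ('pdf');

-- HNSW index for fast approximate nearest-neighbor search (cosine distance).
CREATE INDEX IF NOT EXISTS document_embeddings_pdf_hnsw
    ON document_embeddings_pdf
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""


def create_vector_schema(dsn: str) -> None:
    """Apply the partitioned table and HNSW index in one transaction."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
```

Queries that filter by file type then only scan the relevant partition, and the HNSW index keeps nearest-neighbor lookups fast within it.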
Current and Future Endeavors
As we continue to innovate, we’re exploring several exciting initiatives:
- Direct Embedding Creation: We’re investigating Azure’s Postgres AI extension to simplify our backend logic by generating embeddings directly within the database (a rough sketch follows this list).
- Non-Text Data Ingestion: We aim to expand our capabilities to include images and other non-textual data in our presentations, broadening the scope of what SlideSpeak can offer.
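For context, this is roughly what in-database embedding generation could look like via the extension's azure_openai.create_embeddings function; the table, column, and deployment names are placeholders, and the exact function signature may differ across extension versions, so treat this as an assumption rather than code we run today.

```python
# Rough sketch only: assumes the azure_ai extension exposes
# azure_openai.create_embeddings(deployment, text); names are placeholders.
import psycopg2

BACKFILL_SQL = """
UPDATE document_chunks
SET embedding = azure_openai.create_embeddings('text-embedding-3-small', content)::vector
WHERE embedding IS NULL
"""


def backfill_embeddings(dsn: str) -> None:
    """Let Postgres itself embed any chunks that don't have a vector yet."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(BACKFILL_SQL)
```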
We’re also refining our evaluation pipeline to ensure that model updates and prompt changes don’t degrade the quality or correctness of responses.
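A stripped-down version of that idea looks like the sketch below: a fixed set of prompts, each paired with a property the output must satisfy, and a pass rate that gates prompt or model changes. The cases and the generate() hook are placeholders, not our real evaluation suite.

```python
# Minimal regression-eval sketch; the cases and checks are placeholders and
# far simpler than a real evaluation suite.
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True if the output satisfies the property


CASES = [
    EvalCase("Create a 5-slide outline about photosynthesis.",
             lambda out: out.lower().count("slide") >= 5),
    EvalCase("Summarize the attached quarterly report in exactly 3 bullet points.",
             lambda out: sum(line.startswith("-") for line in out.splitlines()) == 3),
]


def run_evals(generate: Callable[[str], str]) -> float:
    """Return the pass rate of a candidate prompt/model over the eval set."""
    passed = sum(case.check(generate(case.prompt)) for case in CASES)
    return passed / len(CASES)
```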
Conclusion
In conclusion, my journey with SlideSpeak has been both challenging and rewarding. By sharing our experiences, I hope to inspire others in the tech community to embrace the challenges of scaling AI applications. As we continue to grow and evolve, I’m eager to see where this journey takes us next.
Thank you for joining me in this recap! I’d love to hear your thoughts or questions. Feel free to reach out, and let’s continue the conversation about the exciting world of AI-driven solutions! 🚀