AI chat interface for support #139

Closed
opened 2024-11-25 08:16:59 +00:00 by thabeta · 2 comments
Owner

Minimal Bot Chat interface

Before we can do this, Kristof needs to provide more specs, maybe a POC, but we can already integrate a chat interface, which needs to be like chatgpt, with recording audio, upload files, start new chat, …

  • This chat talk to our bot (python backend), fed with our content from website, books, …
  • We know user Id when they talk to us
  • We as TF Team can talk over the chat

Chat interface has following requirements

  • show markdown
  • attachments
  • record voice
  • go back up make an edit
  • have multiple channels (like in chatgpt we can have multiple channels (sessions) next to each other)
  • Reuse API from ChatGPT (if possible)
## Minimal Bot Chat interface Before we can do this, Kristof needs to provide more specs, maybe a POC, but we can already integrate a chat interface, which needs to be like chatgpt, with recording audio, upload files, start new chat, … - This chat talk to our bot (python backend), fed with our content from website, books, … - We know user Id when they talk to us - We as TF Team can talk over the chat ### Chat interface has following requirements - show markdown - attachments - record voice - go back up make an edit - have multiple channels (like in chatgpt we can have multiple channels (sessions) next to each other) - Reuse API from ChatGPT (if possible)
thabeta added the
Story
label 2024-11-25 12:23:07 +00:00
Owner

Here are some notes on what I found while researching AI components to demo on the Grid:

  • OpenWebUI is a prominent FOSS chat frontend that interfaces with Ollama and OpenAI compatible APIs. We might want to look at this for inspiration or even to fork as a foundation
  • Ollama is an LLM backend provider that can host a variety of open source models like Llama, Gemma, and their variants. It has some experimental support for OpenAI API but primarily uses its own API. Very simple to operate and supports the basic operations of fetching the models and processing prompts into responses
  • We could build our own solution that operates in a similar way to Ollama + OpenWebUI, but the question is where to host it. These run best on GPU and ideally we can scale up and down with response to user load

With that in mind, here's some further thoughts:

  • If we would be looking at renting GPUs from some provider to "self host" a model, then we should compare costs to renting time on a model directly and paying by the token
  • Small models like Claude Haiku or MiniGPT are rather inexpensive for API usage and probably perform well enough for this task. The extreme there would be DeepSeek, which is an open source model out of China with an extremely inexpensive hosted API
  • HuggingFace provides a service that hosts open source models on demand and provides an API endpoint. This would be an in between option in terms of self hosting
  • No matter if we host the backend or not, we should consider adding a RAG or similar layer. That's essentially running a search over the source content (our docs) to find the most relevant content and narrowing down the inputs to the AI. The alternative is feeding the entire source material into the model as part of each prompt, which while perhaps feasible for the size of our content base would be a higher cost option. OpenWebUI implements a RAG feature for local documents and also web searches. There are other open source options too
Here are some notes on what I found while researching AI components to demo on the Grid: * [OpenWebUI]([url](https://github.com/open-webui/open-webui)) is a prominent FOSS chat frontend that interfaces with Ollama and OpenAI compatible APIs. We might want to look at this for inspiration or even to fork as a foundation * Ollama is an LLM backend provider that can host a variety of open source models like Llama, Gemma, and their variants. It has some experimental support for OpenAI API but primarily uses its own API. Very simple to operate and supports the basic operations of fetching the models and processing prompts into responses * We could build our own solution that operates in a similar way to Ollama + OpenWebUI, but the question is where to host it. These run best on GPU and ideally we can scale up and down with response to user load With that in mind, here's some further thoughts: * If we would be looking at renting GPUs from some provider to "self host" a model, then we should compare costs to renting time on a model directly and paying by the token * Small models like Claude Haiku or MiniGPT are rather inexpensive for API usage and probably perform well enough for this task. The extreme there would be DeepSeek, which is an open source model out of China with an extremely inexpensive hosted API * HuggingFace provides a service that hosts open source models on demand and provides an API endpoint. This would be an in between option in terms of self hosting * No matter if we host the backend or not, we should consider adding a RAG or similar layer. That's essentially running a search over the source content (our docs) to find the most relevant content and narrowing down the inputs to the AI. The alternative is feeding the entire source material into the model as part of each prompt, which while perhaps feasible for the size of our content base would be a higher cost option. OpenWebUI implements a RAG feature for local documents and also web searches. There are other open source options too
thabeta added this to the tfgrid_3_16 project 2024-11-28 13:51:07 +00:00
Owner

No longer relevant.

No longer relevant.
Sign in to join this conversation.
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: tfgrid/circle_engineering#139
No description provided.