Bioconductor support chatbot

No due date 0% complete

We aim to build a helpful chat solution for users of the Bioconductor ecosystem using current Large Language Model (LLM) technology. To help with questions about the ecosystem, the LLM needs good knowledge of Bioconductor packages, their structure, and coding etiquette, which cannot be assumed to be present in any LLM.

To counter this problem, we aim to implement a custom solution built on concrete knowledge from the Bioconductor ecosystem, including a Retrieval-Augmented Generation (RAG) process that injects relevant context into the LLM prompt when answering specific questions. To achieve this, we will use and extend BioChatter, an open-source library for biomedical applications of LLMs.
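
As a rough illustration of the RAG idea described above, the following minimal sketch retrieves the most relevant snippets for a question and injects them into the prompt. It does not use the BioChatter API; the snippet list, the word-overlap scoring, and the function names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and a real system would likely use embedding-based retrieval instead.

```python
import re

# Hypothetical snippets standing in for indexed Bioconductor documentation.
DOCS = [
    "DESeq2 performs differential expression analysis of RNA-seq count data.",
    "GenomicRanges represents and manipulates genomic intervals.",
    "BiocManager::install() installs Bioconductor packages.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return the k snippets sharing the most tokens with the question.

    Word overlap is a stand-in (assumption) for a real similarity search.
    """
    q_tokens = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Inject the retrieved snippets as context ahead of the user question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How do I install Bioconductor packages?", DOCS, k=1))
```

The resulting string would then be sent to the LLM in place of the bare question, so that the answer can draw on the retrieved context.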

The desired outcome is a chatbot instance that can inform Bioconductor users about the ecosystem (which packages to use for which purpose, where to get more information), the usage of specific libraries (including their vignettes and idiomatic programming style), and the troubleshooting of common problems (by integrating the Bioconductor support forum).

To manage the complex, multi-layered knowledge required for this assistant, we will use the BioCypher library, which is designed to facilitate knowledge management and which natively interacts with BioChatter. Building specific knowledge graphs for the layers of information in the Bioconductor ecosystem (from meta-information about the ecosystem down to the occurrence of errors in a specific package) will allow the RAG mechanism to give context-specific answers to users’ questions.
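
The layered knowledge graph described above could be pictured roughly as follows. This is a plain-Python sketch, not BioCypher code: the edge list, the relation names, and the helpers `neighbors` and `context_for` are hypothetical and only illustrate how walking from a package down to an error and a forum thread can yield context for the prompt.

```python
# Hypothetical (source, relation, target) edges spanning the layers:
# ecosystem -> package -> documentation / error -> forum thread.
EDGES = [
    ("Bioconductor", "contains", "DESeq2"),
    ("DESeq2", "documented_by", "DESeq2 vignette"),
    ("DESeq2", "raises", "example error message"),
    ("example error message", "discussed_in", "support forum thread"),
]


def neighbors(node, edges):
    """Return (relation, target) pairs for all edges leaving the node."""
    return [(rel, tgt) for src, rel, tgt in edges if src == node]


def context_for(node, edges, depth=2):
    """Collect facts reachable from the node within `depth` hops.

    The depth limit bounds the walk even if the graph contains cycles.
    """
    facts = []
    frontier = [node]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for rel, tgt in neighbors(current, edges):
                facts.append(f"{current} {rel} {tgt}")
                next_frontier.append(tgt)
        frontier = next_frontier
    return facts


if __name__ == "__main__":
    for fact in context_for("DESeq2", EDGES):
        print(fact)
```

Starting the walk at a different layer (for example at "Bioconductor" instead of a single package) changes how general or specific the collected context is.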