<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Welcome file</title>
<link rel="stylesheet" href="https://stackedit.io/style.css" />
</head>
<body class="stackedit">
<div class="stackedit__left">
<div class="stackedit__toc">
<ul>
<li>
<ul>
<li><a href="#llama">LLaMA</a></li>
<li><a href="#ci-build-status">CI Build Status</a></li>
<li><a href="#overview">Overview</a></li>
<li><a href="#known-issues">Known Issues</a></li>
<li><a href="#installation">Installation</a></li>
<li><a href="#configuration">Configuration</a></li>
<li><a href="#how-to-use">How to use</a></li>
<li><a href="#advanced-configuration">Advanced Configuration</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="stackedit__right">
<div class="stackedit__html">
<h2 id="llama">LLaMA</h2>
<p>AI inference engine for Openfire using LLaMA.<br>
This plugin is a wrapper around the llama.cpp server binary. It uses the llama.cpp HTTP API to create a chatbot in Openfire that will engage in XMPP chat and groupchat conversations.</p>
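<p>For reference, the plugin forwards prompts to the llama.cpp server over its HTTP completion API. The sketch below is only an illustration of that API (not plugin code), assuming the default bind address and port (127.0.0.1:8080) and a llama.cpp build that exposes the <code>/completion</code> endpoint:</p>
<pre><code>import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LlamaCompletionExample {
    public static void main(String[] args) throws Exception {
        // Ask the llama.cpp server for a short completion (default bind address and port).
        String body = "{\"prompt\": \"What are female goats called?\", \"n_predict\": 64}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:8080/completion"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse&lt;String&gt; response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON reply carries the generated text in its "content" field.
        System.out.println(response.body());
    }
}
</code></pre>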
<h2 id="ci-build-status">CI Build Status</h2>
<p><a href="https://github.com/igniterealtime/openfire-llama-plugin/actions"><img src="https://github.com/igniterealtime/openfire-llama-plugin/workflows/Java%20CI/badge.svg" alt="Build Status"></a></p>
<h2 id="overview">Overview</h2>
<img src="https://igniterealtime.github.io/openfire-llama-plugin/llama-chat.png">
<h2 id="known-issues">Known Issues</h2>
<p>This version embeds llama.cpp binaries only for 64-bit Linux and 64-bit Windows.</p>
<h2 id="installation">Installation</h2>
<p>Copy llama.jar to the Openfire plugins folder.</p>
<h2 id="configuration">Configuration</h2>
<img src="https://igniterealtime.github.io/openfire-llama-plugin/llama-settings.png">
<h3 id="enable-llama">Enable LLaMA</h3>
<p>Enables or disables the plugin. Reload the plugin or restart Openfire if this or any of the other settings are changed.</p>
<h3 id="enable-hosted">Use Hosted LLaMA server</h3>
<p>This causes the plugin to use a remote llama.cpp server instead of the local server running with Openfire. The plugin assumes that the remote server has the correct LLaMA model and configuration, and it will send requests to the Hosted URL configured below.</p>
<h3 id="hosted-url">Hosted URL</h3>
<p>The URL to the remote llama.cpp server to be used.</p>
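<p>Before enabling the hosted option, it can help to check that the remote server is reachable from the Openfire machine. A minimal sketch, assuming the placeholder URL <code>http://llama.example.org:8080</code> and a recent llama.cpp server build that exposes a <code>/health</code> endpoint (older builds may only serve the demo web UI at <code>/</code>):</p>
<pre><code>import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HostedLlamaCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder; use your actual Hosted URL here.
        String hostedUrl = "http://llama.example.org:8080";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(hostedUrl + "/health"))
                .GET()
                .build();

        HttpResponse&lt;String&gt; response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + " " + response.body());
    }
}
</code></pre>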
<h3 id="usernamepassword">Username/Password</h3>
<p>This is the Openfire username/password for the user that will act as the LLaMA chatbot. By default the user will be “llama” and the password will be a random string. If you are using LDAP, or your Openfire user manager is in read-only mode and a new user cannot be created, then you must create the user yourself and specify the username and password here.</p>
<h3 id="alias">Alias</h3>
<p>Set an alias for the model. The alias will be returned in chat responses.</p>
<h3 id="host">Bind IP Address/Hostname</h3>
<p>Set the IP address to bind to. Default: localhost (127.0.0.1).</p>
<h3 id="port">Port</h3>
<p>Set the port for the llama.cpp server to listen on. Default: 8080.</p>
<h3 id="model-path">Model Location Path</h3>
<p>Specify the path on the server where the LLaMA model file will be downloaded. The default is the OPENFIRE_HOME/llama folder. If a model file is already available locally, copy it here as llama.model.gguf.</p>
<h3 id="model-url">Model URL</h3>
<p>Specify the URL of the LLaMA model file to be downloaded and used. Default is <a href="https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf?download=true">https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q2_K.gguf?download=true</a><br>
The first time the plugin starts, it will download this file and cache it in the OPENFIRE_HOME/llama folder.</p>
<h3 id="system-prompt">System Prompt</h3>
<p>Prompting large language models like Llama 2 is an art and a science. Set your system prompt here. The default is “You are a helpful assistant”.</p>
<h3 id="predictions">Predictions</h3>
<p>Set the maximum number of tokens to predict when generating text. Note: the output may slightly exceed the set limit if the last token is a partial multibyte character. When set to 0, no tokens will be generated but the prompt is evaluated into the cache. Default is 256.</p>
<h3 id="temperature">Temperature</h3>
<p>Adjust the randomness of the generated text (default: 0.5).</p>
<h3 id="top-k-sampling">Top K Sampling</h3>
<p>Limit the next token selection to the K most probable tokens (default: 40).</p>
<h3 id="top-p-sampling">Top P Sampling</h3>
<p>Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P (default: 0.95).</p>
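<p>Taken together, the Predictions, Temperature, Top K Sampling and Top P Sampling settings correspond to parameters of the llama.cpp completion request (<code>n_predict</code>, <code>temperature</code>, <code>top_k</code> and <code>top_p</code>). The fragment below is only a sketch of that mapping using the default values listed above; how the plugin folds the system prompt and chat history into the actual prompt is not shown here:</p>
<pre><code>{
  "prompt": "what is a radiogram?",
  "n_predict": 256,
  "temperature": 0.5,
  "top_k": 40,
  "top_p": 0.95
}
</code></pre>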
<h2 id="how-to-use">How to use</h2>
<img src="https://igniterealtime.github.io/openfire-llama-plugin/llama-test.png">
<p>To confirm that llama.cpp is working, use the llama.cpp demo web app to run a test.</p>
<p>The plugin will create an Openfire user called llama (by default). This user can be engaged in chat or group chats from any XMPP client application, such as Spark, Converse or Conversations.</p>
<h3 id="chat">Chat</h3>
<p>Add llama as a contact and start a chat conversation.</p>
<pre><code>(22:20) Seye: what are female goats called?
(22:20) LLaMA: Female goats are called does.
</code></pre>
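<p>Besides desktop and web clients, any XMPP library can talk to the bot. A minimal sketch using Smack 4, assuming a placeholder account on <code>example.org</code> and the default bot username <code>llama</code>:</p>
<pre><code>import org.jivesoftware.smack.chat2.Chat;
import org.jivesoftware.smack.chat2.ChatManager;
import org.jivesoftware.smack.tcp.XMPPTCPConnection;
import org.jxmpp.jid.EntityBareJid;
import org.jxmpp.jid.impl.JidCreate;

public class ChatWithLlama {
    public static void main(String[] args) throws Exception {
        // Placeholders: use your own Openfire account and domain.
        XMPPTCPConnection connection = new XMPPTCPConnection("seye", "secret", "example.org");
        connection.connect().login();

        EntityBareJid llama = JidCreate.entityBareFrom("llama@example.org");
        Chat chat = ChatManager.getInstanceFor(connection).chatWith(llama);
        chat.send("what are female goats called?");

        // Replies arrive via ChatManager#addIncomingListener(...).
        connection.disconnect();
    }
}
</code></pre>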
<h3 id="group-chat">Group Chat</h3>
<p>Start a chat message with the LLaMA username (llama), and LLaMA will join the group chat and respond to the message typed.</p>
<pre><code>(22:19) Seye: llama, what is a radiogram?
(22:19) LLaMA: Oh my llama-ness! I'm so glad you asked! A radiogram (also known as a wireless gram) is an old-fashioned term for a message or telegram that is sent via radio communication.
</code></pre>
<p>Note that this only works with group-chats hosted in your Openfire server. Federation is not supported.</p>
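<p>A similar sketch for a group chat, again with placeholder names, using the Smack multi-user chat API and a room hosted on the local <code>conference</code> service:</p>
<pre><code>import org.jivesoftware.smack.tcp.XMPPTCPConnection;
import org.jivesoftware.smackx.muc.MultiUserChat;
import org.jivesoftware.smackx.muc.MultiUserChatManager;
import org.jxmpp.jid.impl.JidCreate;
import org.jxmpp.jid.parts.Resourcepart;

public class GroupChatWithLlama {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust the account, room and domain to your own server.
        XMPPTCPConnection connection = new XMPPTCPConnection("seye", "secret", "example.org");
        connection.connect().login();

        MultiUserChat muc = MultiUserChatManager.getInstanceFor(connection)
                .getMultiUserChat(JidCreate.entityBareFrom("lobby@conference.example.org"));
        muc.join(Resourcepart.from("Seye"));

        // Addressing the bot by its username brings it into the conversation.
        muc.sendMessage("llama, what is a radiogram?");
        connection.disconnect();
    }
}
</code></pre>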
<h2 id="advanced-configuration">Advanced Configuration</h2>
<h3 id="gguf-model-files">GGUF Model files</h3>
<p>The GGUF model files for LLaMA 2 are large (5 GB+) and may take several minutes to download. The downloaded model file is cached as OPENFIRE_HOME/llama/llama.model.gguf.<br>
To speed up this process, you can preload the model by copying a local model file to this destination and renaming it accordingly before installing the plugin.</p>
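<p>One way to preload a model is to copy an existing GGUF file into place under the expected name before installing the plugin. A minimal sketch, with both paths as placeholders for your own downloaded model and Openfire installation:</p>
<pre><code>import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PreloadModel {
    public static void main(String[] args) throws Exception {
        // Placeholder paths: point these at your downloaded model and OPENFIRE_HOME.
        Path source = Path.of("/tmp/llama-2-7b-chat.Q2_K.gguf");
        Path target = Path.of("/opt/openfire/llama/llama.model.gguf");

        Files.createDirectories(target.getParent());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
</code></pre>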
<h3 id="gpu-support">GPU Support</h3>
<p>The plugin ships generic binaries for 64-bit Linux and Windows with no GPU support. To add GPU support, build the llama.cpp server binary with the appropriate GPU configuration and either replace the binary in the OPENFIRE_HOME/plugins/llama/classes/linux-64 or OPENFIRE_HOME/plugins/llama/classes/win-64 folder after installing the plugin, or replace it in the source code and rebuild the plugin with Maven.</p>
</div>
</div>
</body>
</html>