LLM

The LLM view enables the operator to configure a cutting-edge retrieval augmented generation (RAG) large language model (LLM) feature designed to empower network operators to harness the power of advanced language models within their private networks. By leveraging this technology, operators can revolutionize network operations, enhance customer experiences, and unlock new revenue streams.

LLM Options

The LLM Options scaffold configures the end-user behavior of the LLM feature. It may be useful to think of each LLM Option as a unique chatbot. In a scenario where there are multiple end-user and operator portals, multiple LLM Option records may be created to drive each portal to a unique chatbot experience.

The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.

The default checkbox, if checked, specifies that this LLM Option will be used for all portals without an explicit LLM Option defined.

The temperature field is a number between 0 and 1 (0.8 default). Increasing the temperature value will make the model answer more creatively; decreasing it makes answers more focused and deterministic.
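
As an illustration only (this is not rXg code), temperature can be thought of as rescaling the model's next-token probabilities before sampling; the function and scores below are hypothetical.

```python
import numpy as np

# Illustrative sketch of temperature sampling: lower temperature sharpens the
# probability distribution (more deterministic answers), higher temperature
# flattens it (more varied, "creative" answers).
def sample_with_temperature(logits, temperature=0.8):
    scaled = np.array(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())   # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=1.0))  # noticeably more varied
```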

The chatbot name field is used to set the name of the chatbot. The default is Romeo George.

The chatbot avatar field allows the operator to upload a custom image to be used as the chatbot's avatar in the chat window.

The apply guardrails checkbox applies guardrails against harmful, unethical, racist, sexist, toxic, and similar conversations. The guardrails are applied after any custom instructions, and the checkbox is enabled by default.

The custom instructions field allows the operator to provide the LLM with a full set of custom system instructions.

The initial greeting field allows the operator to specify the greeting the chatbot will use to introduce itself when a user initiates a new chat.

The default llm model drop down is used to specify the default LLM model this LLM Option will use.

The llm models field allows the operator to select all the models that can be used with this LLM Option.

The admin roles field sets the admin roles that will use this LLM Option. Selecting a role here removes it from being selectable in another LLM Option.

The operator portal field selects which Operator Portals are assigned to this LLM Option. Note that an admin role match is prioritized over an operator portal match. Associating a record removes it from other options.

The landing portals field sets which splash/landing portals use this LLM option. Associating a record removes it from the other options.

The allow anonymous chats field, if checked, allows users to chat via the portal without being logged in to an account.

LLM Workers

The LLM Workers scaffold configures an LLM back-end service that will be used by the chatbots defined by LLM Options. LLM Workers may leverage both local and remote GPU resources. The remote GPU resource configuration is intended to be used with a Fleet Manager. An organization might wish to install one or more GPUs into the Fleet Manager and thus have a centralized pool of GPU resources that is shared amongst a fleet in order to meet cost, power, and cooling budgets at the edge. It is also possible to create LLM Workers that leverage cloud AI systems. We highly recommend that operators deploy their own GPUs to maximize ROI and information security. Tying an LLM Worker to a cloud system is primarily included for the purpose of demonstration.

The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.

The adapter field specifies the adapter to be used; Ollama is selected by default.

The run locally checkbox tells the system to run a local server for the specified backend (adapter), which can optionally be made available to specific WAN targets and/or policies.

The host field is used to specify the IP or FQDN of the host providing the API interface for the designated backend.

The port field specifies the port to be used for communication.
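
For a worker that uses the Ollama adapter, one way to confirm the configured host and port are reachable is to query the Ollama API directly. The host and port values below are illustrative.

```python
import requests

# Hedged example: list the models an Ollama instance reports via GET /api/tags.
# A successful response confirms the worker host/port is reachable.
host, port = "192.168.20.50", 11434
resp = requests.get(f"http://{host}:{port}/api/tags", timeout=30)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model.get("name"))
```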

The timeout field sets the amount of time the system should wait for the LLM worker to respond.

The online checkbox, if checked, indicates that the LLM worker is online and ready to process requests. Unchecking this field will make the LLM worker unavailable.

The llm model drop down specifies which LLM Model this worker will use by default.

The llm models field specifies which LLM Models this worker is allowed to use.

The use for embeddings checkbox, if checked, designates this worker as the one used to generate embeddings for context lookup. Enabling it deactivates embedding generation for other workers.

LLM Models

The LLM Models scaffold displays the models that are available to the LLM Worker. Use the import models action link on the LLM Worker to bring in the available models and populate this scaffold. This scaffold is primarily for informational purposes. We recommend pulling llama3.1:latest for most purposes, or llama3.1:70b if you have more than 40 GB of VRAM. We recommend pulling nomic-embed-text (or mxbai-embed-large if you have a powerful machine) for embeddings.

The name field is the name of the model as recognized by the LLM Workers running it. This field is not arbitrary; the name needs to match the model name exactly.

The url field specifies the location from which the model can be downloaded.

The formatter field specifies which LLM model to use to format the requests that are sent to the LLM Worker.

The context window field sets the amount of information (measured in tokens) the model can consider when answering a question. Setting a larger context window can allow for more detailed and comprehensive answers, at the cost of additional memory and processing time; the usable maximum is limited by the model itself and by available VRAM.

The embedding dimensions field sets how many features/variables (dimensions) are represented in each embedding vector; different models produce vectors of different sizes.

The quantization level field sets how many bits are used per model weight when encoding the model. The higher this value, the more detail is retained from the original model, at the cost of additional memory.
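
As a rough back-of-the-envelope sketch (weights only, ignoring the context window and runtime overhead), the parameter count and quantization level together determine approximately how much VRAM a model needs.

```python
# Approximate memory required to hold the model weights alone.
def weight_memory_gb(parameters_billions, bits_per_weight):
    return parameters_billions * 1e9 * bits_per_weight / 8 / 1e9

print(weight_memory_gb(8, 4))    # ~4 GB   (an 8B model at 4-bit)
print(weight_memory_gb(70, 4))   # ~35 GB  (a 70B model at 4-bit)
print(weight_memory_gb(70, 16))  # ~140 GB (the same 70B model at fp16)
```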

LLM Sources

The LLM Sources scaffold allows operators to upload data sources that are used by the retrieval augmented generation (RAG) system. Documents uploaded to this scaffold are indexed using an embeddings model. Vector similarity search is performed upon the chatbot input to acquire relevant fragments of LLM Sources to provide context to the LLM when generating a chatbot response.
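
The retrieval step can be pictured with the minimal sketch below (illustrative only, not the rXg implementation): the chatbot input is embedded, compared against the stored fragment embeddings, and the closest fragments are supplied to the LLM as context.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_fragments(query_embedding, fragments, k=3):
    # fragments: list of (text, embedding) pairs produced by the embeddings model
    ranked = sorted(fragments,
                    key=lambda frag: cosine_similarity(query_embedding, frag[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```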

The intended method of use is for the operator to upload a large number of files from their existing dataset. For example, the operator might upload all of the menus for all of the restaurants at a large public venue. Another common use case would be an upload of all of the recommended nearby activities for a hospitality venue. The LLM Sources are unique to each edge. Synchronization of LLM Sources between edges is accomplished using a Fleet Manager.

The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.

The llm_remote_data_source field, if selected, marks this LLM Source as remote, meaning that the rXg is expected to fetch the source data from a remote host.

The frequency selector allows the operator to choose whether the LLM Source is called live or periodically. If it is called periodically, it will not have access to a specific user's query the way it will if called live. If this LLM Source is periodic, the remote source will be called periodically, the result will be stored in the LLM Source's source attachment, and a button to manually refresh it will be made available. If this is a live LLM Source, the query will be accessible from within the parameterization's ERB context as the query variable.
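
The difference between the two modes can be sketched as follows; the function names are hypothetical and not rXg internals.

```python
import requests

def fetch_periodic(url):
    # Periodic: called on a schedule with no access to any particular user's
    # query; the result is stored as the LLM Source's attachment.
    return requests.get(url, timeout=10).text

def fetch_live(url, query):
    # Live: called at chat time, so the user's query can be passed along
    # (exposed to the ERB request properties as the query variable).
    return requests.get(url, params={"q": query}, timeout=10).text
```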

The source field lets you choose the file to upload that will be used as an embedded source.

The visibility field sets which users can access this information. If set to admins only, the source will only be referenced when you are logged in as an admin. Admins and users allows both admins and client users to receive information from this source, and setting it to anonymous allows any user interacting with the LLM to receive this information regardless of whether they are logged in. The anonymous setting should only be used if the client will be interacting with the chatbot without a login session, e.g. from the splash portal.

The request properties are merged with the request properties of the remote LLM data source when this data is retrieved. These request properties can use ERB, and if they do, the user's query will be available in that ERB context as the query variable. Other variables such as client_ip, client_mac, current_account_id, anonymous_user_id, connected_ap_id, and current_admin_id will be available if possible.

The path attribute is merged with the base url of the LLM Remote Data Source to determine the URI that will be called. This allows the operator to configure different paths with different query parameters off of one LLM Remote Data Source.

The timeout attribute determines how long the rXg will wait for a response when it tries to query the remote data source.

The frequency field determines how often a periodic remote data source is redownloaded.

The cache duration field determines how long an on-demand response is cached for. This works in conjunction with the cache duration unit field.

LLM Remote Data Sources

LLM Remote Data Sources represent the "base" configuration for getting data from remote web servers to use in LLM generated responses. LLM Remote Data Sources contain properties that are used when requesting data from the remote source.

The name field is an arbitrary string descriptor used only for administrative identification. Choose a name that reflects the purpose of the record. This field has no bearing on the configuration or settings determined by this scaffold.

The base url field will be combined with the path attribute of a correlated LLM Source. This is intended to give operators significant flexibility when using LLM Remote Data Sources.

The request properties will be merged with the LLM Source's request properties when making the request. These request properties can use ERB, and if they do, the user's query will be available in that ERB context as the query variable. Other variables such as client_ip, client_mac, current_account_id, anonymous_user_id, connected_ap_id, current_admin_id will be available if possible.

The basic auth username and basic auth password will be included as basic auth if they are present.
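
A hedged sketch of how such a request might be assembled (all names and values below are illustrative, not rXg internals): the LLM Source's path is joined to the base url, the two sets of request properties become query parameters, and basic auth is attached when configured.

```python
import requests
from urllib.parse import urljoin

base_url = "https://api.example.com/"          # LLM Remote Data Source base url
path = "v1/status"                             # LLM Source path
remote_properties = {"access_key": "SECRET"}   # remote data source request properties
source_properties = {"site": "lobby"}          # LLM Source request properties

resp = requests.get(
    urljoin(base_url, path),
    params={**remote_properties, **source_properties},
    auth=("basic_auth_username", "basic_auth_password"),  # only when configured
    timeout=10,
)
print(resp.status_code)
```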

LLM Embeddings

The LLM Embeddings scaffold displays all of the indexes that have been created by the embeddings model for the various LLM Sources. This scaffold is intended for informational purposes only.

An entry in the embeddings scaffold represents data the LLM can pull from to answer client questions. The source column shows where the data came from, updated reflects the last time the information was updated, and the remaining columns show the LLM model used and the dimensionality.

LLM Prompts

This scaffold is a list of all the prompts sent to the LLM from the clients.

LLM Requests

This scaffold is a list of all the requests sent to the LLM from clients. It lists which LLM model was used, the LLM Worker, when the task was started, when it completed, and how long it took to complete the response.

Chats

The chats scaffold is a history of the chats that have been initiated on the system.

LLM Setup Example

In this example the hardware is a PC with a 3090 graphics card; the WAN and certificate are configured, and no other configuration has been done.

Navigate to Services::LLM

Create a new LLM Worker.

Give the record a name. In this case, since it will be running locally on the system using Ollama, I will use the name Local Ollama.

The adapter field should be set to Ollama, and the run locally checkbox should be checked.

The default port value of 11434 should be used, and timeout can be left at 30 seconds. It may be necessary to increase the timeout to 120 seconds to support larger models such as the 70b variant of llama.

Add any WAN targets that should be allowed to communicate on this port and/or any policies that should be allowed. Since there is no other configuration currently on this system, I will select the default policy; if this were a live deployment, I would need to add any client policies that will have access to the chatbot.

Leave the online checkbox checked.

We do not have any LLM models yet, so we will leave those fields blank. In this demo we will also be using this worker for embeddings, so the use for embeddings checkbox should be checked.

Click Create.

To pull a new model click the pull model link and enter the name of the model to be fetched.

Here we will pull the latest llama3 model, then click submit. This process can take a long time to complete, as it must download and process the model file, which can be quite large. If the model does not automatically appear in the model scaffold after pulling, wait a bit longer and then click the Import models link in the scaffold.

Repeat for each desired model.
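
For workers using the Ollama adapter, pulling a model corresponds to Ollama's pull API; the sketch below shows the equivalent call for illustration only (the pull model link performs this for you, and the host/port are assumed to be the local defaults).

```python
import requests

host, port = "127.0.0.1", 11434
# POST /api/pull downloads the named model and streams JSON progress lines.
with requests.post(f"http://{host}:{port}/api/pull",
                   json={"name": "llama3:latest"},
                   stream=True, timeout=None) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode())  # e.g. download/verify progress messages
```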

Edit the LLM Worker created previously. Now we can select the default model to use with this worker, as well as specify other models the worker can use.

After selecting the default LLM model and any additional models click create.

Next we will enable embedding generation. Embeddings are numerical representations of text or other data, which can be compared against each other in order to detect similarity between different data. If enabled, embeddings will be created for the Retrieval Augmented Generation (RAG) sources, as well as the admin manual and the Active Record Models that make up the database. There must be a worker designated for creating embeddings as well as a model designated for embeddings. For this we will use the nomic-embed-text:latest model. Edit the nomic LLM worker.

The embedding dimensions field is required when using a model for embedding. This field is typically populated automatically when importing models from a worker. Valid values are 512, 768, and 1024. The default is 768 and will be used if the field is blank. Check the use for embeddings checkbox and click update. Note that the system will not start generating LLM embeddings until we create the LLM Option, which brings us to the next step.
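
One hedged way to sanity-check the embedding dimensions value for a given model is to request an embedding directly from the Ollama API and inspect the vector length; the host, port, and prompt below are illustrative.

```python
import requests

host, port = "127.0.0.1", 11434
resp = requests.post(f"http://{host}:{port}/api/embeddings",
                     json={"model": "nomic-embed-text:latest",
                           "prompt": "Where is the fitness center?"},
                     timeout=30)
resp.raise_for_status()
vector = resp.json()["embedding"]
print(len(vector))  # nomic-embed-text typically produces 768-dimensional vectors
```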

Create a new LLM Option.

Give the record a name. Since this will be the default LLM Option, I will call it default and check the default box below the name field. If desired, you can enter a name for the chatbot and upload a custom avatar.

By default, apply guardrails is checked; for the purpose of this demo it will remain checked.

I will not be changing the custom instructions or the initial greeting at this time.

Select the default LLM model; for this we will be using llama3:latest. I will select the other models as well.

In the Provisioning section, select the admin roles that will use this option set. If there are any operator portals or splash/landing portals that should use this LLM Option, they can be selected at this time as well. If the goal is to allow anyone to access the chatbot without a login session, check allow anonymous chats. Primarily this would be checked if you intend to have the chatbot on the splash portal before authentication.

Click create.

This will then start generating the LLM Embeddings to be used as resources for chat responses. The time it will take to generate the embeddings depends on hardware.

The chat is now available for use in the admin GUI.

LLM Remote Sources Example

The LLM Remote Data Source feature allows the system to pull in real-time data via API calls. In this example we will create a remote data source that pulls information from Aviationstack.com. We are using the paid service; however, a free account is believed to allow 100 API calls per month.

Before we begin, we should determine which API parameters we will use. A list of available API calls for Aviationstack can be found at aviationstack.com/documentation. For this example we will be using the following parameters.

dep_iata, which allows us to narrow the scope of our searches to a specific airport.

flight_date, which allows us to specify a date for our queries.

flight_number, which allows us to inquire about specific flights.

Lastly, access_key, which is required and passes our API access key.

To begin using Remote Data Sources navigate to Services::LLM and create a new LLM Remote Data Source.

Give the record a name. Enter the base URL, which for aviationstack is https://api.aviationstack.com/. Next we need to configure a Request property. Kind should be set to Query Parameter, key set to access_key from the endpoint above, and the value is the API key provided by aviationstack. Click Create.

Next create a new LLM Source.

Give the record a name. Set the Visibility field to Admins and users; if we left it at the default of Admins, only admins would be able to access this source, and we want clients on the network that have logged in to be able to access it. Setting it to anonymous would allow a client to access the source regardless of login status.

The path here needs to be set to v1/flights for aviationstack. Select the LLM Remote Data Source that we created previously in the LLM remote data source field.

The Remote Data Description field is important and we must provide a value here. This is what the system will look at to determine whether this source has information relevant to the inquiry.

Next we need to create the Request Properties to use for this source. We will be using dep_iata to narrow the scope to a specific airport, in this example DEN (Denver); flight_date, which allows searching for specific dates/times; and finally flight_number, which allows us to retrieve information based on specific flight numbers.
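
Putting the pieces together, the request that results from this configuration might look like the sketch below; the access key, date, and flight number values are placeholders.

```python
import requests

# base url + path -> https://api.aviationstack.com/v1/flights, with the request
# properties sent as query parameters.
resp = requests.get(
    "https://api.aviationstack.com/v1/flights",
    params={
        "access_key": "ACCESS_KEY",   # from the LLM Remote Data Source
        "dep_iata": "DEN",            # narrow results to departures from Denver
        "flight_date": "2024-07-04",  # placeholder date
        "flight_number": "1234",      # placeholder flight number
    },
    timeout=10,
)
print(resp.json())
```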

Click Create.

Now we should see a new LLM Embedding for this source. If not, click the Regenerate Embeddings link above the LLM Embeddings scaffold.

Now we are ready to start asking questions.

GPUs

Nvidia datacenter and workstation GPUs are preferred. The RTX 4000 is the preferred GPU for space and power constrained scenarios. The RTX 4000 consumes a single PCI-e slot and contains 16 GB to 20 GB of VRAM which is enough to run most typical models. Production servers from major manufacturers are usually ordered with the Nvidia L40S GPU which comes with 48 GB of VRAM.

It is possible to use desktop GPUs, which are often available at lower prices, especially on the second-hand market, which is useful for demonstration and development purposes. The Nvidia 3090 GPU is known to work and is available at reasonable prices. The 24 GB of VRAM present onboard the 3090 is enough to run most models. The 3090 GPU is also available in two-slot configurations. Later generation Nvidia GPUs such as the 4090 typically require three slots and provide similar amounts of VRAM.

It is possible to put two (or more) GPUs in the same machine. For example, two 3090s with 24 GB of VRAM, or three A4000s with 16 GB of VRAM each, would be enough to run Llama 3.1 70b which requires 40 GB of VRAM.

