Running AI Locally Without Spending All Day On Setup


There are plenty of AI models you can play with from companies like OpenAI, Google, and many others. But when you use them, you get the experience they want, and it runs on their computers. There are plenty of reasons you might not like that. Maybe you don't want your data or ideas passing through someone else's servers. Maybe you want to tweak and customize things in ways they won't allow.

There are many more-or-less open models, but setting them up to run can be a real chore and, unless you are very patient, usually requires a big video card to use as a vector processor. There isn't much to be done about that last problem. You could outsource the processing, but then you might as well just use a hosted chatbot. There are, however, very easy ways to load and run many AI models on Windows, Linux, or Mac. One of the easiest we've found is Msty. The program is free for personal use and claims to be private, although if you're really paranoid, you'll want to verify that for yourself.

What is Msty?

I’m talking about Hackaday!

Msty is a desktop application that lets you do several things. First, it lets you chat with an AI engine, either locally or remotely. It knows about many popular options and can store your keys for paid services. For local options, it can download, install, and run the engines of your choice.

For services or engines it doesn’t know about, you can do your own configuration, which ranges from easy to moderately difficult, depending on what you’re trying to do.

Of course, with a local model, or even most remote models, you can just use Python or a basic interface (for example, with ollama; there are plenty of examples out there). However, Msty gives you a much richer experience. For example, you can attach files. You can export results and go back to previous conversations. If you don't want conversations remembered, you can chat in vapor mode or delete them later.
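To give a sense of what that bare-bones route looks like, here is a minimal sketch of chatting with a local model without Msty in the picture. It assumes ollama is already running on its default port and that the llama3.2 model has been pulled; adjust the model name to whatever you have installed.

```python
# Minimal sketch: talk to a locally hosted model through ollama's REST API.
# Assumes ollama is serving on its default port (11434) and that the
# "llama3.2" model has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Explain what a 555 timer does in two sentences.",
        "stream": False,  # ask for one JSON reply instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])  # the model's answer as plain text
```

It works, but you can see why a front end that handles attachments, history, and multiple backends is more pleasant for day-to-day use.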

Each conversation lives in a folder, which can contain prompts to prime the discussion. So a folder might say: "You are an 8th grade math teacher…" or any other instructions you want loaded before jumping into a conversation.

MultiChat

What two models think of the 555

One of the coolest features is the ability to chat with multiple chatbots at once. Of course, if it were just a matter of switching between them, it would be little more than a gimmick. However, you can sync the chats so that each chatbot answers the same prompt, and you can easily see the differences in speed and in their responses.

For example, I asked Google Gemini 2.0 and Llama 3.2 how a 555 timer works, and you can see the answers were quite different.
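If you wanted to approximate that side-by-side comparison outside of Msty, the idea is simply to send the same prompt to each model and line up the answers. Here is a rough sketch against a local ollama server; the model names are just examples and only work if you have pulled them.

```python
# Rough approximation of a "synced" multi-chat: one prompt, several models.
# Assumes ollama is serving locally and both example models are installed.
import requests

PROMPT = "How does a 555 timer work?"

for model in ("llama3.2", "qwen2.5"):
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    print(f"--- {model} ---")
    print(r.json()["response"], "\n")
```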

RAG

The “knowledge stack” feature lets you easily pull in your own data to use as a source for a conversation (i.e., RAG, or Retrieval-Augmented Generation) with certain engines. You can add files, folders, Obsidian vaults, or YouTube transcripts.
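Under the hood, RAG boils down to finding the chunks of your own material most relevant to a question and stuffing them into the prompt. Msty handles all of that for you; the toy sketch below only illustrates the flow. A real system would use an embedding model and a vector store instead of the crude word-overlap score used here, and the sample chunks are made up for the example.

```python
# Toy sketch of retrieval-augmented generation: pick the most relevant
# chunk of local text and prepend it to the prompt. Real RAG uses
# embeddings and a vector store; this word-overlap score is just a stand-in.
from collections import Counter

chunks = [
    "The 555 timer compares its inputs against thresholds at 1/3 and 2/3 Vcc.",
    "The podcast episode covered a probe tip engraver for scanning microscopes.",
    "Obsidian vaults are just folders of Markdown files.",
]

def score(query: str, text: str) -> int:
    """Count how many query words also appear in the chunk (crude relevance)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    return sum(min(q[w], t[w]) for w in q)

question = "What thresholds does the 555 use?"
best = max(chunks, key=lambda c: score(question, c))

prompt = f"Use this context to answer.\nContext: {best}\nQuestion: {question}"
print(prompt)  # this augmented prompt is what actually goes to the model
```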

Discuss the podcast

For example, I built a knowledge stack named “Hackaday Podcast 291” using the YouTube link. I could then open a chat with Google’s Gemini 2.0 beta (hosted remotely) and chat with the podcast. For example:

You: Who are the hosts?

gemini-2.0-flash-exp: Elliot Williams and Al Williams are the hosts.

You: What type of microscope were we talking about?

gemini-2.0-flash-exp: The text discusses a probe tip engraver used to make tips for a type of microscope capable of imaging at the atomic level.

It would be easy, for example, to load a bunch of PDF datasheets for a processor and, perhaps, your design documents to allow discussion of a particular project.

You can also save prompts to a library, analyze result metrics, refine prompts and results, and use a host of other features. The prompt library comes with a few built in, ranging from an accountant to a yogi, if you don't want to write your own.

New models

The chat features are great, and having a single interface for a multitude of backends is nice. However, the best feature is how the program will download, install, run, and stop local models.

Selecting a new local model will download and install it for use.

To get started, click the Local AI Model button at the bottom of the left toolbar. This will give you several choices. Keep in mind that many of them are quite large, and some of them require a lot of GPU memory.

I started on a machine with an NVIDIA 2060 card with 6 GB of memory. Sure, part of that memory goes to driving the display, but most of it was available. Some of the smaller models would work for a bit, but eventually I'd get a weird error. That was enough of an excuse to swap in a 12 GB 3060 card, and it seems to be enough for everything I've tried so far. Granted, some of the larger models are a bit slow, but that's acceptable.

There are more options if you press the black button at the top, or you can import GGUF models from places like Hugging Face. If you already have models loaded for something like ollama, you can point Msty at them. You can also point it to a local server if you prefer.
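If you're not sure what an existing ollama install already has on disk before pointing Msty at it, you can ask the server directly. This sketch assumes ollama's default port.

```python
# List the models an existing ollama server already exposes, so you know
# what Msty (or anything else) will find. Assumes the default port 11434.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=30).json()
for m in tags.get("models", []):
    print(m["name"])  # e.g. "llama3.2:latest"
```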

The version I tested did not know about the Gemini 2.0 model out of the box. However, it was quite simple to add it as a custom Google model using the (free) API key and the model ID (models/gemini-2.0-flash-exp).
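If you want to sanity-check that the key and model ID work before wiring them into Msty, you can hit Google's Generative Language REST endpoint directly. This is a quick sketch, assuming your key is in the GEMINI_API_KEY environment variable.

```python
# Quick check that the API key and model ID are valid, independent of Msty,
# by calling Google's Generative Language API (v1beta) directly.
import os
import requests

model = "gemini-2.0-flash-exp"
key = os.environ["GEMINI_API_KEY"]  # your free API key
url = (f"https://generativelanguage.googleapis.com/v1beta/"
       f"models/{model}:generateContent?key={key}")

r = requests.post(url, json={"contents": [{"parts": [{"text": "Say hi"}]}]},
                  timeout=60)
r.raise_for_status()
print(r.json()["candidates"][0]["content"]["parts"][0]["text"])
```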

Conclusion

You can spend a lot of time researching and comparing different AI models. It helps to have a list, although you can wait until you've burned through the ones Msty already knows about.

Is this the only way to run your own AI model? No, of course not. But it is probably the simplest method we've seen. We wish it were open source, but at least it's free for personal use. What's your favorite way to run AI? And yes, we know some people's answer is "don't use AI!" That's an acceptable answer, too.
