This is a cool tool that lets you link multiple Macs (or other machines) together and run inference across them in a ring, with a project called EXO.
It does require a bit of setup. I tried to do a pipx install of it, but it is not packaged that way, so `pipx install git+https://github.com/exo-explore/exo` doesn't work either.
I really don’t like this git clone stuff, but if you are going to do it, at least fork the repo and then have your own local copy.
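If you do go the clone route, here is a minimal sketch of what I mean; `yourname` is a placeholder for your GitHub username, and it assumes you've already forked the repo on GitHub:

```shell
# Clone your fork rather than the upstream repo
git clone https://github.com/yourname/exo.git
cd exo

# Keep upstream as a remote so you can pull their changes into your copy
git remote add upstream https://github.com/exo-explore/exo.git
git fetch upstream
```

That way your local copy survives even if the upstream repo changes or disappears.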
Installation is pretty easy, except the default is to install into the system Python, which is not such a great idea. They do allow a manual install with `install.sh`, but the asdf plus direnv combination is much better, just:

- `brew install asdf direnv`
- create a `.envrc` with `layout uv` in it
- create a `.tool-versions` by running `asdf direnv local python 3.12.7`
- when you cd into the directory, it activates the environment automatically
- run it with just `exo`
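Concretely, the setup looks something like this; the exact files `asdf direnv local` writes are my best guess from the asdf-direnv plugin docs, so double-check against them:

```shell
brew install asdf direnv

# Hook direnv into your shell if you haven't already (zsh shown)
eval "$(direnv hook zsh)"

# Inside the exo checkout:
echo 'layout uv' > .envrc
asdf direnv local python 3.12.7   # pins Python in .tool-versions
direnv allow                      # trust the .envrc so it auto-activates on cd
```

After `direnv allow`, every `cd` into the directory brings the environment up without any manual `source` step.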
This thing is great: it comes with TinyChat, but also a v1 OpenAI-compatible endpoint. It also reports the TFLOPS available on each machine, which look right (42 TFLOPS on an M4 Max and 21 TFLOPS on an M1 Max).
Loading new models
They have their own scheme for loading models that you can access through their TinyChat. I've not quite figured it all out, but it has a big selection of the latest models and seems to use a single machine first, then fill it up and spill over to the others. Nice work!
Integration with OpenWebUI via the OpenAI API!
Because it has this, it's easy to go to Admin Settings > Connections and add that endpoint, which on your local machine is just http://localhost:52415. Note that if you do this, to get the most memory, don't run anything but models from there.
The only issue is that the v1 API shows all the models, but it does work; if you hit this endpoint, you can watch it use MLX to load the safetensors, which is great!
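To sanity-check the endpoint from the command line once `exo` is running, something like this should work; the model name `llama-3.2-1b` is just a placeholder, use whatever `/v1/models` actually reports:

```shell
# List the models the v1 API exposes
curl http://localhost:52415/v1/models

# OpenAI-style chat completion against the local cluster
curl http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "Hello from exo"}]
      }'
```

Since the API is OpenAI-shaped, any OpenAI client library pointed at that base URL should also work.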