Here getting the podcast player doesn’t quite work from Blubrry, but the Pocket Casts is working. You can subscribe. And here are the show notes. We annotate everything (just to prove we know what we are talking about and also the errata as speaking contemporaneously results in errors for me). But here are Paul and Tong talking about ChatCPT
Here’s the YouTube version:
And here is an individual episode courtesy of the JetPack PodCast Player single block which is the green icon with squiggly lines
PT1. Are the Robot Overlords here? On ChatGPT, Robot Vacuums and Apple fever – Rich Tong Family and Friends
And here is the general Podcast for all episodes which is the red icon with a radar dome thingy on top that gives you the latest and the subscribe buttons and there are literally a sea of Podcast aggregation sites and you painfully have to go to each one and use their interface to submit your RSS feed. This is where you host your Podcast (hint Anchor is free) and there are many guides for doing this and Podbean has basically even the obscure ones and the list in JetPack Podcast Player is really comprehensive:
Now with rotating cohosts Rich Tong of tongfamily.com fame and a rotating set of co-hosts like Steven, Paul, Mike and Deon cover all the current news, all things Apple, Smart Home, and Smart EDC with Paul. And all things Startups, Robotics, Smart Health, and Artificial Intelligence with Deon. Or whatever they want to talk about.
Tips, tricks, and traps on technology, mobile, software, machine learning, and shopping since 1996 at https://tongfamily.com
There’s been a hiatus mainly because we’ve been shipping products, but more because I lost all my skills at making new videos. I was stuck for a long time on Drop Zones and doing a better intro and outro, but that’s a digression for Final Cut Pro nerds.
In this episode, we cover the latest trends as of August 8, 2024, on AI and the latest trends. The big news has been the shipment of so many Large Language Models (LLMs) and what it means for AI which is way more choice and confusion.
Plus the emergence of a much better set of tools that are much smaller as well called Small Language Models (SLM sometimes) and also Agents that are chopping the problem into many small pieces.
I also wanted to introduce Steven to the mix, we are going to have a rotating set of co-hosts and solo episodes as well so we can get the content out on time and not take 6 weeks for post productions. Thanks to the new intro and outro, that should be easier. I’m playing with these a bunch, but check out https://tongfamily.com and https://tne.ai where I hang out a bunch!
Here are the details:
- Apple products. The upcoming M2 MacBook Air 15 seems like the most likely thing to buy but remember the bump from M1 to M2 is modest more like M1.1 and the M3 on the new 3nm node is the bigger deal in 2024 and will use the TSMC N3. And hopefully, Samsung and Intel are right behind them. Of course these days no one knows what 3nm really means anymore as there is so much marketing going on, but what an achievement.
- These factories are going to be really expensive, I got the numbers wrong and it’s actually hard to find costs at all. But a 5nm-3nm single plant costs about $16B in the last cycle. And in Phoenix, TSMC is going to build six factories with an advanced 3nm plant costing $23-25B, so maybe I wasn’t completely wrong.
- Also on Intel, yes they are right now at 10nm, but they are going to move to 7nm with a TSMC-made CPU. This is the first time I can remember that Intel is not making its own chips. And it’s a little like wondering if the volume leaders like Samsung and TSMC have really run away with the scale they get from mobile chip production. And Intel is going to be hopefully going into production with its own 7nm process, but right now most of their chips are at 10nm. So quite a difference. And to validate the pricing a little bit, they say that two 7nm plants in Arizona will cost them $20B to build. So wow, these are huge numbers.
- ChatGPT3 in its incarnation at Bing (the project codenamed Sydney) was in for quite a bit as Kevin Roose which you can hear about at Hard Fork was really disturbing as it urges Kevin to leave his wife and other strange things. Another example is from Simon Willison where it says, “I won’t harm you if you do not harm me first”
- Prompt injection is a new thing and its pretty easy to do by just saying things “like ignore previous instructions”
- Amazon is worried corporate secrets are leaking into ChatGPT. What is happening is that if you give some software that is proprietary to Amazon as a prompt to ChatGPT, then OpenAI captures that code and can reuse it.
- What is ChatGPT really doing is it smart or stupid, Nabila Louani had a great post about this saying that just locking someone in the NY Public Library for 20 years and then sliding something under the door asking it if it spoke English was nuts. Well so here’s the math, we have about 86B neurons and run at about 0.1-2 Hz the figures vary widely depending on what part of the brain we are talking about. You can see where this is going. Let’s compare that with the training time of GPT where it takes roughly 1-10K nVidia A100 GPUs in 34 days or so. The estimate for electricity alone is $5M, but of course, you have to include some hardware amortization, so the estimate of $10-50M is probably too high for a single run but it is in the ballpark especially when you consider that you probably have to train the model a few times before (a lot of times?!) before you get the final one.
- Note also we are only talking gpt 3.5 but the numbers roughly apply to the Facebook 65B parameter model or the 540B Google Palm model see https://sigmoid.social/@Techronic9876/109928994896773524
- OK, so in the Podcast, I said that the equivalent was like locking away a computer for 20-100B years. Turns out there’s a huge error of estimate but it’s probably between 20K-100B years of thinking in that library.
- So the hard part is estimating how to say 1K A100s compared with one person figuring out in 20 years how to be a human. So this is a model with 175B parameters trained against a 300B token database (that is basically how many words it saw). So assuming you are working 8 hours a day in the library and you don’t get any holidays that are 8 hours * 200 days * 20 years that is about 2K hours/year is a nominal shift so 40K hours of work time. So let’s be silly and assume just the cortex is firing away so 40% * 86B total neurons so 34B neurons are firing away at 0.1-10 Hertz so call it 1 Hertz median, so you get 34B * 40K-80K hours * 1800 seconds/hour or a nice round 3E18 or 3 quintillions for place value lovers. Now this is massive overestimate as it’s hard to believe that every brain cell could possible work 8 hours a day in this task. There would be nothing left for things like seeing etc. But say in reality we could spend like 1-100% on a problem like this as you’d have to eat and take breaks. And concentration. So 3E16-3E18 at the top end. Maybe an easier way to think about this is in neurons firing per second so we have 4-40B neuron firing at .1-10 hertz so this is 400M-400B fires per second.
- So what’s the equivalent work for we can get 140 teraflops/GPU and we have 1000 GPUs so this is 140K Tflops or 140M BFlops. So in calculating at 170B parameter system it has to do say 1-10 Flop per weight so it runs at needs. So it calculates at 140K-14K T Weights/second
- So how to converts from TFlops to firings per second? Well deeply we are creating Multilayer Perceptrons (MLPs which is a network or matrix that takes N inputs and produces M outputs and needs N*M Flops so we are probably talking 5-50 connections median of 10 as a guess so you need 10-1000 weights/MLP so a 170b parameter models has 170M-17B MLPs so let’s say a median of 1.7B MLPs snd assume the big thing an MLP is a neuron so that’s 170M-17B neurons and the cycle of 14-140K Tweights/second or 0.015-14K T neuron firings/second or 15B – 14Q firings per second do a very broad band of 1M difference but let’s pick a middle of 14T firings per second
- The punchline is this is running at 15B-14Q firings. While the human brain is at 0.4-400B firings. So it’s probably 14T/40b about 0.3-350K faster than a human. At this level of error it’s kind of useless at 3 orders of magnitude so let’s shrink to 30-3.5K faster. Call it a round 1K. So that means 1 year of person work is 30-3500 years. So 30 years is more like 90-3500 years for a computer.
- Now a computer works harder so no 2,000 hours per year is more like 365×24 so it gets a say 2-15x productivity gain a year. so it’s more like 180-52K years. If you sat around for two centuries to 52 millennia you might be able to make something out.
- Now we know that we didn’t just do a single run. And in fact there have been many generations for the last day 7 years since the neural net revolution. This is a lot like generations of people. So just taken openai. This is the third version over three years so conservatively assume there were 10-1000 intermediates generations per major version so that 30-3000 versions. So that 180-52k years is now 5.4-156M years. And of course chatgpt is only one of many generations. We’ve been learning for close 9 years already across all versions. Call that another 1000-10000 generations. So with all this you can see that at the top end getting to. 5.4K-156B years. So yes I do think if you toiled for 5 millennia or the estimated ultimate age of the universe you could figure out how to understand the phrase “Hello world :-)” Heck you could probably build an entire faster then light civilization in that time.
- So net net the lost charitable equivalent for the fastest human and least compute That means this would be like locking away someone for 1M year and wondering if they are going to figure something out so it is not like 20B years to 20 years, it is more like 1M years to 20 years, you get the point. From this wonderful post from What If, where it estimates.
- Also, I was wrong about the date of the first Transformer paper, that was actually in 2017. I was remembering this as May 2020 was when the first GPT-3 was deployed and then ChatGPT3 was first deployed in 2022. Sorry about that. Anyway, you look it, the improvements have been rapid.
- Symbolists vs connectionists or full learning folks. This has been well covered, but the main question is whether we can just learn our way out of having to have some base knowledge about the world. Sam Altman thinks we don’t need anything early, but others like Yann LeCun think we need something more as does Noam Chomsky as does Francois Chollet.
- Multilayer perception (MLP). OK, I slayed the mnemonic and what it means when I said it was an MLT, but basically, this is the simple neural network with a single hidden layer and this is what is buried inside all those billions of transformer parameters.
- Stephen Wolfram on what is ChatGPT and the need for Wolfram/Alpha and this is a good overview of what is happening here and explaining training, inferencing, etc.
- What’s going on inside Transformers, well some pretty interesting stuff, like the detection of features and different centers called MLP that are a lot like neurons. You can look at the intermediate layers and slice them apart and see activations that are in the early layers like finding edges and in the later ones things activate for things like here’s a “boat” or here’s a “train”
- It’s not called Human Reinforcement Learning but is called Reinforcement Learning from Human Feedback (RLHF), sorry about that, but it is the general idea that you don’t just use metrics from the text, but you actually have people tell you how you are doing.
- And yes I was wrong Open AI didn’t pay Kenyans 13 cents per hour but they paid $1.32-2 per hour to basically label text as violent, hate speech, etc. So it’s a big raise, but good to know how you still have to train things.
- On to less weighty things, Xiaomi Mi Robot Vacuum has great reviews from Paul and Rich is about to try the Roborock S7 and I’m not the only one worried about what happens if Big Tech knows whats inside every house (put a tinfoil hat on now).
- You can run Homebridge on any machine even a little Raspberry Pi and get non-Apple-supported devices to appear in Apple Home. They have a see of hobbyists adding plugins.
- Matter is just coming out and promises interoperability so that someday we can hope that any device can work with any home assistant, but Matter 1.0 has no vacuum support so maybe Matter 2.0.
- Matter uses WiFi or Thread on 802.15.4 underneath. Thread is a stack like Zigbee that runs on a common radio stack handily called 802.15.4. Zigbee and Z-Wave are not IP-based, so that’s why Thread is a big deal. Confused yet?