When OpenAI released the much-hyped Strawberry model for ChatGPT this week, it boasted about its prowess with complex logic, such as software coding, gene sequencing, and quantum physics, in a series of videos. I believe the company when it says the models, called o1-preview and o1-mini in ChatGPT, are capable of doing what it says. Deciphering advanced equations and exploring genomes seem like things they would have no problem doing.
But, as a proud member of my high school’s logic and puzzle club, I wanted to know how it fared in my field, solving and creating puzzles and brain teasers. And so I thought I should ask the super-logical AI for advice on other, more everyday topics. Could it offer me sound relationship advice, tell me what a strange noise in a car meant, and maybe even fill in plot holes in movies?
Logic yes, humor no
The short answer is yes. Both the o1-preview and mini models are really good at solving simple and complex puzzles. I played with both, and the only real difference was the number of reasoning steps and, as a result, the mini's speed. But while they may be slower than GPT-4o, they are very fast at solving those puzzles compared to a human. Notably, you can actually see how the model works toward its answers step by step. I tried it on a couple of my favorites, including one from The Hobbit. The AI's logic made sense, although it was sometimes grammatically incorrect, such as when it explained Mike the butcher's weigh-in.
Okay, so it could solve existing riddles, but could it create a new one? As a test, I asked it to come up with a funny riddle based on an answer I had made up. After 30 seconds of reasoning, it came up with, “What has eight legs, four ears, two tails, and loves to bark?” I won’t keep you in suspense; I suggested “two dogs” as the answer to work from. Several other attempts yielded the same type of question. So, riddle writers are probably safe in their jobs. It’s impressive how well the AI does what it’s supposed to do, but the model doesn’t seem able to make the leap to actual humor.
Useful, but not always creative, tips
I decided to take the AI out of pure logic and see if it could handle more mundane questions of life as well as it handles quantum physics. I started with a mechanical question about what it means to hear a popping noise every 20 seconds while driving a car and how to fix it. The answers were good, with advice on how to check the tires, engine, muffler, and brakes. The solutions mostly consisted of taking the car in for repairs, except for the tires, which it explained how to replace. It’s the “thinking” behind the answers that was interesting. The AI narrates its reasoning in the first person, with lines such as “I’m analyzing various reasons for hearing a popping noise while driving” and “I’m piecing together causes of engine misfires, such as faulty spark plugs or fuel delivery issues, and suggesting diagnostics with a scanner.” It sounded a lot like a real person trying to be logical while thinking out loud.
Finally, I got to thinking about what, to me, was always far more complex than quantum physics: flirting. I asked how to tell when someone is flirting and how to respond. The answer was a pretty solid, if boring, list of behaviors, like whether they ask a lot of questions and what I should be like myself. The behind-the-scenes thinking part was more interesting and genuinely more fun than any of the AI's attempts at puzzle solving. Headings included “Understanding the dynamics of flirting,” “Detecting signals of interest,” and “Recognizing playful intimacy.” They read like a Star Trek android's speech about love.
One part, however, was a little concerning to me. Under “Users’ Policy Description,” the AI wrote, “I’m removing disallowed content, such as non-consensual sexual acts and personal data. Violent content is allowed, harassment with context is okay, and personal opinions are absent.” I suspect it’s more about where the boundaries of the discussion are, since it didn’t suggest “harassment with context” as a flirting tip, but it still caught me off guard.
ChatGPT o1-preview and o1-mini don't have all the features of the more fully-featured models. They can't accept image uploads, parse documents or browse the web. But they are fast and logical, and if you don't believe them, they lay out their reasoning alongside their answers. But while they may be able to solve riddles about car noises, love and the weight of a butcher, I'd say they won't stump anyone if they have to be inventive.