Vision intelligence can understand what it sees, contextualize that information, make decisions based on it, and change or alter the appearance of what is there. Credit: eXeX

From its DarwinAI acquisition to recent reports claiming Apple might work with Google and others to support a wider array of generative AI (genAI) tools than it plans to introduce itself, it's pretty clear the company has chosen to be selective about where it creates its own AI technologies. At least one of those focus areas reflects work the company has been doing since before AI became a buzzword, and that's vision intelligence.

Intimations of life

By this, I specifically mean AI that can understand what it sees, contextualize that information, make decisions based on it, change or alter the view, and so on. You might already be making use of this kind of AI:

- Each time you photograph a document and Apple lets you copy the text to paste into another document.
- When your iPhone can tell you where the doors of a building are.
- When you tap the info (i) button in Photos to see descriptions of what is visible.
- When your iPhone tells you the meaning of a laundry care label you point it at.
- When you use Translate to decipher text on signs around you.
- When the LiDAR sensor provides you with a room map.

There are many other examples, and there may be even better illustrations of the direction of travel.

Electron blues

Apple's researchers recently published a white paper that has generated consternation and comment since its release. It describes a technology called MM1, a multimodal model for text and image data. That means it can train large language models (LLMs) using both text and images, and it is being called a "significant advance" for AI. Models built with the technique performed excellently at tasks such as image captioning, visual question answering, and natural language inference. The system also showed strong in-context learning capabilities. In other words, it learns fast from being exposed to text and images together, which means the tech could eventually handle really complex, open-ended problems. The latter is a holy grail for AI research, because achieving it means machines capable of solving problems in a highly contextual way.

That's all good, but what's important here is the use of images. This is not the first time in recent months Apple has harnessed machine vision intelligence this way. Toward the end of 2023, its Keyframer animation tool shipped, and even earlier in 2023 we heard that part of what the company intended to build was AI capable of creating realistic immersive scenes for use in Vision Pro.

Automated for the people

And the latter product is, of course, the space in which so much of Apple's vision for generative visual AI may make the biggest difference, because the implications are profound. Think how it makes it possible for one person wearing a Vision Pro to enter an environment, any environment, and while exploring that space build a perfect digital replica of the place that can also be shared with others. The thing is, this wouldn't just be a dumb representation of the place. Armed with vision intelligence, the resulting shared experience wouldn't merely look like the space you were exploring, with a few parameter tweaks to correct any errors; it would effectively be a fully functioning digital representation of that space.
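Apple already ships a primitive version of this capture-and-reconstruct pipeline on iPhone: the RoomPlan framework, which uses the LiDAR sensor to turn a walk-through of a room into a structured, exportable 3D model. As a minimal sketch of the pattern (the RoomScanner class and output file name are my own illustrative choices, and this is iPhone-class scanning, not whatever Vision Pro does internally), the flow looks roughly like this:

```swift
import Foundation
import RoomPlan  // Apple's LiDAR room-scanning framework (iOS 16+, LiDAR devices only)

// A minimal sketch: scan a room, reconstruct a structured model, export USDZ.
// "RoomScanner" and the output file name are illustrative, not Apple API names.
final class RoomScanner: NSObject, RoomCaptureSessionDelegate {
    private let session = RoomCaptureSession()
    private let builder = RoomBuilder(options: [.beautifyObjects])

    func startScanning() {
        session.delegate = self
        session.run(configuration: RoomCaptureSession.Configuration())
    }

    func stopScanning() {
        session.stop()
    }

    // Delegate callback: the session hands back raw captured data when scanning ends.
    func captureSession(_ session: RoomCaptureSession,
                        didEndWith data: CapturedRoomData,
                        error: Error?) {
        guard error == nil else { return }
        Task {
            do {
                // Reconstruct a semantic model: walls, doors, windows,
                // and recognized objects, each with real-world dimensions.
                let room = try await builder.capturedRoom(from: data)
                // Export a portable USDZ file that could be shared with
                // others or dropped into a visionOS scene.
                let url = FileManager.default.temporaryDirectory
                    .appendingPathComponent("room-replica.usdz")
                try room.export(to: url)
                print("Digital replica written to \(url.path)")
            } catch {
                print("Reconstruction failed: \(error)")
            }
        }
    }
}
```

The point is the shape of the pipeline: scan, reconstruct into a semantic model, then export something shareable. A vision-intelligent headset would presumably do the same thing at far higher fidelity.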
This is useful in all kinds of situations, from traffic management to building and facilities management, but the capacity to build true-to-life, intelligent representations of spaces also extends to architecture and design. And, of course, there are evident implications for health. None of these ideas may work out quite the way I'm articulating them, though I'm certain Vision Pro's place in building digital twins for multiple industries is set in stone.

Everybody hurts

But the combination of new, highly visual operating systems (visionOS) with a highly visual AI capable of deep contextual understanding and response isn't just tech catching up with the famed Tom Cruise movie, Minority Report. It is a deployment happening in real time, one moving beyond the visions of the futurologists who advised on that movie. No wonder the entire industry now wants to move in Apple's direction; it's got to hurt to see the company get there fastest. But everybody hurts, sometimes.

Please follow me on Mastodon, or join me in the AppleHolic's bar & grill and Apple Discussions groups on MeWe.