Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Created

Is JAX for Apple Silicon still supported?
Hi. Following https://developer.apple.com/metal/jax/, I checked all active workflows on https://github.com/jax-ml/jax and the open issues tagged Metal. It seems that in December 2025 the JAX maintainers closed all of those issues, citing no active development on jax-metal, and the project appears dead. We need to know how we can leverage Apple silicon for accelerated projects using popular academic libraries and tools. Is the JAX project still going to be supported, or does Apple plan to bring something of its own that might be platform agnostic? Thanks
0
0
116
3w
Assert error breaking previews
A Foundation Models bug I keep running into while testing in Xcode previews. The error never seems to occur or break the app when I test on the simulator or on a device, but I sometimes hit it during longer preview sessions. It crashes the preview, and the warning is labeled: "Assert in LanguageModelFeedback.swift". I keep running into this in a project where I use Foundation Models.
2
0
230
3w
Core Image for depth maps & segmentation masks: numeric fidelity issues when rendering CIImage to CVPixelBuffer (looking for Architecture suggestions)
Hello All,

I’m working on a computer-vision–heavy iOS application that uses the camera, LiDAR depth maps, and semantic segmentation to reason about the environment (object identification, localization and measurement, not just visualization).

Current architecture

I initially built the image pipeline around CIImage as a unifying abstraction. It seemed like a good idea because:
- CIImage integrates cleanly with Vision, ARKit, AVFoundation, Metal, Core Graphics, etc.
- It provides a rich set of out-of-the-box transforms and filters.
- It is immutable and thread-safe, which significantly simplified concurrency in a multi-queue pipeline.

The LiDAR depth maps, semantic segmentation masks, etc. were treated as CIImages, with conversion to CVPixelBuffer or MTLTexture only at the edges when required.

Problem

I’ve run into cases where Core Image transformations do not preserve numeric fidelity for non-visual data. Example: rendering a CIImage-backed segmentation mask into a larger CVPixelBuffer can cause label values to change in predictable but incorrect ways. This occurs even when:
- using nearest-neighbor sampling
- disabling color management (workingColorSpace / outputColorSpace = NSNull)
- applying identity or simple affine transforms

I’ve confirmed via controlled tests that:
- Metal → CVPixelBuffer paths preserve values correctly
- CIImage → CVPixelBuffer paths can introduce value changes when resampling or expanding the render target

This makes CIImage unsafe as a source of numeric truth for segmentation masks and depth-based logic, even though it works well for visualization, and I should have realized this much sooner.

Direction I’m considering

I’m now considering refactoring toward more intent-based abstractions instead of a single image type, for example:
- Visual images: CIImage (camera frames, overlays, debugging, UI)
- Scalar fields: depth / confidence maps backed by CVPixelBuffer + Metal
- Label maps: segmentation masks backed by integer-preserving buffers (no interpolation, no transforms)

In this model, CIImage would still be used extensively, but primarily for visualization and perceptual processing, not as the container for numerically sensitive data.

Thread safety concern

One of the original advantages of CIImage was that it is thread-safe by design, and that was my biggest incentive. For CVPixelBuffer / MTLTexture–backed data, I’m considering enforcing thread safety explicitly via Swift Concurrency (actor-owned data, explicit ownership).

Questions

For those who may have experience with CV / AR / imaging-heavy iOS apps, I was hoping to learn the following:
- Is this separation of image intent (visual vs numeric vs categorical) a reasonable architectural direction?
- Do you generally keep CIImage at the heart of your pipeline, or push it to the edges (visualization only)?
- How do you manage thread safety and ownership when working heavily with CVPixelBuffer and Metal? Using actor-based abstractions, GCD, or ad hoc?
- Are there any best practices or gotchas around using Core Image with depth maps or segmentation masks that I should be aware of?

I’d really appreciate any guidance or experience-based advice. I suspect I’ve hit a boundary of Core Image’s design, and I’m trying to refactor in a way that doesn't involve too much immediate tech debt and remains robust and maintainable long-term. Thank you in advance!
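The "label maps backed by integer-preserving buffers" idea in the post can be sketched in plain Swift. This is only an illustration under assumptions: `LabelMap` and `LabelMapStore` are hypothetical names, the storage is a plain `[UInt8]` rather than a CVPixelBuffer or MTLTexture, and the resize uses integer index math so that no interpolation can occur. It is not a description of how Core Image or Metal behave.

```swift
/// An integer label map (one class ID per pixel), stored row-major.
/// Hypothetical type for illustration; not part of any Apple framework.
struct LabelMap: Sendable, Equatable {
    let width: Int
    let height: Int
    let labels: [UInt8]

    init(width: Int, height: Int, labels: [UInt8]) {
        precondition(width > 0 && height > 0, "dimensions must be positive")
        precondition(labels.count == width * height, "label count must match dimensions")
        self.width = width
        self.height = height
        self.labels = labels
    }

    /// Nearest-neighbor resize using integer index math only, so every
    /// output value is copied verbatim from some input pixel.
    func resized(toWidth newWidth: Int, height newHeight: Int) -> LabelMap {
        precondition(newWidth > 0 && newHeight > 0, "dimensions must be positive")
        var out = [UInt8](repeating: 0, count: newWidth * newHeight)
        for y in 0..<newHeight {
            let srcY = y * height / newHeight   // integer division picks the source row
            for x in 0..<newWidth {
                let srcX = x * width / newWidth // integer division picks the source column
                out[y * newWidth + x] = labels[srcY * width + srcX]
            }
        }
        return LabelMap(width: newWidth, height: newHeight, labels: out)
    }
}

/// Actor that owns the most recent label map. Because LabelMap is a value
/// type, callers on any queue receive their own copy.
actor LabelMapStore {
    private var latest: LabelMap?
    func update(_ map: LabelMap) { latest = map }
    func current() -> LabelMap? { latest }
}
```

Because the output only ever contains values copied from the input, a check such as `Set(resized.labels).isSubset(of: Set(original.labels))` can serve as a cheap regression test for any label-map transform, whichever backing store is eventually used.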
2
0
265
3w
Is it possible to instantiate MLModel strictly from memory (Data) to support custom encryption?
We are trying to implement a custom encryption scheme for our Core ML models. Our goal is to bundle encrypted models, decrypt them into memory at runtime, and instantiate the MLModel without the unencrypted model file ever touching the disk. We have looked into the native Apple encryption described at https://developer.apple.com/documentation/coreml/encrypting-a-model-in-your-app, but it has limitations: it does not work on Intel Macs or without SIP, and it does not work when loading from a dylib. Most of the Core ML APIs seem to require a file path. There are the MLModelAsset APIs, but I think they just write a modelc back to disk when compiling, and I can’t find any information confirming that (I am also concerned that this seems to be an older API, and that it means we need to compile at runtime). I am aware that the native encryption will be much more secure, but we would like not to have the models in readable text on disk. Does anyone know if this is possible, or of any alternatives for obfuscating the Core ML models? Thanks
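For reference, the MLModelAsset route mentioned in the post can be sketched as follows. This is a minimal sketch under assumptions: `decrypt` is a hypothetical closure standing in for the app's own scheme, `validatedSpecification` and `InMemoryModelError` are illustrative helpers, and the decrypted bytes are assumed to be an uncompiled model specification. It does not settle the open question in the post: whether Core ML caches compiled artifacts on disk during loading is not confirmed here.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Hypothetical error type for this sketch.
enum InMemoryModelError: Error {
    case emptySpecification
}

/// Rejects an empty payload before it is handed to Core ML.
func validatedSpecification(_ decrypted: Data) throws -> Data {
    guard !decrypted.isEmpty else { throw InMemoryModelError.emptySpecification }
    return decrypted
}

#if canImport(CoreML)
/// Decrypts in memory and loads the model from the resulting bytes.
/// `decrypt` is an assumption: supply your own implementation.
@available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *)
func loadModel(encrypted: Data,
               decrypt: (Data) throws -> Data) async throws -> MLModel {
    let specification = try validatedSpecification(try decrypt(encrypted))
    // Build the asset from in-memory bytes; no file URL is passed by the caller.
    let asset = try MLModelAsset(specification: specification)
    return try await MLModel.load(asset: asset,
                                  configuration: MLModelConfiguration())
}
#endif
```

The caller never writes the plaintext to disk in this sketch, but that says nothing about what the framework does internally, so it would be worth monitoring file system activity during the load before relying on it.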
0
1
416
4w
CoreML GPU NaN bug with fused QKV attention on macOS Tahoe
Problem: CoreML produces NaN on GPU (works fine on CPU) when running transformer attention with fused QKV projection on macOS 26.2.

Root cause: The common::fuse_transpose_matmul optimization pass triggers a Metal kernel bug when sliced tensors feed into matmul(transpose_y=True).

Workaround:

    import coremltools as ct

    # Start from the default pipeline and drop the pass that triggers the bug.
    pipeline = ct.PassPipeline.DEFAULT
    pipeline.remove_passes(['common::fuse_transpose_matmul'])
    mlmodel = ct.convert(model, ..., pass_pipeline=pipeline)

Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py

Affected: Any ViT/Swin/transformer with fused QKV attention (BiRefNet, etc.)

Has anyone else hit this? Filed FB report too.
0
0
330
4w
Tone, Sentiment, language analysis on iPhone - Ideas
Hi everyone,

I’m exploring ideas around on-device analysis of user typing behavior on iPhone, and I’d love input from others who’ve worked in this area or thought about similar problems. Conceptually, I’m interested in things like:
- High-level sentiment or tone inferred from what a user types over time, using ML models
- Identifying a user’s most important or frequent topics over a recent window (e.g., “last week”)
- Aggregated insights rather than raw text (privacy-preserving summaries, e.g., your typo rate by hour to infer highly efficient time slots, or a "take-a-break" warning when typing errors increase)

I understand the significant privacy restrictions around keyboard input on iOS, especially for third-party keyboards and system text fields. I’m not trying to bypass those constraints; rather, I’m curious about what’s realistically possible within Apple’s frameworks and policies. (For instance, Grammarly as a correction tool includes some information about tone.)

Questions I’m thinking through:
- Are there any recommended approaches for on-device text analysis that don’t rely on capturing raw keystrokes?
- Has anyone used NLP / Core ML / Natural Language successfully for similar summarization or sentiment tasks, scoped only to user-explicit input?
- For custom keyboards, what kinds of derived or transient signals (if any) are acceptable to process and summarize locally?
- Any design patterns that balance usefulness with Apple’s privacy expectations?

If you’ve built something adjacent (journaling, writing analytics, well-being apps, etc.), I’d appreciate hearing what worked, what didn’t, and what Apple reviewers were comfortable with. Thanks in advance for any ideas or references 🙏
1
1
582
Feb ’26
Shortcut - “Use Model” error handling?
I have written a series of shortcuts that use the “Use Model” action to do various things. For example, I have a shortcut “Clipboard Markdown to Notes” that takes the content of the clipboard, creates a new note in Notes, converts the markdown content to rich text, adds it to the note, etc. One key step is to analyze the markdown content with “Use Model” and generate a short descriptive title for the note. I use the on-device model for this, but sometimes the content and prompt exceed the context window size and the action fails with an error message to that effect. In that case, I’d like to either repeat the action using the Cloud model, or, if the error was a refusal, to prompt the user to enter a title to use. I’ve tried using an IF based on whether the response had any text in it, but that didn’t work. No matter what I’ve tried, I can’t seem to find a way to catch the error from Use Model, determine what the error was, and take appropriate action. Is there a way to do this? (And by the way, a huge “thank you” to whoever had the idea of making AppIntents visible in Shortcuts and adding the Use Model action. It has made a huge difference already, and it lets us see what Siri will be able to use as well.)
3
0
494
Jan ’26
FoundationModels coding
I am writing an app that parses text and conducts some actions. I don't want to give too much away ;) However, I am having a huge problem with token sizes. LanguageModelSession will of course give me the on-device model's 4096-token context window, but when I go over 4096, my code doesn't seem to fall back to PCC, or even to the system-configured ChatGPT. Can anyone assist me with this? Even after reading the docs, it's very unclear how the transition between the three takes place.
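Independently of how any fallback works, one way to stay inside the 4096-token window mentioned above is to split long input before prompting. The sketch below is an illustration only: `estimatedTokenCount` and `chunk` are hypothetical helpers, and the four-characters-per-token ratio is an assumption, not a documented property of the on-device model. It does not answer the fallback question itself.

```swift
import Foundation

/// Rough token estimate. The default ratio of 4 characters per token is an
/// assumption for illustration, not a documented tokenizer property.
func estimatedTokenCount(_ text: String, charactersPerToken: Int = 4) -> Int {
    (text.count + charactersPerToken - 1) / charactersPerToken
}

/// Greedily packs paragraphs into chunks whose estimated size stays within
/// `budget`. A single paragraph larger than the budget becomes its own
/// (oversized) chunk rather than being split mid-paragraph.
func chunk(_ text: String, budget: Int) -> [String] {
    var chunks: [String] = []
    var current = ""
    for paragraph in text.components(separatedBy: "\n\n") where !paragraph.isEmpty {
        let candidate = current.isEmpty ? paragraph : current + "\n\n" + paragraph
        if current.isEmpty || estimatedTokenCount(candidate) <= budget {
            current = candidate          // still fits, keep accumulating
        } else {
            chunks.append(current)       // close the current chunk
            current = paragraph          // start a new one
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```

In practice the budget would need to sit well below 4096 so that instructions and the model's response also fit; each chunk could then be processed in its own session and the partial results combined.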
3
0
747
Jan ’26
ML constraints & timeout clarifications for Message Filtering Extension
Hello everyone,

I’m currently working with the Message Filtering Extension and would really appreciate some clarification around its performance and operational constraints. While the extension is extremely powerful and useful, I’ve found that some important details are either unclear or not well covered in the available documentation. There are two main areas I’m trying to understand better:

Machine learning model constraints within the extension

In our case, we already have an existing ML model that classifies messages (and are not dependent on Apple's built-in models). We’re evaluating whether and how it can be used inside the extension. Specifically, I’m trying to understand:
- Are there documented limits on the size of an ML model (e.g., maximum bundle size or model file size in MB)?
- What are the memory constraints for a model once loaded into memory by the extension?
- Under what conditions would the system terminate or “kick out” the extension due to memory or performance pressure?

Message processing timeouts and execution constraints
- What is the timeout for processing a single received message?
- At what point will the OS stop waiting for the extension’s response and allow the message by default (for example, if the extension does not respond in time)?

Any guidance, official references, or practical experience from Apple engineers or other developers would be greatly appreciated. Thanks in advance for your help,
0
0
227
Jan ’26
Create ML fails to train a text classifier using the BERT transfer learning algorithm
I'm trying to train a text classifier model in Create ML. The Create ML app/framework offers five algorithms. I can successfully train the model with all of the algorithms except the BERT transfer learning option. When I select this algorithm, Create ML simply stops the training process immediately after the initial feature extraction phase (with no reported error).

What I've tried:
- I tried simplifying the dataset to just a few classes and short examples in case there was a problem with the data.
- I tried experimenting with the number of iterations and language/script options.
- I checked Console.app for logged errors and found the following for the Create ML app:

    error 10:38:28.385778+0000 Create ML Couldn't read event column - category is invalid. Format string is : <private>
    error 10:38:30.902724+0000 Create ML Could not encode the entity <private>. Error: <private>

I'm not sure if these errors are normal or indicative of a problem. I don't know what it means by the "event" column – I don't have an event column in my data and I don't believe there should be one. These errors are not reported when using the other algorithms.

Given that I couldn't get the app to work with BERT, I switched over to the CreateML framework and followed the code samples given in the documentation. (By the way, there's an error in the docs: the line let (trainingData, testingData) = data.stratifiedSplit(on: "text", by: 0.8) should be stratifying on "label", not on "text"). The main chunk of code looks like this:

    var parameters = MLTextClassifier.ModelParameters(
        validation: .split(strategy: .automatic),
        algorithm: .transferLearning(.bertEmbedding, revision: 1),
        language: .english
    )
    parameters.maxIterations = 100

    let sentimentClassifier = try MLTextClassifier(
        trainingData: trainingData,
        textColumn: "text",
        labelColumn: "label",
        parameters: parameters
    )

Ultimately I want to train a single multilingual model, and I believe that BERT is the best choice for this.
The problem is that there doesn't seem to be a way to choose the multilingual Latin script option in the API. In the Create ML app you can theoretically do this by selecting the Latin script with language set to "Automatic", as recommended in this WWDC video (relevant section starts at around 8:02). But, as far as I can tell, ModelParameters only lets you pick a specific language. I presume the framework must provide some way to do this, since the Create ML app uses the framework under the hood, but I can't see a way to do it. Another possibility is that the Create ML app might be misrepresenting the framework – perhaps selecting a specific language in the app doesn't actually make any difference – for example, maybe all Latin languages actually use the same model under the hood and the language selector is just there to guide people to the right choice (but this is just my speculation). Any help would be much appreciated! If possible, I'd prefer to use the Create ML app if I can get the BERT option to work – is this actually working for anyone? Or failing that, I want to use the framework to train a multilingual Latin model with BERT, so I'm looking for instructions on how to choose that specific option or confirmation that I can just choose .english to get the correct Latin multilingual model. I'm running Xcode 26.2 on Tahoe 21.1 on an M1 Pro MacBook Pro. I have version 6.2 of the Create ML app.
8
0
1.6k
Jan ’26
Translation Framework: Code 16 "Offline models not available" despite status showing .installed
Hi everyone,

I'm experiencing inconsistent behavior with the Translation framework on iOS 18. The LanguageAvailability.status() API reports language models as .installed, but translation fails with Code 16.

Setup:
- Using translationTask modifier with TranslationSession
- Batch translation with explicit source/target languages
- Languages: Portuguese→English, German→English

Issue:

    let status = await LanguageAvailability().status(from: sourceLang, to: targetLang)
    // Returns: .installed

    // But translation fails:
    let responses = try await session.translations(from: requests)
    // Error: TranslationErrorDomain Code=16 "Offline models not available"

Logs:

    Language model installed: pt -> en
    Language model installed: de -> en
    Starting translation: de -> en
    Error Domain=TranslationErrorDomain Code=16 "Translation failed"NSLocalizedFailureReason=Offline models not available for language pair

What I've tried:
- Re-downloading languages in Settings
- Using source: nil for auto-detection
- Fresh TranslationSession.Configuration each time

Questions:
- Is there a way to force model re-validation/re-download programmatically?
- Should translationTask show the download popup when Code 16 occurs?
- Has anyone found a reliable workaround?

I've seen similar reports in threads 791357 and 777113. Any guidance appreciated! Thanks!
1
0
422
Jan ’26
Image object detection with video sizing issue
I'm working on my first model that detects bowling score screens, and I have it working with pictures no problem. But when it comes to video, I have a sizing issue. I added my model to a small app I wrote for taking a picture of a bowling scoring screen, where my model frames the screens in the video feed from the camera. My model works, but my boxes are about 2/3 the size of the screens being detected. I don't understand the theory of the video stream the camera is feeding me. What I mean is that I don't want to tweak the size of my rectangles by making them larger, and I'm not sure if the video feed is larger than what I'm detecting in code. One question I have is whether the video feed is a certain resolution, like 1920x something, or a much higher resolution in the 12 megapixel range. On a static image of, say, 1920x something, my alignment is perfect. AI says that it's my model training: that I'm training on square images but video is 16:9, or that I'm producing 4:3 images in a 16:9 environment. I'm missing something here but not sure what it is. I already wrote code to force it to fit, but reverted back to trying for a natural fit.
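One place where box sizes commonly drift is the mapping from detection coordinates to on-screen coordinates. Vision reports bounding boxes in normalized coordinates with a lower-left origin, while a preview is often shown aspect-fill in a view with a different aspect ratio. The sketch below shows that mapping under stated assumptions: `viewRect(forNormalized:imageSize:viewSize:)` is a hypothetical helper, the frame is assumed to be displayed aspect-fill and centered, and rotation between the camera buffer and the UI is not handled. Whether this is the cause in the post is not established; how the frame is cropped or scaled before it reaches the model is another place to check.

```swift
import Foundation

/// Maps a normalized, lower-left-origin rectangle into view coordinates
/// (top-left origin) for an image displayed aspect-fill and centered.
/// Hypothetical helper; orientation changes are not handled.
func viewRect(forNormalized normalized: CGRect,
              imageSize: CGSize,
              viewSize: CGSize) -> CGRect {
    // Aspect-fill uses the larger of the two scale factors, so the image
    // covers the view and overflows on one axis.
    let scale = max(viewSize.width / imageSize.width,
                    viewSize.height / imageSize.height)
    let shownWidth = imageSize.width * scale
    let shownHeight = imageSize.height * scale
    // Centering makes the overflow offset zero or negative.
    let offsetX = (viewSize.width - shownWidth) / 2
    let offsetY = (viewSize.height - shownHeight) / 2
    // Flip the y axis: lower-left origin to top-left origin.
    let flippedY = 1 - normalized.origin.y - normalized.size.height
    return CGRect(x: offsetX + normalized.origin.x * shownWidth,
                  y: offsetY + flippedY * shownHeight,
                  width: normalized.size.width * shownWidth,
                  height: normalized.size.height * shownHeight)
}
```

Scaling the normalized box by the view size instead of the displayed image size is an easy mistake to make here, and it produces boxes that are consistently too small or too large along one axis.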
1
0
343
Jan ’26
Foundation Models: Is the .anyOf guide guaranteed to produce a valid string?
I've created the following Foundation Models Tool, which uses the .anyOf guide to constrain the LLM's generation of suitable input arguments. When calling the tool, the model is only allowed to request one of a fixed set of sections, as defined in the sections array.

    struct SectionReader: Tool {
        let article: Article
        let sections: [String]

        let name: String = "readSection"
        let description: String = "Read a specific section from the article."

        var parameters: GenerationSchema {
            GenerationSchema(
                type: GeneratedContent.self,
                properties: [
                    GenerationSchema.Property(
                        name: "section",
                        description: "The article section to access.",
                        type: String.self,
                        guides: [.anyOf(sections)]
                    )
                ]
            )
        }

        func call(arguments: GeneratedContent) async throws -> String {
            let requestedSectionName = try arguments.value(String.self, forProperty: "section")
            ...
        }
    }

However, I have found that the model will sometimes call the tool with invalid (but plausible) section names, meaning that .anyOf is not actually doing its job (i.e. requestedSectionName is sometimes not a member of sections). The documentation for the .anyOf guide says, "Enforces that the string be one of the provided values." Is this a bug or have I made a mistake somewhere? Many thanks for any help you provide!
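Whatever the cause turns out to be, a defensive check inside call(arguments:) keeps an unexpected section name from propagating. The sketch below is plain Swift and makes no claim about how .anyOf behaves: `SectionLookup`, `normalize`, and `resolveSection` are hypothetical names, and matching after normalization is just one possible recovery policy.

```swift
/// Result of checking a requested section name against the allowed list.
/// Hypothetical type for illustration.
enum SectionLookup: Equatable {
    case exact(String)              // name is a member of the list as-is
    case corrected(String)          // name matched after normalization
    case unknown(valid: [String])   // no match; carries the allowed names
}

/// Lowercases and strips everything except letters and digits.
func normalize(_ name: String) -> String {
    name.lowercased().filter { $0.isLetter || $0.isNumber }
}

/// Resolves a requested name to an allowed section, if possible.
func resolveSection(_ requested: String, in sections: [String]) -> SectionLookup {
    if sections.contains(requested) { return .exact(requested) }
    let key = normalize(requested)
    if let match = sections.first(where: { normalize($0) == key }) {
        return .corrected(match)
    }
    return .unknown(valid: sections)
}
```

In the `.unknown` case, the tool could return a string that lists the valid section names rather than throwing, which gives the model a chance to retry with a correct argument.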
11
0
805
Jan ’26
Image Playground files suddenly not available
My app lets you create images with Image Playground. When the user approves an image I move it to the documents dir from the temp storage. With over a year of usage I’ve created a lot of images over time. Out of nowhere the app stopped loading my custom creations from Image Playground saying it couldn’t find the files. It still had my VoiceOver strings I had added for each image and still had the custom categories I assigned them. Debug code to look in the docs dir doesn’t find them. I downloaded the app’s container and only see the images I created as a test after the problem started. But my ~70MB app is still taking up 300MB on my iPhone so it feels like they’re there but not accessible. Is there anything else I can try?
2
0
880
Jan ’26
Image understanding to on-device model
I can’t seem to find a way to include an image when prompting the new on-device model in Xcode, even though Apple explicitly states that the model was trained and tested with image data (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates). Has anyone managed to get this working, or are VLM-style capabilities simply not exposed yet?
1
0
433
Jan ’26
Khmer Script Misidentified as Thai in Vision Framework
It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications.

There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai. While both scripts share historical roots, they are distinct languages with different alphabets. Currently, the model’s confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple’s native ecosystem (Preview, Live Text, and Photos).

    import SwiftUI
    import Vision

    class TextExtractor {
        func extractText(from data: Data, completion: @escaping (String) -> Void) {
            let request = VNRecognizeTextRequest { (request, error) in
                guard let observations = request.results as? [VNRecognizedTextObservation] else {
                    completion("No text found.")
                    return
                }
                let recognizedStrings = observations.compactMap { observation -> String? in
                    // Skip observations without a candidate instead of force-unwrapping.
                    guard let str = observation.topCandidates(1).first?.string else { return nil }
                    return "{text: \(str), confidence: \(observation.confidence)}"
                }
                completion(recognizedStrings.joined(separator: "\n"))
            }
            request.automaticallyDetectsLanguage = true // <-- This is the issue.
            request.recognitionLevel = .accurate
            let handler = VNImageRequestHandler(data: data, options: [:])
            DispatchQueue.global(qos: .background).async {
                do {
                    try handler.perform([request])
                } catch {
                    completion("Failed to perform OCR: \(error.localizedDescription)")
                }
            }
        }
    }

Recognizing Khmer: Confidence score is low for Khmer text. (The output is in the Thai language with a low confidence score.)
Recognizing English: Confidence score is high, as expected.
Recognizing Thai: Confidence score is high, as expected.

Issues on Preview, Photos

Khmer text

Copied text

Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย" สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม สงสอบานตรเราะากกต่ายภากายระตารุกเตียน

Recommended Solutions

1. Set a Threshold

Filter out detected results whose confidence is less than or equal to 0.5, so that low-quality text, which can lead to this issue, is not output. For example:

    let recognizedStrings = observations.compactMap { observation -> String? in
        if observation.confidence <= 0.5 { return nil }
        guard let str = observation.topCandidates(1).first?.string else { return nil }
        return "{text: \(str), confidence: \(observation.confidence)}"
    }

2. Add Khmer Language Support

This issue would never happen if the model had the capability to detect and recognize images containing Khmer text.

Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
2
0
968
Jan ’26