At WWDC25, Metal 4 introduced some exciting new features for machine learning optimization. As we all know, PyTorch's Metal Performance Shaders (MPS) backend is one of the most important tools for machine learning on the Mac, but on the MPS introduction pages there is no mention of any support for Metal 4.
Hey Devs,
I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images
I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project.
This is my delegate code with the camera setup. I removed the region of interest for debugging, but I'm trying to scan English words in books. The idea is to eventually get one word in the ROI, but I can't even get proper words yet, so I'm testing without the ROI in case my math is wrong.
@Observable
class CameraManager: NSObject, AVCapturePhotoCaptureDelegate {
    ...
    override init() {
        super.init()
        setUpVisionRequest()
    }

    private func setUpVisionRequest() {
        textRequest = RecognizeTextRequest(.revision3)
    }
    ...
    func setup() -> Bool {
        captureSession.beginConfiguration()

        guard
            let captureDevice = AVCaptureDevice.default(
                .builtInWideAngleCamera, for: .video, position: .back)
        else {
            return false
        }
        self.captureDevice = captureDevice

        guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice)
        else {
            return false
        }

        /// Check whether the session can add input.
        guard captureSession.canAddInput(deviceInput) else {
            print("Unable to add device input to the capture session.")
            return false
        }

        /// Add the input and output to the session.
        captureSession.addInput(deviceInput)

        /// Configure the video data output.
        videoDataOutput.setSampleBufferDelegate(
            self, queue: videoDataOutputQueue)

        if captureSession.canAddOutput(videoDataOutput) {
            captureSession.addOutput(videoDataOutput)
            videoDataOutput.connection(with: .video)?
                .preferredVideoStabilizationMode = .off
        } else {
            return false
        }

        // Set zoom and autofocus to help focus on very small text.
        do {
            try captureDevice.lockForConfiguration()
            captureDevice.videoZoomFactor = 2
            captureDevice.autoFocusRangeRestriction = .near
            captureDevice.unlockForConfiguration()
        } catch {
            print("Could not set zoom level due to error: \(error)")
            return false
        }

        captureSession.commitConfiguration()

        // Potential issue with background priority vs. DispatchQueue??
        Task(priority: .background) {
            captureSession.startRunning()
        }
        return true
    }
}

// Issue here???
extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(
        _ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
        from connection: AVCaptureConnection
    ) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        Task {
            textRequest.recognitionLevel = .fast
            textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")]
            do {
                let observations = try await textRequest.perform(on: pixelBuffer)
                for observation in observations {
                    let recognizedText = observation.topCandidates(1).first
                    print("recognized text \(recognizedText)")
                }
            } catch {
                print("Recognition error: \(error.localizedDescription)")
            }
        }
    }
}
The results I get look like this (a full page of English text from a book):
recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))
Even with the ROI set to a specific rectangle normalized to Vision coordinates, I get the same results, with single characters coming back as gibberish.
Any help would be amazing thank you.
Am I using the buffer right ?
Am I using the new perform(on: CVPixelBuffer) right ?
Maybe I didn't set up the camera properly? I can provide more code if needed.
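One thing worth ruling out (an assumption on my part, not a confirmed fix): frames from the back camera arrive rotated relative to a portrait UI unless you tell Vision their orientation, and recognizing text in sideways frames typically produces exactly this kind of low-confidence gibberish. As a sanity check you can run the same pixel buffer through the older VNImageRequestHandler, which takes an explicit orientation, and see whether real words come back:

import Vision

/// Debug helper: recognize text in a camera frame with the classic Vision API,
/// passing an explicit orientation for a portrait UI using the back camera.
func debugRecognizeText(in pixelBuffer: CVPixelBuffer) {
    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            if let candidate = observation.topCandidates(1).first {
                print("debug recognized: \(candidate.string) (\(candidate.confidence))")
            }
        }
    }
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["en-US"]

    // .right is the usual mapping for a portrait UI with the back camera; adjust if your UI rotates.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
    try? handler.perform([request])
}

If the rotated handler returns sensible words, the problem is orientation rather than the new RecognizeTextRequest API, and the fix is to pass the equivalent orientation (or rotate the buffer) on the new code path as well.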
Hi, I recently tried to fine-tune the Gemma-2-2b MLX model on my MacBook (24 GB unified memory). The code started running; after a few seconds the swap size reached 50 GB and RAM was around 23 GB, and then it stopped. I ran Gemma-2-2b (CUDA) on Colab, where it occupied 27 GB on an A100 GPU and worked fine, with no swap issue.
My question: if my unified memory had been more than 27 GB, would I also have avoided the swap issue?
Thanks.
I have used mlx_lm.lora to fine-tune a mistral-7b-v0.3-4bit model on my data. I fused the Mistral model with my adapters and uploaded the fused model to my directory on Hugging Face. I was able to use mlx_lm.generate with the fused model in Terminal. However, I don't know how to load the model in Swift. I've used:
Imports
import SwiftUI
import MLX
import MLXLMCommon
import MLXLLM

let modelFactory = LLMModelFactory.shared
let configuration = ModelConfiguration(
    id: "pharmpk/pk-mistral-7b-v0.3-4bit"
)

// Load the model off the main actor, then assign on the main actor
let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in
    print("Downloading progress: \(progress.fractionCompleted * 100)%")
}
await MainActor.run {
    self.model = loaded
}
I'm getting an error
runModel error: downloadError("A server with the specified hostname could not be found.")
Any suggestions?
Thanks, David
PS, I can load the model from the app bundle
// directory: Bundle.main.resourceURL!
but it's too big to upload for TestFlight.
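Two things worth checking, offered as guesses rather than a confirmed diagnosis. First, if this is a sandboxed macOS build, the "A server with the specified hostname could not be found" error is the classic symptom of a missing com.apple.security.network.client (outgoing connections) entitlement. Second, since ModelConfiguration(directory:) already works for you from the bundle, you can sidestep both the download error and the TestFlight size limit by fetching the weights yourself on first launch into Application Support and pointing the configuration at that folder. Rough sketch; downloadModelFiles(to:) is a hypothetical helper you'd implement (for example with URLSession against your Hugging Face repo's file URLs):

let supportDir = try FileManager.default.url(
    for: .applicationSupportDirectory, in: .userDomainMask,
    appropriateFor: nil, create: true)
let modelDir = supportDir.appendingPathComponent("pk-mistral-7b-v0.3-4bit", isDirectory: true)

if !FileManager.default.fileExists(atPath: modelDir.path) {
    // Hypothetical helper: fetch config.json, the tokenizer files, and the
    // safetensors shards from your Hugging Face repo into modelDir.
    try await downloadModelFiles(to: modelDir)
}

let configuration = ModelConfiguration(directory: modelDir)
let loaded = try await LLMModelFactory.shared.loadContainer(configuration: configuration)
await MainActor.run { self.model = loaded }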
Hi
We're on TensorFlow 2.20, which (finally!) supports Python 3.13. tensorflow-metal still only supports TF 2.18, which is over a year old.
When can we expect tensorflow-metal support for TF 2.20 (or later)?
I bought a Mac thinking I would get great performance from the M-series processors, but here I am using my CPU for my ML projects.
If it's taking so long to release, why not open-source it so the community can keep it up to date?
Cheers,
Matt
Hello,
I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental.
After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error
JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22
-:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1
My issue is identical to the one reported here: https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1, jax 0.5.0, and jaxlib 0.5.0.
Thank you!
Hi team,
I’m exploring the Model Context Protocol (MCP), which is used to connect LLMs/AI agents to external tools in a structured way. It's becoming a common standard for automation and agent workflows.
Before I go deeper, I want to confirm:
Does Apple currently provide any official MCP server, API surface, or SDK on iOS/macOS?
From what I see, only third-party MCP servers exist for iOS simulators/devices, and Apple’s own frameworks (Foundation Models, Apple Intelligence) don’t expose MCP endpoints.
Is there any chance Apple might introduce MCP support—or publish recommended patterns for safely integrating MCP inside apps or developer tools?
I would like to see if I can share my app's data with an MCP server so that other third-party apps/services can integrate easily.
Hi everyone,
I'm experiencing inconsistent behavior with the Translation framework on iOS 18: the LanguageAvailability.status() API reports the language models as .installed, but translation fails with Code 16.
Setup:
Using translationTask modifier with TranslationSession
Batch translation with explicit source/target languages
Languages: Portuguese→English, German→English
Issue:
let status = await LanguageAvailability().status(from: sourceLang, to: targetLang) // Returns: .installed
// But translation fails:
let responses = try await session.translations(from: requests)
// Error: TranslationErrorDomain Code=16 "Offline models not available"
Logs:
Language model installed: pt -> en
Language model installed: de -> en
Starting translation: de -> en
Error Domain=TranslationErrorDomain Code=16 "Translation failed"NSLocalizedFailureReason=Offline models not available for language pair
What I've tried:
Re-downloading languages in Settings
Using source: nil for auto-detection
Fresh TranslationSession.Configuration each time
Questions:
Is there a way to force model re-validation/re-download programmatically?
Should translationTask show download popup when Code 16 occurs?
Has anyone found a reliable workaround?
I've seen similar reports in threads 791357 and 777113. Any guidance appreciated!
Thanks!
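One thing that may be worth trying (a guess based on similar Code 16 reports, not documented guidance): call prepareTranslation() on the session inside the translationTask closure before performing the batch. As I understand it, that asks the session to validate the assets for the configured pair and presents the system download sheet if they're actually missing, instead of failing with "Offline models not available":

.translationTask(configuration) { session in
    do {
        // Make sure the language assets really are present; this can present
        // the system download UI if they are not.
        try await session.prepareTranslation()

        let responses = try await session.translations(from: requests)
        handle(responses)   // your existing response handling
    } catch {
        print("Translation failed: \(error)")
    }
}

Here configuration, requests, and handle(_:) stand in for whatever you already have in your setup.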
Hi everyone,
I’m exploring ideas around on-device analysis of user typing behavior on iPhone, and I’d love input from others who’ve worked in this area or thought about similar problems.
Conceptually, I’m interested in things like:
High-level sentiment or tone inferred from what a user types over time, using on-device ML models
Identifying a user’s most important or frequent topics over a recent window (e.g., “last week”)
Aggregated insights rather than raw text (privacy-preserving summaries, e.g. your typo rate by hour to infer highly efficient time slots, or a "take-a-break" warning when typing errors increase)
I understand the significant privacy restrictions around keyboard input on iOS, especially for third-party keyboards and system text fields. I’m not trying to bypass those constraints—rather, I’m curious about what’s realistically possible within Apple’s frameworks and policies. (For instance, Grammarly as a correction tool includes some information about tone)
Questions I’m thinking through:
Are there any recommended approaches for on-device text analysis that don’t rely on capturing raw keystrokes?
Has anyone used NLP / Core ML / Natural Language successfully for similar summarization or sentiment tasks, scoped only to user-explicit input?
For custom keyboards, what kinds of derived or transient signals (if any) are acceptable to process and summarize locally?
Any design patterns that balance usefulness with Apple’s privacy expectations?
If you’ve built something adjacent—journaling, writing analytics, well-being apps, etc.—I’d appreciate hearing what worked, what didn’t, and what Apple reviewers were comfortable with.
Thanks in advance for any ideas or references 🙏
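For the sentiment/tone piece specifically, the Natural Language framework already does on-device sentiment scoring, so as long as you only analyze text the user explicitly hands to your app (a journal entry, a note), no keystroke capture is involved at all. A minimal sketch; the daily-average aggregation is just an illustration of keeping summaries rather than raw text:

import NaturalLanguage

/// Returns a sentiment score in -1.0...1.0 for a piece of user-provided text.
func sentimentScore(for text: String) -> Double {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
    return Double(tag?.rawValue ?? "0") ?? 0
}

// Keep only an aggregated daily average, never the underlying text.
let entries = ["Had a great day at work", "Feeling a bit tired tonight"]
let dailyAverage = entries.map(sentimentScore(for:)).reduce(0, +) / Double(entries.count)
print("Average sentiment today: \(dailyAverage)")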
I’m trying to group my EntityPropertyQuery selection into sections as well as making it searchable.
I know that the EntityStringQuery is used to perform the text search via entities(matching string: String). That works well enough and results in this modal:
Though, when I’m using a DynamicOptionsProvider to section my EntityPropertyQuery, it doesn’t allow for searching anymore and simply opens the sectioned list in a menu like so:
How can I combine both? I've seen it in other apps, but I can't figure out why my code doesn't allow the results to be sectioned and searchable at the same time. Any ideas?
My code (simplified)
struct MyIntent: AppIntent {
    @Parameter(title: "Meter", optionsProvider: MyOptionsProvider())
    var meter: MyIntentEntity?
    // …
}

struct MyOptionsProvider: DynamicOptionsProvider {
    func results() async throws -> ItemCollection<MyIntentEntity> {
        // Get all data
        let allData = try IntentsDataHandler.shared.getEntities()
        // Create arrays for sections
        let fooEntities = allData.filter { $0.type == .foo }
        let barEntities = allData.filter { $0.type == .bar }
        return ItemCollection(sections: [
            ItemSection("Foo", items: fooEntities),
            ItemSection("Bar", items: barEntities)
        ])
    }
}
struct MeterIntentQuery: EntityStringQuery {
    // entities(for identifiers: [UUID]) and suggestedEntities() functions
    func entities(matching string: String) async throws -> [MyIntentEntity] {
        // Fetch all data
        let allData = try IntentsDataHandler.shared.getEntities()
        // Filter data by string
        let matchingData = allData.filter { data in
            return data.title.localizedCaseInsensitiveContains(string)
        }
        return matchingData
    }
}
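I haven't verified this against your exact setup, so treat it as a guess: one pattern that may give you both is to drop the DynamicOptionsProvider and return the sectioned ItemCollection from the query's suggestedEntities() instead, while keeping entities(matching:) for the search. My understanding is that the parameter picker then shows the sections from suggestedEntities() and still offers the search field because the query conforms to EntityStringQuery. Sketch under that assumption:

struct MeterIntentQuery: EntityStringQuery {
    func entities(for identifiers: [UUID]) async throws -> [MyIntentEntity] {
        try IntentsDataHandler.shared.getEntities().filter { identifiers.contains($0.id) }
    }

    // Sectioned default list shown before the user types anything.
    func suggestedEntities() async throws -> ItemCollection<MyIntentEntity> {
        let allData = try IntentsDataHandler.shared.getEntities()
        return ItemCollection(sections: [
            ItemSection("Foo", items: allData.filter { $0.type == .foo }),
            ItemSection("Bar", items: allData.filter { $0.type == .bar })
        ])
    }

    // Free-text search.
    func entities(matching string: String) async throws -> [MyIntentEntity] {
        try IntentsDataHandler.shared.getEntities()
            .filter { $0.title.localizedCaseInsensitiveContains(string) }
    }
}

The @Parameter would then reference the entity type without an optionsProvider, so the query alone drives both the sectioned list and the search.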
I'm implementing an LLM with the Metal Performance Shaders Graph, but I've encountered a very strange behavior: occasionally, the model reports an error like this:
LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295)
and crashes; the stack backtrace screenshot is attached. Note that the 5th frame is
mlir::getIntValues<long long>
and the 6th frame is
llvm::SmallVectorBase<unsigned int>::grow_pod
It looks like MLIR mistakenly took a 64-bit value for a 32-bit type. Unfortunately, I could not find the source code of
mlir::getIntValues; maybe it's in Apple's closed-source fork of LLVM used for the MPS implementation? Anyway, any opinion or suggestion would be appreciated.
In an under-development macOS & iOS app, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified, as well as the measurement's unit (e.g. .inches), in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored in the relevant Core Data entity.
The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz."
Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected.
I'm aware that the Python spaCy model is capable of NER measurement recognition, but I'm reluctant to incorporate a Python-based solution into a production app (ref: https://developer.apple.com/forums/thread/30092).
My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
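Not an NER model, but in case it helps while you look for one: much of the variability ("50g.", "50 g", "50 grams", "1 3/4 oz.") can be absorbed by splitting the token into a numeric part and a unit spelling, then mapping spellings onto Foundation units, so the bespoke logic shrinks to a lookup table plus conversion via Measurement. A rough sketch; the unit table is deliberately tiny and would need to grow with your data:

import Foundation

/// Maps unit spellings seen in OCR output onto Foundation units.
let unitTable: [String: Dimension] = [
    "g": UnitMass.grams, "gram": UnitMass.grams, "grams": UnitMass.grams,
    "oz": UnitMass.ounces, "ounce": UnitMass.ounces, "ounces": UnitMass.ounces,
    "in": UnitLength.inches, "inch": UnitLength.inches, "inches": UnitLength.inches,
    "cm": UnitLength.centimeters, "mm": UnitLength.millimeters
]

/// Parses strings like "50 g", "50g.", "1 3/4 oz." into a Measurement.
func parseMeasurement(_ raw: String) -> Measurement<Dimension>? {
    let trimmed = raw.trimmingCharacters(in: .whitespaces)
    guard let unitStart = trimmed.firstIndex(where: { $0.isLetter }) else { return nil }

    let numberPart = trimmed[..<unitStart].trimmingCharacters(in: .whitespaces)
    let unitPart = trimmed[unitStart...].lowercased()
        .trimmingCharacters(in: CharacterSet(charactersIn: ". "))
    guard let unit = unitTable[unitPart] else { return nil }

    // Handle mixed fractions like "1 3/4" as well as plain decimals.
    var value = 0.0
    for piece in numberPart.split(separator: " ") {
        if piece.contains("/") {
            let parts = piece.split(separator: "/")
            guard parts.count == 2, let num = Double(parts[0]),
                  let den = Double(parts[1]), den != 0 else { return nil }
            value += num / den
        } else if let v = Double(piece) {
            value += v
        } else {
            return nil
        }
    }
    return Measurement(value: value, unit: unit)
}

// "1 3/4 oz." -> 1.75 oz, converted to grams for internal storage.
if let m = parseMeasurement("1 3/4 oz.") {
    print(m.converted(to: UnitMass.grams))
}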
I have seen inconsistent results for my Colab machine learning notebooks running locally on a Mac M4, compared to running the same notebook code on either a T4 (in Colab) or an RTX 3090 locally.
To illustrate the problems I have set up a notebook that implements two simple CNN models that solves the Fashion-MNIST problem. https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing
For the good model with 2M parameters I get the following results:
T4 (Colab, JAX): Test accuracy: 0.925
3090 (Local PC via ssh tunnel, Jax): Test accuracy: 0.925
Mac M4 (Local, JAX): Test accuracy: 0.893
Mac M4 (Local, Tensorflow): Test accuracy: 0.893
That is, I see a significant drop in test accuracy when I run on the Mac M4 compared to the NVIDIA machines, and it seems to be independent of the backend. However, I do not know how to pinpoint this to either Keras or Apple's Metal implementation. I have reported this to Keras: https://colab.research.google.com/drive/11BhtHhN079-BWqv9QvvcSD9U4mlVSocB?usp=sharing but as this can be (and likely is?) an Apple Metal issue, I wanted to report it here as well.
On the mac I am running the following Python libraries:
keras 3.9.1
tensorflow 2.19.0
tensorflow-metal 1.2.0
jax 0.5.3
jax-metal 0.1.1
jaxlib 0.5.3
I'm developing a tennis ball tracking feature using Vision Framework in Swift, specifically utilizing VNDetectedObjectObservation and VNTrackObjectRequest.
Occasionally (but not always), I receive the following runtime error:
Failed to perform SequenceRequest: Error Domain=com.apple.Vision Code=9 "Internal error: unexpected tracked object bounding box size" UserInfo={NSLocalizedDescription=Internal error: unexpected tracked object bounding box size}
From my investigation, I suspect the issue arises when the bounding box from the initial observation (VNDetectedObjectObservation) is too small. However, Apple's documentation doesn't clearly define the minimum bounding box size that's considered valid by VNTrackObjectRequest.
Could someone clarify:
What is the minimum acceptable bounding box width and height (normalized) that Vision Framework's VNTrackObjectRequest expects?
Is there any recommended practice or official guidance for bounding box size validation before creating a tracking request?
This information would be extremely helpful to reliably avoid this internal error.
Thank you!
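Not documented guidance, just a workaround worth considering: validate (or reject) the detection's normalized bounding box before seeding VNTrackObjectRequest, so tiny boxes never reach the tracker. The threshold below is an arbitrary safety margin, not an official Vision limit:

import Vision

/// Minimum normalized width/height accepted before starting tracking.
/// The 0.01 value is an arbitrary guess, not a documented Vision constant.
let minimumNormalizedSide: CGFloat = 0.01

func makeTrackingRequest(for detection: VNDetectedObjectObservation) -> VNTrackObjectRequest? {
    let box = detection.boundingBox
    guard box.width >= minimumNormalizedSide, box.height >= minimumNormalizedSide else {
        // Too small to track reliably; wait for a larger detection instead of
        // risking the "unexpected tracked object bounding box size" error.
        return nil
    }
    let request = VNTrackObjectRequest(detectedObjectObservation: detection)
    request.trackingLevel = .accurate
    return request
}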
Hello. I am looking to hire a game developer for a card game called Baloot. My question: can the developer implement an AI opponent such that, while the computer is playing, it also improves its own skill level over time without any interaction?
🌹
If I try to dynamically load WhisperKit's models, as shown below, the download never occurs. There is no error or anything. At the same time, I can still reach the huggingface.co hosting site without any trouble, so it's not a blocking issue.
let config = WhisperKitConfig(
    model: "openai_whisper-large-v3",
    modelRepo: "argmaxinc/whisperkit-coreml"
)
So I have to default to the tiny model as seen below.
I have tried many approaches, using ChatGPT and others, to build the models on my Mac, but I hit too many failures because I have never dealt with builds like that before.
Are there any hosting sites that have the models (small, medium, large) already built, where I can download them and just bundle them into my project? I've wasted quite a lot of time trying to get this done.
import Foundation
import WhisperKit

@MainActor
class WhisperLoader: ObservableObject {
    var pipe: WhisperKit?

    init() {
        Task {
            await self.initializeWhisper()
        }
    }

    private func initializeWhisper() async {
        do {
            Logging.shared.logLevel = .debug
            Logging.shared.loggingCallback = { message in
                print("[WhisperKit] \(message)")
            }

            let pipe = try await WhisperKit() // defaults to "tiny"
            self.pipe = pipe
            print("initialized. Model state: \(pipe.modelState)")

            guard let audioURL = Bundle.main.url(forResource: "44pf", withExtension: "wav") else {
                fatalError("not in bundle")
            }
            let result = try await pipe.transcribe(audioPath: audioURL.path)
            print("result: \(result)")
        } catch {
            print("Error: \(error)")
        }
    }
}
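One thing that stands out in the snippet above: the WhisperKitConfig is created but never passed to the WhisperKit initializer, so the pipeline falls back to the default tiny model and never has a reason to download large-v3. I haven't verified this against your project, and the exact initializer depends on the WhisperKit version you're on, but the current README shows a config-taking init, roughly:

private func initializeWhisper() async {
    do {
        Logging.shared.logLevel = .debug

        let config = WhisperKitConfig(
            model: "openai_whisper-large-v3",
            modelRepo: "argmaxinc/whisperkit-coreml"
        )
        // Pass the config in; a bare WhisperKit() ignores it and loads "tiny".
        let pipe = try await WhisperKit(config)
        self.pipe = pipe
        print("initialized. Model state: \(pipe.modelState)")
    } catch {
        print("Error: \(error)")
    }
}

On the hosting question: the prebuilt Core ML models (tiny through large-v3) live in the argmaxinc/whisperkit-coreml repo on Hugging Face, so you can download the .mlmodelc folders manually and bundle or sideload them; WhisperKitConfig also has an option for pointing at a local model folder, though I'd double-check the exact parameter name against the version you're using.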
I'm experimenting with the new SpeechTranscriber in macOS/iOS 26, transcribing speech from a prerecorded mp4 file. Speed and quality are amazing!
I've told the transcriber to include time indexes. Each run is always exactly one word, which can be very useful. When I look at the indexes the end of one run is always identical to the start of the next run, even if there's a pause.
I'd like to identify pauses, perhaps to generate something like phrases for subtitling. With each run of text going into the next I can't do this, other than using punctuation - which might be rather rough.
Any suggestions on detecting pauses, or getting that kind of metadata from the transcriber?
Here's a short sample, showing each run with the start, end, and characters in the run:
105.9 --> 107.04 I
107.04 --> 107.16 think
107.16 --> 108.0 more
108.0 --> 108.42 lighting
108.42 --> 108.6 is
108.6 --> 108.72 definitely
108.72 --> 109.2 needed,
109.2 --> 109.92 downtown.
109.98 --> 110.4 My
110.4 --> 110.52 only
110.52 --> 110.7 question
110.7 --> 111.06 is,
111.06 --> 111.48 poll
111.48 --> 111.78 five,
111.78 --> 111.84 that
111.84 --> 112.08 you're
112.08 --> 112.38 increasing
112.38 --> 112.5 the
112.5 --> 113.34 50,000?
113.4 --> 113.58 Where
113.58 --> 113.88 exactly
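Since the run boundaries don't expose gaps, here's the rough punctuation-based fallback you mentioned, just to make it concrete: group consecutive runs into a phrase until one ends with sentence or clause punctuation. Run is my own little struct standing in for the (start, end, text) triples you're already extracting from the transcriber:

import Foundation

struct Run {
    let start: Double
    let end: Double
    let text: String
}

/// Groups word runs into phrases, breaking after . ? ! or , (a rough stand-in for pauses).
func phrases(from runs: [Run]) -> [(start: Double, end: Double, text: String)] {
    var result: [(start: Double, end: Double, text: String)] = []
    var current: [Run] = []

    func flush() {
        guard let first = current.first, let last = current.last else { return }
        result.append((start: first.start, end: last.end,
                       text: current.map(\.text).joined(separator: " ")))
        current.removeAll()
    }

    for run in runs {
        current.append(run)
        if let lastChar = run.text.trimmingCharacters(in: .whitespaces).last,
           ".?!,".contains(lastChar) {
            flush()
        }
    }
    flush()
    return result
}

It's crude (as you say, punctuation is a rough proxy), but it at least produces subtitle-sized chunks until there's a better way to get pause metadata out of SpeechTranscriber.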
I am using gemini2.5-flash with SwiftUI. How can I receive a response in JSON?
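Assuming you're calling the Gemini REST API directly with URLSession (if you're using a client SDK instead, it exposes the same setting under a similar name), the way to get JSON back is to set responseMimeType to "application/json" in generationConfig. The endpoint, model name, and field spelling below are taken from Google's public REST docs, so double-check them against whatever you're calling today:

import Foundation

/// Asks Gemini for a JSON-only response by setting responseMimeType in generationConfig.
func fetchJSONResponse(prompt: String, apiKey: String) async throws -> Data {
    let url = URL(string: "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=\(apiKey)")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "contents": [["parts": [["text": prompt]]]],
        "generationConfig": ["responseMimeType": "application/json"]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    return data // decode with JSONDecoder into your own model types
}

You can tighten this further by also supplying a responseSchema in generationConfig so the model is constrained to your exact JSON shape.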
In this WWDC25 session, it is explicitly mentioned that apps should support AttributedString for text parameters to their App Intents.
However, I have not gotten this to work. Whenever I pass rich text (either generated by the new "Use Model" intent or generated manually for example using "Make Rich Text from Markdown"), my Intent gets an AttributedString with the correct characters, but with all attributes stripped (so in effect just plain text).
struct TestIntent: AppIntent {
    static var title = LocalizedStringResource(stringLiteral: "Test Intent")
    static var description = IntentDescription("Tests Attributed Strings in Intent Parameters.")

    @Parameter
    var text: AttributedString

    func perform() async throws -> some IntentResult & ReturnsValue<AttributedString> {
        return .result(value: text)
    }
}
Is there anything else I am missing?
I watched this year's WWDC25 session "Read Documents using the Vision framework". At the end of the video there is a mention of a new DetectHandPoseRequest model for hand pose detection in the Vision API.
I looked at Apple's documentation and I don't see a new revision. Moreover, it's probably a typo in the video, because there is only DetectHumanHandPoseRequest (Swift-based) and
VNDetectHumanHandPoseRequest (Objective-C based); notice the lack of the "Human" prefix in the WWDC video.
The first one only has a revision added in iOS 18+:
https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1
The second one only has a revision added in iOS 14+:
https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1
I don't see any new revision targeting iOS 26+.