Case study

I needed one place for every meeting. So I built a recorder by chatting with an AI.

My calls happen in Google Meet, Zoom, Teams, phone bridges, and whatever link a customer sends. The work after the call should not depend on which meeting tool hosted it. So I built a native recorder that captures the conversation, transcribes it locally, and gives me one reviewable place for summaries, transcripts, and todos. First prototype in about ninety minutes. Polished over roughly fourteen more iterations across two weeks. Audio never leaves the laptop. No autonomous agent. Less than a penny per meeting.

Josh Weckesser · May 2026 · 8 min read

The premise

I take a lot of calls. Customer intros, partner syncs, internal reviews. Some are on Google Meet because I sent the invite. Some are on Zoom because the customer did. Some are on Teams because procurement or IT chose the tool before I ever joined the conversation. A few are phone calls. The meeting platform changes all week, but the job after the meeting is always the same.

The information that comes out of those calls has to land in three places: my own task list, the deal record, and the next conversation. The transcript should be findable later. The action items should be reviewed while the call is still fresh. The context should survive the fact that the last call was in Meet and the next one is in Zoom. That work is exactly what gets dropped on busy weeks.

I did not want another meeting assistant tied to a single conferencing product or a bot that joins customer calls and stores everything in someone else’s database. I wanted a layer I controlled, above the meeting tools, that gave me a single local record regardless of where the call happened.

I also did not need a tool that listens to my calls and decides what to do about them. I needed a tool that listens, structures what it heard, and hands me a one minute review so I can act through the systems I already use. That distinction matters. The first kind of tool is an autonomous agent. The second is a process flow with three small automations bolted onto the boring parts.

What it does

A native Mac app called Recorder. One button. Hit Record before a call. Hit Stop after. Within a couple of minutes it produces:

A clean transcript of both sides of the conversation, generated locally on the laptop with no audio sent to a cloud service.
A short business summary of what the meeting was about.
A list of action items, each tagged with the owner, the timeframe if one was mentioned, and the quote from the transcript that justified the item.
The calendar event the recording corresponds to, with attendee names pulled in so owners are real people instead of placeholders.
A prep panel for upcoming meetings that surfaces existing daily briefs alongside recent emails and prior transcripts.

Every recording, transcript, and todo lives as plain files on the laptop. The sidebar history persists across launches because the source of truth is the filesystem, not a database I have to maintain.

How it was built

I described what I wanted in chat with Claude Code. Claude wrote Swift, wrote Python, ran builds, fixed compile errors, smoked tests against real recordings, and reported back. I corrected the model when it was wrong, picked a direction when there were tradeoffs, and tested every change myself before moving to the next one.

The collaboration model is important. The model proposed. I disposed. When a wrong calendar event got matched, I caught it because I could see the title and time in the sidebar. When a 23 minute recording crashed the audio engine, I read the actual error and asked for a chunking strategy instead of accepting the suggestion to truncate.

The stack

App shell

Swift 6.2 with SwiftUI on macOS. SwiftPM build, hand assembled .app bundle, ad hoc codesign.

Audio capture

ScreenCaptureKit for system output (the other participants) and microphone (me) as two parallel PCM WAV tracks at 48 kHz.

Local transcription

parakeet-mlx running on Apple Silicon, chunked at 120 seconds with 15 seconds of overlap so long recordings fit the Metal memory budget. Audio never leaves the laptop.

Cloud analysis

AWS Bedrock with Claude Sonnet 4.5, one short call per recording. Roughly 2,000 input tokens and a few hundred output tokens.

Calendar context

Google Calendar API, reusing an OAuth token from a sync project I had already built. No new auth flow needed.

Meeting prep

A 100 line Markdown adapter that pulls daily briefs from a separate project of mine and overlays them onto the prep panel.

Python tooling

uv for dependency management and venv isolation. Sidecar scripts invoked from Swift by subprocess.

The data flow

01Click Record. ScreenCaptureKit starts pulling the system audio mix and the microphone as two PCM WAV streams.
02Click Stop. The two tracks finish writing to disk.
03In parallel, a Python script queries Google Calendar using the existing OAuth token and finds the event whose start time and audio duration match the recording.
04The two audio tracks get mixed into a single mono file and split into 120 second chunks with overlap.
05parakeet-mlx transcribes each chunk on the laptop. Output is a single JSON object with the full text plus timed segments.
06The transcript, the meeting title, and the attendee list get sent to Claude Sonnet 4.5 on Bedrock in one request. The model returns a JSON object with a summary and an array of action items.
07Everything writes to the recordings folder as plain JSON next to the audio files.
08The sidebar refreshes. The recording shows the meeting title instead of a timestamp, and the todos pane appends to the running list.

Each step has a single responsibility. Each output is a file I can open in any editor. Nothing in this pipeline is opaque.

Why human in the loop

The simplest version of the argument: the failure mode of an autonomous agent is silent, and the failure mode of a surfaced suggestion is visible.

In practice this shows up in every recording. The transcription model misheard a person named Elvis as the word George. The first version of the calendar matcher picked the next meeting on my schedule instead of the one being recorded. The first version of the todo extractor attributed actions to the wrong person on the call. Each of those failures took me less than five seconds to catch in the UI and would have caused real damage if the system had acted on them automatically.

A reviewer reads in seconds what a model would spend a thousand tokens trying to justify. The economics of human review at the right step beat the economics of trying to make the model never be wrong.

Where the data goes

The privacy posture was not a side effect. It was a constraint I designed for from the first iteration. Here is the actual data path, step by step.

Audio. Both tracks write to a folder on the laptop. They are never uploaded anywhere. If I delete the folder, the recording is gone.
Transcription. parakeet-mlx runs locally on Apple Silicon. The audio is read by a Python script on the same machine and never crosses a network boundary. The output is a JSON transcript on disk.
Calendar lookup. A Python script reads my own OAuth token and queries the Google Calendar API directly. Google sees the request because Google owns the calendar. Nothing else does.
LLM analysis. One ephemeral request to AWS Bedrock containing the transcript text and the meeting context. Per AWS Bedrock’s published data policy, customer inputs and outputs are not retained after the inference call completes and are not used to train the base models. The request is processed in the AWS region I configured.
Storage. Every artifact (audio, transcript, meeting metadata, todos) lives on the laptop as plain JSON and WAV. There is no SaaS database, no third-party meeting bot, and no shared cloud workspace.

The practical effect: the only data that ever leaves my machine is the text of one transcript per meeting, sent to AWS for a single ephemeral call. The recordings stay local. The calendar metadata stays between me and Google. The todo history stays on the laptop.

Compare this to a third party meeting assistant that joins the call as a participant, captures the audio into its own database, and trains on aggregate user data. The shape of the trust relationship is different. With this approach I can answer the obvious compliance questions cleanly: where is the audio stored, who has access, is it used to train someone else’s model, can I delete it. The answers are: my laptop, only me, no, and yes immediately.

The economics

90 min

From empty folder to a working prototype

~ $0.005

Per meeting in Bedrock API cost

Transcription cost. Local on the laptop.

A head to head comparison against an open weight 12 billion parameter model showed the local option could produce structurally similar todos at zero per token cost, but missed the owner attribution and the higher order strategic items that made the Bedrock version useful. The cents per meeting buy real quality.

Iterations that taught me something

Container format

Switched from .m4a to .wav

parakeet-mlx uses libsndfile, which does not decode AAC. The first transcription failed silently with an audio loader error. The lesson: pick the encoding your downstream pipeline can actually read, not the prettiest one.

Permission model

Screen Recording permission for audio-only capture

macOS requires Screen Recording permission even when ScreenCaptureKit is only being used for system audio. The first run produced an empty WAV file with no error. The permission system is a feature, not a workaround.

Two tracks instead of one

Added microphone capture for the meeting use case

System audio alone gave me the remote participants but not my own voice. Switched to ScreenCaptureKit’s microphone output channel and mixed both tracks in Python before transcribing.

Long recordings

Chunked transcription for the Metal memory budget

A 23 minute meeting crashed with "Attempting to allocate 18.86 GB which is greater than the maximum allowed buffer size of 9.5 GB." parakeet supports chunked inference with overlap stitching out of the box. Five lines of Python fixed it. Read the API docs before fighting the symptom.

Calendar reuse

Used existing OAuth from a different project

I had already built a calendar sync tool with the right token. Pointing the recorder at that token file gave me Google Calendar access in fifteen minutes. The integration was the wiring, not any new auth flow.

Wrong meeting matched

Audio duration into the matcher

My first matcher used a 90 minute default search window. For a 23 minute recording starting at 9:01, the window overlapped a 10:00 meeting by more seconds than the 9:00 meeting that was actually being recorded. Passing the real audio duration fixed it.

Silent build failure

A missing file in the bundle copy script

The build script copied four Python files into the .app bundle. The fifth file existed in source but was never bundled, and the live calendar lookup quietly did nothing. Integration tests have to run against the bundled binary, not the dev workspace.

Reuse what you already have

Adapter for an existing daily brief

I had been writing daily meeting briefs by hand in another project for weeks. The recorder’s prep panel was generating thin generic content of its own. A 100 line Markdown parser overlays the existing briefs onto the prep panel. The most expensive iteration is the one that recreates data you already have.

What this is not

It is not an autonomous agent. It does not send email, schedule follow-ups, or write to a CRM on my behalf.
It is not a SaaS subscription that captures my calls into someone else’s database.
It is not a magic productivity layer. It is a defined process flow with three small automations bolted onto the parts a human does not need to be involved in.

Takeaways for leaders looking at AI

Connect what you already have. The biggest gains in this build came from wiring together a calendar sync from one project, a brief generator from another, and a Bedrock account from a third. The novelty was the wiring, not any new model.
Buy quality where it pays for itself. A frontier model for a few cents a meeting beat an open weight model running for free on the same laptop. The local option produced acceptable output. The hosted one produced output that was worth acting on.
Audit your data before you generate more. The single most expensive iteration was the one where the recorder was generating prep that already existed in plain Markdown in another folder.
Put the human at the right step, not every step. A reviewer can scan a structured todo list in seconds. Trying to make the model never be wrong is more expensive and less effective than letting it be wrong in places where a person will catch it cheaply.
Pick a default that respects the data. Local transcription on a modern Mac is now fast enough that sending audio off the laptop is a choice, not a requirement. Make the private path the default and reach for the cloud only when the value justifies it.
Treat the data path as a feature. Which network boundary the data crosses, who keeps it after the request, and whether it trains someone else’s model are answerable questions in this architecture. They are usually not answerable for a consumer meeting assistant or an opaque agent platform. The business case for keeping the analysis on Bedrock or another zero retention hosted model is that you can put the answer in a customer contract.

The shape of the work

Around 2,500 lines of Swift, 600 lines of Python, three config files, and the disciplined choice not to build the parts that already existed. Total recurring external cost: less than the price of two coffees per month. Total build time: roughly fourteen iterations in chat, none longer than an afternoon.

The reason this approach worked is that the goal was never an autonomous agent. The goal was a clean process flow with the rote work removed and the judgment work preserved. That framing changed which problems were worth solving and which were worth ignoring.