Lab is where I build experimental tools for exploring the AI infra stack.
The goal is to go deep into AI systems. Build them. Break them. Run them locally. See where the bottlenecks are. Then use that hands-on work to develop sharper ideas about infra, agents, model serving, RL environments, and developer tooling.
These projects are intentionally practical. They are not polished products. They are experiments that help me understand where the next generation of AI infra companies might come from.
Here are the open source tools I've built so far:
pforge: It's a CLI for working with open models on your own machine or GPU server. It is built around the idea that developers should be able to shape, serve, inspect, compare, and experiment with models without needing a heavyweight platform. It includes things like local model serving, chat interfaces for open models, comparing base vs fine-tuned models, varying reasoning budgets, logit lens and model introspection, LoRA training workflows, constraint-based generation, multi-model debate, and refinement loops.
phabitat: It's a CLI to give every AI agent its own computer. The idea is that agents should not just be stateless API calls. They need persistent environments with files, logs, history, artifacts, credentials, and long-running state. It includes things like persistent agent workspaces, agent sandboxes, task execution environments, logs, artifacts, replayability, long-running agents, runtime isolation, and agent memory through environment state.
psplice: It's a CLI to perform runtime surgery on foundation models. It is focused on experimentation inside models: steering activations, modifying internal representations, and exploring how models behave layer by layer. It includes things like activation steering, transformer internals, model patching, layer-level interventions, representation editing, and running models as inspectable systems.
pscope: It's a CLI that scans your local hardware for AI models. It looks at your machine’s CPU, RAM, GPU, and VRAM, then helps estimate which models can actually run well on your hardware. It covers things like local model fit, VRAM constraints, model quality vs speed tradeoffs, quantization tradeoffs, practical local inference, and matching models to available hardware.
pworlds: It's an experiment around verifiable RL environments. The core idea is that AI models need environments where they can practice, fail, recover, and improve against objective reward signals. The best environments have deterministic state transitions, programmable rewards, and outcomes that can be verified. Here are the things it explores: RL environments for AI agents, verifiable tasks, deterministic sandboxes, programmable reward functions, simulation as training data, post-training infra, and environments that generate useful learning signal.
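To make the pworlds idea concrete, here is a minimal sketch of what a verifiable environment could look like. It is not pworlds code: the State, step, reward, rollout, and verify names are hypothetical, and the toy task (reach a target number using "inc" and "double") is an assumption chosen only for illustration. It shows the three properties named above: a deterministic transition function, a reward function that can be swapped out, and an outcome anyone can verify by replaying the actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical toy state: reach `target` starting from `value`."""
    value: int
    target: int
    steps: int = 0


# Hypothetical action set for the toy task.
ACTIONS = {
    "inc": lambda v: v + 1,
    "double": lambda v: v * 2,
}


def step(state: State, action: str) -> State:
    """Deterministic transition: the same state and action always give the same next state."""
    return State(ACTIONS[action](state.value), state.target, state.steps + 1)


def reward(state: State) -> float:
    """Programmable reward: 1.0 if the target was reached, else 0.0."""
    return 1.0 if state.value == state.target else 0.0


def rollout(
    initial: State,
    actions: List[str],
    reward_fn: Callable[[State], float] = reward,
) -> Tuple[State, float]:
    """Apply the actions in order and score the final state."""
    state = initial
    for action in actions:
        state = step(state, action)
    return state, reward_fn(state)


def verify(
    initial: State,
    actions: List[str],
    claimed_reward: float,
    reward_fn: Callable[[State], float] = reward,
) -> bool:
    """Replay the trajectory and check that the claimed reward is reproduced."""
    _, actual = rollout(initial, actions, reward_fn)
    return actual == claimed_reward


if __name__ == "__main__":
    start = State(value=1, target=10)
    plan = ["inc", "double", "inc", "double"]
    final, score = rollout(start, plan)
    print(final, score, verify(start, plan, score))
```

Because step has no hidden randomness or external state, a claimed result can be checked by anyone who has the initial state and the action list, which is what makes the reward signal objective in this sketch. A different reward_fn (for example, one that penalizes extra steps) can be passed in without changing the environment itself.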
Why This Exists
I believe the best way to understand AI infrastructure is to build pieces of it yourself. Writing gives me a way to clarify ideas. Investing gives me exposure to founders and markets. Lab gives me direct contact with the actual technical substrate: models, runtimes, sandboxes, GPUs, memory, evals, and environments.
The recurring question behind all of this work is: What infra needs to exist for AI agents and models to become more capable, reliable, inspectable, and useful? Lab is where I try to answer that question by building.