After Anthropic restricted OpenClaw-backed OpenAI-style flows, Jeff ran a 2-hour call with AI Money Group, testing two local models on the same hard-edged prompt. This page captures what actually happened, what each model did well, and how to frame these results in production decisions.
One-minute summary before the deep notes.
- Same math puzzle fed to both models: 30 ft well, 3 ft climb / 2 ft slide, plus a prime-trigger rule.
- Under the likely intended cumulative-climb interpretation, Nemotron reached escape on day 33.
- Gemma initially failed on Ollama history-file permissions, then ran after the fix. Nemotron hit no major infra issue.
- For precision logic tasks, Nemotron was more structured. Gemma was stronger on polished phrasing and caveated reasoning, but less deterministic.
Below is the observed outcome from the exact logs and the attached scratch work.
Gemma's run first hit an infra error: permission denied on ~/.ollama/history, fixed via an ownership correction. Once both were running, Nemotron delivered the practical recommendation quickly, while Gemma offered valuable caveats and model-interpretation notes but required more readback to reach a single decision.
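The permission failure pattern is easy to diagnose before rerunning a model. The sketch below is illustrative, assuming the cause was file ownership (for example, the history file having been created under a different user); `diagnose_ownership` and the example path are not from the logs.

```python
import os
import pwd
from pathlib import Path

def diagnose_ownership(path: Path) -> str:
    """Report whether the current user owns and can write the file."""
    st = path.stat()
    owner = pwd.getpwuid(st.st_uid).pw_name
    me = pwd.getpwuid(os.getuid()).pw_name
    if st.st_uid != os.getuid():
        # Typical fix: chown the file back to the invoking user.
        return f"owned by {owner!r}, running as {me!r}: fix with chown"
    if not os.access(path, os.W_OK):
        return "owned by you but not writable: fix with chmod u+w"
    return "ownership and write permission look fine"

# Hypothetical usage against the path mentioned in the logs:
# print(diagnose_ownership(Path.home() / ".ollama" / "history"))
```

A check like this belongs in the audit trail: it records whether the failure was ownership or mode bits, which determines whether `chown` or `chmod` is the right correction.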
Nemotron’s reported route: assume “prime-check uses cumulative climbed distance,” then test the rule cycle, yielding 33 days to escape. Gemma acknowledged that strict reading can trap the snake if interpreted differently, then used pragmatic interpretation as a fallback path to explain the ambiguity.
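The interpretation ambiguity is easy to make concrete in code. The exact prime-trigger rule from the call isn't reproduced here, so the sketch below treats it as a pluggable hook and an assumed one-foot penalty; only the baseline (no trigger) result, day 28 for the classic 30 ft / +3 / -2 well, is a known quantity. Nemotron's day-33 answer corresponds to whatever cumulative-climb trigger the actual prompt specified.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, sufficient for small puzzle values."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def days_to_escape(depth=30, climb=3, slide=2, trigger=None, max_days=1000):
    """Simulate the well puzzle day by day.

    `trigger(day, cumulative_climb)` is a hypothetical hook standing in for
    the prime rule; when it fires we apply an assumed extra 1 ft slide.
    """
    position = 0
    cumulative_climb = 0
    for day in range(1, max_days + 1):
        position += climb
        cumulative_climb += climb
        if position >= depth:
            return day
        position -= slide
        if trigger and trigger(day, cumulative_climb):
            position -= 1  # assumed penalty; the real rule wasn't logged
        position = max(position, 0)
    return None  # trapped: never escapes within max_days

# Baseline with no prime rule: the classic result.
print(days_to_escape())  # -> 28
```

Swapping in `trigger=lambda day, cum: is_prime(cum)` versus a net-peak-position check makes the two readings of the prompt directly comparable, which is exactly the ambiguity Gemma flagged.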
Official model pages and release notes guided this section, so expectations match the actual model families we tested.
| Model | What it is | Research signal we can verify | Best use here |
|---|---|---|---|
| Gemma 4 (E4B) | Google DeepMind family with multimodal and tool-use capabilities, including compact reasoning variants. | Gemma 4 pages list multimodal/tooling strengths and benchmark families relevant to reasoning and coding workloads. | Fast drafting, multilingual output, and mixed workloads where explanation quality and tool integration matter. |
| Nemotron-Cascade 2 (30B A3B) | Open-source NVIDIA MoE model, 30B with active routing around 3B parameters. | Research release materials cite top scores in high-level reasoning contests and coding/contest-oriented strength. | Structured long-form reasoning, recursive checks, and deterministic decision prompts. |
What it is: Google DeepMind family with multimodal and tool-use capabilities, including compact reasoning variants.
Research signal: Gemma 4 pages list multimodal/tooling strengths and benchmark families relevant to reasoning and coding workloads.
Best use here: Fast drafting, multilingual output, and mixed workloads where explanation quality and tool integration matter.
What it is: Open-source NVIDIA MoE model, 30B with active routing around 3B parameters.
Research signal: Research release materials cite top scores in high-level reasoning contests and coding/contest-oriented strength.
Best use here: Structured long-form reasoning, recursive checks, and deterministic decision prompts.
Sources reviewed: the Ollama Gemma 4 model page and the NVIDIA Nemotron-Cascade 2 release page.
To keep decision quality high on ambiguous prompts:

- Lock the rule interpretation before the model run. In math prompts, explicitly state whether the prime condition uses cumulative climbed distance or net peak position.
- Use a second model only when the ambiguity is mission-critical or output quality has commercial impact.
- Keep the raw logs and system state. The first run can pass; the audit trail prevents drift as the team scales.
- Share a one-paragraph conclusion in your member thread, then append the caveat. Fast clarity beats perfect nuance during emergency calls.
Bottom line for the team: In this emergency call context, Nemotron-Cascade 2 gave the cleaner first-pass answer on the same prompt, while Gemma 4 gave the better interpretive notes. The best default is not asking which model wins; it's picking the right model for the specific ask: Nemotron for deterministic logic, Gemma for review-grade framing.
I can write both versions next: a fast one for the group, and a detailed one for operations and SOP notes.