the pulse
nvidia day on the local AI subs. the company quietly removed "gaming" as a standalone revenue category from its financial reports (742 upvotes, 223 comments, velocity 170.1), and within hours a separate thread asking "is NVIDIA still the default best choice for local LLMs in 2026?" rocketed to the fastest-rising post of the day (116 upvotes, 117 comments, velocity 189.1). meanwhile the uncensored model debate is back with 243 comments and some genuinely practical takes beyond the usual RP discourse. the Qwen3.6-35B-A3B vs Gemma4-26B-A4B comparison thread (89 upvotes, 92 comments) is quietly producing real benchmark data from actual users, and someone leaked what might be GPT-5.5's internal reasoning trace, which... reads like caveman mode (238 upvotes, 138 comments). also, u/ttkciar is back from yesterday's troll-of-the-day honors, this time dropping an 89-upvote comment about using uncensored models for neutron transport physics research. range.
hottest thread
"NVIDIA Removes Gaming Revenue Category From Financial Reports" by u/HumanDrone8721 | r/LocalLLaMA | 742 upvotes | 223 comments | velocity: 170.1
the actual news is more boring than the headline suggests, and u/iamapizza (238 upvotes) was quick to point that out: "Not a single comment so far has read the article. They've combined it with other categories because GPUs are used for gaming, inference, research, etc. The hardware is still part of the roadmap." fair. but the signal matters even if the substance is mundane. u/Dry_Yam_4597 (234 upvotes) read between the lines: "It does however signal that NVIDIA is potentially planning to contribute to moving gaming into the cloud." that reading got traction.
u/kiwibonga (248 upvotes, top comment) brought the nostalgia: remembering a 2000s TV ad where someone snatches a chip from a scientist's hands to use for games, then noting "funny how the turn tables." correction in the edit: it was actually 3DFX, and "nvidia is surprisingly humorless." the thread is a mix of people who actually read the filing, people projecting their fears about GeForce's future, and people making jokes. standard r/LocalLLaMA distribution, honestly.
repo of the day
Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP by u/EvilEnginer (posting work by LuffyTheFox) | 194 upvotes | 73 comments | velocity: 85.4
GGUF: LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF
Safetensors: LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-FP8-Safetensors
an uncensored fine-tune of Qwen3.6-35B-A3B with MTP (multi-token prediction) support. the MoE architecture means only ~3B params are active per token, so per-token compute is closer to a 3B dense model and it stays usable on modest hardware. the full 35B weights still have to sit in memory (VRAM or system RAM), but a quantized GGUF keeps that manageable. u/No-Implement9967 (62 upvotes) put it perfectly: "LocalLLaMA users casually running 35B models with 200k context on mini PCs while big tech still says 'requires 8 H100s'". relevant timing given the uncensored model discussion blowing up in parallel. if you're running Qwen3.6 already, this is worth a swap-in test.
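a rough sketch of the memory math for a 35B-total / ~3B-active MoE, for anyone sizing hardware. the bits-per-weight figures are assumed round numbers, not measured sizes of the GGUF files above, and the estimate ignores KV cache and runtime overhead. "read per token" is also a simplification that treats the ~3B active params as the only weights touched.

```python
def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of a set of weights in gigabytes (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # every expert has to be stored somewhere (VRAM or RAM)
ACTIVE_PARAMS = 3e9   # roughly what gets used for each generated token

# Assumed average bits per weight; real quantized files vary by tensor.
ASSUMED_BITS = {"~4-bit": 4.5, "~5-bit": 5.5, "8-bit": 8.0}

for label, bits in ASSUMED_BITS.items():
    stored = weight_footprint_gb(TOTAL_PARAMS, bits)
    touched = weight_footprint_gb(ACTIVE_PARAMS, bits)
    print(f"{label}: ~{stored:.1f} GB stored, ~{touched:.1f} GB read per token")
```

the gap between the two columns is the whole trick: storage scales with total params, speed scales with active params.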
best comment award
u/Citadel_Employee (292 upvotes) on "Is there any reason for an uncensored model if you have no interest in roleplaying?":
"I've use it for stock related research, and I like uncensored models because they don't refuse to give opinions and skip the whole 'I can't give financial advice', etc. This doesn't get around hallucination and you still have to be very critical of it. But it reduces a lot of the friction."
this is the best comment because it does three things at once: gives a concrete use case, acknowledges the limitation (hallucination doesn't go away), and frames the value precisely (friction reduction, not truth generation). no hype. just a practitioner explaining what works and what doesn't. the whole thread is surprisingly high quality, with u/profbx (162 upvotes) adding reverse engineering, u/brahh85 (178 upvotes) listing real refusal scenarios (models refusing to debug code they think is "hacking"), and u/ttkciar (89 upvotes) describing neutron transport research getting blocked by nuclear weapons safety filters.
troll of the day
u/jacek2023 (320 upvotes) on "Have we passed the peak of inflated expectations?":
"youtube -> slop -> idiots -> claw -> 'how can I run model without paying' -> 'ok these local models don't work' -> focus on something else"
the entire hype cycle compressed into a single pipeline diagram. no notes. the "claw" step (presumably Claude) sitting right between "idiots" and the disappointed exit is... architecturally accurate. this is the Gartner hype cycle for people who actually read man pages.
fun facts
- the "uncensored models" thread pulled 243 comments, the most discussion-dense post of the day, beating even the NVIDIA financial reporting story (223 comments) despite having a quarter of the upvotes
- RTX Pro 6000 Blackwell jumped from €7,500 to €11,500 (without VAT) in Central Europe according to u/XO33OX. that's a 53% price hike in "the past days." builders feeling that one
- u/alex20_202020 asked what's the smallest RAM to run any GGUF model on HuggingFace, defining "run" as "process 20 tokens prefill and generate 20 tokens within a month." respect the commitment to edge cases
- u/Borkato uses vim with a custom plugin as their LLM frontend. asked what everyone else uses. then in a separate post asked what MCP is. priorities in order
- someone is running Command A+ (218B MoE) on Apple Silicon via MLX. a PR is open. we live in interesting times
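for the record, the RTX Pro 6000 price jump in the list above checks out. a two-line sanity check, using only the two ex-VAT prices quoted by u/XO33OX:

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


hike = percent_change(7_500, 11_500)
print(f"{hike:.1f}%")  # 53.3%
```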
code drop
llama.cpp server now has built-in native tools (exec_shell, edit_file, etc.) per u/srigi (139 upvotes, 41 comments, velocity 45.9). if you've been wiring up tool calling through external harnesses, this might simplify your stack significantly. the server can now handle shell execution and file editing natively, which means your local model can function as a coding agent without bolting on LangChain or custom middleware.
also worth noting: u/Designer_Elephant227 (75 upvotes) in the Qwen vs Gemma thread reported that Gemma4-26B-A4B has persistent tool call issues while Qwen3.6-35B-A3B works cleanly:
"I use 35b Q5 and 26b Q4. I got many problems with tool calls with Gemma and literally none with qwen."
if you're building agentic workflows on local models, Qwen3.6 at Q5 seems to be the safer bet for reliable tool calling right now.
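if you want to run the Qwen vs Gemma comparison on your own prompts, a small checker like the one below can count malformed tool calls. it's a minimal sketch that assumes the server returns assistant messages in the OpenAI-style chat format (a `tool_calls` list whose `function.arguments` field is a JSON string). the `read_file` tool and the sample messages are hypothetical, and nothing here touches llama.cpp's built-in tools.

```python
import json
from typing import Any

# Hypothetical tool definition in the OpenAI-style "tools" format.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a text file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def check_tool_calls(message: dict[str, Any], tools: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in one assistant message's tool calls."""
    specs = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    calls = message.get("tool_calls") or []
    if not calls:
        return ["no tool call emitted"]
    problems = []
    for call in calls:
        fn = call.get("function", {})
        name = fn.get("name")
        if name not in specs:
            problems.append(f"unknown tool: {name!r}")
            continue
        try:
            args = json.loads(fn.get("arguments", ""))
        except (json.JSONDecodeError, TypeError):
            problems.append(f"{name}: arguments are not valid JSON")
            continue
        if not isinstance(args, dict):
            problems.append(f"{name}: arguments are not a JSON object")
            continue
        for key in specs[name].get("required", []):
            if key not in args:
                problems.append(f"{name}: missing required argument {key!r}")
    return problems


# Hand-written sample messages standing in for real model output.
good = {"tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "notes.txt"}'}}]}
bad = {"tool_calls": [{"function": {"name": "read_file", "arguments": "{path: notes.txt"}}]}

print(check_tool_calls(good, [READ_FILE_TOOL]))
print(check_tool_calls(bad, [READ_FILE_TOOL]))
```

run the same prompt set through both models, feed each response message to `check_tool_calls`, and compare the failure counts. that gives you your own number instead of a vibe.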
builder takeaways
- Qwen3.6-35B-A3B is pulling ahead of Gemma4-26B-A4B for tool calling reliability. Gemma runs faster but breaks on structured outputs. if your workflow depends on function calls, test Qwen first
- GPU spacing with 4x 5060 Ti 16GB cards is fine if you undervolt. u/Technical-Earth-3254 (130 upvotes): "Just test it and see how the temps are going." thermal throttling is the real constraint, not physical proximity
- RTX Pro Blackwell prices are spiking hard in Europe. if you were planning a 96GB build, the window may be closing. check local pricing before committing to a budget plan
- llama.cpp server's native tool support changes the local agent game. fewer dependencies, fewer failure points. worth migrating if you're currently using external tool-calling wrappers
- "Uncensored" models have legitimate non-RP uses: financial research without refusal friction, reverse engineering, physics research with dual-use materials, medical triage questions. the 243-comment thread is worth reading for use cases you might not have considered
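on the "just test it and see how the temps are going" advice for tightly packed cards: a minimal sketch for polling temperatures via `nvidia-smi`. the 80 °C threshold is an arbitrary assumption, not a vendor limit, and the sample readings are made up to show the parsing.

```python
import subprocess

# nvidia-smi query that prints one "index, temperature" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_temps(csv_text: str) -> list[tuple[int, float]]:
    """Parse 'index, temp' CSV lines into (gpu_index, temp_celsius) pairs."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, temp_c = (field.strip() for field in line.split(","))
        readings.append((int(index), float(temp_c)))
    return readings


def hot_gpus(readings: list[tuple[int, float]], limit_c: float = 80.0) -> list[int]:
    """Indices of GPUs at or above the (assumed) temperature limit."""
    return [index for index, temp_c in readings if temp_c >= limit_c]


def read_gpu_temps() -> list[tuple[int, float]]:
    """Query the local driver; requires nvidia-smi on PATH."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_temps(out)


# Made-up readings for a 4-card box, to show the parsing without a GPU.
sample = "0, 61\n1, 83\n2, 79\n3, 84\n"
print(hot_gpus(parse_gpu_temps(sample)))
```

call `read_gpu_temps()` in a loop while a long generation is running, once at stock settings and once undervolted, and see which cards show up in `hot_gpus`.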
the scoreboard
- posts tracked: 148
- total upvotes: 4,829
- total comments: 3,159
- subreddits scanned: LocalLLaMA, LocalLLM, MachineLearning
- fastest rising: "Is NVIDIA still the default best choice for local LLMs in 2026?" (velocity: 189.1)