the pulse

heretic won't die quietly. the FT article from yesterday kept climbing (856 upvotes, 215 comments, velocity 227.4), and now we're seeing the downstream effects: u/LLMFan46 dropped a full uncensored heretic-processed Qwen3.5 35B A3B with all 785 MTP heads preserved (344 upvotes, 68 comments). meanwhile, china is reportedly restricting overseas travel for AI talent at alibaba and deepseek (116 upvotes, 113 comments), which has obvious implications for the open-weight pipeline that feeds this entire community. on the lighter side, a lawyer with 12 V100s is still living his best life (308 upvotes), prismml shipped a 3GB image model that might just be rebranded FLUX (182 upvotes), and elon promised a 0.5T open grok model "next year" to exactly nobody's surprise (286 upvotes, community already translating from elon-time).

hottest thread

"The Financial Times has published an article about Heretic" by u/-p-e-w- | r/LocalLLaMA | 856 upvotes | 215 comments | velocity: 227.4

the heretic saga enters its mainstream media arc. the FT reports they were able to remove guardrails from Meta's Llama 3.3 "in less than 10 minutes without any specialist hardware." for anyone tracking the timeline: meta sent a takedown letter, and u/ambient_temp_xeno (194 upvotes) connected the dots: "Gee, I wonder if this is related to Meta sending a takedown." u/a_beautiful_rhind (159 upvotes) went further with genuine concern for the author: "Congratulations on becoming a target of the system. Be very careful if someone approaches you for an interview, even if they seem friendly. FT likely approached meta for comment before publishing this piece."

u/FastHotEmu (126 upvotes) voiced what many builders feel: "How I wish this could stay out of the mainstream, last thing I want is more stupid takes by people who don't understand anything about LLMs or technology." the tension here is real. mainstream visibility makes a tool both more impactful and more vulnerable. the community seems to be rallying around p-e-w while bracing for regulatory consequences. heretic's creator is now simultaneously a folk hero and a legal target.

repo of the day

llama.cpp PR #21344 by pedapudi, surfaced by u/fallingdowndizzyvr (85 upvotes, 70 comments)

a rejected PR that reportedly gives Strix Halo users up to 30% faster prompt processing for MoE models. the changes are minimal enough to cherry-pick into any current llama.cpp release. it was rejected from mainline (the thread doesn't make clear why), but the community is patching it in manually. it only applies to MoE architectures on AMD Strix Halo silicon.

repo: github.com/ggml-org/llama.cpp/pull/21344

why it matters: if you have strix halo hardware and you're running qwen3.6 35B A3B (which is MoE, which is what everyone's running), this is free performance left on the table because it didn't pass review. the beauty of open source: rejected doesn't mean useless.

best comment award

u/farkinga (59 upvotes) on the V100 lawyer update:

"I love your updates; the project is unhinged, you are fully-self-aware, and you're getting real results from a technically difficult build. Nice work."

one comment that names the three things good builder content needs: absurdity, self-awareness, results. if your project hits all three, people will follow along no matter how questionable your hardware choices are.

troll of the day

u/TheLexoPlexx (128 upvotes) responding to elon's 0.5T open grok promise:

"We're also going to get self-driving-vehicles by the End of 2019 2022 2025 2026."

https://en.wikipedia.org/wiki/List_of_predictions_for_autonomous_Tesla_vehicles_by_Elon_Musk

they just linked the wikipedia article. the strikethrough on the earlier years did all the work. u/Familiar_Text_6913 (459 upvotes) kept it even simpler: "elontime.io". and u/VoiceApprehensive893 (312 upvotes) with the kill shot: "right when it becomes so useless that you'd rather use a popular 30b model."

fun facts

  • the top comment on "is qwen3.6 current king for local agentic use?" is literally just "Yes" by u/LeMochileiro (217 upvotes). peak efficiency
  • u/dryadofelysium (172 upvotes) compared the Qwen3.5 35B A3B uncensored heretic release title to XDA Android custom ROM naming conventions. accurate
  • u/oxygen_addiction (52 upvotes) called out PrismML's "Bonsai Image" as a stealth FLUX.2 Klein quant with attribution stripped. 182 upvotes on the post, top comment is basically "this is rebranded"
  • the "Rust bare-metal 66.8 TPS on RTX 3050" post (147 upvotes) has its top comment by u/Qxz3 (75 upvotes) strongly implying the whole thing is AI-generated slop. contradictions in the license claims apparently
  • 148 posts tracked today across 3 subreddits, elon-related promises generated 459 upvotes on a one-word comment

code drop

Strix Halo MoE prefill boost from the rejected PR:

the patch from PR #21344 is reportedly small enough to apply manually to any current llama.cpp build. per u/fallingdowndizzyvr, it only affects MoE model prompt processing on Strix Halo APUs. if you're running qwen3.6 35B A3B or the 122B on that silicon, check the diff and cherry-pick.
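
a minimal sketch of one way to pull the PR down for inspection, assuming a local llama.cpp checkout whose remote "origin" points at ggml-org/llama.cpp. it leans on two standard GitHub behaviors: every PR head is fetchable as pull/<number>/head, and appending .patch to a PR URL returns the diff as a patch file. the helper names and the local branch name are made up for illustration, and none of this has been checked against what the PR actually contains.

```python
"""Build the git commands for inspecting PR #21344 in a local llama.cpp checkout.

Assumptions (not from the thread): the checkout has a remote named "origin"
pointing at github.com/ggml-org/llama.cpp; helper and branch names are
hypothetical.
"""

REPO_URL = "https://github.com/ggml-org/llama.cpp"


def pr_patch_url(pr_number: int) -> str:
    # GitHub serves a pull request as a patch file at <PR URL>.patch
    return f"{REPO_URL}/pull/{pr_number}.patch"


def pr_fetch_commands(pr_number: int, remote: str = "origin") -> list[list[str]]:
    branch = f"pr-{pr_number}"  # hypothetical local branch name
    return [
        # GitHub exposes every PR head under refs/pull/<n>/head
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        # List the commits on the PR branch that your current HEAD lacks
        ["git", "log", "--oneline", f"HEAD..{branch}"],
    ]


if __name__ == "__main__":
    print(pr_patch_url(21344))
    for cmd in pr_fetch_commands(21344):
        print(" ".join(cmd))
```

from there it's a git cherry-pick of whichever commits you want, then a normal rebuild. read the diff first: the PR was rejected, so nobody upstream is vouching for it.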

also worth noting: u/MrMisterShin (44 upvotes) on the Qwen3.6 27B thread recommends enabling MTP (multi-token prediction, used here as the draft for speculative decoding) with a value of 2 or 3 for speed gains on supported models. the uncensored heretic variant from u/LLMFan46 specifically preserves all 785 MTP heads and is available in GGUF, NVFP4, and GPTQ-Int4.

model: huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved
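
for a rough feel of why a draft value of 2 or 3 is the usual sweet spot, here's a back-of-envelope sketch using the standard speculative decoding estimate: with k drafted tokens and a per-token acceptance probability p, the expected tokens per verification pass is (1 - p^(k+1)) / (1 - p). the acceptance probability is an assumption (the thread gives no number), the independence between drafted tokens is a simplification, and draft cost is ignored, so treat this as intuition and not a benchmark.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Standard speculative decoding estimate (1 - p**(k+1)) / (1 - p), assuming
    each drafted token is accepted independently with probability p.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_prob == 1.0:
        # Every draft accepted: k drafted tokens plus the one verified token
        return float(draft_len + 1)
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


if __name__ == "__main__":
    ASSUMED_ACCEPT_PROB = 0.7  # hypothetical value, not measured on any model
    for k in (1, 2, 3, 4):
        print(k, round(expected_tokens_per_step(ASSUMED_ACCEPT_PROB, k), 3))
```

each extra drafted token adds less than the one before it, which is consistent with the advice to stop at 2 or 3 instead of cranking the value up.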

builder takeaways

  • qwen3.6 35B A3B is the current local agentic consensus. multiple threads, multiple users, one answer. runs at IQ4_NL on consumer hardware. if you haven't tried it for tool calling, you're behind
  • MTP matters. the uncensored heretic variant preserves all 785 MTP heads. enable speculative decoding (value 2-3) for measurable speed gains on supported backends
  • china travel restrictions on deepseek/alibaba talent could slow future open-weight releases from those labs. worth watching, not panicking about yet
  • NuExtract3 (4B, Apache-2.0, Qwen3.5-4B base) does image/PDF → structured markdown locally. if you have a document extraction pipeline, this is a drop-in candidate
  • strix halo owners running MoEs: patch in PR #21344 manually for ~30% prefill boost. mainline rejected it but the community says it works
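
on the "runs at IQ4_NL on consumer hardware" point in the first takeaway, a quick sanity check on weight memory. this sketch assumes a flat 4.5 bits per weight, which is what llama.cpp's IQ4_NL block layout works out to (32 four-bit weights plus one fp16 scale). real GGUF files keep some tensors at higher precision, and KV cache and runtime buffers come on top, so read the result as a floor. note that an MoE still needs all 35B parameters resident even though only about 3B are active per token.

```python
IQ4_NL_BITS_PER_WEIGHT = 4.5  # 32 x 4-bit weights + one 16-bit scale per block


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # 35B total parameters, taken from the model name in the takeaway
    print(round(weight_footprint_gb(35, IQ4_NL_BITS_PER_WEIGHT), 2), "GB")
```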

the scoreboard

  • posts tracked: 148
  • total upvotes: 5,555
  • total comments: 2,817
  • subreddits scanned: LocalLLaMA, LocalLLM, MachineLearning
  • fastest rising: "The Financial Times has published an article about Heretic" (velocity: 227.4)