

Oh, I forgot!
You should check out Lemonade:
https://github.com/lemonade-sdk/lemonade
It supports Ryzen NPUs via two different runtimes… though apparently not the 8000 series yet?


The most screwed up thing is that it doesn’t even matter, because it’s old news. Decades of lying and controversy (predating his political candidacy) somehow… don’t meet the attention threshold for the algorithms? Is that the phrase?
It’s especially weird because I have older relatives who knew way more about “pre-politics Trump” than I did, and now all of that is somehow forgotten.


To be fair, that game does a lot of stuff.
But yes, it’s extremely focused too. It’s so medieval it hurts.
They also lucked out picking CryEngine, as (for their use case) it works unbelievably well. Many AAAs fall into development hell wrangling engines, and they easily could have done the same.


Yeah… Even if the LLM is RAM-speed constrained, simply using another device so as not to interrupt it would be good.
Honestly, AMD’s software dev efforts are baffling. They’ve focused on a few libraries precisely no one uses, like this: https://github.com/amd/Quark
While ignoring issues holding back entire sectors (like broken flash-attention) with devs screaming about it at the top of their lungs.
Intel suffers from corporate Game of Thrones, but at least they have meaningful contributions in the open source space here, like the SYCL/AMX llama.cpp code or the OpenVINO efforts.


It still uses memory bandwidth, unfortunately. There’s no way around that, though NPU TTS would still be neat.
…Also, generally, STT responses can’t be streamed, so you might as well use the iGPU anyway. TTS can be chunked, I guess, but do the major implementations do that?
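The chunking idea is simple enough to sketch (a toy sketch only: the token stream is faked, and in real use each yielded sentence would go to a hypothetical TTS call): accumulate streamed text and flush each complete sentence as soon as it appears, so audio playback can start before the full reply exists.

```python
import re

def sentence_chunks(token_stream):
    """Yield complete sentences as soon as they appear in a token stream."""
    buf = ""
    for tok in token_stream:
        buf += tok
        # Flush on sentence-ending punctuation followed by whitespace.
        while True:
            m = re.search(r"[.!?]\s+", buf)
            if not m:
                break
            yield buf[: m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush whatever trails after the last boundary

# Faked LLM token stream for illustration:
tokens = ["Hel", "lo there. ", "How are", " you? ", "Fine"]
print(list(sentence_chunks(tokens)))  # → ['Hello there.', 'How are you?', 'Fine']
```

In a real pipeline, each yielded chunk would be handed to the TTS engine while the next sentence is still being generated.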


exempting owner-occupied homes

It would still suck for anyone stuck renting. It would disincentivize renting, but it would still suck in the short term.


Don’t get me wrong. Some YouTubers are great and informative, and I adore those random washing machine repair videos… But yeah, as a reference it’s an awful format.
It’s like how discussion has mostly moved from forums, to Reddit, and now to Discord. I get it, it’s highly engaging since it pings your phone and folks shoot the breeze, but it’s an information black hole.


Okay. Just because it was proved doesn’t mean they agree.


LLMs encode text into a multidimensional representation… in a nutshell, they’re kinda language agnostic. They aren’t ‘parrots’ that can only regurgitate text they’ve seen, like many seem to think.
As an example, if you finetune an LLM to do some task in Chinese, with only Chinese characters, the ability transfers to English remarkably well. Or to Japanese, if it knows Japanese. Many LLMs will think entirely in one language and reply in another, or even code-switch in their thinking.
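A toy illustration of that shared-representation idea (the vectors below are made up for the example, not taken from any real model): sentences with the same meaning in different languages land near each other in embedding space, so their cosine similarity stays high across languages while unrelated meanings score low.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-d embeddings; a real LLM uses thousands of dimensions
# produced by the model itself.
emb = {
    "the cat sleeps": [0.90, 0.10, 0.80, 0.20],   # English
    "le chat dort":   [0.88, 0.12, 0.79, 0.22],   # French, same meaning
    "the stock fell": [0.10, 0.90, 0.20, 0.70],   # unrelated meaning
}

print(cosine(emb["the cat sleeps"], emb["le chat dort"]))    # high, ~1.0
print(cosine(emb["the cat sleeps"], emb["the stock fell"]))  # much lower
```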


…Just because it was explained doesn’t mean they agree.


The iGPU is more powerful than the NPU on these things anyway. The NPU is more for ‘background’ tasks, like Teams audio processing or whatever it’s used for on Windows.
Yeah, in hindsight, AMD should have assigned (and still should assign) a few engineers to popular projects (and pushed NPU support harder), but GGML support is good these days. It’s gonna be pretty close to RAM-speed-bound for text generation.
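The RAM-speed-bound claim follows from simple arithmetic (the numbers below are illustrative, not measurements): generating one token streams roughly the whole set of active weights from memory once, so tokens/s is capped near memory bandwidth divided by model size.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    # Each generated token reads ~all active weights from RAM once,
    # so memory bandwidth sets a hard ceiling on generation speed.
    return bandwidth_gb_s / model_gb

# e.g. dual-channel DDR5-5600 (~90 GB/s theoretical) running an 8B model
# quantized to ~4.5 GB: at best ~20 tokens/s, no matter how fast the compute is.
print(max_tokens_per_sec(90, 4.5))  # → 20.0
```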


Ah. On an 8000-series APU, to be blunt, you’re likely better off with Vulkan + whatever omni models GGML supports these days. Last I checked, text generation is faster and prompt processing is close to ROCm.
…And yeah, that was total false advertising on AMD’s part. They’ve completely diluted the term, kinda like TV makers did with ‘HDR’.


…Spongebob?
I guess this is true of lots of Western animation, but it’s particularly egregious here. At least with the classic episodes.
…That’s not a rule though. As an example, DCAU and YJ keep a lot of shit straight, somehow.


The king of “galaxy-altering implications that are never spoken of again”


…Yeah.
Ironically, they seem to have figured ‘low end’ is where the sales are. Yet my guess is most ‘low end’ buyers pick Nvidia by default, and folks upgrading from anything aim higher than a B580.
I would’ve killed for a 512-bit Arc card (other than the unobtanium datacenter GPUs), but that’s a fever dream for now…
Maybe it’s an ADD thing, or an ‘aging millennial shaking their fist’ thing, but video is soooo slow.
For reference or discussion, I always seek text first, to the point I’ll even download/make transcripts if video’s the only place I can find something. They just have so much filler.


You can do hybrid inference of Qwen 30B omni for sure. Or fully offload inference of Vibevoice Large (9B). Or really a huge array of models.
…The limiting factor is free time, TBH. Just sifting through the sea of models, seeing if they work at all, testing whether quantization works and such is a huge timesink, especially if you’re trying to load stuff with ROCm.


Sounds about right :(
Stellaris was like that early in its life, too.


Then the video stops loading.
…They clearly didn’t test scrubbing enough with neurodivergent users.