Engineering notes

A working postmortem of building VerusHash mining on Apple Silicon — from the first rejected shares through the Metal GPU experiment that taught us why VerusHash favors CPUs everywhere, not just on Apple.

Verox Studio · M5 (10-core, 4P+6E, macOS Tahoe 26.4) · UnminerMac v0.20.0

A 5-post series. Read in order or jump in anywhere. The arc: get any share accepted → make it fast → port it to the GPU → learn why that doesn't help. Code is MIT-licensed, links throughout.

Why every share was rejected

The pool answered every submission with "low difficulty share". The algorithm was right; the preprocessing wasn't. Three days of hunting led to a single missing blake2b call with a specific personalization string.

Read part 1 →

Reverse-engineering LuckPool's hash pipeline

VerusHash on PBaaS chains doesn't hash the raw block header. It hashes a canonically-cleared version that has a specific blake2b("VerusDefaultHash", …) digest embedded at a specific offset. Here's how we found that out.
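As a rough illustration of what a personalized blake2b call looks like, here is a minimal Python sketch using the standard library. Only the personalization string comes from the text above. The 32-byte digest size, the helper names, the choice of input bytes, and the offset are all assumptions made for the example, not the actual LuckPool or Verus layout.

```python
import hashlib

# The personalization string named in the post. It is exactly 16 bytes,
# which is the maximum BLAKE2b allows for the "person" parameter.
PERSONAL = b"VerusDefaultHash"


def personalized_digest(data: bytes) -> bytes:
    """BLAKE2b over `data` with the personalization applied.

    digest_size=32 is an assumption made for this sketch.
    """
    return hashlib.blake2b(data, digest_size=32, person=PERSONAL).digest()


def embed_digest(header: bytes, digest: bytes, offset: int) -> bytes:
    """Return a copy of `header` with `digest` written at `offset`.

    Hypothetical helper: the real offset and the set of fields that get
    cleared first are not given in this summary.
    """
    if offset < 0 or offset + len(digest) > len(header):
        raise ValueError("digest does not fit at this offset")
    buf = bytearray(header)
    buf[offset:offset + len(digest)] = digest
    return bytes(buf)


if __name__ == "__main__":
    header = bytes(140)  # placeholder bytes, not a real block header
    digest = personalized_digest(b"example input")
    patched = embed_digest(header, digest, offset=4)  # offset is made up
    print(patched[4:36].hex())
```

The point of the sketch is that a personalized digest differs from a plain blake2b digest of the same bytes, so leaving out the personalization yields a well-formed hash that the pool will still reject.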

Read part 2 →

The 3.8× speedup hiding in the inner loop

Once shares were accepting, the miner ran at 1.04 MH/s. Per-job work was being redone per-nonce. Splitting "what changes every iteration" from "what's constant for the duration" took the same M5 from 1.04 to 3.94 MH/s.
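The idea of separating per-job work from per-nonce work can be sketched in a few lines of Python. This is not VerusHash: SHA-256 stands in as the hash, and the function names and 4-byte little-endian nonce are assumptions chosen for the example. It only shows the shape of the optimization, which is to absorb the constant part of the input once per job and reuse that state for every nonce.

```python
import hashlib
import struct


def hashes_naive(job_prefix: bytes, nonces: range) -> list[bytes]:
    """Re-hash the constant job prefix for every single nonce."""
    return [
        hashlib.sha256(job_prefix + struct.pack("<I", n)).digest()
        for n in nonces
    ]


def hashes_hoisted(job_prefix: bytes, nonces: range) -> list[bytes]:
    """Absorb the constant prefix once per job, then only add the nonce."""
    base = hashlib.sha256(job_prefix)  # per-job work, done once
    out = []
    for n in nonces:
        h = base.copy()                # cheap copy of the saved state
        h.update(struct.pack("<I", n))  # per-nonce work only
        out.append(h.digest())
    return out


if __name__ == "__main__":
    prefix = b"\x00" * 1024  # stand-in for the constant part of a job
    assert hashes_naive(prefix, range(100)) == hashes_hoisted(prefix, range(100))
    print("both variants produce identical hashes")
```

Both functions return the same digests, so the hoisted version can be checked against the naive one before trusting its speed.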

Read part 3 →

Five Metal primitives, byte-perfect

Porting Haraka256/512, clmul64, mulhrs, and precompReduction64 to Metal Shading Language, validated against ~2,100 test vectors. T-table AES, the carryless multiply Metal doesn't have, and the GF(2^64) reduction that closes the pipeline.
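For readers unfamiliar with carryless multiplication, here is a small pure-Python reference of the operation. It is a sketch, not the Metal code: the name clmul64 is borrowed from the text above, while the signature and the (low, high) return convention are assumptions. A slow reference like this is the kind of thing one could use to generate test vectors for a GPU port.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> tuple[int, int]:
    """Carryless (polynomial over GF(2)) multiply of two 64-bit values.

    Works like schoolbook shift-and-add multiplication, but partial
    products are combined with XOR, so no carries propagate.
    Returns the 128-bit product as (low 64 bits, high 64 bits).
    """
    a &= MASK64
    b &= MASK64
    product = 0
    while b:
        if b & 1:
            product ^= a  # XOR instead of add
        a <<= 1
        b >>= 1
    return product & MASK64, product >> 64


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2): 0b11 * 0b11 = 0b101
    print(clmul64(3, 3))
    # x^63 * x = x^64, which lands in the high word
    print(clmul64(1 << 63, 2))
```

With ordinary integer multiplication 3 × 3 is 9. The carryless result is 5 because the two middle terms cancel under XOR.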

Read part 4 →

We built the GPU miner. Then we learned why it doesn't matter.

First publicly-known Apple Silicon Metal port of VerusHash 2.2. Pool-accepted shares on the first live test. Hashrate: 0.51 MH/s vs the same M5's CPU at 3.88 MH/s. Same story as the RTX 3080. The honest performance story, the M5 GPU spec dive, and why we're shipping it anyway.

Read part 5 →