Blog — UnminerMac engineering notes

A 5-post series. Read in order or jump in anywhere. The arc: get any share accepted → make it fast → port it to the GPU → learn why that doesn't help. The code is MIT-licensed, with links throughout.

Part 1 · debugging · v0.18

Why every share was rejected

The pool answered "low difficulty share" to every submission. The algorithm was right; the preprocessing wasn't. Three days of hunting led to a single missing blake2b call with a specific personalization string.

Read part 1 →

Part 2 · reverse-engineering · v0.18

Reverse-engineering LuckPool's hash pipeline

VerusHash on PBaaS chains doesn't hash the raw block header. It hashes a canonically cleared version that has a blake2b("VerusDefaultHash", …) digest embedded at a specific offset. Here's how we found that out.
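
As a rough illustration of what a personalized BLAKE2b call looks like, here is a minimal Python sketch using the standard library. Only the personalization string comes from the text above. The 32-byte digest size, the placeholder input, and the `embed_digest` helper with its `offset` parameter are assumptions made for the example, not the miner's actual code.

```python
import hashlib

# The personalization string quoted above. It is exactly 16 bytes,
# which is the maximum BLAKE2b allows.
PERSONAL = b"VerusDefaultHash"


def personalized_digest(data: bytes, digest_size: int = 32) -> bytes:
    """BLAKE2b over `data` with the personalization string applied.

    digest_size=32 is an assumption made for this sketch.
    """
    return hashlib.blake2b(data, digest_size=digest_size, person=PERSONAL).digest()


def embed_digest(header: bytes, digest: bytes, offset: int) -> bytes:
    """Return a copy of `header` with `digest` written at `offset`.

    Hypothetical helper: the real offset is not stated here.
    """
    if offset < 0 or offset + len(digest) > len(header):
        raise ValueError("digest does not fit at that offset")
    return header[:offset] + digest + header[offset + len(digest):]


if __name__ == "__main__":
    placeholder = bytes(64)  # stand-in input, not a real block header
    with_person = personalized_digest(placeholder)
    without_person = hashlib.blake2b(placeholder, digest_size=32).digest()
    # The personalization changes the digest even though the input is identical.
    print(with_person.hex())
    print(without_person.hex())
```

The point of the sketch is that a personalized digest and a plain one share no bits in any useful sense, so leaving the personalization out produces a hash the pool will never accept.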

Read part 2 →

Part 3 · performance · v0.19

The 3.8× speedup hiding in the inner loop

Once shares were being accepted, the miner ran at 1.04 MH/s. Per-job work was being redone for every nonce. Splitting "what changes every iteration" from "what's constant for the duration of a job" took the same M5 from 1.04 to 3.94 MH/s.
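
The general shape of that split can be sketched in a few lines of Python. This is a toy stand-in, not VerusHash: it uses BLAKE2b from the standard library, and the names `scan_naive`, `scan_hoisted`, and `job_prefix` are hypothetical. The idea shown is only that state depending on the job alone is computed once, then copied for each nonce.

```python
import hashlib


def scan_naive(job_prefix: bytes, nonces: range) -> list[bytes]:
    """Re-absorb the whole job prefix for every nonce."""
    return [
        hashlib.blake2b(job_prefix + n.to_bytes(4, "little")).digest()
        for n in nonces
    ]


def scan_hoisted(job_prefix: bytes, nonces: range) -> list[bytes]:
    """Absorb the job prefix once, then copy that state per nonce."""
    base = hashlib.blake2b(job_prefix)  # per-job work, done once
    results = []
    for n in nonces:
        h = base.copy()                     # cheap copy of the hash state
        h.update(n.to_bytes(4, "little"))   # per-nonce work only
        results.append(h.digest())
    return results


if __name__ == "__main__":
    prefix = bytes(1024)  # stand-in for the constant part of a job
    # Both scans must agree; only the amount of repeated work differs.
    print(scan_naive(prefix, range(8)) == scan_hoisted(prefix, range(8)))
```

Both functions return identical digests, so the hoisted version is a pure performance change that can be checked against the naive one.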

Read part 3 →

Part 4 · GPU · v0.19

Five Metal primitives, byte-perfect

Porting Haraka256/512, clmul64, mulhrs, and precompReduction64 to Metal Shading Language, validated against ~2,100 test vectors. T-table AES, the carryless multiply Metal doesn't have, and the GF(2⁶⁴) reduction that closes the pipeline.
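
For readers unfamiliar with two of those primitives, here is a small Python reference model of the kind that can generate test vectors. It assumes `clmul64` is a 64×64 → 128-bit carryless multiply and that `mulhrs` follows the per-lane semantics of the x86 PMULHRSW instruction (`_mm_mulhrs_epi16`). The function names and signatures are illustrative, not the Metal code from the post.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carryless (GF(2) polynomial) multiply of two 64-bit values.

    Like schoolbook multiplication, but partial products are combined
    with XOR instead of addition. The result has up to 127 bits.
    """
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def mulhrs16(a: int, b: int) -> int:
    """One signed 16-bit lane of PMULHRSW: rounded high half of a*b.

    Assumes a and b are already in the signed range [-32768, 32767].
    """
    r = ((a * b >> 14) + 1) >> 1
    r &= 0xFFFF  # keep 16 bits, then reinterpret as signed
    return r - 0x10000 if r & 0x8000 else r


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), with no carry into the x term.
    print(bin(clmul64(0b11, 0b11)))
    print(mulhrs16(0x4000, 0x4000))
```

A slow reference like this is only useful for checking a GPU kernel's output on known inputs; it says nothing about how the Metal version is structured.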

Read part 4 →

Part 5 · honest postmortem · v0.20

We built the GPU miner. Then we learned why it doesn't matter.

First publicly known Apple Silicon Metal port of VerusHash 2.2. Pool-accepted shares on the first live test. Hashrate: 0.51 MH/s on the GPU vs 3.88 MH/s on the same M5's CPU. Same story as the RTX 3080. The honest performance story, the M5 GPU spec dive, and why we're shipping it anyway.

Read part 5 →