The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size, at 192 GB.
The M1 Ultra maxes out at 128 GB of RAM.
With these RAM numbers, roughly 2/3 of the total is available for inference.
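To put rough numbers on that 2/3 rule of thumb, here is a quick sketch. The 2/3 fraction is just the approximation above, not an official figure, and the helper name `usable_gb` is made up for illustration.

```python
# Rule of thumb from the comment: about 2/3 of unified memory is usable
# for inference. This fraction is an approximation, not a published spec.
USABLE_FRACTION = 2 / 3


def usable_gb(total_gb: float, fraction: float = USABLE_FRACTION) -> float:
    """Return the approximate memory (in GB) available for inference."""
    return total_gb * fraction


# Maximum RAM configurations mentioned above.
max_ram_gb = {"M1 Ultra": 128, "M2 Ultra": 192}

for chip, total in max_ram_gb.items():
    print(f"{chip}: ~{usable_gb(total):.0f} GB of {total} GB usable")
```

By this estimate a maxed-out M1 Ultra leaves about 85 GB for the model and its cache, and a maxed-out M2 Ultra about 128 GB.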
So I see no reason not to make a general recommendation for the M1 Ultra, unless you have some reason you want to run q5_K_M 1...