Skip to the content.

First steps

Understand Output

While optimizing, CryptOpt will output the current status of the optimization. Each line has this format:

fiat_curve25519_carry_square|1/10| 14|bs  181|#inst: 140|cyclΔ     70|G  58 cycl σ  0|B  59 cycl σ  0|L  55|l/g 0.9519| P|P[ -14/   0/  14/ -11]|D[MU/  1/ 31/ 59]| 90.0( 1%)  60/s

Lets break this down:

Field Example Comment
Symbol fiat_curve25519_carry_square The symbol being optimized.
Comment 1/10 Arbitrary comment. Usually used in Bet-n-run mode. Then, it means bet 1 from 10, after that it’ll say run.
Stack size 14 How many spills to memory there are. E.g. 6 for all spills of the six callee-saved registers
Batch Size bs 181 BS in the paper, How big is the batch. i.e. how many iterations of Symbol are counted
Instr. Count #inst: 140 How many instructions are used to implement the Symbol
Raw Cycle Delta cyclΔ 70 Measure both batches nob=31 times, take difference of medians. (This is that delta). Based on this a mutation is kept or not.
Cycles +stddev (good) G 58 cycl σ 0 Number of cycles for the good candidate, scaled by bs i.e. per on one iteration. Also states the stdDev of the nob measurements
Cycles +stddev (bad) B 59 cycl σ 0 Same, but for the bad candidate
Cycles Library L 55 Cycles that the CC-Compiled version takes
Ratio l/g 0.9519 Ratio of cycles lib / cycles good. i.e. 55 / 58 -> 0.9519 (uses the non-scaled counts) This is green if the ratio is >1 which means that CryptOpt Code is faster than CC’s.
Mutation P Which mutation has been applied. Permuation or Decision. (Permutation means mutation in operation order; Decision means which template to use, or how to load operands.)
P-Mutation-Detail P[ -14/ 0/ 14/ -11] Details on last P-Mutation. [steps to go back / steps to go forward / absolute index of operation to move / applied relative movement ]
D-Mutation-Detail D[MU/ 1/ 31/ 59] Details on last D-Mutation. [new chosen template / absolute index of operation to change decision / number of operations with changeable decisions / number of operations with decisions]
Progress, speed 90.0( 1%) 60/s Number of the current Mutation, then in Percent, then how many mutations per second can be evaluated. This is green if the mutation is kept.

More on the D-Mutation-Details:

Template Short Description
AR A different argument is loaded from memory
KK The flags CF/ OF flags are cleared differently
FL A different flag CF/ OF is used as an accumulator
B& For a binary-and a different instruction-template is used
MU For a multiplication-with-immediate a different instruction-template is used
IM A different immediate value is loaded
SL Spill location changed spill-to-memory <-> spill-to-xmm

Understand Output Files

While Optimizing, CryptOpt will generate files in the ./results/<BRIDGE>/<SYMBOL> folder (adjustable with --resultDir parameter to ./CryptOpt).

CryptOpt writes out intermediate ASM-files whenever it finishes a bet and an additional file when finished the run. CryptOpt also generates the internal state (in *.json files) of the optimization for each bet-outcome. The directory also contains a *.dat (space separated) file with rows for every bet/run containing the l/g value every time it is printed to the terminal. From that *.dat file, the generated *.gp file will generate the *.pdf file, which shows the optimization in progress. Additionally, (currently WIP) there is a *.csv file logging all the mutations and which ones were kept.

Play around w/ Fiat

We can give CryptOpt a wide range of parameters:

Parameter default possible / typical values description
–bridge fiat fiat, bitcoin-core, manual which connection i.e. input should be used.
–evals 10000 100, 1k, 100k, 1M t; How many mutations to evaluate
–curve curve25519 curve25519, p224, p256, p384, p434, p448_solinas, p521, poly1305, secp256k1 which field - this determines the prime, the implementation strategy and number of limbs
–method square mul, square Method (i.e. function) to optimize. Multiply or Square
–cyclegoal 10000 1, 100, 100000 How many cycles to measure, and adjust batch size bs accordingly
–bets 10 10, 30, 100 How many ‘bets’ for the bet-and-run heuristic
–betRatio 0.2 0.1, 0.3 The share from parameter --evals, which are spent for all bets, in per cent (i.e. 0.2 means 20% of –evals will be used for the bet part, and 80% for the final run-part)
–resultDir ./results /tmp/myresults The directory under which <BRIDGE>/<SYMBOL> will be created and the result files will be stored
–no-proof   –no-proof, –proof [dis|en]ables the Fiat-Proofing system. It is enabled by default for fiat-bridge, disabled for the rest.
–memoryConstraints none none, all, out1-arg1 none: any argN[n] can be read after any outN[n] as been written. That is okay, if all memoy is distinct. ‘out1-arg1’ specifies that arg1[n] cannnot be read, if out1[n] has been written. It may read arg1[n+m] after out1[n] has been written. Use that if you want to call mul(r,r,x) or sq(a,a). all: no argN[n] can be read if any outN[n] has been written. Use that if memory can be overlapping and unaligned.

For more information check ./CryptOpt --help

As next example, use CC=clang ./CryptOpt --curve p256 --method mul --evals 10k to generate an optimized version for NIST P-256 multiply and compare the function with clang-compiled version of the C-equivalent.

Play around w/ Bitcoin

  1. Run ./CryptOpt --bridge bitcoin-core --curve secp256k1 --method mul --bets 5
    This will try 5 different bets for the primitive mul of libsecp256k1.

  2. Find the result files (*.asm,*.pdf) for this run in ./results/bitcoin-core/secp256k1_fe_mul_inner/