Matrix multiplier machine
Build a machine with the following interface:
- Input: Clock (1b)
- Input: Reset (1b)
- Output: mem_read_en (1b)
- Output: mem_read_addr (20b)
- Input: mem_read_data (64b), arrives 3 clocks after mem_read_en is asserted
- Output: mem_write_en (1b)
- Output: mem_write_addr (20b)
- Output: mem_write_data (64b)
- Input: multiply_en (1b)
- Output: ready (1b)
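The read timing in the interface can be sketched as a small behavioral model, which may be handy when testing a design in simulation. This is only a sketch under assumptions: the class name `MemoryModel` and its `clock` method are hypothetical, the data is sampled when the read is issued, and "3 clocks after" is taken to mean that a read issued in cycle t is visible in cycle t+3.

```python
from collections import deque

READ_LATENCY = 3  # clocks between mem_read_en and mem_read_data, per the interface


class MemoryModel:
    """Hypothetical behavioral model of the external memory (not part of the spec)."""

    def __init__(self, words=None):
        self.words = dict(words or {})  # word address -> 64-bit value
        # One slot per in-flight read; None marks a cycle with no read issued.
        self.pipe = deque([None] * READ_LATENCY, maxlen=READ_LATENCY)

    def clock(self, read_en=0, read_addr=0, write_en=0, write_addr=0, write_data=0):
        """Advance one clock. Returns mem_read_data for this cycle, or None."""
        data_out = self.pipe[0]  # read that was issued READ_LATENCY clocks ago
        # Assumption: the addressed word is sampled at issue time.
        issued = self.words.get(read_addr, 0) if read_en else None
        self.pipe.append(issued)  # maxlen drops the oldest slot automatically
        if write_en:
            self.words[write_addr] = write_data & (2**64 - 1)
        return data_out


# Issue one read in cycle 0, then idle for three cycles.
mem = MemoryModel({5: 0xABCD})
outputs = [mem.clock(read_en=1, read_addr=5)] + [mem.clock() for _ in range(3)]
```

Because a new read can be issued every clock, a design would normally keep several reads in flight rather than wait out the latency for each one.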
The machine has access to external memory that contains 2 matrices of 16-bit elements.
The machine can read 4 elements (64 bits) every clock cycle and write 4 elements (64 bits) every clock cycle.
Matrix A (size 1024×1024) is located at address 0 and above.
Matrix B (size 1024×1024) is located at address 256K and above.
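The memory map above can be checked with a little arithmetic. The sketch below assumes that addresses count 64-bit words (one 1024×1024 matrix of 16-bit elements then fills exactly 256K words, which matches the spacing of the base addresses), that matrices are stored row-major, and that four consecutive elements of a row share one word. None of these are stated in the text, and the helper `element_location` is a hypothetical name.

```python
N = 1024                    # matrix dimension from the spec
ELEMS_PER_WORD = 4          # 64-bit word / 16-bit element
WORDS_PER_MATRIX = N * N // ELEMS_PER_WORD  # 262144 = 256K words

A_BASE = 0
B_BASE = 256 * 1024
RESULT_BASE = 512 * 1024


def element_location(base, row, col):
    """Return (word address, lane within word) for one element.

    Assumes row-major storage and word addressing; both are assumptions.
    """
    index = row * N + col
    return base + index // ELEMS_PER_WORD, index % ELEMS_PER_WORD


first_a = element_location(A_BASE, 0, 0)
last_b = element_location(B_BASE, N - 1, N - 1)
```

Under these assumptions the last word of B sits immediately below the result base, and every address fits in the 20-bit address ports.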
Once the signal multiply_en is asserted by the user, the machine will:
- Deassert the output ready
- Multiply the two matrices and write the result to address 512K and above
- Assert the output ready
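A software reference for the multiply step is useful for checking whatever the machine writes back. The sketch below is a plain triple-loop product, not a description of the hardware. The text does not say whether elements are signed or how wide the result elements are, so the function assumes unsigned inputs and keeps only the low `out_bits` bits of each sum; both choices, and the name `reference_multiply`, are assumptions.

```python
def reference_multiply(a, b, out_bits=16):
    """Multiply two square matrices given as lists of rows.

    Assumptions (not from the spec): elements are unsigned and each result
    is truncated to out_bits bits.
    """
    n = len(a)
    mask = (1 << out_bits) - 1
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one 16b x 16b -> 32b product
            row.append(acc & mask)
        result.append(row)
    return result


small = reference_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Running the same routine on small matrices first keeps a simulation quick, and the full 1024×1024 case only changes the input size.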
Design requirements:
- Build a machine that executes the task as fast as possible (area is not a concern).
- Build a separate machine that executes the task using only 4 multipliers (16b×16b→32b each).
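The two requirements pull toward different bottlenecks, which a rough cycle count makes visible. The sketch below assumes the straightforward algorithm with one product per (i, j, k) triple, one multiply per multiplier per clock, and, for the read bound, that each input word is fetched exactly once (which would need enough on-chip storage). These are back-of-the-envelope lower bounds under those assumptions, not a proposed design.

```python
N = 1024                 # matrix dimension from the spec
ELEMS_PER_WORD = 4       # 64-bit word / 16-bit element
NUM_MULTIPLIERS = 4      # second design requirement

# Compute bound for the 4-multiplier machine.
total_multiplies = N ** 3
min_multiply_cycles = total_multiplies // NUM_MULTIPLIERS

# Memory bound: one 64-bit read per clock, A and B each read once.
words_per_matrix = N * N // ELEMS_PER_WORD
min_read_cycles = 2 * words_per_matrix

print(min_multiply_cycles)                     # 268435456
print(min_read_cycles)                         # 524288
print(min_multiply_cycles // min_read_cycles)  # 512
```

With these numbers the 4-multiplier machine is limited by arithmetic, while a machine with unlimited area is limited by how quickly the operands can be read in.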