Matrix multiplier machine

Build a machine with the following interface:

  • Input: Clock (1b)
  • Input: Reset (1b)
  • Output: mem_read_en (1b)
  • Output: mem_read_addr (20b)
  • Input: mem_read_data (64b), arrives 3 clocks after mem_read_en is asserted
  • Output: mem_write_en (1b)
  • Output: mem_write_addr (20b)
  • Output: mem_write_data (64b)
  • Input: multiply_en (1b)
  • Output: ready (1b)
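
The read latency is the part of this interface that most affects the design, so a small behavioral model of the read port may help when testing a controller against it. The following is a minimal sketch in Python, not part of the assignment: the class name ReadPort and its clock() method are hypothetical, and it assumes "3 clocks after" means the data appears on the third clock edge after the edge on which mem_read_en was sampled.

```python
from collections import deque

READ_LATENCY = 3  # clocks between mem_read_en and valid mem_read_data (from the spec)


class ReadPort:
    """Hypothetical behavioral model of the external memory's read port."""

    def __init__(self, contents):
        # contents: dict mapping word address -> 64-bit value (unset words read as 0)
        self.contents = contents
        # One slot per cycle of latency; None means no read was issued that cycle.
        self.pipe = deque([None] * READ_LATENCY)

    def clock(self, read_en, read_addr):
        """Advance one clock edge; return mem_read_data, or None if no data is due."""
        self.pipe.append(read_addr if read_en else None)
        due = self.pipe.popleft()
        return None if due is None else self.contents.get(due, 0)
```

Because the model is a pipeline, a new read can be issued on every clock and the data returns in the same order, one word per clock, each 3 clocks after its request.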

The machine has access to external memory that contains 2 matrices of 16-bit elements.
The machine can read 4 elements (64 bits) every clock cycle and write 4 elements (64 bits) every clock cycle.
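Matrix A (size 1024×1024) is located at address 0 and above.
Matrix B (size 1024×1024) is located at address 256K and above.
At 4 elements per 64-bit word, each matrix occupies 256K words, which matches the spacing of the base addresses if addresses count 64-bit words.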
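
The address arithmetic implied by this layout can be sketched as follows. This is an illustration under assumptions the text does not state: row-major storage, addresses counted in 64-bit words, element 0 of each word in bits 15:0, and 16-bit result elements (so the product also occupies 256K words). The helper names locate and extract are hypothetical.

```python
N = 1024                  # matrix dimension (from the spec)
ELEMS_PER_WORD = 4        # 64-bit word / 16-bit element
WORDS_PER_MATRIX = N * N // ELEMS_PER_WORD   # 262144 = 256K words

BASE_A = 0                # Matrix A
BASE_B = 256 * 1024       # Matrix B
BASE_C = 512 * 1024       # result region


def locate(base, row, col):
    """Return (word_address, lane) of element [row][col], assuming row-major order."""
    index = row * N + col
    return base + index // ELEMS_PER_WORD, index % ELEMS_PER_WORD


def extract(word, lane):
    """Pull one 16-bit element out of a 64-bit word (lane 0 = bits 15:0, assumed)."""
    return (word >> (16 * lane)) & 0xFFFF
```

For example, locate(BASE_B, 0, 0) gives word 262144, lane 0, and the last element of the result region lands at word 786431, which still fits in the 20-bit address range.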

Once the signal multiply_en is asserted by the user, the machine will:

  1. Deassert the output ready
  2. Multiply the two matrices and write the result to address 512K and above
  3. Assert the output ready
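
The three steps above amount to a two-state controller around the datapath. A minimal sketch in Python, with several assumptions labeled: the class name Control and the internal done flag are hypothetical, reset is treated as synchronous, ready is taken to be high after reset, and multiply_en is ignored while a multiplication is in progress. None of these details are fixed by the text.

```python
class Control:
    """Hypothetical two-state controller: IDLE (ready = 1) and BUSY (ready = 0)."""

    def __init__(self):
        self.busy = False

    @property
    def ready(self):
        return not self.busy

    def clock(self, reset, multiply_en, done):
        """One clock edge. `done` is an assumed internal 'last result written' flag."""
        if reset:
            self.busy = False          # assumed: machine is ready after reset
        elif not self.busy and multiply_en:
            self.busy = True           # step 1: ready is deasserted
        elif self.busy and done:
            self.busy = False          # step 3: ready is asserted again
```

Step 2, the multiplication itself, happens while the controller is in the BUSY state; the datapath would raise done once the last word has been written to the result region.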

Design requirements:

  1. Build a machine that executes the task as fast as possible (area is not a concern).
  2. Build a separate machine that executes the task using only 4 multipliers (16b×16b→32b each).
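
A rough cycle budget helps frame both requirements. The arithmetic below is a back-of-the-envelope sketch, not a design: it gives lower bounds only, ignores the 3-clock read latency and any pipeline fill, and assumes 16-bit result elements. The variable names are illustrative.

```python
N = 1024
ELEMS_PER_WORD = 4

# Requirement 2: an N×N by N×N product needs N^3 multiplications.
macs = N ** 3
cycles_4_mult = macs // 4                  # 4 multipliers, one product each per clock

# Requirement 1: with unlimited area, memory bandwidth becomes the limit.
words_per_matrix = N * N // ELEMS_PER_WORD
min_read_cycles = 2 * words_per_matrix     # every word of A and B read at least once
write_cycles = words_per_matrix            # assumes 16-bit results; may overlap reads

print(cycles_4_mult)      # 268435456
print(min_read_cycles)    # 524288
```

Reaching the read-bound figure would require enough on-chip storage that no input word is fetched twice. The 4-multiplier figure likewise assumes operands are reused on chip, since one 64-bit read per clock cannot supply fresh A and B operands to all four multipliers every cycle.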
