Here's a 16-byte (16x 8-bit words, 128 bits total) extensible SRAM unit I completed today. It measures about 90x35x8 blocks and it's based on this D-latch design. This unit can be extended both vertically (more capacity, though that requires adding more address lines on the left) and horizontally (larger word size). I started out with a 4x 4-bit word design and expanded it to this, so extending it isn't very difficult.
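For anyone who wants to check the logic outside the game, here is a minimal behavioral sketch of a unit like this. It assumes each cell is a level-sensitive D-latch and that only the addressed row sees the clock pulse. The names `DLatch` and `Sram` are made up for illustration, and nothing here models redstone delays.

```python
class DLatch:
    """Level-sensitive latch: follows d while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d & 1
        return self.q


class Sram:
    """Hypothetical model: `words` rows of `width` latches each."""

    def __init__(self, words=16, width=8):
        self.words = words
        self.width = width
        self.cells = [[DLatch() for _ in range(width)] for _ in range(words)]

    def write(self, address, value):
        # Every row receives the data lines, but only the addressed row
        # receives the clock pulse (enable), so only that row changes.
        for row, latches in enumerate(self.cells):
            enable = 1 if row == address else 0
            for bit, latch in enumerate(latches):
                latch.update((value >> bit) & 1, enable)

    def read(self, address):
        return sum(latch.q << bit for bit, latch in enumerate(self.cells[address]))


ram = Sram()          # 16 words of 8 bits, as in the post
ram.write(3, 0xA5)
ram.write(4, 0x0F)
```

Extending "vertically" corresponds to raising `words` and extending "horizontally" to raising `width`.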
Unfortunately the performance isn't great. It takes 39 ticks for the address lines to settle and 17 for the furthest memory cell to update when the clock is pulsed. The addressing can probably be improved a bit by doubling up on address lines (non-inverted and inverted states), but performance is never going to be very good due to the large area that has to be traversed. This is bad enough that I'm considering using some kind of cache, or possibly implementing a serial bus to save space (might as well if addressing is going to take a long time).
Thoughts? I know a few people here have been working on computer parts and I'm interested in seeing some other designs.
Unfortunately the performance isn't great. It takes 39 ticks for the address lines to settle and 17 for the furthest memory cell to update when the clock is pulsed.
I wonder how that compares to real computers of perhaps a few years ago? With hyperthreading and pipelining, modern processors are able to execute instructions in 3 or 4 ticks by having multiple instructions at various stages in the pipeline. Until that came about, a CPU was limited to a single thread, and as such it took a considerable number of ticks to fetch from memory, perform operations, and then send the result to memory.
Obviously a redstone computer won't ever have a stellar tick rate, but even at megahertz speeds on a circa-1980s IBM PC, 39 (or whatever) ticks per instruction would add up.
If at first you don't succeed, repair the creeper damage and try again.
I'm pretty sure this is worse than just about anything. 39 ticks (redstone updates, not clock) just to get the address decoded is a really long time and would require my main clock to run very, very slowly. It's due primarily to all the inversions going on in the address decoder... I was a little too concerned about space without taking speed into account at all. Fortunately I've had some success with the two-wire method I mentioned in the OP. It's much faster, though it'll be about 3x larger. This is closer to how it's actually done in silicon. Now I just need to find a way to clean up the mess of inversions under the actual memory cells.
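To illustrate why the two-wire method is faster, here is a rough sketch of dual-rail decoding in general, not the exact wiring of this build. It assumes every address bit is distributed on both a non-inverted and an inverted line, and that each row select is a single NOR of the lines that should be low for that row, so no row needs its own chain of inversions. The function names are hypothetical.

```python
def dual_rail(address, bits=4):
    """Return (true_lines, inverted_lines), one entry per address bit."""
    true_lines = [(address >> i) & 1 for i in range(bits)]
    inverted_lines = [1 - t for t in true_lines]
    return true_lines, inverted_lines


def decode(address, bits=4):
    """Return one select signal per row; exactly one row is high."""
    true_lines, inverted_lines = dual_rail(address, bits)
    selects = []
    for row in range(1 << bits):
        # For each bit, tap the rail that is LOW when the address bit
        # matches this row: the inverted line for a 1, the true line for a 0.
        rails = [
            inverted_lines[i] if (row >> i) & 1 else true_lines[i]
            for i in range(bits)
        ]
        # A single NOR per row: high only if every tapped rail is low.
        selects.append(0 if any(rails) else 1)
    return selects
```

The cost is the extra set of lines running past every row, which fits the size increase mentioned above.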
This is an improved design I've been working on. I've only bothered to construct an 8x 4-bit unit because I think it'll probably be a better idea to put half of the bits on the other side of the address decoder and I need to decide on a capacity before I do that since I won't be able to resize the decoder once that's done. It uses a different D-latch (larger than the last one, unfortunately) that gives me a little more freedom to wire the clock, data, and address lines. Now they don't rely on repeaters to work around each other, so repeaters are only necessary where the signal drops off. Despite the larger D-latch the cell pitch is actually a little smaller (9 blocks instead of 10), but to reduce latency the address decoder has to be much larger. I'm still trying to figure out a good way to shrink that. I haven't connected the clock lines to each other yet because I'm planning on rearranging some things (see below).
Anyway, this design only takes 17 ticks for the address lines to settle (for 8x 4-bit), which will probably come to around 25 ticks for a 16x 8-bit unit like the one in the OP. This is still somewhat high, so I'm considering doing three things:
- Splitting up each byte by storing 4 bits on each side of the address decoder. This will allow address changes to propagate in two directions instead of one, meaning I can activate twice the number of bits in the same time.
- Splitting up the entire module by mirroring it vertically and running the clock and data lines down the center, same reasoning as above.
- Rearranging the clock lines, data lines (maybe), and all repeaters to a tree structure. This will have to be done after the other two and will probably require profiling to see whether it actually helps.
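The reasoning behind the first two items can be put into rough numbers. This is a back-of-the-envelope sketch only: it assumes the 9-block cell pitch quoted above, a wire reach of 15 blocks before a signal needs boosting, and straight runs with no detours. The helper names are made up.

```python
import math


def farthest_cell_distance(cells, pitch, feed="end"):
    """Blocks of wire to the farthest cell in a row of `cells` cells."""
    if feed == "center":
        # Feeding from the middle halves the number of cells on each side.
        cells = math.ceil(cells / 2)
    return cells * pitch


def repeaters_needed(distance, reach=15):
    """Repeaters required if a signal must be boosted every `reach` blocks."""
    return max(0, math.ceil(distance / reach) - 1)


one_sided = farthest_cell_distance(8, 9, feed="end")     # 8 bits on one side
two_sided = farthest_cell_distance(8, 9, feed="center")  # 4 bits per side
```

Under these assumptions the worst-case run drops from 72 blocks to 36, and the repeaters in series drop from 4 to 2. The real savings depend on how the lines are actually routed.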
Here's a preview of the full 16-byte unit I've been working on based on the new design with the improvements I mentioned in my last post. Everything seems to be working much faster than the original design, but it's a little larger at ~90x40x13. I'm going to profile it before going any further. Since it uses trees for the clock, data, and addressing it should be possible to dump operations in the pipeline without interfering with each other as long as there's a minimum amount of delay between them which I have yet to measure.
This is also built to use a bidirectional I/O bus. To my knowledge none exist yet, but there are some bidirectional repeaters that work in-game but not in the simulator and I'm hoping that with a simulator update it'll be possible to get a bidirectional bus set up. If it's not possible then the internal input and output sections can be split up so I won't have to significantly change the design.
Anyway, here it is. Sorry for the GIF, I'll have a full diagram up after I profile.
Bottom level: Clock tree
Ground level: Address input (left, 4 wide), I/O lines (center, 8 wide), internal input lines are below, output above, R/W signal (9th line to the right of I/O)
Top level: Address decoder
Schematic: Download (sorry about MediaFire, Megaupload has decided not to work for me and tinyupload is down)
Capacity: 16 bytes (16 8-bit words)
Final size: 87x40x13 (only two little wires stick up on the top layer, I couldn't find anywhere else to put them :/)
Redstone cost: 5296 wire, 2428 torches, 7723 ore
Performance is much better than before but not as good as I'd hoped. It takes 4 ticks to write data successfully (sets can be done in 2 ticks, but resets take longer). The R/W bit has to be high for 4 ticks to allow the data and clock signals to propagate. The data also cannot change for one tick after the R/W bit is turned off, otherwise it puts the cells at whatever address is selected into an indeterminate state. That's probably confusing, so this graph may help explain it better:
It doesn't matter what the data is set to for that one tick right after the R/W bit is set. Since these are one tick out of phase it's a little hard to control them with another circuit though, so realistically it probably takes 6 ticks to write if you can coordinate the signals properly. In the schematic I had to add 4 sets of torches to the input lines to add the necessary delay to get it to arrive at the same time as the clock signal, but nothing can be done about that 1 tick without resorting to the N/S trick.
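Since the graph may not be visible, here is one way to express the constraints as a check on per-tick waveforms. This is only one reading of the description: it assumes R/W must stay high for at least 4 ticks and that the data must hold its value for 1 tick after R/W goes low. The function name and the list-per-tick representation are illustrative.

```python
def check_write(rw, data, min_high=4, hold=1):
    """Return a list of timing problems found in per-tick rw/data samples."""
    problems = []
    run = 0  # consecutive ticks R/W has been high
    for t in range(len(rw)):
        if rw[t]:
            run += 1
            continue
        if run:  # falling edge: R/W was high on the previous tick
            if run < min_high:
                problems.append(f"tick {t}: R/W high for only {run} ticks")
            latched = data[t - 1]
            # Data must keep its last value for `hold` ticks after the edge.
            for h in range(t, min(t + hold, len(data))):
                if data[h] != latched:
                    problems.append(f"tick {h}: data changed during hold")
        run = 0
    return problems


good = check_write([0, 1, 1, 1, 1, 0, 0],
                   [0, 0xA5, 0xA5, 0xA5, 0xA5, 0xA5, 0])
too_short = check_write([0, 1, 1, 0, 0], [7, 7, 7, 7, 7])
no_hold = check_write([0, 1, 1, 1, 1, 0], [7, 7, 7, 7, 7, 0])
```

A controller circuit would have to produce waveforms that pass a check like this, which is where the extra coordination ticks come from.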
It takes something like 21 ticks for the data to propagate back to the output from the beginning of the write cycle. That's not terrible, but it's not impressive either.
Addressing takes ~12 ticks.
I think I'm going to take a break from this for a while. The latency for these things seems like it's always going to be too high and I'm starting to think that getting access times under 1 second in-game is probably unrealistic.
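To put the tick counts in wall-clock terms, here is a small conversion sketch. It assumes the ticks quoted in this thread are redstone ticks at 10 per second, and that addressing and the data path happen one after the other rather than overlapping. Both are assumptions, not measurements.

```python
REDSTONE_TICKS_PER_SECOND = 10  # assumption: 1 redstone tick = 0.1 s


def seconds(ticks):
    """Convert a redstone tick count to seconds."""
    return ticks / REDSTONE_TICKS_PER_SECOND


# Original design: 39 ticks to address + 17 ticks for the farthest cell.
original_ticks = 39 + 17
# Revised design: ~12 ticks to address + ~21 ticks from write start to output.
revised_ticks = 12 + 21

print(f"original: {original_ticks} ticks = {seconds(original_ticks)} s")
print(f"revised:  {revised_ticks} ticks = {seconds(revised_ticks)} s")
```

Under those assumptions both designs land well above the 1-second mark.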
Schematic file: Schematic