[Q&A]

How to improve a design that uses too many multiplexers

Hello,

I have developed a small micro-computer system on a Spartan 3E development board (simple CPU, video, I/O controllers, etc.). My system works fine, but I only just made my target timing constraint of 50 MHz and I'd like to improve this. The longest paths in the timing report seem to involve the logic that I have for multiplexing memory units and registers.

The CPU memory space includes multiple SRAM blocks of different functionality and also memory-mapped LUT-based registers. Where I have needed to "join" different blocks together, I simply used WITH SELECT or IF THEN structures to multiplex the blocks according to the relevant control signals. This is particularly the case with my LUT-based registers: I have more than 25 separate 8-bit LUT-based registers that control various modules such as video, I/O, etc. All are memory-mapped into the CPU address space using one giant multiplexer that uses the memory address given by the CPU as its select signal.

Even writing this down I realize it must be enormously inefficient, and I wondered if there are alternative ways of achieving the same ends with fewer logic levels and consequently better timing performance. I would be very grateful for any thoughts!

Best regards,
Anding
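For reference, the structure described in the question might look roughly like the sketch below. This is only an illustration: the entity name, port names, address map, and the number of sources are all assumptions, since the original post contains no code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical baseline: every readable source feeds one combinational
-- mux selected directly by the CPU address. All names and addresses
-- here are illustrative assumptions, not taken from the original design.
entity read_mux_baseline is
  port (
    cpu_addr   : in  std_logic_vector(15 downto 0);
    ram0_dout  : in  std_logic_vector(7 downto 0);
    ram1_dout  : in  std_logic_vector(7 downto 0);
    video_ctrl : in  std_logic_vector(7 downto 0);
    io_ctrl    : in  std_logic_vector(7 downto 0);
    cpu_din    : out std_logic_vector(7 downto 0)
  );
end entity read_mux_baseline;

architecture rtl of read_mux_baseline is
begin
  -- Address decode and data selection happen in the same cycle, so each
  -- additional register widens the mux and deepens the LUT tree.
  process (cpu_addr, ram0_dout, ram1_dout, video_ctrl, io_ctrl)
  begin
    if cpu_addr(15 downto 14) = "00" then
      cpu_din <= ram0_dout;
    elsif cpu_addr(15 downto 14) = "01" then
      cpu_din <= ram1_dout;
    elsif cpu_addr = x"FF00" then
      cpu_din <= video_ctrl;
    elsif cpu_addr = x"FF01" then
      cpu_din <= io_ctrl;
    else
      cpu_din <= (others => '0');
    end if;
  end process;
end architecture rtl;
```

With 25 or more register inputs instead of the two shown, the decode and select logic spans several LUT levels, which matches the long paths reported by the timing analyzer.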
2019-2-13 13:36:39 萧昕腾

2 answers

For fabric-based registers, a common optimization is to have a "shadow RAM" (usually a block RAM) that holds a copy of the most recently written value of each register. Then, to read the registers, your multiplexer defaults to this "shadow RAM" unless you are reading back data that can change externally, such as read-only bits or register bits that can otherwise be changed outside the scope of the processor. In most systems, this reduces the number of mux inputs by quite a bit. Grouping together the registers that are read-only or that have other reasons not to use the shadow RAM helps.

Another thing to consider is that block RAM and distributed RAM both have longer clock-to-Q timing than fabric flip-flops. Adding a pipeline stage after these RAMs helps immensely if you can afford the extra cycle of latency. In many cases you are multiplexing together things that don't really change on every cycle (like registers), so the extra latency is of little or no concern.

-- Gabor
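A minimal sketch of the shadow-RAM read path with a pipeline stage after the RAM, as described above. The entity name, the 32-slot register space, and the choice of slots 30 and 31 as the externally changing registers are assumptions made for illustration; the control registers themselves are assumed to live elsewhere and be written by the same strobe.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical readback block: names, widths and slot numbers are assumed.
entity reg_readback is
  port (
    clk       : in  std_logic;
    reg_we    : in  std_logic;                     -- CPU write strobe (register space)
    reg_addr  : in  std_logic_vector(4 downto 0);  -- 32 register slots (assumed)
    reg_wdata : in  std_logic_vector(7 downto 0);
    status_a  : in  std_logic_vector(7 downto 0);  -- changes externally, slot 30
    status_b  : in  std_logic_vector(7 downto 0);  -- changes externally, slot 31
    reg_rdata : out std_logic_vector(7 downto 0)   -- valid two clocks after reg_addr
  );
end entity reg_readback;

architecture rtl of reg_readback is
  type ram_t is array (0 to 31) of std_logic_vector(7 downto 0);
  signal shadow     : ram_t := (others => (others => '0'));
  signal shadow_q   : std_logic_vector(7 downto 0) := (others => '0');
  signal live_q     : std_logic_vector(7 downto 0) := (others => '0');
  signal use_live_q : std_logic := '0';
begin
  -- Shadow RAM: records every CPU register write. The read is synchronous
  -- so the array can map to block RAM or registered distributed RAM.
  process (clk)
  begin
    if rising_edge(clk) then
      if reg_we = '1' then
        shadow(to_integer(unsigned(reg_addr))) <= reg_wdata;
      end if;
      shadow_q <= shadow(to_integer(unsigned(reg_addr)));
    end if;
  end process;

  -- Small mux over the live sources only, registered alongside the RAM read.
  process (clk)
  begin
    if rising_edge(clk) then
      if reg_addr = "11111" then
        live_q <= status_b;
      else
        live_q <= status_a;
      end if;
      if reg_addr(4 downto 1) = "1111" then
        use_live_q <= '1';
      else
        use_live_q <= '0';
      end if;
    end if;
  end process;

  -- Pipeline stage after the RAM: a final 2:1 select, then a register.
  process (clk)
  begin
    if rising_edge(clk) then
      if use_live_q = '1' then
        reg_rdata <= live_q;
      else
        reg_rdata <= shadow_q;
      end if;
    end if;
  end process;
end architecture rtl;
```

In this sketch the read mux shrinks from one input per register to a two-input live mux plus a 2:1 select, at the cost of two clocks of read latency, which the CPU's bus timing would have to tolerate.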

2019-2-13 13:48:48 杨玲

Thanks Gabor, this has given me much food for thought. I made some quite helpful improvements by:

-- making writable hardware registers write-only. (In other words, keeping a record of the values of the writable registers becomes a software problem rather than a hardware problem, which reduces the size of the multiplexer tree on the read side.)
-- pipelining the update of all writable registers to the cycle following the CPU write (this moves the writable-register multiplexer tree to the clock cycle after the CPU write).
-- maintaining local register copies of all incoming hardware signals (this moves the incoming routing delays to the clock cycle before the CPU read).

Do you know any trick for combining separate SRAM blocks into a single address space efficiently? I configure my SRAM blocks with the CORE wizard. When the wizard connects multiple SRAM blocks into a larger piece of memory, it seems to use dedicated routing resources rather than general-purpose multiplexers, so the result is almost as fast as if it were a single block. However, sometimes I need to configure separate chunks of SRAM (perhaps because of different connections on the port B side) and then connect them together into a single address space on the port A side. When I do this in VHDL, the connections are made with multiplexers, and this introduces timing delays.

Is there any way to link SRAM blocks together using the dedicated resources on the port A side, yet still be able to specify different connections for different blocks on the port B side, as well as use different COE initialization files for different blocks?
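The register-side changes listed above (write-only registers, register updates pipelined to the cycle after the CPU write, and local registered copies of incoming signals) could look roughly like the sketch below. The entity, the two example registers, and the addresses x"FF00" and x"FF01" are hypothetical and only stand in for the real register map.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical write path: names and addresses are assumptions.
entity reg_write_pipe is
  port (
    clk          : in  std_logic;
    cpu_we       : in  std_logic;
    cpu_addr     : in  std_logic_vector(15 downto 0);
    cpu_dout     : in  std_logic_vector(7 downto 0);
    ext_status   : in  std_logic_vector(7 downto 0);  -- signal from another module
    video_ctrl   : out std_logic_vector(7 downto 0);  -- write-only control register
    io_ctrl      : out std_logic_vector(7 downto 0);  -- write-only control register
    status_local : out std_logic_vector(7 downto 0)   -- copy offered to the CPU read mux
  );
end entity reg_write_pipe;

architecture rtl of reg_write_pipe is
  signal we_q   : std_logic := '0';
  signal addr_q : std_logic_vector(15 downto 0) := (others => '0');
  signal data_q : std_logic_vector(7 downto 0)  := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: capture the CPU write without decoding it.
      we_q   <= cpu_we;
      addr_q <= cpu_addr;
      data_q <= cpu_dout;

      -- Stage 2: decode the write captured on the previous clock edge,
      -- so the address decode no longer sits on the CPU's output path.
      if we_q = '1' then
        case addr_q is
          when x"FF00" => video_ctrl <= data_q;
          when x"FF01" => io_ctrl    <= data_q;
          when others  => null;
        end case;
      end if;

      -- Local registered copy of an incoming hardware signal, so its
      -- routing delay is absorbed before the CPU read cycle.
      status_local <= ext_status;
    end if;
  end process;
end architecture rtl;
```

Because the control registers are write-only here, they never appear in the read multiplexer; a register takes its new value two clock edges after the CPU presents the write.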

2019-2-13 14:04:30 李先吊