There are three basic computing units in the Da Vinci Architecture: the Cube Unit, the Vector Unit and the Scalar Unit. They correspond to the three common computing modes, namely cube, vector and scalar, as shown in Fig. 6.3.

Cube Unit: The main function of the Cube Unit and its accumulator is to perform matrix-related operations. In one beat (clock cycle) it completes the multiplication of a 16 × 16 matrix by a 16 × 16 matrix for FP16 data (16 × 16 × 16 = 4096 multiply-accumulate operations). If the input data is of Int8 type, one beat completes the multiplication of a 16 × 32 matrix by a 32 × 16 matrix (16 × 32 × 16 = 8192 multiply-accumulate operations).

Vector Unit: It performs calculations between a vector and a scalar, as well as between two vectors. Its functions cover various basic calculation types and many customized calculation types, on data types such as FP16, FP32, Int32 and Int8.

Scalar Unit: It is equivalent to a micro CPU that controls the operation of the entire AI Core. It handles loop control and branch judgment for the whole program, computes data addresses and related parameters for the matrix and vector calculations, and performs basic arithmetic operations.

The Memory Unit and the corresponding data paths constitute the Memory System of the Da Vinci Architecture, as shown in the corresponding figure. The Memory Unit is composed of a storage control unit, buffers and registers.

Storage Control Unit: Through the bus interface, it can directly access the lower-level caches outside the AI Core. It can also directly access memory through Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM, DDR for short) or High Bandwidth Memory (HBM). A memory migration unit is also provided as the transmission controller of the AI Core's internal data path. It is responsible for read-write management of the AI Core's internal data between different buffers, and it completes a series of format conversion operations, such as padding, Img2Col, transposition and extraction.

Input Buffer: It temporarily holds data that is used frequently, so the data does not have to be read from outside the AI Core via the bus interface every time. This reduces both the frequency of data access on the bus and the risk of bus congestion, which lowers power consumption and improves performance.

Output Buffer: It stores the intermediate results of each layer of the neural network so that the data can be obtained conveniently when the next layer is entered. By contrast, reading data through the bus offers low bandwidth and high latency, so the output buffer can greatly improve computing efficiency.
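The per-beat figures quoted for the Cube Unit (4096 for FP16, 8192 for Int8) can be checked with a little arithmetic. The sketch below assumes that those figures count multiply-accumulate operations, that is, M × K × N for an M × K matrix multiplied by a K × N matrix. The helper names `macs_per_beat` and `matmul_counting_macs` are hypothetical and exist only for illustration; they are not part of any Ascend or Da Vinci API, and the code models the operation count only, not the hardware.

```python
# Illustrative sketch only: hypothetical helpers, not an Ascend / Da Vinci API.
# Assumption: the figures 4096 and 8192 count multiply-accumulate (MAC) operations.


def macs_per_beat(m: int, k: int, n: int) -> int:
    """Closed-form MAC count for an (m x k) by (k x n) matrix product."""
    return m * k * n


def matmul_counting_macs(a, b):
    """Naive matrix product that also counts the MACs it performs."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
                macs += 1
    return out, macs


# FP16 case from the text: 16 x 16 by 16 x 16.
fp16_macs = macs_per_beat(16, 16, 16)
# Int8 case from the text: 16 x 32 by 32 x 16.
int8_macs = macs_per_beat(16, 32, 16)

# Cross-check the closed-form count against the explicit loop nest (Int8 shape).
ones_a = [[1] * 32 for _ in range(16)]
ones_b = [[1] * 16 for _ in range(32)]
product, counted = matmul_counting_macs(ones_a, ones_b)

print(fp16_macs, int8_macs, counted)
```

Under this reading, the Int8 shape doubles only the shared inner dimension (16 to 32), which is why the operation count doubles while the output tile stays 16 × 16.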