PART 1: THE HARDWARE
BYTE • AUGUST 1985
Plug a 32-bit microcomputer into your IBM PC
Do you have scientific number-crunching problems that leave your IBM Personal Computer (PC) gasping? Do you want to learn about the 32032, one of the first commercially available 32-bit microprocessors? Or do you just want the fastest IBM PC on the block?
If you answered "yes" to any of these questions, then you may be looking for the DSI-32 coprocessor board from Definicon Systems Inc.
The DSI-32 coprocessor board uses the National Semiconductor NS32032 full 32-bit CPU (central processing unit), the NS32081 high-speed FPU (floating-point processing unit), and optionally the NS32082 MMU (memory-management unit).
There are two kits. The starter kit has a 6-MHz CPU and 256K bytes of RAM (random-access read/write memory). The advanced kit has a 10-MHz CPU and 1 megabyte of RAM. The only difference between the two kits is the jumper configuration; both use the same board. If you get the starter kit, you can upgrade later to the more advanced system. Both kits have a socket for the MMU chip.
The board also has two high-speed (up to 38.4k bits per second) RS-232C serial ports and a 16-bit programmable timer.
In addition, all MS-/PC-DOS facilities (such as communication ports and video and printer controllers) are available to the NS32032 via the Definicon MS-/PC-DOS interface software. The interface has special support for bit-mapped graphics-display access, including support for multiple-screen images in memory.
Did we call the DSI-32 a coprocessor? Well, that's only one way to look at it. You can also think of the IBM PC as a convenient standard chassis (supplying disk drives, power supply, display, keyboard, and expansion-board connectors) into which you can plug a powerful 32-bit microcomputer. Since Definicon's interface software runs in MS-DOS, you don't have to learn a different operating system to use the DSI-32.
The DSI-32 consists of a number of relatively independent functional units (see photo 1 and figures 1 through 4). The 32032 CPU (IC44) is near the center of the board. Above it is the 32201 TCU (timing control unit, IC43), which contains the clock oscillator and much of the bus-interface timing circuitry. To its right is the 32081 FPU (IC49), and to its immediate left is the MMU (IC40). Farther left are the DP8409 dynamic RAM controller (IC37) and the RAM array (IC32). To the right of the FPU are the 2681 DUART (dual universal asynchronous receiver/transmitter, IC55), the RS-232C drivers (IC58, 59, 61, 63, and 64), and the serial port connectors. Above the DUART is a socket for user-defined peripheral devices, which simplifies the task of designing additional special-function daughterboards. At the far lower left are the dual bidirectional latches (74LS646, IC33-36) that buffer data between the asynchronous 8-bit PC bus and the 32-bit internal data bus of the DSI-32. The remaining circuits perform address decoding, buffering, and control-signal generation.

There are four jumper blocks (JB1, JB2, JB3, and JB4) for selecting the operational configuration of the board. When shipped, the jumpers are set for a 32032 (full 32-bit bus) with no chip in the MMU socket. Other possible configurations are the 32032 with the 32082 MMU, or the 32016 CPU alone (16-bit bus). Jumpers for these configurations are shown in figure 5. The JB1 jumpers select between 64K-bit and 256K-bit RAM chips and set the memory-refresh rate.
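The kit descriptions near the end of this article list 32 RAM chips for either configuration: 32 64K by 1-bit chips or 32 256K by 1-bit chips. Presumably that is one chip per bit of the 32-bit data bus. The following is a minimal C sketch of the resulting array size under that assumption. The function name is ours.

```c
#include <stdint.h>

/* Capacity in bytes of a DRAM array built from 1-bit-wide chips.
   chips:         number of chips (assumed 32, one per data-bus bit)
   bits_per_chip: chip density, e.g. 65536 (64K) or 262144 (256K) bits */
uint32_t dram_array_bytes(uint32_t chips, uint32_t bits_per_chip)
{
    return chips * bits_per_chip / 8u;  /* 8 bits per byte */
}
```

With 64K-bit chips this gives 262,144 bytes (256K bytes). With 256K-bit chips it gives 1,048,576 bytes (1 megabyte).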
Trevor G. Marshall, George Scolaro, David L. Rand, and Tom King are engineers with Definicon Systems Inc. Vincent P. Williams is president of Definicon.
They can be contacted at 21042 Vintage St., Chatsworth, CA 91311.
BY TREVOR G. MARSHALL, GEORGE SCOLARO, DAVID L. RAND, TOM KING, AND VINCENT P. WILLIAMS
Photo 1: The DSI-32 coprocessor board.
National Semiconductor's absence from the 8-bit and 16-bit microprocessor markets turned out to be an advantage in one way. The 32000 series could be designed from scratch. Other microprocessor makers often felt it was important to keep some compatibility between their earlier chips and any new designs. Free of such constraints, National Semiconductor took what it calls a "radical departure from popular trends in architectural design."
The main aim was to make software development easy: to design a chip that compiler writers would love.
Figure A: A block diagram of the 32032 CPU.
Figure A is a block diagram of the 32032. It has a 16-megabyte uniform (nonsegmented) linear-addressing space and is available in 6-, 8-, and 10-MHz versions. The 32032 has eight 32-bit-wide, general-purpose registers that can handle byte, word, or double-word data. It also has eight dedicated registers:
- a 32-bit program counter
- a processor status register
- two stack pointer registers, for the user and interrupt stacks
- the frame-pointer register, which points to a procedure's dynamically allocated local storage
- the static base register, which points to relocatable global variables
- the interrupt base register, which locates the dispatch table for interrupts and traps
- the module register, which holds the address of the currently executing module's descriptor
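As a rough illustration of the programming model, the register set listed above can be written down as a C structure. This is only a sketch. The field names are ours, and every register is modeled as 32 bits for simplicity. The 24-bit address mask reflects the 16-megabyte (2^24-byte) space, not any documented bus behavior.

```c
#include <stdint.h>

/* Programmer-visible NS32032 registers as listed in the text
   (simplified: every field modeled as 32 bits). */
struct ns32032_regs {
    uint32_t r[8];     /* R0-R7: general purpose, byte/word/double-word data */
    uint32_t pc;       /* program counter */
    uint32_t psr;      /* processor status register */
    uint32_t sp_user;  /* user stack pointer */
    uint32_t sp_int;   /* interrupt stack pointer */
    uint32_t fp;       /* frame pointer: dynamically allocated locals */
    uint32_t sb;       /* static base: relocatable global variables */
    uint32_t intbase;  /* interrupt base: dispatch table for interrupts/traps */
    uint32_t mod;      /* module register: current module descriptor address */
};

/* A 16-megabyte linear space needs 24 address bits (2^24 = 16,777,216). */
#define NS_ADDR_BITS 24u
#define NS_ADDR_MASK ((UINT32_C(1) << NS_ADDR_BITS) - 1u)  /* 0x00FFFFFF */

/* Reduce a 32-bit address to the 24 bits that fit a 16-megabyte space. */
uint32_t ns_bus_address(uint32_t addr) { return addr & NS_ADDR_MASK; }
```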
The 32032's design was heavily influenced by the VAX, particularly its addressing modes. Besides the standard immediate, absolute, register, and register-relative modes, there are five other modes that help support high-level languages: memory space, memory-relative, external, scaled-index, and top-of-stack. As with many advanced microprocessors, the 32032 has both supervisor and user operating modes. To protect the operating system, a user-mode program cannot execute some instructions or access certain registers. A supervisor-mode program has no such restrictions.
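The scaled-index mode is the one most directly aimed at compiled array code. In the Series 32000 it takes an address formed by another mode and adds an index register multiplied by 1, 2, 4, or 8. Those factors correspond to byte, word, double-word, and quad-word elements. The following is a minimal C sketch of that effective-address arithmetic. The names are ours.

```c
#include <stdint.h>

/* Scale factors for the four scaled-index forms:
   byte, word, double word, quad word. */
enum ns_scale { SCALE_B = 1, SCALE_W = 2, SCALE_D = 4, SCALE_Q = 8 };

/* Effective address of a scaled-index operand: a base address computed by
   another addressing mode, plus a signed index times the element size. */
uint32_t ea_scaled_index(uint32_t base, int32_t index, enum ns_scale scale)
{
    /* Unsigned arithmetic wraps modulo 2^32, so negative indexes work. */
    return base + (uint32_t)index * (uint32_t)scale;
}
```

For a double-word array at 1000, element 3 is at 100C. A compiler can therefore address `a[i]` without emitting a separate multiply.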
The 32032 has the customary move, integer-arithmetic, BCD (binary-coded decimal), integer-comparison, logical, Boolean, shift, bit, jump, stack, and control instructions. To that stew it adds new instructions such as MODi (modulus arithmetic), as well as new instruction groups including bit-field, array, and string operations. Finally, the 32032 has floating-point, memory-management, and custom slave instructions that allow it to cooperate with other processors.
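The text describes MODi only as modulus arithmetic. National's documentation pairs it with REMi (remainder). As we understand it:
- MODi returns a result with the sign of the divisor.
- REMi, like C's `%` operator, takes the sign of the dividend.

Assuming that convention, the difference can be sketched in C. The function names are ours.

```c
#include <stdint.h>

/* Truncating remainder: the sign follows the dividend (C99 '%', as REMi). */
int32_t rem_trunc(int32_t a, int32_t b) { return a % b; }

/* Modulus: the sign follows the divisor (floored division, as MODi).
   b must be nonzero, and INT32_MIN with b == -1 is excluded. */
int32_t mod_floor(int32_t a, int32_t b)
{
    int32_t r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;  /* move the result into the divisor's sign range */
    return r;
}
```

For example, -7 MOD 3 is 2, while -7 REM 3 is -1.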
Figure 1: Schematic of the DSI-32 board's CPU and CPU support circuitry, including the optional memory-management unit.
Figure 2: Schematic of the DSI-32's IBM PC bus interface circuitry.
When power is first applied to the IBM PC, a bus signal, RESET DRV, is generated. This is a power-on reset for any slave boards on the bus. The RESET DRV signal is latched in IC60, the DIAG vector PAL (programmable array logic). Pin 22 of this PAL, /POWERON, initializes the board and then remains latched until the loader software resets it by pulsing the RFSH INHIBIT line.
Since there is no room for a ROM (read-only memory) on this board, we needed some mechanism for the CPU to execute defined instructions during the power-on cycle. The DIAG vector PAL performs this function. The PAL forces a DIA instruction onto the data bus whenever the CPU is uninitialized, so the CPU fetches a DIA as its first instruction (at address 0). [Editor's note: All addresses to follow are in hexadecimal.] The DIA instruction causes the CPU to flush its queue and execute a "branch to self." In this way, the 32032 is held in a very tight loop and won't lock up by executing some undefined instruction from its uninitialized main RAM. The remaining function of the DIAG vector PAL is to act as a 4-bit read-only port so the CPU can read the RS-232C status lines of J2.
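The power-on sequence can be summarized as a tiny state model. This is a hypothetical C sketch, not Definicon's PAL logic. The function names are ours, and the DIA opcode value 0xC2 is an assumption to be checked against the NS32032 data sheet.

While /POWERON is latched, every instruction fetch sees DIA, regardless of what the uninitialized RAM contains. Once the loader pulses RFSH INHIBIT, fetches come from RAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the DIA instruction; verify against the data sheet. */
#define DIA_OPCODE 0xC2u

struct diag_pal {
    bool poweron;  /* set by RESET DRV, cleared by a RFSH INHIBIT pulse */
};

void on_reset_drv(struct diag_pal *p)          { p->poweron = true;  }
void on_rfsh_inhibit_pulse(struct diag_pal *p) { p->poweron = false; }

/* Byte the CPU sees on an instruction fetch. While the board is
   uninitialized, the PAL overrides RAM, so every fetch returns DIA
   and the CPU spins in its "branch to self" loop. */
uint8_t cpu_fetch(const struct diag_pal *p, const uint8_t *ram, uint32_t addr)
{
    return p->poweron ? (uint8_t)DIA_OPCODE : ram[addr];
}
```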
Figure 3: Schematic of the DSI-32's RAM controller circuitry including the address decoding and HOLD arbiter PALs.
Figure 4: The DSI-32's DUART serial port circuitry, DIAG vector PAL, and IBM PC bus control PAL.
Figure 5: Various jumper configurations for the DSI-32. All diagrams are shown looking at the top of the board, with the edge connector to the bottom right.
The PC bus interface PAL, IC62, performs many functions. It provides the IBM PC with a 2-bit status (read) port and also a 4-bit control (write) port. Also, this PAL supplies two polled flags for interprocessor communication, in addition to the one level of interrupt in each direction.
When the IBM PC's CPU accesses the DSI-32's RAM, it performs a memory (read or write) cycle in a 64K-byte segment of its address space. For the PC XT this is from E000 to EFFF. For the PC AT the board can be mapped from D000 to DFFF. IC47 determines which 64K-byte segment of the DSI-32 address space the PC is referencing.
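The translation from a PC address to a DSI-32 address can be sketched as follows. We read E000 and D000 as 8088 segment values, so the XT window occupies physical addresses E0000 through EFFFF.

We also assume that IC47's bank number supplies address bits 16 through 23 of the 16-megabyte space, with the window offset filling the low 16 bits. Both the function names and that bit arrangement are hypothetical.

```c
#include <stdint.h>

#define WINDOW_SIZE 0x10000u  /* the PC sees one 64K-byte window */

/* Physical 8088 address of the window base for a given segment value. */
uint32_t pc_window_base(uint16_t segment) { return (uint32_t)segment << 4; }

/* Translate a PC physical address inside the window to a DSI-32 address.
   Assumption: the bank number selected via IC47 forms bits 16-23.
   Returns -1 if pc_addr falls outside the window. */
int32_t pc_to_dsi32(uint32_t pc_addr, uint16_t segment, uint8_t bank)
{
    uint32_t base = pc_window_base(segment);
    if (pc_addr < base || pc_addr >= base + WINDOW_SIZE)
        return -1;
    return (int32_t)(((uint32_t)bank << 16) | (pc_addr - base));
}
```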
When a memory read/write request from the 8088 is detected by the dual-port RAM controller (IC56), it asserts a HOLD (DMA) request to the 32032 CPU. When it is able to service the HOLD request, the 32032 responds with a HLDA (hold acknowledge) signal, and IC56 completes the DMA cycle. Because of the 8-byte instruction prefetch queue on the 32032, the internal CPU state machine continues to run even after it has relinquished its bus to the DMA cycle. This is fortunate: although the much slower 8088 may take almost a microsecond to complete its portion of the DMA cycle, the usual loss of execution time to the 32032 is only 100 nanoseconds.
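A quick calculation with the article's round figures shows why the prefetch queue matters. This C sketch assumes:
- one byte per 8088 access, since the PC bus is 8 bits wide;
- up to 1 microsecond per access on the PC side;
- the typical 100-nanosecond loss on the 32032 side.

```c
/* Round figures from the text: the 8088 spends up to about 1 microsecond
   per access, while the 32032 typically loses only about 100 ns. */
#define PC_ACCESS_NS 1000.0  /* upper bound quoted for the 8088 side */
#define CPU_LOSS_NS   100.0  /* typical 32032 execution time lost */

/* 32032 execution time lost (milliseconds) while the PC moves n bytes,
   one byte per bus access. */
double cpu_loss_ms(unsigned long n_bytes) { return n_bytes * CPU_LOSS_NS / 1e6; }

/* Time the 8088 itself spends (milliseconds) on the same transfer. */
double pc_time_ms(unsigned long n_bytes) { return n_bytes * PC_ACCESS_NS / 1e6; }
```

Moving a 64K-byte block (65,536 accesses) thus costs the 32032 roughly 6.6 ms of execution time. That is about a tenth of the roughly 65.5 ms the 8088 spends on the transfer.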
The default I/O (input/output) address of the DSI-32's control register, and its memory-address allocation, can be changed if another board in your IBM PC clashes with them. A program that comes with the kit guides your choice of memory and I/O port address configuration.
Either 256K bytes or 1 megabyte of dynamic RAM can be installed on the DSI-32. The first 14 megabytes of address space are uniquely decoded, allowing for future memory expansion on a separate board. No parity checking or error correction was designed into the DSI-32. Parity checking on an IBM PC slave board is of little use: the only action you could take if an error were detected would be to shut down the host 8088 CPU. Although the 8088 then reports the address currently on its bus, this usually bears no relation to the true cause of the problem. Conventional external error correction is slow and requires adding wait states to the memory cycles, reducing the 32032's performance.
The three benchmarks in table A represent numerically intensive algorithms that require both integer and floating-point arithmetic.
The Sieve of Eratosthenes tests the performance of a high-level language implementing Boolean algebra and integer arithmetic (see listing A). The Float benchmark examines the processor's ability to execute floating-point array arithmetic (see listing B). The FLT benchmark tests the speed of the floating-point coprocessor (see listing C).
Array handling is primarily exercised by the Float and Sieve benchmarks, since the FLT benchmark uses only scalar calculations.
It should be noted, however, that the Sieve benchmark uses only a Boolean array, and this negates much of the throughput advantage of the NS32032's 32-bit bus (and indeed the VAX's 64-bit bus), tending to favor the 8- and 16-bit processors.
The Sieve benchmarks that were run on the IBM PC XT and PC AT were written in Digital Research C. The FLT benchmark used Microsoft FORTRAN for the XT and DRF77 for the AT.
The variable n is the upper limit of each benchmark's main loop. In the Sieve benchmark, the complete sieve was run 10 times.
N/D indicates No Data, a test not run.
N/A indicates Not Available. No compiler could be found that could use arrays with more than 64,000 elements.
The five target machines being compared are the IBM PC XT (8088 CPU), the IBM PC AT (80286 CPU), the VAX-11/750, the VAX-11/780, and Definicon Systems' DSI-32 coprocessor (10-MHz 32032 CPU).
All five machines have additional numeric-processing hardware:
1. IBM PC XT has Intel's 8087 floating-point chip (4.77 MHz).
2. IBM PC AT has Intel's 80287 floating-point chip (4.0 MHz).
3. VAX-11/750 has Digital Equipment's Floating-Point Accelerator.
4. VAX-11/780 has Digital Equipment's Floating-Point Accelerator.
5. DSI-32 coprocessor has National Semiconductor's 32081 FPU (floating-point unit).
The compilers used for the PC XT and PC AT were chosen for their published reputation for generating high-speed code. Microsoft FORTRAN version 3.1 was used for the FLT and Float benchmarks on the PC XT. Because it did not execute on the PC AT, Digital Research F77 was used for the FLT and Float benchmarks on that machine. Digital Research C was used for the Sieve benchmark on both the PC XT and the PC AT. The VAX FORTRAN compiler was Digital Equipment's own, running under the VMS operating system.
The compilers for the DSI-32 coprocessor were written by Green Hills Software and ported to the Definicon MS-/PC-DOS environment.
Table A: Execution time (in seconds) for 10 iterations of the Sieve. All machines have floating-point accelerators.
Sieve Benchmark
n       IBM PC XT  IBM PC AT  VAX-11/750  VAX-11/780  DSI-32
8191    11.6       3.71       2.41        1.90        1.85
20000   35.3       8.13       6.11        3.04        4.52
30000   44.9       12.40      N/D         N/D         6.78
40000   351.5      99.71      13.13       6.38        9.04
80000   N/A        N/A        29.65       13.34       18.12

Float Benchmark
n       IBM PC XT  IBM PC AT  VAX-11/750  VAX-11/780  DSI-32
40000   11.46      17.71      0.83        0.50        0.84

FLT Benchmark
n       IBM PC XT  IBM PC AT  VAX-11/750  VAX-11/780  DSI-32
256000  119.3      134.0      9.48        6.18        16.48
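The table is easier to compare as ratios. The following C fragment uses only the figures from table A. It computes how many times faster the DSI-32 ran than each other machine; a value below 1 means the DSI-32 was slower.

```c
/* Execution times in seconds from table A.
   Columns: PC XT, PC AT, VAX-11/750, VAX-11/780, DSI-32. */
enum { XT, AT, VAX750, VAX780, DSI32, NMACH };

static const double sieve_8191[NMACH]  = { 11.6, 3.71, 2.41, 1.90, 1.85 };
static const double float_40000[NMACH] = { 11.46, 17.71, 0.83, 0.50, 0.84 };
static const double flt_256000[NMACH]  = { 119.3, 134.0, 9.48, 6.18, 16.48 };

/* How many times faster the DSI-32 ran than machine m on one table row. */
double dsi32_speedup(const double row[NMACH], int m)
{
    return row[m] / row[DSI32];
}
```

At n = 8191, the DSI-32 ran the Sieve about 6.3 times faster than the PC XT, and the Float benchmark about 13.6 times faster. On FLT, both VAXes remained faster.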
Listing A: The Sieve of Eratosthenes benchmark in C.

#include <stdio.h>

#define LIMIT 8191
#define ITS   10
#define FALSE 0
#define TRUE  1

int main(void)
{
    char flags[LIMIT + 1];
    register long i, prime, k;
    int count = 0, iter;

    for (iter = 1; iter <= ITS; iter++) {
        count = 0;
        for (i = 0; i <= LIMIT; i++)
            flags[i] = TRUE;
        for (i = 0; i <= LIMIT; i++) {
            if (flags[i]) {
                prime = i + i + 3;         /* flags[i] stands for the odd number 2i + 3 */
                for (k = i + prime; k <= LIMIT; k += prime)
                    flags[k] = FALSE;      /* strike out odd multiples of prime */
                count++;
            }
        }
    }
    printf("Found %d primes\n", count);
    return 0;
}
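Listing B: The Float benchmark in FORTRAN.

      PROGRAM FLOAT
      DIMENSION RARRAY(40000)
      COMMON /FAST/ RARRAY
      INTEGER*4 I
      DO 10 I = 1, 40000
   10 RARRAY(I) = 1.0
      DO 91 I = 1, 40000
C     Multiply each element by its mirror element (index 40000 down to 1)
      RARRAY(I) = RARRAY(I) *
     C            RARRAY(40001 - I)
   91 CONTINUE
      STOP
      END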
Listing C: The FLT benchmark in FORTRAN.

      PROGRAM FLT
      INTEGER*4 I, J
      REAL*8 X, Y, Z
      DO 10 I = 1, 256000
      J = 256000 - I
      X = FLOAT(I)
      Y = FLOAT(J)
      Z = Y/X
      X = Y-Z
      Y = Z*X
      Z = Y+X
   10 CONTINUE
C     Force the loop optimizer to retain
C     all four lines by:
      X = Z + Y
      STOP
      END
There is a level of protection provided by the DSI-32 interface software. Should the 32032 execute an instruction that it cannot decode (such as would occur on a faulty program read), it executes an ILLEGAL INSTRUCTION trap. This trap is caught by the Definicon MS-/PC-DOS interface software, and the full status is reported to the operator.
To further increase the board's reliability, a REFRESH INHIBIT control signal has been made available to the 8088/8086 CPU. This lets the diagnostic software determine the exact safety margin of each dynamic RAM chip in the memory array. This signal is also used to start the CPU after a cold boot.
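The margin test that REFRESH INHIBIT makes possible can be sketched as a search for the longest refresh-free interval each chip survives. This is a hypothetical illustration, not Definicon's diagnostic.

A real test would write a pattern, hold refresh off for the trial period, and read the pattern back. Here, each chip is simulated by an assumed data-retention time instead. The struct and function names are ours.

```c
#include <stdbool.h>

/* Simulated DRAM chip: data survives refresh being inhibited for up to
   retention_ms milliseconds (a stand-in for real hardware). */
struct sim_chip {
    unsigned retention_ms;
};

/* Simulated trial: does a written pattern survive this long without refresh? */
static bool pattern_survives(const struct sim_chip *c, unsigned inhibit_ms)
{
    return inhibit_ms <= c->retention_ms;
}

/* Longest inhibit period (in steps of step_ms, up to max_ms) that the chip
   passes; step_ms must be nonzero. Comparing the result with the normal
   refresh interval gives the chip's safety margin. */
unsigned refresh_margin_ms(const struct sim_chip *c, unsigned step_ms, unsigned max_ms)
{
    unsigned passed = 0;
    for (unsigned t = step_ms; t <= max_ms; t += step_ms) {
        if (!pattern_survives(c, t))
            break;
        passed = t;
    }
    return passed;
}
```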
Nevertheless, true error correction is available as an option on the 1-megabyte advanced kit. The DSI-32 is designed to accommodate the new INMOS (a British semiconductor company) 256K by 1-bit RAM chip, which internally detects and corrects errors such as those caused by irregular refresh or alpha-particle activity.

The RAM array is driven and controlled by a National Semiconductor DP8409. This device (IC37) contains high-current outputs that can drive the highly capacitive RAM array, as well as circuitry to insert "hidden" refresh cycles whenever the RAM is inactive. These cycles let the CPU avoid the otherwise mandatory HOLD request (every 12 microseconds or so) that would allow the DP8409 to refresh the array. When a forced refresh is required, however, it is completed in two T (timing) states, leaving the processor's execution essentially unaffected.
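The refresh policy described above can be modeled in a few lines of C. Refresh is hidden when the RAM is idle, and forced only when the deadline arrives. This is a simplified sketch:
- it counts time in 1-microsecond ticks;
- it uses the text's figure of roughly 12 microseconds as the deadline;
- it ignores row addressing entirely.

The names are ours.

```c
#include <stdbool.h>

#define REFRESH_PERIOD_US 12u  /* "every 12 microseconds or so" */

struct refresh_state {
    unsigned since_last_us;  /* time since the last refresh */
    unsigned hidden;         /* refreshes slipped into idle RAM cycles */
    unsigned forced;         /* refreshes that needed a HOLD request */
};

/* Advance one 1-microsecond tick; ram_busy says whether the CPU is
   using the RAM during this tick. */
void refresh_tick(struct refresh_state *s, bool ram_busy)
{
    s->since_last_us++;
    if (!ram_busy) {
        s->hidden++;              /* hidden refresh: no cost to the CPU */
        s->since_last_us = 0;
    } else if (s->since_last_us >= REFRESH_PERIOD_US) {
        s->forced++;              /* deadline reached: brief HOLD of the CPU */
        s->since_last_us = 0;
    }
}
```

With the RAM idle one tick in four, no forced refreshes occur. With the RAM permanently busy, a forced refresh comes every 12 ticks.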
Since both the RAM refresh circuitry and the 8088/8086 place HOLD requests on the CPU asynchronously, a HOLD arbiter PAL (IC38) allots each a priority and ensures that no access contention can occur.
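A fixed-priority arbiter of this kind can be sketched as a single function. The text does not say which requester wins. Giving refresh precedence over the PC is our assumption: a late refresh can corrupt data, while a PC access can simply wait.

The function is consulted whenever requests are pending. A grant stays with its owner until that owner releases the bus back to the CPU. The names are hypothetical.

```c
#include <stdbool.h>

enum hold_owner { OWNER_CPU, OWNER_REFRESH, OWNER_PC };

/* Decide who owns the RAM bus next. current is the present owner;
   an owner other than the CPU keeps the bus until it releases it. */
enum hold_owner arbitrate(enum hold_owner current, bool refresh_req, bool pc_req)
{
    if (current != OWNER_CPU)
        return current;        /* never preempt a cycle in progress */
    if (refresh_req)
        return OWNER_REFRESH;  /* assumed highest priority */
    if (pc_req)
        return OWNER_PC;
    return OWNER_CPU;          /* no HOLD: the 32032 keeps the bus */
}
```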
The following hardware kits and software are available from Definicon Systems Inc., 21042 Vintage St., Chatsworth, CA 91311, (818) 341-5654. (If you want to run the compilers, Definicon suggests you use the advanced kit with its 1 megabyte of RAM.)
1. Starter kit - 32032 CPU and 32081 FPU, 6-MHz clock rate, 256K bytes of RAM (32 64K by 1-bit chips). Wave-soldered, partially tested, fully socketed printed-circuit board. Full set of integrated circuits and assembly instructions. Diagnostic software disk. Simplified NSX-compatible assembler/linker/loader. MS-DOS interface software, advanced debug monitor. Public-domain software disk (supplied upon request). Price: $995.
2. Advanced kit - Same as above, except that the CPU and FPU are 10-MHz parts and 1 megabyte of memory (32 256K by 1-bit chips) is supplied. Price: $1495. The DSI-32 is suitable for use with the IBM PC or any identical "clone" microcomputer. However, a money-back guarantee is the only guarantee of compatibility offered by Definicon. The DSI-32 draws up to 15 watts from the PC's power supply; make sure you have that much spare power before ordering.
A fixed disk is almost essential if you are to run the Green Hills compilers (which range up to 250K bytes of code).
Public-domain compilers/interpreters are available for FORTH, Small-C, Pascal, and Tiny BASIC. A disk containing them will be included with your kit provided that you specifically ask for it. The following advanced software is available. (Note that the C and Pascal compilers will run in 256K bytes of RAM, but their capabilities will be considerably limited. The FORTRAN compiler will not run in 256K bytes of RAM.)
1. Green Hills C Compiler: Kernighan and Ritchie C plus full Berkeley 4.2 UNIX extensions.
2. Green Hills Pascal Compiler: full Berkeley 4.2 UNIX-compatible plus many extensions.
3. Green Hills FORTRAN Compiler: ANSI (American National Standards Institute) FORTRAN 77 plus full Berkeley 4.2 UNIX extensions.
4. Definicon/Computer Systems Design NS32000 Assembler/Linker: advanced National Semiconductor NSX-syntax assembler with the GENIX extensions required by the Green Hills compilers. Supports fully relocatable code and "Pascal-like" high-level constructs. The linker supports the assembler's output syntax and fully relocatable code, including named COMMON blocks and initialized statics.
5. Definicon/Computer Systems Design NS32000 Library Manager and Programmer's Utilities: LIB32 program to form and examine libraries of object modules, assembly and high-level language examples for direct (OEM) interface to the Definicon MS-/PC-DOS interface.
Library manager/programmer's utilities: $49
Assembler/linker (purchased separately): $149
One compiler (your choice), including assembler/linker: $299
Two compilers, including assembler/linker (one purchase): $499
Three compilers, including assembler/linker (one purchase): $649
Any compiler, purchased alone (needs assembler above): $249
Note: Green Hills Software has helped make these compilers available to BYTE readers using the Definicon MS-/PC-DOS software environment at prices well below those of the identical compilers for their original UNIX environment.
A 32-bit FORTH interpreter is available for $299 from Symbolic Processing Systems, 501 West Maple, Orange, CA 92668, (714) 637-4298. This FORTH includes a screen editor, string and file handling, and full floating-point support. Debugging aids, including TRACE and VIEW, are provided. The metacompiler and source-code screens are provided to ease system customization.
The prices Definicon is charging for the software are special discounts for BYTE readers. The only support that Definicon can offer to purchasers of this software is a guarantee to respond promptly to written bug reports. Definicon assumes that BYTE readers will be proficient in the basic programming syntax of a language before they order these products, and the documentation provided reflects this assumption. Trevor Marshall's Thousand Oaks Technical Database (RCP/M) will act as a focal point for public-domain software for the DSI-32. The database may be reached on the public-access number (24 hours a day, 1200 bits per second) at (805) 492-5472, or, for uploads, at the restricted (sysop's) number, (805) 493-1495.
Micro Cornucopia (POB 223, Bend, OR 97709, (503) 382-8048) has agreed to form a users group to support the DSI-32. Contact them directly for details.
Murata ceramic resonators are used instead of quartz crystals. Although not quite as stable as crystals, they are perfectly adequate: their frequency tolerance is ±0.5 percent maximum. They are easier to mount than crystals and are also cheaper. Note that the RS-232C data-transfer rate can be up to 3 percent slower because a standard 3.58-MHz resonator is used rather than the 3.686-MHz frequency originally specified for the 2681 DUART. Rogers O-PAC bypass capacitors are used in several critical areas of the board. They provide near-perfect bypassing of high-frequency transients and help suppress noise that might otherwise reduce reliability.
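The 3 percent figure follows from the DUART's baud-rate generator dividing down whatever clock it is given, so every rate scales by 3.58/3.6864. The following is a short C check using the nominal 3.58-MHz value.

The stop-bit drift estimate is a common rule of thumb. It assumes the receiver samples at bit centers and ignores the other end's clock error.

```c
#define F_SPECIFIED_HZ 3686400.0  /* clock frequency specified for the 2681 */
#define F_ACTUAL_HZ    3580000.0  /* nominal "3.58 MHz" resonator */

/* Every divided-down baud rate scales by the same clock ratio. */
double actual_baud(double nominal_baud)
{
    return nominal_baud * F_ACTUAL_HZ / F_SPECIFIED_HZ;
}

/* Fractional speed error, positive when slow (about 0.029, or 2.9%). */
double baud_error(void) { return 1.0 - F_ACTUAL_HZ / F_SPECIFIED_HZ; }

/* Drift, in bit times, at the stop-bit sample of a 10-bit frame
   (start, 8 data, stop): the sample falls 9.5 bit times after the start
   edge, so the timing error accumulates over 9.5 bits. It must stay
   under half a bit for the byte to be received correctly. */
double stop_bit_drift_bits(void) { return 9.5 * baud_error(); }
```

With this clock, 9600 bits per second becomes about 9323 bits per second. The accumulated drift at the stop bit is about 0.27 bit, comfortably under the half-bit limit.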
The DSI-32 can accommodate the NS32082 MMU.
This, in conjunction with the interrupt-driven 16-bit timer and the two serial ports, gives it the capability of running UNIX (when it becomes available). The MMU also adds some debugging capability to the current monitor, such as a breakpoint-on-address reference.
In addition to a number of public-domain compilers and interpreters, three high-performance, UNIX-compatible, optimizing compilers for the DSI-32 are currently available from Definicon. The Green Hills Software C, FORTRAN, and Pascal compilers implement the full Berkeley 4.2 extensions in addition to the commonly accepted language definitions. These compilers produce NS32032 assembly-language source code, which is assembled with the Computer Systems Design/Definicon assembler, linker, and loader. In addition, a 32-bit FORTH interpreter, a Tiny BASIC, and a dBASE II compiler were scheduled for release last month.
The disk operating system is MS-/PC-DOS. No special partitions or file conversions are required. The 32032 data files can be identical to their MS-/PC-DOS counterparts, and 32032 executable code files exist on disk as standard MS-/PC-DOS files. Software development is done entirely within the MS-/PC-DOS command shell, with no need for special editors or other file managers.
A resident (RAM-based) monitor allows easy debugging. Its command syntax is similar to that of DEBUG and DDT (dynamic debugging tool). It allows single-step execution, running with multiple breakpoints, and, with the optional MMU, breakpoint-on-address reference. The monitor also includes standard memory and register display and substitute features. A powerful disassembler with full floating-point support is part of the monitor.
We have taken a glimpse inside the hardware of the DSI-32, and we hope that this gives you some idea of this coprocessor board's speed and flexibility. Although we have discussed software only briefly, next month we will look in greater detail at the languages and programming tools available.
The authors are indebted to Martin A. Lewis of Cambrian Consultants Inc. of Calabasas, California, for his help and guidance during the project and to applications engineer Les Wilson of National Semiconductor for his untiring assistance.