# BITS, Pilani-Dubai Campus, Knowledge Village, Dubai IIIrd Year Second Semester 2004-2005 Degree:B.E.(Hons) Branch:C.S.E COURSE NO. : CS UC 342 #### **COURSE TITLE: Advanced Computer Organisation** Time: 3 hrs Date: 22-5-2005 Marks: 70 Comprehensive exam(closed book) For all mathematical problems steps should be provided #### Part -A Answer any 10 questions. All questions carry equal marks.(10 \*2=20M) Q1. Specify four levels of programs for measuring the performance of a computer. Q2. Find out the fraction of CPU time consumed in the following polling approach: FD transfers data to the processor in 16-bit units. And has a data transfer rate of 50 KB/sec. No data transmission can be missed. The number of clock cycles for polling =400 Time for every clk=500 n.sec - Q3. With respect to Q2 find out the fraction of cpu time consumed for data transfer by a hard disk that transfers data in 4 word chunks and can transfer at 8 MB/sec - Q4. Using appropriate diagrams outline the scenarios in which a DMA controller can act as the - a) Master of the bus - b) Slave of the bus - Q5. Assume a DMA type of data transfer takes place. Can a CPU do any useful work during DMA type of data transfer. Justify your answer. - Q6. Outline the difference in sharing of information between processors in centralized shared memory multiprocessor and processors in distributed memory multiprocessor. - Q7 outline how interrupt driven data transfer is better than polling? - Q8. Outline what is meant by structural hazard in a pipelined architecture? - Q9. Draw a flowchart for performing the floating point multiplication - Q10. what are the pros and cons for a micro programming sequencer? - Q11. Briefly bring out the difference between DSP and general purpose processor. - Q12. Briefly compare the difference in architecture of RISC processor with that of CISC processor Part –B (Answer all the questions. All questions carry equal marks) (5 \*10=50 marks) 1.a) With appropriate timing diagram outline how data transfer between a CPU and I/O device takes place using synchronous bus ?(5 M) b) Outline how using asynchronous bus speed matching between slow I/O and high speed memory can be carried out during data transfer (using necessary timing diagram). (5 M) - 2. Outline the design of single cycle data path and control path for handling MIPS instructions explaining the need for every component and the control signals (10 M) 3 a)Outline how Read miss and write miss are handled in a multiple and the control signals. - 3 a)Outline how Read miss and write miss are handled in a multiprocessor system that employ cache system which makes use of snoopy based cache coherency protocol? (6 M) - b) Assume that there are 50 processors in a multiprocessing environment executing a program which is 10% sequential, and 90% suitable for parallel execution. What is the speed up in executing the program due to the usage of multiple processors? (4 M) - 4. Consider the following MIPS code sequence where a,b,c,d,e,f are memory variables. lw Rb, b lw Rc, c Add Ra, Rb, Rc 1 sw a, Ra lw Re, e lw Rf, f sub Rd, Re, Rf sw d, Rd - a)Identify the types of hazards in the above instructions of MIPS when getting executed from a pipelined architecture (5M) - b)Outline how using appropriate compiler scheduling of the instructions the above hazards can be prevented ?(5 M) - 5.a) Using relevant diagrams outline the need for bus arbitration in a common bus based multiprocessor system? (2 M) - b) What are the differences between daisy chain arbitration and centralized bus arbitration ?(4 M) - c)Outline the techniques used for improving the bandwidth of multibus? (4 M) #### BITS, Pilani-Dubai, Campus, Knowledge Village, Dubai .... IIIrdi Year Eirst Semester 2004-2005 Degree: B.E. (Hons) & Branch: C.S.E. Test: 2 Advanced Computer organization All questions carry equal marks Time-50mts Date:24-4-2005 (open book)? Books and class notes are permitteds Dipoli Q1. Identify all of the data dependencies in the following code of pipelified MIPS. Which dependencies are data hazards that will be resolved via forwarding?(6 marks) Add \$2,\$5,\$4 Add \$4,\$2,\$5 Sw \$5,100(\$2) Add \$3,\$2,\$4 Q2. Consider executing the following code on the pipelined datapath of MIPS (6 marks) 7.13.75 . 49.00 (1.19.4) add \$1,\$2,\$3 add \$4,\$5,\$6 add \$7.\$8.\$9 add \$10,\$11,\$12 add \$13,\$14,\$15 Which registers are read during the fifth clock cycle? Which registers are written at the end of the fifth clock cycle? Q3. Assuming single cycle clock MIPS, explain the flow of data and control signals. appropriate timing diagram for an ADD instruction during the interval between the falling edges of the CLK pulse (when various types of data; address and control signals become active & when they will become inactive etc;) (6 marks) (6 marks) in a series of the series of the control of the series Q4. Outline clearly the need for different multiplexers in single cycle data path of MIPS and the purpose of every control signal needed for those multiplexers (6 marks) area de la companya Q5.a) With relevant diagrams and examples bring out the difference between SIMD, and MIMD architecture of multiprocessors (2 marks) b) With relevant diagrams explain how the following computation can be done in parallel manner using two processors in the 1. Centralized shared multiprocessor architecture and 2. Distributed memory machine architecture (4 marks) Result = (A \* B \* C)/(D\*E\*F) #### BITS, Pilani-Dubai, Campus, Knowledge Village, Dubai Land Year First Semester 2004-2005 Degree: B.E. (Hons) Branch: C.S.E Test: 2 ---- Advanced-Computer-organization- All questions carry equal marks Time-50mts Date: 24-4-2005 (open book) Books and class notes are permitted Q1. Identify all of the data dependencies in the following code of pipelined MIPS. Which dependencies are data hazards that will be resolved via forwarding? (6 marks) 1.Add \$2,\$5,\$4 2.Add \$4,\$2,\$5 3.Sw \$5,100(\$2) 4.Add \$3,\$2,\$4 In instruction 1 and 2, \$2 register contents lead to dependency- Other than that any other dependency has to be found out using timing diagram Q2. Consider executing the following code on the pipelined datapath of MIPS (6 marks) add \$1,\$2,\$3 add \$4,\$5,\$6 add \$7,\$8,\$9 add \$10,\$11,\$12 add \$13,\$14,\$15 Which registers are read during the fifth clock cycle? Which registers are written at the end of the fifth clock cycle? Using appropriate timing diagram represent each instruction in the pipeline. Each inst differs from other instruction by one clk cycle-Then we can find out which regs are read during fifth clk cycle and written during fifth clk cycle Q3. Assuming single cycle clock MIPS, explain the flow of data and control signals using appropriate timing diagram for an ADD instruction during the interval between the falling edges of the CLK pulse(when various types of data, address and control signals become active & when they will become inactive etc.) (6 marks) Marks for the diagram -1 Explanation about when and why the values of PC, Rs, Rt and Rd change from old values to new values and become stable with respect to first falling edge of clock -the data to the register takes place at the falling edge of clock -all these things have to be OM Orthing about When and why the values of PC, Rs, Rt and Rd change from old values to new values how the contents of bus A and bus B transition- actual writing of explained step by step(5 marks) Q4. Outline clearly the need for different multiplexers in single cycle data path of MIPS and the purpose of every control signal needed for those multiplexers. (6 marks) for one input of ALU data may come from busB or 16-bit sign extended to 32 bit depending on Add instruction or LW instruction. To resolve this appropriate multiplexer is needed. For addressing the registers one multiplexer is needed because Rd field will decide the destination register in case of Add instruction-whereas in some instructions like logical operations Rt will decide the destination. Similar reasons should be given to other multiplexers also (1 +5) marks Q5.a) With relevant diagrams and examples bring out the difference between and MIMD architecture of multiprocessors.(2 marks) With appropriate diagrams SIMD and MIMD have to be explained. Two processors can share the same memory for fetching instructions. Every instruction will be fetched by both the processors and will be executed-But the data for every instruction is available from the local memory of each processor and it may be different For MIMD the processors involved can fetch and execute different instructions at the same time-similarly data also will be different for different instructions b) With relevant diagrams overlain bounds. b) With relevant diagrams explain how the following computation can be done in parallel manner using two processors in the 1. Centralized shared multiprocessor architecture and 2. Distributed memory machine architecture (4 marks) Result = (A \* B \*C)/(D\*E\*F) In the case of centralized shared memory the func1 to compute the numerator will be stored in the main memory. Then we will have the func2 to compute the denominator in the shared memory. CPU1 will execute code for finding Nr. Whereas CPU2 will execute the code corresponding code for Dr. Both the processors fetch the instructions using time interleaved technique. To exchange computed Nr or Dr among the different nodes shared memory itself can be used. On the other hand in the case of distributed memory machine the func1 will be in the local memory of processor1 and function2 will be in the local memory of processor2. Both can be executed without any time interleaving. For exchanging the Nr or Dr then messaging has to be implemented. #### BITS, Pilani-Dubai Campus, Knowledge Village, Dubai IIIrd Year Second Semester 2003-2004 Degree: B.E.(Hons) Branch: C.S.E COURSE NO. : CS UC 372 ### COURSE TITLE: Advanced Computer Organisation Time: 30 mts Date: 17-3-2005 Marks: 14 Quiz1:1 ## Answering scheme(All questions carry equal marks) Q1 How the following high level language construct is represented using assembly language of memory to memory machine? Example A = B+C (hypothetical code) Example A = B+C (hypothetical code) add Address A, Address B, Address C Q2. With respect to high level language construct given in example1, specify the implementation using assembly language of stack machines? - Push AddressC; - Push AddressB - Add - Pop AddressA; Q3 Specify the implementation for the problem in question 1 using assembly instruction of load store architecture? Load R1 B Load R2 C Add R1 R1 R2 Store A R1 Q4 Specify the instructions that are used to call and return from a procedure in MIPS assembly language? - jal procA - jump and link - jumps to the address specified by procA - stores the address of the following instruction in register \$31. - this is called the return address. - ir \$31 - used at the end of the procedure - to return back to the appropriate address Q5. Simplicity favours regularity. How the above concept is carried out in MIPS? Design Principle: simplicity favors regularity. Why? Of course this complicates some things... Operands must be registers, only 32 registers provided - Design Principle: smaller is faster. Why? Q6 Specify any two varieties of programs for performance evaluation? Toy Benchmarks Synthetic benchmarks Q7. Good design demands good compromises. How the above concept is carried out in MIPS? Compromise chosen by the MIPS designer to keep all instructions the same length thereby requiring different kinds of instruction formats for diff kinds of instructions Q8. State Amdahl's Law for performance improvement. Execution Time After Improvement = **Execution Time Unaffected +** ( Execution Time Affected / Amount of Improvement ) Q9 Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" Speed up=16 Q10. with respect to question Q9 How much do we have to improve the speed of multiplication to make the program run 5 times faster? Speed up= infinity Q11. Specify an instruction sequence that may result in data hazard in the case of pipelined architecture. InstrJ tries to read operand before InstrI writes it l: add r1,r2,r3 J: sub r4,r1,r3 - Caused by a "Dependence" (in compiler nomenclature). This hazard results from an actual need for communication - Q12. Specify any two techniques using which data hazard can be overcome. #### Bypass/forward/shortcircuit Pipeline Scheduling Q13. Specify an instruction sequence that may result in structural hazard. Structural Hazard 2: Register Try read and write to registers simultaneously Q14.State with appropriate reason whether the pipelined architecture improves the execution speed of an instruction or increase the throughput of instructions? Increase the throughput of instructions because for any instruction the total absolute execution time remains similar to that of non-pielined approach. # BITS, Pilani-Dubai, Campus, Knowledge Village, Dubai IIIrd Year First Semester 2004-2005 Degree: B.E. (Hons) Branch: C.S.E Test: 1 Advanced Computer organization All questions carry equal marks Time-50mts (Closed book) - Q1. Explain how each category of MIPS instructions is associated with constructs that appear in high level programming languages? (say assignment statements, Handling data structures, if statements) (10 marks) - Q2. Consider two different implementations ,M1 and M2 of the same instruction set. There are three classes of instructions (A,B and C) in the instruction set.M1 has a clock rate of 400MHz and M2 has a clock rate of 200MHz. The average number of cycles for each instruction class on M1 and M2 is given in the following table. | Class | | Cycles per | C1 usage | C2 usage | Third- | |-------|-------------|------------|----------|----------|--------| | | instruction | | | | party | | | on M1 | on M2 | | | usage | | A | 4 | 2 | 30% | 30% | 50% | | В | 6 | 4 | 50% | 20% | 30% | | C | 8 | 3 | 20% | 50% | 20% | The table also contains a summary of how three different compilers use the instruction set. C1 is a compiler produced by the makers of M1,C2 is a compiler produced by the makers of M2 and the other compiler is a third-party compiler product. Assume that each compiler uses the same number of instructions for a given program but that the instruction mix is as described in the table. Using C1 on both M1 and M2 how much faster can makers of M1 claim that M1 is compared M2? (2 marks) Using C2 on both M1 and M2 how much faster can makers of M2 claim that M2 is compared M1? (2 marks) If you purchase M1, which compiler would you use? (3 marks) If you purchase M2 which compiler would you use? (3 marks) Q3. You are going to enhance a machine and there are two possible improvements: either make multiply instructions run four times faster than before or make memory access instructions run two times faster than before . You repeatedly run a program that takes 100 secs to execute. Of this time 20% is used for multiplication ,50% for memory access instructions and 30 % for other tasks. What will be the speedup if you improve only multiplications? What will be the speedup if you improve only memory access? What will be the speedup if both improvements are made?(10 marks) Q4.Outline the four design principles that are used to design any ISA? (5 marks) Specify how MIPS ISA satisfy the above requirements? (5 marks) Q5. Suppose we enhance a machine to make all the floating point instructions run 5 times faster. Let us look at how speed up behaves when we incorporate the faster floating point hardware. If the execution time of some benchmark before the floating point enhancement is 10 seconds, what will be the speedup if half of the 10 secs is spent executing floating-point instructions? We are looking for a benchmark to show off the new floating point described as above and we want the overall bench mark to show a speed up of 3. One benchmark we are considering runs for 100 secs with the old floating point hardware. How much of the initial execution time would floating point instructions have to account for to show an overall speedup of 3 on this benchmark.? (5+5)