25th International Symposium on
Tutorial 1: Java: VM Architecture, Software Architecture, Implementations, and Applications
Lecturer: Wen-mei Hwu (University of Illinois)
In this tutorial, I will give an in-depth presentation on the design, implementation, and usage of Java systems. I will first introduce the Java Virtual Machine bytecode architecture, focusing on its unique instruction set features. I will then discuss Java's run-time software architecture, emphasizing object file formats, dynamic loading, the JavaBeans component architecture, security, and portability. I will then present the various implementation models of Java: interpreters, just-in-time compilers, native-executable-translation compilers, run-time optimizers, and Java microprocessors. I will compare Java systems with competing systems such as COM and ActiveX. Finally, I will survey the promising Java application domains, namely extensible servers, thin clients, and embedded applications, in the context of the architecture and implementation of Java systems.
Tutorial 2: Emerging Processor Techniques for Exploiting Instruction-level Parallelism
Lecturer: Sriram Vajapeyam (Indian Institute of Science)
Instruction-Level Parallelism (ILP) has witnessed a number of significant innovations in the last few years. At the same time, explosive technology trends promise a continued boom in transistor counts. As a result, several recent ILP innovations are expected to be implemented in commercial processors in the next few years, making processor design and evaluation both complex and challenging. This tutorial presents an overview of important recently proposed ILP processor techniques. Primarily, the tutorial is a quick but comprehensive tour of ILP techniques for each of the processor pipeline stages and for selected aspects of the top-level memory hierarchy. Examples of topics to be covered include: high-bandwidth instruction fetch, dispatch, and issue mechanisms; high-accuracy and high-bandwidth control prediction; memory dependence handling; data value prediction; aspects of high-bandwidth data caches; reuse of dependence and scheduling information; recovery from mis-speculation; sub-word SIMD parallelism for multimedia; and support for precise interrupts. Drawing on the relevant literature, we discuss the hardware complexity of several of the techniques. Putting things together, we present an example high-ILP processor of the future that incorporates several of these techniques, in order to show how they might fit together and interact with each other. In conclusion, we mention open research issues and current research directions in the ILP area.
Tutorial 3: High-performance I/O Systems: From Architectures to Applications
Lecturer: Alok Choudhary (Northwestern University)
Large-scale computing includes many application areas with intensive I/O demands, including scientific computing, databases, data mining, decision support, and multimedia. For high performance, it is critical to improve the I/O performance of these applications. There are many solutions at different levels to address I/O problems. This tutorial presents architecture and software issues in designing scalable and efficient parallel I/O systems, as well as I/O requirements, characteristics, and examples from the application domains of science and engineering, databases, and multimedia systems. The tutorial discusses interactions between compute node architectures, hierarchical storage systems, and system software. The discussion will also include examples of commercial systems as well as software research projects. Finally, applications from the domains listed above will be discussed with respect to their I/O requirements and their impact on architectures and system software.
Tutorial 4: How to Make Protocols and Processors Work Right Every Time
Lecturers: Arvind and Martin Rinard (Laboratory for Computer Science, MIT)
Over the last several years, we have been trying to understand and teach published cache consistency protocols and architectures of modern microprocessors. Unfortunately, we have had a very difficult time convincing ourselves that these protocols were actually correct. In fact, we became convinced that one of the key difficulties was that the notion of correctness itself was not clearly defined!
Sophisticated microarchitectures present a similar problem. Modern microprocessors contain many optimizations such as write buffers, out-of-order execution of loads and stores, etc., which are clearly correct in a uniprocessor setting. But the overall effect of the interaction of these optimizations with the memory consistency model in a multiprocessor setting is often far from clear.
We have developed a method for precisely describing architectures and protocols. This method is based on a formalism called Term Rewriting Systems (TRS). In an architecture context, terms represent the state of the system and rewrite rules represent the state transitions. We have used term rewriting systems to precisely specify and verify complex microarchitectures and protocols. The approach is to start with a simple TRS that models a clearly correct system. For example, to investigate microarchitectures, we start with a TRS that models a non-pipelined implementation that serially executes instructions in order. We then develop increasingly sophisticated TRSs that include features such as register renaming and out-of-order and speculative execution. Correctness is shown by proving that if one TRS can generate a result, then the other also generates the same result. More formally, correctness is shown by proving that the two TRSs can simulate each other. This approach has fundamentally altered the way we view the design process.
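To make the idea of terms as states and rewrite rules as transitions concrete, here is a minimal sketch in Python. It is not the lecturers' notation or tooling: the three-instruction program, the instruction names (loadc, add), the rule functions, and the fused "two loadc" rule are all hypothetical choices made for illustration. The sketch runs a serial rule set and a slightly more aggressive rule set to their normal forms and compares the results. Agreement on one program is only a test, not the simulation proof described above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical program; instruction names are illustrative only.
PROGRAM = (
    ("loadc", "r1", 2),         # r1 := 2
    ("loadc", "r2", 3),         # r2 := 3
    ("add", "r3", "r1", "r2"),  # r3 := r1 + r2
)

# A term is (pc, registers); registers are a sorted tuple of (name, value).
State = Tuple[int, Tuple[Tuple[str, int], ...]]
Rule = Callable[[State], Optional[State]]


def write(regs, name, value):
    """Return a new register tuple with `name` bound to `value`."""
    updated = dict(regs)
    updated[name] = value
    return tuple(sorted(updated.items()))


def rule_loadc(state: State) -> Optional[State]:
    pc, regs = state
    if pc < len(PROGRAM) and PROGRAM[pc][0] == "loadc":
        _, dst, value = PROGRAM[pc]
        return (pc + 1, write(regs, dst, value))
    return None  # rule does not match this term


def rule_add(state: State) -> Optional[State]:
    pc, regs = state
    if pc < len(PROGRAM) and PROGRAM[pc][0] == "add":
        _, dst, a, b = PROGRAM[pc]
        values = dict(regs)
        return (pc + 1, write(regs, dst, values[a] + values[b]))
    return None


def rule_two_loadc(state: State) -> Optional[State]:
    """Fused rule: retire two adjacent loadc instructions in one step."""
    pc, regs = state
    if (pc + 1 < len(PROGRAM)
            and PROGRAM[pc][0] == "loadc"
            and PROGRAM[pc + 1][0] == "loadc"):
        _, d1, v1 = PROGRAM[pc]
        _, d2, v2 = PROGRAM[pc + 1]
        return (pc + 2, write(write(regs, d1, v1), d2, v2))
    return None


def normalize(state: State, rules: List[Rule]) -> Tuple[State, int]:
    """Apply the first matching rule repeatedly until no rule matches."""
    steps = 0
    while True:
        for rule in rules:
            nxt = rule(state)
            if nxt is not None:
                state = nxt
                steps += 1
                break
        else:
            return state, steps  # normal form reached


INITIAL: State = (0, ())
SERIAL = [rule_loadc, rule_add]                # one instruction per step
WIDE = [rule_two_loadc, rule_loadc, rule_add]  # adds the fused rule

serial_final, serial_steps = normalize(INITIAL, SERIAL)
wide_final, wide_steps = normalize(INITIAL, WIDE)
print(serial_final == wide_final, serial_steps, wide_steps)
```

On this program both rule sets reach the same final term, with the second taking fewer rewrite steps. A proof in the style described above would instead show that every rewrite of one system can be matched by rewrites of the other, for all programs.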
Finally, we believe that TRSs may be useful as a high-level architectural description language from which hardware can be synthesized directly. We will describe our research in progress on hardware synthesis. In this full-day tutorial, we will introduce the TRS formalism and show how it can be used to model a variety of protocols and microarchitectures. In particular, we will show TRSs for out-of-order and speculative processors, a family of TRSs for protocols that implement sequential consistency on a distributed shared memory system with a hierarchy of caches, and a TRS for location consistency (the relaxed consistency model for the parallel language Cilk). We will also go over some of the correctness proofs for these TRSs.