IA-64 instead depends on the compiler for this task. Even before the program is fed into the CPU, the compiler studies the code and makes the similar sorts of decisions that would otherwise happen at "run time" on the chip itself. Once it has determined what paths to take, it collect up the instructions it knows can be run in parallel, bundles them into one bigger instruction, and then kept it in that form in the program.
Moving this task from the CPU to the compiler has many advantages. First, the compiler can spend considerably more time examining the code; a advantage the chip itself doesn't have because it has to finish the task as quickly as possible. Therefore the compiler version can be considerably more accurate than the similar code run on the chip's circuitry. Second, the prediction circuitry is rather difficult, and offloading a prediction to the compiler decreases that complexity enormously. It no longer has to study anything; it easily breaks the instruction apart again and feeds the pieces off to the cores. Third, doing the prediction in the compiler is a one-off cost, quite than one incurred every time the program is run.
The downside is that a program's runtime-behaviour is not always obvious in the code used to produce it, and may vary considerably depending on the real data being processed. The out-of-order processing logic of a mainstream CPU can create decisions on the basis of actual run-time data which the compiler can only guess at. It means that it is possible for the compiler to get its prediction wrong more often than comparable (or easier) logic placed on the CPU. Therefore this design this relies heavily on the performance of the compilers. It leads to reduce in microprocessor hardware difficulty by increasing compiler software difficulty.
Registers: The IA-64 architecture contains a very generous set of registers. It has a 64-bit integer registers and 82- bit floating point. In addition to these registers, IA-64 adds in a register rotation mechanism that is handled by the Register Stack Engine. Rather than the typical fill / spill or window mechanisms used in other processors, the Itanium can turn in a set of new registers to accommodate new temporaries or function parameters. The register rotation mechanism combined with predication is also very effective in implementing automatically unrolled loops.