| author | Chris Lattner <sabre@nondot.org> | 2010-10-04 04:39:25 +0000 | 
|---|---|---|
| committer | Chris Lattner <sabre@nondot.org> | 2010-10-04 04:39:25 +0000 | 
| commit | d3f45c8cf2de458c521c02732ecd314e3814d3b0 (patch) | |
| tree | 5486d21b62007c60d64248bca3095756842831f3 /llvm | |
| parent | 9fd1e92de8e7f8d2896d11659ab201003f211913 (diff) | |
checkpoint, the release notes are now feature complete.
llvm-svn: 115495
Diffstat (limited to 'llvm')
| -rw-r--r-- | llvm/docs/ReleaseNotes.html | 97 | 
1 files changed, 50 insertions, 47 deletions
diff --git a/llvm/docs/ReleaseNotes.html b/llvm/docs/ReleaseNotes.html
index 48d5c6fe5cd..29de47c49ec 100644
--- a/llvm/docs/ReleaseNotes.html
+++ b/llvm/docs/ReleaseNotes.html
@@ -742,8 +742,9 @@ it run faster:</p>
 <li>A new (experimental) "-rendermf" pass is available which renders a
     MachineFunction into HTML, showing live ranges and other useful
     details.</li>
-
-<!--New SubRegIndex tblgen class for targets -> jakob -->
+<li>The new SubRegIndex tablegen class allows subregisters to be indexed
+    symbolically instead of numerically.  If your target uses subregisters you
+    will need to adapt to use SubRegIndex when you upgrade to 2.8.</li>
 <!-- SplitKit -->
 <li>The -fast-isel instruction selection path (used at -O0 on X86) was rewritten
@@ -760,7 +761,7 @@ it run faster:</p>
 </div>
 <div class="doc_text">
-<p>New features of the X86 target include:
+<p>New features and major changes in the X86 target include:
 </p>
 <ul>
@@ -768,30 +769,38 @@ it run faster:</p>
     in registers across basic blocks, dramatically improving performance of code
     that uses long double, and when targetting CPUs that don't support SSE.</li>
-  New SSEDomainFix pass: 
-    On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a
-    register in a different domain than where it was defined. Some instructions
-    have equvivalents for different domains, like por/orps/orpd.  The
-    SSEDomainFix pass tries to minimize the number of domain crossings by
-    changing between equvivalent opcodes where possible.
-
-  X86 backend attempts to promote 16-bit integer operations to 32-bits to avoid
-     0x66 prefixes, which are slow on some microarchitectures and bloat the code
-     on others.
-
-  New support for X86 "thiscall" calling convention (x86_thiscallcc in IR) for windows.
-
-  New llvm.x86.int intrinsic (for int $42 and int3)
-
-  Verbose assembly decodes X86 shuffle instructions, e.g.:
-  	insertps	$113, %xmm3, %xmm0     ## xmm0 = zero,xmm0[1,2],xmm3[1]
-	unpcklps	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-	pshufd	$1, %xmm1, %xmm1        ## xmm1 = xmm1[1,0,0,0]
+<li>The X86 backend now uses a SSEDomainFix pass to optimize SSE operations.  On
+    Nehalem ("Core i7") and newer CPUs there is a 2 cycle latency penalty on
+    using a register in a different domain than where it was defined. This pass
+    optimizes away these stalls.</li>
+
+<li>The X86 backend now promote 16-bit integer operations to 32-bits when
+    possible. This avoids 0x66 prefixes, which are slow on some
+    microarchitectures and bloat the code on all of them.</li>
+
+<li>The X86 backend now supports the Microsoft "thiscall" calling convention,
+    and a <a href="LangRef.html#callingconv">calling convention</a> to support
+    <a href="#GHC">ghc</a>.</li>
+
+<li>The X86 backend supports a new "llvm.x86.int" intrinsic, which maps onto
+    the X86 "int $42" and "int3" instructions.</li>
+
+<li>At the IR level, the <2 x float> datatype is now promoted and passed
+    around as a <4 x float> instead of being passed and returns as an MMX
+    vector.  If you have a frontend that uses this, please pass and return a
+    <2 x i32> instead (using bitcasts).</li>
+
+<li>When printing .s files in verbose assembly mode (the default for clang -S),
+    the X86 backend now decodes X86 shuffle instructions and prints human
+    readable comments after the most inscrutible of them, e.g.:
+    
+<pre>
+  insertps $113, %xmm3, %xmm0 <i># xmm0 = zero,xmm0[1,2],xmm3[1]</i>
+  unpcklps %xmm1, %xmm0       <i># xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]</i>
+  pshufd   $1, %xmm1, %xmm1   <i># xmm1 = xmm1[1,0,0,0]</i>
+</pre>
+</li>
-  X86 ABI:  <2 x float> in IR no longer maps onto MMX, it turns into <4 x float>
-
-  new GHC calling convention
-
 </ul>
 </div>
@@ -806,14 +815,21 @@ it run faster:</p>
 </p>
 <ul>
-
-  NEON: Better performance for QQQQ (4-consecutive Q register) instructions.  New reg sequence abstraction?
-  ARM: Better scheduling (list-hybrid, hybrid?)
-  ARM: Tail call support.
-  ARM: General performance work and tuning.
-
-  ARM: Half float support through intrinsics LangRef.html#int_fp16
-<li>ARMGlobalMerge: <!-- Anton --> </li>
+<li>The ARM backend now optimizes tail calls into jumps.</li>
+<li>Scheduling is improved through the new list-hybrid scheduler as well
+    as through better modeling of structural hazards.</li>
+<li><a href="LangRef.html#int_fp16">Half float</a> instructions are now
+    supported.</li>
+<li>NEON support has been improved to model instructions which operate onto 
+    multiple consequtive registers more aggressively.  This avoids lots of
+    extraneous register copies.</li>
+<li>The ARM backend now uses a new "ARMGlobalMerge" pass, which merges several
+    global variables into one, saving extra address computation (all the global
+    variables can be accessed via same base address) and potentially reducing
+    register pressure.</li>
+
+<li>The ARM has received many minor improvements and tweaks which lead to
+substantially better performance in a wide range of different scenarios.</li>
 <li>The ARM NEON intrinsics have been substantially reworked to reduce
     redundancy and improve code generation.  Some of the major changes are:
@@ -863,21 +879,8 @@ it run faster:</p>
   </li>
   </ol>
 </li>
-</ul>
-</div>
-
-<!--=========================================================================-->
-<div class="doc_subsection">
-<a name="otherimprovements">Other Improvements and New Features</a>
-</div>
-
-<div class="doc_text">
-<p>Other miscellaneous features include:</p>
-<ul>
-<li></li>
 </ul>
-
 </div>

