| Commit message | Author | Age | Files | Lines |
llvm-svn: 40824
measurable speedup.
llvm-svn: 40823
to the worklist, and handling the last one with a 'tail call'.  This speeds
up PR1432 from 2.0578s to 2.0012s (2.8%)
llvm-svn: 40822
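The opening line of this message is missing from the log, so what follows is only a guess at the pattern it describes: a worklist traversal that pushes every successor except the last, then continues straight into the last one in a loop, which saves one push/pop pair per block. A minimal C++ sketch under that assumption; `Graph` and `visitReachable` are hypothetical names and this is not the mem2reg code itself:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: Succs[b] lists the successor blocks of block b.
using Graph = std::vector<std::vector<int>>;

// Visits every block reachable from Entry and returns the visit order.
std::vector<int> visitReachable(const Graph &Succs, int Entry) {
  assert(Entry >= 0 && static_cast<std::size_t>(Entry) < Succs.size());
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<int> Order;
  std::vector<int> Worklist{Entry};

  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();

    // The inner loop plays the role of the 'tail call': it keeps walking
    // into the last successor without going through the worklist.
    while (!Visited[BB]) {
      Visited[BB] = true;
      Order.push_back(BB);

      const std::vector<int> &S = Succs[BB];
      if (S.empty())
        break;
      // Push all successors except the last one.
      for (std::size_t I = 0; I + 1 < S.size(); ++I)
        Worklist.push_back(S[I]);
      BB = S.back();
    }
  }
  return Order;
}
```

On a straight-line chain of blocks the worklist is never touched after the entry block, which is where a saving of this kind would come from.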
mem2reg from 2.0742->2.0522s on PR1432.
llvm-svn: 40821
llvm-svn: 40820
llvm-svn: 40819
faster than with the 'local to a block' fastpath.  This speeds
up PR1432 from 2.1232 to 2.0686s (2.6%)
llvm-svn: 40818
to increment NumLocalPromoted, and didn't actually delete the
dead alloca, leading to an extra iteration of mem2reg.
llvm-svn: 40817
llvm-svn: 40816
Predsimplify fails llvm-gcc bootstrap.
llvm-svn: 40815
stored value was a non-instruction value.  Doh.
This increases the number of single-store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17s to 2.13s.
llvm-svn: 40813
and the alloca so they don't get reprocessed.
This speeds up PR1432 from 2.20s to 2.17s.
llvm-svn: 40812
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
   block from the UsingBlocks vector.  Not doing so forces us to go through the slow
   path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432.  This speeds it up from 2.90s to 2.20s (31%)
llvm-svn: 40811
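The three points above can be sketched as a check ordering. This is a simplified model, not the pass's actual code: `StoredValue`, `blocksNeedingSlowPath`, and the `Dominates` callback are hypothetical names, and the real pass works on LLVM IR rather than block numbers. The sketch only shows the cheap "seen this block already" test running before any dominance query, the dominance query being skipped for non-instruction values, and duplicate using blocks collapsing to one entry.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for the single value stored to the alloca.
struct StoredValue {
  bool IsInstruction; // false for constants, arguments, globals
  int DefBlock;       // only meaningful when IsInstruction is true
};

// Returns the using blocks that cannot be handled on the fast path.
std::vector<int>
blocksNeedingSlowPath(const StoredValue &V, const std::vector<int> &UsingBlocks,
                      const std::function<bool(int, int)> &Dominates) {
  assert(Dominates && "caller must supply a dominance query");
  std::set<int> Seen;
  std::vector<int> Remaining;
  for (int BB : UsingBlocks) {
    // Points 1 and 3: the cheap revisit check comes first, and a block
    // listed more than once is only considered once.
    if (!Seen.insert(BB).second)
      continue;
    // Point 2: a non-instruction value needs no dominance query.
    if (!V.IsInstruction)
      continue;
    // Only now pay for the (comparatively expensive) dominance query.
    if (!Dominates(V.DefBlock, BB))
      Remaining.push_back(BB);
  }
  return Remaining;
}
```

An alloca stays on the fast path in this model only when the returned vector is empty, which is why leaving a stale duplicate in the list would push it onto the slow path.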
testcase in PR1432 from 6.33s to 2.90s (2.22x)
llvm-svn: 40810
a using block from the list if we handle it.  Not doing this prevented the
fast path from promoting allocas which have uses (whoops).
This increases the number of allocas hitting this fastpath from 4042 to 8935 on the
testcase in PR1432, speeding up mem2reg by 2.6x.
llvm-svn: 40809
llvm-svn: 40808
LLVM. It cleans up the intrinsic definitions and generally smooths the process for more complicated intrinsic writing. It will be used by the upcoming atomic intrinsics as well as vector and float intrinsics in the future.
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select, and llvm.ct* intrinsics. They are automatically upgraded by both the LLVM ASM reader and the bitcode reader. The test cases have been updated, with special tests added to ensure the automatic upgrading is supported.
llvm-svn: 40807
method.
llvm-svn: 40806
llvm-svn: 40805
llvm-svn: 40804
in PR1432 by 6%
llvm-svn: 40803
llvm-svn: 40802
struct X { int A; };
void foo() {
  struct X s;
  int i;
  i = __builtin_choose_expr(0, s, i);
}
compiles to:
        %tmp = load i32* %i             ; <i32> [#uses=1]
        store i32 %tmp, i32* %i
wow :)
llvm-svn: 40801
llvm-svn: 40800
llvm-svn: 40799
llvm-svn: 40798
Darwin (which makes size within a struct==96)
llvm-svn: 40796
Chris suggested this, since it simplifies the code generator.
If this feature is needed (and we don't think it is), we can revisit.
The following test case now produces an error.
[dylan:~/llvm/tools/clang] admin% cat t.c
typedef __attribute__(( ocu_vector_type(4) )) float float4;
static void test() {
    float4 vec4;
    vec4.rg.g;
    vec4.rg[1];
}
[dylan:~/llvm/tools/clang] admin% ../../Debug/bin/clang t.c
t.c:8:12: error: vector component access limited to variables
    vec4.rg.g;
           ^~
t.c:9:12: error: vector component access limited to variables
    vec4.rg[1];
           ^~~
2 diagnostics generated.
llvm-svn: 40795
llvm-svn: 40794
llvm-svn: 40793
(I've tried to get the info right for all targets,
but I'm not an expert on all of them, so check yours.)
llvm-svn: 40792
llvm-svn: 40791
This test case currently generates the following unexpected warnings (when compared with gcc).
[dylan:clang/test/Parser] admin% ../../../../Debug/bin/clang -parse-ast-check builtin_types_compatible.c
Warnings seen but not expected:
  Line 28: expression result unused
  Line 29: expression result unused
  Line 30: expression result unused
  Line 31: expression result unused
  Line 32: expression result unused
  Line 33: expression result unused
llvm-svn: 40789
llvm-svn: 40788
llvm-svn: 40787
OCUVectorElementExpr respectively.  This is for consistency with other expr nodes, which end with *Expr.
llvm-svn: 40785
llvm-svn: 40783
vec2.yx = vec2; // reverse
 
llvm-svn: 40782
vec2.x = f;
llvm-svn: 40781
llvm-svn: 40780
llvm vector shuffle instead of a bunch of insert/extract operations.
For:   vec4 = vec4.yyyy;  // splat
Emit:
        %tmp1 = shufflevector <4 x float> %tmp, <4 x float> undef, <4 x i32> < i32 1, i32 1, i32 1, i32 1 > 
instead of:
        %tmp1 = extractelement <4 x float> %tmp, i32 1          
        %tmp2 = insertelement <4 x float> undef, float %tmp1, i32 0             
        %tmp3 = extractelement <4 x float> %tmp, i32 1          
        %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 1             
        %tmp5 = extractelement <4 x float> %tmp, i32 1          
        %tmp6 = insertelement <4 x float> %tmp4, float %tmp5, i32 2             
        %tmp7 = extractelement <4 x float> %tmp, i32 1          
        %tmp8 = insertelement <4 x float> %tmp6, float %tmp7, i32 3             
llvm-svn: 40779
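The mask in the shufflevector above is just the swizzle letters mapped to element indices. A minimal C++ sketch of that mapping, assuming only the x/y/z/w component names; `swizzleToShuffleMask` is a hypothetical helper and not the code generator's own function:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Maps each swizzle letter to its element index (x=0, y=1, z=2, w=3).
// Returns an empty vector if any letter is not one of x/y/z/w.
std::vector<int> swizzleToShuffleMask(const std::string &Swizzle) {
  static const std::string Components = "xyzw";
  std::vector<int> Mask;
  for (char C : Swizzle) {
    std::string::size_type Idx = Components.find(C);
    if (Idx == std::string::npos)
      return std::vector<int>();
    Mask.push_back(static_cast<int>(Idx));
  }
  assert(Mask.size() == Swizzle.size());
  return Mask;
}
```

For the splat `vec4.yyyy` this yields the indices 1, 1, 1, 1, matching the `<4 x i32>` operand shown in the message.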
llvm-svn: 40778
llvm-svn: 40777
llvm-svn: 40776
llvm-svn: 40775
llvm-svn: 40774
llvm-svn: 40772
compile stuff
like this:
typedef __attribute__(( ocu_vector_type(4) )) float float4;
float4 test1(float4 V) {
  return V.wzyx+V;
}
to:
_test1:
        pshufd  $27, %xmm0, %xmm1
        addps   %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        ret
and:
_test1:
        mfspr r2, 256
        oris r3, r2, 4096
        mtspr 256, r3
        li r3, lo16(LCPI1_0)
        lis r4, ha16(LCPI1_0)
        lvx v3, r4, r3
        vperm v3, v2, v2, v3
        vaddfp v2, v3, v2
        mtspr 256, r2
        blr 
llvm-svn: 40771
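The `$27` in the x86 output is the `wzyx` swizzle packed into pshufd's 8-bit immediate, where each pair of bits selects the source lane for one result lane. A small C++ sketch of that packing, assuming x/y/z/w component names; `swizzleToPshufdImm` is a hypothetical helper written for illustration, not part of the backend:

```cpp
#include <cassert>
#include <string>

// Packs a four-letter xyzw swizzle into a pshufd-style immediate:
// bits [2*i+1 : 2*i] hold the source lane for result lane i.
// Returns -1 if any letter is not one of x/y/z/w.
int swizzleToPshufdImm(const std::string &Swizzle) {
  assert(Swizzle.size() == 4 && "pshufd selects exactly four lanes");
  static const std::string Components = "xyzw";
  int Imm = 0;
  for (int Lane = 0; Lane < 4; ++Lane) {
    std::string::size_type Src = Components.find(Swizzle[Lane]);
    if (Src == std::string::npos)
      return -1;
    Imm |= static_cast<int>(Src) << (2 * Lane);
  }
  return Imm;
}
```

For `wzyx` the lanes are 3, 2, 1, 0, so the immediate is 3 + (2 << 2) + (1 << 4) + (0 << 6) = 27.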
We can now codegen:
  vec4.xy;
as nothing!
llvm-svn: 40769
llvm-svn: 40768