(one-by-one until valgrind is happy)
llvm-svn: 127925

llvm-svn: 127913

llvm-svn: 127909

- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
  (most PTX instructions cannot take global variable immediates)
llvm-svn: 127895

llvm-svn: 127874

llvm-svn: 127873

comparisons on x86. Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.
This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.
llvm-svn: 127852
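
A minimal C sketch of the flag trick (my illustration with a hypothetical
helper name, not code from the commit), assuming an unsigned comparison of a
128-bit value split into two 64-bit halves:

    #include <stdint.h>

    /* Unsigned (hi1:lo1) < (hi2:lo2), modeling the borrow chain that
       SUB (low halves) followed by SBB (high halves) produces: the
       final borrow is exactly the carry a 128-bit CMP would set. */
    static int ult_double_width(uint64_t lo1, uint64_t hi1,
                                uint64_t lo2, uint64_t hi2) {
        int borrow = lo1 < lo2;  /* SUB lo1, lo2 borrows iff lo1 < lo2 */
        /* SBB computes hi1 - hi2 - borrow; it borrows out iff the full
           128-bit subtraction would, which is the "below" result. */
        return hi1 < hi2 || (hi1 == hi2 && borrow);
    }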

multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
    shrl    $2, %edi
    imulq   $613566757, %rdi, %rax
    shrq    $32, %rax
    ret
instead of
    movl    %edi, %eax
    imulq   $613566757, %rax, %rcx
    shrq    $32, %rcx
    subl    %ecx, %eax
    shrl    %eax
    addl    %ecx, %eax
    shrl    $4, %eax
on x86_64.
llvm-svn: 127829
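
A small C check of the arithmetic (my sketch; div28 is a hypothetical name):
613566757 is ceil(2^32 / 7) and x/28 == (x >> 2) / 7, so once the dividend is
pre-shifted down to 30 bits the single multiply-and-shift is exact, which is
what lets the short sequence drop the subtract/add fixup:

    #include <assert.h>
    #include <stdint.h>

    /* x/28 the way the new lowering computes it: early shift by 2,
       then one magic multiply (613566757 == ceil(2^32 / 7)). */
    static uint32_t div28(uint32_t x) {
        uint64_t q = (uint64_t)(x >> 2) * 613566757u;  /* imulq */
        return (uint32_t)(q >> 32);                    /* shrq $32 */
    }

    int main(void) {
        for (uint32_t x = 0; x < 5000000; ++x)
            assert(div28(x) == x / 28);
        assert(div28(UINT32_MAX) == UINT32_MAX / 28);
        return 0;
    }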

llvm-svn: 127821

does not need to be checked on x86_64-win32 (aka Win64).
llvm-svn: 127800

-mtriple=x86_64-linux.
llvm-svn: 127775

rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766

llvm-svn: 127765

plus the test where it used to break.", which broke Clang self-host of a
Debug+Asserts compiler, on OS X.
llvm-svn: 127763

llvm-svn: 127761

where it used to break.
llvm-svn: 127757

conforms to the ABI, but DAGCombine could in theory recognize the sequence of
zext asserts and truncates and generate incorrect code.
llvm-svn: 127754

can event.
llvm-svn: 127741

x86_64-win32.
llvm-svn: 127734

llvm-svn: 127733

are useless to the Win64 target.
llvm-svn: 127732

llvm-svn: 127731

llvm-svn: 127730

llvm-svn: 127694

llvm-svn: 127683

llvm-svn: 127680

llvm-svn: 127678

- Remove PTX 1.4 code generation
- Change type of intrinsics to .v4.i32 instead of .v4.i16
- Add and/or/xor integer instructions
llvm-svn: 127677

    v2 = bitcast v1
    ...
    v3 = bitcast v2
    ...
       = v3
=>
    v2 = bitcast v1
    ...
       = v1
if v1 and v3 are in the same register class.
Bitcasts between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left behind because they are in different basic blocks and cannot be
eliminated by DAG combine.
rdar://9104514
llvm-svn: 127668

zext(undef) = 0, because the top bits will be zero.
llvm-svn: 127649
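
A one-function C illustration of the reasoning (mine, not from the commit):
zero-extension can never set the high bits, so 0 is always among the values
zext(undef) may take, making it a legal folding choice:

    #include <stdint.h>

    /* Whatever garbage an uninitialized byte holds, widening it leaves
       bits 8..31 clear, so the fold just picks one permitted value. */
    uint32_t widen(uint8_t b) {
        return (uint32_t)b;   /* result is always < 256 */
    }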

llvm-svn: 127648

Also more cleanly separate the ARM vs. Thumb functionality. Previously, the
encoding would be incorrect for some Thumb instructions (the indirect calls).
llvm-svn: 127637

can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:
_shuf:
@ BB#0:         @ %entry
    push    {r4, r7, lr}
    add     r7, sp, #4
    sub     sp, #12
    mov     r4, sp
    bic     r4, r4, #7
    mov     sp, r4
    mov     r2, sp
    vmov    d16, r0, r1
    orr     r0, r2, #6
    orr     r3, r2, #7
    vst1.8  {d16[0]}, [r3]
    vst1.8  {d16[5]}, [r0]
    subs    r4, r7, #4
    orr     r0, r2, #5
    vst1.8  {d16[4]}, [r0]
    orr     r0, r2, #4
    vst1.8  {d16[4]}, [r0]
    orr     r0, r2, #3
    vst1.8  {d16[0]}, [r0]
    orr     r0, r2, #2
    vst1.8  {d16[2]}, [r0]
    orr     r0, r2, #1
    vst1.8  {d16[1]}, [r0]
    vst1.8  {d16[3]}, [r2]
    vldr.64 d16, [sp]
    vmov    r0, r1, d16
    mov     sp, r4
    pop     {r4, r7, pc}
The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>
llvm-svn: 127630
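
For contrast, a hedged NEON sketch of the one-instruction version (my
reconstruction: the lane order {3,1,2,0,4,4,5,0} is read off the vst1.8
store offsets above, and shuf/idx are names I made up; the actual vext.ll
test may differ):

    #include <arm_neon.h>

    /* A single VTBL replaces the whole spill-to-stack sequence: idx
       names the source lane for each byte of the result. */
    uint8x8_t shuf(uint8x8_t v) {
        const uint8x8_t idx = {3, 1, 2, 0, 4, 4, 5, 0};
        return vtbl1_u8(v, idx);
    }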

llvm-svn: 127621

llvm-svn: 127598

- Emit all arrays as type .b8 with proper sizes in bytes to conform
  to the output of nvcc
llvm-svn: 127584

llvm-svn: 127578

llvm-svn: 127577

Add a RUN line to this test.
llvm-svn: 127520

Go ahead and add them on when we might want to use them and let
later passes remove them.
Fixes rdar://9118569
llvm-svn: 127518

effect that we get proper instruction printing using the "pop" mnemonic for it.
llvm-svn: 127502

Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127498
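
A sketch of the kind of source that creates such branches (my example with a
hypothetical checked_copy wrapper, not the commit's test): fortified code
branches on __builtin_object_size, which lowers to the llvm.objectsize
intrinsic, and once that folds to a constant the branch condition is trivially
known:

    #include <string.h>

    /* When __builtin_object_size folds to a constant (or to -1 for
       "unknown"), the guard becomes a branch on a constant condition
       that CodeGenPrepare can now delete before instruction selection. */
    void checked_copy(char *dst, const char *src, size_t n) {
        size_t avail = __builtin_object_size(dst, 0);
        if (avail != (size_t)-1 && n > avail)
            __builtin_trap();   /* dead arm once the condition folds */
        memcpy(dst, src, n);
    }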

protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actually fail before this fix.
llvm-svn: 127497

created from the", it broke some GCC test suite tests.
llvm-svn: 127477

lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127459

corresponding testcases back to the previous versions.
Fixes some performance regressions only seen on 32-bit.
llvm-svn: 127441

the load is indexed. rdar://9117613.
llvm-svn: 127440

llvm-svn: 127434

llvm-svn: 127410

llvm-svn: 127397