path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* Remove 256-bit AVX non-temporal store intrinsics. The same was done previously for 128-bit.
  Craig Topper | 2012-05-08 | Files: 1 | Lines: -0/+24
  llvm-svn: 156375
* Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled.
  Owen Anderson | 2012-05-07 | Files: 1 | Lines: -0/+18
  llvm-svn: 156324
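Why this fold needs unsafe FP math can be seen from a small sketch. Under strict IEEE-754 rules, `x - x` is 0.0 for finite `x` but NaN when `x` is an infinity or a NaN, so replacing it with the constant 0.0 changes the result for those inputs. The function names below are hypothetical and only illustrate the arithmetic; they are not taken from the patch.

```c
#include <math.h>
#include <stdbool.h>

/* Plain IEEE-754 subtraction, assumed to be compiled without fast-math flags. */
double sub_self(double x) {
    return x - x;
}

/* Returns true when replacing x - x with the constant 0.0 would change
   the result for this particular input (infinities and NaNs). */
bool fold_changes_result(double x) {
    return isnan(sub_self(x));
}
```

For example, `fold_changes_result(1.5)` is false, while `fold_changes_result(INFINITY)` is true.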
* Fix a regression from r147481. This combine should only happen if there is a single use.
  Chad Rosier | 2012-05-07 | Files: 1 | Lines: -1/+2
  rdar://11360370 llvm-svn: 156316
* X86: optimization for -(x != 0)
  Manman Ren | 2012-05-07 | Files: 1 | Lines: -0/+30
  This patch will optimize -(x != 0) on X86
  FROM
      cmpl $0x01,%edi
      sbbl %eax,%eax
      notl %eax
  TO
      negl %edi
      sbbl %eax,%eax
  In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
      def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
  rdar: 10961709 llvm-svn: 156312
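The equivalence of the two instruction sequences can be checked with a small C model of the carry flag. This is a sketch under stated assumptions: the function names are hypothetical, and the flag behavior modeled is the standard x86 one (`cmp` sets CF on unsigned borrow, `neg` sets CF when its operand is nonzero, `sbb %eax,%eax` yields 0 minus CF).

```c
#include <stdint.h>

/* Reference: all bits set when x is nonzero, zero otherwise. */
uint32_t neg_ne_zero(int32_t x) {
    return (uint32_t)-(x != 0);
}

/* Models: cmpl $1,%edi ; sbbl %eax,%eax ; notl %eax */
uint32_t old_sequence(int32_t x) {
    uint32_t cf = ((uint32_t)x < 1u);  /* cmp sets CF on unsigned borrow */
    uint32_t eax = 0u - cf;            /* sbb eax,eax computes eax - eax - CF */
    return ~eax;                       /* notl inverts every bit */
}

/* Models: negl %edi ; sbbl %eax,%eax */
uint32_t new_sequence(int32_t x) {
    uint32_t cf = ((uint32_t)x != 0u); /* neg sets CF iff the operand is nonzero */
    return 0u - cf;                    /* sbb eax,eax computes eax - eax - CF */
}
```

Both models agree with `neg_ne_zero` for every input; the new sequence simply saves the final `notl`.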
* Add support for the 'l' constraint.
  Eric Christopher | 2012-05-07 | Files: 1 | Lines: -0/+11
  Patch by Jack Carter. llvm-svn: 156294
* Add support for the 'c' constraint.
  Eric Christopher | 2012-05-07 | Files: 1 | Lines: -1/+7
  Patch by Jack Carter. llvm-svn: 156293
* Add support for the 'P' constraint.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -0/+22
  Patch by Jack Carter. llvm-svn: 156292
* Add support for the 'O' constraint.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -0/+22
  Patch by Jack Carter. llvm-svn: 156285
* Add support for the 'N' inline asm constraint.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -0/+23
  Patch by Jack Carter. llvm-svn: 156284
* Add support for the 'L' inline asm constraint.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -1/+22
  Patch by Jack Carter. llvm-svn: 156283
* Add support for the inline asm constraint 'K'.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -0/+22
  llvm-svn: 156282
* Add SSE4A MOVNTSS/MOVNTSD instructions.
  Craig Topper | 2012-05-07 | Files: 1 | Lines: -0/+19
  llvm-svn: 156281
* Support the 'J' constraint.
  Eric Christopher | 2012-05-07 | Files: 2 | Lines: -0/+22
  Patch by Jack Carter. llvm-svn: 156280
* Add support for the 'I' inline asm constraint. Also add tests from the previous 2 patches.
  Eric Christopher | 2012-05-07 | Files: 5 | Lines: -0/+99
  Patch by Jack Carter. llvm-svn: 156279
* Switch the select to branch transformation on by default.
  Benjamin Kramer | 2012-05-06 | Files: 2 | Lines: -2/+2
  The primitive conservative heuristic seems to give a slight overall improvement while not regressing stuff. Make it available to wider testing. If you notice any speed regressions (or significant code size regressions) let me know!
  llvm-svn: 156258
* CodeGenPrepare: Add a transform to turn selects into branches in some cases.
  Benjamin Kramer | 2012-05-05 | Files: 1 | Lines: -0/+63
  This came up when a change in block placement formed a cmov and slowed down a hot loop by 50%:
      ucomisd (%rdi), %xmm0
      cmovbel %edx, %esi
  cmov is a really bad choice in this context because it doesn't get branch prediction. If we emit it as a branch, an out-of-order CPU can do a better job (if the branch is predicted right) and avoid waiting for the slow load+compare instruction to finish. Of course it won't help if the branch is unpredictable, but those are really rare in practice.
  This patch uses a dumb conservative heuristic: it turns all cmovs that have one use and a direct memory operand into branches. cmovs usually save some code size, so we disable the transform in -Os mode. In-order architectures are unlikely to benefit either; those are included in the "predictableSelectIsExpensive" flag.
  It would be better to reuse branch probability info here, but BPI doesn't support select instructions currently. It would make sense to use the same heuristics as the if-converter pass, which does the opposite direction of this transform.
  Test suite shows a small improvement here and there on corei7-level machines, but the actual results depend a lot on the microarchitecture used.
  The transformation is currently disabled by default and available by passing the -enable-cgp-select2branch flag to the code generator.
  Thanks to Chandler for the initial test case, and to him and Evan Cheng for providing me with comments and test-suite numbers that were more stable than mine :)
  llvm-svn: 156234
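At the source level, the two shapes involved look roughly like the sketch below. The function names and signatures are hypothetical, and which form a given compiler emits as a cmov or as a branch is its own decision; the point is only that the two are semantically interchangeable, with the condition depending on a value loaded from memory.

```c
/* Select form: the ternary has no side effects, so a compiler is free to
   lower it to a conditional move that must wait for the load of *p. */
int pick_select(const double *p, double x, int a, int b) {
    return (x > *p) ? a : b;
}

/* Branch form: same result, expressed as control flow, which lets an
   out-of-order CPU speculate past the load when the branch is predictable. */
int pick_branch(const double *p, double x, int a, int b) {
    if (x > *p)
        return a;
    return b;
}
```

The heuristic described in the commit targets the first shape only when the select has a single use and one compare operand comes directly from memory.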
* This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0.
  Justin Holewinski | 2012-05-04 | Files: 17 | Lines: -0/+1994
  This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it. The new target machines are:
      nvptx (old ptx32) => 32-bit PTX
      nvptx64 (old ptx64) => 64-bit PTX
  The sources are based on the internal NVIDIA NVPTX back-end, and contain more functionality than the current PTX back-end provides.
  NV_CONTRIB llvm-svn: 156196
* Added missing CMN case in Thumb2SizeReduction pass so that LLVM emits the 16-bit encoding of CMN instructions.
  Sebastian Pop | 2012-05-04 | Files: 1 | Lines: -4/+14
  llvm-svn: 156195
* Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
  Craig Topper | 2012-05-04 | Files: 1 | Lines: -0/+8
  llvm-svn: 156156
* Support for target dependent Hexagon VLIW packetizer.
  Sirish Pande | 2012-05-03 | Files: 4 | Lines: -0/+69
  This patch creates and optimizes packets as per Hexagon ISA rules. llvm-svn: 156109
* Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly.
  Craig Topper | 2012-05-03 | Files: 1 | Lines: -1/+1
  Missed in r155982. llvm-svn: 156059
* Fix two-address pass's aggressive instruction commuting heuristics.
  Evan Cheng | 2012-05-03 | Files: 2 | Lines: -2/+12
  It's meant to catch cases like:
      %reg1024<def> = MOV r1
      %reg1025<def> = MOV r0
      %reg1026<def> = ADD %reg1024, %reg1025
      r0 = MOV %reg1026
  By commuting the ADD, it lets the coalescer eliminate all of the copies. However, there was a bug in the heuristics where it ended up commuting the ADD in:
      %reg1024<def> = MOV r0
      %reg1025<def> = MOV 0
      %reg1026<def> = ADD %reg1024, %reg1025
      r0 = MOV %reg1026
  That brought no benefit; instead, it ensured the last MOV would not be coalesced.
  rdar://11355268 llvm-svn: 156048
* Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
  Owen Anderson | 2012-05-02 | Files: 1 | Lines: -0/+9
  llvm-svn: 156029
* Teach DAG combine that multiplication by 1.0 can always be constant folded.
  Owen Anderson | 2012-05-02 | Files: 1 | Lines: -0/+9
  llvm-svn: 156023
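The reason multiplication by 1.0 can be folded without any unsafe-math flag is that it returns its operand unchanged for zeros of either sign, infinities, and NaNs (setting aside signaling-NaN details). Adding 0.0, by contrast, is not an identity, because -0.0 + 0.0 is +0.0 in the default rounding mode. The helper names below are hypothetical and only illustrate this contrast.

```c
#include <math.h>
#include <stdbool.h>

double mul_one(double x)  { return x * 1.0; }
double add_zero(double x) { return x + 0.0; }

/* True when a and b are the same value, treating any two NaNs as equal
   and distinguishing -0.0 from +0.0. */
bool same_value(double a, double b) {
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b && !signbit(a) == !signbit(b);
}
```

With these helpers, `same_value(mul_one(-0.0), -0.0)` holds, while `same_value(add_zero(-0.0), -0.0)` does not.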
* Revert r155853
  Manman Ren | 2012-05-02 | Files: 1 | Lines: -21/+0
  The commit is intended to fix rdar://10961709. But it is the root cause of PR12720. Revert it for now.
  llvm-svn: 155992
* Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter.
  Craig Topper | 2012-05-02 | Files: 1 | Lines: -0/+14
  llvm-svn: 155982
* Strip the pointer casts off of allocas so that the selection DAG can find them.
  Bill Wendling | 2012-05-01 | Files: 1 | Lines: -0/+17
  PR10799 llvm-svn: 155954
* X86: optimization for max-like struct
  Manman Ren | 2012-05-01 | Files: 1 | Lines: -0/+42
  This patch will optimize the following cases on X86
      (a > b) ? (a-b) : 0
      (a >= b) ? (a-b) : 0
      (b < a) ? (a-b) : 0
      (b <= a) ? (a-b) : 0
  FROM
      movl %edi, %ecx
      subl %esi, %ecx
      cmpl %edi, %esi
      movl $0, %eax
      cmovll %ecx, %eax
  TO
      xorl %eax, %eax
      subl %esi, %edi
      cmovll %eax, %edi
      movl %edi, %eax
  rdar: 10734411 llvm-svn: 155919
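The four source patterns listed in the commit all compute the same "positive difference" of `a` and `b`, which is why a single lowering can serve them. A minimal sketch, with hypothetical function names and assuming `a - b` does not overflow:

```c
/* Each function returns a - b when a exceeds b and 0 otherwise.
   When a == b the difference is already 0, so > and >= agree. */
int pos_diff_gt(int a, int b) { return (a > b)  ? (a - b) : 0; }
int pos_diff_ge(int a, int b) { return (a >= b) ? (a - b) : 0; }
int pos_diff_lt(int a, int b) { return (b < a)  ? (a - b) : 0; }
int pos_diff_le(int a, int b) { return (b <= a) ? (a - b) : 0; }
```

Because the subtraction already sets the flags the comparison needs, the optimized sequence can drop the separate `cmpl`.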
* Regression test for PR2960.
  Jay Foad | 2012-05-01 | Files: 1 | Lines: -0/+13
  llvm-svn: 155912
* X86: optimization for -(x != 0)
  Manman Ren | 2012-04-30 | Files: 1 | Lines: -0/+21
  This patch will optimize -(x != 0) on X86
  FROM
      cmpl $0x01,%edi
      sbbl %eax,%eax
      notl %eax
  TO
      negl %edi
      sbbl %eax,%eax
  llvm-svn: 155853
* test/CodeGen/X86/select.ll: remove spaces
  Manman Ren | 2012-04-30 | Files: 1 | Lines: -1/+1
  llvm-svn: 155840
* Fix fastcc structure return with fast-isel on x86-32
  Derek Schuff | 2012-04-30 | Files: 1 | Lines: -0/+14
  On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention in X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall.
  (this time, actually commit what was reviewed!)
  llvm-svn: 155825
* Don't introduce illegal types when creating vmull operations. <rdar://11324364>
  Bob Wilson | 2012-04-30 | Files: 1 | Lines: -0/+74
  ARM BUILD_VECTORs created after type legalization cannot use i8 or i16 operands, since those types are not legal. Instead use i32 operands, which will be implicitly truncated by the BUILD_VECTOR to match the element type.
  llvm-svn: 155824
* Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice.
  Andrew Trick | 2012-04-28 | Files: 1 | Lines: -0/+46
  This time, also fix the caller of AddGlue to properly handle incomplete chains. AddGlue had failure modes, but shamefully hid them from its caller. Its luck ran out.
  Fixes rdar://11314175: BuildSchedUnits assert. llvm-svn: 155749
* Revert r155745
  Derek Schuff | 2012-04-27 | Files: 1 | Lines: -14/+0
  llvm-svn: 155746
* Fix fastcc structure return with fast-isel on x86-32
  Derek Schuff | 2012-04-27 | Files: 1 | Lines: -0/+14
  On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention in X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall.
  llvm-svn: 155745
* Temporarily revert r155668: Fix the SD scheduler to avoid gluing.
  Andrew Trick | 2012-04-27 | Files: 1 | Lines: -46/+0
  This definitely caused a regression with ARM -mno-thumb. llvm-svn: 155743
* Add x86-specific DAG combine to simplify:
  Chad Rosier | 2012-04-27 | Files: 1 | Lines: -0/+22
      x == -y --> x+y == 0
      x != -y --> x+y != 0
  On x86, the generated code goes from
      negl %esi
      cmpl %esi, %edi
      je .LBB0_2
  to
      addl %esi, %edi
      je .L4
  This case is correctly handled for ARM with "cmn". Patch by Manman Ren.
  rdar://11245199 PR12545 llvm-svn: 155739
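The identity behind this combine holds in 32-bit wrapping arithmetic, which is what the `negl` and `addl` instructions compute. The sketch below uses unsigned types so that wraparound is well defined in C; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* x == -y, with the negation performed modulo 2^32. */
bool eq_neg(uint32_t x, uint32_t y) {
    return x == (0u - y);
}

/* x + y == 0, with the addition performed modulo 2^32. */
bool sum_is_zero(uint32_t x, uint32_t y) {
    return (x + y) == 0u;
}
```

The two predicates agree for every pair of inputs, including the edge case where both operands are 0x80000000; the rewritten form lets the `addl` set the zero flag directly, so no separate compare is needed.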
* Make test less fragile.
  Evan Cheng | 2012-04-27 | Files: 1 | Lines: -2/+2
  llvm-svn: 155732
* Fix the order of the operands in the llvm.fma intrinsic patterns for ARM, <rdar://problem/11325085>.
  Lang Hames | 2012-04-27 | Files: 1 | Lines: -3/+3
  llvm-svn: 155724
* X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures.
  Benjamin Kramer | 2012-04-27 | Files: 2 | Lines: -2/+17
  - Model FPSW (the FPU status word) as a register.
  - Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
  - During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: that's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector.
  Fixes PR6679. Patch by Christoph Erhardt! llvm-svn: 155704
* Add mcpu to tests to prevent them from using AVX instructions on Sandy Bridge after r155618.
  Craig Topper | 2012-04-27 | Files: 32 | Lines: -49/+49
  llvm-svn: 155696
* Implement a bastardized ABI.
  Evan Cheng | 2012-04-27 | Files: 1 | Lines: -1/+0
  llvm-svn: 155686
* thumbv6 shouldn't imply +thumb2.
  Evan Cheng | 2012-04-27 | Files: 1 | Lines: -0/+12
  - thumbv6 shouldn't imply +thumb2. Cortex-M0 doesn't support 32-bit Thumb2 instructions.
  - However, it does support dmb, dsb, isb, mrs, and msr.
  rdar://11331541 llvm-svn: 155685
* Fix the SD scheduler to avoid gluing the same node twice.
  Andrew Trick | 2012-04-26 | Files: 1 | Lines: -0/+46
  DAGCombine strangeness may result in multiple loads from the same offset. They both may try to glue themselves to another load. We could insist that the redundant loads glue themselves to each other, but the better fix is to bail out from bad gluing at the time we detect it.
  Fixes rdar://11314175: BuildSchedUnits assert. llvm-svn: 155668
* Use VLD1 in NEON extending-load patterns instead of VLDR.
  Tim Northover | 2012-04-26 | Files: 1 | Lines: -2/+6
  On some cores it's a bad idea for performance to mix VFP and NEON instructions, and since these patterns are NEON anyway, the NEON load should be used.
  llvm-svn: 155630
* If triple is armv7 / thumbv7 and a CPU is specified, do not automatically assume the feature set of v7a.
  Evan Cheng | 2012-04-26 | Files: 1 | Lines: -7/+14
  This comes about if the user specifies something like -arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as uxtab in this case.
  rdar://11318438 llvm-svn: 155601
* Try to fix llvm-arm-linux builder with -mcpu.
  Jakob Stoklund Olesen | 2012-04-25 | Files: 1 | Lines: -1/+1
  llvm-svn: 155589
* Trivial change to make the test use -mcpu=generic so as to avoid a failure if run on an Intel Atom with post RA instruction scheduling.
  Preston Gurd | 2012-04-25 | Files: 1 | Lines: -1/+1
  llvm-svn: 155587
* Do not use $gp as a dedicated global register if the target ABI is not O32.
  Akira Hatanaka | 2012-04-25 | Files: 2 | Lines: -5/+6
  llvm-svn: 155522