path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
...
* LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization | Matthias Braun | 2015-11-12 | 2 | -6/+7
    - Factor out code to query and modify the sign bit of a floating-point
      value as an integer. This also works if none of the target's integer
      types is big enough to hold all bits of the floating-point value.
    - Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
      otherwise perform bit manipulation on the sign bit. The previous code
      used "x >u 0 ? x : -x", which is incorrect when x is -0.0! It also
      takes 34 instructions on ARM Cortex-M4. With this patch we only
      require 5:
        vldr d0, LCPI0_0
        vmov r2, r3, d0
        lsrs r2, r3, #31
        bfi r1, r2, #31, #1
        bx lr
      (This could be further improved if the compiler would recognize that
      r2, r3 is zero.)
    - Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
      available, otherwise perform bit manipulation on the sign bit.
    - Perform the sign(x) test by masking out the sign bit and comparing
      with 0 rather than shifting the sign bit to the highest position and
      testing for "<s 0". For x86 copysignl (on 80-bit values) this gets us:
        testl $32768, %eax
      rather than:
        shlq $48, %rax
        sets %al
        testb %al, %al
    Differential Revision: http://reviews.llvm.org/D11172
    llvm-svn: 252839
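    A minimal LLVM IR sketch (hypothetical input, not taken from this
    commit's tests) of the kind of function whose legalization this change
    affects on a target without native FABS/FCOPYSIGN support:

        ; fabs is legalized to FCOPYSIGN(x, 0.0) when FCOPYSIGN is available,
        ; otherwise to clearing the sign bit via integer bit manipulation.
        define double @my_fabs(double %x) {
          %r = call double @llvm.fabs.f64(double %x)
          ret double %r
        }
        declare double @llvm.fabs.f64(double)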
* [TLS on Darwin] use a different mask for tls calls on x86-64. | Manman Ren | 2015-11-12 | 1 | -0/+28
    Calls involved in thread-local variable lookup save more registers than
    normal calls.
    rdar://problem/23073171
    llvm-svn: 252837
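    A minimal LLVM IR sketch (hypothetical, not from this commit) of a
    thread-local access that goes through the Darwin TLS lookup call whose
    register mask this change adjusts:

        @tls_var = thread_local global i32 0

        define i32 @read_tls() {
          %v = load i32, i32* @tls_var, align 4
          ret i32 %v
        }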
* [ARM] Enable shrink-wrapping by default. | Quentin Colombet | 2015-11-11 | 11 | -18/+27
    Differential Revision: http://reviews.llvm.org/D14357
    rdar://problem/21942589
    llvm-svn: 252825
* [WinEH] Don't forward branches across empty EH pad BBs | Reid Kleckner | 2015-11-11 | 1 | -0/+53
    For really simple SEH catchpads, we tried to forward the invoke unwind
    edge across the empty block.
    llvm-svn: 252822
* [DAGCombiner] Improve zextload optimization. | Geoff Berry | 2015-11-11 | 1 | -1/+15
    Summary:
    Don't fold (zext (and (load x), cst)) -> (and (zextload x), (zext cst))
    if (and (load x) cst) will match as a zextload already and has
    additional users.

    For example, the following IR:

        %load = load i32, i32* %ptr, align 8
        %load16 = and i32 %load, 65535
        %load64 = zext i32 %load16 to i64
        store i32 %load16, i32* %dst1, align 4
        store i64 %load64, i64* %dst2, align 8

    used to produce the following aarch64 code:

        ldr w8, [x0]
        and w9, w8, #0xffff
        and x8, x8, #0xffff
        str w9, [x1]
        str x8, [x2]

    but with this change produces the following aarch64 code:

        ldrh w8, [x0]
        str w8, [x1]
        str x8, [x2]

    Reviewers: resistor, mcrosier
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D14340
    llvm-svn: 252789
* [WinEH] Only generate UnwindHelp slot for MSVCXX | Joseph Tremoulet | 2015-11-11 | 1 | -6/+6
    Summary: Other personalities don't use this special frame slot.
    Reviewers: majnemer, andrew.w.kaylor, rnk
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D14580
    llvm-svn: 252778
* [ARM] Combine BFIs together | James Molloy | 2015-11-11 | 1 | -0/+39
    If we have a chain of BFIs, we may be able to combine several together
    into one merged BFI. We can do this if the "from" bits from one BFI OR'd
    with the "from" bits from the other BFI form a contiguous range, and the
    same with the "to" bits.
    llvm-svn: 252740
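    A rough LLVM IR sketch (hypothetical, not from this commit's test) of
    two adjacent bitfield inserts of the kind that can end up as two BFIs
    and then be merged, since the inserted bit ranges are contiguous:

        define i32 @two_inserts(i32 %dst, i32 %src) {
          %clear = and i32 %dst, -256    ; clear bits 0-7 of %dst
          %lo = and i32 %src, 15         ; "from" bits 0-3 of %src
          %hi = and i32 %src, 240        ; "from" bits 4-7 of %src
          %ins1 = or i32 %clear, %lo     ; insert bits 0-3
          %ins2 = or i32 %ins1, %hi      ; insert bits 4-7; together contiguous
          ret i32 %ins2
        }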
* [X86] Replace LEAs with INC/DEC when profitable | Michael Kuperstein | 2015-11-11 | 2 | -1/+35
    If possible and profitable, replace lea %reg, 1(%reg) and
    lea %reg, -1(%reg) with inc %reg and dec %reg respectively.
    Patch by: anton.nadolsky@intel.com
    Differential Revision: http://reviews.llvm.org/D14059
    llvm-svn: 252722
* [WebAssembly] Support non-legal argument and return types. | Dan Gohman | 2015-11-11 | 1 | -5/+14
    llvm-svn: 252687
* AMDGPU: Set isAllocatable = 0 on VS_32/VS_64 | Matt Arsenault | 2015-11-11 | 2 | -3/+3
    llvm-svn: 252674
* [WinEH] Insert the MBB for EH_RESTORE after the catchret | Reid Kleckner | 2015-11-10 | 1 | -0/+46
    Inserting it before the target block could be bad; we might already have
    a fallthrough edge to it.
    llvm-svn: 252670
* [WebAssembly] Support for floating point min and max. | Dan Gohman | 2015-11-10 | 2 | -0/+48
    llvm-svn: 252653
* [PowerPC] Add an MI SSA peephole pass. | Bill Schmidt | 2015-11-10 | 7 | -28/+74
    This patch adds a pass for doing PowerPC peephole optimizations at the
    MI level while the code is still in SSA form. This allows for easy
    modifications to the instructions while depending on a subsequent pass
    of DCE. Both passes are very fast due to the characteristics of SSA.

    At this time, the only peepholes added are for cleaning up various
    redundancies involving the XXPERMDI instruction. However, I would expect
    this will be a useful place to add more peepholes for inefficiencies
    generated during instruction selection. The pass is placed after VSX
    swap optimization, as it is best to let that pass remove unnecessary
    swaps before performing any remaining clean-ups.

    The utility of these clean-ups is demonstrated by changes to four
    existing test cases, all of which now have tighter expected code
    generation. I've also added Eric Schweiz's bugpoint-reduced test from
    PR25157, for which we now generate tight code.

    One other test started failing for me, and I've fixed it
    (test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not
    related to my changes, and I'm not sure why it works before and not
    after. The problem is that the CHECK-NOT: of "statepoint" from test1
    fails because of the "statepoint" in test2, and so forth. Adding a
    CHECK-LABEL in between keeps the different occurrences of that string
    properly scoped.

    llvm-svn: 252651
* [X86] Do not try to custom-lower sitofp/fptosi in soft-float mode | Michael Kuperstein | 2015-11-10 | 1 | -7/+125
    Differential Revision: http://reviews.llvm.org/D14495
    llvm-svn: 252621
* add 'MustReduceDepth' as an objective/cost-metric for the MachineCombiner | Sanjay Patel | 2015-11-10 | 1 | -5/+6
    This is one of the problems noted in PR25016:
    https://llvm.org/bugs/show_bug.cgi?id=25016
    and:
    http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html

    The spilling problem is independent and not addressed by this patch.

    The MachineCombiner was doing reassociations that don't improve or even
    worsen the critical path. This is caused by inclusion of the "slack"
    factor when calculating the critical path of the original code sequence.
    If we don't add that, then we have a more conservative cost comparison
    of the old code sequence vs. a new sequence. The more liberal
    calculation must be preserved, however, for the AArch64 MULADD patterns
    because benchmark regressions were observed without that.

    The two failing test cases now have identical asm that does what we
    want:
        a + b + c + d ---> (a + b) + (c + d)

    Differential Revision: http://reviews.llvm.org/D13417
    llvm-svn: 252616
* Reapply "[ARM] Combine CMOV into BFI where possible"James Molloy2015-11-101-0/+34
| | | | | | | | | | | | | | | | | | Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh! Original commit message: If we have a CMOV, OR and AND combination such as: if (x & CN) y |= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252606
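    A rough LLVM IR sketch (hypothetical, not this commit's test) of the
    "if (x & CN) y |= CM;" shape described above, with the CM bit arranged
    to be known zero in y so the combine's preconditions hold:

        define i32 @cond_set_bit(i32 %x, i32 %y) {
          %y.clear = and i32 %y, -17     ; bit 4 (CM) is now known zero in y
          %bit = and i32 %x, 8           ; CN: a single bit
          %cmp = icmp eq i32 %bit, 0
          %or = or i32 %y.clear, 16      ; CM: a single bit
          %r = select i1 %cmp, i32 %y.clear, i32 %or
          ret i32 %r
        }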
* Update test to use explicit triple | Oliver Stannard | 2015-11-10 | 1 | -2/+2
    This is needed for targets which do not support big-endian with the
    default triple.
    llvm-svn: 252603
* [AArch64] Fix halfword load merging for big-endian targets | Oliver Stannard | 2015-11-10 | 1 | -11/+26
    For big-endian targets, when we merge two halfword loads into a word
    load, the order of the halfwords in the loaded value is reversed
    compared to little-endian, so the load-store optimiser needs to swap the
    destination registers.

    This does not affect merging of two word loads, as we use ldp, which
    treats the memory as two separate 32-bit words.
    llvm-svn: 252597
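    A rough LLVM IR sketch (hypothetical) of two adjacent halfword loads of
    the kind the load/store optimiser may merge into a single word load; on
    big-endian targets the two halves come out in the opposite order, so
    the destination registers must be swapped:

        define i32 @sum_adjacent_halves(i16* %p) {
          %p1 = getelementptr i16, i16* %p, i32 1
          %a = load i16, i16* %p, align 2
          %b = load i16, i16* %p1, align 2
          %za = zext i16 %a to i32
          %zb = zext i16 %b to i32
          %s = add i32 %za, %zb
          ret i32 %s
        }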
* AVX512 : Implemented encoding and DAG lowering for VMOVHPS/PD and VMOVLPS/PD instructions. | Igor Breger | 2015-11-10 | 2 | -0/+56
    Differential Revision: http://reviews.llvm.org/D14492
    llvm-svn: 252592
* Support for emitting inline stack probes | Andy Ayers | 2015-11-10 | 2 | -1/+143
    For CoreCLR on Windows, stack probes must be emitted as inline sequences
    that probe successive stack pages between the current stack limit and
    the desired new stack pointer location. This implements support for the
    inline expansion on x64.

    For in-body alloca probes, expansion is done during instruction
    lowering. For prolog probes, a stub call is initially emitted during
    prolog creation, and expanded after epilog generation, to avoid
    complications that arise when introducing new machine basic blocks
    during prolog and epilog creation.

    Added a new test case, modified an existing one to exclude non-x64
    coreclr (for now).
    Add test case
    Fix tests
    llvm-svn: 252578
* [Hexagon] Fixing compound register printing and reenabling more tests. | Colin LeMahieu | 2015-11-10 | 1 | -1/+0
    llvm-svn: 252574
* AArch64: add experimental support for address tagging. | Tim Northover | 2015-11-10 | 1 | -0/+102
    AArch64 has the ability to use the top 8 bits of an "address" for extra
    information, with the memory subsystem automatically masking them off
    for loads and stores. When that's happening, we can sometimes skip masks
    on memory operations in the compiler.

    However, this requires the host OS and support stack to preserve those
    bits, so it can't be enabled everywhere. In principle iOS 8.0 and above
    do take the required precautions, but we'll put it under a flag for now.
    llvm-svn: 252573
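    A minimal LLVM IR sketch (hypothetical) of the kind of explicit tag
    mask that can become redundant when the target is known to ignore the
    top byte of addresses:

        define i32 @load_through_tagged_ptr(i64 %tagged) {
          %untagged = and i64 %tagged, 72057594037927935  ; clear the top 8 tag bits
          %p = inttoptr i64 %untagged to i32*
          %v = load i32, i32* %p, align 4
          ret i32 %v
        }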
* [WebAssembly] Support 'unreachable' expression | Derek Schuff | 2015-11-10 | 1 | -0/+34
    Lower LLVM's 'unreachable' terminator to ISD::TRAP, and lower ISD::TRAP
    to wasm's 'unreachable' expression.

    WebAssembly type-checks expressions, but a noreturn function with a
    return type that doesn't match the context will cause a check failure.
    So we lower LLVM 'unreachable' to ISD::TRAP and then lower that to
    WebAssembly's 'unreachable' expression, which typechecks in any context
    and causes a trap if executed.

    Differential Revision: http://reviews.llvm.org/D14515
    llvm-svn: 252566
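    A minimal LLVM IR sketch (hypothetical) of an 'unreachable' terminator
    of the sort that now lowers through ISD::TRAP to wasm's 'unreachable':

        declare void @abort_like() noreturn

        define i32 @never_returns() {
          call void @abort_like()
          unreachable
        }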
* [Hexagon] Fixing store instructions and reenabling a few more tests. | Colin LeMahieu | 2015-11-10 | 2 | -2/+0
    llvm-svn: 252561
* [ARM] Handle t2ADDri in ARMAsmPrinter::EmitUnwindingInstruction. | Akira Hatanaka | 2015-11-10 | 1 | -0/+11
    This fixes a bug in ARMAsmPrinter::EmitUnwindingInstruction where
    llvm_unreachable was reached because t2ADDri wasn't handled.

    Test case provided by Tim Northover.

    rdar://problem/23270609
    http://reviews.llvm.org/D14518
    llvm-svn: 252557
* [Hexagon] Fixing load instruction parsing and reenabling tests. | Colin LeMahieu | 2015-11-10 | 2 | -2/+0
    llvm-svn: 252555
* MachineVerifier: Add missing linebreak | Matthias Braun | 2015-11-09 | 1 | -1/+2
    MachineInstr::print() with SkipOppers==true does not produce a
    linebreak, so we have to do that in MachineVerifier::report().
    llvm-svn: 252551
* [WinEH] Don't emit CATCHRET from visitCatchPad | David Majnemer | 2015-11-09 | 5 | -12/+8
    Instead, emit a CATCHPAD node which will get selected to a target
    specific sequence.
    llvm-svn: 252528
* specify triple so Windows bots won't be sad | Sanjay Patel | 2015-11-09 | 2 | -2/+2
    llvm-svn: 252519
* [x86] try harder to match bitwise 'or' into an LEA | Sanjay Patel | 2015-11-09 | 3 | -25/+15
    The motivation for this patch starts with the epic fail example in
    PR18007:
    https://llvm.org/bugs/show_bug.cgi?id=18007
    ...unfortunately, this patch makes no difference for that case, but it
    solves some simpler cases. We'll get there some day. :)

    The current 'or' matching code was using computeKnownBits() via
    isBaseWithConstantOffset() -> MaskedValueIsZero(), but that's an
    unnecessarily limited use. We can do more by copying the logic in
    ValueTracking's haveNoCommonBitsSet(), so we can treat the 'or' as if it
    was an 'add'.

    There's a TODO comment here because we should lift the bit-checking
    logic into a helper function, so it's not duplicated in DAGCombiner.

    An example of the better LEA matching:
        leal (%rdi,%rdi), %eax
        andl $1, %esi
        orl %esi, %eax
    Becomes:
        andl $1, %esi
        leal (%rsi,%rdi,2), %eax

    Differential Revision: http://reviews.llvm.org/D13956
    llvm-svn: 252515
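    A small LLVM IR sketch (hypothetical) matching the assembly above: the
    shifted value and the masked bit share no set bits, so the 'or' can be
    treated like an 'add' and folded into the LEA addressing mode:

        define i32 @or_into_lea(i32 %x, i32 %y) {
          %sh = shl i32 %x, 1       ; bit 0 of %sh is known zero
          %lo = and i32 %y, 1       ; only bit 0 of %lo can be set
          %r = or i32 %sh, %lo      ; no common bits, so 'or' behaves as 'add'
          ret i32 %r
        }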
* [WinEH] Tweak funclet prologue/epilogue insertion to pass verifier | Reid Kleckner | 2015-11-09 | 4 | -8/+8
    For some reason we'd never run MachineVerifier on WinEH code, and you
    explicitly have to ask for it with llc. I added it to a few test cases
    to get some coverage.

    Fixes PR25461.
    llvm-svn: 252512
* [WinEH] Re-committing r252249 (Clone funclets with multiple parents) with additional fixes for determinism problems | Andrew Kaylor | 2015-11-09 | 4 | -10/+1686
    Differential Revision: http://reviews.llvm.org/D14454
    llvm-svn: 252508
* [AArch64] Add UABDL patterns for log2 shuffle. | Charlie Turner | 2015-11-09 | 1 | -0/+66
    Summary: This matches the sum-of-absdiff patterns emitted by the
    vectoriser using log2 shuffles.

    Relies on D14207 to be able to match the `extract_subvector(..., 0)`

    Reviewers: t.p.northover, jmolloy
    Subscribers: aemerson, llvm-commits, rengolin
    Differential Revision: http://reviews.llvm.org/D14208
    llvm-svn: 252465
* [EABI] Add LLVM support for -meabi flag | Renato Golin | 2015-11-09 | 2 | -104/+214
    "GCC requires the freestanding environment provide memcpy, memmove,
    memset and memcmp":
    https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html

    Hence in GNUEABI targets LLVM should not convert 'memops' to their
    equivalent '__aeabi_memops'. This conversion violates the GCC contract.

    The -meabi flag controls whether or not LLVM will modify 'memops' in
    GNUEABI targets.

    Without -meabi: use the triple default EABI.
    With -meabi=default: use the triple default EABI.
    With -meabi=gnu: use 'memops'.
    With -meabi=4 or -meabi=5: use '__aeabi_memops'.
    With -meabi set to an unknown value: same as -meabi=default.

    Patch by Vinicius Tinti.
    llvm-svn: 252462
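    A minimal LLVM IR sketch (hypothetical) of a 'memop' affected by this
    choice: on a GNUEABI triple, the lowering of this intrinsic either stays
    a plain memcpy call or becomes __aeabi_memcpy, depending on the -meabi
    setting described above:

        define void @copy64(i8* %dst, i8* %src) {
          call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 64, i32 4, i1 false)
          ret void
        }
        declare void @llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)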
* Revert "[ARM] Combine CMOV into BFI where possible"Renato Golin2015-11-091-23/+0
| | | | | | | This reverts commit r252057, as it broke ARM self-hosting buildbots, probably due to a code-gen fault. llvm-svn: 252460
* [CodeGen] Always promote f16 if not legal | Oliver Stannard | 2015-11-09 | 1 | -164/+150
    We don't currently have any runtime library functions for operations on
    f16 values (other than conversions to and from f32 and f64), so we
    should always promote it to f32, even if that is not a legal type. In
    that case, the f32 values would be softened to f32 library calls.

    SoftenFloatRes_FP_EXTEND now needs to check the promoted operand's type,
    as it may be a no-op or require a different library call.

    getCopyFromParts and getCopyToParts now need to cope with a
    floating-point value stored in a larger integer part, as is the case for
    any target that needs to store an f16 value in a 32-bit integer
    register.

    Differential Revision: http://reviews.llvm.org/D12856
    llvm-svn: 252459
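    A minimal LLVM IR sketch (hypothetical) of an f16 operation that is now
    always promoted: the fadd is performed as f32 (and, on soft-float
    targets, through f32 library calls), with conversions around it:

        define half @add_f16(half %a, half %b) {
          %r = fadd half %a, %b
          ret half %r
        }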
* [Hexagon] Enabling ASM parsing on Hexagon backend and adding instruction parsing tests. General updating of the code emission. | Colin LeMahieu | 2015-11-09 | 6 | -0/+6
    llvm-svn: 252443
* [PowerPC] Fix LoopPreIncPrep not to depend on SCEV constant simplifications | Hal Finkel | 2015-11-08 | 1 | -0/+94
    Under most circumstances, if SCEV can simplify X-Y to a constant, then
    it can also simplify Y-X to a constant. However, there is no guarantee
    that this is always true, and consensus is not to consider that a
    correctness bug in SCEV (although it is undesirable).

    PPCLoopPreIncPrep gathers pointers used to access memory (via loads,
    stores and prefetches) into buckets, where in each bucket the relative
    pointer offsets are constant. We used to keep each bucket as a multimap,
    where SCEV's subtraction operation was used to define the ordering
    predicate. Instead, use a fixed SCEV base expression for each bucket,
    record the constant offsets from that base expression, and adjust it
    later, if desirable, once all pointers have been collected.

    Doing it this way should be more compile-time efficient than the
    previous scheme (in addition to making the implementation less sensitive
    to SCEV simplification quirks).

    Fixes PR25170.
    llvm-svn: 252417
* [WinEH] Update PHIs of CATCHRET successors | David Majnemer | 2015-11-08 | 1 | -0/+34
    The TailDuplication machine pass ran across a malformed CFG: a PHI node
    referred to its predecessor's predecessor instead of its predecessor.
    This occurred because we split the edge in X86ISelLowering when we
    processed the CATCHRET but forgot to do something about the PHI nodes.

    This fixes PR25444.
    llvm-svn: 252413
* [WinEH] Update exception pointer registers | Joseph Tremoulet | 2015-11-07 | 2 | -5/+17
    Summary:
    The CLR's personality routine passes these in rdx/edx, not rax/eax. Make
    getExceptionPointerRegister a virtual method parameterized by
    personality function to allow making this distinction. Similarly make
    getExceptionSelectorRegister a virtual method parameterized by
    personality function, for symmetry.

    Reviewers: pgavlin, majnemer, rnk
    Subscribers: jyknight, dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D14344
    llvm-svn: 252383
* [AArch64][FastISel] Don't even try to select vector icmps. | Ahmed Bougacha | 2015-11-06 | 1 | -0/+100
    We used to try to constant-fold them to i32 immediates. Given that
    fast-isel doesn't otherwise support vNi1, when selecting the result
    users, we'd fallback to SDAG anyway. However, if the users were in
    another block, we'd insert broken cross-class copies (GPR32 to FPR64).

    Give up, let SDAG agree with itself on a vNi1 legalization strategy.
    llvm-svn: 252364
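    A minimal LLVM IR sketch (hypothetical) of a vector compare that
    FastISel now declines to select, falling back to SelectionDAG instead
    of producing a broken cross-class copy:

        define <4 x i32> @vcmp(<4 x i32> %a, <4 x i32> %b) {
          %c = icmp sgt <4 x i32> %a, %b      ; <4 x i1> result
          %r = sext <4 x i1> %c to <4 x i32>
          ret <4 x i32> %r
        }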
* [X86] Fold (trunc (i32 (zextload i16))) into vbroadcast. | Ahmed Bougacha | 2015-11-06 | 2 | -12/+4
    When matching non-LSB-extracting truncating broadcasts, we now insert
    the necessary SRL. If the scalar resulted from a load, the SRL will be
    folded into it, creating a narrower, offset, load.

    However, i16 loads aren't Desirable, so we get i16->i32 zextloads. We
    already catch i16 aextloads; catch these as well.
    llvm-svn: 252363
* [X86] SRL non-LSB extracts when folding to truncating broadcasts. | Ahmed Bougacha | 2015-11-06 | 4 | -58/+110
    Now that we recognize this, we can support it instead of bailing out.
    That is, we can fold:
        (v8i16 (shufflevector
                 (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
                 <1,1,...,1>))
    into:
        (v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
    llvm-svn: 252362
* [X86] Don't fold non-LSB extracts into truncating broadcasts. | Ahmed Bougacha | 2015-11-06 | 4 | -0/+396
    We used to incorrectly assume that the offset we're extracting from was
    a multiple of the element size. So, we'd fold:
        (v8i16 (shufflevector
                 (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
                 <1,1,...,1>))
    into:
        (v8i16 (vbroadcast (i16 (trunc Y))))
    whereas we should have extracted the higher bits from X.

    Instead, bail out if the assumption doesn't hold.
    llvm-svn: 252361
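    A rough LLVM IR sketch (hypothetical) of the shuffle shape discussed in
    these broadcast commits: the splatted v8i16 lane is the upper half of a
    32-bit element, i.e. a non-LSB extract, so a plain truncate of the
    scalar is not enough:

        define <8 x i16> @splat_upper_half(i32 %x, i32 %y) {
          %v0 = insertelement <4 x i32> undef, i32 %x, i32 0
          %v1 = insertelement <4 x i32> %v0, i32 %y, i32 1
          %bc = bitcast <4 x i32> %v1 to <8 x i16>
          %s = shufflevector <8 x i16> %bc, <8 x i16> undef,
                 <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
          ret <8 x i16> %s
        }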
* DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload | Tom Stellard | 2015-11-06 | 3 | -10/+32
    Reviewers: resistor, arsenm
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D13805
    llvm-svn: 252349
* [WebAssembly] Use more explicit types in testcases. | Dan Gohman | 2015-11-06 | 10 | -114/+114
    llvm-svn: 252345
* [WebAssembly] Add more explicit pushes to the tests. | Dan Gohman | 2015-11-06 | 19 | -169/+169
    llvm-svn: 252344
* [ShrinkWrapping] Teach shrink-wrapping how to analyze RegMask. | Quentin Colombet | 2015-11-06 | 1 | -0/+59
    Previously we were conservatively assuming that RegMask operands clobber
    callee saved registers.
    llvm-svn: 252341
* Improved the operands commute transformation for X86-FMA3 instructions. | Andrew Kaylor | 2015-11-06 | 2 | -12/+515
    All 3 operands of FMA3 instructions are commutable now.

    Patch by Slava Klochkov
    Reviewers: Quentin Colombet (qcolombet), Ahmed Bougacha (ab).
    Differential Revision: http://reviews.llvm.org/D13269
    llvm-svn: 252335
* [WebAssembly] Make expression-stack pushing explicit | Dan Gohman | 2015-11-06 | 15 | -191/+191
    Modelling of the expression stack is evolving. This patch takes another
    step by making pushes explicit.

    Differential Revision: http://reviews.llvm.org/D14338
    llvm-svn: 252334