path: root/llvm/test/CodeGen
Each entry below shows the commit message, then author · date · files changed, -lines removed/+lines added.
* [PowerPC] Materialize i64 constants using bit inversion
  Hal Finkel · 2015-01-04 · 1 file, -0/+20

  Materializing full 64-bit constants on PPC64 can be expensive, requiring up
  to 5 instructions depending on the locations of the non-zero bits. Sometimes
  materializing the bit-inverted constant, and then flipping the bits, requires
  fewer instructions than the direct method. If so, do that instead.

  llvm-svn: 225132
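
  A minimal C sketch of the idea (the cost proxy below is an assumption for
  illustration, not the actual PPCISelDAGToDAG helper): if building the
  complemented value is cheaper, build ~C and spend one extra 'not' flipping
  it back.

    #include <stdint.h>

    /* Assumed cost proxy: roughly one instruction per non-zero 16-bit
       chunk, plus a shift to join the high and low halves. */
    static unsigned cost(uint64_t c) {
        unsigned n = 0;
        for (int i = 0; i < 4; i++)
            if ((c >> (16 * i)) & 0xFFFF)
                n++;
        if ((c >> 32) != 0 && (uint32_t)c != 0)
            n++;                          /* joining shift */
        return n ? n : 1;
    }

    /* Prefer the inverted plan only when it wins after the final 'not'. */
    static int should_invert(uint64_t c) {
        return cost(~c) + 1 < cost(c);
    }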
* ARM: permit tail calls to weak externals on COFF
  Saleem Abdulrasool · 2015-01-03 · 1 file, -0/+19

  Weak externals are resolved statically, so we can actually generate the tail
  call on PE/COFF targets without breaking the requirements. It is questionable
  whether we want to propagate the current behaviour for MachO, as the
  requirements are part of the ARM ELF specification, and it seems that prior
  to SVN r215890 we would have tail-called here. For now, be conservative and
  only permit it on PE/COFF, where the call will always be fully resolved.

  llvm-svn: 225119
* [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
  Hal Finkel · 2015-01-03 · 1 file, -8/+47

  The existing code provided for specifying a global loop alignment preference.
  However, the preferred loop alignment might depend on the loop itself. For
  recent POWER cores, loops between 5 and 8 instructions should have 32-byte
  alignment (while the others are better with 16-byte alignment) so that the
  entire loop will fit in one i-cache line.

  To support this, getPrefLoopAlignment has been made virtual, and can be
  provided with an optional MachineLoop* so the target can inspect the loop
  before answering the query. The default behavior, as before, is to return
  the value set with setPrefLoopAlignment. MachineBlockPlacement now queries
  the target for each loop instead of only once per function. There should be
  no functional change for other targets.

  llvm-svn: 225117
* [PowerPC] Use 16-byte alignment for modern cores for functions/loops
  Hal Finkel · 2015-01-03 · 1 file, -0/+65

  Most modern PowerPC cores prefer that functions and loops start on
  16-byte-aligned boundaries (*), so instruct block placement, etc. to make
  this happen. The branch selector has also been adjusted to account for the
  extra nops that might now be inserted before loop headers.

  (*) Some cores actually prefer other alignments for small loops, but that
  will be addressed in a follow-up commit.

  llvm-svn: 225115
* [PowerPC] Add support for the CMPB instruction
  Hal Finkel · 2015-01-03 · 2 files, -0/+254

  Newer POWER cores, and the A2, support the cmpb instruction. This instruction
  compares its operands, treating each of the 8 bytes in the GPRs separately,
  returning a 'mask' result of 0 (for false) or -1 (for true) in each byte.

  Code generation support is added in the form of a PPCISelDAGToDAG
  DAG-preprocessing routine that recognizes patterns close to what the
  instruction computes (either exactly, or related by a constant masking
  operation), and generates the cmpb instruction (along with any necessary
  constant masking operation). This can be expanded if use cases arise.

  llvm-svn: 225106
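
  A reference model of what cmpb computes, as a C sketch (the function name
  is illustrative, not LLVM code):

    #include <stdint.h>

    /* Each of the 8 byte lanes yields 0xFF when the operand bytes are
       equal and 0x00 when they differ, per the description above. */
    uint64_t cmpb_model(uint64_t a, uint64_t b) {
        uint64_t r = 0;
        for (int i = 0; i < 64; i += 8) {
            if ((uint8_t)(a >> i) == (uint8_t)(b >> i))
                r |= 0xFFull << i;   /* -1 in this byte lane */
        }
        return r;
    }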
* [PowerPC] Improve instruction selection bit-permuting operations (64-bit)
  Hal Finkel · 2015-01-01 · 1 file, -0/+239

  This is the second installment of improvements to instruction selection for
  "bit permutation" instruction sequences. r224318 added logic for instruction
  selection for 32-bit bit permutation sequences, and this adds lowering for
  64-bit sequences. The 64-bit sequences are more complicated than the 32-bit
  ones because:

    a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by
       replicating the lower 32 bits of the value-to-be-rotated into the upper
       32 bits -- and integrating this into the cost modeling for the various
       bit group operations is non-trivial

    b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask
       instructions cannot, in one instruction, specify the mask starting
       index, the mask ending index, and the rotation factor

  Also, forming arbitrary 64-bit constants is more complicated than in 32-bit
  mode because the number of instructions necessary is value dependent.

  Plus, support for 'late masking' was added: it is sometimes more efficient
  to treat the overall value as if it had no mandatory zero bits when planning
  the bit-group insertions, and then mask them in at the very end.
  Unfortunately, as the structure of the bit groups is different in the two
  cases, the more feasible implementation technique was to generate both
  instruction sequences, and then pick the shorter one.

  And finally, we now generate reasonable code for i64 bswap:

    rldicl 5, 3, 16, 0
    rldicl 4, 3, 8, 0
    rldicl 6, 3, 24, 0
    rldimi 4, 5, 8, 48
    rldicl 5, 3, 32, 0
    rldimi 4, 6, 16, 40
    rldicl 6, 3, 48, 0
    rldimi 4, 5, 24, 32
    rldicl 5, 3, 56, 0
    rldimi 4, 6, 40, 16
    rldimi 4, 5, 48, 8
    rldimi 4, 3, 56, 0

  vs. what we used to produce:

    li 4, 255
    rldicl 5, 3, 24, 40
    rldicl 6, 3, 40, 24
    rldicl 7, 3, 56, 8
    sldi 8, 3, 8
    sldi 10, 3, 24
    sldi 12, 3, 40
    rldicl 0, 3, 8, 56
    sldi 9, 4, 32
    sldi 11, 4, 40
    sldi 4, 4, 48
    andi. 5, 5, 65280
    andis. 6, 6, 255
    andis. 7, 7, 65280
    sldi 3, 3, 56
    and 8, 8, 9
    and 4, 12, 4
    and 9, 10, 11
    or 6, 7, 6
    or 5, 5, 0
    or 3, 3, 4
    or 7, 9, 8
    or 4, 6, 5
    or 3, 3, 7
    or 3, 3, 4

  which is 12 instructions, instead of 25, and seems optimal (at least in
  terms of code size).

  llvm-svn: 225056
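
  For checking sequences like the one above, a byte-level C reference of what
  i64 bswap computes (an illustrative helper, not LLVM code):

    #include <stdint.h>

    /* Reverses the order of the 8 bytes in v; the rldicl/rldimi sequence
       above must produce the same result for every input. */
    uint64_t bswap64_ref(uint64_t v) {
        uint64_t r = 0;
        for (int i = 0; i < 8; i++)
            r |= ((v >> (8 * i)) & 0xFF) << (8 * (7 - i));
        return r;
    }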
* Revert "merge consecutive stores of extracted vector elements"Alexey Samsonov2014-12-311-33/+0
| | | | | | | This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031
* x86_64: Fix calls to __morestack under the large code model.
  Peter Collingbourne · 2014-12-30 · 1 file, -0/+15

  Under the large code model, we cannot assume that __morestack lives within
  2^31 bytes of the call site, so we cannot use pc-relative addressing. We
  cannot perform the call via a temporary register, as the rax register may
  be used to store the static chain, and all other suitable registers may be
  either callee-save or used for parameter passing. We cannot use the stack
  at this point either because __morestack manipulates the stack directly.

  To avoid these issues, perform an indirect call via a read-only memory
  location containing the address. This solution is not perfect, as it
  assumes that the .rodata section is laid out within 2^31 bytes of each
  function body, but this seems to be sufficient for JIT.

  Differential Revision: http://reviews.llvm.org/D6787

  llvm-svn: 225003
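
  The general technique, sketched in C (the real fix emits this at the
  instruction level; the names here are hypothetical, and the const pointer
  is only typically placed in a read-only data section):

    extern void callee(void);

    /* The call site only needs pc-relative reach to this slot,
       not to the callee itself. */
    static void (*const callee_slot)(void) = callee;

    void caller(void) {
        callee_slot();   /* indirect call through the read-only slot */
    }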
* [Hexagon] Adding reg-reg indexed load forms.
  Colin LeMahieu · 2014-12-30 · 2 files, -7/+7

  llvm-svn: 224997
* Semantic tests for memory invalidation at statepoints
  Philip Reames · 2014-12-29 · 1 file, -0/+108

  These are simply a collection of tests intended to show that information
  about the contents of gc references in the heap is lost at a statepoint.
  I've tried to write them so that they don't disallow correct
  transformations, while still being fairly easy to understand.

  p.s. Ideas for additional tests are welcome.

  Differential Revision: http://reviews.llvm.org/D6491

  llvm-svn: 224971
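
  The property these tests pin down, as a hedged C analogy (the hypothetical
  safepoint() stands in for any call that may be a statepoint):

    extern void safepoint(void);   /* the collector may run here and
                                      relocate or rewrite heap objects */

    int must_reload(int **gcref) {
        int before = **gcref;
        safepoint();               /* knowledge about the heap is lost */
        int after = **gcref;       /* must be a fresh load */
        return before == after;    /* an optimizer may not fold this to 1 */
    }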
* [Hexagon] Adding post-increment register form stores and register-immediate form stores with tests.
  Colin LeMahieu · 2014-12-29 · 2 files, -2/+2

  llvm-svn: 224952
* Add segmented stack support for DragonFlyBSD.
  Rafael Espindola · 2014-12-29 · 1 file, -0/+108

  Patch by Michael Neumann.

  llvm-svn: 224936
* llvm/test/CodeGen/X86/fast-isel-call-bool.ll: Add explicit -mtriple=x86_64-unknown to satisfy x64.
  NAKAMURA Takumi · 2014-12-28 · 1 file, -1/+1

  llvm-svn: 224907
* [X86][ISel] Fix a regression I introduced in r224884
  Keno Fischer · 2014-12-28 · 2 files, -2/+14

  In the else case, ResultReg was not checked for validity. To my surprise,
  this case was not hit in any of the existing test cases; this adds a new
  test case that exercises that path.

  Also, drop the `target triple` declaration from the original test, as
  suggested by H.J. Lu, because with it the test would not be run on Linux.

  llvm-svn: 224901
* [X86] Add missing memory variants to AVX false dependency breaking
  Michael Kuperstein · 2014-12-28 · 2 files, -62/+73

  Adds missing memory instruction variants to AVX false dependency breaking
  handling. (SSE was handled in r224246.)

  Differential Revision: http://reviews.llvm.org/D6780

  llvm-svn: 224900
* [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
  Andrea Di Biagio · 2014-12-28 · 1 file, -0/+250

  If the control flow is modelling an if-statement where the only instruction
  in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
  CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the
  control flow graph. Example:

    entry:
      %cmp = icmp eq i64 %val, 0
      br i1 %cmp, label %end.bb, label %then.bb

    then.bb:
      %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
      br label %end.bb

    end.bb:
      %cond = phi i64 [ %c, %then.bb ], [ 64, %entry ]

  In this example, basic block %then.bb is taken if value %val is not zero,
  while the phi node in %end.bb yields the size in bits of %val (64) when
  %val is equal to zero. With this patch, CodeGenPrepare will try to hoist
  the call to cttz from %then.bb into basic block %entry, but only if cttz
  is cheap to speculate for the target.

  Added two new hooks in TargetLowering.h to let targets customize the
  behavior (i.e. decide whether it is cheap or not to speculate calls to
  cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and
  'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86,
  method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT;
  method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

  Differential Revision: http://reviews.llvm.org/D6728

  llvm-svn: 224899
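
  In C terms, the before/after shapes look roughly like this (a sketch; the
  pass itself works on LLVM IR, and the extra guard below only keeps the C
  well-defined for a zero input):

    #include <stdint.h>

    /* Before: the branch guards the intrinsic because cttz(0) is
       undefined when the is_zero_undef flag is set. */
    uint64_t cttz_branchy(uint64_t val) {
        if (val == 0)
            return 64;
        return (uint64_t)__builtin_ctzll(val);
    }

    /* After speculation: straight-line code, a select where the phi was. */
    uint64_t cttz_speculated(uint64_t val) {
        uint64_t c = (uint64_t)__builtin_ctzll(val | (val == 0));
        return val ? c : 64;
    }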
* Scalarizer for masked load and store intrinsics.
  Elena Demikhovsky · 2014-12-28 · 1 file, -0/+15

  Masked vector intrinsics are a part of common LLVM IR, but they are only
  truly supported on AVX2 and AVX-512 targets. I added code that translates
  the masked intrinsics for all other targets: each masked vector intrinsic
  is converted to a chain of scalar operations inside conditional basic
  blocks.

  http://reviews.llvm.org/D6436

  llvm-svn: 224897
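
  A C sketch of the resulting shape (a 4-lane float form is assumed for
  illustration; the actual pass emits IR blocks, one guarded lane at a time):

    /* Each lane of a masked load becomes a load guarded by its mask bit;
       masked-off lanes keep the pass-through value. */
    void masked_load_scalarized(const float *ptr, const _Bool mask[4],
                                const float passthru[4], float out[4]) {
        for (int i = 0; i < 4; i++) {
            if (mask[i])
                out[i] = ptr[i];
            else
                out[i] = passthru[i];
        }
    }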
* PowerPC: CTR shouldn't fire if a TLS call is in the loop
  David Majnemer · 2014-12-27 · 1 file, -1/+24

  Determining the address of a TLS variable results in a function call in
  certain TLS models. This means that a simple ICmpInst might actually result
  in invalidating the CTR register. In such cases, do not attempt to rely on
  the CTR register for loop optimization purposes.

  This fixes PR22034.

  Differential Revision: http://reviews.llvm.org/D6786

  llvm-svn: 224890
* [FastIsel][X86] Fix invalid register replacement for bool args
  Keno Fischer · 2014-12-27 · 1 file, -0/+19

  Summary: Consider the following IR:

    %3 = load i8* undef
    %4 = trunc i8 %3 to i1
    %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
    ret %jl_value_t.0* %5

  Bools (that are the result of direct truncs) are lowered as whatever the
  argument to the trunc was, plus an "and 1", causing the part of the MBB
  responsible for this argument to look something like this:

    %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

  Later, when the load is lowered, it will insert

    %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

  but remember to (at the end of isel) replace vreg7 by vreg15. Now for the
  bug: in fast isel lowering, we mistakenly mark vreg8 as the result of the
  load instead of the trunc. This adds a fixup to have vreg8 replaced by
  whatever the result of the load is as well, so we end up with

    %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

  which is an SSA violation and causes problems later down the road.

  This fixes PR21557.

  Test Plan: Test case from PR21557 is added to the test suite.

  Reviewers: ributzka

  Reviewed By: ributzka

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D6245

  llvm-svn: 224884
* No need to run llvm-as. NFC.
  Rafael Espindola · 2014-12-26 · 1 file, -5/+4

  llvm-svn: 224859
* [PowerPC] [FastISel] i1 constants must be zero extended
  Hal Finkel · 2014-12-25 · 1 file, -0/+27

  When materializing constant i1 values, they must be zero extended. We
  represent i1 values as [0, 1], not [0, -1], in i32 registers. As it turns
  out, this code path was dead for i1 values prior to r216006 (which is why
  this did not manifest in miscompiles until recently).

  Fixes -O0 self-hosting on PPC64/Linux.

  llvm-svn: 224842
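
  The representation rule, as a two-line C sketch (illustrative only):

    /* i1 'true' in a 32-bit register under the [0, 1] convention: */
    unsigned true_zext(void) { return 1u; }           /* 0x00000001: correct   */
    /* ...whereas sign extension would yield the [0, -1] form: */
    unsigned true_sext(void) { return (unsigned)-1; } /* 0xFFFFFFFF: wrong here */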
* Masked Load/Store - Changed the order of parameters in intrinsics.
  Elena Demikhovsky · 2014-12-25 · 1 file, -39/+49

  No functional changes. The documentation is coming.

  llvm-svn: 224829
* CodeGen: Error on redefinitions instead of asserting
  David Majnemer · 2014-12-24 · 2 files, -0/+18

  It's possible to have a prior definition of a symbol in module asm.
  Raise an error instead of crashing.

  llvm-svn: 224828
* CodeGen: Allow aliases to be overridden by variables
  David Majnemer · 2014-12-24 · 1 file, -0/+11

  llvm-svn: 224827
* MC: Label definitions are permitted after .set directives
  David Majnemer · 2014-12-24 · 1 file, -0/+12

  .set directives may be overridden by other .set directives, as well as by
  label definitions.

  This fixes PR22019.

  llvm-svn: 224811
* [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
  Hal Finkel · 2014-12-23 · 1 file, -1/+18

  On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
  instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
  sequence is interpreted by the unwinding code in libgcc. To make sure these
  occur as a pair, as with other pairings interpreted by the linker, fuse the
  two instructions into one instruction (for code generation only).

  In the future, we might wish to do this by emitting CFI directives instead,
  but this solution is simpler, and mirrors what GCC does. Additional
  discussion on this point is contained in the PR.

  Fixes PR22015.

  llvm-svn: 224788
* [Hexagon] Reapplying 224775 load words.
  Colin LeMahieu · 2014-12-23 · 1 file, -2/+2

  llvm-svn: 224786
* Reverting 224775 until mayLoad flag is addressed.
  Colin LeMahieu · 2014-12-23 · 1 file, -2/+2

  llvm-svn: 224783
* [Hexagon] Adding word loads.
  Colin LeMahieu · 2014-12-23 · 1 file, -2/+2

  llvm-svn: 224775
* AVX-512: Added FMA instructions, intrinsics and tests for KNL and SKX targets
  Elena Demikhovsky · 2014-12-23 · 2 files, -0/+456

  By Asaf Badouh.

  http://reviews.llvm.org/D6456

  llvm-svn: 224764
* [PowerPC] Don't mark the return-address slot as immutable
  Hal Finkel · 2014-12-23 · 1 file, -0/+25

  It is tempting to mark the fixed stack slot used to store the return address
  as immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within
  the function, it is not completely immutable: it is written during the
  function prologue. When using post-RA instruction scheduling, the prologue
  instructions are available for scheduling, and we're not free to interchange
  the order of a particular store in the prologue with loads from that stack
  location.

  Fixes PR21976.

  llvm-svn: 224761
* AVX-512: BLENDM - fixed encoding of the broadcast version
  Elena Demikhovsky · 2014-12-23 · 1 file, -1/+49

  Added more intrinsics and encoding tests.

  llvm-svn: 224760
* [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
  Michael Kuperstein · 2014-12-23 · 1 file, -0/+20

  This partially fixes PR21943.

  For AVX, we go from:

    vmovq (%rsi), %xmm0
    vmovq (%rdi), %xmm1
    vpermilps $-27, %xmm1, %xmm2        ## xmm2 = xmm1[1,1,2,3]
    vinsertps $16, %xmm2, %xmm1, %xmm1  ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps $32, %xmm0, %xmm1, %xmm1  ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps $-27, %xmm0, %xmm0        ## xmm0 = xmm0[1,1,2,3]
    vinsertps $48, %xmm0, %xmm1, %xmm0  ## xmm0 = xmm1[0,1,2],xmm0[0]

  to the expected:

    vmovq (%rdi), %xmm0
    vmovhpd (%rsi), %xmm0, %xmm0
    retq

  Fixing this for AVX2 is still open.

  Differential Revision: http://reviews.llvm.org/D6749

  llvm-svn: 224759
* [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
  Hal Finkel · 2014-12-23 · 1 file, -0/+9

  In r224033, in moving the signed power-of-2 division expansion into
  BuildSDIVPow2, I accidentally made it possible to attempt the lowering for
  a 64-bit division on PPC32. This later asserts.

  Fixes PR21928.

  llvm-svn: 224758
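
  For reference, the standard shift-based expansion behind BuildSDIVPow2, as
  a C sketch (assuming an arithmetic right shift for signed values); the bug
  was attempting it for i64, which is not a legal type on PPC32:

    #include <stdint.h>

    /* Signed divide by 2^k: bias a negative dividend by 2^k - 1 so the
       arithmetic shift rounds toward zero, matching C division. */
    int32_t sdiv_pow2_i32(int32_t n, unsigned k) {
        int32_t bias = (n >> 31) & (((int32_t)1 << k) - 1);
        return (n + bias) >> k;
    }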
* [ARM] Don't break alignment when combining base updates into load/stores.
  Ahmed Bougacha · 2014-12-23 · 3 files, -24/+105

  r223862/r224203 tried to also combine base-updating load/stores. There was
  a mistake there: the alignment was added as-is as an operand to the
  ARMISD::VLD/VST node. However, the VLD/VST selection logic doesn't care
  about less-than-standard alignment attributes. For example, no matter the
  alignment of a v2i64 load (say 1), SelectVLD picks VLD1q64 (because of the
  memory type). But VLD1q64 ("vld1.64 {dXX, dYY}") is 8-aligned, per
  ARMARMv7a 3.2.1. For the 1-aligned load, what we really want is VLD1q8.

  This commit introduces bitcasts if necessary, and changes the vld/vst type
  to one whose standard alignment matches the original load/store alignment.

  Differential Revision: http://reviews.llvm.org/D6759

  llvm-svn: 224754
* X86: Don't over-align combined loads.
  Jim Grosbach · 2014-12-23 · 1 file, -0/+35

  When combining consecutive loads+inserts into a single vector load, we
  should keep the alignment of the base load. Doing otherwise can, and does,
  lead to using overly aligned instructions. In the included test case, for
  example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.

  rdar://19190968

  llvm-svn: 224746
* Make musttail more robust for vector types on x86
  Reid Kleckner · 2014-12-22 · 2 files, -0/+116

  Previously I tried to plug musttail into the existing vararg lowering code.
  That turned out to be a mistake, because non-vararg calls use significantly
  different register lowering, even on x86. For example, AVX vectors are
  usually passed in registers to normal functions and in memory to vararg
  functions. Now musttail uses a completely separate lowering.

  Hopefully this can be used as the basis for non-x86 perfect forwarding.

  Reviewers: majnemer

  Differential Revision: http://reviews.llvm.org/D6156

  llvm-svn: 224745
* [Hexagon] Adding memb instruction. Fixing whitespace in test from 224730.
  Colin LeMahieu · 2014-12-22 · 1 file, -2/+2

  llvm-svn: 224735
* [x86] Add vector @llvm.ctpop intrinsic custom lowering
  Bruno Cardoso Lopes · 2014-12-22 · 1 file, -0/+159

  Currently, when ctpop is supported for scalar types, the expansion of
  @llvm.ctpop.vXiY uses vector element extractions, insertions and individual
  calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is
  used for the scalar calls.

  Local Haswell measurements show that we can improve vector @llvm.ctpop.vXiY
  expansion in some cases by using a vector parallel bit twiddling approach,
  based on:

    v = v - ((v >> 1) & 0x55555555);
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
    v = (v + (v >> 4)) & 0xF0F0F0F;
    v = v + (v >> 8);
    v = v + (v >> 16);
    v = v & 0x0000003F;

  (from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

  When scalar ctpop isn't supported, the approach above performs better for
  v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar
  ctpop is supported, this approach performs ~2x better for v8i32. Here,
  x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
  support with -march=core-avx2.

    == [x86_64h - new]
    v8i32: 0.661685
    v4i32: 0.514678
    v4i64: 0.652009
    v2i64: 0.324289

    == [x86_64h - old]
    v8i32: 1.29578
    v4i32: 0.528807
    v4i64: 0.65981
    v2i64: 0.330707

    == [x86_64 - new]
    v8i32: 1.003
    v4i32: 0.656273
    v4i64: 1.11711
    v2i64: 0.754064

    == [x86_64 - old]
    v8i32: 2.34886
    v4i32: 1.72053
    v4i64: 1.41086
    v2i64: 1.0244

  More work for other vector types will come next.

  llvm-svn: 224725
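
  The twiddling sequence above, wrapped as checkable scalar C (a verification
  sketch; the vector lowering applies the same steps to each 32-bit lane):

    #include <assert.h>
    #include <stdint.h>

    static uint32_t popcount32(uint32_t v) {
        v = v - ((v >> 1) & 0x55555555);                /* 2-bit partial sums */
        v = (v & 0x33333333) + ((v >> 2) & 0x33333333); /* 4-bit partial sums */
        v = (v + (v >> 4)) & 0x0F0F0F0F;                /* 8-bit partial sums */
        v = v + (v >> 8);
        v = v + (v >> 16);
        return v & 0x3F;                                /* fits in 6 bits */
    }

    int main(void) {
        for (uint32_t v = 0; v < (1u << 20); v++)
            assert(popcount32(v) == (uint32_t)__builtin_popcount(v));
        return 0;
    }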
* [CodeGenPrepare] Handle properly the promotion of operands when this does not generate instructions.
  Quentin Colombet · 2014-12-22 · 1 file, -0/+25

  Fixes PR21978.
  Related to <rdar://problem/18310086>

  llvm-svn: 224717
* AVX-512: Added all forms of BLENDM instructions, intrinsics, and encoding tests for AVX-512F and SKX instructions.
  Elena Demikhovsky · 2014-12-22 · 2 files, -2/+75

  llvm-svn: 224707
* Lower multiply-negate operation to mneg on AArch64
  Karthik Bhat · 2014-12-22 · 1 file, -0/+15

  This patch pattern matches code such as

    neg w8, w8
    mul w8, w9, w8

  to

    mneg w8, w8, w9

  Review: http://reviews.llvm.org/D6754

  llvm-svn: 224706
* Convert a few tests to FileCheck. NFC.
  Rafael Espindola · 2014-12-22 · 4 files, -14/+38

  llvm-svn: 224705
* Enable (sext x) == C --> x == (trunc C) combine
  Matt Arsenault · 2014-12-21 · 3 files, -7/+542

  Extend the existing code which handles this for zext. This makes this more
  useful for targets with ZeroOrNegativeOne BooleanContent, and obsoletes a
  custom combine SI uses for i1 setcc (sext(i1), 0, setne), since the
  constant will now be shrunk to i1.

  llvm-svn: 224691
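
  The combine in C terms (an illustrative sketch): comparing a sign-extended
  value against a constant is the same as comparing the narrow value against
  the truncated constant, provided the constant survives the truncation.

    #include <stdint.h>

    int before(int8_t x) { return (int32_t)x == -5; }  /* compare after sext  */
    int after(int8_t x)  { return x == (int8_t)-5; }   /* same result, narrow */
    /* If the constant is not representable in i8 (e.g. 300), the compare
       folds to a constant false instead. */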
* [x86] Change the test added in r223774 to first check the spelling of the error message for a bogus processor, and then look specifically for that error message using FileCheck.
  Chandler Carruth · 2014-12-20 · 1 file, -26/+33

  I actually tried to write the test this way at first, but drew a blank on
  how to ensure the error message stayed in sync (oops). Now that I've
  recalled how to do that, this is clearly better.

  It also fixes an issue with a malloc implementation that actually prints
  to stderr in all cases, which was causing problems for some builders, it
  seems.

  llvm-svn: 224665
* Masked load and store codegen - fixed 128-bit vectors
  Elena Demikhovsky · 2014-12-19 · 1 file, -8/+78

  The codegen failed on 128-bit types on AVX2. I added patterns in the .td
  files, along with tests.

  llvm-svn: 224647
* R600/SI: Only form min/max with 1 use.
  Matt Arsenault · 2014-12-19 · 3 files, -0/+69

  If the condition is used for something else, this increases the number of
  instructions.

  llvm-svn: 224646
* Add the ExceptionHandling::MSVC enumeration
  Reid Kleckner · 2014-12-19 · 4 files, -4/+4

  It is intended to be used for a family of personality functions that have
  similar IR preparation requirements. Typically, when interoperating with
  MSVC personality functions, bits of functionality need to be outlined from
  the main function into helper functions. There is also usually more than
  one landing pad per invoke, which does not match the LLVM IR landingpad
  representation.

  None of this is implemented yet. This change just adds a new enum that is
  active for *-windows-msvc and delegates to the EH removal preparation pass.
  No functionality change for other targets.

  llvm-svn: 224625
* Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
  Sanjay Patel · 2014-12-19 · 1 file, -5/+38

  This is a continuation of r167064
  (http://llvm.org/viewvc/llvm-project?view=revision&revision=167064).
  That patch started to fix PR14221
  (http://llvm.org/bugs/show_bug.cgi?id=14221), but it was not completed.

  Differential Revision: http://reviews.llvm.org/D6330

  llvm-svn: 224624
* R600/SI: Make sure non-inline constants aren't folded into mubuf soffset operand
  Tom Stellard · 2014-12-19 · 1 file, -0/+39

  mubuf instructions now define the soffset field using the SCSrc_32 register
  class, which indicates that only SGPRs and inline constants are allowed.

  llvm-svn: 224622