path: root/llvm/test/CodeGen/X86
Commit message | Author | Age | Files | Lines
* Add -mcpu to stackmap.ll | Andrew Trick | 2013-12-01 | 1 | -1/+1
    llvm-svn: 196051
* Force CPU type to unbreak unit tests on Haswell machines. | Juergen Ributzka | 2013-11-30 | 5 | -5/+5
    llvm-svn: 195971
* Cleanup and test X86AsmPrinter::printPCRelImm. | Rafael Espindola | 2013-11-27 | 1 | -0/+15
    It is only used for asm printing. On X86 we put basic block addresses in a register before passing them to inline asm, so the MO_MachineBasicBlock case was dead. MO_ExternalSymbol was dead since any symbol being passed to inline asm is represented as MO_GlobalAddress.
    The MO_GlobalAddress and MO_Register cases were not tested.
    llvm-svn: 195824
* Fix PR18054 | Michael Liao | 2013-11-26 | 1 | -0/+10
    - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid.
    llvm-svn: 195779
* StackMap: Implement support for DirectMemRefOp. | Andrew Trick | 2013-11-26 | 2 | -10/+52
    A Direct stack map location records the address of a frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to stack locations from which the requested values must be loaded. Direct locations can directly communicate the address of an alloca, while IndirectMemRefOp locations handle register spills.
    For example:
      entry:
        %a = alloca i64...
        llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
    Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map.
    llvm-svn: 195712
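The difference between the two location kinds described above can be illustrated with a small model. This is only a sketch under stated assumptions: the `Location` class, the `resolve` helper, and the dictionary standing in for stack memory are hypothetical names invented here, not part of LLVM's stack map format or any runtime API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for one stack map location entry."""
    kind: str    # "Direct" or "Indirect", following the commit message
    offset: int  # byte offset from the frame base register


def resolve(loc, frame_base, memory):
    """Return the value the runtime asked for.

    `memory` is a dict mapping addresses to stored values; it models the
    stack and is an assumption made for this sketch.
    """
    address = frame_base + loc.offset
    if loc.kind == "Direct":
        # The address itself is the requested value (e.g. an alloca).
        return address
    if loc.kind == "Indirect":
        # The requested value must be loaded from the stack slot (a spill).
        return memory[address]
    raise ValueError(f"unsupported location kind: {loc.kind}")


frame_base = 0x1000
memory = {0x1000 - 8: 42}  # one spilled value at frame_base - 8

alloca_address = resolve(Location("Direct", -8), frame_base, memory)
spilled_value = resolve(Location("Indirect", -8), frame_base, memory)
```

In this model the Direct result depends only on the frame base and offset, which matches the message's point that the alloca's relative location is known right after compilation, while the Indirect result depends on what memory holds when execution reaches the stack map.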
* Add an intrinsic for the SSE2 PAUSE instruction. | Cameron McInally | 2013-11-26 | 1 | -0/+7
    llvm-svn: 195697
* Unrevert r195599 with testcase fix. | Bill Wendling | 2013-11-25 | 1 | -0/+31
    I'm not sure how it was checking for the wrong values...
    PR18023.
    llvm-svn: 195670
* Revert r195599 as it broke the builds. | Amara Emerson | 2013-11-25 | 1 | -29/+0
    llvm-svn: 195636
* Don't look past volatile loads. | Bill Wendling | 2013-11-25 | 1 | -0/+29
    A volatile load should block us from trying to coalesce stores.
    PR18023
    llvm-svn: 195599
* Debug Info: update testing cases to specify the debug info version number. | Manman Ren | 2013-11-23 | 1 | -0/+2
    We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format.
    Make tests more robust by removing hard-coded metadata numbers in CHECK lines.
    llvm-svn: 195535
* Debug Info: update testing cases to specify the debug info version number. | Manman Ren | 2013-11-22 | 17 | -1/+34
    We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format.
    llvm-svn: 195504
* X86: Perform integer comparisons at i32 or larger. | Jim Grosbach | 2013-11-22 | 6 | -106/+21
    Utilizing the 8 and 16 bit comparison instructions, even when an input can be folded into the comparison instruction itself, is typically not worth it. There are too many partial register stalls as a result, leading to significant slowdowns. By always performing comparisons on at least 32-bit registers, performance of the calculation chain leading to the comparison improves.
    Continue to use the smaller comparisons when minimizing size, as that allows better folding of loads into the comparison instructions.
    rdar://15386341
    llvm-svn: 195496
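Widening a comparison is only safe if the wider compare gives the same answer as the narrow one. The sketch below checks that property exhaustively for signed 8-bit operands that are sign-extended to 32 bits. It assumes sign extension is how the operands are widened; the commit message does not spell that out, and the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext8_to_32(byte):
    """Sign-extend an 8-bit pattern to a 32-bit pattern."""
    return to_signed(byte, 8) & 0xFFFFFFFF


def signed_less_than(a, b, bits):
    """Signed '<' on two bit patterns of the given width."""
    return to_signed(a, bits) < to_signed(b, bits)


def widening_preserves_signed_compare():
    """Check all 256 x 256 operand pairs: i8 compare == i32 compare after sext."""
    return all(
        signed_less_than(a, b, 8)
        == signed_less_than(sext8_to_32(a), sext8_to_32(b), 32)
        for a in range(256)
        for b in range(256)
    )
```

The check says nothing about performance; it only shows that the result of the comparison is unchanged, which is what lets the backend pick the wider instruction.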
* Teach ISel not to optimize 'optnone' functions (revised). | Paul Robinson | 2013-11-22 | 1 | -0/+42
    Improvements over r195317:
    - Set/restore EnableFastISel flag instead of just running FastISel within SelectAllBasicBlocks; the flag is checked in various places, and FastISel won't run properly if those places don't do the right thing.
    - Test looks for normal ISel versus FastISel behavior, and not something more subtle that doesn't work everywhere.
    Based on work by Andrea Di Biagio.
    llvm-svn: 195491
* patchpoint: factor SD builder code for live vars. Plain stackmap also optimizes Constant values now. | Andrew Trick | 2013-11-22 | 1 | -1/+19
    llvm-svn: 195488
* Fix PR18014 | Michael Liao | 2013-11-22 | 1 | -0/+16
    - When simplifying the mask generation for BLEND, check whether that mask is also consumed by other non-BLEND insns. If true, skip that simplification.
    llvm-svn: 195476
* Don't produce tail calls when the caller is x86_thiscallcc. | Rafael Espindola | 2013-11-22 | 1 | -0/+8
    The callee will not pop the stack for us.
    llvm-svn: 195467
* Revert r195318 as it causes miscompilation (PR18029) | Kostya Serebryany | 2013-11-22 | 2 | -4/+6
    llvm-svn: 195439
* Tweak 3 tests in llvm/test/CodeGen/X86 to add -mcpu=generic since r195383. | NAKAMURA Takumi | 2013-11-22 | 3 | -3/+3
    They failed on bdver2 buildslave.
    FIXME: FileCheck-ize them.
    llvm-svn: 195407
* SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. | Ekaterina Romanova | 2013-11-21 | 4 | -0/+275
    While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or, and lea, which are DirectPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third DirectPath instruction.
    AMD's processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.
    It might be beneficial to disable this folding for some Intel processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.
    llvm-svn: 195383
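The folded pattern `(x << c) | (y >> (64 - c))` computes the same value whether it is emitted as one SHLD or as separate shift and or instructions; only the latency differs. A minimal sketch of that equivalence, modelling 64-bit registers with Python integers (the function names are hypothetical, and the SHLD model assumes a shift count between 1 and 63):

```python
MASK64 = (1 << 64) - 1


def shld64(dest, src, count):
    """Model of a 64-bit SHLD: shift dest left, filling from the top bits of src."""
    assert 0 < count < 64, "sketch only covers counts 1..63"
    return ((dest << count) | (src >> (64 - count))) & MASK64


def shl_shr_or(x, y, count):
    """The unfolded form: an independent shl, an independent shr, then an or."""
    assert 0 < count < 64, "sketch only covers counts 1..63"
    high = (x << count) & MASK64   # shl
    low = y >> (64 - count)        # shr
    return high | low              # or


x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
folded = shld64(x, y, 12)
unfolded = shl_shr_or(x, y, 12)
```

Both helpers return the same bit pattern for every valid count, so the choice between them is purely a cost decision of the kind the commit makes per processor family.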
* The basic problem is that some mainstream programs cannot deal with the way | Bill Wendling | 2013-11-21 | 2 | -6/+4
    clang optimizes tail calls, as in this example:
      int foo(void);
      int bar(void) {
        return foo();
      }
    where the call is transformed to:
        calll   .L0$pb
      .L0$pb:
        popl    %eax
      .Ltmp0:
        addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
        movl    foo@GOT(%eax), %eax
        popl    %ebp
        jmpl    *%eax                   # TAILCALL
    However, the GOT references must all be resolved at dlopen() time, and so this approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which usually populates the PLT with stubs that perform the actual resolving.
    This patch changes X86TargetLowering::LowerCall() to skip tail call optimization, if the called function is a global or external symbol.
    Patch by Dimitry Andric!
    PR15086
    llvm-svn: 195318
* MachineBlockPlacement: Strengthen the source order bias when picking an exit block. | Benjamin Kramer | 2013-11-20 | 1 | -0/+240
    We now only allow breaking source order if the exit block frequency is significantly higher than the other exit block. The actual bias is currently under a flag so the best cut-off can be found; the flag defaults to the old behavior. The idea is to get some benchmark coverage over different values for the flag and pick the best one.
    When we require the new frequency to be at least 20% higher than the old frequency I see a 5% speedup on zlib's deflate when compressing a random file on x86_64/westmere. Hal reported a small speedup on Fhourstones on a BG/Q and no regressions in the test suite.
    The test case is the full long_match function from zlib's deflate. I was reluctant to add it for previous tweaks to branch probabilities because it's large and potentially fragile, but changed my mind since it's an important use case and more likely to break with all the current work going into the PGO infrastructure.
    Differential Revision: http://llvm-reviews.chandlerc.com/D2202
    llvm-svn: 195265
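The "at least 20% higher" rule can be read as a simple threshold test. The following is a rough sketch of that reading, not the code from the patch: the function name, the parameter names, and the use of a percentage are all assumptions made for illustration, and the real pass works on its own block frequency type behind a command-line flag.

```python
def should_break_source_order(candidate_freq, source_order_freq, bias_percent=20):
    """Return True if the candidate exit block is frequent enough to justify
    leaving source order.

    Frequencies are plain non-negative integers here. Integer arithmetic is
    used so the boundary case (exactly bias_percent higher) is exact.
    """
    return candidate_freq * 100 >= source_order_freq * (100 + bias_percent)


# With a 20% bias, a candidate must reach 120 against a baseline of 100.
barely_misses = should_break_source_order(119, 100)
exactly_meets = should_break_source_order(120, 100)

# A bias of 0 models picking the more frequent exit with no source order bias.
no_bias = should_break_source_order(101, 100, bias_percent=0)
```

Making the bias a parameter mirrors the message's plan to benchmark several cut-offs before settling on a default.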
* AVX-512: Concat 4 128-bit vectors in one 512-bit vector. | Elena Demikhovsky | 2013-11-20 | 1 | -1/+8
    llvm-svn: 195229
* Fix assembly operands for the SSE2 cvtsd2ss instruction. | Cameron McInally | 2013-11-19 | 1 | -0/+1
    llvm-svn: 195129
* Use symbolic operands in the patchpoint folding routine and fix a spilling bug. | Andrew Trick | 2013-11-19 | 1 | -1/+41
    Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled
    llvm-svn: 195094
* Testcase for PR17964 | Bill Wendling | 2013-11-17 | 1 | -0/+10
    llvm-svn: 194961
* DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants. | Benjamin Kramer | 2013-11-17 | 1 | -0/+2
    llvm-svn: 194959
* Added a size field to the stack map record to handle subregister spills. | Andrew Trick | 2013-11-17 | 2 | -60/+132
    Implementing this on big-endian platforms could get strange. I added a target hook, getStackSlotRange, per Jakob's recommendation to make this as explicit as possible.
    llvm-svn: 194942
* Avoid illegal integer promotion in fastisel | Bob Wilson | 2013-11-15 | 1 | -0/+37
    Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when:
      sext(a) + sext(b) != sext(a + b)
    Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code.
    <rdar://problem/15292280>
    Patch by Duncan Exon Smith!
    llvm-svn: 194840
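A concrete pair of values makes the inequality above easy to see. This sketch models 32-bit and 64-bit wrapping arithmetic with Python integers; the choice of i32 operands extended to i64 is an assumption for illustration (the message only says the type sizes do not match), and the helper names are hypothetical.

```python
def wrapping_add(a, b, bits):
    """Two's-complement addition that wraps at the given width."""
    return (a + b) & ((1 << bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide bit pattern to a to_bits-wide bit pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


a, b = 0x7FFFFFFF, 1  # INT32_MAX + 1 overflows in 32 bits

# Add in the narrow type first, then extend: the overflow has already wrapped.
add_then_extend = sext(wrapping_add(a, b, 32), 32, 64)

# Extend first, then add in the wide type: no overflow occurs.
extend_then_add = wrapping_add(sext(a, 32, 64), sext(b, 32, 64), 64)
```

Here `add_then_extend` is `0xFFFFFFFF80000000` while `extend_then_add` is `0x0000000080000000`, so folding the add into a wider address computation would produce a different address.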
* Add AVX512 unmasked FMA intrinsics and support. | Cameron McInally | 2013-11-15 | 1 | -0/+97
    llvm-svn: 194824
* Redirect unused test case output to /dev/null | Alexey Samsonov | 2013-11-15 | 1 | -1/+1
    llvm-svn: 194798
* Platform proof a test case. | Andrew Trick | 2013-11-15 | 1 | -3/+3
    llvm-svn: 194788
* Add addrspacecast instruction. | Matt Arsenault | 2013-11-15 | 1 | -4/+4
    Patch by Michele Scandale!
    llvm-svn: 194760
* Simplify testcase. | Eric Christopher | 2013-11-14 | 1 | -1/+1
    llvm-svn: 194748
* Add a triple and switch test to FileCheck. | Rafael Espindola | 2013-11-14 | 1 | -1/+8
    On Windows we don't print .weak for function definitions, so count was only finding 1 'weak'.
    llvm-svn: 194713
* Error if we see an alias to a declaration. | Rafael Espindola | 2013-11-14 | 5 | -5/+18
    In ELF and COFF an alias is just another offset in a section. There is no way to represent an alias to something in another file.
    In MachO, the spec has the N_INDR type which should allow for exactly that, but is not currently implemented. Given that it is specified but not implemented, we error in codegen to avoid miscompiling but don't reject aliases to declarations in the verifier to leave the option open of implementing it.
    In the past we have used alias to declarations as a way of implementing weakref, which is why it exists in some old tests which this patch updates.
    llvm-svn: 194705
* AVX-512: Handled extractelement from mask vector; | Elena Demikhovsky | 2013-11-14 | 2 | -0/+33
    Added VMOVSHDUP/VMOVSLDUP shuffle instructions.
    llvm-svn: 194691
* Minor extension to llvm.experimental.patchpoint: don't require a call. | Andrew Trick | 2013-11-14 | 1 | -0/+16
    If a null call target is provided, don't emit a dummy call. This allows the runtime to reserve as little nop space as it needs without the requirement of emitting a call.
    llvm-svn: 194676
* Don't mangle \n and " | Rafael Espindola | 2013-11-14 | 1 | -0/+6
    There is nothing special about quotes and newlines from the object file point of view, only the assembler has to worry about expanding the \n and \".
    This patch then removes the special handling from the Mangler.
    llvm-svn: 194667
* Remove AllowQuotesInName and friends from MCAsmInfo. | Rafael Espindola | 2013-11-13 | 2 | -14/+14
    Accepting quotes is a property of an assembler, not of an object file. For example, ELF can support any names for sections and symbols, but the GNU assembler only accepts quotes in some contexts and llvm-mc in a few more.
    LLVM should not produce different symbols based on a guess about which assembler will be reading the code it is printing.
    llvm-svn: 194575
* Add a test case to verify that misusing anyregcc crashes as expected. | Andrew Trick | 2013-11-13 | 1 | -0/+17
    llvm-svn: 194553
* SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. | Juergen Ributzka | 2013-11-13 | 1 | -0/+42
    This patch reapplies r193676 with an additional fix for the Hexagon backend. The SystemZ backend has already been fixed by r194148.
    The Type Legalizer recognizes that VSELECT needs to be split, because the type is too wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons.
    This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to successfully detect the MIN/MAX pattern.
    This fixes PR16695, PR17002, and <rdar://problem/14594431>.
    Reviewed by Nadav
    llvm-svn: 194542
* Cleanup the stackmap operand folding code and fix a corner case. | Andrew Trick | 2013-11-12 | 1 | -1/+21
    I still don't know how to refer to the fixed operands symbolically. I plan to look into it.
    llvm-svn: 194529
* Simplify operand folding when rematerializing a load. | Andrew Trick | 2013-11-12 | 1 | -11/+6
    We already know how to fold a reload from a frameindex without analyzing the load instruction. Generalize this to handle any frameindex load. This streamlines the logic for rematerializing loads from stack arguments. As a side effect, it allows stackmaps to record a stack argument location without spilling it.
    Verified no effect on codegen for llvm test-suite.
    llvm-svn: 194497
* Fix the recently added anyregcc convention to handle spilled operands. | Andrew Trick | 2013-11-11 | 1 | -8/+37
    Fixes <rdar://15432754> [JS] Assertion: "Folded a def to a non-store!"
    The primary purpose of anyregcc is to prevent a patchpoint's call arguments and return value from being spilled. They must be available in a register, although the calling convention does not pin the register. It's up to the front end to avoid using this convention for calls with more arguments than allocatable registers.
    llvm-svn: 194428
* [Stackmap] Materialize the jump address within the patchpoint noop slide. | Juergen Ributzka | 2013-11-09 | 3 | -50/+36
    This patch moves the jump address materialization inside the noop slide. This enables patching of the materialization itself or its complete removal. This patch also adds the ability to define scratch registers that can be used safely by the code called from the patchpoint intrinsic. At least one scratch register is required, because that one is used for the materialization of the jump address.
    This patch depends on D2009.
    Differential Revision: http://llvm-reviews.chandlerc.com/D2074
    Reviewed by Andy
    llvm-svn: 194306
* [Stackmap] Add AnyReg calling convention support for patchpoint intrinsic. | Juergen Ributzka | 2013-11-08 | 1 | -0/+289
    The idea of the AnyReg Calling Convention is to provide the call arguments in registers, but not to force them to be placed in a particular order into a specified set of registers. Instead it is up to the register allocator to assign any register as it sees fit. The same applies to the return value (if applicable).
    Differential Revision: http://llvm-reviews.chandlerc.com/D2009
    Reviewed by Andy
    llvm-svn: 194293
* Slightly change the way stackmap and patchpoint intrinsics are lowered. | Andrew Trick | 2013-11-05 | 1 | -0/+20
    MorphNodeTo is not safe to call during DAG building. It eagerly deletes dependent DAG nodes which invalidates the NodeMap. We could expose a safe interface for morphing nodes, but I don't think it's worth it. Just create a new MachineNode and replaceAllUsesWith.
    My understanding of the SD design has been that we want to support early target opcode selection. That isn't very well supported, but generally works. It seems reasonable to rely on this feature even if it isn't widely used.
    llvm-svn: 194102
* Check for both styles of clobbers, those produced by dragonegg and | Eric Christopher | 2013-11-04 | 1 | -9/+22
    those produced by clang for the inline asm bswap conversion.
    Modified from a patch by Chris Smowton.
    llvm-svn: 194016
* Add support for AVX512 masked vector blend intrinsics. | Cameron McInally | 2013-11-04 | 1 | -0/+32
    llvm-svn: 194006
* AVX-512: added VPCONFLICT instruction and intrinsics, | Elena Demikhovsky | 2013-11-03 | 1 | -0/+23
    added EVEX_KZ to tablegen
    llvm-svn: 193959