summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [WebAssembly] Handle CopyToReg nodes with flag results in LowerCopyToReg.Dan Gohman2016-02-201-3/+7
| | | | llvm-svn: 261457
* [WebAssembly] Write stack pointer back to memory when FP is usedDerek Schuff2016-02-201-1/+1
| | | | | | | | The stack pointer is bumped when there is a frame pointer or when there are static-size objects, but was only getting written back when there were static-size objects. llvm-svn: 261453
* [WebAssembly] Stackify function prologs and epilogsDerek Schuff2016-02-201-15/+21
| | | | | | | | The instructions are the same, but fewer locals are used. Differential Revision: http://reviews.llvm.org/D17428 llvm-svn: 261452
* Fix the build bot break caused by rL261441.Nemanja Ivanovic2016-02-201-5/+11
| | | | | | | | The patch has a necessary call to a function inside an assert. Which is fine when you have asserts turned on. Not so much when they're off. Sorry about the regression. llvm-svn: 261447
* Fix for PR 26500Nemanja Ivanovic2016-02-202-52/+182
| | | | | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D17294 It ensures that whatever block we are emitting the prologue/epilogue into, we have the necessary scratch registers. It takes away the hard-coded register numbers for use as scratch registers as registers that are guaranteed to be available in the function prologue/epilogue are not guaranteed to be available within the function body. Since we shrink-wrap, the prologue/epilogue may end up in the function body. llvm-svn: 261441
* [X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles ↵Simon Pilgrim2016-02-201-5/+4
| | | | | | | | (PR26667) Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input. llvm-svn: 261434
* [X86][SSE] Move all undef/zero cases before target shuffle combining.Simon Pilgrim2016-02-201-20/+14
| | | | | | First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041). llvm-svn: 261433
* [X86] Enable the LEA optimization pass by default.Andrey Turetskiy2016-02-201-4/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 261429
* [X86] PR26575: Fix LEA optimization pass (Part 2).Andrey Turetskiy2016-02-201-36/+78
| | | | | | | | | | Handle address displacement operands of a type other than Immediate or Global in LEAs and load/stores. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17374 llvm-svn: 261428
* Move some code from doInitialization to runOnFunctionDavid Majnemer2016-02-201-3/+4
| | | | | | | This has no observable behavior change, it just makes the state insertion pass look a little more like normal passes. llvm-svn: 261420
* [X86] Add some missing reversed forms of XOP instructions.Craig Topper2016-02-201-0/+29
| | | | llvm-svn: 261417
* [X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled.Davide Italiano2016-02-203-2/+39
| | | | | | | | | | TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent shrink-wrapping from pushing prologue/epilogue past them (which result in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387
* AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointerTom Stellard2016-02-202-238/+22
| | | | | | | | | | | | | | | | | | | | | | Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385
* [X86ISelLowering] Provide a more informative assert message.Davide Italiano2016-02-191-1/+1
| | | | | | I stumbled upon this while debugging a lowering bug. llvm-svn: 261371
* [X86ISelLowering] Merge two conditions inside a single if.Davide Italiano2016-02-191-3/+1
| | | | llvm-svn: 261370
* Revert r253557 "Alternative to long nops for X86 CPUs, by Andrey Turetsky"Hans Wennborg2016-02-191-32/+14
| | | | | | Turns out the new nop sequences aren't actually nops on x86_64 (PR26554). llvm-svn: 261365
* Fix incorrect selection of AVX512 sqrt when OptForSize is onDimitry Andric2016-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: When optimizing for size, sqrt calls can be incorrectly selected as AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a `Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass definition. Even if the target does not support AVX512, the class can apparently still be chosen, leading to an incorrect selection of `vsqrtss`. In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction requires an XMM register, which is not available on i686 CPUs. Reviewers: grosbach, resistor, joker.eph Subscribers: spatel, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D17414 llvm-svn: 261360
* [WebAssembly] Add another optimization idea to README.txt.Dan Gohman2016-02-191-0/+6
| | | | llvm-svn: 261354
* [AArch64][ShrinkWrap] Fix bug in prolog clobbering live reg when shrink ↵Geoff Berry2016-02-192-9/+63
| | | | | | | | | | | | | | wrapping. Summary: See bug https://llvm.org/bugs/show_bug.cgi?id=26642 Reviewers: qcolombet, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17350 llvm-svn: 261349
* AMDGPU/SI: Fix s_waitcnt insertion for flat instructionsTom Stellard2016-02-191-2/+4
| | | | | | | | | | | | | | | | Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330
* [SystemZ] Fix ABI for i128 argument and return typesUlrich Weigand2016-02-194-10/+95
| | | | | | | | | | | | | | | | | | | | | | | | | According to the SystemZ ABI, 128-bit integer types should be passed and returned via implicit reference. However, this is not currently implemented at the LLVM IR level for the i128 type. This does not matter when compiling C/C++ code, since clang will implement the implicit reference itself. However, it turns out that when calling libgcc helper routines operating on 128-bit integers, LLVM will use i128 argument and return value types; the resulting code is not compatible with the ABI used in libgcc, leading to crashes (see PR26559). This should be simple to fix, except that i128 currently is not even a legal type for the SystemZ back end. Therefore, common code will already split arguments and return values into multiple parts. The bulk of this patch therefore consists of detecting such parts, and correctly handling passing via implicit reference of a value split into multiple parts. If at some time in the future, i128 becomes a legal type, this code can be removed again. This fixes PR26559. llvm-svn: 261325
* [X86] Remove unused entries from the disassembler type enum.Craig Topper2016-02-193-6/+0
| | | | llvm-svn: 261311
* Minor code cleanups. NFC.Junmo Park2016-02-191-3/+3
| | | | llvm-svn: 261294
* [x86] fix initialization of PredictableSelectIsExpensiveSanjay Patel2016-02-181-3/+3
| | | | | | | | | | This is effectively NFC because Atom is the only in-order x86 subtarget currently, but the predicate would have become wrong if any other in-order CPU came along. See related discussion in: http://reviews.llvm.org/D16836 llvm-svn: 261275
* Remove uses of builtin comma operator.Richard Trieu2016-02-184-35/+56
| | | | | | Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270
* [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFCAdam Nemet2016-02-184-230/+1
| | | | | | | | | | | | | This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). Obviously the pass still only used from PPC at this point. Subsequent patches will start driving this from ARM64 as well. Due to the previous patch most lines should show up as moved lines. llvm-svn: 261265
* [PPCLoopDataPrefetch] Remove PPC from some of the names. NFCAdam Nemet2016-02-181-14/+14
| | | | | | | | | | | | This is done only to make the next patch that move the pass out PPC to Transforms easier to read. After this most line should show up as moved lines in that patch. This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). llvm-svn: 261264
* [WinEH] Hoist state stores from successorsDavid Majnemer2016-02-181-1/+54
| | | | | | | | | | If we know that all of our successors want to be in the exact same state, it makes sense to hoist the state transition into their common predecessor. Differential Revision: http://reviews.llvm.org/D17391 llvm-svn: 261262
* [X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC.Davide Italiano2016-02-181-1/+1
| | | | llvm-svn: 261255
* [AArch64] Reduce vector insert/extract cost for KryoMatthew Simpson2016-02-181-0/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D17379 llvm-svn: 261237
* Revert to extend i8/i16 return values on Darwin (PR26665)Hans Wennborg2016-02-181-1/+6
| | | | | | | | | | | | | In r260133, LLVM was changed to no longer extend i8/i16 return values, as it's not required by the ABI. However, code was found in the wild that relies on the old behaviour on Darwin, so this commit reverts back to that old behaviour for Darwin. On other platforms, it's less likely that code would be depending on the old behaviour, as GCC and MSVC haven't been extending such return values. llvm-svn: 261235
* [Hexagon] Remove redundant check.Chad Rosier2016-02-181-2/+2
| | | | llvm-svn: 261232
* AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsicsNicolai Haehnle2016-02-183-30/+75
| | | | | | | | | | | | | Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224
* [Hexagon] Fix compilation error with GCC 6Krzysztof Parzyszek2016-02-181-66/+68
| | | | | | | | | | | | | | Compiling Hexagon target with GCC 6 produces "error: should have been declared inside" due to GCC PR c++/69657 which was merged. Properly wrapping operator<<() definitions within the namespace llvm fixes the issue. Author: domagoj.stolfa Differential Revision: http://reviews.llvm.org/D17281 llvm-svn: 261220
* [Hexagon] Implement TLS supportKrzysztof Parzyszek2016-02-185-2/+202
| | | | | | Patch by Anand Kodnani. llvm-svn: 261218
* [mips][microMIPS] Implement TLBINV and TLBINVF instructionsZlatko Buljan2016-02-183-2/+30
| | | | | | Differential Revision: http://reviews.llvm.org/D16849 llvm-svn: 261211
* [Hexagon] Add support for __builtin_prefetchKrzysztof Parzyszek2016-02-183-0/+38
| | | | llvm-svn: 261210
* [Hexagon] Update the callee-saved register set for EH-aware functionsKrzysztof Parzyszek2016-02-181-3/+15
| | | | llvm-svn: 261208
* [X86][SSE] Improve PSHUFB shuffle mask decoding.Simon Pilgrim2016-02-181-16/+36
| | | | | | | | In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool. The test case in question makes use of this to recognise the shuffle mask is an unary UNPCKL pattern and simplifies accordingly. llvm-svn: 261201
* Test commit access.Nikolay Haustov2016-02-181-1/+0
| | | | llvm-svn: 261199
* [AVX512][PRORQ][PRORD] Change imm8 to intMichael Zuckerman2016-02-181-6/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D17024 llvm-svn: 261198
* [WebAssembly] Don't use setRequiresStructuredCFG(true).Dan Gohman2016-02-181-3/+4
| | | | | | | | | | | | While we still do want reducible control flow, the RequiresStructuredCFG flag imposes more strict structure constraints than WebAssembly wants. Unsetting this flag enables critical edge splitting and tail merging. Also, disable TailDuplication explicitly, as it doesn't support virtual registers, and was previously only disabled by the RequiresStructuredCFG flag. llvm-svn: 261190
* [AMDGPU] Disassembler: Added basic disassembler for AMDGPU targetTom Stellard2016-02-1812-49/+551
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes: - Added disassembler project - Fixed all decoding conflicts in .td files - Added DecoderMethod=“NONE” option to Target.td that allows to disable decoder generation for an instruction. - Created decoding functions for VS_32 and VReg_32 register classes. - Added stubs for decoding all register classes. - Added several tests for disassembler Disassembler only supports: - VI subtarget - VOP1 instruction encoding - 32-bit register operands and inline constants [Valery] One of the point that requires to pay attention to is how decoder conflicts were resolved: - Groups of target instructions were separated by using different DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate approach. - There were conflicts in IMAGE_<> instructions caused by two different reasons: 1. dmask wasn’t specified for the output (fixed) 2. There are image instructions that differ only by the number of the address components but have the same encoding by the HW spec. The actual number of address components is determined by the HW at runtime using image resource descriptor starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from conflicting group to be the rule for decoder. I didn’t find the way to disable decoder generation for an arbitrary instruction and therefore made a onelinear fix to tablegen generator that would suppress decoder generation when DecoderMethod is set to “NONE”. This is a change that should be reviewed and submitted first. Otherwise I would need to specify different DecoderNamespace for every instruction in the conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not used in other targets. 3. IMAGE_GATHER decoder generation is for now disabled and to be done later. [/Valery] Patch By: Sam Kolton Differential Revision: http://reviews.llvm.org/D16723 llvm-svn: 261185
* [WebAssembly] Disable register stackification and coloring when not optimizingDerek Schuff2016-02-173-11/+23
| | | | | | | | | | | | | | These passes are optimizations, and should be disabled when not optimizing. Also create an MCCodeGenInfo so the opt level is correctly plumbed to the backend pass manager. Also remove the command line flag for disabling register coloring; running llc with -O0 should now be useful for debugging, so it's not necessary. Differential Revision: http://reviews.llvm.org/D17327 llvm-svn: 261176
* AArch64: always clear kill flags up to last eliminated copyTim Northover2016-02-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | After r261154, we were only clearing flags if the known-zero register was originally live-in to the basic block, but we have to do it even if not when more than one COPY has been eliminated, otherwise the user of the first COPY may still have <kill> marked. E.g. BB#N: %X0 = COPY %XZR STRXui %X0<kill>, <fi#0> %X0 = COPY %XZR STRXui %X0<kill>, <fi#1> We can eliminate both copies, X0 is not live-in, but we must clear the kill on the first store. Unfortunately, I've been unable to come up with a non-fragile test for this. I've only seen it in the wild with regalloc-created spills, and attempts to reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile asm clobbers were moved around. Should fix the aarch64 bot though. llvm-svn: 261175
* Move LLVMCreateTargetData and LLVMDisposeTargetData together. NFCAmaury Sechet2016-02-171-4/+4
| | | | llvm-svn: 261172
* AArch64: improve redundant copy elimination.Tim Northover2016-02-171-40/+46
| | | | | | | | | | | | | | | Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we didn't sort out the use-def chain properly. I've also made it check more than just the last instruction for a compatible CBZ (so it can cope without fallthroughs). I'd have liked to do that separately, but it's helps writing the test. Finally, I removed some custom loops in favour of MachineInstr helpers and refactored the control flow to flatten it and avoid possibly quadratic iterations in blocks with many copies. NFC for these, just a general tidy-up. llvm-svn: 261154
* [Hexagon] Replacing reference/dereference with reference cast.Colin LeMahieu2016-02-171-4/+4
| | | | llvm-svn: 261133
* Remove superfluous semicolon.Nico Weber2016-02-171-1/+1
| | | | llvm-svn: 261128
* [WinEH] Optimize WinEH state storesDavid Majnemer2016-02-171-32/+175
| | | | | | | | | | | | | | | | | | | | | | | | | 32-bit x86 Windows targets use a linked-list of nodes allocated on the stack, referenced to via thread-local storage. The personality routine interprets one of the fields in the node as a 'state number' which indicates where the personality routine should transfer control. State transitions are possible only before call-sites which may throw exceptions. Our previous scheme had us update the state number before all call-sites which may throw. Instead, we can try to minimize the number of times we need to store by reasoning about the nearest store which dominates the current call-site. If the last store agrees with the current call-site, then we know that the state-update is redundant and can be elided. This is largely straightforward: an RPO walk of the blocks allows us to correctly forward propagate the information when the function is a DAG. Currently, loops are not handled optimally and may trigger superfluous state stores. Differential Revision: http://reviews.llvm.org/D16763 llvm-svn: 261122
OpenPOWER on IntegriCloud