path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [x86] fix initialization of PredictableSelectIsExpensive | Sanjay Patel | 2016-02-18 | 1 | -3/+3
  This is effectively NFC because Atom is the only in-order x86 subtarget currently, but the predicate would have become wrong if any other in-order CPU came along.
  See related discussion in: http://reviews.llvm.org/D16836
  llvm-svn: 261275
* Remove uses of builtin comma operator. | Richard Trieu | 2016-02-18 | 4 | -35/+56
  Cleanup for upcoming Clang warning -Wcomma. No functionality change intended.
  llvm-svn: 261270
* [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC | Adam Nemet | 2016-02-18 | 4 | -230/+1
  This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
  Obviously the pass is still only used from PPC at this point. Subsequent patches will start driving this from ARM64 as well.
  Due to the previous patch, most lines should show up as moved lines.
  llvm-svn: 261265
* [PPCLoopDataPrefetch] Remove PPC from some of the names. NFC | Adam Nemet | 2016-02-18 | 1 | -14/+14
  This is done only to make the next patch, which moves the pass out of PPC to Transforms, easier to read. After this, most lines should show up as moved lines in that patch.
  This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
  llvm-svn: 261264
* [WinEH] Hoist state stores from successors | David Majnemer | 2016-02-18 | 1 | -1/+54
  If we know that all of our successors want to be in the exact same state, it makes sense to hoist the state transition into their common predecessor.
  Differential Revision: http://reviews.llvm.org/D17391
  llvm-svn: 261262
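The hoisting condition described in this commit can be illustrated with a small stand-alone check. This is only a sketch under stated assumptions: `commonSuccessorState` and `kNoCommonState` are hypothetical names, and states are modeled as plain integers rather than anything taken from the actual pass.

```cpp
#include <climits>
#include <vector>

// Hypothetical sentinel meaning "the successors do not agree on one state".
constexpr int kNoCommonState = INT_MIN;

// Given the state each successor wants on entry, return that state when all
// successors agree (so a single store could be hoisted into the common
// predecessor), or kNoCommonState when they differ or there are none.
int commonSuccessorState(const std::vector<int> &succStates) {
  if (succStates.empty())
    return kNoCommonState;
  for (int s : succStates)
    if (s != succStates.front())
      return kNoCommonState;
  return succStates.front();
}
```

With successors wanting `{2, 2, 2}` the function returns 2, so one store in the predecessor would serve all three; with `{1, 2}` it returns the sentinel and nothing is hoisted.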
* [X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC. | Davide Italiano | 2016-02-18 | 1 | -1/+1
  llvm-svn: 261255
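For readers unfamiliar with the idiom being replaced, here is a minimal stand-alone sketch of the usual power-of-two test. `isPowerOfTwo` is a hypothetical name used for illustration. LLVM's own helpers of this kind live in `llvm/Support/MathExtras.h`, and the commit message does not show which one was used.

```cpp
#include <cstdint>

// A value is a power of two when exactly one bit is set. Clearing the lowest
// set bit with x & (x - 1) must then leave zero. Zero is excluded explicitly.
constexpr bool isPowerOfTwo(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

Hand-written copies of this expression tend to forget the zero case, which is one reason to prefer a shared helper.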
* [AArch64] Reduce vector insert/extract cost for Kryo | Matthew Simpson | 2016-02-18 | 1 | -0/+2
  Differential Revision: http://reviews.llvm.org/D17379
  llvm-svn: 261237
* Revert to extend i8/i16 return values on Darwin (PR26665) | Hans Wennborg | 2016-02-18 | 1 | -1/+6
  In r260133, LLVM was changed to no longer extend i8/i16 return values, as it's not required by the ABI. However, code was found in the wild that relies on the old behaviour on Darwin, so this commit reverts back to that old behaviour for Darwin.
  On other platforms, it's less likely that code would be depending on the old behaviour, as GCC and MSVC haven't been extending such return values.
  llvm-svn: 261235
* [Hexagon] Remove redundant check. | Chad Rosier | 2016-02-18 | 1 | -2/+2
  llvm-svn: 261232
* AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics | Nicolai Haehnle | 2016-02-18 | 3 | -30/+75
  Summary:
  These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension.
  IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory.
  Differential Revision: http://reviews.llvm.org/D17276
  llvm-svn: 261224
* [Hexagon] Fix compilation error with GCC 6 | Krzysztof Parzyszek | 2016-02-18 | 1 | -66/+68
  Compiling Hexagon target with GCC 6 produces "error: should have been declared inside" due to GCC PR c++/69657 which was merged. Properly wrapping operator<<() definitions within the namespace llvm fixes the issue.
  Author: domagoj.stolfa
  Differential Revision: http://reviews.llvm.org/D17281
  llvm-svn: 261220
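The fix described here, defining `operator<<` inside the namespace block instead of through a qualified name, can be sketched with standard streams. Everything below is illustrative: `demo`, `Reg`, and `toString` are hypothetical names, and `std::ostream` stands in for LLVM's `raw_ostream`.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace demo {

struct Reg {
  unsigned Id;
};

// Defined inside the namespace block, so the definition also serves as the
// declaration within demo. The form that newer GCC rejects is a definition
// written at file scope as `std::ostream &demo::operator<<(...)` with no
// earlier declaration inside `namespace demo`.
std::ostream &operator<<(std::ostream &OS, const Reg &R) {
  return OS << 'r' << R.Id;
}

// Small helper so the operator can be exercised without extra setup.
std::string toString(const Reg &R) {
  std::ostringstream OS;
  OS << R;
  return OS.str();
}

} // namespace demo
```

Keeping the definition inside the namespace also lets argument-dependent lookup find the operator for `demo::Reg` without any using-declarations.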
* [Hexagon] Implement TLS support | Krzysztof Parzyszek | 2016-02-18 | 5 | -2/+202
  Patch by Anand Kodnani.
  llvm-svn: 261218
* [mips][microMIPS] Implement TLBINV and TLBINVF instructions | Zlatko Buljan | 2016-02-18 | 3 | -2/+30
  Differential Revision: http://reviews.llvm.org/D16849
  llvm-svn: 261211
* [Hexagon] Add support for __builtin_prefetch | Krzysztof Parzyszek | 2016-02-18 | 3 | -0/+38
  llvm-svn: 261210
* [Hexagon] Update the callee-saved register set for EH-aware functions | Krzysztof Parzyszek | 2016-02-18 | 1 | -3/+15
  llvm-svn: 261208
* [X86][SSE] Improve PSHUFB shuffle mask decoding. | Simon Pilgrim | 2016-02-18 | 1 | -16/+36
  In cases where the PSHUFB shuffle mask is shared, it might not be bitcast to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.
  The test case in question makes use of this to recognise that the shuffle mask is a unary UNPCKL pattern and simplifies accordingly.
  llvm-svn: 261201
* Test commit access. | Nikolay Haustov | 2016-02-18 | 1 | -1/+0
  llvm-svn: 261199
* [AVX512][PRORQ][PRORD] Change imm8 to int | Michael Zuckerman | 2016-02-18 | 1 | -6/+6
  Differential Revision: http://reviews.llvm.org/D17024
  llvm-svn: 261198
* [WebAssembly] Don't use setRequiresStructuredCFG(true). | Dan Gohman | 2016-02-18 | 1 | -3/+4
  While we still do want reducible control flow, the RequiresStructuredCFG flag imposes more strict structure constraints than WebAssembly wants. Unsetting this flag enables critical edge splitting and tail merging.
  Also, disable TailDuplication explicitly, as it doesn't support virtual registers, and was previously only disabled by the RequiresStructuredCFG flag.
  llvm-svn: 261190
* [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target | Tom Stellard | 2016-02-18 | 12 | -49/+551
  Changes:
  - Added disassembler project
  - Fixed all decoding conflicts in .td files
  - Added DecoderMethod="NONE" option to Target.td that allows disabling decoder generation for an instruction.
  - Created decoding functions for VS_32 and VReg_32 register classes.
  - Added stubs for decoding all register classes.
  - Added several tests for disassembler

  Disassembler only supports:
  - VI subtarget
  - VOP1 instruction encoding
  - 32-bit register operands and inline constants

  [Valery]
  One point that requires attention is how decoder conflicts were resolved:
  - Groups of target instructions were separated by using different DecoderNamespace (SICI, VI, CI), using an approach similar to AssemblerPredicate.
  - There were conflicts in IMAGE_<> instructions caused by two different reasons:
    1. dmask wasn't specified for the output (fixed)
    2. There are image instructions that differ only by the number of the address components but have the same encoding by the HW spec. The actual number of address components is determined by the HW at runtime using the image resource descriptor starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from the conflicting group to be the rule for the decoder. I didn't find a way to disable decoder generation for an arbitrary instruction and therefore made a one-line fix to the tablegen generator that suppresses decoder generation when DecoderMethod is set to "NONE". This is a change that should be reviewed and submitted first. Otherwise I would need to specify a different DecoderNamespace for every instruction in the conflicting group. I haven't checked yet that DecoderMethod="NONE" is not used in other targets.
    3. IMAGE_GATHER decoder generation is for now disabled and to be done later.
  [/Valery]

  Patch By: Sam Kolton
  Differential Revision: http://reviews.llvm.org/D16723
  llvm-svn: 261185
* [WebAssembly] Disable register stackification and coloring when not optimizing | Derek Schuff | 2016-02-17 | 3 | -11/+23
  These passes are optimizations, and should be disabled when not optimizing. Also create an MCCodeGenInfo so the opt level is correctly plumbed to the backend pass manager.
  Also remove the command line flag for disabling register coloring; running llc with -O0 should now be useful for debugging, so it's not necessary.
  Differential Revision: http://reviews.llvm.org/D17327
  llvm-svn: 261176
* AArch64: always clear kill flags up to last eliminated copy | Tim Northover | 2016-02-17 | 1 | -7/+7
  After r261154, we were only clearing flags if the known-zero register was originally live-in to the basic block, but we have to do it even if not when more than one COPY has been eliminated, otherwise the user of the first COPY may still have <kill> marked.
  E.g.
      BB#N:
        %X0 = COPY %XZR
        STRXui %X0<kill>, <fi#0>
        %X0 = COPY %XZR
        STRXui %X0<kill>, <fi#1>
  We can eliminate both copies, X0 is not live-in, but we must clear the kill on the first store.
  Unfortunately, I've been unable to come up with a non-fragile test for this. I've only seen it in the wild with regalloc-created spills, and attempts to reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile asm clobbers were moved around. Should fix the aarch64 bot though.
  llvm-svn: 261175
* Move LLVMCreateTargetData and LLVMDisposeTargetData together. NFC | Amaury Sechet | 2016-02-17 | 1 | -4/+4
  llvm-svn: 261172
* AArch64: improve redundant copy elimination. | Tim Northover | 2016-02-17 | 1 | -40/+46
  Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we didn't sort out the use-def chain properly.
  I've also made it check more than just the last instruction for a compatible CBZ (so it can cope without fallthroughs). I'd have liked to do that separately, but it helps with writing the test.
  Finally, I removed some custom loops in favour of MachineInstr helpers and refactored the control flow to flatten it and avoid possibly quadratic iterations in blocks with many copies. NFC for these, just a general tidy-up.
  llvm-svn: 261154
* [Hexagon] Replacing reference/dereference with reference cast. | Colin LeMahieu | 2016-02-17 | 1 | -4/+4
  llvm-svn: 261133
* Remove superfluous semicolon. | Nico Weber | 2016-02-17 | 1 | -1/+1
  llvm-svn: 261128
* [WinEH] Optimize WinEH state stores | David Majnemer | 2016-02-17 | 1 | -32/+175
  32-bit x86 Windows targets use a linked list of nodes allocated on the stack, referenced via thread-local storage. The personality routine interprets one of the fields in the node as a 'state number' which indicates where the personality routine should transfer control.
  State transitions are possible only before call-sites which may throw exceptions. Our previous scheme had us update the state number before all call-sites which may throw.
  Instead, we can try to minimize the number of times we need to store by reasoning about the nearest store which dominates the current call-site. If the last store agrees with the current call-site, then we know that the state-update is redundant and can be elided.
  This is largely straightforward: an RPO walk of the blocks allows us to correctly forward propagate the information when the function is a DAG. Currently, loops are not handled optimally and may trigger superfluous state stores.
  Differential Revision: http://reviews.llvm.org/D16763
  llvm-svn: 261122
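The forward propagation described in this commit can be sketched on a toy control-flow graph. This is a simplified model under stated assumptions, not the pass itself: `Block`, `countStateStores`, `makeDiamond`, and `kUnknownState` are hypothetical names, states are plain integers, and a block without predecessors is treated as the function entry.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical sentinel: the state at this point is not known statically.
constexpr int kUnknownState = INT_MIN;

struct Block {
  std::vector<std::size_t> preds; // indices of predecessor blocks
  std::vector<int> callStates;    // state needed by each may-throw call, in order
};

// Blocks are assumed to be listed in reverse post-order, so in a DAG every
// predecessor has been visited before its successors. Returns how many state
// stores remain after eliding those that repeat the already-known state.
std::size_t countStateStores(const std::vector<Block> &rpo, int entryState) {
  std::vector<int> finalState(rpo.size(), kUnknownState);
  std::size_t stores = 0;
  for (std::size_t i = 0; i < rpo.size(); ++i) {
    const Block &bb = rpo[i];
    int cur = entryState;
    if (!bb.preds.empty()) {
      // The incoming state is known only if all predecessors agree on it.
      // A predecessor not yet visited (a back edge) is still unknown, which
      // makes loops conservative rather than optimal.
      cur = finalState[bb.preds.front()];
      for (std::size_t p : bb.preds) {
        if (finalState[p] != cur) {
          cur = kUnknownState;
          break;
        }
      }
    }
    for (int wanted : bb.callStates) {
      if (cur != wanted) { // a store is needed only when the state changes
        ++stores;
        cur = wanted;
      }
    }
    finalState[i] = cur;
  }
  return stores;
}

// A diamond: block 0 branches to blocks 1 and 2, which both join in block 3.
std::vector<Block> makeDiamond() {
  return {Block{{}, {0}}, Block{{0}, {1}}, Block{{0}, {1}}, Block{{1, 2}, {1}}};
}
```

On the diamond, storing before every call would take four stores. The walk keeps three, because both predecessors of the join block already leave the state at 1.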
* [Hexagon] Loop instructions don't need special processing. Extension and fitting is performed by generic code and the comment is incorrect, loops don't have a separate extended opcode. | Colin LeMahieu | 2016-02-17 | 1 | -25/+0
  llvm-svn: 261118
* [NVPTX] Annotate convergent intrinsics as convergent. | Justin Lebar | 2016-02-17 | 1 | -0/+2
  Summary:
  Previously the machine instructions for bar.sync &co. were not marked as convergent. This resulted in some MI passes (such as TailDuplication, fixed in an upcoming patch) doing unsafe things to these instructions.
  Reviewers: jingyue
  Subscribers: llvm-commits, tra, jholewinski, hfinkel
  Differential Revision: http://reviews.llvm.org/D17318
  llvm-svn: 261115
* [NVPTX] Annotate call machine instructions as calls. | Justin Lebar | 2016-02-17 | 1 | -0/+2
  Summary:
  Otherwise we'll try to do unsafe optimizations on these MIs, such as sinking loads below calls.
  (I suspect that this is not the only bug in the NVPTX instruction tablegen files; I need to comb through them.)
  Reviewers: jholewinski, tra
  Subscribers: jingyue, jhen, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17315
  llvm-svn: 261113
* [Hexagon] Fold object construction into map::insert | Krzysztof Parzyszek | 2016-02-17 | 1 | -2/+2
  llvm-svn: 261096
* AVX512: Fix LowerMSCATTER() return value. | Igor Breger | 2016-02-17 | 1 | -1/+1
  Bug description:
  The bug was discovered when a test was compiled with -O0. When the scatter result is the DAG root, VectorLegalizer failed (assert) because LowerMSCATTER() returned the kmask as its result.
  Change LowerMSCATTER() to return the chain, as the original node does.
  Differential Revision: http://reviews.llvm.org/D17331
  llvm-svn: 261090
* [mips] Removed the SHF_ALLOC flag and the SHT_REL flag from the .pdr section. | Scott Egerton | 2016-02-17 | 1 | -2/+1
  This section is used for debug information and has no need to be in memory at runtime. This patch also fixes an error when compiling the Linux kernel. The error is that there are relocations within the .pdr section in a VDSO.
  SHT_REL was removed as it is a section type and not a section flag, therefore it does not make sense for it to be there.
  With this patch, LLVM now emits the same flags as the GNU assembler.
  llvm-svn: 261083
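Why a section type does not belong in a flags word can be shown with the ELF numeric values. The sketch below assumes the old code OR-ed both values into the flags, which the commit message suggests but does not show. The `k`-prefixed names and the `elfdemo` namespace are hypothetical, and the flags after the fix are modeled as 0 purely for illustration. The numeric values themselves are the standard ELF ones.

```cpp
#include <cstdint>

namespace elfdemo {

// Standard ELF values, renamed to avoid clashing with macros from <elf.h>.
constexpr std::uint64_t kShtRel = 9;     // SHT_REL: a section *type*
constexpr std::uint64_t kShfWrite = 0x1; // SHF_WRITE: a section *flag*
constexpr std::uint64_t kShfAlloc = 0x2; // SHF_ALLOC: a section *flag*

// Assumed shape of the old flags word: a type value mixed in with a flag.
constexpr std::uint64_t kOldFlags = kShfAlloc | kShtRel;
// Assumed flags after the fix (no flags at all), for illustration only.
constexpr std::uint64_t kNewFlags = 0;

constexpr bool isWritable(std::uint64_t flags) { return (flags & kShfWrite) != 0; }
constexpr bool isAllocated(std::uint64_t flags) { return (flags & kShfAlloc) != 0; }

} // namespace elfdemo
```

Because 9 is binary 1001, mixing `SHT_REL` into the flags silently sets the `SHF_WRITE` bit plus one unrelated bit, so a section intended only for debug data would look writable.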
* [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors | Simon Pilgrim | 2016-02-17 | 1 | -1/+3
  AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
  This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
  Part 2 of 2
  Differential Revision: http://reviews.llvm.org/D17292
  llvm-svn: 261082
* [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors | Simon Pilgrim | 2016-02-17 | 1 | -2/+6
  AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
  This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
  Part 1 of 2
  Differential Revision: http://reviews.llvm.org/D17292
  llvm-svn: 261081
* [X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI. | Simon Pilgrim | 2016-02-17 | 1 | -23/+20
  Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.
  Renamed from V to Ops to more closely match their purpose.
  llvm-svn: 261078
* [Hexagon] cast<> a reference instead of referencing + dereferencing. | Benjamin Kramer | 2016-02-17 | 1 | -1/+1
  llvm-svn: 261077
* Revert r260979 "[X86] Enable the LEA optimization pass by default." | Hans Wennborg | 2016-02-17 | 1 | -5/+4
  Asserts are still firing in Chromium builds. PR26575.
  llvm-svn: 261058
* WebAssembly: update expected failures | JF Bastien | 2016-02-17 | 1 | -3/+0
  r261050 seems to inadvertently fix the assertion failure.
  llvm-svn: 261051
* [WebAssembly] Call memcpy for large byval copies. | Dan Gohman | 2016-02-17 | 1 | -1/+1
  This fixes very slow compilation on test/CodeGen/Generic/2010-11-04-BigByval.ll.
  Note that MaxStoresPerMemcpy and friends are not yet carefully tuned so the cutoff point is currently somewhat arbitrary. However, it's important that there be a cutoff point so that we don't emit unbounded quantities of loads and stores.
  llvm-svn: 261050
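The cutoff idea in this commit, expanding small copies inline and calling memcpy for large ones, can be sketched as a simple predicate. This is an illustration only: `shouldExpandInline` and its parameters are hypothetical, and the real decision in LLVM depends on more than a single store width and a single limit.

```cpp
#include <cstdint>

// Expand a copy inline only when the number of stores it needs stays within
// a fixed budget (playing the role of a limit such as MaxStoresPerMemcpy).
// Otherwise the caller would emit a call to memcpy instead.
constexpr bool shouldExpandInline(std::uint64_t sizeBytes,
                                  std::uint64_t storeBytes,
                                  std::uint64_t maxStores) {
  return storeBytes != 0 &&
         sizeBytes / storeBytes + (sizeBytes % storeBytes != 0 ? 1 : 0) <=
             maxStores;
}
```

With 8-byte stores and a budget of 4, a 32-byte copy is expanded inline while a 1 MiB byval argument falls back to the library call, which is what bounds the amount of emitted code.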
* WebAssembly: update expected test failures | JF Bastien | 2016-02-17 | 1 | -3/+0
  r261032 adds frame address support.
  llvm-svn: 261044
* [X86] Fix a shrink-wrapping miscompile around __chkstk | Reid Kleckner | 2016-02-17 | 1 | -7/+6
  __chkstk clobbers EAX. If EAX is live across the prologue, then we have to take extra steps to save it. We already had code to do this if EAX was a register parameter. This change adapts it to work when shrink wrapping is used.
  llvm-svn: 261039
* [WebAssembly] Use SDValue::getConstantOperandVal. NFC. | Dan Gohman | 2016-02-17 | 1 | -1/+1
  llvm-svn: 261037
* [WebAssembly] Implement __builtin_frame_address. | Dan Gohman | 2016-02-16 | 4 | -8/+23
  Differential Revision: http://reviews.llvm.org/D17307
  llvm-svn: 261032
* [X86] Remove the now-unused X86ISD::PSIGN. NFC. | Ahmed Bougacha | 2016-02-16 | 6 | -46/+30
  llvm-svn: 261025
* [X86] Generalize logic blend of (x, -x) combine to match (-x, x). | Ahmed Bougacha | 2016-02-16 | 1 | -7/+17
  I suspect this is what let PR26110 lie dormant for so long.
  llvm-svn: 261024
* [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN. | Ahmed Bougacha | 2016-02-16 | 1 | -10/+23
  Currently, we sometimes miscompile this vector pattern:
      (c ? -v : v)
  We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
      (~c & v) | (c & -v)
  When we have SSSE3, we incorrectly lower that to PSIGN, which does:
      (c < 0 ? -v : c > 0 ? v : 0)
  in other words, when c is either all-ones or all-zero:
      (c ? -v : 0)
  While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately.
  Note that the PSIGN tests are also incorrect. Consider:
      %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
      %sub = sub nsw <4 x i32> zeroinitializer, %a
      %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
      %1 = and <4 x i32> %a, %0
      %2 = and <4 x i32> %b.lobit, %sub
      %cond = or <4 x i32> %1, %2
      ret <4 x i32> %cond
  if %b is zero:
      %b.lobit = <4 x i32> zeroinitializer
      %sub = sub nsw <4 x i32> zeroinitializer, %a
      %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
      %1 = <4 x i32> %a
      %2 = <4 x i32> zeroinitializer
      %cond = or <4 x i32> %a, zeroinitializer
      ret <4 x i32> %a
  whereas we currently generate:
      psignd %xmm1, %xmm0
      retq
  which returns 0, as %xmm1 is 0.
  Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
  Fixes PR26110.
  Differential Revision: http://reviews.llvm.org/D17181
  llvm-svn: 261023
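The difference between the three expressions in this commit message is easy to check on a single 32-bit lane. The sketch below is a scalar model, not the code the patch emits: the function names are hypothetical, `c` plays the role of one mask lane (all-ones or all-zero), and unsigned arithmetic is used so that negation wraps without undefined behaviour. The last function is one common form of the conditional-negate trick, assumed here to be representative of the "pure logic sequence" the message mentions.

```cpp
#include <cstdint>

// Two's-complement negation on an unsigned lane.
constexpr std::uint32_t negate(std::uint32_t v) { return 0u - v; }

// The intended select, written as a mask blend: (~c & v) | (c & -v).
constexpr std::uint32_t blendSelect(std::uint32_t c, std::uint32_t v) {
  return (~c & v) | (c & negate(v));
}

// Per-lane PSIGND behaviour: negate when c is negative, keep when positive,
// and produce zero when c is zero.
constexpr std::uint32_t psignLane(std::uint32_t c, std::uint32_t v) {
  return (c >> 31) != 0 ? negate(v) : (c != 0 ? v : 0u);
}

// Conditional negate with a mask: (v ^ c) - c gives -v for an all-ones c and
// v for an all-zero c.
constexpr std::uint32_t conditionalNegate(std::uint32_t c, std::uint32_t v) {
  return (v ^ c) - c;
}
```

For an all-ones mask all three agree. For an all-zero mask the blend and the conditional negate return `v`, while the PSIGN model returns 0, which is the miscompile the commit describes.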
* [X86] Extract PSIGN/BLENDVP combine. NFC. | Ahmed Bougacha | 2016-02-16 | 1 | -77/+95
  llvm-svn: 261021
* [X86] Extract ANDNP combine. NFC. | Ahmed Bougacha | 2016-02-16 | 1 | -61/+57
  This makes it IMO more readable and reduces indentation.
  llvm-svn: 261020
* [WebAssembly] Update torture test expectations | Derek Schuff | 2016-02-16 | 1 | -7/+0
  These were fixed with r260978
  llvm-svn: 261017