summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Enable call frame optimization ("mov to push") not only for optsize ↵Hans Wennborg2016-03-301-4/+0
| | | | | | | | | | (PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966
* CodeGen: Factor out code for tail call result compatibility check; NFCMatthias Braun2016-03-303-105/+27
| | | | llvm-svn: 264959
* AMDGPU: Add frexp_exp intrinsicMatt Arsenault2016-03-301-2/+2
| | | | llvm-svn: 264944
* Silencing warnings from MSVC 2015 Update 2. All of these changes silence ↵Aaron Ballman2016-03-305-9/+9
| | | | | | "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929
* [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for ↵Simon Pilgrim2016-03-301-1/+0
| | | | | | | | consecutive loads/zeros Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922
* [NVPTX] Make NVVMReflect a function pass.Justin Lebar2016-03-302-102/+69
| | | | | | | | | | | | | | | Summary: Currently it's a module pass. Make it a function pass so that we can move it to PassManagerBuilder's EP_EarlyAsPossible extension point, which only accepts function passes. Reviewers: rnk Subscribers: tra, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18615 llvm-svn: 264919
* [AArch64] Fix warnings pointed out by Hal.Chad Rosier2016-03-301-1/+5
| | | | llvm-svn: 264882
* AMDGPU/SI: Improve MachineSchedModel definitionTom Stellard2016-03-301-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | This patch contains a few improvements to the model, including: - Using a single resource with a defined buffers size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %)A Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877
* AMDGPU/SI: Enable lanemask tracking in mischedTom Stellard2016-03-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
* [SystemZ] Add nop and nopr InstAliases.Jonas Paulsson2016-03-301-0/+5
| | | | | | | | | | For compatability with GAS, nop and nopr are recognized as alises for bc and bcr, respectively. A mask of 0 turns these instructions effectively into no-operations. Reviewed by Ulrich Weigand. llvm-svn: 264875
* Remove HasFnAttribute guards to getFnAttribute callsNirav Dave2016-03-303-4/+1
| | | | | | | | | | | | These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872
* [X86][XOP] BITREVERSE lowering using VPPERMSimon Pilgrim2016-03-301-1/+70
| | | | | | XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction. llvm-svn: 264870
* [NVPTX] Avoid temporary std::string and make single-use function local to ↵Benjamin Kramer2016-03-302-5/+4
| | | | | | | | the cpp file. No functionality change intended. llvm-svn: 264861
* [x86] Fix a horrible bug in our lowering of x86 floating point atomicChandler Carruth2016-03-301-24/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operations. Specifically, we had code that tried to badly approximate reconstructing all of the possible variations on addressing modes in two x86 instructions based on those in one pseudo instruction. This is not the first bug uncovered with doing this, so stop doing it altogether. Instead generically and pedantically copy every operand from the address over to both new instructions, and strip kill flags from any register operands. This fixes a subtle bug seen in the wild where we would mysteriously drop parts of the addressing mode, causing for example the index argument in the added test case to just be completely ignored. Hypothetically, this was an extremely bad miscompile because it actually caused a predictable and leveragable write of a 64bit quantity to an unintended offset (the first element of the array intead of whatever other element was intended). As a consequence, in theory this could even have introduced security vulnerabilities. However, this was only something that could happen with an atomic floating point add. No other operation could trigger this bug, so it seems extremely unlikely to have occured widely in the wild. But it did in fact occur, and frequently in scientific applications which were using relaxed atomic updates of a floating point value after adding a delta. Those would end up being quite badly miscompiled by LLVM, which is how we found this. Of course, this often looks like a race condition in the code, but it was actually a miscompile. I suspect that this whole RELEASE_FADD thing was a complete mistake. There is no such operation, and I worry that anything other than add will get remarkably worse codegeneration. But that's not for this change.... llvm-svn: 264845
* [x86] Extract a helper function to compute the full addressing mode fromChandler Carruth2016-03-302-21/+29
| | | | | | | | | an x86 MachineInstr's operands. This will be super useful to fix some bad atomics code in my next commit. No functionality changed. llvm-svn: 264819
* [Aarch64] Turn on the LoopDataPrefetch pass for CycloneAdam Nemet2016-03-301-1/+1
| | | | llvm-svn: 264811
* [PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distanceAdam Nemet2016-03-291-7/+5
| | | | | | | After the previous change, this can now be overridden centrally in the pass. llvm-svn: 264807
* [LoopDataPrefetch] Centralize the tuning cl::opts under the passAdam Nemet2016-03-291-21/+6
| | | | | | | | | This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806
* [SPARC] Use AtomicExpandPass to expand AtomicRMW instructions.James Y Knight2016-03-293-173/+10
| | | | | | | | | They were previously expanded to CAS loops in a custom isel expansion, but AtomicExpandPass knows how to do that generically. Testing is covered by the existing sparc atomics.ll testcases. llvm-svn: 264771
* Swift Calling Convention: add swiftself attribute.Manman Ren2016-03-296-1/+21
| | | | | | Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754
* Test commit accessKonstantin Zhuravlyov2016-03-291-1/+1
| | | | llvm-svn: 264736
* [mips] Test commit: Mark insertNoop as dead code (NFC)Simon Dardis2016-03-291-0/+1
| | | | llvm-svn: 264728
* [mips] Correct MIPS16 jal/jalx to have uimm26 offsets and add MC layer range ↵Daniel Sanders2016-03-293-5/+9
| | | | | | | | | | | | | | | | checks. NFC. Summary: However, this has no effect at this time because the instructions affected are marked 'isCodeGenOnly=1' and have no alternative for the MC layer. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18179 llvm-svn: 264712
* AVX-512: fixed a bug in fp_to_uint pattern on KNLElena Demikhovsky2016-03-291-0/+4
| | | | | | | | | Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701
* [PowerPC] Refactor popcnt[dw] target featuresHal Finkel2016-03-295-18/+28
| | | | | | | | | Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690
* [WebAssembly] Remove duplicate disabling of passesDerek Schuff2016-03-281-12/+6
| | | | | | Also put all the disabled passes together llvm-svn: 264684
* [PowerPC] Clarify a comment in PPCTTI about vector loadsHal Finkel2016-03-281-1/+1
| | | | | | | This should say that we could do unaligned vector loads on the P7 using VSX instructions, not that we should. llvm-svn: 264683
* [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op ↵Simon Pilgrim2016-03-281-0/+50
| | | | | | | | | | | | | | for all their scalar elements. If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666
* [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when ↵Haicheng Wu2016-03-282-0/+19
| | | | | | | | optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606
* [PowerPC] On the A2, popcnt[dw] are very slowHal Finkel2016-03-285-6/+16
| | | | | | | | | | | | | | | | The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600
* Introduce MachineFunctionProperties and the AllVRegsAllocated propertyDerek Schuff2016-03-282-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the delcarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593
* AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructionsTom Stellard2016-03-281-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589
* [Hexagon] Improve handling of unaligned vector loads and storesKrzysztof Parzyszek2016-03-285-56/+156
| | | | llvm-svn: 264584
* [Hexagon] Only use restore functions for single register at -OzKrzysztof Parzyszek2016-03-281-0/+11
| | | | llvm-svn: 264581
* [Hexagon] Speed up frame lowering when no optimizations are enabledKrzysztof Parzyszek2016-03-282-24/+35
| | | | | | | | - Do not optimize stack slots in optnone functions. - Get aligned-base register from HexagonMachineFunctionInfo instead of looking for ALIGNA instruction in the function's body. llvm-svn: 264580
* Sparc: silently ignore .proc assembler directiveDouglas Katzman2016-03-281-0/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D18463 llvm-svn: 264579
* [lanai] Add Lanai backend.Jacques Pienaar2016-03-2865-0/+10074
| | | | | | | | | | Add the Lanai backend to lib/Target. General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html). Differential Revision: http://reviews.llvm.org/D17011 llvm-svn: 264578
* [Power9] Implement new altivec instructions: bcd* seriesChuang-Yu Cheng2016-03-283-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568
* [Power9] Implement new vsx instructions: insert, extract, test data class, ↵Chuang-Yu Cheng2016-03-287-0/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567
* AVX-512: Fixed ICMP instruction selection for i1 operandsElena Demikhovsky2016-03-281-5/+9
| | | | | | | | | | ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566
* [Power9] Implement new vsx instructions: quad-precision move, fp-arithmeticChuang-Yu Cheng2016-03-282-0/+171
| | | | | | | | | | | | | | | | | | | | This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 llvm-svn: 264565
* [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based ↵Hal Finkel2016-03-271-3/+13
| | | | | | | | | | | | | | | | | loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532
* [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targetsSimon Pilgrim2016-03-261-0/+21
| | | | | | Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517
* [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targetsSimon Pilgrim2016-03-261-0/+2
| | | | | | | | Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512
* [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectorsSimon Pilgrim2016-03-261-0/+124
| | | | | | | | Currently this is to mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511
* [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 ↵Simon Pilgrim2016-03-261-2/+3
| | | | | | | | multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509
* [PowerPC] Disable the CTR optimization in the presence of {min,max}numDavid Majnemer2016-03-261-0/+2
| | | | | | | | | The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508
* [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.Simon Pilgrim2016-03-261-13/+5
| | | | | | LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead. llvm-svn: 264506
* [Power9] Implement new altivec instructions: permute, count zero, extend ↵Chuang-Yu Cheng2016-03-263-0/+181
| | | | | | | | | | | | | | | | | | | | | sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504
* [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddrDavid Majnemer2016-03-251-1/+1
| | | | | | | | | We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465
OpenPOWER on IntegriCloud