summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC] Fix the PPCInstrInfo::getInstrLatency implementationHal Finkel2015-07-145-0/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PowerPC uses itineraries to describe processor pipelines (and dispatch-group restrictions for P7/P8 cores). Unfortunately, the target-independent implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that looks for the largest cycle count in the pipeline for any given instruction. This, however, yields the wrong answer for the PPC itineraries, because we don't encode the full pipeline. Because the functional units are fully pipelined, we only model the initial stages (there are no relevant hazards in the later stages to model), and so the technique employed by getStageLatency does not really work. Instead, we should take the maximum output operand latency, and that's what PPCInstrInfo::getInstrLatency now does. This caused some test-case churn, including two unfortunate side effects. First, the new arrangement of copies we get from function parameters now sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the test cases), and we have one significant test-suite regression: SingleSource/Benchmarks/BenchmarkGame/spectral-norm 56.4185% +/- 18.9398% In this benchmark we have a loop with a vectorized FP divide, and it with the new scheduling both divides end up in the same dispatch group (which in this case seems to cause a problem, although why is not exactly clear). The grouping structure is hard to predict from the bottom of the loop, and there may not be much we can do to fix this. Very few other test-suite performance effects were really significant, but almost all weakly favor this change. However, in light of the issues highlighted above, I've left the old behavior available via a command-line flag. llvm-svn: 242188
* [Hexagon] Generate instructions for operations on predicate registersKrzysztof Parzyszek2015-07-143-0/+531
| | | | | | | Convert logical operations on general-purpose registers to the correspon- ding operations on predicate registers. llvm-svn: 242186
* AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32)Matt Arsenault2015-07-142-0/+35
| | | | | | | | | | | | | | | | | This can be done only with moves which theoretically will optimize better later. Although this transform increases the instruction count, it should be code size / cycle count neutral in the worst VALU case. It also seems to slightly improve a couple of testcases due to other DAG combines this exposes. This is probably slightly worse for the SALU case, so it might be better to handle this during moveToVALU, although then you lose some simplifications like the load width reducing in the simple testcase. llvm-svn: 242177
* AMDGPU/SI: Fix read2 merging into a super register.Matt Arsenault2015-07-143-13/+35
| | | | | | | | | | | | | | | | If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move. Also remove the assert that offset1 > offset0. There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it. llvm-svn: 242174
* MachineRegisterInfo: Remove UsedPhysReg infrastructureMatthias Braun2015-07-1410-22/+15
| | | | | | | | | | | | | We have a detailed def/use lists for every physical register in MachineRegisterInfo anyway, so there is little use in maintaining an additional bitset of which ones are used. Removing it frees us from extra book keeping. This simplifies VirtRegMap. Differential Revision: http://reviews.llvm.org/D10911 llvm-svn: 242173
* Add missing builtins to the PPC back end for ABI compliance (vol. 4)Nemanja Ivanovic2015-07-141-0/+6
| | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D11183 Back end portion of the fourth round of additions to altivec.h. llvm-svn: 242167
* PrologEpilogInserter: Rewrite API to determine callee save regsiters.Matthias Braun2015-07-1423-107/+131
| | | | | | | | | | | | | | | | This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan(): - Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physcial registers which are only read but never modified. Related to rdar://21539507 Differential Revision: http://reviews.llvm.org/D10909 llvm-svn: 242165
* AArch64: add rev64 alias for 64-bit rev instruction.Tim Northover2015-07-141-0/+2
| | | | | | | It could be useful to assembly programmers and makes the permitted variants a little more uniform. llvm-svn: 242164
* [Hexagon] Generate "extract" instructions more aggressivelyKrzysztof Parzyszek2015-07-143-13/+278
| | | | | | | Generate extract instructions (via intrinsics) before the DAG combiner folds shifts into unrecognizable forms. llvm-svn: 242163
* ARMAsmParser: Take MCInst param by const-refHans Wennborg2015-07-141-8/+9
| | | | | | (Broken out from http://reviews.llvm.org/D11167) llvm-svn: 242160
* AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructionsTom Stellard2015-07-141-6/+27
| | | | | | | | | | Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11061 llvm-svn: 242146
* Silencing two MSVC warnings; 'argument' : truncation from 'unsigned int' to ↵Aaron Ballman2015-07-141-1/+1
| | | | | | 'int16_t' and truncation of constant value. NFC intended. llvm-svn: 242145
* [mips] Fix li/la differences between IAS and GAS.Daniel Sanders2015-07-141-82/+83
| | | | | | | | | | | | | | | | | | | Summary: - Signed 16-bit should have priority over unsigned. - For la, unsigned 16-bit must use ori+addu rather than directly use ori. - Correct tests on 32-bit immediates with 64-bit predicates by sign-extending the immediate beforehand. For example, isInt<16>(0xffff8000) should be true and use addiu. Also split li/la testing into separate files due to their size. Reviewers: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10967 llvm-svn: 242139
* Generate correct asm info for mingw and cygwin ARM targets.Yaron Keren2015-07-141-2/+2
| | | | | | | | | http://reviews.llvm.org/D11075 Patch by Martell Malone Reviewed by Reid Kleckner llvm-svn: 242123
* Prune trailing whitespaces and CRs.NAKAMURA Takumi2015-07-141-23/+23
| | | | llvm-svn: 242117
* [PPC64LE] More improvements to VSX swap optimizationBill Schmidt2015-07-131-21/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows VSX swap optimization to succeed more frequently. Specifically, it is concerned with common code sequences that occur when copying a scalar floating-point value to a vector register. This patch currently handles cases where the floating-point value is already in a register, but does not yet handle loads (such as via an LXSDX scalar floating-point VSX load). That will be dealt with later. A typical case is when a scalar value comes in as a floating-point parameter. The value is copied into a virtual VSFRC register, and then a sequence of SUBREG_TO_REG and/or COPY operations will convert it to a full vector register of the class required by the context. If this vector register is then used as part of a lane-permuted computation, the original scalar value will be in the wrong lane. We can fix this by adding a swap operation following any widening SUBREG_TO_REG operation. Additional COPY operations may be needed around the swap operation in order to keep register assignment happy, but these are pro forma operations that will be removed by coalescing. If a scalar value is otherwise directly referenced in a computation (such as by one of the many XS* vector-scalar operations), we currently disable swap optimization. These operations are lane-sensitive by definition. A MentionsPartialVR flag is added for use in each swap table entry that mentions a scalar floating-point register without having special handling defined. A common idiom for PPC64LE is to convert a double-precision scalar to a vector by performing a splat operation. This ensures that the value can be referenced as V[0], as it would be for big endian, whereas just converting the scalar to a vector with a SUBREG_TO_REG operation leaves this value only in V[1]. A doubleword splat operation is one form of an XXPERMDI instruction, which takes one doubleword from a first operand and another doubleword from a second operand, with a two-bit selector operand indicating which doublewords are chosen. In the general case, an XXPERMDI can be permitted in a lane-swapped region provided that it is properly transformed to select the corresponding swapped values. This transformation is to reverse the order of the two input operands, and to reverse and complement the bits of the selector operand (derivation left as an exercise to the reader ;). A new test case that exercises the scalar-to-vector and generalized XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll. The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to use CHECK-DAG instead of CHECK for two independent instructions that now appear in reverse order. There are two small unrelated changes that are added with this patch. First, the XXSLDWI instruction was incorrectly omitted from the list of lane-sensitive instructions; this is now fixed. Second, I observed that the same webs were being rejected over and over again for different reasons. Since it's sufficient to reject a web only once, I added a check for this to speed up the compilation time slightly. llvm-svn: 242081
* [Hexagon] Move BitTracker into the llvm namespace and remove redundant ↵Benjamin Kramer2015-07-134-64/+52
| | | | | | | | qualifications No functional change intended. llvm-svn: 242062
* AMDGPU: Minor cleanups to always inline passMatt Arsenault2015-07-131-7/+4
| | | | llvm-svn: 242053
* Enable partial and runtime loop unrolling for NVPTX.Mark Heffernan2015-07-132-0/+14
| | | | | | | | | Enable partial and runtime loop unrolling for NVPTX backend via TTI::UnrollingPreferences with a small threshold. This partially unrolls small loops which are often unrolled by the PTX to SASS compiler and unrolling earlier can be beneficial. llvm-svn: 242049
* [WinEH] Strip the \01 character from the __CxxFrameHandler3 thunk nameReid Kleckner2015-07-131-3/+5
| | | | | | Add another C++ 32-bit EH table test. llvm-svn: 242044
* AMDGPU/SI: Select mad patterns to v_mac_f32Tom Stellard2015-07-137-8/+141
| | | | | | | | | The two-address instruction pass will convert these back to v_mad_f32 if necessary. Differential Revision: http://reviews.llvm.org/D11060 llvm-svn: 242038
* ARM: Fix cttz expansion on vector types.Logan Chien2015-07-131-2/+98
| | | | | | | | | | | | The 64/128-bit vector types are legal if NEON instructions are available. However, there was no matching patterns for @llvm.cttz.*() intrinsics and result in fatal error. This commit fixes the problem by lowering cttz to: a. ctpop((x & -x) - 1) b. width - ctlz(x & -x) - 1 llvm-svn: 242037
* [ARM] Handle commutativity when converting to tADDhirr in Thumb2Scott Douglass2015-07-131-3/+11
| | | | | | | | Also, run thumb_rewrite.s tests in Thumb2 now that they pass. Differential Revision: http://reviews.llvm.org/D11132 llvm-svn: 242036
* [ARM] Add Thumb2 ADD with SP narrowing from 3 operand to 2Scott Douglass2015-07-131-5/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D11131 llvm-svn: 242035
* [ARM] Small refactor of tryConvertingToTwoOperandForm (nfc)Scott Douglass2015-07-131-7/+10
| | | | | | | | | Also, add more Thumb2 ADD tests requested during review of http://reviews.llvm.org/D11053. Differential Revision: http://reviews.llvm.org/D11130 llvm-svn: 242034
* Removing several -Wunused-but-set-variable warnings; NFC intended.Aaron Ballman2015-07-131-26/+0
| | | | llvm-svn: 242028
* AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long ↵Elena Demikhovsky2015-07-134-152/+436
| | | | | | | | | | | | types. In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch. I temporary removed the old intrinsics test (just to split this patch). Half types are not covered here. Differential Revision: http://reviews.llvm.org/D11134 llvm-svn: 242023
* [ARM] Add support for nest attribute using r12Renato Golin2015-07-121-0/+3
| | | | | | | | | | | | | | | | Register r12 ('ip') is used by GCC for this purpose and hence is used here. As discussed on the GCC mailing list, the register choice is an ABI issue and so choosing the same register as GCC means __builtin_call_with_static_chain is compatible. A similar patch has just gone in the AArch64 backend, so this is just the ARM counterpart, following the same discussion. Patch by Stephen Cross. llvm-svn: 241996
* [X86][SSE] (V)PMINSB is commutable.Simon Pilgrim2015-07-121-3/+0
| | | | | | (V)PMINSB is no different to the other (V)PMIN/(V)PMAX B/D/W instructions - it is fully commutable. llvm-svn: 241994
* Trim trailing whitespaces. NFC.Simon Pilgrim2015-07-121-3/+3
| | | | llvm-svn: 241990
* [X86][SSE] Vectorized v4i32 non-uniform shifts.Simon Pilgrim2015-07-122-23/+70
| | | | | | | | | | While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized. This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together. Differential Revision: http://reviews.llvm.org/D11063 llvm-svn: 241989
* [PowerPC] Make use of the TargetRecip systemHal Finkel2015-07-123-15/+47
| | | | | | | | | | r238842 added the TargetRecip system for controlling use of reciprocal estimates for sqrt and division using a set of parameters that can be set by the frontend. Clang now supports a sophisticated -mrecip option, and this will allow that option to effectively control the relevant code-generation functionality of the PPC backend. llvm-svn: 241985
* [PowerPC] Support the nest parameter attributeHal Finkel2015-07-123-16/+51
| | | | | | | | | | | | | This adds support for the 'nest' attribute, which allows the static chain register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11 is the chain register (which the PPC64 ELF ABI calls the "environment pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded from the function descriptor, but providing an explicit 'nest' parameter will override that process and use the value provided. This allows __builtin_call_with_static_chain to work as expected on PowerPC. llvm-svn: 241984
* MC: Only allow changing feature bits in MCSubtargetInfoDuncan P. N. Exon Smith2015-07-101-1/+1
| | | | | | | | | | | | | | | | | Disallow all mutation of `MCSubtargetInfo` expect the feature bits. Besides deleting the assignment operators -- which were dead "code" -- this restricts `InitMCProcessorInfo()` to subclass initialization sequences, and exposes a new more limited function called `setDefaultFeatures()` for use by the ARMAsmParser `.cpu` directive. There's a small functional change here: ARMAsmParser used to adjust `MCSubtargetInfo::CPUSchedModel` as a side effect of calling `InitMCProcessorInfo()`, but I've removed that suspicious behaviour. Since the AsmParser shouldn't be doing any scheduling, there shouldn't be any observable change... llvm-svn: 241961
* AMDGPU: Fix chains for memory ops dependent on argument loadsMatt Arsenault2015-07-101-4/+19
| | | | | | | | | | | | | | | Most loads and stores are derived from pointers derived from a kernel argument load inserted during argument lowering. This was just using the EntryToken chain for the argument loads, and any users of these loads were also on the EntryToken chain. Return the chain of the lowered argument load so that dependent loads end up on the correct chain. No test since I'm not aware of any case where this actually broke. llvm-svn: 241960
* MC: Remove MCSubtargetInfo() default constructorDuncan P. N. Exon Smith2015-07-1014-41/+21
| | | | | | | | | | | | | | | | | | | | | Force all creators of `MCSubtargetInfo` to immediately initialize it, merging the default constructor and the initializer into an initializing constructor. Besides cleaning up the code a little, this makes it clear that the initializer is never called again later. Out-of-tree backends need a trivial change: instead of calling: auto *X = new MCSubtargetInfo(); InitXYZMCSubtargetInfo(X, ...); return X; they should call: return createXYZMCSubtargetInfoImpl(...); There's no real functionality change here. llvm-svn: 241957
* MC: Remove MCSubtargetInfo::InitCPUSched()Duncan P. N. Exon Smith2015-07-102-5/+1
| | | | | | | | | | | | Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`. We were only calling the former after explicitly calling the latter with the same CPU; it's confusing to have both methods exposed. Besides a minor (surely unmeasurable) speedup in ARM and X86 from avoiding running the logic twice, no functionality change. llvm-svn: 241956
* AMDGPU: Use requested chain when lowering argumentsMatt Arsenault2015-07-101-1/+1
| | | | | | | No test since I'm not aware of any case where this will end up being a different chain. llvm-svn: 241954
* ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920Matthias Braun2015-07-101-3/+3
| | | | llvm-svn: 241951
* Fix AArch64 prologue for empty frame with dynamic allocas.Evgeniy Stepanov2015-07-101-8/+4
| | | | | | | | Fixes PR23804: assertion failure in emitPrologue in the case of a function with an empty frame and a dynamic alloca that needs stack realignment. This is a typical case for AddressSanitizer. llvm-svn: 241943
* [TTI] BasicTTIImpl assumes no vector registersJingyue Wu2015-07-102-8/+0
| | | | | | | | | | | | | | | | | | | | | | | Summary: Following the discussion on r241884, it's more reasonable to assume that a target has no vector registers by default instead of letting every such target overrides getNumberOfRegisters. Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to return 0 when Vector is true, and partially reverts r241884 which modifies NVPTXTTIImpl::getNumberOfRegisters. It also fixes a performance bug in LoopVectorizer. Even if a target has no vector registers, vectorization may still help ILP. So, we need both checks to be false before disabling loop vectorization all together. Reviewers: hfinkel Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11108 llvm-svn: 241942
* ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common codeMatthias Braun2015-07-101-166/+186
| | | | | | | | | | | | This commit factors out common code from MergeBaseUpdateLoadStore() and MergeBaseUpdateLSMultiple() and introduces a new function MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a strd/ldrd instruction into an strd/ldrd instruction with writeback where possible. Differential Revision: http://reviews.llvm.org/D10676 llvm-svn: 241928
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2Matthias Braun2015-07-101-29/+96
| | | | | | Differential Revision: http://reviews.llvm.org/D10623 llvm-svn: 241926
* WebAssembly: basic instructions todo, and basic register info.JF Bastien2015-07-1016-19/+331
| | | | | | | | | | | | | | Summary: This code is based on AArch64 for modern backend good practice, and NVPTX for virtual ISA concerns. Reviewers: sunfish Subscribers: aemerson, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11070 llvm-svn: 241923
* Target RegisterInfo: devirtualize TargetFrameLoweringJF Bastien2015-07-108-61/+50
| | | | | | | | | | | | | Summary: The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can. This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here. Subscribers: sunfish, ted, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11093 llvm-svn: 241921
* ARMLoadStoreOptimizer: Rewrite LDM/STM matching logic.Matthias Braun2015-07-101-551/+481
| | | | | | | | | | | | | | | | | | | | | This improves the logic in several ways and is a preparation for followup patches: - First perform an analysis and create a list of merge candidates, then transform. This simplifies the code in that you have don't have to care to much anymore that you may be holding iterators to MachineInstrs that get removed. - Analyze/Transform basic blocks in reverse order. This allows to use LivePhysRegs to find free registers instead of the RegisterScavenger. The RegisterScavenger will become less precise in the future as it relies on the deprecated kill-flags. - Return the newly created node in MergeOps so there's no need to look around in the schedule to find it. - Rename some MBBI iterators to InsertBefore to make their role clear. - General code cleanup. Differential Revision: http://reviews.llvm.org/D10140 llvm-svn: 241920
* Actually support volatile memcpys in NVPTX loweringEli Bendersky2015-07-101-8/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D11091 llvm-svn: 241914
* NFC. Added a blank line for consistency.Nemanja Ivanovic2015-07-101-0/+1
| | | | llvm-svn: 241913
* Add missing builtins to the PPC back end for ABI compliance (vol. 3)Nemanja Ivanovic2015-07-101-0/+2
| | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D10973 Back end portion of the third round of additions to altivec.h. llvm-svn: 241900
* [NVPTX] declare no vector registersJingyue Wu2015-07-102-0/+8
| | | | | | | | | | | | | | | | | Summary: Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll) produces code with complex control flow which hurts later optimizations. Since NVPTX doesn't have vector registers in LLVM's sense (NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector registers to effectively disable loop vectorization. Reviewers: jholewinski Subscribers: jingyue, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11089 llvm-svn: 241884
OpenPOWER on IntegriCloud