summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [Hexagon] Add PIC supportKrzysztof Parzyszek2015-12-187-153/+155
| | | | llvm-svn: 256025
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022
* Revert "AMDGPU/SI: Test commit"Changpeng Fang2015-12-181-1/+1
| | | | | | This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020
* [AArch64] Promote loads from storesJun Bum Lim2015-12-181-3/+280
| | | | | | | | | | | | | | This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256004
* [mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV ↵Zlatko Buljan2015-12-186-67/+156
| | | | | | | | and WRDSP instructions Differential Revision: http://reviews.llvm.org/D14429 llvm-svn: 255991
* Remove unused class variables.Eric Christopher2015-12-171-5/+3
| | | | llvm-svn: 255939
* [X86] Use push-pop for materializing small constants under 'minsize'Hans Wennborg2015-12-175-4/+76
| | | | | | | | | | | | | Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936
* Revert "[AArch64] Add DAG combine for extract extend pattern"Matthew Simpson2015-12-171-19/+1
| | | | | | | This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928
* [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.Dan Gohman2015-12-171-2/+2
| | | | llvm-svn: 255925
* AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.Tom Stellard2015-12-171-2/+6
| | | | | | | | | | | | Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908
* AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndexNicolai Haehnle2015-12-173-9/+10
| | | | | | | | | | | | | | | | | | | | | | Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906
* Avoid explicit relocation sorting most of the time.Rafael Espindola2015-12-172-16/+14
| | | | | | | | | | These days relocations are created and stored in a deterministic way. The order they are created is also suitable for the .o file, so we don't need an explicit sort. The last remaining exception is MIPS. llvm-svn: 255902
* Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"Rafael Espindola2015-12-171-2/+1
| | | | | | This reverts commit r255896. It broke the tests. llvm-svn: 255899
* Always sort by offset first. NFC.Rafael Espindola2015-12-172-6/+0
| | | | | | | Every target changing sortRelocs was first calling the parent implementation. Just run that first. llvm-svn: 255898
* Fix unused variable warning in release builds. NFC.Diego Novillo2015-12-171-3/+1
| | | | llvm-svn: 255897
* [AArch64] Enable PostRAScheduler for AArch64 generic buildMinSeong Kim2015-12-171-1/+2
| | | | | | | | | | | This patch enables PostRAScheduler specifically for AArch64 generic build, which is beneficial from the performance perspective. Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed. Also benchmarks from LLVM test-suite did not regress. Differential Revision: http://reviews.llvm.org/D15557 llvm-svn: 255896
* [AArch64] Add DAG combine for extract extend patternMatthew Simpson2015-12-171-1/+19
| | | | | | | | | | This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i). The combine enables us to better match some SMOV patterns. Differential Revision: http://reviews.llvm.org/D15515 llvm-svn: 255895
* Simplify. NFC.Rafael Espindola2015-12-171-6/+3
| | | | llvm-svn: 255894
* [X86] Add option for enabling LEA optimization pass, by Andrey TuretskyAlexey Bataev2015-12-171-1/+5
| | | | | | | Add option to enable/disable LEA optimization pass. By default the pass is disabled. Differential Revision: http://reviews.llvm.org/D15573 llvm-svn: 255881
* [WebAssembly] Convert WebAssemblyTargetObjectFile to TargetLoweringObjectFileELFDan Gohman2015-12-174-42/+30
| | | | llvm-svn: 255877
* AArch64: Simplify emitEpilogue() and related code; NFCMatthias Braun2015-12-171-24/+25
| | | | | | This is in preparation to an upcoming patch. llvm-svn: 255872
* [WebAssembly] Experimental ELF writer supportDan Gohman2015-12-178-12/+291
| | | | | | | | | This creates the initial infrastructure for writing ELF output files. It doesn't yet have any implementation for encoding instructions. Differential Revision: http://reviews.llvm.org/D15555 llvm-svn: 255869
* WebAssembly: update expected torture test failuresJF Bastien2015-12-171-14/+0
| | | | | | We now have 240 expected failures. llvm-svn: 255858
* [WebAssembly] Fix legalization of shift operators on large integer types.Dan Gohman2015-12-161-0/+7
| | | | llvm-svn: 255847
* [WebAssembly] Implement eliminateCallFramePseudoDerek Schuff2015-12-166-31/+58
| | | | | | | | | | | | | | | | | | Summary: Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN pseudo-instructions. Add a test calling a vararg function which causes non-0 adjustments. This revealed an issue with RegisterCoalescer wherein it eliminates a COPY from SP32 to a vreg but failes to update the live ranges of EXPR_STACK, causing a machineinstr verifier failure (so this test is commented out). Also add a dynamic alloca test, which causes a callseq_end dag node with a 0 (instead of undef) second argument to be generated. We currently fail to select that, so adjust the ADJCALLSTACKUP tablegen code to handle it. Differential Revision: http://reviews.llvm.org/D15587 llvm-svn: 255844
* [AArch64] Simplify some TRI/TII getters. NFC.Ahmed Bougacha2015-12-161-7/+6
| | | | | | We don't need static_casts when we use the right Subtarget. llvm-svn: 255836
* [CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC.Ahmed Bougacha2015-12-161-13/+11
| | | | | | | | This matches the other MIB methods, none of which modify the builder. Without this, we can't chain copyImplicitOps. Also reformat the few users, in PPCEarlyReturn. llvm-svn: 255828
* CXX_FAST_TLS calling convention: performance improvement for AArch64.Manman Ren2015-12-167-3/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. The target independent portion was committed as r255353. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15341 llvm-svn: 255821
* Remove now-unused includeDerek Schuff2015-12-161-1/+0
| | | | llvm-svn: 255817
* Iterate over phys regs insteadDerek Schuff2015-12-162-4/+10
| | | | llvm-svn: 255816
* [WebAssembly] Print an extra local decl when the user stack pointer is usedDerek Schuff2015-12-161-0/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D15546 llvm-svn: 255815
* [Hexagon] Misc fixes to r255807Krzysztof Parzyszek2015-12-161-8/+3
| | | | llvm-svn: 255811
* [Hexagon] Update the Hexagon packetizerKrzysztof Parzyszek2015-12-164-888/+1235
| | | | llvm-svn: 255807
* Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions"Reid Kleckner2015-12-1612-784/+6
| | | | | | This reverts commit r255762. llvm-svn: 255806
* [WebAssembly] Fix the CFG Stackifier to handle unoptimized branchesDan Gohman2015-12-161-2/+14
| | | | | | | If a branch both branches to and falls through to the same block, treat it as an explicit branch. llvm-svn: 255803
* AMDGPU: Override getCFInstrCostMatt Arsenault2015-12-162-0/+13
| | | | | | The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796
* [WebAssembly] Use the new offset syntax for memory operands in inline asm.Dan Gohman2015-12-161-1/+3
| | | | llvm-svn: 255788
* [SystemZ] Sort relocs to avoid code corruption by linker optimizationUlrich Weigand2015-12-161-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SystemZ linkers provide an optimization to transform a general- or local-dynamic TLS sequence into an initial-exec sequence if possible. Do do that, the compiler generates a function call to __tls_get_offset, which is a brasl instruction annotated with *two* relocations: - a R_390_PLT32DBL to install __tls_get_offset as branch target - a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that the TLS optimization should be performed if possible If the optimization is performed, the brasl is replaced by an ld load instruction. However, *both* relocs are processed independently by the linker. Therefore it is crucial that the R_390_PLT32DBL is processed *first* (installing the branch target for the brasl) and the R_390_TLS_GDCALL is processed *second* (replacing the whole brasl with an ld). If the relocs are swapped, the linker will first replace the brasl with an ld, and *then* install the __tls_get_offset branch target offset. Since ld has a different layout than brasl, this may even result in a completely different (or invalid) instruction; in any case, the resulting code is corrupted. Unfortunately, the way the MC common code sorts relocations causes these two to *always* end up the wrong way around, resulting in wrong code generation by the linker and crashes. This patch overrides the sortRelocs routine to detect this particular pair of relocs and enforce the required order. llvm-svn: 255787
* [SystemZ] Fix assertion failure in adjustSubwordCmpUlrich Weigand2015-12-161-2/+2
| | | | | | | | | | | | | | | | | | | | When comparing a zero-extended value against a constant small enough to be in range of the inner type, it doesn't matter whether a signed or unsigned compare operation (for the outer type) is being used. This is why the code in adjustSubwordCmp had this assertion: assert(C.ICmpType == SystemZICMP::Any && "Signedness shouldn't matter here."); assuming the the caller had already detected that fact. However, it turns out that there cases, in particular with always-true or always- false conditions that have not been eliminated when compiling at -O0, where this is not true. Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any here, we can simply *set* it safely to SystemZICMP::Any, however. llvm-svn: 255786
* [Hexagon] Make memcpy lowering thread-safeTobias Edler von Koch2015-12-163-21/+38
| | | | | | | | This removes an unpleasant hack involving a global variable for special lowering of certain memcpy calls. These are now lowered as intended in EmitTargetCodeForMemcpy in the same way that other targets do it. llvm-svn: 255785
* [WebAssembly] Support more kinds of inline asm operandsDan Gohman2015-12-161-4/+24
| | | | llvm-svn: 255782
* [ARM] Add ARMv8.2-A FP16 vector instructionsOliver Stannard2015-12-164-42/+430
| | | | | | | | | | | | | | ARMv8.2-A adds 16-bit floating point versions of all existing SIMD floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. Note that VFP without SIMD is not a valid combination for any version of ARMv8-A, but I have ensured that these instructions all depend on both FeatureNEON and FeatureFullFP16 for consistency. Differential Revision: http://reviews.llvm.org/D15039 llvm-svn: 255764
* [ARM] Add ARMv8.2-A FP16 scalar instructionsOliver Stannard2015-12-1612-6/+784
| | | | | | | | | | | | | | | | | | | | | | | | | | | ARMv8.2-A adds 16-bit floating point versions of all existing VFP floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. The assembly for these instructions uses S registers (AArch32 does not have H registers), but the instructions have ".f16" type specifiers rather than ".f32" or ".f64". The top 16 bits of each source register are ignored, and the top 16 bits of the destination register are set to zero. These instructions are mostly the same as the 32- and 64-bit versions, but they use coprocessor 9 rather than 10 and 11. Two new instructions, VMOVX and VINS, have been added to allow packing and extracting two 16-bit floats stored in the top and bottom halves of an S register. New fixup kinds have been added for the PC-relative load and store instructions, but no ELF relocations have been added as they have a range of 512 bytes. Differential Revision: http://reviews.llvm.org/D15038 llvm-svn: 255762
* [X86] Improve shift combiningMichael Kuperstein2015-12-161-0/+57
| | | | | | | | | | | | | | | | | | | | This folds (ashr (shl a, [56,48,32,24,16]), SarConst) into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or into (lshr, (sext (a), SarConst - [56,48,32,24,16])) depending on sign of (SarConst - [56,48,32,24,16]) sexts in X86 are MOVs. The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size). However the MOVs have 2 advantages to SHIFTs on x86: 1. MOVs can write to a register that differs from source. 2. MOVs accept memory operands. This fixes PR24373. Patch by: evgeny.v.stupachenko@intel.com Differential Revision: http://reviews.llvm.org/D13161 llvm-svn: 255761
* [WinEH] Make llvm.x86.seh.recoverfp work on x64Reid Kleckner2015-12-152-13/+23
| | | | | | | | | | | It adjusts from RSP-after-prologue to RBP, which is what SEH filters need to do before they can use llvm.localrecover. Fixes SEH filter captures, which were broken in r250088. Issue reported by Alex Crichton. llvm-svn: 255707
* Fix "Not having LAHF/SAHF" assert.Hans Wennborg2015-12-151-1/+2
| | | | | | It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703
* AMDGPU/SI: Set the code object work group segment size when targeting HSATom Stellard2015-12-151-0/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702
* [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)Sanjay Patel2015-12-151-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves on the suggested codegen from PR24475: https://llvm.org/bugs/show_bug.cgi?id=24475 but only for the fmaxf() case to start, so we can sort out any bugs before extending to fmin, f64, and vectors. The fmax / maxnum definitions provide us flexibility for signed zeros, so the only thing we have to worry about in this replacement sequence is NaN handling. Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes a problem: SelectionDAGBuilder::visitSelect() transforms compare/select instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps that should be checking for NaN inputs or global unsafe-math before transforming? As it stands, that bypasses a big set of optimizations that the x86 backend already has in PerformSELECTCombine(). Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so we have completely unnecessary operations happening on undef elements of the vector. Differential Revision: http://reviews.llvm.org/D15294 llvm-svn: 255700
* [Sparc] Tweak r255668: Use llvm_unreachable.James Y Knight2015-12-151-1/+1
| | | | llvm-svn: 255698
OpenPOWER on IntegriCloud