summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AVR] Added AVRCallingConv.tdDylan McKay2015-12-203-1/+67
| | | | llvm-svn: 256130
* [X86] Use range-based for loop. NFCCraig Topper2015-12-201-3/+2
| | | | llvm-svn: 256127
* [X86] Prevent constant hoisting for a couple compare immediates that the ↵Craig Topper2015-12-201-1/+13
| | | | | | | | | | selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases. llvm-svn: 256126
* Add AVR.td and AVRRegisterInfo.tdDylan McKay2015-12-204-1/+783
| | | | | | | | | | | | | | | | | | | | | Summary: This adds the core AVR TableGen file, along with the register descriptions. Lines in AVR.td which require other TableGen files which haven't been committed yet are commented out. This is a fairly trivial patch, and should only require a quick review. I kept the line width smaller than 80 columns, but there are a few exceptions because I'm not sure how to split a string over several lines. Reviewers: stoklund Subscribers: dylanmckay, agnat Differential Revision: http://reviews.llvm.org/D14684 llvm-svn: 256120
* Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for ↵Weiming Zhao2015-12-201-2/+2
| | | | | | | | | | | | | | | | | Thumb2 Summary: r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb. r250697: http://reviews.llvm.org/rL250697 Reviewers: rmaprath Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D15653 llvm-svn: 256115
* AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shadersTom Stellard2015-12-191-6/+5
| | | | | | | | | | | | | | | Summary: The analysis of shader inputs was completely wrong. We were passing the wrong index to AttributeSet::hasAttribute() and the logic for which inputs where in SGPRs was wrong too. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15608 llvm-svn: 256082
* AMDGPU: Switch barrier intrinsics to using convergentMatt Arsenault2015-12-191-2/+2
| | | | | | | | noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
* AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysRegNicolai Haehnle2015-12-191-6/+22
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15629 llvm-svn: 256073
* AMDGPU: fix overlapping copies in copyPhysRegNicolai Haehnle2015-12-191-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072
* [Hexagon] Add PIC supportKrzysztof Parzyszek2015-12-187-153/+155
| | | | llvm-svn: 256025
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022
* Revert "AMDGPU/SI: Test commit"Changpeng Fang2015-12-181-1/+1
| | | | | | This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021
* AMDGPU/SI: Test commitChangpeng Fang2015-12-181-1/+1
| | | | | | | | | | | | Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020
* [AArch64] Promote loads from storesJun Bum Lim2015-12-181-3/+280
| | | | | | | | | | | | | | This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256004
* [mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV ↵Zlatko Buljan2015-12-186-67/+156
| | | | | | | | and WRDSP instructions Differential Revision: http://reviews.llvm.org/D14429 llvm-svn: 255991
* Remove unused class variables.Eric Christopher2015-12-171-5/+3
| | | | llvm-svn: 255939
* [X86] Use push-pop for materializing small constants under 'minsize'Hans Wennborg2015-12-175-4/+76
| | | | | | | | | | | | | Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936
* Revert "[AArch64] Add DAG combine for extract extend pattern"Matthew Simpson2015-12-171-19/+1
| | | | | | | This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928
* [WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF.Dan Gohman2015-12-171-2/+2
| | | | llvm-svn: 255925
* AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.Tom Stellard2015-12-171-2/+6
| | | | | | | | | | | | Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908
* AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndexNicolai Haehnle2015-12-173-9/+10
| | | | | | | | | | | | | | | | | | | | | | Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906
* Avoid explicit relocation sorting most of the time.Rafael Espindola2015-12-172-16/+14
| | | | | | | | | | These days relocations are created and stored in a deterministic way. The order they are created is also suitable for the .o file, so we don't need an explicit sort. The last remaining exception is MIPS. llvm-svn: 255902
* Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"Rafael Espindola2015-12-171-2/+1
| | | | | | This reverts commit r255896. It broke the tests. llvm-svn: 255899
* Always sort by offset first. NFC.Rafael Espindola2015-12-172-6/+0
| | | | | | | Every target changing sortRelocs was first calling the parent implementation. Just run that first. llvm-svn: 255898
* Fix unused variable warning in release builds. NFC.Diego Novillo2015-12-171-3/+1
| | | | llvm-svn: 255897
* [AArch64] Enable PostRAScheduler for AArch64 generic buildMinSeong Kim2015-12-171-1/+2
| | | | | | | | | | | This patch enables PostRAScheduler specifically for AArch64 generic build, which is beneficial from the performance perspective. Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed. Also benchmarks from LLVM test-suite did not regress. Differential Revision: http://reviews.llvm.org/D15557 llvm-svn: 255896
* [AArch64] Add DAG combine for extract extend patternMatthew Simpson2015-12-171-1/+19
| | | | | | | | | | This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i). The combine enables us to better match some SMOV patterns. Differential Revision: http://reviews.llvm.org/D15515 llvm-svn: 255895
* Simplify. NFC.Rafael Espindola2015-12-171-6/+3
| | | | llvm-svn: 255894
* [X86] Add option for enabling LEA optimization pass, by Andrey TuretskyAlexey Bataev2015-12-171-1/+5
| | | | | | | Add option to enable/disable LEA optimization pass. By default the pass is disabled. Differential Revision: http://reviews.llvm.org/D15573 llvm-svn: 255881
* [WebAssembly] Convert WebAssemblyTargetObjectFile to TargetLoweringObjectFileELFDan Gohman2015-12-174-42/+30
| | | | llvm-svn: 255877
* AArch64: Simplify emitEpilogue() and related code; NFCMatthias Braun2015-12-171-24/+25
| | | | | | This is in preparation to an upcoming patch. llvm-svn: 255872
* [WebAssembly] Experimental ELF writer supportDan Gohman2015-12-178-12/+291
| | | | | | | | | This creates the initial infrastructure for writing ELF output files. It doesn't yet have any implementation for encoding instructions. Differential Revision: http://reviews.llvm.org/D15555 llvm-svn: 255869
* WebAssembly: update expected torture test failuresJF Bastien2015-12-171-14/+0
| | | | | | We now have 240 expected failures. llvm-svn: 255858
* [WebAssembly] Fix legalization of shift operators on large integer types.Dan Gohman2015-12-161-0/+7
| | | | llvm-svn: 255847
* [WebAssembly] Implement eliminateCallFramePseudoDerek Schuff2015-12-166-31/+58
| | | | | | | | | | | | | | | | | | Summary: Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN pseudo-instructions. Add a test calling a vararg function which causes non-0 adjustments. This revealed an issue with RegisterCoalescer wherein it eliminates a COPY from SP32 to a vreg but failes to update the live ranges of EXPR_STACK, causing a machineinstr verifier failure (so this test is commented out). Also add a dynamic alloca test, which causes a callseq_end dag node with a 0 (instead of undef) second argument to be generated. We currently fail to select that, so adjust the ADJCALLSTACKUP tablegen code to handle it. Differential Revision: http://reviews.llvm.org/D15587 llvm-svn: 255844
* [AArch64] Simplify some TRI/TII getters. NFC.Ahmed Bougacha2015-12-161-7/+6
| | | | | | We don't need static_casts when we use the right Subtarget. llvm-svn: 255836
* [CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC.Ahmed Bougacha2015-12-161-13/+11
| | | | | | | | This matches the other MIB methods, none of which modify the builder. Without this, we can't chain copyImplicitOps. Also reformat the few users, in PPCEarlyReturn. llvm-svn: 255828
* CXX_FAST_TLS calling convention: performance improvement for AArch64.Manman Ren2015-12-167-3/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. The target independent portion was committed as r255353. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15341 llvm-svn: 255821
* Remove now-unused includeDerek Schuff2015-12-161-1/+0
| | | | llvm-svn: 255817
* Iterate over phys regs insteadDerek Schuff2015-12-162-4/+10
| | | | llvm-svn: 255816
* [WebAssembly] Print an extra local decl when the user stack pointer is usedDerek Schuff2015-12-161-0/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D15546 llvm-svn: 255815
* [Hexagon] Misc fixes to r255807Krzysztof Parzyszek2015-12-161-8/+3
| | | | llvm-svn: 255811
* [Hexagon] Update the Hexagon packetizerKrzysztof Parzyszek2015-12-164-888/+1235
| | | | llvm-svn: 255807
* Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions"Reid Kleckner2015-12-1612-784/+6
| | | | | | This reverts commit r255762. llvm-svn: 255806
* [WebAssembly] Fix the CFG Stackifier to handle unoptimized branchesDan Gohman2015-12-161-2/+14
| | | | | | | If a branch both branches to and falls through to the same block, treat it as an explicit branch. llvm-svn: 255803
* AMDGPU: Override getCFInstrCostMatt Arsenault2015-12-162-0/+13
| | | | | | The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796
* [WebAssembly] Use the new offset syntax for memory operands in inline asm.Dan Gohman2015-12-161-1/+3
| | | | llvm-svn: 255788
* [SystemZ] Sort relocs to avoid code corruption by linker optimizationUlrich Weigand2015-12-161-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SystemZ linkers provide an optimization to transform a general- or local-dynamic TLS sequence into an initial-exec sequence if possible. Do do that, the compiler generates a function call to __tls_get_offset, which is a brasl instruction annotated with *two* relocations: - a R_390_PLT32DBL to install __tls_get_offset as branch target - a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that the TLS optimization should be performed if possible If the optimization is performed, the brasl is replaced by an ld load instruction. However, *both* relocs are processed independently by the linker. Therefore it is crucial that the R_390_PLT32DBL is processed *first* (installing the branch target for the brasl) and the R_390_TLS_GDCALL is processed *second* (replacing the whole brasl with an ld). If the relocs are swapped, the linker will first replace the brasl with an ld, and *then* install the __tls_get_offset branch target offset. Since ld has a different layout than brasl, this may even result in a completely different (or invalid) instruction; in any case, the resulting code is corrupted. Unfortunately, the way the MC common code sorts relocations causes these two to *always* end up the wrong way around, resulting in wrong code generation by the linker and crashes. This patch overrides the sortRelocs routine to detect this particular pair of relocs and enforce the required order. llvm-svn: 255787
* [SystemZ] Fix assertion failure in adjustSubwordCmpUlrich Weigand2015-12-161-2/+2
| | | | | | | | | | | | | | | | | | | | When comparing a zero-extended value against a constant small enough to be in range of the inner type, it doesn't matter whether a signed or unsigned compare operation (for the outer type) is being used. This is why the code in adjustSubwordCmp had this assertion: assert(C.ICmpType == SystemZICMP::Any && "Signedness shouldn't matter here."); assuming the the caller had already detected that fact. However, it turns out that there cases, in particular with always-true or always- false conditions that have not been eliminated when compiling at -O0, where this is not true. Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any here, we can simply *set* it safely to SystemZICMP::Any, however. llvm-svn: 255786
* [Hexagon] Make memcpy lowering thread-safeTobias Edler von Koch2015-12-163-21/+38
| | | | | | | | This removes an unpleasant hack involving a global variable for special lowering of certain memcpy calls. These are now lowered as intended in EmitTargetCodeForMemcpy in the same way that other targets do it. llvm-svn: 255785
OpenPOWER on IntegriCloud