summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Don't use offen if it is 0Matt Arsenault2016-10-012-14/+100
| | | | | | This removes many re-initializations of a base register to 0. llvm-svn: 282999
* [AArch64][RegisterBankInfo] Use the helper functions for the checksQuentin Colombet2016-09-301-29/+26
| | | | | | | | This makes sure the helper functions work as expected. NFC. llvm-svn: 282961
* [AArch64][RegisterBankInfo] Rename getValueMappingIdx to getValueMappingQuentin Colombet2016-09-302-9/+11
| | | | | | | | We don't return index, we return the actual ValueMapping. NFC. llvm-svn: 282960
* [AArch64][RegisterBankInfo] Compress the ValueMapping table a bit.Quentin Colombet2016-09-302-46/+38
| | | | | | | | | | We don't need to have singleton ValueMapping on their own, we can just reuse one of the elements of the 3-ops mapping. This allows even more code sharing. NFC. llvm-svn: 282959
* [AArch64][RegisterBankInfo] Refactor the code to access AArch64::ValMappingQuentin Colombet2016-09-302-24/+23
| | | | | | | | | Use a helper function to access ValMapping. This should make the code easier to understand and maintain. NFC. llvm-svn: 282958
* [AArch64][RegisterBankInfo] Rename getRegBankIdx to getRegBankIdxOffsetQuentin Colombet2016-09-302-17/+22
| | | | | | | | | The function name did not make it clear that the returned value was an offset to apply to a register bank index. NFC. llvm-svn: 282957
* [AArch64][RegisterBankInfo] Use the static opds mapping for alt mappingsQuentin Colombet2016-09-301-14/+7
| | | | | | | | Avoid to rely on the dynamically allocated operands mapping for the alternative mapping. NFC. llvm-svn: 282956
* X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)Hans Wennborg2016-09-302-6/+6
| | | | | | | | | | | We can't use Jcc to leave a Win64 function in general, because that confuses the unwinder. However, for "leaf" functions, that is, functions where the return address is always on top of the stack and which don't have unwind info, it's OK. Differential Revision: https://reviews.llvm.org/D24836 llvm-svn: 282920
* [WebAssembly] Make register stackification more conservativeDerek Schuff2016-09-301-19/+15
| | | | | | | | | | | | Register stackification currently checks VNInfo for changes. Make that more accurate by testing each intervening instruction for any other defs to the same virtual register. Patch by Jacob Gravelle Differential Revision: https://reviews.llvm.org/D24942 llvm-svn: 282886
* [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa versionKonstantin Zhuravlyov2016-09-305-16/+69
| | | | | | Differential Revision: https://reviews.llvm.org/D24973 llvm-svn: 282877
* [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier ↵Konstantin Zhuravlyov2016-09-302-2/+9
| | | | | | | | instruction Differential Revision: https://reviews.llvm.org/D24985 llvm-svn: 282875
* [AMDGPU] Do not run scalar optimization passes at "-O0"Konstantin Zhuravlyov2016-09-301-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D25055 llvm-svn: 282873
* [AVR] Add the ELF object file writerDylan McKay2016-09-302-0/+128
| | | | | | | | | | | | Summary: This adds the ELF32 writer for AVR. Reviewers: kparzysz Subscribers: beanz, mgorny Differential Revision: https://reviews.llvm.org/D25031 llvm-svn: 282856
* [AVR] Add the assembly instruction printerDylan McKay2016-09-306-2/+260
| | | | | | | | | | | | | | | | | Summary: This change adds the AVR assembly instruction printer. No tests are included in this patch. I have left them downstream so we can add them once `llc` successfully runs (there's very few components left to upstream until this). Reviewers: arsenm, kparzysz Subscribers: wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25028 llvm-svn: 282854
* [AVX-512] Store address operand should be an input operand for the special ↵Craig Topper2016-09-301-4/+4
| | | | | | stack spilling pseudos for XMM16-31 and YMM16-31 without VLX. llvm-svn: 282843
* [AVX-512] Add the special stack spilling pseudos for XMM16-31 and YMM16-31 ↵Craig Topper2016-09-301-0/+8
| | | | | | without VLX to teh isFrameLoadOpcode and isFrameStoreOpcode. llvm-svn: 282842
* Revert r282835 "[AVX-512] Always use the full 32 register vector classes for ↵Craig Topper2016-09-301-15/+30
| | | | | | | | addRegisterClass regardless of whether AVX512/VLX is enabled or not." Turns out this doesn't pass verify-machineinstrs. llvm-svn: 282841
* [X86] Add AVX-512 VTs to findRepresentativeClass as well as v16i16 which was ↵Craig Topper2016-09-301-3/+5
| | | | | | | | also missing. Change register class to include the extra 16 AVX512 registers. I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definiton say it should return the largest super register class. I just figured AVX-512 should be similar. llvm-svn: 282836
* [AVX-512] Always use the full 32 register vector classes for ↵Craig Topper2016-09-301-30/+15
| | | | | | | | | | addRegisterClass regardless of whether AVX512/VLX is enabled or not. If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of. I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change. llvm-svn: 282835
* AMDGPU: Use unsigned compare for eq/neMatt Arsenault2016-09-306-19/+19
| | | | | | | | | | For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832
* [X86] Don't preserve Win64 SSE CSRs when SSE is disabledReid Kleckner2016-09-302-2/+9
| | | | | | | | | Code that doesn't use floating point and doesn't use SSE (kernel code) shouldn't save and restore SSE registers. Fixes PR30503 llvm-svn: 282819
* [AArch64][RegisterBankInfo] Use static mapping for 3-operands instrs.Quentin Colombet2016-09-301-0/+50
| | | | | | | | This uses a TableGen'ed like structure for all 3-operands instrs. The output of the RegBankSelect pass should be identical but the RegisterBankInfo will do less dynamic allocations. llvm-svn: 282817
* [AArch64][RegisterBankInfo] Add static value mapping for 3-op instrs.Quentin Colombet2016-09-302-12/+58
| | | | | | | This is the kind of input TableGen should generate at some point. NFC. llvm-svn: 282816
* [AArch64][RegisterBankInfo] Check the statically created ValueMapping.Quentin Colombet2016-09-301-0/+18
| | | | | | | | | Make sure that the ValueMappings contain the value we expect at the indices we expect. NFC. llvm-svn: 282815
* [X86] Avoid "unused" warnings if no assertsDouglas Katzman2016-09-291-2/+4
| | | | llvm-svn: 282732
* [X86][SSE] Added common helper for shuffle mask constant pool decodes.Simon Pilgrim2016-09-291-164/+136
| | | | | | | | | | The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data. This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements. Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future). llvm-svn: 282720
* Revert "[AVR] Add instruction selection lowering code"Dylan McKay2016-09-293-1954/+2
| | | | | | I accidentally comitted it. llvm-svn: 282712
* [AVR] Add instruction selection lowering codeDylan McKay2016-09-293-2/+1954
| | | | | | | | | | | | Summary: This adds AVRISelLowering.cpp Reviewers: kparzysz, arsenm Subscribers: wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25034 llvm-svn: 282711
* [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.Craig Topper2016-09-292-8/+137
| | | | | | | | This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used. Fixes one of the cases from PR29112. llvm-svn: 282690
* [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 ↵Craig Topper2016-09-291-2/+6
| | | | | | (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition. llvm-svn: 282688
* [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution ↵Craig Topper2016-09-291-0/+10
| | | | | | domain fixing table. llvm-svn: 282687
* [X86] Remove AddedComplexity adjustments that don't seem to be needed.Craig Topper2016-09-291-6/+4
| | | | llvm-svn: 282686
* [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.Craig Topper2016-09-291-1/+2
| | | | llvm-svn: 282684
* Remove an unnecessary duplicate initialization of TLOF from the MipsEric Christopher2016-09-291-4/+0
| | | | | | | | | | | AsmPrinter. This was reinitializing the Mangler after we moved the Mangler down to TLOF and causing us to have two different unnamed global values accessed with the same name. This should fix the problems on the ubsan tests here: http://lab.llvm.org:8011/builders/clang-cmake-mips/builds/15307 llvm-svn: 282675
* Update comment about initializing TLOF with a pointer at the previousEric Christopher2016-09-291-1/+3
| | | | | | line or the other commented out place. llvm-svn: 282673
* AMDGPU: Partially fix control flow at -O0Matt Arsenault2016-09-297-21/+426
| | | | | | | | | | | | | | | Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667
* AArch64: Set shift bit of TLSLE HI12 add instructionLei Liu2016-09-291-0/+8
| | | | | | | | | | | | Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model. This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000. Reviewers: t.p.northover, peter.smith, rovka Subscribers: salim.nasser, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D24702 llvm-svn: 282661
* [RegisterBankInfo] Uniquely generate OperandsMapping.Quentin Colombet2016-09-281-16/+27
| | | | | | | | | | | This is a step toward statically allocate InstructionMapping. Like the previous few commits, the goal is to move toward a TableGen'ed like structure with no dynamic allocation at all. This should already improve compile time by getting rid of a bunch of memmove of SmallVectors. llvm-svn: 282643
* [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit ↵Konstantin Zhuravlyov2016-09-282-3/+238
| | | | | | | | instructions Differential Revision: https://reviews.llvm.org/D24125 llvm-svn: 282624
* [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.Artem Belevich2016-09-287-16/+263
| | | | | | | | These are only available on sm_60+ GPUs. Differential Revision: https://reviews.llvm.org/D24943 llvm-svn: 282607
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-09-281-0/+10
| | | | | | | | UseAA is enabled." This reverts commit r282600 due to test failues with MCJIT llvm-svn: 282604
* [AVR] Rename the builtin calling convention namesDylan McKay2016-09-281-3/+3
| | | | | | 'BUILTIN' is clearer than 'RT' in this context. llvm-svn: 282602
* [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)Marina Yatsina2016-09-281-0/+6
| | | | | | | | | | Implement 'retn' simply by aliasing it to the relevant 'ret' instruction Commit on behalf of coby Differential Revision: https://reviews.llvm.org/D24346 llvm-svn: 282601
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2016-09-281-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 282600
* [AVR] Import the LLVM namespace inside AVRMCTargetDesc.cppDylan McKay2016-09-281-0/+2
| | | | llvm-svn: 282598
* [AVR] Add AVRMCTargetDesc.cppDylan McKay2016-09-283-4/+97
| | | | | | | | | | | | | | Summary: This adds the AVRMCTargetDesc file in tree. It allows creation of the core classes used in the backend. Reviewers: arsenm, kparzysz Subscribers: wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25023 llvm-svn: 282597
* [AVR] Update the signature of createAVRAsmBackendDylan McKay2016-09-281-1/+3
| | | | | | It has been recently changed to also take a MCTargetOptions structure. llvm-svn: 282594
* [AVR] Enable the assembly parserDylan McKay2016-09-282-15/+18
| | | | | | | | We very recently landed the code. This commit enables the parser. It also adds a missing include to AVRAsmParser.cpp llvm-svn: 282593
* [AVR] Merge most recent changes to AVRInstrInfo.tdDylan McKay2016-09-281-21/+85
| | | | | | | | | This adds two new things: - Operand types per fixup - Atomic pseudo operations llvm-svn: 282588
* [AVR] Update the data layoutDylan McKay2016-09-281-1/+3
| | | | | | | | | | | | | | | The previous data layout caused issues when dealing with atomics. Foe example, it is illegal to load a 16-bit value with less than 16-bits of alignment. This changes the data layout so that all types are aligned by at least their own width. Interestingly, this also _slightly_ decreased register pressure in some cases. llvm-svn: 282587
OpenPOWER on IntegriCloud