path: root/llvm/lib/Target
Commit message / Author / Age / Files / Lines
...
* [mips] Removed the SHF_ALLOC flag from the .pdr section.Scott Egerton2016-02-151-1/+1
| | | | | | | | | | | | | | | | | Summary: This section is used for debug information and has no need to be in memory at runtime. With this patch, LLVM now emits the same flags as the GNU assembler. This patch also fixes an error when compiling the Linux kernel: there are relocations within the .pdr section in a VDSO. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17199 llvm-svn: 260879
* AVX512: Change store size of kmask. Store size of v8i1, v4i1, v2i1 and i1 ↵Igor Breger2016-02-153-20/+40
| | | | | | | | | | are changed to 16 bits. If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878
* Support: Fix incremental build when re-configuring targetsDuncan P. N. Exon Smith2016-02-131-1/+1
| | | | | | | | | | | | | | r180893 added an indirect include of llvm/Config/Targets.def to llvm/Support/CodeGen.h, which in turn is included by things like llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to rebuild 1274 files after reconfiguring. This commit strips CodeGen.h back down to just a pile of enums and moves the expensive includes over to CodeGenCWrappers.h (which is only included in two places). This gets ninja down to 88 files if you reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86. llvm-svn: 260835
* [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shufflesSimon Pilgrim2016-02-131-0/+166
| | | | | | | | | | | | | | | | | | This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834
* Remove Proc feature flags for X86 processors that are used to inherit ↵Craig Topper2016-02-132-38/+34
| | | | | | features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats. llvm-svn: 260832
* [x86-64] allow mfence even with -mno-sse (PR23203)Sanjay Patel2016-02-134-10/+11
| | | | | | | | | | | | | As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828
* [Hexagon] Replace use of "std::map::emplace" with "insert"Krzysztof Parzyszek2016-02-131-1/+4
| | | | | | | Gcc 4.7.2-4 does not seem to have "emplace" in its implementation of map. This should fix the build failure on polly-amd64-linux. llvm-svn: 260816
* HexagonFrameLowering.cpp: Appease msc18 to give an explicit constructor ↵NAKAMURA Takumi2016-02-131-2/+4
| | | | | | SlotInfo() instead of member initializers. llvm-svn: 260812
* AMDGPU: Prepare for reducing private element size.Matt Arsenault2016-02-131-14/+48
| | | | | | | | | | Tests for the new scalarize all private access options will be included with a future commit. The only functional change makes the split/scalarize behavior for private accesses of vectors with more than 4 elements consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804
* AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsicTom Stellard2016-02-131-0/+11
| | | | | | | This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792
* AMDGPU: Cleanup includes and random macrosMatt Arsenault2016-02-131-11/+4
| | | | llvm-svn: 260784
* AMDGPU: Add intrinsics for sin/cosMatt Arsenault2016-02-132-1/+18
| | | | | | | These provide direct access to the hardware instruction without the unit conversion that llvm.sin/llvm.cos lowering requires. llvm-svn: 260782
* AMDGPU: Rename intrinsic to better match instruction nameMatt Arsenault2016-02-137-9/+9
| | | | | | Also fixes missing f32 test. llvm-svn: 260780
* AMDGPU/SI: Add instruction defs for VOP1 DPP instructionsTom Stellard2016-02-132-0/+107
| | | | | | | | | | Reviewers: nhaustov, cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17159 llvm-svn: 260774
* AMDGPU: Fix broken condition causing warningMatt Arsenault2016-02-131-1/+1
| | | | llvm-svn: 260773
* Fix Windows buildbot breakage.Alexey Samsonov2016-02-121-3/+4
| | | | llvm-svn: 260766
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-1215-41/+266
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* Disable the vzeroupper insertion pass on PS4.Yunzhong Gao2016-02-124-2/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764
* [WebAssembly] Report more meaningful error messages for some unsupportedDerek Schuff2016-02-122-4/+17
| | | | | | | | | ops. Computed gotos and RETURNADDR may never be supported; we can do FRAMEADDR in the future. llvm-svn: 260759
* [Hexagon] Optimize stack slot spillsKrzysztof Parzyszek2016-02-125-3/+1089
| | | | | | | | | | | | | | Replace spills to memory with spills to registers, if possible. This applies mostly to predicate registers (both scalar and vector), since they are very limited in number. A spill of a predicate register may happen even if there is a general-purpose register available. In cases like this the stack spill/reload may be eliminated completely. This optimization will consider all stack objects, regardless of where they came from and try to match the live range of the stack slot with a dead range of a register from an appropriate register class. llvm-svn: 260758
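The matching step described above (fit a stack slot's live range inside a dead range of some register) can be sketched as a plain interval-containment check. This is a simplified illustration, not the pass itself: the half-open `(start, end)` ranges, the register names, and the helper names `covers` and `pick_register` are all hypothetical, and register-class compatibility is ignored.

```python
def covers(dead, live):
    """True if the dead range fully contains the live range."""
    return dead[0] <= live[0] and live[1] <= dead[1]


def pick_register(slot_live, dead_ranges):
    """Return the first register with a dead range covering the slot's live range.

    slot_live:   (start, end) live range of the stack slot (hypothetical units).
    dead_ranges: dict mapping a register name to a list of (start, end) dead ranges.
    Returns None when no register is free for the whole live range.
    """
    for reg, ranges in dead_ranges.items():
        if any(covers(dead, slot_live) for dead in ranges):
            return reg
    return None


if __name__ == "__main__":
    dead = {"r5": [(0, 10)], "r7": [(12, 40)]}
    print(pick_register((15, 30), dead))  # slot fits inside r7's dead range
    print(pick_register((5, 20), dead))   # no single dead range covers it
```

When a register is found, the spill and reload can become register copies; when none is found, the stack slot is kept.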
* [Hexagon] Mark HVX registers as volatileKrzysztof Parzyszek2016-02-121-1/+7
| | | | llvm-svn: 260753
* [WebAssembly] Update test expectations after r260737Derek Schuff2016-02-121-13/+1
| | | | llvm-svn: 260750
* [Hexagon] Recognize more cases in copyPhysReg and stack slot load/storeKrzysztof Parzyszek2016-02-121-51/+105
| | | | llvm-svn: 260748
* [WebAssembly] Fix byval for empty types.Dan Gohman2016-02-121-2/+1
| | | | llvm-svn: 260740
* [AArch64] Enable post-RA MI scheduler for Kryo.Chad Rosier2016-02-121-1/+1
| | | | | | This should have landed in r260686. llvm-svn: 260739
* [WebAssembly] Fix insertion of a BLOCK in a loop header that also ends a BLOCK.Dan Gohman2016-02-121-1/+3
| | | | llvm-svn: 260737
* [Hexagon] Recognize more instructions in isLoadFromStackSlot/isStoreToStackSlotKrzysztof Parzyszek2016-02-121-19/+99
| | | | llvm-svn: 260725
* [Hexagon] Add utility functions to detect sign- and zero-extending loadsKrzysztof Parzyszek2016-02-122-1/+161
| | | | llvm-svn: 260698
* [Hexagon] Replace expansion of spill pseudo-instructions in frame loweringKrzysztof Parzyszek2016-02-122-320/+475
| | | | | | | Rewrite the code to handle all pseudo-instructions in a single pass. This temporarily reverts spill slot optimization that used general-purpose registers to hold values of spilled predicate registers. llvm-svn: 260696
* [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD ↵Tom Stellard2016-02-122-3/+3
| | | | | | | | | | | | | | assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694
* AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass.Changpeng Fang2016-02-121-4/+10
| | | | | | | | | | | | | Summary: The loop condition can be a boolean constant (an infinite loop, for example), so we should handle constant conditions when annotating a loop. This patch adds support for annotating loops with a constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692
* [Hexagon] Remove HexagonExpandPredSpillCode passKrzysztof Parzyszek2016-02-123-203/+0
| | | | | | This code is dead. The expansion is now done in HexagonFrameLowering. llvm-svn: 260691
* [Hexagon] Eliminate pseudo instructions for circ/brev loads and storesKrzysztof Parzyszek2016-02-125-461/+192
| | | | | | | We can generate the actual instructions from the intrinsics without the need for pseudo-instructions. Also, since the intrinsics have a side effect in the form of a store, attempt to optimize away loads from the store location. llvm-svn: 260690
* [AArch64] Reduce number of callee-save save/restores.Geoff Berry2016-02-121-126/+160
| | | | | | | | | | | | | | | | | | | | | Summary: Before this change, callee-save registers would be rounded up to even pairs of GPRs and FPRs. This change eliminates these extra padding load/stores, though it does keep the stack allocation the same size unless both the GPR and FPR sets have an odd size, in which case one full pair stack slot (16 bytes) is saved. This optimization cannot currently be done for MachO targets since they rely on a fast-path .debug_frame equivalent that can only encode callee-save registers as pairs. Reviewers: t.p.northover, rengolin, mcrosier, jmolloy Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17000 llvm-svn: 260689
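The size arithmetic in the summary above can be checked with a small model. It is a sketch under stated assumptions, not the frame-lowering code: each callee-saved register is assumed to occupy 8 bytes, the stack allocation is assumed to be rounded up to 16 bytes, and the function names `before_change` and `after_change` are hypothetical.

```python
REG_BYTES = 8      # assumption: 8 bytes saved per callee-saved GPR or FPR
STACK_ALIGN = 16   # assumption: allocation rounded up to 16 bytes


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def before_change(gprs, fprs):
    """Old scheme: each register set is padded to an even count."""
    saves = round_up(gprs, 2) + round_up(fprs, 2)
    return saves, saves * REG_BYTES


def after_change(gprs, fprs):
    """New scheme: no padding saves; only the allocation is rounded."""
    saves = gprs + fprs
    return saves, round_up(saves * REG_BYTES, STACK_ALIGN)


if __name__ == "__main__":
    for gprs, fprs in [(3, 1), (3, 2), (4, 2)]:
        print(gprs, fprs, before_change(gprs, fprs), after_change(gprs, fprs))
```

With 3 GPRs and 1 FPR (both odd) the model drops from 48 to 32 bytes, one full 16-byte pair slot. With 3 GPRs and 2 FPRs the allocation stays at 48 bytes, but one padding save/restore disappears.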
* [Hexagon] Handle out-of-range offsets in eliminateFrameIndexKrzysztof Parzyszek2016-02-121-12/+15
| | | | | | | Create a virtual register that will hold the actual address and use it with the offset of 0 in the place of the original FI. llvm-svn: 260688
* [AArch64] Add support for Qualcomm Kryo CPU.Chad Rosier2016-02-128-5/+2506
| | | | | | Machine model description by Dave Estes <cestes@codeaurora.org>. llvm-svn: 260686
* [AArch64] Merge two adjacent str WZR into str XZRJun Bum Lim2016-02-121-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change merges adjacent 32 bit zero stores into a 64 bit zero store. e.g., "str wzr, [x0]; str wzr, [x0, #4]" becomes "str xzr, [x0]". Therefore, four adjacent 32 bit zero stores will be a single stp. e.g., "str wzr, [x0]; str wzr, [x0, #4]; str wzr, [x0, #8]; str wzr, [x0, #12]" becomes "stp xzr, xzr, [x0]". Reviewers: mcrosier, jmolloy, gberry, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16933 llvm-svn: 260682
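The two-step merge in the summary above can be mimicked with a toy model over byte offsets. This is only an illustration of the described transformation, not the AArch64 load/store optimizer: it assumes all stores share one base register, appear in increasing program order, and have nothing in between them. The names `merge_zero_stores` and `addr` are hypothetical.

```python
def addr(base, offset):
    """Format an address operand, omitting a zero offset."""
    return f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"


def merge_zero_stores(offsets, base="x0"):
    """Merge 32-bit zero stores given by their byte offsets, in program order."""
    # Step 1: two adjacent 32-bit zero stores become one 64-bit zero store.
    wide = []  # list of (width_in_bytes, offset)
    i = 0
    while i < len(offsets):
        if i + 1 < len(offsets) and offsets[i + 1] == offsets[i] + 4:
            wide.append((8, offsets[i]))
            i += 2
        else:
            wide.append((4, offsets[i]))
            i += 1

    # Step 2: two adjacent 64-bit zero stores become one store pair.
    out = []
    i = 0
    while i < len(wide):
        width, off = wide[i]
        if width == 8 and i + 1 < len(wide) and wide[i + 1] == (8, off + 8):
            out.append(f"stp xzr, xzr, {addr(base, off)}")
            i += 2
        else:
            reg = "xzr" if width == 8 else "wzr"
            out.append(f"str {reg}, {addr(base, off)}")
            i += 1
    return out


if __name__ == "__main__":
    print(merge_zero_stores([0, 4]))
    print(merge_zero_stores([0, 4, 8, 12]))
```

Stores that are not adjacent, such as offsets 0 and 8, are left as separate 32-bit stores in this model.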
* [Hexagon] Specify vector alignment in DataLayout stringKrzysztof Parzyszek2016-02-121-7/+7
| | | | | | | | | | | The DataLayout can calculate alignment of vectors based on the alignment of the element type and the number of elements. In fact, it is the product of these two values. The problem is that for vectors of N x i1, this will return the alignment of N bytes, since the alignment of i1 is 8 bits. The vector types of vNi1 should be aligned to N bits instead. Provide explicit alignment for HVX vectors to avoid such complications. llvm-svn: 260678
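The mismatch described above is easy to see numerically. The sketch below follows only the commit message's description of the rule (element alignment times element count, with i1 aligned to 8 bits); it does not call or reproduce DataLayout, and the function names and the example width of 512 elements are hypothetical.

```python
I1_ALIGN_BITS = 8  # from the message: the alignment of i1 is 8 bits


def product_rule_align_bits(elem_align_bits, num_elems):
    """Alignment as described in the message: element alignment x element count."""
    return elem_align_bits * num_elems


def wanted_align_bits(num_elems):
    """What the message says a vNi1 vector should get: N bits."""
    return num_elems


if __name__ == "__main__":
    n = 512  # hypothetical vector width, in elements
    computed = product_rule_align_bits(I1_ALIGN_BITS, n)
    print(computed // 8, "bytes computed vs", wanted_align_bits(n) // 8, "bytes wanted")
```

For a vector of 512 x i1 the product rule yields 512 bytes where 64 bytes (512 bits) is wanted, which is why the patch states the alignment explicitly.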
* Fix uninitialized memory read.Benjamin Kramer2016-02-121-2/+2
| | | | | | Found by msan. llvm-svn: 260676
* AMDGPU: Set flat_scratch from flat_scratch_init regMatt Arsenault2016-02-127-56/+102
| | | | | | | | | | | | | | This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
* C API: Remove LLVMGetDataLayout that was deprecated in 3.7Mehdi Amini2016-02-121-14/+0
| | | | | From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260657
* AMDGPU: Set element_size in private resource descriptorMatt Arsenault2016-02-127-2/+56
| | | | | | | | | Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651
* AMDGPU: Fix mishandling alignment when scalarizing vector loads/storesMatt Arsenault2016-02-121-2/+5
| | | | | | | I don't think this was causing any real problems, so I'm not sure how to test for this. llvm-svn: 260646
* AMDGPU: Initialize SILowerControlFlowMatt Arsenault2016-02-123-30/+43
| | | | llvm-svn: 260645
* AMDGPU: Remove trailing whitespaceMatt Arsenault2016-02-121-4/+4
| | | | llvm-svn: 260644
* [x86] simplify getZeroVector() ; NFCISanjay Patel2016-02-111-39/+20
| | | | | | | | | | | Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. See also: http://reviews.llvm.org/rL258833 http://reviews.llvm.org/rL260582 llvm-svn: 260609
* [AArch64] Implements the lowering of formal arguments for GlobalISel.Quentin Colombet2016-02-112-0/+53
| | | | | | | | | | | | | | This is just a trivial implementation: it supports only arguments passed in registers, and only "plain" arguments, i.e., no sext/zext attribute. At this point, it is possible to play with the IRTranslator on AArch64: "llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel". For now, we only support the translation of programs with adds and returns. Follow-up patches are on their way to add a test case (the MIRParser is not ready as it is). llvm-svn: 260600
* AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRsTom Stellard2016-02-115-0/+72
| | | | | | | | | | | | | | | | Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
* AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklistTom Stellard2016-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: When we split SMRD instructions into two MUBUFs we were adding the users of the newly created MUBUFs to the VALU worklist. However, the only user these instructions had was the REG_SEQUENCE that was inserted by splitSMRD when the original SMRD instruction was split. We need to make sure to add the users of the original SMRD to the VALU worklist before it is split. I have a test case, but it requires one other bug fix, so it will be added in a later commit. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17101 llvm-svn: 260588
* [WebAssembly] Reformat WebAssemblyFrameLowering and WebAssemblyISelLoweringDerek Schuff2016-02-114-82/+76
| | | | | | | | | | Reviewers: sunfish, jfb Subscribers: jfb, dschuff Differential Revision: http://reviews.llvm.org/D17156 llvm-svn: 260585