summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Ensure there are enough registers for wave dispatchTim Renouf2018-04-111-0/+13
| | | | | | | | | | | | | | | | | Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Re-landed after noticing that the buildbot failure from 329808 seemed to be unrelated. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329826
* [FastISel] Disable local value sinking by defaultReid Kleckner2018-04-111-1/+8
| | | | | | | | | | | | | | | | | | This is causing compilation timeouts on code with long sequences of local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out that code coverage instrumentation is a great way to create sequences like this, which how our users ran into the issue in practice. Intel has a tool that detects these kinds of non-linear compile time issues, and Andy Kaylor reported it as PR37010. The current sinking code scans the whole basic block once per local value sink, which happens before emitting each call. In theory, local values should only be introduced to be used by instructions between the current flush point and the last flush point, so we should only need to scan those instructions. llvm-svn: 329822
* [InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse()Sanjay Patel2018-04-111-10/+10
| | | | llvm-svn: 329821
* [DWARFv5] Fuss with asm syntax for conveying MD5 checksum.Paul Robinson2018-04-112-28/+32
| | | | | | | | | | Previously the MD5 option of the .file directive provided the checksum as a quoted hex string; now it's a normal hex number with 0x prefix, same as the .octa directive accepts. Differential Revision: https://reviews.llvm.org/D45459 llvm-svn: 329820
* [MIPS GlobalISel] Select add i32, i32Petar Jovanovic2018-04-1111-13/+406
| | | | | | | | | | | | | Add the minimal support necessary to lower a function that returns the sum of two i32 values. Support argument/return lowering of i32 values through registers only. Add tablegen for regbankselect and instructionselect. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D44304 llvm-svn: 329819
* [AMDGPU] Fix lowering enqueue_kernelYaxun Liu2018-04-111-20/+30
| | | | | | | | | | | | | | | | | | Two issues were fixed: runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make the runtime handle of an enqueued kernel an ordinary global variable. Runtime only needs to store the address of the loaded kernel to the handle and has verified that this approach works. handle the situation where __enqueue_kernel* gets inlined therefore the enqueued kernel may be used through a constant expr instead of an instruction. Differential Revision: https://reviews.llvm.org/D45187 llvm-svn: 329815
* Revert "[AMDGPU] Ensure there are enough registers for wave dispatch"Tim Renouf2018-04-111-13/+0
| | | | | | | | | This reverts 329808. That change caused a report of a failure in test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect it is an expensive-check-only error. Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0 llvm-svn: 329811
* [AArch64][AsmParser] Split index parsing from vector list.Sander de Smalen2018-04-111-27/+23
| | | | | | | | | | | | | | | | | | | | Summary: Place parsing of a vector index into a separate function to reduce duplication, since the code is duplicated in both the parsing of a Neon vector register operand and a Neon vector list. This is patch [2/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45428 llvm-svn: 329809
* [AMDGPU] Ensure there are enough registers for wave dispatchTim Renouf2018-04-111-0/+13
| | | | | | | | | | | | | | Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329808
* [X86] Add variable shuffle schedule classesSimon Pilgrim2018-04-1113-119/+55
| | | | | | | | | | | | | | Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806
* [AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32Dmitry Preobrazhensky2018-04-111-4/+30
| | | | | | | | | See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845 Differential Revision: https://reviews.llvm.org/D45443 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329801
* [AArch64] Fix regression after r329691Francis Visoiu Mistrih2018-04-111-1/+1
| | | | | | | | | | | | In r329691, we would choose FP even if the offset wouldn't fit, just because the offset is smaller than the one from BP. This made many accesses through FP need to scavenge a register, which resulted in slower and bigger code for no good reason. This patch now always picks the offset that fits first, even if FP is preferred. llvm-svn: 329797
* Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max.Artur Gainullin2018-04-111-0/+30
| | | | | | | | | | | | | | | | | | | | | Bitwise 'not' of the min/max could be eliminated in the pattern: %notx = xor i32 %x, -1 %cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y %smax = select i1 %cmp1, i32 %notx, i32 %y %res = xor i32 %smax, -1 https://rise4fun.com/Alive/lCN Reviewers: spatel Reviewed by: spatel Subscribers: a.elovikov, llvm-commits Differential Revision: https://reviews.llvm.org/D45317 llvm-svn: 329791
* [ARM] FP16 VSEL codegenSjoerd Meijer2018-04-111-4/+10
| | | | | | | | | | | | | This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788
* [AArch64][AsmParser] Unify code for parsing Neon/SVE vectors.Sander de Smalen2018-04-112-147/+161
| | | | | | | | | | | | | | | | | | | | | | Summary: Merged 'tryMatchVectorRegister' (specific to Neon) and 'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and created a generic 'parseVectorKind()' function that returns the #Elements and ElementWidth of a vector suffix. This reduces the duplication of this functionality between two the vector implementations. This is patch [1/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45427 llvm-svn: 329782
* [X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace ↵Craig Topper2018-04-112-12/+24
| | | | | | | | 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774
* [X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit ↵Craig Topper2018-04-111-3/+22
| | | | | | | | | | an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771
* Simplification of libcall like printf->puts must check for RtLibUseGOT metadata.Sriraman Tallam2018-04-101-0/+11
| | | | | | | | | | With -fno-plt, for example, calls to printf when getting converted to puts still use the PLT. This patch checks for the metadata "RtLibUseGOT" and annotates the declaration with the right attributes. Differential Revision: https://reviews.llvm.org/D45180 llvm-svn: 329768
* GOTPCREL references must always use RIP.Sriraman Tallam2018-04-102-3/+9
| | | | | | | | With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765
* AMDGPU: enable 128-bit for local addr space under an optionMarek Olsak2018-04-105-12/+17
| | | | | | | | | | | | | | | | | | | Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764
* [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.Geoff Berry2018-04-101-0/+18
| | | | | | | | | | | | | | | | Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value. Reviewers: mcrosier Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45502 llvm-svn: 329761
* [CVP] simplify phi with constant incoming values that match common variable ↵Sanjay Patel2018-04-101-0/+60
| | | | | | | | | | | | | | | | | | | | | | | edge values This is based on an example that was recently posted on llvm-dev: void *propagate_null(void* b, int* g) { if (!b) { return 0; } (*g)++; return b; } https://godbolt.org/g/xYk3qG The original code or constant propagation in other passes has obscured the fact that the phi can be removed completely. Differential Revision: https://reviews.llvm.org/D45448 llvm-svn: 329755
* [Verifier] Refactor duplicate code for atomic mem intrinsic verification (NFC)Daniel Neilson2018-04-101-75/+12
| | | | | | | | | Summary: The verification rules for the intrinsics for atomic memcpy, atomic memmove, and atomic memset are basically code clones. This change merges their verification rules into a single block to remove duplication. llvm-svn: 329753
* [MachO] Emit Weak ReadOnlyWithRel to ConstDataSectionSteven Wu2018-04-102-3/+8
| | | | | | | | | | | | | | | | | Summary: Darwin dynamic linker can handle weak symbols in ConstDataSection. ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection instead of normal DataSection. rdar://problem/39298457 Reviewers: dexonsmith, kledzik Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45472 llvm-svn: 329752
* Recommit r329716 "Add missing nullptr check before getSection() to ↵Jessica Paquette2018-04-101-0/+9
| | | | | | | | | | | | | AArch64MachObjectWriter::recordRelocation" This commit fixes the bot failures that were coming up before with r329716. The fix was to move the check for "isInSection()" inside of the if condition and emit the error there instead of waiting to get past the unreachable statement. This should work in debug and release builds now. llvm-svn: 329746
* [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.Amara Emerson2018-04-101-0/+2
| | | | | | rdar://39175175 llvm-svn: 329743
* [X86] Split up -march=icelake to -client & -serverGabor Buella2018-04-102-6/+16
| | | | | | | | | | Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742
* [InstSimplify] fix formatting; NFCSanjay Patel2018-04-101-7/+7
| | | | llvm-svn: 329736
* [X86] Change the name string for the newly add DF flag register to 'dirflag' ↵Craig Topper2018-04-101-1/+1
| | | | | | | | | | to match the clobber name supported by clang for MS inline assembly. This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158 The test case will be added in clang. llvm-svn: 329734
* [LLVM-C] Add Missing 'break's in InlineAsm bindingsRobert Widmann2018-04-101-0/+2
| | | | | | | | | | | | | | Summary: Noticed by Andrea Di Biagio while reviewing r329369 Reviewers: whitequark, harlanhaskins Reviewed By: harlanhaskins Subscribers: llvm-commits, abergmeier-dsfishlabs Differential Revision: https://reviews.llvm.org/D45496 llvm-svn: 329731
* Revert 329716 "Add missing nullptr check before getSection() to ↵Jessica Paquette2018-04-101-2/+1
| | | | | | | | AArch64MachObjectWriter::recordRelocation" This broke a bunch of bots so I'm reverting while I figure it out. llvm-svn: 329728
* [DebugInfoPDB] Add DIA implementations of findSymbolByRVA and findSymbolByAddrAaron Smith2018-04-102-9/+41
| | | | llvm-svn: 329724
* [CodeGen] Fix printing bundles in MIR outputKrzysztof Parzyszek2018-04-102-4/+7
| | | | | | | | | | | | | | | | Delay printing the newline until after the opening bracket was printed, e.g. BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } instead of BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } llvm-svn: 329719
* Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."Peter Collingbourne2018-04-102-24/+17
| | | | | | Caused a build failure in check-tsan. llvm-svn: 329718
* Add missing nullptr check to AArch64MachObjectWriter::recordRelocationJessica Paquette2018-04-101-1/+2
| | | | | | | | | | | There was missing nullptr check before a call to getSection() in recordRelocation. This would result in a segfault in code like the attached test. This adds the missing check and a test which makes sure we get the expected error output. llvm-svn: 329716
* AMDGPU/MC: Allow disassembling without symbol infoNicolai Haehnle2018-04-101-0/+3
| | | | | | | | | | | | | | | | | | | | | Summary: We would like the UMR debugging tool[0] to be able to provide disassembly for currently live waves based on plain memory dumps, and we want to leverage the LLVM disassembler for this. This mostly works, except that UMR clearly can't provide real symbol info, so it wants to set DisInfo == nullptr. [0] https://cgit.freedesktop.org/amd/umr/ Reviewers: arsenm, rampitec, artem.tamazov, dp Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45477 Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6 llvm-svn: 329715
* [PDB] Remove dead code and run clang format; NFCAaron Smith2018-04-102-32/+22
| | | | llvm-svn: 329712
* Fix spelling. NFC.Chad Rosier2018-04-101-2/+2
| | | | llvm-svn: 329709
* [CodeGen/Dwarf] Rename the "sizetype" synthetic type and add it to the ↵Pavel Labath2018-04-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | accelerator table Summary: This type is created on-demand and used as the base type for array ranges. Since it is "special", its construction did not go through the createTypeDIE function and so it was never inserted into the accelerator table, although it clearly belongs there. I add an explicit addAccelType call to insert it into the table. During review, we also decided to rename the type to something more unique to avoid confusion in case the user has own "sizetype" type. The new name for the type size __ARRAY_SIZE_TYPE__. Reviewers: JDevlieghere, aprantl, dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45445 llvm-svn: 329705
* Fix whitespace indentation. NFCI.Simon Pilgrim2018-04-101-1/+1
| | | | llvm-svn: 329704
* [X86] Disable SGX for Skylake ServerGabor Buella2018-04-101-3/+4
| | | | | | | | | | Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45057 llvm-svn: 329700
* [DA] Improve alias checking in dependence analysisDavid Green2018-04-101-10/+37
| | | | | | | | | | | | Improve the alias analysis to account for cases where we know that src/dst pairs cannot alias due to things like TBAA. As we know they are noalias, we know no dependency can occur. Also fixes issues around the size parameter to AA being incorrect. Differential Revision: https://reviews.llvm.org/D42381 llvm-svn: 329692
* [AArch64] Use FP to access the emergency spill slotFrancis Visoiu Mistrih2018-04-101-10/+28
| | | | | | | | | | | | | | | | | | | | | In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available. This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it. The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger. Previously discussed here: https://reviews.llvm.org/D40876. Differential Revision: https://reviews.llvm.org/D45358 llvm-svn: 329691
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shaderTim Renouf2018-04-101-2/+6
| | | | | | | | | | | | | | | | | | | | | Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690
* AArch64: diagnose unpredictable store-exclusive instructionsTim Northover2018-04-101-0/+32
| | | | | | | | Much like any written register in load/store instructions, the status register is not allowed to overlap with any others. So diagnose it like we already do with the other cases. llvm-svn: 329687
* [X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources.Andrea Di Biagio2018-04-101-1/+1
| | | | | | | | | | | | | | | | | | The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadWellModelProcResources table. llvm-svn: 329686
* [AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by ↵Sander de Smalen2018-04-102-0/+53
| | | | | | | | | | | | | | immediate) instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45371 llvm-svn: 329681
* [MC][TableGen] Add optional libpfm counter names for ProcResUnits.Clement Courbet2018-04-102-0/+77
| | | | | | | | | | | | | | | | Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675
* [AArch64][SVE] Asm: Add support for SVE INDEX instructions.Sander de Smalen2018-04-104-0/+120
| | | | | | | | | | | | Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45370 llvm-svn: 329674
* [x86] Model the direction flag (DF) separately from the rest of EFLAGS.Chandler Carruth2018-04-107-51/+76
| | | | | | | | | | | | | | | | | | | | | This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673
OpenPOWER on IntegriCloud