path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [ARM] FP16 VSEL codegen | Sjoerd Meijer | 2018-04-11 | 1 | -4/+10
  This is a follow-up of rL327695 to instruction-select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT.
  More work is required in this area, which will be addressed soon:
  - more variants need to be regression tested, but this depends on the next point.
  - first, LowerConstantFP needs to be adjusted for fp16 values.
  Differential Revision: https://reviews.llvm.org/D45205
  llvm-svn: 329788
* [AArch64][AsmParser] Unify code for parsing Neon/SVE vectors. | Sander de Smalen | 2018-04-11 | 2 | -147/+161
  Summary: Merged 'tryMatchVectorRegister' (specific to Neon) and 'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and created a generic 'parseVectorKind()' function that returns the #Elements and ElementWidth of a vector suffix. This reduces the duplication of this functionality between the two vector implementations.
  This is patch [1/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions.
  Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
  Reviewed By: fhahn
  Subscribers: tschuett, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D45427
  llvm-svn: 329782
* [X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. | Craig Topper | 2018-04-11 | 1 | -12/+4
  The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.
  llvm-svn: 329774
* [X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. | Craig Topper | 2018-04-11 | 1 | -3/+22
  Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build.
  llvm-svn: 329771
* GOTPCREL references must always use RIP. | Sriraman Tallam | 2018-04-10 | 2 | -3/+9
  With -fno-plt, global value references can use GOTPCREL and RIP must be used.
  Differential Revision: https://reviews.llvm.org/D45460
  llvm-svn: 329765
* AMDGPU: enable 128-bit for local addr space under an option | Marek Olsak | 2018-04-10 | 5 | -12/+17
  Author: Samuel Pitoiset
  ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly).
  v2:
  - fix regressions in merge-stores.ll and multiple_tails.ll
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
  llvm-svn: 329764
* [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass. | Geoff Berry | 2018-04-10 | 1 | -0/+18
  Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value.
  Reviewers: mcrosier
  Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45502
  llvm-svn: 329761
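The liveness bug described in this entry can be illustrated with a minimal sketch. It assumes registers are plain integer IDs and that `pickScratch` is a hypothetical helper; it is not the code of the Falkor pass itself.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: return the first candidate register that is not in the
// set of registers that must stay live, or -1 when none is free.
inline int pickScratch(const std::vector<int> &Candidates,
                       const std::set<int> &MustStayLive) {
  for (int Reg : Candidates)
    if (MustStayLive.count(Reg) == 0)
      return Reg;
  return -1;
}

// For a load such as "ldr x2, [x0, x1]" (base = 0, offset = 1):
//  - treating only the base as live lets register 1 be picked and clobbered;
//  - treating every register operand as live skips it.
inline int scratchIgnoringOffset(int Base, const std::vector<int> &Candidates) {
  return pickScratch(Candidates, {Base});
}

inline int scratchRespectingOffset(int Base, int Offset,
                                   const std::vector<int> &Candidates) {
  return pickScratch(Candidates, {Base, Offset});
}
```

With candidates `{1, 2, 3}`, the first variant can hand back the offset register, while the second moves on to the next free one.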
* Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" | Jessica Paquette | 2018-04-10 | 1 | -0/+9
  This commit fixes the bot failures that were coming up before with r329716. The fix was to move the check for "isInSection()" inside of the if condition and emit the error there instead of waiting to get past the unreachable statement. This should work in debug and release builds now.
  llvm-svn: 329746
* [AArch64] Fix isel failure when BUILD_PAIR nodes are left over. | Amara Emerson | 2018-04-10 | 1 | -0/+2
  rdar://39175175
  llvm-svn: 329743
* [X86] Split up -march=icelake to -client & -server | Gabor Buella | 2018-04-10 | 2 | -6/+16
  Reviewers: craig.topper, zvi, echristo
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D45055
  llvm-svn: 329742
* [X86] Change the name string for the newly added DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly. | Craig Topper | 2018-04-10 | 1 | -1/+1
  This should fix the failure found by Chromium reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=831158
  The test case will be added in clang.
  llvm-svn: 329734
* Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" | Jessica Paquette | 2018-04-10 | 1 | -2/+1
  This broke a bunch of bots so I'm reverting while I figure it out.
  llvm-svn: 329728
* Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." | Peter Collingbourne | 2018-04-10 | 2 | -24/+17
  Caused a build failure in check-tsan.
  llvm-svn: 329718
* Add missing nullptr check to AArch64MachObjectWriter::recordRelocation | Jessica Paquette | 2018-04-10 | 1 | -1/+2
  There was a missing nullptr check before a call to getSection() in recordRelocation. This would result in a segfault in code like the attached test.
  This adds the missing check and a test which makes sure we get the expected error output.
  llvm-svn: 329716
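The shape of such a guard can be sketched as follows. The `Symbol` and `Section` types and the error text are hypothetical stand-ins chosen for illustration; they are not LLVM's MC classes or the diagnostic the commit emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a symbol that may or may not live in a section.
struct Section {
  std::string Name;
};

struct Symbol {
  const Section *Sec = nullptr; // null when the symbol has no section
  bool isInSection() const { return Sec != nullptr; }
};

// Report an error instead of dereferencing a null section pointer.
// Returns true and fills Name on success; returns false and fills Err otherwise.
inline bool sectionNameFor(const Symbol &Sym, std::string &Name,
                           std::string &Err) {
  if (!Sym.isInSection()) {
    Err = "symbol is not in a section"; // illustrative message only
    return false;
  }
  Name = Sym.Sec->Name;
  return true;
}
```

Doing the check before the dereference, and returning an error from the same branch, keeps the behaviour identical in builds with and without assertions.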
* AMDGPU/MC: Allow disassembling without symbol info | Nicolai Haehnle | 2018-04-10 | 1 | -0/+3
  Summary: We would like the UMR debugging tool[0] to be able to provide disassembly for currently live waves based on plain memory dumps, and we want to leverage the LLVM disassembler for this.
  This mostly works, except that UMR clearly can't provide real symbol info, so it wants to set DisInfo == nullptr.
  [0] https://cgit.freedesktop.org/amd/umr/
  Reviewers: arsenm, rampitec, artem.tamazov, dp
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45477
  Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
  llvm-svn: 329715
* Fix spelling. NFC. | Chad Rosier | 2018-04-10 | 1 | -2/+2
  llvm-svn: 329709
* Fix whitespace indentation. NFCI. | Simon Pilgrim | 2018-04-10 | 1 | -1/+1
  llvm-svn: 329704
* [X86] Disable SGX for Skylake Server | Gabor Buella | 2018-04-10 | 1 | -3/+4
  Reviewers: craig.topper, zvi, echristo
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D45057
  llvm-svn: 329700
* [AArch64] Use FP to access the emergency spill slot | Francis Visoiu Mistrih | 2018-04-10 | 1 | -10/+28
  In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available.
  This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it.
  The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger.
  Previously discussed here: https://reviews.llvm.org/D40876.
  Differential Revision: https://reviews.llvm.org/D45358
  llvm-svn: 329691
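The stated goal of choosing between FP, BP and SP so that the offset is minimized can be sketched as below. This is a simplified model under assumptions: `Candidate`, `pickBase` and the `Available` flag are hypothetical names, and the real frame lowering applies further constraints that are not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class BaseReg { FP, BP, SP };

// Hypothetical description of one way to address a stack slot.
struct Candidate {
  BaseReg Reg;
  int64_t Offset;  // byte offset of the slot from this register
  bool Available;  // e.g. false for SP when its offset is not known statically
};

// Magnitude of a signed offset without overflowing on INT64_MIN.
inline uint64_t magnitude(int64_t V) {
  return V < 0 ? uint64_t(0) - static_cast<uint64_t>(V)
               : static_cast<uint64_t>(V);
}

// Pick the available base register whose offset has the smallest magnitude.
inline Candidate pickBase(const std::vector<Candidate> &Candidates) {
  const Candidate *Best = nullptr;
  for (const Candidate &C : Candidates) {
    if (!C.Available)
      continue;
    if (Best == nullptr || magnitude(C.Offset) < magnitude(Best->Offset))
      Best = &C;
  }
  assert(Best != nullptr && "at least one base register must be usable");
  return *Best;
}
```

For a slot near the top of the frame, the FP-relative offset is small while the BP-relative one can be large, so this selection prefers FP there.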
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader | Tim Renouf | 2018-04-10 | 1 | -2/+6
  Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
  This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec.
  V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT.
  Reviewers: kzhuravl, nhaehnle, timcorringham
  Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
  Differential Revision: https://reviews.llvm.org/D44468
  Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
  llvm-svn: 329690
* AArch64: diagnose unpredictable store-exclusive instructions | Tim Northover | 2018-04-10 | 1 | -0/+32
  Much like any written register in load/store instructions, the status register is not allowed to overlap with any others. So diagnose it like we already do with the other cases.
  llvm-svn: 329687
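The overlap rule in this entry amounts to a simple comparison, sketched here with abstract register IDs. `StoreExclusiveOps` and `statusOverlaps` are hypothetical names, and the sketch deliberately ignores encoding special cases (such as register number 31) that a real assembler check has to consider.

```cpp
#include <cassert>

// Hypothetical operand bundle for a store-exclusive of the form
// "stxr Ws, Xt, [Xn]", using abstract register IDs rather than encodings.
struct StoreExclusiveOps {
  unsigned Status; // Ws: receives the success/failure status
  unsigned Data;   // Xt: value being stored
  unsigned Base;   // Xn: address register
};

// The status register is written by the instruction, so it must not alias
// the data or base register; an overlap would be diagnosed.
inline bool statusOverlaps(const StoreExclusiveOps &Ops) {
  return Ops.Status == Ops.Data || Ops.Status == Ops.Base;
}
```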
* [X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. | Andrea Di Biagio | 2018-04-10 | 1 | -1/+1
  The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it.
  This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd.
  This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadwellModelProcResources table.
  llvm-svn: 329686
* [AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by immediate) instructions. | Sander de Smalen | 2018-04-10 | 2 | -0/+53
  Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro
  Reviewed By: rengolin, fhahn
  Subscribers: tschuett, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45371
  llvm-svn: 329681
* [MC][TableGen] Add optional libpfm counter names for ProcResUnits. | Clement Courbet | 2018-04-10 | 2 | -0/+77
  Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits.
  This allows making llvm-exegesis available on more targets.
  Fixes PR36984.
  Reviewers: gchatelet, RKSimon, andreadb, craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D45360
  llvm-svn: 329675
* [AArch64][SVE] Asm: Add support for SVE INDEX instructions. | Sander de Smalen | 2018-04-10 | 4 | -0/+120
  Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro
  Reviewed By: rengolin, fhahn
  Subscribers: tschuett, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D45370
  llvm-svn: 329674
* [x86] Model the direction flag (DF) separately from the rest of EFLAGS. | Chandler Carruth | 2018-04-10 | 7 | -51/+76
  This cleans up a number of operations that only claimed to use EFLAGS due to using DF. But no instructions which we think of as setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there.
  In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this.
  I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file.
  Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here.
  Differential Revision: https://reviews.llvm.org/D45154
  llvm-svn: 329673
* [X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. | Craig Topper | 2018-04-10 | 1 | -1/+12
  Prefer to use the 32-bit AND with immediate instead.
  Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute.
  Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
  Fixes PR37063.
  llvm-svn: 329667
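The arithmetic fact behind preferring the 32-bit AND can be checked with a small sketch: when the immediate has no bits set above bit 31, a 64-bit AND gives the same value as a 32-bit AND of the low halves followed by zero-extension. The helper names are hypothetical, and the sketch models values only, not instruction selection.

```cpp
#include <cassert>
#include <cstdint>

// True when the immediate has no bits set above bit 31.
constexpr bool fitsInU32(uint64_t Imm) { return Imm <= 0xFFFFFFFFull; }

// Full-width 64-bit AND.
constexpr uint64_t and64(uint64_t X, uint64_t Imm) { return X & Imm; }

// 32-bit AND of the low halves, zero-extended back to 64 bits. This mirrors
// how a write to a 32-bit register clears the upper half on x86-64.
constexpr uint64_t and32ZeroExtended(uint64_t X, uint64_t Imm) {
  return static_cast<uint64_t>(static_cast<uint32_t>(X) &
                               static_cast<uint32_t>(Imm));
}

// Compile-time spot check of the equivalence for one immediate.
static_assert(and64(0x123456789ABCDEF0ull, 0xFF00FF00ull) ==
                  and32ZeroExtended(0x123456789ABCDEF0ull, 0xFF00FF00ull),
              "narrow AND matches wide AND when the immediate fits in 32 bits");
```

The equivalence does not hold for immediates with upper bits set, which is why the condition on the immediate matters.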
* [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. | Chandler Carruth | 2018-04-10 | 7 | -117/+747
  The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch.
  However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all.
  There are a large number of operations that technically observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns.
  This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without requiring stack adjustments.
  Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this.
  Differential Revision: https://reviews.llvm.org/D45146
  llvm-svn: 329657
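The key idea above, saving a condition into a general-purpose register with setCC and testing that register later, can be modelled at the value level. This is a conceptual sketch only: the function names are hypothetical, the carry flag stands in for any condition, and none of this is the pass's MachineInstr code.

```cpp
#include <cassert>
#include <cstdint>

// Models "add" followed by "setb %al": perform the add and capture the
// carry-out as a 0/1 byte while the flags are still live.
inline uint8_t addAndSaveCarry(uint32_t A, uint32_t B, uint32_t &Sum) {
  Sum = A + B;            // unsigned add wraps modulo 2^32
  return Sum < A ? 1 : 0; // wrapped result means a carry occurred
}

// Models a cmovCC or jCC consumer: "test %al, %al" then act on non-zero.
inline uint32_t selectOnSavedCarry(uint8_t SavedCarry, uint32_t IfSet,
                                   uint32_t IfClear) {
  return SavedCarry != 0 ? IfSet : IfClear;
}

// Models an adc consumer: the saved byte supplies the carry-in that the
// original flag would have provided.
inline uint32_t addWithSavedCarry(uint32_t A, uint32_t B, uint8_t SavedCarry) {
  return A + B + SavedCarry;
}
```

Because the condition now lives in an ordinary register, instructions that clobber the flags can sit between the producer and the consumers without any pushf/popf.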
* ShadowCallStack/x86_64: Ignore pseudo-machine instructions | Vlad Tsyrklevich | 2018-04-10 | 1 | -1/+2
  llvm-svn: 329656
* [globalisel][legalizerinfo] Add support for the Lower action in getActionDefinitionsBuilder() and use it in AArch64. | Daniel Sanders | 2018-04-09 | 1 | -10/+6
  Lower is slightly odd. It often doesn't change the type but the lowerings do use the new type to decide what code to create. Treat it like a mutation but provide convenience functions that re-use the existing type.
  Re-uses the existing tests:
  test/CodeGen/AArch64/GlobalISel/legalize-rem.mir
  test/CodeGen/AArch64/GlobalISel//legalize-mul.mir
  test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir
  llvm-svn: 329623
* AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header | Konstantin Zhuravlyov | 2018-04-09 | 4 | -10/+10
  1. Remove max_scratch_backing_memory_byte_size from kernel header
  2. Make it a reserved field
  3. Ignore it while parsing assembly for backwards compatibility
  4. Bump up minor version of kernel header
  Differential Revision: https://reviews.llvm.org/D45452
  llvm-svn: 329620
* [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. | Craig Topper | 2018-04-09 | 1 | -7/+22
  LowerIntUnary, as its name says, has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it, just do really basic splitting directly.
  The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round.
  Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.
  This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.
  llvm-svn: 329616
* AArch64: Allow offsets to be folded into addresses with ELF. | Peter Collingbourne | 2018-04-09 | 2 | -17/+24
  This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access).
  Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only.
  Differential Revision: https://reviews.llvm.org/D45199
  llvm-svn: 329611
* Revert "AMDGPU: enable 128-bit for local addr space under an option" | Alex Shlyapnikov | 2018-04-09 | 5 | -17/+12
  This reverts commit r329591. It breaks various bots:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
  http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
  http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
  ...
  llvm-svn: 329610
* [WebAssembly] Change std::sort to llvm::sort in response to r327219 | Mandeep Singh Grang | 2018-04-09 | 1 | -10/+10
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key.
  To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: sunfish, RKSimon
  Reviewed By: sunfish
  Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44873
  llvm-svn: 329607
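Why shuffling before sorting uncovers non-determinism can be shown with a small standalone sketch. `shuffledSort` is an illustrative stand-in written for this example, not the llvm::sort implementation, and `Entry` and the seeds are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: several entries may share the same integer key.
using Entry = std::pair<int, std::string>;

// Comparator that looks only at the key, so entries with equal keys have no
// defined relative order under an unstable sort.
inline bool lessByKey(const Entry &A, const Entry &B) {
  return A.first < B.first;
}

// Stand-in for a shuffling sort wrapper: permute the input with the given
// seed, then sort by key. Ties may come out in a seed-dependent order.
inline void shuffledSort(std::vector<Entry> &V, unsigned Seed) {
  std::mt19937 Gen(Seed);
  std::shuffle(V.begin(), V.end(), Gen);
  std::sort(V.begin(), V.end(), lessByKey);
}

// The usual fix: compare on every field so the order is total. The result is
// then independent of the shuffle.
inline void shuffledTotalOrderSort(std::vector<Entry> &V, unsigned Seed) {
  std::mt19937 Gen(Seed);
  std::shuffle(V.begin(), V.end(), Gen);
  std::sort(V.begin(), V.end()); // std::pair compares key, then payload
}
```

Code that depends on the tie order produced by `shuffledSort` can change behaviour from run to run, which is the kind of latent bug the shuffling wrapper is meant to surface; with a total order, every seed yields the same sequence.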
* [X86] Revert the SLM part of r328914. | Craig Topper | 2018-04-09 | 1 | -1/+3
  While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.
  llvm-svn: 329593
* AMDGPU: enable 128-bit for local addr space under an option | Marek Olsak | 2018-04-09 | 5 | -12/+17
  Author: Samuel Pitoiset
  ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
  llvm-svn: 329591
* AMDGPU: Initialize GlobalISel passes | Tom Stellard | 2018-04-09 | 1 | -0/+1
  Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel.
  Reviewers: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D45353
  llvm-svn: 329588
* [X86][MMX] Fix missing itinerary for PALIGNR | Simon Pilgrim | 2018-04-09 | 1 | -4/+4
  llvm-svn: 329568
* [X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format | Simon Pilgrim | 2018-04-09 | 1 | -1/+1
  llvm-svn: 329567
* [X86][MMX] Fix missing itinerary for CVTPI2PS | Simon Pilgrim | 2018-04-09 | 1 | -4/+4
  llvm-svn: 329565
* [AMDGPU][MC][GFX9] Added instructions s_mul_hi_*32, s_lshl*_add_u32 | Dmitry Preobrazhensky | 2018-04-09 | 1 | -0/+21
  See bugs:
  36841: https://bugs.llvm.org/show_bug.cgi?id=36841
  36842: https://bugs.llvm.org/show_bug.cgi?id=36842
  Differential Revision: https://reviews.llvm.org/D45251
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329562
* [X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS | Simon Pilgrim | 2018-04-09 | 1 | -1/+1
  The RR/RM itineraries were the wrong way around.
  llvm-svn: 329561
* [X86][SSE] Fix f32 mul/div itinerary groups typo | Simon Pilgrim | 2018-04-09 | 1 | -4/+4
  The RM folded itineraries were incorrectly using the f64 version.
  llvm-svn: 329556
* [NFC] fix trivial typos in comments and error message | Hiroshi Inoue | 2018-04-09 | 4 | -4/+4
  "is is" -> "is", "are are" -> "are"
  llvm-svn: 329546
* [TargetSchedule] shrink interface for init(); NFCI | Sanjay Patel | 2018-04-08 | 4 | -4/+4
  The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init().
  Differential Revision: https://reviews.llvm.org/D44789
  llvm-svn: 329540
* [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. | Craig Topper | 2018-04-08 | 10 | -76/+63
  Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs.
  I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
  The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
  Reviewers: RKSimon, andreadb, GGanesh
  Reviewed By: RKSimon
  Subscribers: GGanesh, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45380
  llvm-svn: 329539
* [X86][Znver1] Remove InstRWs for BLENDVPS/PD | Craig Topper | 2018-04-08 | 1 | -12/+0
  Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data.
  The patterns were missing the VEX forms, which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx".
  Reviewers: RKSimon, GGanesh
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D44841
  llvm-svn: 329538
* [PowerPC] Change std::sort to llvm::sort in response to r327219 | Mandeep Singh Grang | 2018-04-08 | 1 | -1/+1
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key.
  To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: hfinkel, RKSimon
  Reviewed By: RKSimon
  Subscribers: nemanjai, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44870
  llvm-svn: 329535
* [X86] Change std::sort to llvm::sort in response to r327219 | Mandeep Singh Grang | 2018-04-08 | 1 | -4/+4
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key.
  To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: chandlerc, craig.topper, RKSimon
  Reviewed By: chandlerc, craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D44874
  llvm-svn: 329534