summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU][MC] Added support of 3-element addresses for MIMG instructionsDmitry Preobrazhensky2018-04-041-1/+9
| | | | | | | | | See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187
* Sort targetgen calls in lib/Target/*/CMakeLists.Nico Weber2018-04-0418-84/+92
| | | | | | | | | | | Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181
* Remove duplicate tablegen lines from AVR target.Nico Weber2018-04-041-6/+2
| | | | | | | | | | They were added in r285274, in what looks like a merge mishap. AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that revision. Also sort the tablegen lines to make this easier to spot in the future. llvm-svn: 329178
* Make helpers static. NFC.Benjamin Kramer2018-04-041-3/+3
| | | | llvm-svn: 329170
* AMDGPU: Dimension-aware image intrinsicsNicolai Haehnle2018-04-046-3/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166
* AMDGPU: Fix copying i1 value out of loop with non-uniform exitNicolai Haehnle2018-04-044-1/+103
| | | | | | | | | | | | | | | | | | | | | | | | Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164
* [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)John Brawn2018-04-041-0/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163
* [ARM] Do not convert some vmov instructionsMikhail Maltsev2018-04-041-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch https://reviews.llvm.org/D44467 implements conversion of invalid vmov instructions into valid ones. It turned out that some valid instructions also get converted, for example vmov.i64 d2, #0xff00ff00ff00ff00 -> vmov.i16 d2, #0xff00 Such behavior is incorrect because according to the ARM ARM section F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD instructions, "On assembly, the data type must be matched in the table if possible." This patch fixes the isNEONmovReplicate check so that the above instruction is not modified any more. Reviewers: rengolin, olista01 Reviewed By: rengolin Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D44678 llvm-svn: 329158
* [X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ.Craig Topper2018-04-043-18/+10
| | | | | | These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154
* [X86] Use loadi16/loadi32 predicates in multiply patternsCraig Topper2018-04-041-9/+9
| | | | llvm-svn: 329153
* [X86] Remove more dead code left over from the handling of i8/i16 ↵Craig Topper2018-04-041-21/+0
| | | | | | UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152
* [X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from ↵Craig Topper2018-04-041-12/+0
| | | | | | | | X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147
* [X86] Remove some code that was only needed when i1 was a legal type. NFCCraig Topper2018-04-041-6/+0
| | | | llvm-svn: 329146
* Fix bad #include path in r329139Vlad Tsyrklevich2018-04-041-1/+1
| | | | llvm-svn: 329140
* Add the ShadowCallStack passVlad Tsyrklevich2018-04-044-0/+334
| | | | | | | | | | | | | | | | | | | | Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139
* [MachineOutliner] Test for X86FI->getUsesRedZone() as well as ↵Jessica Paquette2018-04-031-1/+5
| | | | | | | | | | | | | | | | Attribute::NoRedZone This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function *truly* uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134
* [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to ↵Farhana Aleen2018-04-031-1/+1
| | | | | | | | | | | | | | | | min3/max3. Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131
* [AArch64] Adjust the cost model for Exynos M3Evandro Menezes2018-04-031-12/+12
| | | | | | Fix typo and simplify matching expression. llvm-svn: 329130
* [Hexagon] peel loops with runtime small trip countsIkhlas Ajbar2018-04-031-1/+2
| | | | | | | | Move the check canPeel() to Hexagon Target before setting PeelCount. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329129
* [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfoJessica Paquette2018-04-033-7/+27
| | | | | | | | | | | | This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise. This removes the requirement to pass -mno-red-zone when outlining for AArch64. https://reviews.llvm.org/D45189 llvm-svn: 329120
* Revert "MSG"Farhana Aleen2018-04-031-1/+1
| | | | | | | | This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119
* MSGFarhana Aleen2018-04-031-1/+1
| | | | llvm-svn: 329114
* [CodeGen]Add NoVRegs property on PostRASink and ShrinkWrapJun Bum Lim2018-04-032-3/+4
| | | | | | | | | | | | | | | | | Summary: This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs property, so now the MachineFunctionPass can enforce this check. These passes are disabled in NVPTX & WebAssembly. Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier Reviewed By: dschuff, thegameg Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D45183 llvm-svn: 329095
* [Hexagon] Remove -mhvx-double and the corresponding subtarget featureKrzysztof Parzyszek2018-04-033-50/+32
| | | | | | | Specifying the HVX vector length should be done via the -mhvx-length option. llvm-svn: 329079
* [MC][Tablegen] Allow the definition of processor register files in the ↵Andrea Di Biagio2018-04-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067
* [PowerPC] reorder entries in P9InstrResources.td in alphabetical order; NFCHiroshi Inoue2018-04-031-1/+1
| | | | | | Reorder entries added in my previous commit (rL328969) to keep alphabetical order. llvm-svn: 329064
* [X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCICraig Topper2018-04-033-33/+37
| | | | | | TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4. llvm-svn: 329049
* bpf: fix incorrect SELECT_CC loweringYonghong Song2018-04-031-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues: (1). In code, just swap is not enough, ConditionalCode CC should also be swapped, otherwise incorrect code will be generated. (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and swap may generate incorrect code. The original goal for this patch is to optimize jmp operations which does not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with -mcpu=v2 flag. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 329043
* peel loops with runtime small trip countsIkhlas Ajbar2018-04-031-0/+7
| | | | | | | | | | For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042
* [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16Dmitry Preobrazhensky2018-04-021-0/+8
| | | | | | | | | See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructionsDmitry Preobrazhensky2018-04-024-1/+204
| | | | | | | | | | | Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983
* [Hexagon] Clean up some code in HexagonAsmPrinter, NFCKrzysztof Parzyszek2018-04-022-48/+53
| | | | llvm-svn: 328981
* Revert r328975, it makes TableGen assert on the bots.Nico Weber2018-04-024-204/+1
| | | | llvm-svn: 328978
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructionsDmitry Preobrazhensky2018-04-024-1/+204
| | | | | | | | | See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346Lama Saba2018-04-024-0/+729
| | | | | | | | | | | | | | | | | If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973
* [PowerPC] fix assertion failure due to missing instruction in ↵Hiroshi Inoue2018-04-021-8/+4
| | | | | | | | P9InstrResources.td This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td. llvm-svn: 328969
* [X86][Silvermont] Use correct latency and throughput information for divide ↵Craig Topper2018-04-021-0/+115
| | | | | | | | and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962
* [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.Craig Topper2018-04-021-29/+28
| | | | | | Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961
* [X86] Correct the throughput for divide instructions in Sandy ↵Craig Topper2018-04-025-220/+318
| | | | | | | | Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960
* [X86] Fix the SchedRW for AVX512 shift instructions.Craig Topper2018-04-022-8/+14
| | | | | | It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959
* [X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX ↵Craig Topper2018-04-021-29/+19
| | | | | | versions. llvm-svn: 328958
* [X86] Add an itinerary to BTR64rr.Craig Topper2018-04-021-1/+2
| | | | llvm-svn: 328956
* [X86] Make sure all the classes declare in the Haswell scheduler model are ↵Craig Topper2018-04-021-66/+66
| | | | | | | | prefixed with HW. The tablegen files all share a namespace so we shouldn't use a generic names in a specific scheduler model. llvm-svn: 328955
* [X86] Give VINSERTPS the same intinerary as INSERTPS.Craig Topper2018-04-021-3/+4
| | | | llvm-svn: 328954
* [X86] Cleanup ADCX/ADOX instruction definitions.Craig Topper2018-04-011-29/+45
| | | | | | Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW. llvm-svn: 328952
* [AArch64] Reserve x18 register on FuchsiaPetr Hosek2018-04-011-2/+2
| | | | | | | | This register is reserved as a platform register on Fuchsia. Differential Revision: https://reviews.llvm.org/D45105 llvm-svn: 328950
* [X86] Give ADC8/16/32/64mi the same scheduling information as ↵Craig Topper2018-04-014-22/+10
| | | | | | | | ADC8/16/32/64mr and SBB8/16/32/64mi. It doesn't make a lot of sense that it would be different. llvm-svn: 328946
* [x86] Correct the operand structure of the ADOX instruction.Chandler Carruth2018-04-011-19/+11
| | | | | | | | | | | | | | | This also moves to define it in the same way as ADCX which seems to use constraints a bit better. This is pulled out of the review for reducing the use of popf for restoring EFLAGS, but is independent. There are still more problems with our definitions for these instructions that Craig is going to look at but this is at least less broken and he can start from this to improve them more fully. Thanks to Craig for the review here. llvm-svn: 328945
* [x86] Expose more of the condition conversion routines in the public APIChandler Carruth2018-04-012-7/+13
| | | | | | | | for X86's instruction information. I've now got a second patch under review that needs these same APIs. This bit is nicely orthogonal and obvious, so landing it. NFC. llvm-svn: 328944
* AMDGPU: Make isIntrinsicSourceOfDivergence table-drivenNicolai Haehnle2018-04-012-47/+63
| | | | | | | | | | | | | | | | Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand. Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44938 llvm-svn: 328939
OpenPOWER on IntegriCloud