path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. | Farhana Aleen | 2018-04-03 | 1 | -1/+1
    Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
    Author: FarhanaAleen
    Reviewed By: arsenm
    Subscribers: llvm-commits, AMDGPU
    Differential Revision: https://reviews.llvm.org/D45219
    llvm-svn: 329131
* [AArch64] Adjust the cost model for Exynos M3 | Evandro Menezes | 2018-04-03 | 1 | -12/+12
    Fix typo and simplify matching expression.
    llvm-svn: 329130
* [Hexagon] peel loops with runtime small trip counts | Ikhlas Ajbar | 2018-04-03 | 1 | -1/+2
    Move the check canPeel() to Hexagon Target before setting PeelCount.
    Differential Revision: https://reviews.llvm.org/D44880
    llvm-svn: 329129
* [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo | Jessica Paquette | 2018-04-03 | 3 | -7/+27
    This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise.
    This removes the requirement to pass -mno-red-zone when outlining for AArch64.
    https://reviews.llvm.org/D45189
    llvm-svn: 329120
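The three-state result described above (true, false, or no value) can be sketched with `std::optional<bool>`. This is only an illustration under stated assumptions: `FunctionInfoSketch` and `safeToOutlineFrom` are hypothetical names, not the classes or functions touched by the patch, and the conservative "unknown means do not outline" policy is an assumption rather than something the commit message states.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a per-function info object; only the
// red-zone bookkeeping is modeled.
class FunctionInfoSketch {
  // Empty: not yet known. true/false: known to use / not use a red zone.
  std::optional<bool> HasRedZone;

public:
  std::optional<bool> hasRedZone() const { return HasRedZone; }
  void setHasRedZone(bool Value) { HasRedZone = Value; }
};

// Assumed policy: treat a function as safe to outline from only when it
// is known NOT to use a red zone. "Unknown" is handled conservatively.
bool safeToOutlineFrom(const FunctionInfoSketch &FI) {
  std::optional<bool> RedZone = FI.hasRedZone();
  return RedZone.has_value() && !*RedZone;
}

// Small self-check showing the three states.
void usageExample() {
  FunctionInfoSketch FI;
  assert(!safeToOutlineFrom(FI)); // unknown
  FI.setHasRedZone(true);
  assert(!safeToOutlineFrom(FI)); // known to use a red zone
  FI.setHasRedZone(false);
  assert(safeToOutlineFrom(FI));  // known not to use one
}
```

Keeping "unknown" distinct from "false" is what lets a caller make the decision per function instead of relying on a global flag.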
* Revert "MSG" | Farhana Aleen | 2018-04-03 | 1 | -1/+1
    This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.
    This was committed by mistake.
    llvm-svn: 329119
* MSG | Farhana Aleen | 2018-04-03 | 1 | -1/+1
    llvm-svn: 329114
* [CodeGen] Add NoVRegs property on PostRASink and ShrinkWrap | Jun Bum Lim | 2018-04-03 | 2 | -3/+4
    Summary: This change declares that PostRAMachineSinking and ShrinkWrap require the NoVRegs property, so now the MachineFunctionPass can enforce this check. These passes are disabled in NVPTX & WebAssembly.
    Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
    Reviewed By: dschuff, thegameg
    Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
    Differential Revision: https://reviews.llvm.org/D45183
    llvm-svn: 329095
* [Hexagon] Remove -mhvx-double and the corresponding subtarget feature | Krzysztof Parzyszek | 2018-04-03 | 3 | -50/+32
    Specifying the HVX vector length should be done via the -mhvx-length option.
    llvm-svn: 329079
* [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca | Andrea Di Biagio | 2018-04-03 | 1 | -0/+10
    This patch allows the description of register files in processor scheduling models. This addresses PR36662.

    A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows specifying:
     - The total number of physical registers.
     - Which target registers are accessible through the register file.
     - The cost of allocating a register at the register renaming stage.

    Example (from this patch - see file X86/X86ScheduleBtVer2.td):

      def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>

    Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register.

    The syntax allows specifying an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. Register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers".

    This patch is structured in two parts.

    * Part 1 - MC/Tablegen *

    The first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make it easier for targets to get rid of the extra processor information if they don't want it.

    * Part 2 - llvm-mca related *

    The second part of this patch is related to changes to llvm-mca. The main differences are:
     1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at the register renaming stage.
     2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction.
     3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor.

    The effect of point 3. is also visible in tests register-files-[1..5].s.

    Differential Revision: https://reviews.llvm.org/D44980
    llvm-svn: 329067
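The cost accounting described in this commit (a per-class register cost, a default cost of 1, and 0 meaning an unbounded file) can be sketched as follows. This is a minimal illustration, not the llvm-mca implementation: `RegisterFileSketch`, `costOf`, `tryAllocate`, and `release` are hypothetical names, and register classes are identified simply by their index in the cost list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one register file, e.g. the FpuPRF example
// RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>.
struct RegisterFileSketch {
  unsigned NumPhysRegs;        // 0 means "unbounded number of registers"
  std::vector<unsigned> Costs; // cost per register class, by class index
  unsigned Used = 0;           // physical registers currently allocated

  // Classes without an explicit cost default to a cost of 1.
  unsigned costOf(unsigned ClassIdx) const {
    return ClassIdx < Costs.size() ? Costs[ClassIdx] : 1;
  }

  // Try to rename one register definition of the given class.
  bool tryAllocate(unsigned ClassIdx) {
    unsigned Cost = costOf(ClassIdx);
    if (NumPhysRegs != 0 && Used + Cost > NumPhysRegs)
      return false; // not enough physical registers left
    Used += Cost;
    return true;
  }

  // Give back the physical registers held by one definition.
  void release(unsigned ClassIdx) {
    unsigned Cost = costOf(ClassIdx);
    assert(Used >= Cost && "releasing more than was allocated");
    Used -= Cost;
  }
};
```

With the FpuPRF numbers, 36 YMM definitions (cost 2 each) exhaust all 72 physical registers, so a 37th allocation of any class fails until something is released.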
* [PowerPC] reorder entries in P9InstrResources.td in alphabetical order; NFC | Hiroshi Inoue | 2018-04-03 | 1 | -1/+1
    Reorder entries added in my previous commit (rL328969) to keep alphabetical order.
    llvm-svn: 329064
* [X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCI | Craig Topper | 2018-04-03 | 3 | -33/+37
    TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4.
    llvm-svn: 329049
* bpf: fix incorrect SELECT_CC lowering | Yonghong Song | 2018-04-03 | 1 | -6/+0
    Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues:
     (1). In the code, just swapping is not enough; ConditionalCode CC should also be swapped, otherwise incorrect code will be generated.
     (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and the swap may generate incorrect code.

    The original goal for this patch is to optimize jmp operations which do not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with the -mcpu=v2 flag.

    Signed-off-by: Yonghong Song <yhs@fb.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    llvm-svn: 329043
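Issue (1) above can be illustrated with a small standalone sketch: exchanging the two operands of a comparison keeps its meaning only if the condition code is exchanged as well. The `CondCode` enum, `swapOperands`, and `evaluate` below are hypothetical and cover signed comparisons only; they are not the BPF backend's code, and the sketch does not model the getHasJmpExt() restriction from issue (2).

```cpp
#include <cassert>

// Hypothetical, reduced set of condition codes.
enum class CondCode { EQ, NE, LT, GT, LE, GE };

// Returns the code that yields the same result once LHS and RHS are
// exchanged: (a < b) == (b > a), (a <= b) == (b >= a), and so on.
CondCode swapOperands(CondCode CC) {
  switch (CC) {
  case CondCode::LT: return CondCode::GT;
  case CondCode::GT: return CondCode::LT;
  case CondCode::LE: return CondCode::GE;
  case CondCode::GE: return CondCode::LE;
  case CondCode::EQ:
  case CondCode::NE: return CC; // symmetric, unchanged by a swap
  }
  return CC; // not reached for valid enum values
}

// Evaluates "LHS <CC> RHS" for signed operands.
bool evaluate(CondCode CC, long LHS, long RHS) {
  switch (CC) {
  case CondCode::EQ: return LHS == RHS;
  case CondCode::NE: return LHS != RHS;
  case CondCode::LT: return LHS < RHS;
  case CondCode::GT: return LHS > RHS;
  case CondCode::LE: return LHS <= RHS;
  case CondCode::GE: return LHS >= RHS;
  }
  return false; // not reached for valid enum values
}

void usageExample() {
  assert(evaluate(CondCode::LT, 1, 2));                // 1 < 2 holds
  assert(!evaluate(CondCode::LT, 2, 1));               // operands swapped only: wrong
  assert(evaluate(swapOperands(CondCode::LT), 2, 1));  // operands and code swapped: same
}
```

Swapping only the operands turns `1 < 2` into `2 < 1`, which is the kind of silent miscompile the commit message warns about.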
* peel loops with runtime small trip counts | Ikhlas Ajbar | 2018-04-03 | 1 | -0/+7
    For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count.
    Differential Revision: https://reviews.llvm.org/D44880
    llvm-svn: 329042
* [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16 | Dmitry Preobrazhensky | 2018-04-02 | 1 | -0/+8
    See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847
    Differential Revision: https://reviews.llvm.org/D45097
    Reviewers: artem.tamazov, arsenm, timcorringham
    llvm-svn: 328988
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions | Dmitry Preobrazhensky | 2018-04-02 | 4 | -1/+204
    Fixed a bug which caused a Tablegen crash.
    See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
    Differential Revision: https://reviews.llvm.org/D45085
    Reviewers: artem.tamazov, arsenm, timcorringham
    llvm-svn: 328983
* [Hexagon] Clean up some code in HexagonAsmPrinter, NFC | Krzysztof Parzyszek | 2018-04-02 | 2 | -48/+53
    llvm-svn: 328981
* Revert r328975, it makes TableGen assert on the bots. | Nico Weber | 2018-04-02 | 4 | -204/+1
    llvm-svn: 328978
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions | Dmitry Preobrazhensky | 2018-04-02 | 4 | -1/+204
    See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
    Differential Revision: https://reviews.llvm.org/D45085
    Reviewers: artem.tamazov, arsenm, timcorringham
    llvm-svn: 328975
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 | Lama Saba | 2018-04-02 | 4 | -0/+729
    If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
    A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles.

    This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.

    The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

    Differential revision: https://reviews.llvm.org/D41330
    Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
    llvm-svn: 328973
* [PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td | Hiroshi Inoue | 2018-04-02 | 1 | -8/+4
    This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.
    llvm-svn: 328969
* [X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. | Craig Topper | 2018-04-02 | 1 | -0/+115
    Data taken from Table 16-17 in the Intel Optimization Manual.
    llvm-svn: 328962
* [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide. | Craig Topper | 2018-04-02 | 1 | -29/+28
    Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/
    llvm-svn: 328961
* [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models. | Craig Topper | 2018-04-02 | 5 | -220/+318
    Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
    llvm-svn: 328960
* [X86] Fix the SchedRW for AVX512 shift instructions. | Craig Topper | 2018-04-02 | 2 | -8/+14
    It was being inadvertently defaulted to an FADD scheduler class.
    llvm-svn: 328959
* [X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. | Craig Topper | 2018-04-02 | 1 | -29/+19
    llvm-svn: 328958
* [X86] Add an itinerary to BTR64rr. | Craig Topper | 2018-04-02 | 1 | -1/+2
    llvm-svn: 328956
* [X86] Make sure all the classes declared in the Haswell scheduler model are prefixed with HW. | Craig Topper | 2018-04-02 | 1 | -66/+66
    The tablegen files all share a namespace so we shouldn't use generic names in a specific scheduler model.
    llvm-svn: 328955
* [X86] Give VINSERTPS the same itinerary as INSERTPS. | Craig Topper | 2018-04-02 | 1 | -3/+4
    llvm-svn: 328954
* [X86] Cleanup ADCX/ADOX instruction definitions. | Craig Topper | 2018-04-01 | 1 | -29/+45
    Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
    llvm-svn: 328952
* [AArch64] Reserve x18 register on Fuchsia | Petr Hosek | 2018-04-01 | 1 | -2/+2
    This register is reserved as a platform register on Fuchsia.
    Differential Revision: https://reviews.llvm.org/D45105
    llvm-svn: 328950
* [X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi. | Craig Topper | 2018-04-01 | 4 | -22/+10
    It doesn't make a lot of sense that it would be different.
    llvm-svn: 328946
* [x86] Correct the operand structure of the ADOX instruction. | Chandler Carruth | 2018-04-01 | 1 | -19/+11
    This also moves to define it in the same way as ADCX which seems to use constraints a bit better.

    This is pulled out of the review for reducing the use of popf for restoring EFLAGS, but is independent. There are still more problems with our definitions for these instructions that Craig is going to look at but this is at least less broken and he can start from this to improve them more fully.

    Thanks to Craig for the review here.
    llvm-svn: 328945
* [x86] Expose more of the condition conversion routines in the public API for X86's instruction information. | Chandler Carruth | 2018-04-01 | 2 | -7/+13
    I've now got a second patch under review that needs these same APIs. This bit is nicely orthogonal and obvious, so landing it. NFC.
    llvm-svn: 328944
* AMDGPU: Make isIntrinsicSourceOfDivergence table-driven | Nicolai Haehnle | 2018-04-01 | 2 | -47/+63
    Summary: This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand.
    Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
    Reviewers: arsenm, rampitec, b-sumner
    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
    Differential Revision: https://reviews.llvm.org/D44938
    llvm-svn: 328939
* AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics | Nicolai Haehnle | 2018-04-01 | 6 | -214/+91
    Summary: Avoids having to list all intrinsics manually.
    This is in preparation for the new dimension-aware image intrinsics, which I'd rather not have to list here by hand.
    Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
    Reviewers: arsenm, rampitec, b-sumner
    Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
    Differential Revision: https://reviews.llvm.org/D44937
    llvm-svn: 328938
* [X86] Don't check for folding into a store when deciding if we can promote an i16 mul. | Craig Topper | 2018-04-01 | 1 | -2/+4
    There's no RMW mul operation.
    llvm-svn: 328931
* [X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted. | Craig Topper | 2018-04-01 | 1 | -3/+14
    llvm-svn: 328930
* [X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored. | Craig Topper | 2018-04-01 | 1 | -4/+4
    llvm-svn: 328928
* [X86] Remove unneeded temporary variable. NFC | Craig Topper | 2018-04-01 | 1 | -6/+2
    This Promote flag was always set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
    llvm-svn: 328926
* [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries | Simon Pilgrim | 2018-03-31 | 1 | -2/+2
    llvm-svn: 328918
* Fix trailing whitespace. NFCI. | Simon Pilgrim | 2018-03-31 | 1 | -1/+1
    llvm-svn: 328917
* [X86] Add SchedRW for PMULLD | Craig Topper | 2018-03-31 | 11 | -46/+11
    Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

    This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

    I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

    Reviewers: RKSimon, GGanesh, courbet
    Reviewed By: RKSimon
    Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
    Differential Revision: https://reviews.llvm.org/D44972
    llvm-svn: 328914
* Fix a bunch of typos. NFC | Fangrui Song | 2018-03-30 | 6 | -10/+9
    llvm-svn: 328907
* [WebAssembly] Register wasm passes with the PassRegistry | Jacob Gravelle | 2018-03-30 | 21 | -4/+107
    Summary: This exposes WebAssembly passes for use on the command line (as arguments to -print-before and the like).
    Reviewers: dschuff, sunfish
    Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin
    Differential Revision: https://reviews.llvm.org/D45103
    llvm-svn: 328901
* [Hexagon] Reduce excessive indentation in .s output | Krzysztof Parzyszek | 2018-03-30 | 2 | -16/+8
    llvm-svn: 328898
* [Hexagon] Avoid creating invalid offsets in packetizer | Krzysztof Parzyszek | 2018-03-30 | 1 | -0/+3
    Two memory instructions with a dependency only on the address register between the two (the first one of them being post-increment) can be packetized together after the offset on the second was updated to the increment value. Make sure that the new offset is valid for the instruction.
    llvm-svn: 328897
* [X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. | Andrea Di Biagio | 2018-03-30 | 1 | -8/+17
    There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now.
    llvm-svn: 328892
* [X86][BtVer2] Fix the number of uOps for horizontal operations. | Andrea Di Biagio | 2018-03-30 | 1 | -0/+2
    llvm-svn: 328886
* [NVPTX] Enable StructuredCFG for NVPTX | Tim Shen | 2018-03-30 | 1 | -0/+10
    Summary: Make NVPTX require structured CFG. Added a temporary flag to "roll back" the behavior for easy deployment.
    Combined with D45008, this fixes several internal Nvidia GPU test failures that we suspect to be ptxas miscompiles (PR27738).
    Reviewers: jlebar
    Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
    Differential Revision: https://reviews.llvm.org/D45070
    llvm-svn: 328885
* [WebAssembly] Refactor tablegen for store instructions (NFC) | Derek Schuff | 2018-03-30 | 1 | -194/+115
    Summary: Add patterns similar to loads.
    Differential Revision: https://reviews.llvm.org/D45064
    llvm-svn: 328876