| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
llvm-svn: 329131
|
| |
|
|
|
|
| |
Fix typo and simplify matching expression.
llvm-svn: 329130
|
| |
|
|
|
|
|
|
| |
Move the check canPeel() to Hexagon Target before setting PeelCount.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329129
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
|
| |
|
|
|
|
|
|
| |
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.
This was committed by mistake.
llvm-svn: 329119
|
| |
|
|
| |
llvm-svn: 329114
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
|
| |
|
|
|
|
|
| |
Specifying the HVX vector length should be done via the -mhvx-length
option.
llvm-svn: 329079
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
|
| |
|
|
|
|
| |
Reorder entries added in my previous commit (rL328969) to keep alphabetical order.
llvm-svn: 329064
|
| |
|
|
|
|
| |
TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4.
llvm-svn: 329049
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
(1). In code, just swap is not enough, ConditionalCode CC
should also be swapped, otherwise incorrect code will
be generated.
(2). The ConditionalCode swap should be subject to
getHasJmpExt(). If getHasJmpExt() is False, certain
conditional codes will not be supported and swap
may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
|
| |
|
|
|
|
|
|
|
|
| |
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
|
| |
|
|
|
|
|
|
|
| |
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847
Differential Revision: https://reviews.llvm.org/D45097
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
|
| |
|
|
|
|
|
|
|
|
|
| |
Fixed a bug which caused Tablegen crash.
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
|
| |
|
|
| |
llvm-svn: 328981
|
| |
|
|
| |
llvm-svn: 328978
|
| |
|
|
|
|
|
|
|
| |
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
|
| |
|
|
|
|
|
|
| |
P9InstrResources.td
This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.
llvm-svn: 328969
|
| |
|
|
|
|
|
|
| |
and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.
llvm-svn: 328962
|
| |
|
|
|
|
| |
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/
llvm-svn: 328961
|
| |
|
|
|
|
|
|
| |
Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
llvm-svn: 328960
|
| |
|
|
|
|
| |
It was being inadvertently defaulted to an FADD scheduler class.
llvm-svn: 328959
|
| |
|
|
|
|
| |
versions.
llvm-svn: 328958
|
| |
|
|
| |
llvm-svn: 328956
|
| |
|
|
|
|
|
|
| |
prefixed with HW.
The tablegen files all share a namespace so we shouldn't use a generic names in a specific scheduler model.
llvm-svn: 328955
|
| |
|
|
| |
llvm-svn: 328954
|
| |
|
|
|
|
| |
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
|
| |
|
|
|
|
|
|
| |
This register is reserved as a platform register on Fuchsia.
Differential Revision: https://reviews.llvm.org/D45105
llvm-svn: 328950
|
| |
|
|
|
|
|
|
| |
ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.
llvm-svn: 328946
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
|
| |
|
|
|
|
|
|
| |
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Avoids having to list all intrinsics manually.
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44937
llvm-svn: 328938
|
| |
|
|
|
|
|
|
| |
an i16 mul.
There's no RMW mul operation.
llvm-svn: 328931
|
| |
|
|
|
|
| |
i16 RMW shifts and subtracts from being promoted.
llvm-svn: 328930
|
| |
|
|
|
|
| |
not being stored.
llvm-svn: 328928
|
| |
|
|
|
|
| |
This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
llvm-svn: 328926
|
| |
|
|
| |
llvm-svn: 328918
|
| |
|
|
| |
llvm-svn: 328917
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
|
| |
|
|
| |
llvm-svn: 328907
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This exposes WebAssembly passes for use on the command line (as
arguments to -print-before and the like).
Reviewers: dschuff, sunfish
Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D45103
llvm-svn: 328901
|
| |
|
|
| |
llvm-svn: 328898
|
| |
|
|
|
|
|
|
|
|
| |
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.
llvm-svn: 328897
|
| |
|
|
|
|
|
|
|
| |
VSQRT instructions.
There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.
llvm-svn: 328892
|
| |
|
|
| |
llvm-svn: 328886
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make NVPTX require structured CFG. Added a temporary flag to
"roll back" the behavior for easy deployment.
Combined with D45008, this fixes several internal Nvidia GPU test
failures that we suspect to be ptxas miscompiles (PR27738).
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45070
llvm-svn: 328885
|
| |
|
|
|
|
|
|
| |
Summary: Add patterns similar to loads.
Differential Revision: https://reviews.llvm.org/D45064
llvm-svn: 328876
|