| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999
Differential Revision: https://reviews.llvm.org/D45084
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329187
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
They were added in r285274, in what looks like a merge mishap.
AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that
revision.
Also sort the tablegen lines to make this easier to spot in the future.
llvm-svn: 329178
 | 
| | 
| 
| 
|  | 
llvm-svn: 329170
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.
This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.
Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.
v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI
Change-Id: I099f309e0a394082a5901ea196c3967afb867f04
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44939
llvm-svn: 329166
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.
There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".
Fixes a bug encountered in Nier: Automata.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40547
llvm-svn: 329164
 | 
| | 
| 
| 
| 
| 
|  | 
Differential Revision: https://reviews.llvm.org/D44573
llvm-svn: 329163
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Patch https://reviews.llvm.org/D44467 implements conversion of invalid
vmov instructions into valid ones. It turned out that some valid
instructions also get converted, for example
  vmov.i64 d2, #0xff00ff00ff00ff00 ->
  vmov.i16 d2, #0xff00
Such behavior is incorrect because according to the ARM ARM section
F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD
instructions, "On assembly, the data type must be matched in the table
if possible."
This patch fixes the isNEONmovReplicate check so that the above
instruction is not modified any more.
Reviewers: rengolin, olista01
Reviewed By: rengolin
Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D44678
llvm-svn: 329158
 | 
| | 
| 
| 
| 
| 
|  | 
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.
llvm-svn: 329154
 | 
| | 
| 
| 
|  | 
llvm-svn: 329153
 | 
| | 
| 
| 
| 
| 
|  | 
UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC
llvm-svn: 329152
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
X86ISelDAGToDAG.cpp. NFC
These are promoted to i16/i32 multiplies by a DAG combine.
llvm-svn: 329147
 | 
| | 
| 
| 
|  | 
llvm-svn: 329146
 | 
| | 
| 
| 
|  | 
llvm-svn: 329140
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
Reviewers: pcc, vitalybuka
Reviewed By: pcc
Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44802
llvm-svn: 329139
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Attribute::NoRedZone
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
llvm-svn: 329134
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
llvm-svn: 329131
 | 
| | 
| 
| 
| 
| 
|  | 
Fix typo and simplify matching expression.
llvm-svn: 329130
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Move the check canPeel() to Hexagon Target before setting PeelCount.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329129
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.
This was committed by mistake.
llvm-svn: 329119
 | 
| | 
| 
| 
|  | 
llvm-svn: 329114
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Specifying the HVX vector length should be done via the -mhvx-length
option.
llvm-svn: 329079
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
 - The total number of physical registers.
 - Which target registers are accessible through the register file.
 - The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
  def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes.  An empty set of
register classes means: this register file models all the registers specified by
the Target.  For each register class, users can specify an optional register
cost. By default, register costs default to 1.  A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
 1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
 2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
 3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
 | 
| | 
| 
| 
| 
| 
|  | 
Reorder entries added in my previous commit (rL328969) to keep alphabetical order.
llvm-svn: 329064
 | 
| | 
| 
| 
| 
| 
|  | 
TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4.
llvm-svn: 329049
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
  (1). In code, just swap is not enough, ConditionalCode CC
       should also be swapped, otherwise incorrect code will
       be generated.
  (2). The ConditionalCode swap should be subject to
       getHasJmpExt(). If getHasJmpExt() is False, certain
       conditional codes will not be supported and swap
       may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847
Differential Revision: https://reviews.llvm.org/D45097
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Fixed a bug which caused Tablegen crash.
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
 | 
| | 
| 
| 
|  | 
llvm-svn: 328981
 | 
| | 
| 
| 
|  | 
llvm-svn: 328978
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837
Differential Revision: https://reviews.llvm.org/D45085
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
P9InstrResources.td
This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.
llvm-svn: 328969
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.
llvm-svn: 328962
 | 
| | 
| 
| 
| 
| 
|  | 
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/
llvm-svn: 328961
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.
llvm-svn: 328960
 | 
| | 
| 
| 
| 
| 
|  | 
It was being inadvertently defaulted to an FADD scheduler class.
llvm-svn: 328959
 | 
| | 
| 
| 
| 
| 
|  | 
versions.
llvm-svn: 328958
 | 
| | 
| 
| 
|  | 
llvm-svn: 328956
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
prefixed with HW.
The tablegen files all share a namespace so we shouldn't use a generic names in a specific scheduler model.
llvm-svn: 328955
 | 
| | 
| 
| 
|  | 
llvm-svn: 328954
 | 
| | 
| 
| 
| 
| 
|  | 
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This register is reserved as a platform register on Fuchsia.
Differential Revision: https://reviews.llvm.org/D45105
llvm-svn: 328950
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.
llvm-svn: 328946
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
 |