| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code.
I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
Differential Revision: https://reviews.llvm.org/D43608
llvm-svn: 335173
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.
On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits". Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register. The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.
I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2. Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.
Differential Revision: https://reviews.llvm.org/D48225
llvm-svn: 335113
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
First off: i do not have any access to that processor,
so this is purely theoretical, no benchmarks.
I have been looking into b**d**ver2 scheduling profile, and while cross-referencing
the existing b**t**ver2, znver1 profiles, and the reference docs
(`Software Optimization Guide for AMD Family {15,16,17}h Processors`),
i have noticed that only b**t**ver2 scheduling profile specifies these.
Also, there is no mca test coverage.
Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb
Reviewed By: GGanesh
Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D47676
llvm-svn: 335099
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB.
Atom is from Agner and SLM is a guess.
I've left AMD processors alone.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48079
llvm-svn: 335097
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
two DenseMaps for lookups
Summary:
After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures.
Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table.
I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time.
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48312
llvm-svn: 335092
|
| |
|
|
|
|
|
|
|
|
|
|
| |
insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.
This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.
llvm-svn: 335076
|
| |
|
|
|
|
|
|
|
|
|
|
| |
std::call_once
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.
So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.
Differential Revision: https://reviews.llvm.org/D48194
llvm-svn: 335064
|
| |
|
|
|
|
|
|
| |
ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.
llvm-svn: 335062
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.
Differential Revision: https://reviews.llvm.org/D45203
llvm-svn: 335037
|
| |
|
|
| |
llvm-svn: 335026
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.
Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.
Finally, remove the manual table from the emitter.
This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.
llvm-svn: 335018
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.
The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.
This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.
This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.
llvm-svn: 335017
|
| |
|
|
|
|
| |
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.
llvm-svn: 335015
|
| |
|
|
|
|
| |
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?
llvm-svn: 334994
|
| |
|
|
| |
llvm-svn: 334990
|
| |
|
|
|
|
|
| |
the individual stub creation to increase readability a bit in the
non-object file format specific function.
llvm-svn: 334989
|
| |
|
|
|
|
|
|
| |
exclusive conditions.
else if would have worked just as well, but this keeps the original readability a bit more clear.
llvm-svn: 334988
|
| |
|
|
|
|
|
|
| |
the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
llvm-svn: 334971
|
| |
|
|
|
|
|
|
| |
Jaguar only supports up to AVX1
Differential Revision: https://reviews.llvm.org/D48274
llvm-svn: 334947
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Noop certainly does not use resources.
Reviewers: RKSimon, craig.topper, andreadb
Subscribers: gbedwell, llvm-commits, gchatelet
Differential Revision: https://reviews.llvm.org/D48028
llvm-svn: 334927
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
the heap. NFCI
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.
Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.
This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.
This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.
llvm-svn: 334925
|
| |
|
|
|
|
|
|
|
|
| |
encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.
Also remove the vpextrw.s EVEX alias. That's not something gas implements.
llvm-svn: 334922
|
| |
|
|
|
|
|
|
|
|
| |
with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.
llvm-svn: 334920
|
| |
|
|
|
|
|
|
| |
instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.
llvm-svn: 334915
|
| |
|
|
|
|
|
|
|
|
| |
simplify the hasSingleUseFromRoot handling.
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.
This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.
llvm-svn: 334908
|
| |
|
|
|
|
|
|
| |
tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.
llvm-svn: 334898
|
| |
|
|
|
|
|
|
| |
parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.
llvm-svn: 334897
|
| |
|
|
|
|
|
|
|
|
| |
instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
llvm-svn: 334896
|
| |
|
|
|
|
| |
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.
llvm-svn: 334867
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Complementary patch to lowering sqrt intrinsics in Clang.
Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, mike.dvoretsky, llvm-commits
Differential Revision: https://reviews.llvm.org/D41599
llvm-svn: 334849
|
| |
|
|
|
|
| |
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.
llvm-svn: 334848
|
| |
|
|
|
|
|
|
| |
have an undefined register update."
There's a typo causing the build to fail.
llvm-svn: 334803
|
| |
|
|
|
|
|
|
| |
register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.
llvm-svn: 334802
|
| |
|
|
|
|
|
|
|
|
| |
autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.
There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
llvm-svn: 334800
|
| |
|
|
|
|
| |
consistency.
llvm-svn: 334785
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().
We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly
are not worthwhile for AVX1.
I think in most cases we are able to recover by converting
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.
llvm-svn: 334759
|
| |
|
|
| |
llvm-svn: 334758
|
| |
|
|
|
|
|
|
| |
autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.
llvm-svn: 334728
|
| |
|
|
|
|
| |
be consistent with other scalar instructions.
llvm-svn: 334727
|
| |
|
|
|
|
|
|
| |
would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.
llvm-svn: 334726
|
| |
|
|
|
|
| |
Some of these instructions are already in the manual folding table so we should have them in the auto table too.
llvm-svn: 334725
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
llvm-svn: 334685
|
| |
|
|
|
|
|
|
| |
load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.
llvm-svn: 334648
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shortcoming was noted in D47330, and the test diffs show we already
had other examples where we failed to fold to a SHRUNKBLEND:
/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.
This patch implements an idea from D48043 and would obsolete that patch
because it catches more cases (notable the AVX1 case that was missed there).
All we're doing is allowing the existing transform to fire more often by
removing the post-legalize constraint. All of the relevant feature checks
and other predicates are left as-is.
Differential Revision: https://reviews.llvm.org/D48078
llvm-svn: 334592
|
| |
|
|
|
|
| |
intrinsics. Use select in IR instead.
llvm-svn: 334576
|
| |
|
|
|
|
|
|
| |
NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.
llvm-svn: 334563
|
| |
|
|
| |
llvm-svn: 334562
|
| |
|
|
| |
llvm-svn: 334529
|
| |
|
|
|
|
|
|
|
|
|
|
| |
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.
Patch by Chris January
Differential Revision: https://reviews.llvm.org/D47783
llvm-svn: 334523
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:
leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc
GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.
Reviewers: javed.absar, echristo
Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar
Differential Revision: https://reviews.llvm.org/D47507
llvm-svn: 334515
|