summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to iselCraig Topper2018-06-207-166/+264
| | | | | | | | | | I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173
* [llvm-mca][X86] Teach how to identify register writes that implicitly clear ↵Andrea Di Biagio2018-06-201-1/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | the upper portion of a super-register. This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register. On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register. This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly. I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise. Differential Revision: https://reviews.llvm.org/D48225 llvm-svn: 335113
* [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.Roman Lebedev2018-06-201-1/+26
| | | | | | | | | | | | | | | | | | | | | | | Summary: First off: i do not have any access to that processor, so this is purely theoretical, no benchmarks. I have been looking into b**d**ver2 scheduling profile, and while cross-referencing the existing b**t**ver2, znver1 profiles, and the reference docs (`Software Optimization Guide for AMD Family {15,16,17}h Processors`), i have noticed that only b**t**ver2 scheduling profile specifies these. Also, there is no mca test coverage. Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb Reviewed By: GGanesh Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D47676 llvm-svn: 335099
* [X86] Add sched class WriteLAHFSAHF and fix values.Clement Courbet2018-06-2011-16/+17
| | | | | | | | | | | | | | | Summary: I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB. Atom is from Agner and SLM is a guess. I've left AMD processors alone. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48079 llvm-svn: 335097
* [X86] Use binary search of the EVEX->VEX static tables instead of populating ↵Craig Topper2018-06-201-38/+29
| | | | | | | | | | | | | | | | | | | | | two DenseMaps for lookups Summary: After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures. Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table. I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time. Reviewers: RKSimon, spatel, chandlerc Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48312 llvm-svn: 335092
* [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename ↵Jessica Paquette2018-06-192-10/+2
| | | | | | | | | | | | insertOutlinerEpilogue insertOutlinerPrologue was not used by any target, and prologue-esque code was beginning to appear in insertOutlinerEpilogue. Refactor that into one function, buildOutlinedFrame. This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue. llvm-svn: 335076
* [X86] Initialize FMA3Info directly in its constructor instead of relying on ↵Craig Topper2018-06-192-27/+2
| | | | | | | | | | | | std::call_once FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object. So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff. Differential Revision: https://reviews.llvm.org/D48194 llvm-svn: 335064
* [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ↵Craig Topper2018-06-191-10/+10
| | | | | | | | ceil/floor/nearbyint/rint/trunc. Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE. llvm-svn: 335062
* [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patternsMikhail Dvoretckii2018-06-193-5/+118
| | | | | | | | | | | This patch handles back-end folding of generic patterns created by lowering the X86 rounding intrinsics to native IR in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation. Differential Revision: https://reviews.llvm.org/D45203 llvm-svn: 335037
* Test commit.Mikhail Dvoretckii2018-06-191-1/+1
| | | | llvm-svn: 335026
* [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td ↵Craig Topper2018-06-192-65/+128
| | | | | | | | | | | | | | files. Remove remaining manual table entries from the tablegen emitter. This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information. Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions. Finally, remove the manual table from the emitter. This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap. llvm-svn: 335018
* [X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have ↵Craig Topper2018-06-192-7/+12
| | | | | | | | | | | | | | VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0. EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity. The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1. This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0. This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler. llvm-svn: 335017
* [X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCICraig Topper2018-06-191-16/+6
| | | | | | The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit. llvm-svn: 335015
* [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass.Craig Topper2018-06-181-1/+1
| | | | | | The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd? llvm-svn: 334994
* Tidy comment language and explanation.Eric Christopher2018-06-181-5/+5
| | | | llvm-svn: 334990
* Pull non-lazy stub table emission into a separate function alongsideEric Christopher2018-06-181-41/+47
| | | | | | | the individual stub creation to increase readability a bit in the non-object file format specific function. llvm-svn: 334989
* Add return statements to make it clear that all of these are mutually ↵Eric Christopher2018-06-181-0/+4
| | | | | | | | exclusive conditions. else if would have worked just as well, but this keeps the original readability a bit more clear. llvm-svn: 334988
* [X86] Encode the EVEX2VEX exception list information in .td files instead of ↵Craig Topper2018-06-182-15/+38
| | | | | | | | the emitter source. Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility. llvm-svn: 334971
* [X86][BtVer2] Flag AVX2+ scheduler classes as unsupportedSimon Pilgrim2018-06-181-20/+20
| | | | | | | | Jaguar only supports up to AVX1 Differential Revision: https://reviews.llvm.org/D48274 llvm-svn: 334947
* [X86] Fix NOOP sched overrides on BDW/HSW/SKL.Clement Courbet2018-06-183-6/+3
| | | | | | | | | | | | Summary: Noop certainly does not use resources. Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits, gchatelet Differential Revision: https://reviews.llvm.org/D48028 llvm-svn: 334927
* [X86] Create X86InstrFMA3Group objects fully in a static table instead of on ↵Craig Topper2018-06-182-227/+146
| | | | | | | | | | | | | | the heap. NFCI Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables. Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0. This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted. This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method. llvm-svn: 334925
* [X86] Add '.s' aliases to the assembler for the various redundant move ↵Craig Topper2018-06-184-6/+77
| | | | | | | | | | encodings to match gas and our EVEX instructions. We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions. Also remove the vpextrw.s EVEX alias. That's not something gas implements. llvm-svn: 334922
* [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves ↵Craig Topper2018-06-181-45/+80
| | | | | | | | | | with reversed operands to InstAliases. The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser. Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported. llvm-svn: 334920
* [X86] Add all the FMA instructions direclty to the load folding table ↵Craig Topper2018-06-172-100/+544
| | | | | | | | instead of proxying through X86InstrFMA3Info. These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table. llvm-svn: 334915
* [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to ↵Craig Topper2018-06-172-14/+12
| | | | | | | | | | simplify the hasSingleUseFromRoot handling. Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output. This fixed at least fma-scalar-memfold.ll to succed without the peephole pass. llvm-svn: 334908
* [X86] More additions to the load folding tables based on the autogenerated ↵Craig Topper2018-06-165-33/+814
| | | | | | | | tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. llvm-svn: 334898
* [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly ↵Craig Topper2018-06-161-0/+12
| | | | | | | | parser. These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority. llvm-svn: 334897
* [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple ↵Craig Topper2018-06-161-2/+2
| | | | | | | | | | instructions. VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode. VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode. llvm-svn: 334896
* [X86] Add more instructions to the hasUndefRegUpdate list.Craig Topper2018-06-151-0/+31
| | | | | | Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future. llvm-svn: 334867
* [X86] Lowering sqrt intrinsics to native IRTomasz Krupa2018-06-153-58/+59
| | | | | | | | | | | | | | Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849
* [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.Craig Topper2018-06-151-4/+10
| | | | | | An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too. llvm-svn: 334848
* Revert r334802 "[X86] Prevent folding stack reloads with instructions that ↵Craig Topper2018-06-151-7/+4
| | | | | | | | have an undefined register update." There's a typo causing the build to fail. llvm-svn: 334803
* [X86] Prevent folding stack reloads with instructions that have an undefined ↵Craig Topper2018-06-151-4/+7
| | | | | | | | register update. We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency. llvm-svn: 334802
* [X86] Add more instructions to the memory folding tables using the ↵Craig Topper2018-06-151-1/+220
| | | | | | | | | | autogenerated table as a guide. I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions. There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass. llvm-svn: 334800
* [X86] Add 'Z' to the internal names of various EVEX instructions for overall ↵Craig Topper2018-06-153-76/+76
| | | | | | consistency. llvm-svn: 334785
* [x86] be more selective about converting 'and' to shuffle (PR37749)Sanjay Patel2018-06-141-0/+6
| | | | | | | | | | | | | | | | | isVectorClearMaskLegal() is the TLI hook used by the generic DAGCombiner::XformToShuffleWithZero(). We've grown to accomodate/expect this transform to shuffle (disabling it more generally results in many regressions). So I'm narrowly excluding the 256-bit types that clearly are not worthwhile for AVX1. I think in most cases we are able to recover by converting the shuffle back into 'and' ops, but the cases in: https://bugs.llvm.org/show_bug.cgi?id=37749 ...show that there are cracks. llvm-svn: 334759
* [X86] Fix stale comment in folding tables.Craig Topper2018-06-141-3/+3
| | | | llvm-svn: 334758
* [X86] Add more vector instructions to the memory folding table using the ↵Craig Topper2018-06-141-1/+215
| | | | | | | | autogenerated table as a guide. The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX. llvm-svn: 334728
* [X86] Remove '128' from the internal name of some scalar FP instructions to ↵Craig Topper2018-06-141-8/+8
| | | | | | be consistent with other scalar instructions. llvm-svn: 334727
* [X86] Disable load unfolding for a bunch of instruction where unfolding ↵Craig Topper2018-06-141-16/+16
| | | | | | | | would increase the size of the load. Found by an audit of the manual table vs the autogenerated table. llvm-svn: 334726
* [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.Craig Topper2018-06-142-15/+14
| | | | | | Some of these instructions are already in the manual folding table so we should have them in the auto table too. llvm-svn: 334725
* [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes ↵Craig Topper2018-06-144-43/+178
| | | | | | | | | | | | | | | | | | | | | and isel patterns (PR37551) Summary: The tests in: https://bugs.llvm.org/show_bug.cgi?id=37751 ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes. This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll Reviewers: RKSimon, gbedwell, spatel Reviewed By: spatel Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D47993 llvm-svn: 334685
* [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct ↵Craig Topper2018-06-131-4/+4
| | | | | | | | load folding table. They were in the operand 1 folding table, but their foldable operand is operand 2. llvm-svn: 334648
* [x86] eliminate even more sign-bit tests with vector selectSanjay Patel2018-06-131-5/+4
| | | | | | | | | | | | | | | | | | | This shortcoming was noted in D47330, and the test diffs show we already had other examples where we failed to fold to a SHRUNKBLEND: /// Dynamic (non-constant condition) vector blend where only the sign bits /// of the condition elements are used. This is used to enforce that the /// condition mask is not valid for generic VSELECT optimizations. This patch implements an idea from D48043 and would obsolete that patch because it catches more cases (notable the AVX1 case that was missed there). All we're doing is allowing the existing transform to fire more often by removing the post-legalize constraint. All of the relevant feature checks and other predicates are left as-is. Differential Revision: https://reviews.llvm.org/D48078 llvm-svn: 334592
* [X86] Remove masking from avx512vbmi2 concat and shift by immediate ↵Craig Topper2018-06-132-23/+19
| | | | | | intrinsics. Use select in IR instead. llvm-svn: 334576
* [X86] Mark all instructions that have masked store semantics with ↵Craig Topper2018-06-131-6/+8
| | | | | | | | NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator. Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore. llvm-svn: 334563
* [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table.Craig Topper2018-06-131-2/+4
| | | | llvm-svn: 334562
* [X86] Remove mayLoad flag from AVX512 truncating store instructions.Craig Topper2018-06-121-2/+1
| | | | llvm-svn: 334529
* [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFFReid Kleckner2018-06-123-112/+0
| | | | | | | | | | | | All COFF targets should use @IMGREL32 relocations for symbol differences against __ImageBase. Do the same for getSectionForConstant, so that immediates lowered to globals get merged across TUs. Patch by Chris January Differential Revision: https://reviews.llvm.org/D47783 llvm-svn: 334523
* [MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use ↵Fangrui Song2018-06-121-1/+7
| | | | | | | | | | | | | | | | | | | R_X86_64_GOTPC32 instead of R_X86_64_PC32 Summary: This is similar to D46319 (ARM). x86-64 psABI p40 gives an example: leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32. Reviewers: javed.absar, echristo Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar Differential Revision: https://reviews.llvm.org/D47507 llvm-svn: 334515
OpenPOWER on IntegriCloud