path: root/llvm/test/CodeGen/X86
Commit message | Author | Age | Files | Lines
* [x86] add/regenerate complete checks; NFC | Sanjay Patel | 2017-09-04 | 3 | -78/+146
    llvm-svn: 312502
* [x86] add test for unnecessary cmp + masked store; NFC | Sanjay Patel | 2017-09-04 | 1 | -0/+28
    As noted in PR11210:
    https://bugs.llvm.org/show_bug.cgi?id=11210
    ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.)
    llvm-svn: 312496
* Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" | Sam McCall | 2017-09-04 | 55 | -252/+272
    This crashes on boringSSL on PPC (will send reduced testcase)
    This reverts commit r312328.
    llvm-svn: 312490
* [X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382) | Simon Pilgrim | 2017-09-04 | 2 | -42/+31
    Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles.
    llvm-svn: 312486
* Added shuffle test case from PR34382 | Simon Pilgrim | 2017-09-04 | 1 | -0/+11
    llvm-svn: 312485
* Added shuffle test case from PR34369 | Simon Pilgrim | 2017-09-04 | 1 | -0/+37
    llvm-svn: 312481
* [X86] Replace -mcpu option with -mattr in LIT tests added in https://reviews.llvm.org/rL312442 | Ayman Musa | 2017-09-04 | 13 | -952/+953
    llvm-svn: 312474
* [GlobalISel][X86] G_PHI support. | Igor Breger | 2017-09-04 | 4 | -1/+1321
    llvm-svn: 312473
* [XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text | Dean Michael Berris | 2017-09-04 | 6 | -61/+63
    Summary: This is a re-roll of D36615 which uses PLT relocations in the back-end to the call to __xray_CustomEvent() when building in -fPIC and -fxray-instrument mode.
    Reviewers: pcc, djasper, bkramer
    Subscribers: sdardis, javed.absar, llvm-commits
    Differential Revision: https://reviews.llvm.org/D37373
    llvm-svn: 312466
* [X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef. | Craig Topper | 2017-09-04 | 2 | -3/+0
    In this case we should replace the starting vector with undef.
    llvm-svn: 312462
* [X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector. | Craig Topper | 2017-09-03 | 2 | -6/+0
    llvm-svn: 312460
* [X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450. | Craig Topper | 2017-09-03 | 2 | -30/+15
    llvm-svn: 312459
* [X86] Combine inserting a vector of zeros into a vector of zeros into just the larger vector. | Craig Topper | 2017-09-03 | 1 | -14/+4
    llvm-svn: 312458
* [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. | Craig Topper | 2017-09-03 | 8 | -186/+97
    Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
    llvm-svn: 312450
* [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. | Craig Topper | 2017-09-03 | 26 | -276/+205
    llvm-svn: 312449
* [X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). | Craig Topper | 2017-09-03 | 6 | -121/+93
    In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
    llvm-svn: 312448
* [X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442 | Ayman Musa | 2017-09-03 | 13 | -13/+13
    llvm-svn: 312443
* [X86][AVX512] Add simple tests for all AVX512 shuffle instructions. | Ayman Musa | 2017-09-03 | 13 | -0/+26357
    As part of an effort to thoroughly check the behavior of CodeGen with the IR shufflevector instruction, we generated many tests while predicting the best X86 sequence that may be generated.
    This is a subset of the generated tests that we think may add value to our X86 set of tests.
    Some of the checks are not optimal and will be changed after fixing:
    1. PR34394
    2. PR34382
    3. PR34380
    4. PR34359
    Differential Revision: https://reviews.llvm.org/D37329
    llvm-svn: 312442
* [X86] Add RUN line for LIT test committed in "rL312438: [X86] Fix crash on assert of non-simple type after type-legalization.". | Ayman Musa | 2017-09-03 | 1 | -1/+3
    llvm-svn: 312439
* [X86] Fix crash on assert of non-simple type after type-legalization | Ayman Musa | 2017-09-03 | 1 | -0/+22
    The function combineShuffleToVectorExtend in DAGCombine might generate an illegally typed node after the "legalize types" phase, causing an assertion on non-simple type to fail afterwards.
    Add a type check for the case where the combine runs after the type-legalization pass.
    Differential Revision: https://reviews.llvm.org/D37330
    llvm-svn: 312438
* [X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64 | Craig Topper | 2017-09-02 | 4 | -48/+464
    Summary: ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.
    We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
    Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
    The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
    Reviewers: spatel, zvi, igorb, guyblank, RKSimon
    Reviewed By: guyblank
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D37320
    llvm-svn: 312422
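    Editor's illustration for r312422: the i1 -> i8 sign extend described above (mask with 1, 8-bit negate, then a sext from i8) can be modelled arithmetically as follows. This is a minimal Python sketch of the bit manipulation only, not the FastISel code; the function names are hypothetical and values are represented as unsigned bit patterns.

```python
def sext_i1_to_i8(x: int) -> int:
    """Model 'AND with 1' followed by an 8-bit NEG; returns an 8-bit pattern."""
    masked = x & 1           # keep only the i1 payload
    return (-masked) & 0xFF  # 8-bit two's-complement negate: 0 -> 0x00, 1 -> 0xFF


def sext_i8(value: int, bits: int) -> int:
    """Sign-extend an 8-bit pattern to `bits` bits (result as an unsigned pattern)."""
    signed = value - 0x100 if value & 0x80 else value
    return signed & ((1 << bits) - 1)


def sext_i1(x: int, bits: int) -> int:
    """Hypothetical helper: i1 -> i8 via mask and negate, then continue from i8."""
    return sext_i8(sext_i1_to_i8(x), bits)
```

    The commit message notes that a wider and/negate would be a better sequence; the sketch follows the two-step route the message describes.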
* [x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same | Sanjay Patel | 2017-09-01 | 1 | -16/+0
    This is limited to a set of patterns based on the example in PR34111:
    https://bugs.llvm.org/show_bug.cgi?id=34111
    ...but as I was investigating this, I see that horizontal patterns can go wrong in many, many other ways that would not be handled by this patch. Each data type may even go differently in the DAG after starting with the same basic IR pattern, so even proper IR canonicalization won't fix it all.
    Differential Revision: https://reviews.llvm.org/D37357
    llvm-svn: 312379
* [X86] Add test case I forgot to commit with r312285. | Craig Topper | 2017-09-01 | 1 | -0/+49
    llvm-svn: 312335
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" | Geoff Berry | 2017-09-01 | 55 | -272/+252
    Issues addressed since original review:
    - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator.
    - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
    - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use.
    - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use).
    [MachineCopyPropagation] Extend pass to do COPY source forwarding
    This change extends MachineCopyPropagation to do COPY source forwarding.
    This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses.
    llvm-svn: 312328
* [X86] Add isel patterns for memory forms of FMA3 intrinsic instructions | Craig Topper | 2017-09-01 | 1 | -48/+48
    llvm-svn: 312309
* [x86] add more tests for horizontal ops; NFC | Sanjay Patel | 2017-08-31 | 2 | -19/+159
    llvm-svn: 312279
* Revert r311525: "[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text" | Daniel Jasper | 2017-08-31 | 6 | -51/+61
    Breaks builds internally. Will forward repro instructions to author.
    llvm-svn: 312243
* [X86] Added run line to intrinsics upgrade test. NFC. | Yael Tsafrir | 2017-08-31 | 1 | -0/+1
    llvm-svn: 312241
* AMD family 17h (znver1) scheduler model update. | Ashutosh Nema | 2017-08-31 | 19 | -654/+654
    Summary: This patch enables the following:
    1) Regex based Instruction itineraries for integer instructions.
    2) The instructions are grouped as per the nature of the instructions (move, arithmetic, logic, Misc, Control Transfer).
    3) FP instructions and their itineraries are added which includes values for SSE4A, BMI, BMI2 and SHA instructions.
    Patch by Ganesh Gopalasubramanian
    Reviewers: RKSimon, craig.topper
    Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits
    Differential Revision: https://reviews.llvm.org/D36617
    llvm-svn: 312237
* Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" | Hans Wennborg | 2017-08-30 | 55 | -252/+272
    It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")
    > Issues identified by buildbots addressed since original review:
    > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
    > - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use.
    > - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use).
    >
    > [MachineCopyPropagation] Extend pass to do COPY source forwarding
    >
    > This change extends MachineCopyPropagation to do COPY source forwarding.
    >
    > This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses.
    llvm-svn: 312178
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" | Geoff Berry | 2017-08-30 | 55 | -272/+252
    Issues identified by buildbots addressed since original review:
    - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
    - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use.
    - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use).
    [MachineCopyPropagation] Extend pass to do COPY source forwarding
    This change extends MachineCopyPropagation to do COPY source forwarding.
    This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses.
    llvm-svn: 312154
* Canonicalize the representation of an empty expression in DIGlobalVariableExpression | Adrian Prantl | 2017-08-30 | 7 | -8/+8
    This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression().
    If someone needs to upgrade out-of-tree testcases:
    perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
    will catch 95%.
    llvm-svn: 312144
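    Editor's illustration for r312144: for readers without perl at hand, the one-liner above can be expressed with the same regular expression in Python. This is a minimal sketch; the function name is hypothetical, and like the original it only rewrites nodes whose sole field is `var:`.

```python
import re

# Same pattern as the perl one-liner: a DIGlobalVariableExpression whose
# argument list ends right after the `var: !N` field.
_PATTERN = re.compile(r"(!DIGlobalVariableExpression\(var: ![0-9]*)\)")


def upgrade_global_var_exprs(text: str) -> str:
    """Append an explicit empty !DIExpression() to expression-less nodes."""
    return _PATTERN.sub(r"\1, expr: !DIExpression())", text)


if __name__ == "__main__":
    sample = "!0 = !DIGlobalVariableExpression(var: !1)"
    print(upgrade_global_var_exprs(sample))
```

    Nodes that already carry an `expr:` field do not match the pattern and are left unchanged.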
* [AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast | Craig Topper | 2017-08-30 | 9 | -32/+32
    Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
    Since there's no functional difference, just remove the extra patterns and save some isel table size.
    Differential Revision: https://reviews.llvm.org/D36854
    llvm-svn: 312138
* [GlobalISel][X86] Support variadic function call. | Igor Breger | 2017-08-30 | 2 | -0/+163
    Summary: Support variadic function call. Port the implementation from X86FastISel.
    Reviewers: zvi, guyblank, oren_ben_simhon
    Reviewed By: guyblank
    Subscribers: rovka, kristof.beyls, llvm-commits
    Differential Revision: https://reviews.llvm.org/D37261
    llvm-svn: 312130
* Re-land MachineInstr: Reason locally about some memory objects before going to AA. | Balaram Makam | 2017-08-30 | 5 | -30/+30
    Summary: Reverts r311008 to reinstate r310825 with a fix.
    Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in buildbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.
    Reviewers: hfinkel, nemanjai, efriedma
    Reviewed By: efriedma
    Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits
    Differential Revision: https://reviews.llvm.org/D36900
    llvm-svn: 312126
* [X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests | Gadi Haber | 2017-08-30 | 10 | -10/+2465
    NFC.
    Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake. The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info.
    Reviewers: zvi, RKSimon, aymanmus, igorb
    Differential Revision: https://reviews.llvm.org/D37258
    llvm-svn: 312103
* [AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2 | Craig Topper | 2017-08-30 | 1 | -0/+155
    Summary: This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
    Fixes PR34357
    We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
    Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
    Reviewers: aymanmus, zvi, igorb
    Reviewed By: aymanmus
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D37286
    llvm-svn: 312101
* [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register | Craig Topper | 2017-08-30 | 13 | -151/+151
    This enables the use of a smaller encoding by using a VEX instruction when possible.
    Differential Revision: https://reviews.llvm.org/D37092
    llvm-svn: 312100
* [X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well | Craig Topper | 2017-08-30 | 2 | -3/+3
    Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.
    Differential Revision: https://reviews.llvm.org/D37250
    llvm-svn: 312099
* [X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag | Craig Topper | 2017-08-30 | 5 | -8/+8
    Summary: Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge. This is really strange as now AMD supports AVX. It also means if the user explicitly disables AVX we disable macro fusion.
    This patch adds an explicit macro fusion feature. I've also enabled it for the generic 64-bit CPU (which doesn't have AVX).
    This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.
    Reviewers: spatel, chandlerc, RKSimon, zvi
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D37280
    llvm-svn: 312097
* [dwarfdump] Pretty print location expressions and location lists | Reid Kleckner | 2017-08-29 | 3 | -15/+9
    Summary: Based on Fred's patch here: https://reviews.llvm.org/D6771
    I can't seem to commandeer the old review, so I'm creating a new one.
    With that change the location expressions are pretty printed inline in the DIE tree. The output looks like this for debug_loc entries:
        DW_AT_location [DW_FORM_data4] (0x00000000
            0x0000000000000001 - 0x000000000000000b: DW_OP_consts +3
            0x000000000000000b - 0x0000000000000012: DW_OP_consts +7
            0x0000000000000012 - 0x000000000000001b: DW_OP_reg0 RAX, DW_OP_piece 0x4
            0x000000000000001b - 0x0000000000000024: DW_OP_breg5 RDI+0)
    And like this for debug_loc.dwo entries:
        DW_AT_location [DW_FORM_sec_offset] (0x00000000
            Addr idx 2 (w/ length 190): DW_OP_consts +0, DW_OP_stack_value
            Addr idx 3 (w/ length 23): DW_OP_reg0 RAX, DW_OP_piece 0x4)
    Simple locations without ranges are printed inline:
        DW_AT_location [DW_FORM_block1] (DW_OP_reg4 RSI, DW_OP_piece 0x4, DW_OP_bit_piece 0x20 0x0)
    The debug_loc(.dwo) dumping is changed accordingly to factor the code.
    Reviewers: dblaikie, aprantl, friss
    Subscribers: mgorny, javed.absar, hiraditya, llvm-commits, JDevlieghere
    Differential Revision: https://reviews.llvm.org/D37123
    llvm-svn: 312042
* [X86] Add test cases to demonstrate selecting GPR instructions when using mask-based ones is more appropriate. | Guy Blank | 2017-08-29 | 1 | -0/+365
    llvm-svn: 311996
* [X86] Adding a test to demonstrate aggressive folding for LEA factorization. | Jatin Bhateja | 2017-08-29 | 1 | -0/+148
    Differential Revision: https://reviews.llvm.org/D37257
    llvm-svn: 311994
* Mark Knights Landing as having slow two memory operand instructions | Craig Topper | 2017-08-29 | 1 | -1/+1
    Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
    Patch by David Zarzycki.
    Reviewers: craig.topper
    Reviewed By: craig.topper
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D37224
    llvm-svn: 311979
* [DAGCombiner] Teach visitEXTRACT_SUBVECTOR to turn extracts of BUILD_VECTOR into smaller BUILD_VECTORs | Craig Topper | 2017-08-28 | 2 | -11/+5
    Only do this before operations are legalized or BUILD_VECTOR is Legal for the target.
    Differential Revision: https://reviews.llvm.org/D37186
    llvm-svn: 311892
* [X86][Haswell] Updating HSW instruction scheduling information | Gadi Haber | 2017-08-28 | 39 | -11252/+10009
    This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
    We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
    The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
    Information includes latency, number of micro-Ops and used ports by each HSW instruction.
    Please expect some performance fluctuations due to code alignment effects.
    Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud
    Differential Revision: https://reviews.llvm.org/D36663
    llvm-svn: 311879
* [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract. | Craig Topper | 2017-08-27 | 1 | -0/+228
    llvm-svn: 311856
* [DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111) | Sanjay Patel | 2017-08-27 | 1 | -7/+2
    As noted in the FIXME, this could be improved more, but this is the smallest fix that helps:
    https://bugs.llvm.org/show_bug.cgi?id=34111
    llvm-svn: 311853
* [x86] add haddps test for PR34111; NFC | Sanjay Patel | 2017-08-27 | 1 | -0/+25
    llvm-svn: 311852
* [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vector types | Jatin Bhateja | 2017-08-27 | 1 | -2/+82
    llvm-svn: 311847