path: root/llvm/test/CodeGen/X86
Commit message (Author, Date; files changed, lines -removed/+added)
...
* [dwarfdump] Pretty print location expressions and location lists (Reid Kleckner, 2017-08-29; 3 files, -15/+9)
  Summary: Based on Fred's patch here: https://reviews.llvm.org/D6771
  I can't seem to commandeer the old review, so I'm creating a new one.
  With that change the location expressions are pretty printed inline in the DIE tree. The output looks like this for debug_loc entries:
      DW_AT_location [DW_FORM_data4] (0x00000000
         0x0000000000000001 - 0x000000000000000b: DW_OP_consts +3
         0x000000000000000b - 0x0000000000000012: DW_OP_consts +7
         0x0000000000000012 - 0x000000000000001b: DW_OP_reg0 RAX, DW_OP_piece 0x4
         0x000000000000001b - 0x0000000000000024: DW_OP_breg5 RDI+0)
  And like this for debug_loc.dwo entries:
      DW_AT_location [DW_FORM_sec_offset] (0x00000000
         Addr idx 2 (w/ length 190): DW_OP_consts +0, DW_OP_stack_value
         Addr idx 3 (w/ length 23): DW_OP_reg0 RAX, DW_OP_piece 0x4)
  Simple locations without ranges are printed inline:
      DW_AT_location [DW_FORM_block1] (DW_OP_reg4 RSI, DW_OP_piece 0x4, DW_OP_bit_piece 0x20 0x0)
  The debug_loc(.dwo) dumping was changed accordingly to factor the code.
  Reviewers: dblaikie, aprantl, friss
  Subscribers: mgorny, javed.absar, hiraditya, llvm-commits, JDevlieghere
  Differential Revision: https://reviews.llvm.org/D37123
  llvm-svn: 312042
* [X86] Add test cases to demonstrate selecting GPR instructions when using mask-based ones would be more appropriate (Guy Blank, 2017-08-29; 1 file, -0/+365)
  llvm-svn: 311996
* [X86] Adding a test to demonstrate aggressive folding for LEA factorization. (Jatin Bhateja, 2017-08-29; 1 file, -0/+148)
  Differential Revision: https://reviews.llvm.org/D37257
  llvm-svn: 311994
* Mark Knights Landing as having slow two memory operand instructions (Craig Topper, 2017-08-29; 1 file, -1/+1)
  Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
  Patch by David Zarzycki.
  Reviewers: craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37224
  llvm-svn: 311979
* [DAGCombiner] Teach visitEXTRACT_SUBVECTOR to turn extracts of BUILD_VECTOR into smaller BUILD_VECTORs (Craig Topper, 2017-08-28; 2 files, -11/+5)
  Only do this before operations are legalized or if BUILD_VECTOR is Legal for the target.
  Differential Revision: https://reviews.llvm.org/D37186
  llvm-svn: 311892
* [X86][Haswell] Updating HSW instruction scheduling information (Gadi Haber, 2017-08-28; 39 files, -11252/+10009)
  This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
  We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling. The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792. Information includes latency, number of micro-ops and used ports by each HSW instruction.
  Please expect some performance fluctuations due to code alignment effects.
  Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud
  Differential Revision: https://reviews.llvm.org/D36663
  llvm-svn: 311879
* [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract. (Craig Topper, 2017-08-27; 1 file, -0/+228)
  llvm-svn: 311856
* [DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111) (Sanjay Patel, 2017-08-27; 1 file, -7/+2)
  As noted in the FIXME, this could be improved more, but this is the smallest fix that helps: https://bugs.llvm.org/show_bug.cgi?id=34111
  llvm-svn: 311853
* [x86] add haddps test for PR34111; NFC (Sanjay Patel, 2017-08-27; 1 file, -0/+25)
  llvm-svn: 311852
* [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vector types (Jatin Bhateja, 2017-08-27; 1 file, -2/+82)
  llvm-svn: 311847
* [X86] Add a target-specific DAG combine to combine extract_subvector from all zero/one build_vectors. (Craig Topper, 2017-08-27; 3 files, -19/+4)
  llvm-svn: 311841
* [AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine. (Craig Topper, 2017-08-26; 1 file, -0/+114)
  We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't.
  Add some more test cases to ensure we've also got most of the zero masking covered too.
  llvm-svn: 311837
* [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32. (Jatin Bhateja, 2017-08-26; 1 file, -0/+112)
  Differential Revision: https://reviews.llvm.org/D37183
  llvm-svn: 311834
* [DAGCombiner] Extending pattern detection for vector shuffle. (Jatin Bhateja, 2017-08-26; 4 files, -116/+120)
  Summary: If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the vector efficiently based on the maximum vector access index. This will also fix PR33784.
  Reviewers: zvi, delena, RKSimon, thakis
  Reviewed By: RKSimon
  Subscribers: chandlerc, eladcohen, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35788
  llvm-svn: 311833
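  A minimal IR sketch (not from the commit; function and value names are hypothetical) of the BUILD_VECTOR-from-extracts shape this combine targets. Every lane comes from the same wide source, and the highest accessed index bounds how much of that source the resulting shuffle needs:
      ; RUN: llc -mtriple=x86_64-unknown-unknown < %s
      ; All four lanes are extracted from the same <8 x i32> source, with a
      ; maximum access index of 3, so only the low half of %src is needed.
      define <4 x i32> @build_from_extracts(<8 x i32> %src) {
        %e0 = extractelement <8 x i32> %src, i32 0
        %e1 = extractelement <8 x i32> %src, i32 1
        %e2 = extractelement <8 x i32> %src, i32 2
        %e3 = extractelement <8 x i32> %src, i32 3
        %v0 = insertelement <4 x i32> undef, i32 %e0, i32 0
        %v1 = insertelement <4 x i32> %v0, i32 %e1, i32 1
        %v2 = insertelement <4 x i32> %v1, i32 %e2, i32 2
        %v3 = insertelement <4 x i32> %v2, i32 %e3, i32 3
        ret <4 x i32> %v3
      }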
* Revert rL311247: To rectify commit message. (Jatin Bhateja, 2017-08-26; 4 files, -120/+116)
  Summary: This reverts commit rL311247.
  Differential Revision: https://reviews.llvm.org/D36927
  llvm-svn: 311832
* [AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector. (Craig Topper, 2017-08-25; 1 file, -36/+24)
  This only supports 32 and 64 bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI.
  llvm-svn: 311821
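  A sketch of the kind of pattern these isel entries cover, written as IR in the style of the tests in this directory (names are assumptions, not taken from the commit): a lowest-subvector extract whose result is merged under a mask, which the new patterns can select as a single masked move rather than an extract followed by a separate blend.
      ; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vl < %s
      define <4 x i32> @masked_extract_lo(<16 x i32> %v, <4 x i32> %passthru, i8 %m) {
        ; Extract the lowest 128 bits of the 512-bit input.
        %lo = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        ; Merge the extracted subvector with the passthru value under the mask.
        %mv = bitcast i8 %m to <8 x i1>
        %mask = shufflevector <8 x i1> %mv, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        %res = select <4 x i1> %mask, <4 x i32> %lo, <4 x i32> %passthru
        ret <4 x i32> %res
      }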
* [AVX512] Add additional test cases for masked extract subvector. (Craig Topper, 2017-08-25; 1 file, -4/+448)
  This includes tests for extracting 128 bits from a 256-bit vector and zero masking.
  llvm-svn: 311820
* [X86] Add patterns to show more failures to use TBM instructions when we're trying to check flags. (Craig Topper, 2017-08-25; 1 file, -0/+331)
  We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit an X86ISD::CMP node in front of the 'and' and then pattern match that to a 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction.
  llvm-svn: 311819
* [x86] Teach the backend to fold more read-modify-write memory operands to instructions. (Chandler Carruth, 2017-08-25; 5 files, -21/+434)
  These can't be reasonably matched in tablegen due to the handling of flags, so we have to do this in C++ code. We only did it for `inc` and `dec` historically; this starts fleshing that out to more interesting instructions. Notably, this handles transferring operands to `add` and `sub`.
  Currently this forces them into a register. The next patch will add support for keeping immediate operands as immediates. Then I'll extend this beyond just `add` and `sub`.
  I'm not super thrilled by the repeated switches in the code but everything else I tried was really ugly or problematic.
  Many thanks to Craig Topper for the suggestions about where to even begin here and how to make this stuff work.
  Differential Revision: https://reviews.llvm.org/D37130
  llvm-svn: 311806
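  A reduced IR sketch (hypothetical, not one of the tests added here) of the read-modify-write shape being folded: the add's result is both stored back and used for its flags, so the memory-destination form of the instruction can cover the whole sequence.
      ; RUN: llc -mtriple=x86_64-unknown-unknown < %s
      ; With the fold, the load/add/store can become a single memory-operand
      ; add (roughly "addl %esi, (%rdi)"), whose flags then feed the setcc.
      define i8 @rmw_add(i32* %p, i32 %v) {
        %old = load i32, i32* %p
        %sum = add i32 %old, %v
        store i32 %sum, i32* %p
        %iszero = icmp eq i32 %sum, 0
        %r = zext i1 %iszero to i8
        ret i8 %r
      }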
* [x86] regenerate checks; NFC (Sanjay Patel, 2017-08-25; 2 files, -9/+20)
  llvm-svn: 311793
* [x86] NFC - normalize test case formatting of IR and generate CHECK lines with the script rather than using manually written checks. (Chandler Carruth, 2017-08-25; 3 files, -206/+647)
  llvm-svn: 311753
* [X86] Add TBM instructions to X86InstrInfo::isDefConvertible. (Craig Topper, 2017-08-25; 2 files, -20/+1)
  This allows us to remove "test" instructions and use the flags from the TBM instructions directly.
  llvm-svn: 311747
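  A hedged sketch of the idea (the function name and the exact instruction selected are assumptions, not from the commit): a TBM-style bit-manipulation op such as BLCFILL (x & (x + 1)) already defines ZF, so a following zero test of its result becomes redundant once the instruction is known to isDefConvertible.
      ; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+tbm < %s
      ; x & (x + 1) can select to BLCFILL; if its flags are usable directly,
      ; the separate TEST feeding the select/branch can be dropped.
      define i32 @blcfill_flags(i32 %x) {
        %inc = add i32 %x, 1
        %and = and i32 %x, %inc
        %iszero = icmp eq i32 %and, 0
        %sel = select i1 %iszero, i32 1, i32 2
        ret i32 %sel
      }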
* [x86] Back out one aspect of r311318: don't generically set FeatureSlowUAMem32. (Chandler Carruth, 2017-08-25; 2 files, -13/+6)
  The idea was to mark things that are slow on widely available processors as slow in the generic CPU so that the code generated for that CPU would be fast across those processors. However, for this feature that doesn't work out very well at all.
  The problem here is that you can very easily enable AVX or AVX2 on top of this generic CPU. For example, this can happen just by using AVX2 intrinsics from Clang within a region of code guarded by a dynamic CPU feature test. When you do that, the generated code with SlowUAMem32 set is ... amazingly slower. The problem is that there really aren't very good alternatives to the unaligned loads, and so our vector codegen regresses significantly.
  The other issue is that there are plenty of AMD CPUs with AVX1 that don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2 instead of this special feature. =/
  It would be nice to have the target attribute logic be able to enable/disable more than just one feature at a time and control this in a more fine-grained and useful way, but that doesn't seem easy. Given that it is only Sandybridge and Ivybridge that set this feature, for now I'm just backing it out of the generic CPU. That has the additional advantage of going back to the previous state that people seemed vaguely happy with.
  llvm-svn: 311740
* [x86] Fix an amazing goof in the handling of sub, or, and xor lowering. (Chandler Carruth, 2017-08-25; 6 files, -381/+180)
  The comment for this code indicated that it should work similar to our handling of add lowering above: if we see uses of an instruction other than flag usage and store usage, it tries to avoid the specialized X86ISD::* nodes that are designed for flag+op modeling and emits an explicit test. Problem is, only the add case actually did this. In all the other cases, the logic was incomplete and inverted. Any time the value was used by a store, we bailed on the specialized X86ISD node. All of this appears to have been historical where we had different logic here. =/
  Turns out, we have quite a few patterns designed around these nodes. We should actually form them. I fixed the code to match what we do for add, and it has quite a positive effect just within some of our test cases. The only thing close to a regression I see is using:
      notl %r
      testl %r, %r
  instead of:
      xorl -1, %r
  But we can add a pattern or something to fold that back out. The improvements seem more than worth this.
  I've also worked with Craig to update the comments to no longer be actively contradicted by the code. =[ Some of this still remains a mystery to both Craig and myself, but this seems like a large step in the direction of consistency and slightly more accurate comments.
  Many thanks to Craig for help figuring out this nasty stuff.
  Differential Revision: https://reviews.llvm.org/D37096
  llvm-svn: 311737
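  A minimal IR sketch (hypothetical names) of the case the old logic got backwards: the sub's value is used by a store and its flags are used by a branch, which should still select the flag-producing X86ISD node rather than emitting an extra explicit test.
      ; RUN: llc -mtriple=x86_64-unknown-unknown < %s
      define i32 @sub_store_and_flags(i32* %p, i32 %x, i32 %y) {
      entry:
        %d = sub i32 %x, %y
        store i32 %d, i32* %p
        %z = icmp eq i32 %d, 0
        br i1 %z, label %t, label %f
      t:
        ret i32 1
      f:
        ret i32 0
      }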
* [DAG] convert vector select-of-constants to logic/math (Sanjay Patel, 2017-08-24; 3 files, -47/+33)
  This goes back to a discussion about IR canonicalization. We'd like to preserve and convert more IR to 'select' than we currently do because that's likely the best choice in IR: http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
  ...but that's often not true for codegen, so we need to account for this pattern coming in to the backend and transform it to better DAG ops.
  Steps in this patch:
  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely enable this transform. Other targets will probably want that anyway to distinguish scalars from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the vector ext is often free.
  Implementing a more general fold using xor+and can be a follow-up for targets that don't have a legal vselect. It's also possible that we can remove the TLI hook for the special case fold implemented here because we're eliminating a constant, but it needs to be tested on other targets.
  Differential Revision: https://reviews.llvm.org/D36840
  llvm-svn: 311731
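  A sketch of step 2 in IR terms (hypothetical example; constants chosen so the two arms differ by one): the per-lane select between 3 and 2 can become an extend of the condition plus an add of the splatted 2, avoiding the materialization of a second constant vector.
      ; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx2 < %s
      define <4 x i32> @sel_consts(<4 x i32> %a, <4 x i32> %b) {
        %cond = icmp sgt <4 x i32> %a, %b
        ; select cond, 3, 2  ==  (zext cond to i32) + 2, per lane
        %sel = select <4 x i1> %cond, <4 x i32> <i32 3, i32 3, i32 3, i32 3>, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
        ret <4 x i32> %sel
      }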
* Adding base lit test for x86interleaved (Michael Zuckerman, 2017-08-24; 1 file, -0/+261)
  llvm-svn: 311658
* [x86] NFC: Clean up two tests and generate precise checks for them. (Chandler Carruth, 2017-08-24; 2 files, -187/+678)
  Mostly this involved giving unnamed values names and running the IR through `opt` to re-format it, while merging in any important comments from the original. I then deleted pointless comments and inlined the function attributes for ease of reading and editing.
  All of this is to make it much easier to see the instructions being generated here and evaluate any updates to the tests.
  llvm-svn: 311634
* [GlobalISel][X86] Support G_IMPLICIT_DEF. (Igor Breger, 2017-08-24; 4 files, -0/+272)
  Summary: Support G_IMPLICIT_DEF.
  Reviewers: zvi, guyblank, t.p.northover
  Reviewed By: guyblank
  Subscribers: rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D36733
  llvm-svn: 311633
* Add 'llvm.experimental.constrained.fma' Intrinsic. (Wei Ding, 2017-08-24; 1 file, -22/+59)
  Differential Revision: http://reviews.llvm.org/D36335
  llvm-svn: 311629
* [DAG] Fix Node Replacement in PromoteIntBinOp (Hans Wennborg, 2017-08-24; 1 file, -0/+53)
  When one operand is a user of another in a promoted binary operation, we may replace and delete the returned value before returning, triggering an assertion. Reorder node replacements to prevent this.
  Fixes PR34137.
  Landing on behalf of Nirav.
  Differential Revision: https://reviews.llvm.org/D36581
  llvm-svn: 311623
* Parse and print DIExpressions inline to ease IR and MIR testing (Reid Kleckner, 2017-08-23; 2 files, -46/+48)
  Summary: Most DIExpressions are empty or very simple. When they are complex, they tend to be unique, so checking them inline is reasonable.
  This also avoids the need for CodeGen passes to append to the llvm.dbg.mir named md node.
  See also PR22780, for making DIExpression not be an MDNode.
  Reviewers: aprantl, dexonsmith, dblaikie
  Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D37075
  llvm-svn: 311594
* [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors (Craig Topper, 2017-08-23; 1 file, -0/+24)
  There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.
  On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.
  This fixes PR34139.
  Differential Revision: https://reviews.llvm.org/D36992
  llvm-svn: 311572
* [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text (Dean Michael Berris, 2017-08-23; 6 files, -61/+51)
  Summary: This change achieves two things:
  - Redefine the Custom Event handling instrumentation points emitted by the compiler to not require dynamic relocation of references to the __xray_CustomEvent trampoline.
  - Remove the synthetic reference we emit at the end of a function that we used to keep auxiliary sections alive, in favour of SHF_LINK_ORDER associated with the section where the function is defined.
  To achieve the custom event handling change, we've had to introduce the concept of sled versioning -- this will need to be supported by the runtime to allow us to understand how to turn on/off the new version of the custom event handling sleds. That change has to land first before we change the way we write the sleds.
  To remove the synthetic reference, we rely on a relatively new linker feature that preserves the sections that are associated with each other. This allows us to limit the effects on the .text section of ELF binaries.
  Because we're still using absolute references that are resolved at runtime for the instrumentation map (and function index) maps, we mark these sections write-able. In the future we can re-define the entries in the map to use relative relocations instead that can be statically determined by the linker. That change will be a bit more invasive so we defer this for later.
  Depends on D36816.
  Reviewers: dblaikie, echristo, pcc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36615
  llvm-svn: 311525
* Fix tail-merge-after-mbp test (Matthias Braun, 2017-08-23; 1 file, -14/+14)
  The output of this test changed after the fix in r311520 to have -run-pass=block-placement behave like it does in a normal pipeline. Adjust the test.
  llvm-svn: 311521
* [x86] auto-generate full checks; NFC (Sanjay Patel, 2017-08-22; 1 file, -11/+22)
  I don't see anything Darwin-specific here, so I made the target generic x86-64.
  llvm-svn: 311465
* [x86] simplify runs and auto-generate full checks (Sanjay Patel, 2017-08-22; 1 file, -33/+40)
  I've replaced the two OS-specific runs with a generic run because there's no functional difference in the resulting output that we're checking. Also, the script still doesn't work with a Win target.
  llvm-svn: 311463
* [X86] Prevent several calls to ISD::isConstantSplatVector from returning a narrower APInt than the original scalar type (Craig Topper, 2017-08-22; 2 files, -0/+28)
  ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.
  This patch just adds a flag to prevent the APInt from changing width.
  Fixes PR34271.
  Differential Revision: https://reviews.llvm.org/D36996
  llvm-svn: 311429
* [X86] When selecting sse_load_f32/f64 pattern, make sure there's only one use of every node all the way back to the root of the match (Craig Topper, 2017-08-21; 1 file, -2/+3)
  Summary: With masked operations, it's possible for the operation node like fadd, fsub, etc. to be used by multiple different vselects. Since the pattern matching will start at the vselect, we need to make sure the operation node itself is only used once before we can fold a load. Otherwise we'll end up folding the same load into multiple instructions.
  Reviewers: RKSimon, spatel, zvi, igorb
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36938
  llvm-svn: 311342
* [GlobalISel][X86] Support G_BRCOND operation. (Igor Breger, 2017-08-21; 3 files, -0/+225)
  Summary: Support G_BRCOND operation. For now don't try to fold cmp/trunc instructions.
  Reviewers: zvi, guyblank
  Reviewed By: guyblank
  Subscribers: rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D34754
  llvm-svn: 311327
* [GlobalISel][X86] LowerCall: for now, don't handle ByValue function arguments. (Igor Breger, 2017-08-21; 1 file, -0/+11)
  llvm-svn: 311321
* [InterLeaved] Adding lit test for future work: interleaved load stride 3 (Michael Zuckerman, 2017-08-21; 1 file, -0/+316)
  llvm-svn: 311320
* [x86] Teach the "generic" x86 CPU to avoid patterns that are slow onChandler Carruth2017-08-214-18/+43
| | | | | | | | | | | | | | | | | | | | widely used processors. This occured to me when I saw that we were generating 'inc' and 'dec' when for Haswell and newer we shouldn't. However, there were a few "X is slow" things that we should probably just set. I've avoided any of the "X is fast" features because most of those would be pretty serious regressions on processors where X isn't actually fast. The slow things are likely to be negligible costs on processors where these aren't slow and a significant win when they are slow. In retrospect this seems somewhat obvious. Not sure why we didn't do this a long time ago. Differential Revision: https://reviews.llvm.org/D36947 llvm-svn: 311318
* [x86] Handle more cases where we can re-use an atomic operation's flags rather than doing a separate comparison. (Chandler Carruth, 2017-08-21; 1 file, -0/+86)
  This both saves an explicit comparison and avoids the use of `xadd`, which introduces register constraints and other challenges to the generated code.
  The motivating case is from atomic reference counts where `1` is the sentinel rather than `0` for whatever reason. This can and should be lowered efficiently on x86 by just using a different flag; however, the x86 code only handled the `0` case.
  There remain some further opportunities here that are currently hidden due to canonicalization. I've included test cases that show these and FIXMEs. However, I don't at the moment have any production use cases and they seem substantially harder to address.
  Differential Revision: https://reviews.llvm.org/D36945
  llvm-svn: 311317
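  A reduced sketch of the motivating reference-count shape (hypothetical names, not from the commit): the previous value being 1 means the new value is 0, so the decrement's own zero flag can answer the question without an xadd followed by a compare.
      ; RUN: llc -mtriple=x86_64-unknown-unknown < %s
      ; Drop a reference and report whether this was the last one.
      define i1 @release(i64* %rc) {
        %old = atomicrmw sub i64* %rc, i64 1 seq_cst
        %was_last = icmp eq i64 %old, 1
        ret i1 %was_last
      }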
* [AVX-512] Don't change which instructions we use for unmasked subvector broadcasts when AVX512DQ is enabled. (Craig Topper, 2017-08-21; 3 files, -192/+56)
  There's no functional difference between the AVX512DQ instructions if we're not masking. This change unifies test checks and removes extra isel entries. Similar was done for subvector insert and extracts recently.
  llvm-svn: 311308
* [AVX512] Add 128->256 vbroadcastf64x2/vbroadcasti64x2 instructions to the EVEX->VEX table. (Craig Topper, 2017-08-21; 2 files, -107/+28)
  llvm-svn: 311307
* [AVX512] Add a test to check what happens when a load is referenced by two different masked scalar intrinsics with the same op inputs, but different masking node. (Craig Topper, 2017-08-20; 1 file, -0/+21)
  We're missing some single use checks in the sse_load_f32/f64 handling that cause us to replicate the load.
  llvm-svn: 311300
* [GlobalISel][X86] Support call ABI. (Igor Breger, 2017-08-20; 2 files, -18/+550)
  Summary: Support call ABI. For now, only the Linux C and X86_64_SysV calling conventions are supported. Variadic functions are not supported.
  Reviewers: zvi, guyblank, oren_ben_simhon
  Reviewed By: oren_ben_simhon
  Subscribers: rovka, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34602
  llvm-svn: 311279
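  A minimal sketch of the kind of call such a test might exercise (hypothetical function names; plain C calling convention, no varargs), run through the GlobalISel path:
      ; RUN: llc -mtriple=x86_64-linux-gnu -global-isel < %s
      declare i32 @callee(i32, i32)
      ; A plain non-variadic SysV call: arguments in %edi/%esi, result in %eax.
      define i32 @caller(i32 %a, i32 %b) {
        %r = call i32 @callee(i32 %a, i32 %b)
        ret i32 %r
      }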
* [GlobalISel][X86] Support asymmetric copy from/to GPR physical register. (Igor Breger, 2017-08-20; 1 file, -0/+185)
  Usually this case is generated by ABI lowering; it requires performing a truncate/anyext.
  llvm-svn: 311278
* [x86] Fix an even stranger corner case where we have multiple levels of cmov self-referencing. (Chandler Carruth, 2017-08-19; 1 file, -1/+23)
  Pointed out by Amjad Aboud in code review; test case minorly simplified from the one he posted.
  llvm-svn: 311267
* [AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps from an extract subvector operation. (Craig Topper, 2017-08-19; 1 file, -1/+1)
  llvm-svn: 311263