summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of ↵Craig Topper2019-08-231-3/+5
| | | | | | | | | INSERT_SUBVECTORS for widening with zeros. CONCAT_VECTORS is more canonical for the early DAG combine runs until we start getting into the op legalization phases. llvm-svn: 369734
* [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern.Craig Topper2019-08-231-3/+10
| | | | | | | | | | | | | | | | | | | | | For v2i32 we only feed 2 i8 elements into the psadbw instructions with 0s in the other 14 bytes. The resulting psadbw instruction will produce zeros in bits [127:16] of the output. We need to take the result and feed it to a v2i32 add where the first element includes bits [15:0] of the sad result. The other element should be zero. Prior to this patch we were using a truncate to take 0 from bits 95:64 of the psadbw. This results in a pshufd to move those bits to 63:32. But since we also have zeroes in bits 63:32 of the psadbw output, we should just take those bits. The previous code probably worked better with promoting legalization, but now we use widening legalization. I've preserved the old behavior if -x86-experimental-vector-widening-legalization=false until we get that option removed. llvm-svn: 369733
* [MC] Minor cleanup to MCFixup::Kind handling. NFC.Sam Clegg2019-08-2319-43/+39
| | | | | | | | | | Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720
* IR. Change strip* family of functions to not look through aliases.Peter Collingbourne2019-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | I noticed another instance of the issue where references to aliases were being replaced with aliasees, this time in InstCombine. In the instance that I saw it turned out to be only a QoI issue (a symbol ended up being missing from the symbol table due to the last reference to the alias being removed, preventing HWASAN from symbolizing a global reference), but it could easily have manifested as incorrect behaviour. Since this is the third such issue encountered (previously: D65118, D65314) it seems to be time to address this common error/QoI issue once and for all and make the strip* family of functions not look through aliases. Includes a test for the specific issue that I saw, but no doubt there are other similar bugs fixed here. As with D65118 this has been tested to make sure that the optimization isn't load bearing. I built Clang, Chromium for Linux, Android and Windows as well as the test-suite and there were no size regressions. Differential Revision: https://reviews.llvm.org/D66606 llvm-svn: 369697
* [MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is localFrancis Visoiu Mistrih2019-08-224-6/+8
| | | | | | | | | | | | | | | | | | | | | Local symbols in the indirect symbol table contain the value `INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must contain the address of the target. In r349060, I added support for local symbols in the indirect symbol table, which was checking if the symbol `isDefined` && `!isExternal` to determine if the symbol is local or not. It turns out that `isDefined` will return false if the user of the symbol comes before its definition, and we'll again generate .long 0 which will be the symbol at the adress 0x0. Instead of doing that, use GlobalValue::hasLocalLinkage() to check if the symbol is local. Differential Revision: https://reviews.llvm.org/D66563 llvm-svn: 369671
* [X86] Remove MCInstLower code that drops operands from some CALL and TAILJMP ↵Craig Topper2019-08-221-18/+8
| | | | | | | | | | instructions. Add asserts to verify operand count It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least aportion of fixing it. Differential Revision: https://reviews.llvm.org/D66570 llvm-svn: 369665
* [X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.Andrea Di Biagio2019-08-221-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Single operand MUL instructions that implicitly set EAX have the following latency/throughput profile (see below): imul %cl # latency: 3cy - uOPs: 1 - 1 JMul imul %cx # latency: 3cy - uOPs: 3 - 3 JMul imul %ecx # latency: 3cy - uOPs: 2 - 2 JMul imul %rcx # latency: 6cy - uOPs: 2 - 4 JMul mul %cl # latency: 3cy - uOPs: 1 - 1 JMul mul %cx # latency: 3cy - uOPs: 3 - 3 JMul mul %ecx # latency: 3cy - uOPs: 2 - 2 JMul mul %rcx # latency: 6cy - uOPs: 2 - 4 JMul Excluding the 64bit variant, which has a latency of 6cy, every other instruction has a latency of 3cy. However, the number of decoded macro-opcodes (as well as the resource cyles) depend on the MUL size. The two operand MULs have a more predictable profile (see below): imul %dx, %dx # latency: 3cy - uOPs: 1 - 1 JMul imul %edx, %edx # latency: 3cy - uOPs: 1 - 1 JMul imul %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul imul $3, %dx, %dx # latency: 4cy - uOPs: 2 - 2 JMul imul $3, %ecx, %ecx # latency: 3cy - uOPs: 1 - 1 JMul imul $3, %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul This patch updates the values in the Jaguar scheduling model and regenerates llvm-mca tests. Differential Revision: https://reviews.llvm.org/D66547 llvm-svn: 369661
* [PowerPC] Add combined ELF ABI and 32/64 bit queries to the subtarget. [NFC]Sean Fertile2019-08-226-58/+59
| | | | | | | | | | | | A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more readable I've added a couple functions that combine checking for the ELF abi and 64-bit/32-bit code at once. As we add more AIX support I intend to add similar functions for the AIX ABI. Differential Revision: https://reviews.llvm.org/D65814 llvm-svn: 369658
* [PowerPC][XCOFF][MC] Explicitly set containing csect on symbols. [NFC]Sean Fertile2019-08-221-0/+2
| | | | | | | | | | | Previously we would get the csect a symbol was contained in through its fragment. This works only if we are writing an object file, and only for defined symbols. To fix this we set the contating csect explicitly on the MCSymbolXCOFF object. Differential Revision: https://reviews.llvm.org/D66032 llvm-svn: 369657
* [X86][BtVer2] Fix latency and throughput of XCHG and XADD.Andrea Di Biagio2019-08-221-1/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to 1 extra uOP; maximum observed throughput is 0.5 IPC. ``` xchgb %cl, %dl # Latency: 2cy - uOPs: 3 - 2 ALU xchgw %cx, %dx # Latency: 1cy - uOPs: 2 - 2 ALU xchgl %ecx, %edx # Latency: 1cy - uOPs: 2 - 2 ALU xchgq %rcx, %rdx # Latency: 1cy - uOPs: 2 - 2 ALU ``` The reg-mem forms of XCHG are atomic operations with an observed latency of 16cy. The resource usage is similar to the XCHGrr variants. The biggest difference is obviously the bus-locking, which prevents the LS to issue other memory uOPs in parallel until the unlocking store uOP is executed. ``` xchgb %cl, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgw %cx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgl %ecx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgq %rcx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy ``` The exchanged in/out register operand becomes available after 11cy from the start of execution. Added test xchg.s to verify that we correctly see that register write committed in 11cy (and not 16cy). Reg-reg XADD instructions have the same latency/throughput than the byte exchange (register-register variant). ``` xaddb %cl, %dl # latency: 2cy - uOPs: 3 - 3 ALU xaddw %cx, %dx # latency: 2cy - uOPs: 3 - 3 ALU xaddl %ecx, %edx # latency: 2cy - uOPs: 3 - 3 ALU xaddq %rcx, %rdx # latency: 2cy - uOPs: 3 - 3 ALU ``` The non-atomic RM variants have a latency of 11cy, and decode to 4 macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register operand becomes available in 3cy (it matches the 'load-to-use latency'). ``` xaddb %cl, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddw %cx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddl %ecx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddq %rcx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU ``` The atomic XADD variants execute in 16cy. The in/out register operand is available after 11cy from the start of execution. ``` lock xaddb %cl, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddw %cx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddl %ecx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddq %rcx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy ``` Added test xadd.s to verify those latencies as well as read-advance values. Differential Revision: https://reviews.llvm.org/D66535 llvm-svn: 369642
* [MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI.Simon Pilgrim2019-08-221-21/+11
| | | | | | Allows for some cleanup in a lot of SSE/AVX vector splitting code llvm-svn: 369640
* Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32Sam Tebbs2019-08-221-6/+7
| | | | | | | | | The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit cd53ff6, reapplying 7c6b229. llvm-svn: 369638
* Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"Hans Wennborg2019-08-221-7/+6
| | | | | | | | | | | | | | | | | It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636
* [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening ↵Craig Topper2019-08-221-0/+18
| | | | | | | | | | | | | | legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628
* [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32Sam Tebbs2019-08-221-6/+7
| | | | | | | | | | | | | This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626
* [TargetLowering] Remove optional arguments passing to makeLibCallShiva Chen2019-08-224-10/+20
| | | | | | | | | | The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622
* [X86] Making X86OptimizeLEAs pass public. NFCPengfei Wang2019-08-223-26/+30
| | | | | | | | | | | | | | | | Reviewers: wxiao3, LuoYuanke, andrew.w.kaylor, craig.topper, annita.zhang, liutianle, pengfei, xiangzhangllvm, RKSimon, spatel, andreadb Reviewed By: RKSimon Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Patch by Gen Pei (gpei) Differential Revision: https://reviews.llvm.org/D65933 llvm-svn: 369612
* [X86] Correct the scheduler classes for TAILJMP and TCRETURN CodeGenOnly ↵Craig Topper2019-08-211-24/+25
| | | | | | | | | | | | instructions. We had an odd combination of WriteJump applied to some memory instructions and WriteJumpLd applied to register and immediate instructions. Thsi should hopefully assign them all correctly. llvm-svn: 369599
* [X86] Replace a couple hardcoded '5's with X86::AddrNumOperands for ↵Craig Topper2019-08-211-2/+3
| | | | | | readability. NFC llvm-svn: 369598
* [RISCV] Remove fix introduced by r369573, superseded by r369580Luis Marques2019-08-211-3/+0
| | | | llvm-svn: 369590
* [RISCV] Fix use of side-effects in asserts in decoder functionsLuis Marques2019-08-211-6/+9
| | | | llvm-svn: 369580
* Fix -Werror=unused-variable error after r369528.Richard Smith2019-08-211-0/+3
| | | | llvm-svn: 369573
* Revert r367389 (and follow-up r368404); it caused PR43073.Nico Weber2019-08-211-50/+76
| | | | llvm-svn: 369567
* [WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcastsSam Clegg2019-08-211-0/+2
| | | | | | | | Fixes: https://github.com/emscripten-core/emscripten/issues/8770 Differential Revision: https://reviews.llvm.org/D66508 llvm-svn: 369566
* [MVT] Add v16f16 and v32f16 vectors.Craig Topper2019-08-211-0/+4
| | | | | | | | | I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565
* [mips] Replace call `expandLoadAddress` by `loadAndAddSymbolAddress`. NFCSimon Atanasyan2019-08-211-2/+2
| | | | | | | | In case of expanding `lw/sw $reg, symbol($reg)` instruction for PIC it's enough to call the `loadAndAddSymbolAddress` method. Additional work performed by the `expandLoadAddress` is not required here. llvm-svn: 369563
* [mips] Remove duplicated case from the `StringSwitch`. NFCSimon Atanasyan2019-08-211-1/+0
| | | | llvm-svn: 369562
* GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sourcesMatt Arsenault2019-08-211-1/+1
| | | | | | | | This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547
* [ARM] Formatting for ARMInstrMVE.td. NFCDavid Green2019-08-211-89/+98
| | | | | | | This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545
* [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitionsAlexander Timofeev2019-08-212-0/+16
| | | | | | | Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
* [LLVM][Alignment] Introduce Alignment In MachineFrameInfoGuillaume Chatelet2019-08-212-4/+4
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: jfb Subscribers: hiraditya, dexonsmith, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65800 llvm-svn: 369531
* [RISCV] Add support for RVC HINT instructionsLuis Marques2019-08-217-3/+212
| | | | | | | | | The hint instructions are enabled by default (if the standard C extension is enabled). To disable them pass -mattr=-rvc-hints. Differential Revision: https://reviews.llvm.org/D62592 llvm-svn: 369528
* [MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOADPetar Avramovic2019-08-211-3/+3
| | | | | | | | NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66205 llvm-svn: 369512
* [MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXTPetar Avramovic2019-08-211-0/+4
| | | | | | | | NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66204 llvm-svn: 369511
* [MIPS GlobalISel] Consider type1 when legalizing shifts after r351882Petar Avramovic2019-08-211-2/+2
| | | | | | | | | r351882 allows different type for shift amount then result and value being shifted. Fix MIPS Legalizer rules to take r351882 into account. Differential Revision: https://reviews.llvm.org/D66203 llvm-svn: 369510
* [MIPS GlobalISel] NarrowScalar G_TRUNCPetar Avramovic2019-08-211-0/+4
| | | | | | | | | Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source. NarrowScalar G_TRUNC to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66202 llvm-svn: 369509
* [AArch64] Update MTE system register encodingsLuke Cheeseman2019-08-211-5/+5
| | | | | | | | | | The encodings for the system registers TFSRE0_EL1, TFSR_EL1 TFSR_EL2, TFSR_EL3 and TFSR_EL12 have been changed so that they consistently have CRn=5 and CRm=6 as per https://developer.arm.com/docs/ddi0487/latest. Differential Revision: https://reviews.llvm.org/D65442 llvm-svn: 369505
* [AArch64][GlobalISel] Add support for narrowScalar of G_ZEXTAmara Emerson2019-08-212-4/+5
| | | | | | | | We do this by merging the source with the high bits set to 0. Differential Revision: https://reviews.llvm.org/D66181 llvm-svn: 369480
* [RISCV GlobalISel] Adding initial GlobalISel infrastructureDaniel Sanders2019-08-2015-2/+412
| | | | | | | | | | | | | | | | | | | Summary: Add an initial GlobalISel skeleton for RISCV. It can only run ir translator for `ret void`. Patch by Andrew Wei Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders Reviewed By: dsanders Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65219 llvm-svn: 369467
* [AArch64][GlobalISel] Select logical_imm32 and logical_imm64 patternsJessica Paquette2019-08-202-0/+23
| | | | | | | | | | | | | | Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM SDNodeXForms in AArch64InstrFormats.td. - Add select-logical-imm.mir, which contains tests for each imported pattern. - Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now select instructions of this form. Differential Revision: https://reviews.llvm.org/D66162 llvm-svn: 369465
* [AArch64][GlobalISel] Select patterns which use shifted register operandsJessica Paquette2019-08-202-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds GlobalISel equivalents for the following from AArch64InstrFormats: - arith_shifted_reg32 - arith_shifted_reg64 And partial support for - logical_shifted_reg32 - logical_shifted_reg32 The only thing missing for the logical cases is support for rotates. Other than the missing support, the transformation is identical for the arithmetic shifted register and the logical shifted register. Lots of tests here: - Add select-arith-shifted-reg.mir to show that we correctly select add and sub instructions which use this pattern. - Add select-logical-shifted-reg.mir to cover patterns which are not shared between the arithmetic and logical cases. - Update addsub-shifted.ll to show that we correctly fold shifts into adds/subs. - Update eon.ll to show that we can select the eon instruction by folding xors. Differential Revision: https://reviews.llvm.org/D66163 llvm-svn: 369460
* [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors ↵Craig Topper2019-08-201-0/+14
| | | | | | | | | | | | | | (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459
* Revert [WinEH] Allocate space in funclets stack to save XMM CSRsReid Kleckner2019-08-203-135/+26
| | | | | | | | This reverts r367088 (git commit 9ad565f70ec5fd3531056d7c939302d4ea970c83) And the follow up fix r368631 / e9865b9b31bb2e6bc742dc6fca8f9f9517c3c43e llvm-svn: 369457
* Adds support for writing the .bss section for XCOFF object files.Sean Fertile2019-08-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target object writer. Also adds a class to represent the top-level sections, which we materialize in the ObjectWriter. executePostLayoutBinding will map all csects into the appropriate container depending on its storage mapping class, and map all symbols into their containing csect. Once all symbols have been processed we - Assign addresses and symbol table indices. - Calaculte section sizes. - Build the section header table. - Assign the sections raw-pointer value for non-virtual sections. Since the .bss section is virtual, writing the header table is enough to add support. Writing of a sections raw data, or of any relocations is not included in this patch. Testing is done by dumping the section header table, but it needs to be extended to include dumping the symbol table once readobj support for dumping auxiallary entries lands. Differential Revision: https://reviews.llvm.org/D65159 llvm-svn: 369454
* [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector ↵Craig Topper2019-08-201-0/+12
| | | | | | | | | | (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))). Differential Revision: https://reviews.llvm.org/D66489 llvm-svn: 369434
* [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a ↵Craig Topper2019-08-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend. We already had patterns for extending to i32 to take advantage of the impliciting zeroing of the upper bits of a 32-bit GPR that is done by KMOVW/KMOVB. But the extend might be all the way to i64, in which case the existing patterns would fail and we'd get a KMOVW/B followed by a MOVZX. By adding patterns for i64 we can use the fact that KMOVW/B zero the upper bits of the 32-bit GPR and the normal property that 32-bit GPR writes implicitly zero the upper 32-bits of the full 64-bit GPR. The anyextend patterns are slightly different since we don't care about the upper zeros. For the i8->i64 I think this avoids selecting the anyextend as a MOVZX to prevent a partial register issue that doesn't exist. For i16->i64 I think we would have just emitted an insert_subreg on top of the extract_subreg that the vXi16->i16 bitcast pattern emits. The register coalescer or peephole pass should combine those, but this saves that work and makes i8/16 consistent. llvm-svn: 369431
* [TargetMachine] Don't try to create COFFSTUB references on windows on non-COFFMartin Storsjo2019-08-202-3/+8
| | | | | | | | This avoids spurious relocation types for windows/elf targets. Differential Revision: https://reviews.llvm.org/D66401 llvm-svn: 369426
* Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"Matt Arsenault2019-08-205-142/+28
| | | | | | | This reverts r367500 and r369203. This is causing various test failures. llvm-svn: 369417
* [X86][BtVer2] Use ReadAfterLd entries for the register operands of CMPXCHG.Andrea Di Biagio2019-08-201-3/+12
| | | | | | This is a follow-up of r369365. llvm-svn: 369412
* [X86] Use isNullConstant instead of getConstantOperandVal == 0. NFCCraig Topper2019-08-201-2/+2
| | | | llvm-svn: 369410
OpenPOWER on IntegriCloud