summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [X86][BtVer2] Fix latency and throughput of XCHG and XADD.Andrea Di Biagio2019-08-221-1/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to 1 extra uOP; maximum observed throughput is 0.5 IPC. ``` xchgb %cl, %dl # Latency: 2cy - uOPs: 3 - 2 ALU xchgw %cx, %dx # Latency: 1cy - uOPs: 2 - 2 ALU xchgl %ecx, %edx # Latency: 1cy - uOPs: 2 - 2 ALU xchgq %rcx, %rdx # Latency: 1cy - uOPs: 2 - 2 ALU ``` The reg-mem forms of XCHG are atomic operations with an observed latency of 16cy. The resource usage is similar to the XCHGrr variants. The biggest difference is obviously the bus-locking, which prevents the LS to issue other memory uOPs in parallel until the unlocking store uOP is executed. ``` xchgb %cl, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgw %cx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgl %ecx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy xchgq %rcx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy ``` The exchanged in/out register operand becomes available after 11cy from the start of execution. Added test xchg.s to verify that we correctly see that register write committed in 11cy (and not 16cy). Reg-reg XADD instructions have the same latency/throughput than the byte exchange (register-register variant). ``` xaddb %cl, %dl # latency: 2cy - uOPs: 3 - 3 ALU xaddw %cx, %dx # latency: 2cy - uOPs: 3 - 3 ALU xaddl %ecx, %edx # latency: 2cy - uOPs: 3 - 3 ALU xaddq %rcx, %rdx # latency: 2cy - uOPs: 3 - 3 ALU ``` The non-atomic RM variants have a latency of 11cy, and decode to 4 macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register operand becomes available in 3cy (it matches the 'load-to-use latency'). ``` xaddb %cl, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddw %cx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddl %ecx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU xaddq %rcx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU ``` The atomic XADD variants execute in 16cy. The in/out register operand is available after 11cy from the start of execution. ``` lock xaddb %cl, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddw %cx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddl %ecx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy lock xaddq %rcx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy ``` Added test xadd.s to verify those latencies as well as read-advance values. Differential Revision: https://reviews.llvm.org/D66535 llvm-svn: 369642
* [MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI.Simon Pilgrim2019-08-221-21/+11
| | | | | | Allows for some cleanup in a lot of SSE/AVX vector splitting code llvm-svn: 369640
* Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32Sam Tebbs2019-08-221-6/+7
| | | | | | | | | The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit cd53ff6, reapplying 7c6b229. llvm-svn: 369638
* [Loop Peeling] Fix silly bug in metadata update.Serguei Katkov2019-08-221-6/+6
| | | | | | | We must update loop metedata before we moved to parent loop if it is present. llvm-svn: 369637
* Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"Hans Wennborg2019-08-221-7/+6
| | | | | | | | | | | | | | | | | It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636
* [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening ↵Craig Topper2019-08-221-0/+18
| | | | | | | | | | | | | | legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628
* [Support] Improve readNativeFile(Slice) interfacePavel Labath2019-08-223-92/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: There was a subtle, but pretty important difference between the Slice and regular versions of this function. The Slice function was zero-initializing the rest of the buffer when the read syscall returned less bytes than expected, while the regular function did not. This patch removes the inconsistency by making both functions *not* zero-initialize the buffer. The zeroing code is moved to the MemoryBuffer class, which is currently the only user of this code. This makes the API more consistent, and the code shorter. While in there, I also refactor the functions to return the number of bytes through the regular return value (via Expected<size_t>) instead of a separate by-ref argument. Reviewers: aganea, rnk Subscribers: kristina, Bigcheese, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66471 llvm-svn: 369627
* [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32Sam Tebbs2019-08-221-6/+7
| | | | | | | | | | | | | This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626
* [TargetLowering] Remove optional arguments passing to makeLibCallShiva Chen2019-08-229-95/+177
| | | | | | | | | | The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622
* [X86] Making X86OptimizeLEAs pass public. NFCPengfei Wang2019-08-223-26/+30
| | | | | | | | | | | | | | | | Reviewers: wxiao3, LuoYuanke, andrew.w.kaylor, craig.topper, annita.zhang, liutianle, pengfei, xiangzhangllvm, RKSimon, spatel, andreadb Reviewed By: RKSimon Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Patch by Gen Pei (gpei) Differential Revision: https://reviews.llvm.org/D65933 llvm-svn: 369612
* [COFF] Fix section name for constants larger than 64 bits on WindowsFangrui Song2019-08-221-1/+2
| | | | | | | | | | | | APIntToHexString returns wrong value ("0000000000000000ffffffffffffffff") for integer larger than 64 bits, and thus TargetLoweringObjectFileCOFF::getSectionForConstant returns same section name for all numbers larger than 64 bits. This patch tries to fix it. Differential Revision: https://reviews.llvm.org/D66458 Patch by Senran Zhang llvm-svn: 369610
* [Object] FIX: update PlatformKind name in TapiFileCyndy Ishida2019-08-211-2/+2
| | | | | | | Buildbots that use GCC failed to compile because overwritten namespace with variable name llvm-svn: 369602
* [Object] Add tapi files to objectCyndy Ishida2019-08-215-3/+163
| | | | | | | | | | | | | | | | | | | | | Summary: The intention for this is to allow reading and printing symbols out from llvm-nm. Tapi file, and Tapi universal follow a similiar format to their respective MachO Object format. The tests are dependent on llvm-nm processing tbd files which is why its in D66160 Reviewers: ributzka, steven_wu, lhames Reviewed By: ributzka, lhames Subscribers: mgorny, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66159 llvm-svn: 369600
* [X86] Correct the scheduler classes for TAILJMP and TCRETURN CodeGenOnly ↵Craig Topper2019-08-211-24/+25
| | | | | | | | | | | | instructions. We had an odd combination of WriteJump applied to some memory instructions and WriteJumpLd applied to register and immediate instructions. Thsi should hopefully assign them all correctly. llvm-svn: 369599
* [X86] Replace a couple hardcoded '5's with X86::AddrNumOperands for ↵Craig Topper2019-08-211-2/+3
| | | | | | readability. NFC llvm-svn: 369598
* [RISCV] Remove fix introduced by r369573, superseded by r369580Luis Marques2019-08-211-3/+0
| | | | llvm-svn: 369590
* [Attributor] Fix: Gracefully handle non-instruction usersJohannes Doerfert2019-08-211-1/+5
| | | | | | | Function can have users that are not instructions, e.g., bitcasts. For now, we simply give up when we see them. llvm-svn: 369588
* Add FileWriter to GSYM and encode/decode functions to AddressRange and ↵Greg Clayton2019-08-213-0/+115
| | | | | | | | | | | | | | | | AddressRanges The full GSYM patch started with: https://reviews.llvm.org/D53379 This patch add the ability to encode data using the new llvm::gsym::FileWriter class. FileWriter is a simplified binary data writer class that doesn't require targets, target definitions, architectures, or require any other optional compile time libraries to be enabled via the build process. This class needs the ability to seek to different spots in the binary data that it produces to fix up offsets and sizes in GSYM data. It currently uses std::ostream over llvm::raw_ostream because llvm::raw_ostream doesn't support seeking which is required when encoding and decoding GSYM data. AddressRange objects are encoded and decoded to be relative to a base address. This will be the FunctionInfo's start address if the AddressRange is directly contained in a FunctionInfo, or a base address of the containing parent AddressRange or AddressRanges. This allows address ranges to be efficiently encoded using ULEB128 encodings as we encode the offset and size of each range instead of full addresses. This also makes encoded addresses easy to relocate as we just need to relocate one base address. Differential Revision: https://reviews.llvm.org/D63828 llvm-svn: 369587
* [RISCV] Fix use of side-effects in asserts in decoder functionsLuis Marques2019-08-211-6/+9
| | | | llvm-svn: 369580
* [BinaryFormat] Teach identify_magic about Tapi files.Cyndy Ishida2019-08-214-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Tapi files are YAML files that start with the !tapi tag. The only execption are TBD v1 files, which don't have a tag. In that case we have to scan a little further and check if the first key "archs" exists. This is the first patch in a series of patches to add libObject support for text-based dynamic library (.tbd) files. This patch is practically exactly the same as D37820, that was never pushed to master, and is needed for future commits related to reading tbd files for llvm-nm Reviewers: ributzka, steven_wu, bollu, espindola, jfb, shafik, jdoerfert Reviewed By: steven_wu Subscribers: dexonsmith, llvm-commits Tags: #llvm, #clang, #sanitizers, #lldb, #libc, #openmp Differential Revision: https://reviews.llvm.org/D66149 llvm-svn: 369579
* [Attributor][NFC] Fix copy & paste errorJohannes Doerfert2019-08-211-1/+1
| | | | llvm-svn: 369577
* [Attributor][NFC] Remove leftover semicolonJohannes Doerfert2019-08-211-1/+1
| | | | llvm-svn: 369576
* [Attributor] Use existing unreachable instead of introducing new onesJohannes Doerfert2019-08-211-0/+3
| | | | | | | So far we split the unreachable off and placed a new one, this is not necessary. llvm-svn: 369575
* Fix -Werror=unused-variable error after r369528.Richard Smith2019-08-211-0/+3
| | | | llvm-svn: 369573
* [GVN] Do PHI translations across all edges between the load and the ↵Florian Hahn2019-08-211-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | unavailable pred. Currently we do not properly translate addresses with PHIs if LoadBB != LI->getParent(), because PHITranslateAddr expects a direct predecessor as argument, because it considers all instructions outside of the current block to not requiring translation. The amount of cases that trigger this should be very low, as most single predecessor blocks should be folded into their predecessor by GVN before we actually start with value numbering. It is still not guaranteed to happen, so we should do PHI translation along all edges between the loads' block and the predecessor where we have to place a load. There are a few test cases showing current limits of the PHI translation, which could be improved later. Reviewers: spatel, reames, efriedma, john.brawn Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D65020 llvm-svn: 369570
* Revert r369549 as it broke the bots.Aaron Ballman2019-08-211-4/+3
| | | | | | http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/13605/ llvm-svn: 369569
* Revert r367389 (and follow-up r368404); it caused PR43073.Nico Weber2019-08-211-50/+76
| | | | llvm-svn: 369567
* [WebAssembly] Handle aliases in WebAssemblyFixFunctionBitcastsSam Clegg2019-08-211-0/+2
| | | | | | | | Fixes: https://github.com/emscripten-core/emscripten/issues/8770 Differential Revision: https://reviews.llvm.org/D66508 llvm-svn: 369566
* [MVT] Add v16f16 and v32f16 vectors.Craig Topper2019-08-212-0/+6
| | | | | | | | | I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565
* [mips] Replace call `expandLoadAddress` by `loadAndAddSymbolAddress`. NFCSimon Atanasyan2019-08-211-2/+2
| | | | | | | | In case of expanding `lw/sw $reg, symbol($reg)` instruction for PIC it's enough to call the `loadAndAddSymbolAddress` method. Additional work performed by the `expandLoadAddress` is not required here. llvm-svn: 369563
* [mips] Remove duplicated case from the `StringSwitch`. NFCSimon Atanasyan2019-08-211-1/+0
| | | | llvm-svn: 369562
* [DAGCombiner] Remove mostly redundant calls to AddToWorklistAmaury Sechet2019-08-211-2/+1
| | | | | | | | | | | | | | | | | Summary: These calls change the order in which some nodes are processed and so have an effect on codegen. The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66543 llvm-svn: 369561
* [BitcodeReader] Check if we can create a null constant for type.Florian Hahn2019-08-211-0/+2
| | | | | | | | | | | | | | | | | | | We cannot create null constants for certain types, e.g. VoidTy, FunctionTy or LabelTy. getNullValue asserts if we pass in an unsupported type. We should also check for opaque types, but I'm not sure how. This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14795. Reviewers: t.p.northover, jfb, vsk Reviewed By: vsk Tags: #llvm Differential Revision: https://reviews.llvm.org/D65897 llvm-svn: 369557
* Fix -Wimplicit-fallthrough warnings in regcomp.cNathan Huckleberry2019-08-211-3/+4
| | | | | | | | | | | | | | | | | | Summary: Since clang does not support comment style fallthrough annotations these should be switched. Reviewers: aaron.ballman, nickdesaulniers, xbolva00 Reviewed By: aaron.ballman, nickdesaulniers, xbolva00 Subscribers: xbolva00, nickdesaulniers, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66487 llvm-svn: 369549
* [LoopPassManager + MemorySSA] Only enable use of MemorySSA for LPMs known to ↵Alina Sbirlea2019-08-212-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | preserve it. Summary: Add a flag to the FunctionToLoopAdaptor that allows enabling MemorySSA only for the loop pass managers that are known to preserve it. If an LPM is known to have only loop transforms that *all* preserve MemorySSA, then use MemorySSA if `EnableMSSALoopDependency` is set. If an LPM has loop passes that do not preserve MemorySSA, then the flag passed is `false`, regardless of the value of `EnableMSSALoopDependency`. When using a custom loop pass pipeline via `passes=...`, use keyword `loop` vs `loop-mssa` to use MemorySSA in that LPM. If a loop that does not preserve MemorySSA is added while using the `loop-mssa` keyword, that's an error. Add the new `loop-mssa` keyword to a few tests where a difference occurs when enabling MemorySSA. Reviewers: chandlerc Subscribers: mehdi_amini, Prazek, george.burgess.iv, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66376 llvm-svn: 369548
* GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sourcesMatt Arsenault2019-08-212-1/+21
| | | | | | | | This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547
* [ARM] Formatting for ARMInstrMVE.td. NFCDavid Green2019-08-211-89/+98
| | | | | | | This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545
* [instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0Philip Reames2019-08-211-0/+5
| | | | | | Noticed while looking at pr43028. llvm-svn: 369541
* Improving CodeView debug info type record's inline commentsNilanjana Basu2019-08-214-55/+400
| | | | llvm-svn: 369533
* [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitionsAlexander Timofeev2019-08-213-1/+21
| | | | | | | Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
* [LLVM][Alignment] Introduce Alignment In MachineFrameInfoGuillaume Chatelet2019-08-216-18/+21
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: jfb Subscribers: hiraditya, dexonsmith, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65800 llvm-svn: 369531
* [DWARF] Adjust return type of DWARFUnit::getLength().Igor Kudrin2019-08-212-3/+3
| | | | | | | | | DWARFUnitHeader::getLength() returns uint64_t. DWARFUnit::getLength() should do the same. Differential Revision: https://reviews.llvm.org/D66472 llvm-svn: 369529
* [RISCV] Add support for RVC HINT instructionsLuis Marques2019-08-217-3/+212
| | | | | | | | | The hint instructions are enabled by default (if the standard C extension is enabled). To disable them pass -mattr=-rvc-hints. Differential Revision: https://reviews.llvm.org/D62592 llvm-svn: 369528
* [DAGCombiner] Various nits. NFCAmaury Sechet2019-08-211-4/+2
| | | | llvm-svn: 369520
* [InstCombine] narrow icmp with extended operands of different widthsSanjay Patel2019-08-211-6/+17
| | | | | | | | | | | An intermediate extend is used to widen the narrow operand to the width of the other (wider) operand. At that point, we have the same logic as the existing transform that was restricted to folds of equal width zext/sext. This mostly solves PR42700: https://bugs.llvm.org/show_bug.cgi?id=42700 llvm-svn: 369519
* MinidumpYAML: move serialization code to MinidumpEmitter.cppPavel Labath2019-08-212-209/+196
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The code for serializing minidumps was living in MinidumpYAML.cpp so that it would be accessible from unit tests. While this had its advantages, it was also unfortunate because it broke symmetry with all other yaml2obj serializers. Fortunately, nowadays all of yaml2obj is a library, so we don't need to do anything special. This patch improves the code consistency by moving the serialization code to MinidumpEmitter.cpp to match the style used in other backends. It also removes the writeAsBinary entry point in favor of the more general convertYAML interface. This patch is just massaging the code a bit. There shouldn't be any functional change here. Reviewers: jhenderson, abrachet Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66474 llvm-svn: 369517
* [MIPS GlobalISel] NarrowScalar G_ZEXTLOAD and G_SEXTLOADPetar Avramovic2019-08-211-3/+3
| | | | | | | | NarrowScalar G_ZEXTLOAD and G_SEXTLOAD to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66205 llvm-svn: 369512
* [MIPS GlobalISel] NarrowScalar G_ZEXT and G_SEXTPetar Avramovic2019-08-211-0/+4
| | | | | | | | NarrowScalar G_ZEXT and G_SEXT to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66204 llvm-svn: 369511
* [MIPS GlobalISel] Consider type1 when legalizing shifts after r351882Petar Avramovic2019-08-211-2/+2
| | | | | | | | | r351882 allows different type for shift amount then result and value being shifted. Fix MIPS Legalizer rules to take r351882 into account. Differential Revision: https://reviews.llvm.org/D66203 llvm-svn: 369510
* [MIPS GlobalISel] NarrowScalar G_TRUNCPetar Avramovic2019-08-212-0/+19
| | | | | | | | | Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source. NarrowScalar G_TRUNC to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66202 llvm-svn: 369509
OpenPOWER on IntegriCloud