path: root/llvm/lib
Commit message (Author, Age, Files, Lines)
* GlobalISel: Try to make legalize rules more useful for vectors (Matt Arsenault, 2019-02-07, 3 files, -28/+56)
| | | | | | | Mostly keep the existing functions on scalars, but add versions which also operate based on the vector element size. llvm-svn: 353430
* [DAG] Cleanup of unused node in SimplifySelectCC. (Nirav Dave, 2019-02-07, 1 file, -8/+15)
| | | | llvm-svn: 353428
* [x86] split more 256/512-bit shuffles in lowering (Sanjay Patel, 2019-02-07, 1 file, -1/+5)
| | | | | | | | | | | | This is intentionally a small step because it's hard to know exactly where we might introduce a conflicting transform with the code that tries to form wider shuffles. But I think this is safe - if we have a wide shuffle with 2 operands, then we should do better with an extract + narrow shuffle. Differential Revision: https://reviews.llvm.org/D57867 llvm-svn: 353427
* [DAG] Cleanup unused node on failed SELECT Combine. (Nirav Dave, 2019-02-07, 1 file, -0/+6)
| | | | llvm-svn: 353426
* [llvm-ar][libObject] Fix relative paths when nesting thin archives. (Jordan Rupprecht, 2019-02-07, 1 file, -59/+12)
| | | | | | | | | | | | | | | | | | | Summary: When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist). As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes. This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point). Reviewers: mstorsjo, pcc, ruiu, davide, david2050 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57842 llvm-svn: 353424
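The path fix described in this commit can be sketched as below. This is a minimal illustration, not the llvm-ar implementation: `flattenedMemberPath` is a hypothetical helper, and it assumes both paths use `/` as the separator and are relative to the parent archive.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration; this is not the llvm-ar code.
// Given the path of a nested thin archive (relative to the parent archive)
// and the name of one of its members, return the path the flattened member
// should be recorded under in the parent.
std::string flattenedMemberPath(const std::string &childArchive,
                                const std::string &member) {
  assert(!member.empty() && "member name must not be empty");
  const std::string::size_type slash = childArchive.rfind('/');
  if (slash == std::string::npos)
    return member; // Child archive sits next to the parent: nothing to prepend.
  // Keep the child's directory so the member path still resolves.
  return childArchive.substr(0, slash + 1) + member;
}
```

With the example from the commit message, `flattenedMemberPath("foo/child.a", "x.txt")` yields `foo/x.txt` rather than the non-existent `x.txt`.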
* [X86] Simplify casing. NFC. (Nirav Dave, 2019-02-07, 1 file, -8/+8)
| | | | llvm-svn: 353417
* [DAG] Cleanup unused nodes on failed store-to-load forward combine. (Nirav Dave, 2019-02-07, 1 file, -9/+21)
| | | | llvm-svn: 353416
* [CodeView] Fix cycles in debug info when merging Types with global hashes (Alexandre Ganea, 2019-02-07, 2 files, -2/+11)
| | | | When type streams with forward references were merged using GHashes, cycles were introduced in the debug info. This was caused by GlobalTypeTableBuilder::insertRecordAs() not inserting the record on the second pass, thus yielding an empty ArrayRef at that record slot. Later on, upon PDB emission, TpiStreamBuilder::commit() would skip that empty record, thus offsetting all indices that came after in the stream. This solution comes in two steps: 1. Fix the hash calculation, by doing a multiple-step resolution, iff there are forward references in the input stream. 2. Fix merge by resolving with multiple passes, therefore moving records with forward references to the end of the stream. This patch also adds support for llvm-readobj --codeview-ghash. Finally, fix dumpCodeViewMergedTypes() which previously could reference deleted memory. Fixes PR40221 Differential Revision: https://reviews.llvm.org/D57790 llvm-svn: 353412
* [LSR] Generate cross iteration indexes (Sam Parker, 2019-02-07, 3 files, -23/+77)
| | | | Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate a pre-indexed access. This allows the pointer to be updated via the single pre-indexed access so that (hopefully) no add/subs are required to update it for the next iteration. For small cores, this can significantly improve the performance of DSP-like loops. Differential Revision: https://reviews.llvm.org/D55373 llvm-svn: 353403
* [ARM GlobalISel] Support G_ICMP for Thumb2 (Diana Picus, 2019-02-07, 2 files, -12/+25)
| | | | | | | Mark as legal and use the t2* equivalents of the arm mode instructions, e.g. t2CMPrr instead of plain CMPrr. llvm-svn: 353392
* [ARM] Reformat isRedundantFlagInstr for D57833. NFC (David Green, 2019-02-07, 1 file, -8/+4)
| | | | llvm-svn: 353386
* [BPF] add code-gen support for JMP32 instructions (Jiong Wang, 2019-02-07, 9 files, -49/+132)
| | | | JMP32 instructions have been added to the eBPF ISA. They are 32-bit variants of the existing BPF conditional jump instructions, but the comparison happens on the low 32-bit sub-register only, so some unnecessary extensions can be saved. JMP32 instructions will only be available for -mcpu=v3. Host probe hook has been updated accordingly. JMP32 instructions will only be enabled in code-gen when -mattr=+alu32 is enabled, meaning compiling the program using sub-register mode. For JMP32 encoding, it is a new instruction class, and is using the reserved eBPF class number 0x6. This patch has been tested by compiling and running kernel bpf selftests with JMP32 enabled. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 353384
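To make the "new instruction class 0x6" remark concrete, here is a small sketch of how an eBPF jump opcode byte is composed. The constant and function names are made up for illustration (they are not LLVM or kernel identifiers); the numeric values follow the eBPF instruction set layout, where the class occupies the low three bits of the opcode.

```cpp
#include <cstdint>

// Illustrative names only; values follow the eBPF opcode layout
// (operation | source | class).
constexpr std::uint8_t kClassJmp = 0x05;   // 64-bit conditional jumps
constexpr std::uint8_t kClassJmp32 = 0x06; // class number named in the commit
constexpr std::uint8_t kSrcImm = 0x00;     // compare against an immediate
constexpr std::uint8_t kSrcReg = 0x08;     // compare against a register
constexpr std::uint8_t kOpJeq = 0x10;      // jump if equal
constexpr std::uint8_t kOpJgt = 0x20;      // jump if unsigned greater-than

// Build the opcode byte for a conditional jump. When compare32 is true the
// comparison uses only the low 32 bits of the operands (JMP32 class).
constexpr std::uint8_t makeJumpOpcode(std::uint8_t op, std::uint8_t src,
                                      bool compare32) {
  return static_cast<std::uint8_t>(op | src |
                                   (compare32 ? kClassJmp32 : kClassJmp));
}
```

For instance, a "jump if equal to immediate" instruction encodes as 0x15 in the 64-bit class and as 0x16 in the JMP32 class; only the class bits differ.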
* AArch64: implement copy for paired GPR registers. (Tim Northover, 2019-02-07, 2 files, -0/+45)
| | | | | | | When doing 128-bit atomics using CASP we might need to copy a GPRPair to a different register, but that was unimplemented up to now. llvm-svn: 353383
* [BranchFolding] Remove dead code for handling EHPad blocks (Craig Topper, 2019-02-07, 1 file, -23/+0)
| | | | Summary: This code tries to handle the case where IBB is an EHPad, but there's an earlier check that uses PBB->hasEHPadSuccessor(), where PBB is a predecessor of IBB. The hasEHPadSuccessor function would have visited IBB and seen that it was an EHPad and returned false. This would prevent us from reaching this code with IBB as an EHPad. Looks like this code was originally added in rL37427 (ancient) and made dead in rL143001. Reviewers: rnk, void, efriedma Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57358 llvm-svn: 353375
* Move the SMT API to LLVM (Mikhail R. Gadelha, 2019-02-07, 2 files, -1/+842)
| | | | | | | | Moved everything SMT-related to LLVM and updated the cmake scripts. Differential Revision: https://reviews.llvm.org/D54978 llvm-svn: 353373
* Add OpenBSD support to be able to get the thread name (Brad Smith, 2019-02-07, 1 file, -0/+6)
| | | | llvm-svn: 353367
* [WebAssembly] Add symbol flag to the binary format llvm.used (Sam Clegg, 2019-02-07, 4 files, -0/+8)
| | | | Summary: Rather than add a new attribute. See https://github.com/WebAssembly/tool-conventions/issues/64 Subscribers: dschuff, jgravelle-google, aheejin, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57864 llvm-svn: 353360
* Remove reference to non-existent function. NFC. (Sam Clegg, 2019-02-07, 1 file, -2/+1)
| | | | | | | | This comment is old. The code in question was removed in rL203174 Differential Revision: https://reviews.llvm.org/D57856 llvm-svn: 353352
* [libObject][NFC] Include filename in error message (Jordan Rupprecht, 2019-02-06, 1 file, -1/+1)
| | | | llvm-svn: 353341
* [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA. (Alina Sbirlea, 2019-02-06, 4 files, -45/+131)
| | | | Summary: Experimentally we found that promotion to scalars carries fewer benefits than sinking and hoisting in LICM. When using MemorySSA, we build an AliasSetTracker on demand in order to reuse the current infrastructure. We only build it if fewer than AccessCapForMSSAPromotion accesses exist in the loop, a cap that is by default set to 250. This value ensures there are no runtime regressions, and there are small compile time gains for pathological cases. A much lower value (20) was found to yield a single regression in the llvm-test-suite and much higher benefits for compile times. Conservatively we set the current cap to a high value, but we will explore lowering it when MemorySSA is enabled by default. Reviewers: sanjoy, chandlerc Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D56625 llvm-svn: 353339
* [DAG] Immediately cleanup unused nodes from extend-based combines. (Nirav Dave, 2019-02-06, 1 file, -2/+7)
| | | | llvm-svn: 353338
* Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalIsel (Michael Berg, 2019-02-06, 2 files, -43/+48)
| | | | Reviewers: aditya_nandakumar, volkan Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, volkan, Petar.Avramovic Differential Revision: https://reviews.llvm.org/D57630 llvm-svn: 353336
* [AliasSetTracker] Pass MustAlias to addPointer more often. (Alina Sbirlea, 2019-02-06, 1 file, -24/+36)
| | | | | | | | | | | | | | | Summary: Pass the alias info to addPointer when available. Will save an alias() call for must sets when adding a known Must or May alias. [Part of a series of cleanup patches] Reviewers: reames, mkazantsev Subscribers: sanjoy, jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D56613 llvm-svn: 353335
* [X86][DAG] Avoid creating dangling bitcast. (Nirav Dave, 2019-02-06, 1 file, -1/+2)
| | | | | | | | combineExtractWithShuffle may leave a dangling bitcast which may prevent further optimization in later passes. Avoid constructing it unless it is used. llvm-svn: 353333
* [SystemZ] Improved handling of the @llvm.ctlz intrinsic. (Jonas Paulsson, 2019-02-06, 2 files, -0/+2)
| | | | | | | | | | | | | | Since SystemZ supports counting of leading zeros with the FLOGR instruction, isCheapToSpeculateCtlz() should return true, which it now does. ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only expanded to CTLZ if it is Legal or Custom. Review: Ulrich Weigand https://reviews.llvm.org/D57710 llvm-svn: 353330
* build: Remove the cmake check for malloc.h. (Peter Collingbourne, 2019-02-06, 1 file, -4/+1)
| | | | | | | | | | | As far as I can tell, malloc.h is only being used here to provide a definition of mallinfo (malloc itself is declared in stdlib.h via cstdlib). We already have a macro for whether mallinfo is available, so switch to using that instead. Differential Revision: https://reviews.llvm.org/D57807 llvm-svn: 353329
* [SystemZ] Wait with VGBM selection until after DAGCombine2. (Jonas Paulsson, 2019-02-06, 5 files, -41/+46)
| | | | | | | | | | | | | | | | | Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs to the DAGCombiner and select them to VGBM in Select(). This allows the DAGCombiner to understand the constant vector values. For floating point, only all-zeros vectors are now generated with VGBM, as it turned out to be somewhat complicated to handle any arbitrary constants, while in practice this is very rare and hardly needed. The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed. Review: Ulrich Weigand https://reviews.llvm.org/D57152 llvm-svn: 353325
* [SelectionDAG] Cleanup some code comments. NFC (Bjorn Pettersson, 2019-02-06, 1 file, -4/+4)
| | | | | | | | | | Don't repeat the function name in some doxygen comments. (Just a minor cleanup, while testing to push from the git monorepo setup.) llvm-svn: 353317
* [GlobalISel][NFC] Gardening: Factor out code for simple unary intrinsics (Jessica Paquette, 2019-02-06, 1 file, -78/+58)
| | | | | | | | | | | | | There was a lot of repeated code wrt unary math intrinsics in translateKnownIntrinsic. This factors out the repeated MIRBuilder code into two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode. This simplifies adding simple unary intrinsics, since after this, all you have to do is add the mapping to SimpleUnaryIntrinsicOpcodes. Differential Revision: https://reviews.llvm.org/D57774 llvm-svn: 353316
* [yaml2obj] Allow number for ELF symbol type (James Henderson, 2019-02-06, 1 file, -0/+1)
| | | | yaml2obj previously only recognised standard STT_* names, and didn't allow arbitrary numbers. This change allows the user to specify a number for the type instead. It also adds a test to verify the existing behaviour for obj2yaml for unknown symbol types. Reviewed by: grimar Differential Revision: https://reviews.llvm.org/D57822 llvm-svn: 353315
* [InstCombine] X | C == C --> (X & ~C) == 0 (Sanjay Patel, 2019-02-06, 1 file, -9/+18)
| | | | | | | | | | We should canonicalize to one of these forms, and compare-with-zero could be more conducive to follow-on transforms. This also leads to generally better codegen as shown in PR40611: https://bugs.llvm.org/show_bug.cgi?id=40611 llvm-svn: 353313
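The equivalence behind this canonicalization is easy to check by brute force. The sketch below is not InstCombine code; the function names are illustrative, and it only verifies the identity exhaustively for 8-bit values.

```cpp
#include <cstdint>

// The two forms named in the commit title, written for 8-bit operands.
bool originalForm(std::uint8_t x, std::uint8_t c) { return (x | c) == c; }
bool canonicalForm(std::uint8_t x, std::uint8_t c) { return (x & ~c) == 0; }

// Exhaustively compare both forms over every pair of 8-bit inputs.
bool formsAgreeOnAll8BitInputs() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned c = 0; c < 256; ++c) {
      const auto x8 = static_cast<std::uint8_t>(x);
      const auto c8 = static_cast<std::uint8_t>(c);
      if (originalForm(x8, c8) != canonicalForm(x8, c8))
        return false;
    }
  }
  return true;
}
```

Both forms say the same thing: X has no bits set outside of C. The second form expresses it as a compare with zero, which is the shape the commit message expects to combine better with later transforms.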
* AArch64: enforce even/odd register pairs for CASP instructions. (Tim Northover, 2019-02-06, 2 files, -6/+8)
| | | | | | | | ARMv8.1a CASP instructions need the first of the pair to be an even register (otherwise the encoding is unallocated). We enforced this during assembly, but not CodeGen before. llvm-svn: 353308
* [InlineAsm][X86] Add backend support for X86 flag output parameters. (Nirav Dave, 2019-02-06, 4 files, -12/+90)
| | | | | | | Allow custom handling of inline assembly output parameters and add X86 flag parameter support. llvm-svn: 353307
* [SelectionDAGBuilder] Refactor Inline Asm output check. NFCI. (Nirav Dave, 2019-02-06, 1 file, -13/+26)
| | | | llvm-svn: 353305
* [SystemZ] Do not return INT_MIN from strcmp/memcmp (Ulrich Weigand, 2019-02-06, 4 files, -125/+91)
| | | | | | | | | | | | | | | | | | | The IPM sequence currently generated to compute the strcmp/memcmp result will return INT_MIN for the "less than zero" case. While this is in compliance with the standard, strictly speaking, it turns out that common applications cannot handle this, e.g. because they negate a comparison result in order to implement reverse compares. This patch changes code to use a different sequence that will result in -2 for the "less than zero" case (same as GCC). However, this requires that the two source operands of the compare instructions are inverted, which breaks the optimization in removeIPMBasedCompare. Therefore, I've removed this (and all of optimizeCompareInstr), and replaced it with a mostly equivalent optimization in combineCCMask at the DAGcombine level. llvm-svn: 353304
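Why INT_MIN is a problem for callers that negate the result can be shown with a short sketch. This is not SystemZ backend code and the helper names are hypothetical. The negation is done in a wider type because negating INT_MIN directly in `int` overflows, which is undefined behaviour in C and C++.

```cpp
#include <climits>

// Negate a comparison result in a wider type so the overflow case can be
// observed without invoking undefined behaviour.
long long negateWidened(int cmp) { return -static_cast<long long>(cmp); }

// True if "-cmp" is representable as an int, i.e. a caller doing a reverse
// compare by plain negation would get a meaningful result.
bool negationFitsInInt(int cmp) {
  const long long negated = negateWidened(cmp);
  return negated >= INT_MIN && negated <= INT_MAX;
}

// One overflow-free way to reverse a three-way comparison result.
int reverseCompare(int cmp) { return cmp > 0 ? -1 : (cmp < 0 ? 1 : 0); }
```

A result of -2 negates cleanly to 2, while INT_MIN has no positive counterpart in `int`, so applications that implement reverse compares by negation break on it.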
* AArch64: annotate atomics with dropped acquire semantics when printing. (Tim Northover, 2019-02-06, 3 files, -62/+50)
| | | | A quirk of the v8.1a spec is that when the writeback register for an atomic read-modify-write instruction is wzr/xzr, the instruction no longer enforces acquire ordering. However, it's still written with the misleading 'a' mnemonic. So this adds an annotation when disassembling such instructions, mentioning the change. llvm-svn: 353303
* [x86] vectorize cast ops in lowering to avoid register file transfers (Sanjay Patel, 2019-02-06, 1 file, -0/+57)
| | | | | | | | | | | | | | | The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch. There are several potential follow-ups to enhance this: 1. Allow extraction from non-zero element index. 2. Peek through extends of smaller width integers. 3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P Differential Revision: https://reviews.llvm.org/D56864 llvm-svn: 353302
* [MCA] Speedup ResourceManager queries. NFCI (Andrea Di Biagio, 2019-02-06, 1 file, -8/+9)
| | | | | | | | | | | | | When a resource unit R is released, the ResourceManager notifies groups that contain R. Before this patch, the logic in method ResourceManager::release() implemented a potentially slow iterative search of dependent groups on the entire set of processor resources. This patch replaces that logic with a simpler (and often faster) lookup on array `Resource2Groups`. This patch gives an average speedup of ~3-4% (observed on a release build when testing for target btver2). No functional change intended. llvm-svn: 353301
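The idea of replacing a search with a precomputed lookup can be sketched as follows. This is only a model of the technique, not llvm-mca code: `buildUnitToGroups` is a hypothetical function, groups are modelled as bitmasks of the units they contain, and at most 64 units and 64 groups are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: groupMasks[g] has bit u set if group g contains unit u.
// Returns a table where entry u has bit g set if unit u belongs to group g,
// so the groups affected by releasing a unit are found with one array read.
std::vector<std::uint64_t>
buildUnitToGroups(const std::vector<std::uint64_t> &groupMasks,
                  std::size_t numUnits) {
  std::vector<std::uint64_t> unitToGroups(numUnits, 0);
  for (std::size_t g = 0; g < groupMasks.size(); ++g)
    for (std::size_t u = 0; u < numUnits; ++u)
      if (groupMasks[g] & (std::uint64_t{1} << u))
        unitToGroups[u] |= std::uint64_t{1} << g;
  return unitToGroups;
}
```

The table is built once up front; afterwards, releasing unit `u` only needs `unitToGroups[u]` instead of a scan over every processor resource.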
* [DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode. (Clement Courbet, 2019-02-06, 1 file, -8/+8)
| | | | | | | GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with static typing instead of runtime cast. llvm-svn: 353291
* [NFC] Simplify check in guard widening (Max Kazantsev, 2019-02-06, 1 file, -9/+3)
| | | | llvm-svn: 353290
* [DebugInfo] Print correct value for special opcode address increment (James Henderson, 2019-02-06, 1 file, -2/+2)
| | | | | | | | | | | The wrong variable was being used when printing the address increment in verbose output of .debug_line. This patch fixes this. Reviewed by: JDevlieghere Differential Revision: https://reviews.llvm.org/D57693 llvm-svn: 353288
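For context, the address increment of a `.debug_line` special opcode is derived from the line table header fields. The sketch below follows the DWARF rule for non-VLIW targets. It is not the LLVM dumper code, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative result type: how far a special opcode advances the state.
struct SpecialOpcodeAdvance {
  std::uint64_t addressIncrement;
  std::int64_t lineIncrement;
};

// Decode a special opcode using the header fields opcode_base, line_base,
// line_range and minimum_instruction_length (non-VLIW case).
SpecialOpcodeAdvance decodeSpecialOpcode(std::uint8_t opcode,
                                         std::uint8_t opcodeBase,
                                         std::int8_t lineBase,
                                         std::uint8_t lineRange,
                                         std::uint8_t minInstLength) {
  assert(opcode >= opcodeBase && "not a special opcode");
  assert(lineRange != 0 && "line_range must be non-zero");
  const unsigned adjusted = opcode - opcodeBase;
  SpecialOpcodeAdvance result;
  result.addressIncrement =
      static_cast<std::uint64_t>(adjusted / lineRange) * minInstLength;
  result.lineIncrement =
      static_cast<std::int64_t>(lineBase) + (adjusted % lineRange);
  return result;
}
```

With commonly seen header values (opcode_base 13, line_base -5, line_range 14, minimum instruction length 1), opcode 75 has an adjusted value of 62, which advances the address by 4 and the line by 1.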
* [yaml::BinaryRef] Slight perf tuning (for llvm-exegesis analysis mode) (Roman Lebedev, 2019-02-06, 1 file, -3/+4)
Summary:
llvm-exegesis uses this functionality to read its benchmark dumps. This reading of `.yaml`s takes ~60% of runtime for 14656 benchmark points (i.e. one sweep over all x86 instructions), but only 30% of time for 3x as many benchmark points. In particular, this `BinaryRef` appears to be an obvious pain point.

Without patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-orig.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-orig.html' (25 runs):

      972.86 msec task-clock               # 0.994 CPUs utilized            ( +- 0.25% )
          30      context-switches         # 30.774 M/sec                   ( +- 21.74% )
           0      cpu-migrations           # 0.370 M/sec                    ( +- 67.81% )
       11873      page-faults              # 12211.512 M/sec                ( +- 0.00% )
  3898373408      cycles                   # 4009682.186 GHz                ( +- 0.25% )  (83.12%)
   360399748      stalled-cycles-frontend  # 9.24% frontend cycles idle     ( +- 0.54% )  (83.24%)
  1099450483      stalled-cycles-backend   # 28.20% backend cycles idle     ( +- 0.59% )  (33.63%)
  4910528820      instructions             # 1.26 insn per cycle
                                           # 0.22 stalled cycles per insn   ( +- 0.13% )  (50.21%)
  1111976775      branches                 # 1143726625.854 M/sec           ( +- 0.10% )  (66.77%)
    23248474      branch-misses            # 2.09% of all branches          ( +- 0.19% )  (83.29%)

     0.97850 +- 0.00647 seconds time elapsed  ( +- 0.66% )
```

With the patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

      905.29 msec task-clock               # 0.999 CPUs utilized            ( +- 0.11% )
          15      context-switches         # 16.533 M/sec                   ( +- 32.27% )
           0      cpu-migrations           # 0.000 K/sec
       11873      page-faults              # 13121.789 M/sec                ( +- 0.00% )
  3627759720      cycles                   # 4009283.100 GHz                ( +- 0.11% )  (83.19%)
   370401480      stalled-cycles-frontend  # 10.21% frontend cycles idle    ( +- 0.22% )  (83.19%)
  1007114438      stalled-cycles-backend   # 27.76% backend cycles idle     ( +- 0.34% )  (33.62%)
  4414014304      instructions             # 1.22 insn per cycle
                                           # 0.23 stalled cycles per insn   ( +- 0.08% )  (50.36%)
  1003751700      branches                 # 1109314021.971 M/sec           ( +- 0.07% )  (66.97%)
    24611010      branch-misses            # 2.45% of all branches          ( +- 0.10% )  (83.41%)

     0.90593 +- 0.00105 seconds time elapsed  ( +- 0.12% )
```

So this decreases the overall run time of llvm-exegesis analysis mode (on one sweep) by roughly 7%.

To be noted, the `BinaryRef::writeAsBinary()` change is the reason for the perf changes. Usage of `llvm::isHexDigit()` instead of `isxdigit()` does not appear to have any perf impact; I have only changed it "for symmetry".

The `writeAsBinary()` change is correct: it produces an identical de-hex-ified buffer, and the final output is thus identical:
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-new.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-orig.html
```

Reviewers: silvas, espindola, sbc100, zturner, courbet, gchatelet
Reviewed By: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57699
llvm-svn: 353282
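The "de-hex-ifying" step mentioned in this commit can be illustrated with a small standalone decoder. This is a sketch under assumptions, not the actual `BinaryRef::writeAsBinary()` change (which this log does not show): `hexDigitValue` and `decodeHex` are hypothetical names, and malformed input is reported here by returning an empty string.

```cpp
#include <string>

// Map one hex digit to its value, or -1 if the character is not a hex digit.
int hexDigitValue(char c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

// Decode a string of hex digit pairs into raw bytes. Returns an empty string
// for odd-length input or input containing a non-hex character.
std::string decodeHex(const std::string &hex) {
  if (hex.size() % 2 != 0)
    return std::string();
  std::string out;
  out.reserve(hex.size() / 2); // One output byte per two input characters.
  for (std::string::size_type i = 0; i < hex.size(); i += 2) {
    const int hi = hexDigitValue(hex[i]);
    const int lo = hexDigitValue(hex[i + 1]);
    if (hi < 0 || lo < 0)
      return std::string();
    out.push_back(static_cast<char>((hi << 4) | lo));
  }
  return out;
}
```

For example, `decodeHex("48656c6c6f")` produces the five bytes of "Hello". Converting each digit with direct arithmetic keeps the inner loop short, which matters when the decoder runs over thousands of benchmark records.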
* [NFC] Factor out detachment of dead blocks from their erasing (Max Kazantsev, 2019-02-06, 1 file, -18/+26)
| | | | llvm-svn: 353277
* [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler (Max Kazantsev, 2019-02-06, 1 file, -1/+3)
| | | | llvm-svn: 353276
* [NFC] Revert rL353274 (Max Kazantsev, 2019-02-06, 1 file, -10/+5)
| | | | llvm-svn: 353275
* [NFC] Extend API of DeleteDeadBlock(s) to collect updates without DTU (Max Kazantsev, 2019-02-06, 1 file, -5/+10)
| | | | llvm-svn: 353274
* [NFC] Replace readonly SmallVectorImpl with ArrayRef (Max Kazantsev, 2019-02-06, 1 file, -3/+2)
| | | | llvm-svn: 353273
* [HotColdSplit] Move splitting after instrumented PGO use (Teresa Johnson, 2019-02-06, 2 files, -13/+13)
| | | | | | | | | | | | | | | | | | | | | Summary: Follow up to D57082 which moved splitting earlier in the pipeline, in order to perform it before inlining. However, it was moved too early, before the IR is annotated with instrumented PGO data. This caused the splitting to incorrectly determine cold functions. Move it to just after PGO annotation (still before inlining), in both pass managers. Reviewers: vsk, hiraditya, sebpop Subscribers: mehdi_amini, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57805 llvm-svn: 353270
* [AliasSetTracker] Minor style tweak to avoid a variable w/two distinct live ranges [NFC] (Philip Reames, 2019-02-06, 1 file, -4/+2)
| | | | llvm-svn: 353267
* Move DomTreeUpdater from IR to Analysis (Richard Trieu, 2019-02-06, 17 files, -16/+16)
| | | | | | | | DomTreeUpdater depends on headers from Analysis, but is in IR. This is a layering violation since Analysis depends on IR. Relocate this code from IR to Analysis to fix the layering violation. llvm-svn: 353265