Commit message | Author | Age | Files | Lines
* [DeLICM] Use input access heuristic for mapped PHI WRITEs. | Michael Kruse | 2017-05-11 | 1 | -3/+18
  As with the scalar operand of the initial StoreInst, also use input accesses when searching for new opportunities after mapping a PHI write. The same rationale applies here: After LICM has been applied, the promoted value will either be an instruction in the same statement (in which case we fall back to try every scalar access of the statement), or in another statement such that there will be such an input access. In the latter case other scalars cannot have originated from the same register promotion, at least not by LICM.

  This mostly helps to decrease compilation time and makes debugging easier by not pursuing unpromising routes. In some circumstances, it may change the compiler's output.

  llvm-svn: 302839
* [DeLICM] Lookup input accesses. | Michael Kruse | 2017-05-11 | 3 | -6/+42
  Prior to this patch, we used VirtualUse to determine the input access of an llvm::Value in a statement. The input access is the READ MemoryAccess that makes a value available in that statement, which can either be a READ of a MemoryKind::Value or the MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input access to heuristically find a candidate to map without searching all possible values.

  This might modify the behaviour in that PHI accesses were previously not considered input accesses. This was unintentionally lost when "VirtualUse" was extracted from the "Known Knowledge" patch.

  llvm-svn: 302838
* [VirtualInstruction] Do a lookup instead of a linear search. NFC. | Michael Kruse | 2017-05-11 | 1 | -20/+1
  llvm-svn: 302837
* [ScopInfo] Keep scalar access dictionaries up-to-date. NFC. | Michael Kruse | 2017-05-11 | 2 | -0/+27
  When removing a MemoryAccess, also remove it from maps pointing to it. This was already done for InstructionToAccess, but not yet for ValueReads, ValueWrites and PHIWrites as those were only used during the ScopBuilder phase. Keeping them updated allows us to use them later as well.

  llvm-svn: 302836
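The invariant in the commit above (an access that is removed must also disappear from every map that points to it) can be illustrated with a minimal sketch. The `Access`, `Kind` and `Statement` types below are hypothetical stand-ins written for this illustration; they are not Polly's MemoryAccess or ScopStmt classes, and the integer key merely stands in for an llvm::Value or PHINode.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not Polly's real classes.
enum class Kind { ValueRead, ValueWrite, PHIWrite };

struct Access {
  Kind kind;
  int value; // stands in for the llvm::Value / PHINode used as lookup key
};

struct Statement {
  std::vector<std::unique_ptr<Access>> accesses; // owning list
  std::unordered_map<int, Access *> valueReads;  // lookup dictionaries
  std::unordered_map<int, Access *> valueWrites;
  std::unordered_map<int, Access *> phiWrites;

  // Pick the dictionary that indexes accesses of the given kind.
  std::unordered_map<int, Access *> &mapFor(Kind k) {
    switch (k) {
    case Kind::ValueRead:  return valueReads;
    case Kind::ValueWrite: return valueWrites;
    case Kind::PHIWrite:   return phiWrites;
    }
    return valueReads; // not reached for valid kinds
  }

  Access *add(Kind k, int v) {
    accesses.push_back(std::make_unique<Access>(Access{k, v}));
    Access *a = accesses.back().get();
    mapFor(k)[v] = a;
    return a;
  }

  // Remove the access from its dictionary first, then from the owning list,
  // so that no dictionary is left holding a dangling pointer.
  void remove(Access *a) {
    auto &m = mapFor(a->kind);
    auto it = m.find(a->value);
    if (it != m.end() && it->second == a)
      m.erase(it);
    accesses.erase(
        std::remove_if(accesses.begin(), accesses.end(),
                       [a](const std::unique_ptr<Access> &p) {
                         return p.get() == a;
                       }),
        accesses.end());
  }
};
```

With this shape, a later pass can query `valueReads` or `phiWrites` after accesses have been deleted and still trust the result, which is the property the commit message is after.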
* Issue diagnostics when returning FP values on x86_64 without SSE1/2 | Reid Kleckner | 2017-05-11 | 2 | -9/+39
  Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and then use floating point, that's a bug in their code, not a bug in the compiler.

  This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern.

  Differential Revision: https://reviews.llvm.org/D27522
  llvm-svn: 302835
* [PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 | Guozhi Wei | 2017-05-11 | 4 | -1/+44
  According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.

  This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.

  Differential Revision: https://reviews.llvm.org/D32880
  llvm-svn: 302834
* [DWARF parser] Produce correct template parameter packs | Sean Callanan | 2017-05-11 | 12 | -36/+211
  Templates can end in parameter packs, like this

      template <class... T> struct MyStruct { /*...*/ };

  LLDB does not currently support these parameter packs; it does not emit them into the template argument list at all. This causes problems when you specialize, e.g.:

      template <> struct MyStruct<int> { /*...*/ };
      template <> struct MyStruct<int, int> : MyStruct<int> { /*...*/ };

  LLDB generates two template specializations, each with no template arguments, and then when they are imported by the ASTImporter into a parser's AST context we get a single specialization that inherits from itself, causing Clang's record layout mechanism to smash its stack.

  This patch fixes the problem for classes and adds tests. The tests for functions fail because Clang's ASTImporter can't import them at the moment, so I've xfailed that test.

  Differential Revision: https://reviews.llvm.org/D33025
  llvm-svn: 302833
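For reference, the pattern from the commit message can be written as a compilable fragment. The `arity` member is a hypothetical addition used only to tell the specializations apart; it does not appear in the commit. The point is that `MyStruct<int>` and `MyStruct<int, int>` are two distinct types only as long as their pack arguments are recorded, which is exactly the information the message says was being dropped.

```cpp
#include <cassert>
#include <type_traits>

// Primary template ending in a parameter pack.
template <class... T> struct MyStruct {
  static constexpr int arity = sizeof...(T); // hypothetical helper member
};

// Two explicit specializations; the second inherits from the first.
template <> struct MyStruct<int> {
  static constexpr int arity = 1;
};
template <> struct MyStruct<int, int> : MyStruct<int> {
  static constexpr int arity = 2;
};

// The two specializations are different types related by inheritance.
static_assert(!std::is_same<MyStruct<int>, MyStruct<int, int>>::value,
              "distinct argument lists give distinct specializations");
static_assert(std::is_base_of<MyStruct<int>, MyStruct<int, int>>::value,
              "MyStruct<int, int> derives from MyStruct<int>");
```

If both specializations were recorded with an empty argument list, a consumer could no longer distinguish base from derived, which matches the self-inheritance failure described above.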
* Reduce template usage. NFC. | Rafael Espindola | 2017-05-11 | 5 | -78/+77
  llvm-svn: 302832
* [GISel]: Remove unused lambda captures. NFC | Aditya Nandakumar | 2017-05-11 | 1 | -4/+4
  https://reviews.llvm.org/D33085
  llvm-svn: 302831
* [scudo] Use our own combined allocator | Kostya Kortchinsky | 2017-05-11 | 4 | -115/+155
  Summary:
  The reasoning behind this change is twofold:
  - the current combined allocator (sanitizer_allocator_combined.h) implements features that are not relevant for Scudo, making some code redundant, and some restrictions not pertinent (alignments for example). This forced us to do some weird things between the frontend and our secondary to make things work;
  - we have enough information to be able to know if a chunk will be serviced by the Primary or Secondary, allowing us to avoid extraneous calls to functions such as `PointerIsMine` or `CanAllocate`.

  As a result, the new scudo-specific combined allocator is very straightforward, and allows us to remove some now unnecessary code both in the frontend and the secondary. Unused functions have been left in as unimplemented for now.

  It turns out to also be a sizeable performance gain (3% faster in some Android memory_replay benchmarks, doing some more on other platforms).

  Reviewers: alekseyshl, kcc, dvyukov
  Reviewed By: alekseyshl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33007
  llvm-svn: 302830
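The second point in the summary above (knowing up front which backend services a chunk, so no ownership query is needed) can be sketched roughly as follows. Everything here is an assumption made for illustration: `kPrimaryMaxSize`, `Chunk`, and `CombinedSketch` are invented names, the size threshold is arbitrary, and `std::malloc` stands in for both backends. It is not Scudo's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical threshold: requests up to this size go to the primary.
constexpr std::size_t kPrimaryMaxSize = 1024;

enum class Backend { Primary, Secondary };

// The chunk remembers which backend produced it (a real allocator could keep
// this in a chunk header), so freeing never has to ask "is this pointer mine?".
struct Chunk {
  void *ptr;
  Backend from;
};

class CombinedSketch {
public:
  Chunk allocate(std::size_t size) {
    // The routing decision is made once, from information already at hand.
    Backend b =
        size <= kPrimaryMaxSize ? Backend::Primary : Backend::Secondary;
    if (b == Backend::Primary)
      ++primaryAllocs;
    else
      ++secondaryAllocs;
    return Chunk{std::malloc(size), b}; // malloc stands in for both backends
  }

  void deallocate(const Chunk &c) {
    // Dispatch directly on the recorded backend; no ownership query.
    if (c.from == Backend::Primary)
      ++primaryFrees;
    else
      ++secondaryFrees;
    std::free(c.ptr);
  }

  std::size_t primaryAllocs = 0, secondaryAllocs = 0;
  std::size_t primaryFrees = 0, secondaryFrees = 0;
};
```

The design choice being illustrated is only the direct dispatch: both the allocation and the deallocation path branch on data they already hold instead of probing each backend.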
* Decrease inlinecold-threshold to 45 | Easwaran Raman | 2017-05-11 | 2 | -19/+3
  I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size:

  Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource.

  I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value.

  Differential revision: https://reviews.llvm.org/D33106
  llvm-svn: 302829
* Reduce template usage. NFC. | Rafael Espindola | 2017-05-11 | 4 | -35/+32
  llvm-svn: 302828
* De-virtualize TerminatorInst successor accessors | Reid Kleckner | 2017-05-11 | 3 | -45/+79
  Use the same switch technique to eliminate virtual successor accessors from TerminatorInst.

  Extracted from D31261.

  NFC
  llvm-svn: 302827
* Reduce template usage. NFC. | Rafael Espindola | 2017-05-11 | 4 | -9/+9
  llvm-svn: 302826
* XFAIL this test for Hexagon. | Richard Smith | 2017-05-11 | 1 | -0/+4
  It's failing due to Hexagon calling convention lowering being broken (empty structs are not passed even if they have nontrivial destructors / copy ctors).

  llvm-svn: 302825
* [Libcxxabi]: Support using compiler-rt for MinGW64 | Martell Malone | 2017-05-11 | 1 | -3/+8
  Reviewers: EricWF
  Differential Revision: https://reviews.llvm.org/D33098
  llvm-svn: 302824
* De-virtualize GlobalValue | Reid Kleckner | 2017-05-11 | 9 | -47/+79
  The erase/remove from parent methods now use a switch table to remove themselves from their appropriate parent ilist. The copyAttributesFrom method is now completely non-virtual, since we only ever copy attributes from a global of the appropriate type.

  Pre-requisite to de-virtualizing Value to save a vptr (https://reviews.llvm.org/D31261).

  NFC
  llvm-svn: 302823
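The "switch technique" mentioned in the two de-virtualization commits can be shown on a miniature hierarchy. This is a sketch under assumptions: `Global`, `Function`, `Variable`, `Alias` and `parentListName` are hypothetical names for illustration and do not reproduce the LLVM class definitions. The base class stores a kind tag, and a non-virtual member switches on it and casts to the concrete type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical miniature hierarchy; not the LLVM classes.
class Global {
public:
  enum Kind { FunctionKind, VariableKind, AliasKind };
  Kind getKind() const { return TheKind; }

  // Non-virtual entry point: dispatches on the stored kind tag.
  std::string parentListName() const;

protected:
  explicit Global(Kind K) : TheKind(K) {}

private:
  Kind TheKind;
};

class Function : public Global {
public:
  Function() : Global(FunctionKind) {}
  std::string listName() const { return "functions"; }
};

class Variable : public Global {
public:
  Variable() : Global(VariableKind) {}
  std::string listName() const { return "globals"; }
};

class Alias : public Global {
public:
  Alias() : Global(AliasKind) {}
  std::string listName() const { return "aliases"; }
};

inline std::string Global::parentListName() const {
  // The tag identifies the dynamic type, so a static_cast is safe here.
  switch (getKind()) {
  case FunctionKind: return static_cast<const Function *>(this)->listName();
  case VariableKind: return static_cast<const Variable *>(this)->listName();
  case AliasKind:    return static_cast<const Alias *>(this)->listName();
  }
  return std::string(); // not reached for valid kinds
}

// No virtual functions, hence no vtable pointer stored in each object.
static_assert(!std::is_polymorphic<Global>::value,
              "Global carries no vptr");
```

The trade-off is that the base class must know every subclass, which is acceptable for a closed hierarchy and is what lets the per-object vptr be dropped.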
* [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. | Chad Rosier | 2017-05-11 | 3 | -1/+53
  Differential Revision: http://reviews.llvm.org/D33101.
  llvm-svn: 302822
* [AMDGPU] Placate unused variable warning in release builds. | Davide Italiano | 2017-05-11 | 1 | -0/+1
  llvm-svn: 302821
* [MSP430] Generate EABI-compliant libcalls | Vadzim Dambrouski | 2017-05-11 | 11 | -40/+967
  Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions.

  Patch by Andrew Wygle.

  Differential Revision: https://reviews.llvm.org/D32676
  llvm-svn: 302820
* [LiveVariables] Switch Kill/Defs sets to be DenseSet(s). | Davide Italiano | 2017-05-11 | 1 | -1/+1
  The testcase in PR32984 shows a non-linear compile time increase after a change that made the LoopUnroll pass more aggressive (increasing the threshold).

  My profiling shows all the time of PHI elimination goes to llvm::LiveVariables::addNewBlock. This is because we keep Defs/Kills registers in a SmallSet and vfind(const T &V) is O(N).

  Switching to a DenseSet reduces the time spent in the pass from 297 seconds to 97 seconds. Profiling still shows a lot of time is spent iterating the data structure, so I guess there's room for improvement.

  Dan tells me GCC uses real set operations for live registers and it takes no time on this testcase. Matthias points out we might want to switch all this to LiveIntervalAnalysis, so it's not entirely clear whether a rewrite is worth it.

  Differential Revision: https://reviews.llvm.org/D33088
  llvm-svn: 302819
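The cost difference described above (a membership test that scans every element versus a hashed lookup) can be sketched with standard containers. `LinearSet` and `HashedSet` are hypothetical stand-ins built on `std::vector` and `std::unordered_set`; they are not LLVM's SmallSet or DenseSet, and the sketch only illustrates the asymptotic behaviour, not the timings quoted in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Membership by linear scan: each query is O(N) in the number of registers.
struct LinearSet {
  std::vector<unsigned> items;
  bool contains(unsigned r) const {
    return std::find(items.begin(), items.end(), r) != items.end();
  }
  bool insert(unsigned r) {
    if (contains(r))
      return false;
    items.push_back(r);
    return true;
  }
};

// Membership by hashing: each query is O(1) on average.
struct HashedSet {
  std::unordered_set<unsigned> items;
  bool contains(unsigned r) const { return items.count(r) != 0; }
  bool insert(unsigned r) { return items.insert(r).second; }
};

// Counts how many registers in [0, numRegs) are in the set. With N elements
// in the set this costs O(numRegs * N) for LinearSet and roughly O(numRegs)
// for HashedSet, which is where a non-linear blow-up can come from.
template <class Set>
std::size_t countMembers(const Set &s, unsigned numRegs) {
  std::size_t n = 0;
  for (unsigned r = 0; r < numRegs; ++r)
    if (s.contains(r))
      ++n;
  return n;
}
```

Both containers give the same answers; only the per-query cost differs, so swapping one for the other is a behaviour-preserving change of the kind the commit describes.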
* Work around different -std= default for PS4 target. | Richard Smith | 2017-05-11 | 1 | -1/+1
  llvm-svn: 302818
* PR22877: When constructing an array via a constructor with a default argument | Richard Smith | 2017-05-11 | 2 | -6/+50
  in list-initialization, run cleanups for the default argument after each iteration of the initialization loop.

  We previously only ran the destructor for any temporary once, at the end of the complete loop, rather than once per iteration!

  Re-commit of r302750, reverted in r302776.

  llvm-svn: 302817
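A small program can make the behaviour in this commit observable. The sketch below is an illustration with invented names (`Temp`, `Elem`, `maxLiveDuringArrayInit`), not the test added by the commit: it counts how many default-argument temporaries are alive while each array element is constructed. On a compiler that destroys the temporary after each element, as the commit describes, the maximum is 1; the pre-fix behaviour would let it grow to the array length.

```cpp
#include <algorithm>
#include <cassert>

// Counters for the default-argument temporaries (illustrative globals).
static int liveTemps = 0;
static int maxLiveTemps = 0;

struct Temp {
  Temp() { ++liveTemps; }
  ~Temp() { --liveTemps; }
  Temp(const Temp &) = delete;
};

struct Elem {
  // Default constructor with a default argument that creates a temporary.
  Elem(const Temp & = Temp()) {
    maxLiveTemps = std::max(maxLiveTemps, liveTemps);
  }
};

// Builds an array by list-initialization and reports the largest number of
// default-argument temporaries that were alive at the same time.
inline int maxLiveDuringArrayInit() {
  liveTemps = 0;
  maxLiveTemps = 0;
  Elem arr[3] = {};
  (void)arr;
  return maxLiveTemps;
}
```

After the function returns, `liveTemps` is back to 0 in either case; the difference between the two behaviours lies only in when each temporary's destructor runs.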
* [APInt] Remove an APInt copy from the return of APInt::multiplicativeInverse. | Craig Topper | 2017-05-11 | 1 | -1/+4
  llvm-svn: 302816
* [APInt] Fix typo in comment. NFC | Craig Topper | 2017-05-11 | 1 | -1/+1
  llvm-svn: 302815
* AMDGPU: Remove tfe bit from flat instruction definitions | Matt Arsenault | 2017-05-11 | 11 | -169/+78
  We don't use it and it was removed in gfx9, and the encoding bit repurposed.

  Additionally, actually using it requires changing the output register class, which wasn't done anyway.

  llvm-svn: 302814
* AMDGPU: Pull fneg out of extract_vector_elt | Matt Arsenault | 2017-05-11 | 6 | -8/+72
  This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers.

  llvm-svn: 302813
* [AMDGPU] Fix incorrect register pressure calculation | Stanislav Mekhanoshin | 2017-05-11 | 1 | -2/+3
  Earlier fix D32572 introduced a bug where live-ins were calculated for the basic block instead of the scheduling region. This change fixes it.

  Differential Revision: https://reviews.llvm.org/D33086
  llvm-svn: 302812
* [SLP] Emit optimization remarks | Adam Nemet | 2017-05-11 | 5 | -11/+143
  The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases, for example in HorizontalReduction::tryToReduce.

  ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce).

  We use the first instruction in the root bundle as the locator for the remark. In order to get a sense of how far the tree is spanning I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree.

  llvm-svn: 302811
* [PowerPC] Eliminate integer compare instructions - vol. 1 | Nemanja Ivanovic | 2017-05-11 | 18 | -11/+1942
  This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR.

  It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended.

  Differential Revision: https://reviews.llvm.org/D31847
  llvm-svn: 302810
* Add a test that local submodule visibility has no effect on debug info | Adrian Prantl | 2017-05-11 | 1 | -0/+5
  rdar://problem/27876262
  llvm-svn: 302809
* [DAGCombine] Use SelectionDAG::getAnyExtOrTrunc helper. NFCI. | Simon Pilgrim | 2017-05-11 | 1 | -18/+4
  llvm-svn: 302808
* [asan] Test 'strndup_oob_test.cc' added in r302781 fails on the clang-cmake-thumbv7-a15-full-sh bot. | Pierre Gousseau | 2017-05-11 | 1 | -1/+2
  Marking as unsupported on armv7l-unknown-linux-gnueabihf, same as strdup_oob_test.cc

  llvm-svn: 302807
* Fix -DLLVM_ENABLE_THREADS=OFF build after r302748 | Hans Wennborg | 2017-05-11 | 1 | -0/+2
  llvm-svn: 302806
* [Simplify] Remove identical scalar writes. | Michael Kruse | 2017-05-11 | 7 | -1/+428
  After DeLICM, it is possible to have two writes of the same value to the same location in the same statement when it determined that those writes do not conflict (write the same value).

  Teach -polly-simplify to remove one of the writes. It interferes with the pattern matching of matrix-multiplication kernels and also seems to not be optimized away by LLVM.

  The algorithm is simple, has O(n^2) behaviour (n = max number of MemoryAccesses in a statement) and only matches the most obvious cases, but seems to be enough to pattern-match Boost ublas gemm.

  Not handled cases include:
  - StoreInst instructions (a.k.a. explicit writes), since the value might be loaded or overwritten between the two stores.
  - PHINode, especially LCSSA, when the PHI value matches another's.
  - Partial writes (in preparation)

  llvm-svn: 302805
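The quadratic pairwise scan described above can be sketched in a few lines. This is a simplified model under assumptions: the `Write` struct, its string location and integer value, and the function `removeIdenticalWrites` are hypothetical, whereas the real pass works on Polly MemoryAccess objects and access relations. Explicit stores are skipped, mirroring the first "not handled" case in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a write access inside one statement.
struct Write {
  std::string location; // stands in for the written array element
  int value;            // stands in for the written llvm::Value
  bool isExplicitStore; // true for StoreInst-like writes, which are skipped
};

// For every pair of scalar writes, drop the later one if it writes the same
// value to the same location. Worst case O(n^2) comparisons for n writes.
inline std::size_t removeIdenticalWrites(std::vector<Write> &writes) {
  std::size_t removed = 0;
  for (std::size_t i = 0; i < writes.size(); ++i) {
    if (writes[i].isExplicitStore)
      continue; // not handled: value may change between two stores
    for (std::size_t j = i + 1; j < writes.size();) {
      const bool same = !writes[j].isExplicitStore &&
                        writes[j].location == writes[i].location &&
                        writes[j].value == writes[i].value;
      if (same) {
        writes.erase(writes.begin() + j); // keep index j: next item moved in
        ++removed;
      } else {
        ++j;
      }
    }
  }
  return removed;
}
```

Only the first write of each identical group survives, so the statement ends up with at most one scalar write per (location, value) pair.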
* [X86][AVX] Added zeroall/zeroupper scheduler tests | Simon Pilgrim | 2017-05-11 | 1 | -0/+50
  Missing on SandyBridge and Btver2 models

  llvm-svn: 302804
* Modules: fix modules build. | Tim Northover | 2017-05-11 | 1 | -0/+1
  A recent commit made GlobalVariable.h depend on intrinsics generation, so (I think) it needs to be in the lower-level module. I'll confirm with others, but this should fix the bots.

  llvm-svn: 302803
* Mark LWG#2782 as complete. No functionality change; we already do this. Just added a few more tests. | Marshall Clow | 2017-05-11 | 2 | -2/+8
  llvm-svn: 302802
* Renumber test line number expectations after r302783. | Benjamin Kramer | 2017-05-11 | 1 | -4/+3
  Also remove a confused stable-runtimes requirement.

  llvm-svn: 302801
* Replace a nested namespace used for overload resolution with a struct. Richard Smith says that using the namespace results in an ODR violation, but I disagree. Nevertheless, the struct works just as well. | Marshall Clow | 2017-05-11 | 1 | -2/+2
  llvm-svn: 302800
* Mark LWG#2850 as complete. No functionality change; we had tests that covered it already. Just added comments to the tests. Thanks to K-ballo for the heads up. | Marshall Clow | 2017-05-11 | 2 | -2/+3
  llvm-svn: 302799
* Mark LWG#2796 as complete. No functionality change; we had tests that covered it already. Just added comments to the tests | Marshall Clow | 2017-05-11 | 3 | -2/+10
  llvm-svn: 302798
* [CodeCompletion] Provide member completions for dependent expressions whose | Alex Lorenz | 2017-05-11 | 4 | -16/+126
  type is a TemplateSpecializationType or InjectedClassNameType

  Fixes PR30847. Partially fixes PR20973 (first position only).

  PR17614 is still not working, its expression has the dependent builtin type. We'll have to teach the completion engine how to "resolve" dependent expressions to fix it.

  rdar://29818301
  llvm-svn: 302797
* [CodeCompletion] NFC, extract a function that generates member | Alex Lorenz | 2017-05-11 | 1 | -30/+36
  completion results for records

  llvm-svn: 302796
* Fix two-stage build on windows using DistributionExample cmake cache | NAKAMURA Takumi | 2017-05-11 | 2 | -6/+20
  Thanks to Matthew Larionov <matthewtff@gmail.com>

  llvm-svn: 302795
* [IR] Allow attributes with global variables | Javed Absar | 2017-05-11 | 12 | -9/+254
  This patch extends llvm-ir to allow attributes to be set on global variables. An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html

  A key part of that proposal was to extend LLVM-IR to carry attributes on global variables. This generic feature could be useful for multiple purposes. In our present context, it would be useful to carry user specified sections for bss/rodata/data.

  Reviewed by: Jonathan Roelofs, Reid Kleckner
  Differential Revision: https://reviews.llvm.org/D32009
  llvm-svn: 302794
* [GlobalISel][X86] Remove hand-written G_FADD/G_FSUB selection. | Igor Breger | 2017-05-11 | 1 | -105/+0
  It is now handled by TableGen.

  llvm-svn: 302793
* [ELF] - Make text section location explicit in early-assign-symbol.s test. | George Rimar | 2017-05-11 | 1 | -2/+2
  The testcase itself depends on the .text section location, which was an orphan earlier.

  Suggested by Rafael Espíndola

  llvm-svn: 302792
* [X86] Moving X86Local namespace from .cpp to .h file to use it in memory folding TableGen backend. | Ayman Musa | 2017-05-11 | 2 | -124/+123
  Differential Revision: https://reviews.llvm.org/D32797
  llvm-svn: 302791
* [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC | Ayal Zaks | 2017-05-11 | 1 | -80/+101
  Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown.

  Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to improve clarity; its contents remain intact.

  Differential Revision: https://reviews.llvm.org/D32200
  llvm-svn: 302790