summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [PM] Fixup for r231556 where I missed a dependency on intrinsicsChandler Carruth2015-03-071-0/+2
| | | | | | generation. llvm-svn: 231558
* [PM] Create a separate library for high-level pass management code.Chandler Carruth2015-03-078-2/+534
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will provide the analogous replacements for the PassManagerBuilder and other code long term. This code is extracted from the opt tool currently, and I plan to extend it as I build up support for using the new pass manager in Clang and other places. Mailing this out for review in part to let folks comment on the terrible names here. A brief word about why I chose the names I did. The library is called "Passes" to try and make it clear that it is a high-level utility and where *all* of the passes come together and are registered in a common library. I didn't want it to be *limited* to a registry though, the registry is just one component. The class is a "PassBuilder" but this name I'm less happy with. It doesn't build passes in any traditional sense and isn't a Builder-style API at all. The class is a PassRegisterer or PassAdder, but neither of those really make a lot of sense. This class is responsible for constructing passes for registry in an analysis manager or for population of a pass pipeline. If anyone has a better name, I would love to hear it. The other candidate I looked at was PassRegistrar, but that doesn't really fit either. There is no register of all the passes in use, and so I think continuing the "registry" analog outside of the registry of pass *names* and *types* is a mistake. The objects themselves are just objects with the new pass manager. Differential Revision: http://reviews.llvm.org/D8054 llvm-svn: 231556
* [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLESimon Pilgrim2015-03-071-0/+30
| | | | | | | | | | | | This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE. This prevents many cases of spilling scalar data between the gpr + simd registers. At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match). Differential Revision: http://reviews.llvm.org/D8132 llvm-svn: 231554
* Typo.Eric Christopher2015-03-072-2/+2
| | | | llvm-svn: 231547
* Recommit r231324 with a fix to the ARM execution domain codeEric Christopher2015-03-072-16/+19
| | | | | | | | | | | | to disable lane switching if we don't actually have the instruction set we want to switch to. Models the earlier check above the conditional for the pass. The testcase is one that triggered with the assert that's added as part of the fix, use it to avoid adding a new testcase as it highlights the same problem. llvm-svn: 231539
* Do not restrict interleaved unrolling to small loops, depending on the target.Olivier Sallenave2015-03-064-0/+17
| | | | llvm-svn: 231528
* [AArch64][LoadStoreOptimizer] Generate LDP + SXTW instead of LD[U]R + LD[U]RSW.Quentin Colombet2015-03-061-11/+116
| | | | | | | | | | | Teach the load store optimizer how to sign extend a result of a load pair when it helps creating more pairs. The rational is that loads are more expensive than sign extensions, so if we gather some in one instruction this is better! <rdar://problem/20072968> llvm-svn: 231527
* DAGCombiner: Canonicalize select(and/or,x,y) depending on target.Matthias Braun2015-03-061-0/+63
| | | | | | | | | | | | | | | This is based on the following equivalences: select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y) select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y)) Many target cannot perform and/or on the CPU flags and therefore the right side should be choosen to avoid materializign the i1 flags in an integer register. If the target can perform this operation efficiently we normalize to the left form. Differential Revision: http://reviews.llvm.org/D7622 llvm-svn: 231507
* DAGCombiner: Factor out some and/or combines.Matthias Braun2015-03-061-225/+252
| | | | | | | | | | | | | | | This is in preparation for changing visitSELECT to normalize towards select(Cond0, select(Cond1, X, Y), Y); select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of the conditions. The factored function contains all DAGCombine rules which reduce two values combined by an And/Or operation to a single value. This does not include rules involving constants as visitSELECT already handles that case. Differential Revision: http://reviews.llvm.org/D8026 llvm-svn: 231506
* LoopInterchange: Remove empty method.Benjamin Kramer2015-03-061-6/+1
| | | | llvm-svn: 231503
* LoopInterchange: Rephrase instruction moving using ilist's splice and factor ↵Benjamin Kramer2015-03-061-56/+19
| | | | | | | | it into a function + Random cleanups. No functional change. llvm-svn: 231501
* ExecutionDepsFix: Indizes -> Indices.Matthias Braun2015-03-061-10/+10
| | | | | | Translate german to english. llvm-svn: 231500
* Fix typo.Eric Christopher2015-03-061-1/+1
| | | | llvm-svn: 231495
* R600/SI: Remove unused register classTom Stellard2015-03-061-7/+0
| | | | llvm-svn: 231491
* Fold init() helpers into constructors. NFC.Benjamin Kramer2015-03-062-37/+19
| | | | llvm-svn: 231486
* Avoid calls to dumpPassInfo and RegionBase<Tr>::getNameStr() in RGPassManager ifChad Rosier2015-03-061-10/+14
| | | | | | | | | | | | | | | | | | -debug-pass is not specified, as the string is only used when dumping pass information. There is a big cost of determining the name in ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a name. This is the case for the Release build, as names are not preserved by the front-end. RegionPass is mainly used by Polly, resulting in long compile time for one file of a customer application with the Release build (1m24s) vs Release+Asserts build (10s) when Polly is used. With this change, the compile time with the Release build went down to 8s. Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! Phabricator: http://reviews.llvm.org/D8076 llvm-svn: 231485
* [ConstantRange] Teach multiply to be cleverer about signed ranges.James Molloy2015-03-061-1/+27
| | | | | | | | | | | | | Multiplication is not dependent on signedness, so just treating all input ranges as unsigned is not incorrect. However it will cause overly pessimistic ranges (such as full-set) when used with signed negative values. Teach multiply to try to interpret its inputs as both signed and unsigned, and then to take the most specific (smallest population) as its result. llvm-svn: 231483
* [AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalentsBruno Cardoso Lopes2015-03-066-21/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and access through a non_lazy_symbol_pointers section is used instead. -- before _extgotequiv: .long _extfoo _delta: .long _extgotequiv-_delta -- after _delta: .long L_extfoo$non_lazy_ptr-_delta .section __IMPORT,__pointers,non_lazy_symbol_pointers L_extfoo$non_lazy_ptr: .indirect_symbol _extfoo .long 0 llvm-svn: 231475
* [AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalentsBruno Cardoso Lopes2015-03-065-4/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow up r230264 and add ARM64 support for replacing global GOT equivalent symbol accesses by references to the GOT entry for the final symbol instead, example: -- before .globl _foo _foo: .long 42 .globl _gotequivalent _gotequivalent: .quad _foo .globl _delta _delta: .long _gotequivalent-_delta -- after .globl _foo _foo: .long 42 .globl _delta Ltmp3: .long _foo@GOT-Ltmp3 llvm-svn: 231474
* [mips] [IAS] Add missing constraints and improve testing for the .module ↵Toma Tabacu2015-03-063-7/+41
| | | | | | | | | | | | | | | | | | directive. Summary: None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint. Also added testing for all the other implemented directives which are supposed to trigger this constraint. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7140 llvm-svn: 231465
* Change the way in which error case is being handled.Daniel Jasper2015-03-061-2/+4
| | | | | | | | | Specifically this: * Prevents an "unused" warning in non-assert builds. * In that error case return with out removing a child loop instead of looping forever. llvm-svn: 231459
* Add a new pass "Loop Interchange"Karthik Bhat2015-03-064-0/+1204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass interchanges loops to provide a more cache-friendly memory access. For e.g. given a loop like - for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[j][i] = A[j][i]+B[j][i]; is interchanged to - for(int j=0;j<N;j++) for(int i=0;i<N;i++) A[j][i] = A[j][i]+B[j][i]; This pass is currently disabled by default. To give a brief introduction it consists of 3 stages- LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix. LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time. LoopInterchangeTransform : Which does the actual transform. LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks. TODO: 1) Add support for reductions and lcssa phi. 2) Improve profitability model. 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange. 4) Improve compile time regression found in llvm lnt due to this pass. 5) Fix issues in Dependency Analysis module. A special thanks to Hal for reviewing this code. Review: http://reviews.llvm.org/D7499 llvm-svn: 231458
* X86: Form IMGREL relocations for LLVM FunctionsDavid Majnemer2015-03-061-9/+7
| | | | | | | | We supported forming IMGREL relocations from ConstantExprs involving __ImageBase if the minuend was a GlobalVariable. Extend this functionality to all GlobalObjects. llvm-svn: 231456
* Silence C4715 'not all control paths return a value' warnings.Yaron Keren2015-03-061-0/+3
| | | | llvm-svn: 231455
* Support: Improve performance of FileOutputBuffer on WindowsRui Ueyama2015-03-061-0/+7
| | | | | | | | | | We extend an underlying file before mmap'ing it, but it's not needed on Windows. Extending file is slow on Windows, so we should avoid doing that. The difference gets larger as the size of an output file gets larger. It shove off 2 seconds out of 25 seconds when linking chrome.dll with LLD, for example. llvm-svn: 231452
* [objc-arc] Sprinkle some more auto on some iterators.Michael Gottesman2015-03-061-8/+4
| | | | llvm-svn: 231447
* [objc-arc] Move the detection of potential uses or altering of a ref count ↵Michael Gottesman2015-03-063-108/+171
| | | | | | onto PtrState. llvm-svn: 231446
* LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.Michael Zolotukhin2015-03-061-1/+8
| | | | | | | Though such shifts are usually optimized away by combiner, we still can encounter them after a vector shift is legalized. llvm-svn: 231443
* Remember to move a type to the correct set when setting the body.Rafael Espindola2015-03-061-0/+9
| | | | | | | | | We would set the body of a struct type (therefore making it non-opaque) but were forgetting to move it to the non-opaque set. Fixes pr22807. llvm-svn: 231442
* [objc-arc] Move the checking of whether or not we can match onto PtrStates ↵Michael Gottesman2015-03-063-51/+76
| | | | | | | | | | | and out of the main dataflow. These refactored computations check whether or not we are at a stage of the sequence where we can perform a match. This patch moves the computation out of the main dataflow and into {BottomUp,TopDown}PtrState. llvm-svn: 231439
* [objc-arc] Refactor (Re-)initialization of PtrState from dataflow -> ↵Michael Gottesman2015-03-063-47/+64
| | | | | | | | | | {TopDown,BottomUp}PtrState Class. This initialization occurs when we see a new retain or release. Before we performed the actual initialization inline in the dataflow. That is just messy. llvm-svn: 231438
* [objc-arc] Create two subclasses of PtrState in preparation for moving per ↵Michael Gottesman2015-03-062-43/+60
| | | | | | | | | | | ptr state change behavior onto a PtrState class. This will enable the main ObjCARCOpts dataflow to work with higher level concepts such as "can this ptr state be modified by this ref count" and not need to understand the nitty gritty details of how that is determined. This makes the dataflow cleaner. llvm-svn: 231437
* [objc-arc] Extract out MDNodes into a cache structure so the information can ↵Michael Gottesman2015-03-062-22/+33
| | | | | | be passed around. llvm-svn: 231436
* [objc-arc] Remove annotations code.Michael Gottesman2015-03-061-327/+0
| | | | | | | It will always be in the history if it is needed again. Now it is just dead code. llvm-svn: 231435
* Teach ComputeNumSignBits about signed reminder.Nadav Rotem2015-03-061-1/+27
| | | | | | This optimization a continuation of r231140 that reasoned about signed div. llvm-svn: 231433
* Fix build error.Michael Gottesman2015-03-052-21/+29
| | | | llvm-svn: 231430
* [objc-arc] Change some casts and loop iterators to use auto.Michael Gottesman2015-03-051-16/+12
| | | | llvm-svn: 231427
* [objc-arc] Extract out state specific to a ref count from the main objc arc ↵Michael Gottesman2015-03-054-287/+298
| | | | | | sequence dataflow. This will allow me to separate the actual ARC queries from the meat of the dataflow algorithm. llvm-svn: 231426
* [objc-arc] Extract blot map vector into its own file. NFC.Michael Gottesman2015-03-052-160/+151
| | | | llvm-svn: 231425
* [X86] Remove stale comment. NFC.Ahmed Bougacha2015-03-051-1/+0
| | | | | | | | It turns out 256bit V[SZ]EXT nodes are still generated by the new shuffle lowering, so this is here to stay! llvm-svn: 231422
* Instructions: Use delegated constructors to reduce duplicationBenjamin Kramer2015-03-051-153/+32
| | | | | | NFC. llvm-svn: 231411
* [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)Sanjay Patel2015-03-052-21/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reduces code size for all AVX targets and increases speed for some chips. SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the packed float/double flavors. AVX subsequently made the instruction useful by adding a 4-register operand form. So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars since all FP is in xmm registers anyway. This gives us an approximately 50% speed up for a blendv microbenchmark sequence on SandyBridge and Haswell: blendv : 29.73 cycles/iter logic : 43.15 cycles/iter No new test cases with this patch because: 1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel 2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX 3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants. http://llvm.org/bugs/show_bug.cgi?id=22483 Differential Revision: http://reviews.llvm.org/D8063 llvm-svn: 231408
* SelectionDAGBuilder: Merge 3 copies of the limited precision exp2 emission code.Benjamin Kramer2015-03-051-251/+93
| | | | | | NFC intended. llvm-svn: 231406
* Fix uninitialized memory references in WinEHPrepareAndrew Kaylor2015-03-051-1/+3
| | | | llvm-svn: 231405
* SDAG: Merge the meat of two ExpandAtomic implementations.Benjamin Kramer2015-03-053-212/+42
| | | | | | | The copies already diverged, don't let them become any worse. Reduce redundancy in code with a little macro metaprogramming. llvm-svn: 231401
* [AArch64] Teach AsmPrinter about GlobalAddress operands.Ahmed Bougacha2015-03-051-0/+12
| | | | | | | Fixes PR22761, rdar://20024866. Differential Revision: http://reviews.llvm.org/D8042 llvm-svn: 231400
* Use the correct func begin symbol in all places in ppc.Rafael Espindola2015-03-052-9/+10
| | | | | | I missed an occurrence of the old symbol in my previous patch. llvm-svn: 231398
* [ARM] Enable vector extload combine for legal types.Ahmed Bougacha2015-03-052-0/+24
| | | | | | | | | | | | | | | | | | | | | This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction. Enabling it for larger types isn't as good an idea on ARM as it is on X86, because: - we pretend that extloads are legal, but end up generating vld+vmov - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov. For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND. Related to rdar://19723053 Differential Revision: http://reviews.llvm.org/D7423 llvm-svn: 231396
* Replace PrintStackTrace(FILE*) with PrintStackTrace(raw_ostream&)Zachary Turner2015-03-052-36/+43
| | | | | | | | | | This will be followed by a change on the clang side to update the only user of this function with the new version. Differential Revision: http://reviews.llvm.org/D8074 Reviewed By: Reid Kleckner llvm-svn: 231392
* Remove accidental errs() call in VerifierReid Kleckner2015-03-051-1/+0
| | | | llvm-svn: 231391
OpenPOWER on IntegriCloud