path: root/llvm/lib
Commit message | Author | Date | Files | Lines
* PGO branch weight: fix PR18752. (Manman Ren, 2014-02-07; 1 file, -5/+4)
  Fix a bug triggered in IfConverterTriangle when CvtBB has multiple predecessors,
  by getting the weights before removing a successor.
  llvm-svn: 200958
* X86: Resolve a long-standing FIXME and properly isel pextr[bw]. (Jim Grosbach, 2014-02-07; 3 files, -18/+4)
  Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use them to match
  the relevant pextr store instructions. The test widen_load-2.ll requires a slight
  change because with the stores gone, the remaining instructions are scheduled in a
  different order. Add test cases for SSE4 and AVX variants.
  Resolves rdar://13414672. Patch by Adam Nemet <anemet@apple.com>.
  llvm-svn: 200957
* [CodeGenPrepare] Move away sign extensions that get in the way of addressing mode. (Quentin Colombet, 2014-02-06; 1 file, -14/+808)
  Basically the idea is to transform code like this:
    %idx = add nsw i32 %a, 1
    %sextidx = sext i32 %idx to i64
    %gep = gep i8* %myArray, i64 %sextidx
    load i8* %gep
  Into:
    %sexta = sext i32 %a to i64
    %idx = add nsw i64 %sexta, 1
    %gep = gep i8* %myArray, i64 %idx
    load i8* %gep
  That way the computation can be folded into the addressing mode.
  This transformation is done as part of the addressing mode matcher. If the matching
  fails (not profitable, addressing mode not legal, etc.), the matcher will revert the
  related promotions.
  <rdar://problem/15519855>
  llvm-svn: 200947
* Track register pressure a bit more carefully (weird corner case). (Andrew Trick, 2014-02-06; 1 file, -1/+8)
  This solves a problem where a def machine operand has no uses but has not been marked
  dead. In this case, the initial RP analysis was being extra precise and determining
  from LiveIntervals that the register was actually dead. This caused us to omit the
  register from the RP tracker's block live-out. That's all good, but the
  per-instruction summary still accounted for it as a valid def. This could cause an
  assertion in the tracker later when we underflow pressure.
  This is from a bug report on an out-of-tree target. It is not reproducible on
  well-behaved targets. I'm just making an obvious fix without a unit test.
  llvm-svn: 200941
* Revert r200095 and r200152. (Evan Cheng, 2014-02-06; 1 file, -3/+4)
  It turns out that when compiling with -arch armv7 -mcpu=cortex-m3, the triple would
  still set iOS as the OS, so the hack is still needed. rdar://15984891
  llvm-svn: 200937
* R600/SI: Add a MUBUF store pattern for Reg+Imm offsets (Tom Stellard, 2014-02-06; 2 files, -1/+11)
  llvm-svn: 200935
* R600/SI: Add a MUBUF store pattern for Imm offsets (Tom Stellard, 2014-02-06; 1 file, -0/+5)
  llvm-svn: 200934
* R600/SI: Add a MUBUF load pattern for Reg+Imm offsets (Tom Stellard, 2014-02-06; 1 file, -0/+5)
  llvm-svn: 200933
* R600/SI: Use immediate offsets for SMRD instructions whenever possible (Tom Stellard, 2014-02-06; 2 files, -10/+13)
  There was a problem with the old pattern, so we were copying some larger immediates
  into registers when we could have been encoding them in the instruction.
  llvm-svn: 200932
* Remove const_cast for STI when parsing inline asm (David Peixotto, 2014-02-06; 3 files, -14/+16)
  In a previous commit (r199818) we added a const_cast to an existing subtarget info
  instead of creating a new one so that we could reuse it when creating the
  TargetAsmParser for parsing inline assembly. This cast was necessary because we
  needed to reuse the existing STI to avoid generating incorrect code when the inline
  asm contained mode-switching directives (e.g. .code 16).
  The root cause of the failure was that there was an implicit sharing of the STI
  between the parser and the MCCodeEmitter. To fix a different but related issue, we
  now explicitly pass the STI to the MCCodeEmitter (see commits r200345-r200351).
  The const_cast is no longer necessary and we can now create a fresh STI for the
  inline asm parser to use.
  Differential Revision: http://llvm-reviews.chandlerc.com/D2709
  llvm-svn: 200929
* X86: add costs for 64-bit vector ext/trunc & rebalance (Tim Northover, 2014-02-06; 1 file, -15/+58)
  The most important part of this is probably adding any cost at all for operations
  like zext <8 x i8> to <8 x i32>. Before, they were being recorded as extremely
  costly (24, I believe), which made LLVM fall back on a 4-wide vectorisation of a loop.
  It also rebalances the values for sext, zext and trunc. Lacking any other sane metric
  that might work across CPU microarchitectures, I went for instructions. This seems to
  be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at
  least one value is sub-optimal for some bizarre reason.
  Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is
  quite different in many cases.
  rdar://problem/15981990
  llvm-svn: 200928
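  The cost table being rebalanced here is essentially a lookup keyed on (operation,
  destination type, source type). The sketch below is a simplified, self-contained
  illustration of that idea in plain C++, not the actual X86 TTI tables; the entry
  values and type names are made up for the example.
    #include <cstdio>
    #include <cstring>

    // Toy stand-in for a target's type-conversion cost table.  Real LLVM tables
    // key on ISD opcodes and MVTs; plain strings keep this sketch self-contained.
    enum ConvOp { ZExt, SExt, Trunc };

    struct ConvCostEntry {
      ConvOp Op;
      const char *DstTy;
      const char *SrcTy;
      unsigned Cost; // rough "number of instructions" metric
    };

    // Hypothetical entries: a modest cost for widening instead of a huge default.
    static const ConvCostEntry Table[] = {
        {ZExt, "v8i32", "v8i8", 2},
        {SExt, "v8i32", "v8i8", 2},
        {Trunc, "v8i8", "v8i32", 2},
    };

    unsigned getConversionCost(ConvOp Op, const char *Dst, const char *Src) {
      for (const ConvCostEntry &E : Table)
        if (E.Op == Op && !std::strcmp(E.DstTy, Dst) && !std::strcmp(E.SrcTy, Src))
          return E.Cost;
      return 24; // pessimistic fallback, like the old behaviour described above
    }

    int main() {
      std::printf("zext v8i8 -> v8i32 cost: %u\n",
                  getConversionCost(ZExt, "v8i32", "v8i8"));
    }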
* Add a -suppress-warnings option to bitcode linking. (Eli Bendersky, 2014-02-06; 1 file, -9/+22)
  llvm-svn: 200927
* Yet another patch to reduce compile time for small programs: (Puyan Lotfi, 2014-02-06; 1 file, -4/+28)
  The aim in this patch is to reduce the work that VirtRegRewriter needs to do when
  telling MachineRegisterInfo which physregs are in use. Up until now,
  VirtRegRewriter::rewrite has been doing rewriting and populating def info and then
  proceeding to set whether a physreg is used based on this info for every physreg that
  the target provides. This can be expensive when a target has an unusually high number
  of supported physregs, and is a noticeable chunk of compile time for small programs
  on such targets.
  So to reduce compile time, this patch simply adds the use of a SparseSet to the
  rewrite function that is used to flag each physreg that is encountered in a
  MachineFunction. Afterward, rather than iterating over the set of all physregs for a
  given target to set the physregs used in MachineRegisterInfo, the new way is to
  iterate over the set of physregs that were actually encountered and set in the
  SparseSet.
  This improves compile time because the existing rewrite function was iterating over
  all MachineOperands already, and because the iterations afterward to setPhysRegUsed
  are reduced by use of the SparseSet data.
  llvm-svn: 200919
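  The change described above amounts to "record only the physregs you actually see,
  then mark just those". Below is a minimal, generic C++ sketch of that pattern; names
  such as NumPhysRegs and markPhysRegUsed are hypothetical stand-ins, and a plain
  dense/sparse pair stands in for llvm::SparseSet.
    #include <vector>

    // Toy stand-ins for the register-allocation data structures.
    constexpr unsigned NumPhysRegs = 1024;            // large register file
    std::vector<bool> PhysRegUsed(NumPhysRegs, false);

    void markPhysRegUsed(unsigned Reg) { PhysRegUsed[Reg] = true; }

    // Old approach (conceptually): after rewriting, loop over *all* physregs the
    // target provides and decide for each one whether it was used.
    // New approach: while walking the operands we already visit, remember only
    // the physregs actually encountered, then mark just those.
    void rewrite(const std::vector<unsigned> &OperandRegs) {
      std::vector<unsigned> Seen;                   // dense list of encountered regs
      std::vector<bool> InSeen(NumPhysRegs, false); // sparse membership test

      for (unsigned Reg : OperandRegs) {            // the operand walk done anyway
        if (!InSeen[Reg]) {
          InSeen[Reg] = true;
          Seen.push_back(Reg);                      // SparseSet-style insertion
        }
      }

      for (unsigned Reg : Seen)                     // touch only encountered regs,
        markPhysRegUsed(Reg);                       // not all NumPhysRegs of them
    }

    int main() {
      rewrite({3, 7, 3, 42});
      return PhysRegUsed[42] ? 0 : 1;
    }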
* X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes (Tim Northover, 2014-02-06; 4 files, -49/+7)
  I believe VZEXT_MOVL means "zero all vector elements except the first" (and should
  have identical input & output types) whereas VZEXT means "zero extend each element
  of a vector (discarding higher elements if necessary)".
  For example:
    (v4i32 (vzext (v16i8 ...)))
  should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding
  higher bytes.
  However, somewhere in the past, these two concepts had become confused, even leading
  to a nonsensical VSEXT_MOVL.
  This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL ->
  VZEXT when it's an actual extension).
  rdar://problem/15981990
  llvm-svn: 200918
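  To make the distinction above concrete, here is a scalar C++ model of the two node
  semantics; it is purely illustrative and is not how the DAG nodes themselves are
  implemented.
    #include <array>
    #include <cstdint>
    #include <cstdio>

    // VZEXT_MOVL: same element type in and out; keep lane 0, zero the rest.
    std::array<uint32_t, 4> vzext_movl(const std::array<uint32_t, 4> &In) {
      return {In[0], 0, 0, 0};
    }

    // VZEXT: zero-extend each low element to a wider type, discarding the
    // elements that no longer fit in the result vector.
    std::array<uint32_t, 4> vzext(const std::array<uint8_t, 16> &In) {
      return {In[0], In[1], In[2], In[3]}; // low 4 bytes widened to 32 bits
    }

    int main() {
      std::array<uint32_t, 4> A = {7, 8, 9, 10};
      std::array<uint8_t, 16> B = {1, 2, 3, 4}; // remaining lanes are zero
      auto M = vzext_movl(A); // {7, 0, 0, 0}
      auto Z = vzext(B);      // {1, 2, 3, 4}
      std::printf("%u %u\n", (unsigned)M[0], (unsigned)Z[3]);
    }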
* The following patch's purpose is to reduce compile time for compilation of small programs on targets with large register files. (Puyan Lotfi, 2014-02-06; 2 files, -3/+27)
  The root of the compile time overhead was in the use of llvm::SmallVector to hold
  PhysRegEntries, which resulted in slow-down from calling llvm::SmallVector::assign(N, 0).
  In contrast, std::vector uses the faster __platform_bzero to zero out primitive
  buffers when assign is called, while SmallVector uses an iterator.
  The fix for this was simply to replace the SmallVector with a dynamically allocated
  buffer and to initialize or reinitialize the buffer based on the total registers that
  the target architecture requires. The changes support cases where a pass manager may
  be reused for different targets, and note that PhysRegEntries is allocated using
  calloc mainly for good form, and also to quiet tools like Valgrind (see comments for
  more info on this).
  There is an rdar to track the fact that SmallVector doesn't have platform-specific
  speedup optimizations inside of it for things like this, and I'll create a bugzilla
  entry at some point soon as well.
  TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned char>::assign(N, 0)
  with a call to calloc for N bytes, which is much faster because SmallVector's assign
  uses iterators.
  llvm-svn: 200917
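  A rough sketch of the buffer strategy described above, in generic C++; the class and
  member names are hypothetical, and this only illustrates the general pattern rather
  than the actual register-allocator code.
    #include <cstdlib>
    #include <cstring>

    // Cache of per-physreg entries, sized for whichever target is currently
    // being compiled. calloc hands back zero-initialized storage up front.
    class PhysRegCache {
      unsigned char *Entries = nullptr;
      unsigned Size = 0;

    public:
      // (Re)initialize for a target with NumPhysRegs registers. Reallocating
      // only when the size changes lets a reused pass manager switch targets.
      void init(unsigned NumPhysRegs) {
        if (NumPhysRegs != Size) {
          std::free(Entries);
          Entries = static_cast<unsigned char *>(std::calloc(NumPhysRegs, 1));
          Size = NumPhysRegs;
        } else {
          std::memset(Entries, 0, Size); // clear with a bulk zeroing primitive
        }
      }

      unsigned char &operator[](unsigned Reg) { return Entries[Reg]; }

      ~PhysRegCache() { std::free(Entries); }
    };

    int main() {
      PhysRegCache Cache;
      Cache.init(1024); // large register file, zeroed in one cheap call
      Cache[42] = 1;
    }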
* This small change reduces compile time for small programs on targets that have large register files. (Puyan Lotfi, 2014-02-06; 1 file, -1/+3)
  The omission of Queries.clear() is perfectly safe because LiveIntervalUnion::Query
  doesn't contain any data that needs freeing, and because LiveRegMatrix::runOnFunction
  happens to reset the OwningArrayPtr holding Queries every time it is run, so there's
  no need to zero out the queries either. Not having to do this for very large numbers
  of physregs is a noticeable constant cost reduction in compilation of small programs.
  llvm-svn: 200913
* A memcpy out of a fresh alloca is a no-op, delete it. Patch by Patrick Walton! (Nick Lewycky, 2014-02-06; 1 file, -1/+11)
  llvm-svn: 200907
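  In C-like terms, the deleted pattern looks roughly like the sketch below; this is an
  illustration of the idea, not the actual test case from the patch.
    #include <cstring>

    void copy_from_fresh_local(char *dst) {
      char tmp[16];                        // fresh local storage, never written
      std::memcpy(dst, tmp, sizeof(tmp));  // copies nothing but undefined bytes
      // Because the source alloca holds no defined value, the optimizer is free
      // to treat the copy as a no-op and delete it.
    }

    int main() {
      char buf[16];
      copy_from_fresh_local(buf);
    }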
* [PM] Fix horrible typos that somehow didn't cause a failure in a C++11 build but spectacularly changed behavior of the C++98 build. =] (Chandler Carruth, 2014-02-06; 1 file, -7/+9)
  This shows my one problem with not having unittests -- basic API expectations aren't
  well exercised by the integration tests because they *happen* to not come up, even
  though they might later. I'll probably add a basic unittest to complement the
  integration testing later, but I wanted to revive the bots.
  llvm-svn: 200905
* [PM] Add a new "lazy" call graph analysis pass for the new pass manager. (Chandler Carruth, 2014-02-06; 2 files, -0/+196)
  The primary motivation for this pass is to separate the call graph analysis used by
  the new pass manager's CGSCC pass management from the existing call graph analysis
  pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing
  CallGraphSCCPassManager requirements. Those requirements make it *really* hard to
  cleanly layer the needed functionality for the new pass manager on top of the
  existing analysis.
  However, there are also a bunch of things that the pass manager would specifically
  benefit from doing differently from the existing call graph analysis, and this new
  implementation tries to address several of them:
  - Be lazy about scanning function definitions. The existing pass eagerly scans the
    entire module to build the initial graph. This new pass is significantly more lazy,
    and I plan to push this even further to maximize locality during CGSCC walks.
  - Don't use a single synthetic node to partition functions with an indirect call from
    functions whose address is taken. This node creates a huge choke-point which would
    preclude good parallelization across the fanout of the SCC graph when we got to the
    point of looking at such changes to LLVM.
  - Use a memory-dense and lightweight representation of the call graph rather than
    value handles and tracking call instructions. This will require explicit update
    calls instead of some updates working transparently, but should end up being
    significantly more efficient. The explicit update calls ended up being needed in
    many cases for the existing call graph so we don't really lose anything.
  - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC
    which is stable across updates. This is essential for the new pass manager to work
    correctly.
  - Only form the graph necessary for traversing all of the functions in an
    SCC-friendly order. This is a much simpler graph structure and should be more
    memory dense. It does limit the ways in which it is appropriate to use this
    analysis. I wish I had a better name than "call graph". I've commented extensively
    on this aspect.
  This is still very much a WIP, in fact it is really just the initial bits. But it is
  about the fourth version of the initial bits that I've implemented, with each of the
  others running into really frustrating problems. This looks like it will actually
  work and I'd like to split the actual complexity across commits for the sake of my
  reviewers. =] The rest of the implementation along with lots of wiring will follow
  somewhat more rapidly now that there is a good path forward.
  Naturally, this doesn't impact any of the existing optimizer. This code is specific
  to the new pass manager.
  A bunch of thanks are deserved for the various folks that have helped with the design
  of this, especially Nick Lewycky who actually sat with me to go through the
  fundamentals of the final version here.
  llvm-svn: 200903
* [DAG] Don't pull the binary operation through the shift if the operands have opaque constants. (Juergen Ributzka, 2014-02-06; 1 file, -2/+9)
  During DAGCombine, visitShiftByConstant assumes that certain binary operations with
  only constant operands can always be folded successfully. This is no longer true when
  the constant is opaque. This commit fixes visitShiftByConstant by not performing the
  optimization for opaque constants. Otherwise we would end up in an infinite
  DAGCombine loop.
  llvm-svn: 200900
* Set default of inlinecold-threshold to 225. (Manman Ren, 2014-02-06; 1 file, -1/+4)
  225 is the default value of inline-threshold. This change will make sure we have the
  same inlining behavior as prior to r200886.
  As Chandler points out, even though we don't have code in our testing suite that uses
  the cold attribute, there are larger applications that do use the cold attribute.
  r200886 + this commit intend to keep the same behavior as prior to r200886. We can
  later on tune the inlinecold-threshold.
  The main purpose of r200886 is to help performance of instrumentation-based PGO
  before we actually hook up the inliner with analysis passes such as BPI and BFI. For
  instrumentation-based PGO, we try to increase inlining of hot functions and reduce
  inlining of cold functions by setting inlinecold-threshold.
  Another option suggested by Chandler is to use a boolean flag that controls if we
  should use OptSizeThreshold for cold functions. The default value of the boolean flag
  should not change the current behavior. But it gives us less freedom in controlling
  inlining of cold functions.
  llvm-svn: 200898
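  Taken together with r200886 (the cold-callee commit further down), the behavior
  described here boils down to the selection sketched below in plain C++ with
  hypothetical names; the real logic reads both values from cl::opt flags inside the
  inline cost code.
    #include <algorithm>

    // Defaults as described above: both values start at 225, so behavior is
    // unchanged until someone lowers -inlinecold-threshold explicitly.
    constexpr int InlineThreshold = 225;     // -inline-threshold
    constexpr int InlineColdThreshold = 225; // -inlinecold-threshold

    // Pick the threshold for a call site; the cold attribute is only allowed
    // to decrease the threshold, never to raise it.
    int selectInlineThreshold(bool CalleeIsCold) {
      int Threshold = InlineThreshold;
      if (CalleeIsCold)
        Threshold = std::min(Threshold, InlineColdThreshold);
      return Threshold;
    }

    int main() { return selectInlineThreshold(true) == 225 ? 0 : 1; }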
* Update the X86 assembler for .intel_syntax to accept the << and >> bitwise operators. (Kevin Enderby, 2014-02-06; 1 file, -6/+64)
  rdar://15975725
  llvm-svn: 200896
* Don't set HasReliableSymbolDifference for ELF. (Rafael Espindola, 2014-02-06; 1 file, -3/+1)
  It is only used in MachObjectWriter.cpp. Another leftover from the early days of ELF
  in MC.
  llvm-svn: 200895
* doesSectionRequireSymbols is meaningless on ELF, remove it. (Rafael Espindola, 2014-02-06; 2 files, -8/+0)
  This is a nop. doesSectionRequireSymbols is only used from isSymbolLinkerVisible.
  isSymbolLinkerVisible's only use from ELF was in
    if (!Asm.isSymbolLinkerVisible(Symbol) && !Symbol.isUndefined())
      return false;
    if (Symbol.isTemporary())
      return false;
  If the symbol is a temporary, this code returns false and it is irrelevant whether we
  take the first if or not. If the symbol is not a temporary, Asm.isSymbolLinkerVisible
  returns true without ever calling doesSectionRequireSymbols.
  This was a horrible leftover from when support for ELF was first added.
  llvm-svn: 200894
* Disable most IR-level transform passes on functions marked 'optnone'. (Paul Robinson, 2014-02-06; 31 files, -0/+125)
  Ideally only those transform passes that run at -O0 remain enabled; in reality we get
  as close as we reasonably can. Passes are responsible for disabling themselves; it's
  not the job of the pass manager to do it for them.
  llvm-svn: 200892
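  The per-pass opt-out this commit introduces amounts to an early exit at the top of
  each pass's run method. Below is a hedged, self-contained sketch with toy types; the
  exact helper and attribute API used in LLVM at this revision may be named differently.
    #include <set>
    #include <string>

    // Toy model of a function carrying string attributes.
    struct Function {
      std::string Name;
      std::set<std::string> Attrs;
      bool hasFnAttribute(const std::string &A) const { return Attrs.count(A) != 0; }
    };

    // Each transform pass bails out on 'optnone' functions itself; the pass
    // manager does not do it on the passes' behalf.
    struct SomeTransformPass {
      bool runOnFunction(Function &F) {
        if (F.hasFnAttribute("optnone"))
          return false;            // leave the function untouched
        // ... normal transformation work would go here ...
        return true;
      }
    };

    int main() {
      Function F{"foo", {"optnone"}};
      SomeTransformPass P;
      return P.runOnFunction(F) ? 1 : 0; // returns 0: nothing was changed
    }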
* Just returning false is the default. (Rafael Espindola, 2014-02-06; 4 files, -20/+0)
  llvm-svn: 200890
* Pass address space to allowsUnalignedMemoryAccesses (Matt Arsenault, 2014-02-05; 3 files, -11/+25)
  llvm-svn: 200888
* Add address space argument to allowsUnalignedMemoryAccess. (Matt Arsenault, 2014-02-05; 15 files, -15/+32)
  On R600, some address spaces have more strict alignment requirements than others.
  llvm-svn: 200887
* Inliner uses a smaller inline threshold for callees with cold attribute. (Manman Ren, 2014-02-05; 1 file, -0/+11)
  Added command line option inlinecold-threshold to set the threshold for inlining
  functions with the cold attribute. Listen to the cold attribute when it would
  decrease the inline threshold.
  llvm-svn: 200886
* [RegAlloc] Add a last chance recoloring mechanism when everything else failed to find a register. (Quentin Colombet, 2014-02-05; 1 file, -8/+263)
  The idea is to choose a color for the variable that cannot be allocated and recolor
  its interferences around. Unlike the current register allocation scheme, it is
  allowed to change the color of an already assigned (but maybe not splittable or
  spillable) live interval while propagating this change to its neighbors.
  In other words, there are two things that may help finding an available color:
  - Already assigned variables (RS_Done) can be recolored to a different color.
  - The recoloring allows to catch solutions that need to touch more than just the
    neighbors of the current allocated variable.
  E.g.,
    vA can use {R1, R2}
    vB can use {R2, R3}
    vC can use {R1}
  where vA, vB, and vC cannot be split anymore (they are reloads, for instance) and
  they all interfere.
    vA is assigned R1
    vB is assigned R2
    vC tries to evict vA but vA is already done.
  => Regular register allocation heuristic fails.
  Last chance recoloring kicks in:
    vC does as if vA was evicted => vC uses R1. vC is marked as fixed.
    vA needs to find a color. None are available.
    vA cannot evict vC: vC is a fixed virtual register now.
    vA does as if vB was evicted => vA uses R2.
    vB needs to find a color. R3 is available.
  Recoloring => vC = R1, vA = R2, vB = R3.
  <rdar://problem/15947839>
  llvm-svn: 200883
* [PM] Don't require analysis results to be const in the new pass manager. (Chandler Carruth, 2014-02-05; 1 file, -4/+4)
  I think this was just over-eagerness on my part. The analysis results often need to
  be non-const because they need to (in some cases at least) be updated by the
  transformation pass in order to remain correct. It also makes lazy analyses (a common
  case) needlessly annoying to write in order to make their entire state mutable.
  llvm-svn: 200881
* Remove support for not using .loc directives. (Rafael Espindola, 2014-02-05; 11 files, -64/+47)
  Clang itself was not using this. The only way to access it was via llc.
  llvm-svn: 200862
* Revert "Fix an invalid check for duplicate option categories." (Rafael Espindola, 2014-02-05; 1 file, -14/+3)
  This reverts commit r200853. It was causing clang/Analysis/checker-plugins.c to crash.
  llvm-svn: 200858
* [mips] Add NaCl target and forbid indexed loads and stores for it (Petar Jovanovic, 2014-02-05; 4 files, -4/+11)
  This patch adds a NaCl target for Mips. It also forbids indexed loads and stores if
  the target is NaCl.
  Patch by Sasa Stankovic.
  Differential Revision: http://llvm-reviews.chandlerc.com/D2690
  llvm-svn: 200855
* Fix an invalid check for duplicate option categories. (Alexander Kornienko, 2014-02-05; 1 file, -3/+14)
  Summary: The check performed in the comparator is invalid, as some STL
  implementations enforce strict weak ordering by calling the comparator with the same
  value. This check was also in the wrong place: the assertion would only fire when
  -help was used. The new check is performed each time the category is registered (we
  are not going to have thousands of them, so it's fine to do it in O(N^2)).
  Reviewers: jordan_rose
  Reviewed By: jordan_rose
  CC: cfe-commits, alexmc
  Differential Revision: http://llvm-reviews.chandlerc.com/D2699
  llvm-svn: 200853
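  The strict-weak-ordering point above is worth spelling out: comp(x, x) must return
  false, and checked STL implementations may call the comparator with the same element
  on both sides, so a comparator that asserts "no two values compare equal" can fire on
  perfectly valid input. A small self-contained illustration with hypothetical names,
  not the actual cl::opt code:
    #include <algorithm>
    #include <cassert>
    #include <cstring>
    #include <vector>

    struct Category { const char *Name; };

    // Invalid: assumes the comparator is never handed the same value twice.
    // Debug/checked STL implementations may evaluate cmp(a, a) to verify the
    // ordering is irreflexive, which makes this assert fire spuriously.
    bool badCompare(const Category *A, const Category *B) {
      assert(std::strcmp(A->Name, B->Name) != 0 && "Duplicate category names!");
      return std::strcmp(A->Name, B->Name) < 0;
    }

    // Valid: the comparator only orders; duplicate detection happens elsewhere
    // (for example, at the point where the category is registered).
    bool goodCompare(const Category *A, const Category *B) {
      return std::strcmp(A->Name, B->Name) < 0;
    }

    int main() {
      Category X{"codegen"}, Y{"analysis"};
      std::vector<const Category *> Cats = {&X, &Y};
      std::sort(Cats.begin(), Cats.end(), goodCompare);
      return Cats.front() == &Y ? 0 : 1;
    }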
* AVX-512: optimized icmp -> sext -> icmp pattern (Elena Demikhovsky, 2014-02-05; 1 file, -21/+67)
  llvm-svn: 200849
* Test commit (Alon Mishne, 2014-02-05; 1 file, -1/+1)
  llvm-svn: 200843
* ARM: Resolve thumb_bl fixup in same MCFragment. (Logan Chien, 2014-02-05; 1 file, -1/+8)
  In Thumb1 mode, a bl instruction might be selected for branches between basic blocks
  in the function if the offset is greater than 2KB. However, this might cause a SEGV
  because the destination symbol is not marked as a thumb function and the execution
  mode will be reset to ARM mode.
  Since we are sure that these symbols are in the same data fragment, we can simply
  resolve these local symbols, and don't emit any relocation information for this bl
  instruction.
  llvm-svn: 200842
* AVX-512: fixed a bug in EVEX encoding (the bug appeared after r200624) (Elena Demikhovsky, 2014-02-05; 1 file, -2/+4)
  llvm-svn: 200837
* R600/SI: Add pattern for zero-extending i1 to i32 (Michel Danzer, 2014-02-05; 1 file, -0/+5)
  Fixes opencl-example if_* tests with radeonsi.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 200830
* ARM: Enable use of relocation type tlsldo in debug info for tls data. (Kai Nacke, 2014-02-05; 2 files, -0/+9)
  This fixes PR18554.
  Reviewers: Renato Golin, Keith Walker
  llvm-svn: 200826
* Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. (Craig Topper, 2014-02-05; 3 files, -74/+35)
  This weakens the ability to fold loads with them because we aren't able to match
  patterns that load the same thing twice. But maybe we should fix that if we care. The
  peephole optimizer will be able to fold some loads in its absence.
  llvm-svn: 200824
* AVX-512: Added intrinsic for cvtph2ps. (Elena Demikhovsky, 2014-02-05; 3 files, -23/+65)
  Added VPTESTNM instruction. Added a pattern to vselect (lit tests will follow).
  llvm-svn: 200823
* Add CheckChildInteger to ISelMatcher operations. Removes nearly 2000 bytes from the X86 matcher table. (Craig Topper, 2014-02-05; 1 file, -2/+23)
  llvm-svn: 200821
* Use the information provided by getFlags to unify some code in llvm-nm. (Rafael Espindola, 2014-02-05; 1 file, -3/+7)
  It is not clear how much we should try to expose in getFlags. For example, should
  there be an SF_Object and an SF_Text? But for information that is already being
  exposed, we may as well use it in llvm-nm.
  llvm-svn: 200820
* Fix configure to find arc4random via header files. (Todd Fiala, 2014-02-05; 1 file, -2/+2)
  ISSUE: On Ubuntu 12.04 LTS, arc4random is provided by libbsd.so, which is a
  transitive dependency of libedit. If a system had libedit on it that was implemented
  in terms of libbsd.so, then the arc4random test, previously implemented as a linker
  test, would succeed with -ledit. However, on Ubuntu this would also require a
  #include <bsd/stdlib.h>. This caused a build breakage on configure-based Ubuntu 12.04
  with libedit installed.
  FIX: This fix changes configure to test for arc4random by searching for it in the
  standard header files. On Ubuntu 12.04, this test now properly fails to find
  arc4random as it is not defined in the default header locations. It also tweaks the
  #define names to match the output of the header check command, which is slightly
  different than the linker function check #defines.
  I tested the following scenarios:
  (1) Ubuntu 12.04 without the libedit package [did not find arc4random, as expected]
  (2) Ubuntu 12.04 with the libedit package [properly did not find arc4random, as expected]
  (3) Ubuntu 12.04 with the most recent libedit, custom built, and not dependent on libbsd.so [properly did not find arc4random, as expected]
  (4) FreeBSD 10.0B1 [properly found arc4random, as expected]
  llvm-svn: 200819
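  On the consuming side, code typically guards the call on the configure-generated
  macro. A hedged sketch follows; the macro name uses the usual autoconf header-check
  convention and is an assumption here, not necessarily LLVM's exact spelling.
    #include <cstdint>
    #include <cstdlib>

    // #include "config.h"  // would define (or not) HAVE_DECL_ARC4RANDOM

    static uint32_t portable_random() {
    #if defined(HAVE_DECL_ARC4RANDOM) && HAVE_DECL_ARC4RANDOM
      return arc4random();                        // declared by the system headers
    #else
      return static_cast<uint32_t>(std::rand());  // fallback when the check fails
    #endif
    }

    int main() {
      (void)portable_random();
      return 0;
    }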
* Fix wording of warning message about invalid debug info. (Manman Ren, 2014-02-04; 1 file, -2/+2)
  llvm-svn: 200806
* Remove unused SF_ThreadLocal. (Rafael Espindola, 2014-02-04; 2 files, -2/+1)
  llvm-svn: 200800
* llvm-cov: Fix include order in GCOV.cpp (Justin Bogner, 2014-02-04; 1 file, -3/+3)
  llvm-svn: 200796
* SimplifyLibCalls: Push TLI through the exp2->ldexp transform. (Benjamin Kramer, 2014-02-04; 2 files, -29/+34)
  For the odd case of platforms with exp2 available but not ldexp.
  llvm-svn: 200795
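  For context, the simplification rests on the identity exp2(n) == ldexp(1.0, n) for
  integer n, which is exactly why it must only fire when the target's C library really
  provides ldexp (hence consulting TLI). A quick standalone check of the identity:
    #include <cassert>
    #include <cmath>

    int main() {
      // exp2(n) == 1.0 * 2^n == ldexp(1.0, n) for integer exponents, so the
      // library-call simplifier may rewrite exp2 of an integer-valued argument
      // into the (often cheaper) ldexp call, provided ldexp exists on the target.
      for (int n = -10; n <= 10; ++n)
        assert(std::exp2(static_cast<double>(n)) == std::ldexp(1.0, n));
      return 0;
    }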