summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"Duncan P. N. Exon Smith2014-04-184-5/+947
| | | | | | | | | | | | | | This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622
* ARM64: disable generation of .loh directives outside MachO.Tim Northover2014-04-181-1/+2
| | | | | | Part of PR19455. llvm-svn: 206611
* ARM64: don't emit .subsections_via_symbols on ELF.Tim Northover2014-04-181-7/+9
| | | | | | Part of PR19455. llvm-svn: 206610
* ARM64: add extra NEG pattern.Tim Northover2014-04-181-0/+2
| | | | llvm-svn: 206609
* AArch64/ARM64: add non-scalar lowering for more FCVT operations.Tim Northover2014-04-182-2/+12
| | | | llvm-svn: 206591
* AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.Tim Northover2014-04-181-5/+7
| | | | | | | We couldn't cope if the first mask element was UNDEF before, which isn't ideal. llvm-svn: 206588
* [msan] Add -msan-instrumentation-with-call-threshold.Evgeniy Stepanov2014-04-181-42/+110
| | | | | | | | | This flag replaces inline instrumentation for checks and origin stores with calls into MSan runtime library. This is a workaround for PR17409. Disabled by default. llvm-svn: 206585
* [LCG] Remove all of the complexity stemming from supporting copying.Chandler Carruth2014-04-181-42/+21
| | | | | | | | | | | | | | Reality is that we're never going to copy one of these. Supporting this was becoming a nightmare because nothing even causes it to compile most of the time. Lots of subtle errors built up that wouldn't have been caught by any "normal" testing. Also, make the move assignment actually work rather than the bogus swap implementation that would just infloop if used. As part of that, factor out the graph pointer updates into a helper to share between move construction and move assignment. llvm-svn: 206583
* [LCG] Add support for building persistent and connected SCCs to theChandler Carruth2014-04-181-4/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LazyCallGraph. This is the start of the whole point of this different abstraction, but it is just the initial bits. Here is a run-down of what's going on here. I'm planning to incorporate some (or all) of this into comments going forward, hopefully with better editing and wording. =] The crux of the problem with the traditional way of building SCCs is that they are ephemeral. The new pass manager however really needs the ability to associate analysis passes and results of analysis passes with SCCs in order to expose these analysis passes to the SCC passes. Making this work is kind-of the whole point of the new pass manager. =] So, when we're building SCCs for the call graph, we actually want to build persistent nodes that stick around and can be reasoned about later. We'd also like the ability to walk the SCC graph in more complex ways than just the traditional postorder traversal of the current CGSCC walk. That means that in addition to being persistent, the SCCs need to be connected into a useful graph structure. However, we still want the SCCs to be formed lazily where possible. These constraints are quite hard to satisfy with the SCC iterator. Also, using that would bypass our ability to actually add data to the nodes of the call graph to facilite implementing the Tarjan walk. So I've re-implemented things in a more direct and embedded way. This immediately makes it easy to get the persistence and connectivity correct, and it also allows leveraging the existing nodes to simplify the algorithm. I've worked somewhat to make this implementation more closely follow the traditional paper's nomenclature and strategy, although it is still a bit obtuse because it isn't recursive, using an explicit stack and a tail call instead, and it is interruptable, resuming each time we need another SCC. The other tricky bit here, and what actually took almost all the time and trials and errors I spent building this, is exactly *what* graph structure to build for the SCCs. The naive thing to build is the call graph in its newly acyclic form. I wrote about 4 versions of this which did precisely this. Inevitably, when I experimented with them across various use cases, they became incredibly awkward. It was all implementable, but it felt like a complete wrong fit. Square peg, round hole. There were two overriding aspects that pushed me in a different direction: 1) We want to discover the SCC graph in a postorder fashion. That means the root node will be the *last* node we find. Using the call-SCC DAG as the graph structure of the SCCs results in an orphaned graph until we discover a root. 2) We will eventually want to walk the SCC graph in parallel, exploring distinct sub-graphs independently, and synchronizing at merge points. This again is not helped by the call-SCC DAG structure. The structure which, quite surprisingly, ended up being completely natural to use is the *inverse* of the call-SCC DAG. We add the leaf SCCs to the graph as "roots", and have edges to the caller SCCs. Once I switched to building this structure, everything just fell into place elegantly. Aside from general cleanups (there are FIXMEs and too few comments overall) that are still needed, the other missing piece of this is support for iterating across levels of the SCC graph. These will become useful for implementing #2, but they aren't an immediate priority. Once SCCs are in good shape, I'll be working on adding mutation support for incremental updates and adding the pass manager that this analysis enables. llvm-svn: 206581
* X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.Benjamin Kramer2014-04-181-0/+6
| | | | | | | vcvtph2ps only reads the lower 64 bits of the address passed to the intrinsic. llvm-svn: 206579
* Revert r206565 (and r206566 which updated tests).Chandler Carruth2014-04-181-11/+4
| | | | | | | | | | | | | | This commit was attributed to a different person from the person who posted the patch to the list, and the person who posted it the list claimed when they did that they were not the author, but that the author was yet a third person. I don't know what is going on here, but reverting until the attribution is clear and the author has explicitly contributed the patch. Also, the review hasn't really involved any of the MC maintainers and that seems questionable too. llvm-svn: 206576
* AArch64/ARM64: spot a greater variety of concat_vector operations.Tim Northover2014-04-181-14/+72
| | | | | | | | | | Code mostly copied from AArch64, just tidied up a trifle and plumbed into the ARM64 way of doing things. This also enables the AArch64 tests which inspired the previous untested commits. llvm-svn: 206574
* ARM64: implement cunning optimisation from AArch64Tim Northover2014-04-181-0/+53
| | | | | | | | A vector extract followed by a dup can become a single instruction even if the types don't match. AArch64 handled this in ISelLowering, but a few reasonably simple patterns can take care of it in TableGen, so that's where I've put it. llvm-svn: 206573
* ARM64: spot a vector_shuffle that maps to INS and expand.Tim Northover2014-04-181-0/+64
| | | | | | | Tests will be coming very shortly when all the optimisations needed to support AArch64's neon-copy.ll file are committed. llvm-svn: 206572
* ARM64: nick some AArch64 patterns for extract/insert -> INS.Tim Northover2014-04-181-0/+37
| | | | | | | Tests will be committed shortly when all optimisations needed to support AArch64's neon-copy.ll file are supported. llvm-svn: 206571
* AArch64/ARM64: emit all vector FP comparisons as such.Tim Northover2014-04-181-5/+42
| | | | | | | | | | | | ARM64 was scalarizing some vector comparisons which don't quite map to AArch64's compare and mask instructions. AArch64's approach of sacrificing a little efficiency to emulate them with the limited set available was better, so I ported it across. More "inspired by" than copy/paste since the backend's internal expectations were a bit different, but the tests were invaluable. llvm-svn: 206570
* AArch64/ARM64: port BSL logic from AArch64 & enable test.Tim Northover2014-04-183-0/+75
| | | | | | | | | | | I enhanced it a little in the process. The decision shouldn't really be beased on whether a BUILD_VECTOR is a splat: any set of constants will do the job provided they're related in the correct way. Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's best to check for all 4 possibilities rather than assuming it'll be the RHS. llvm-svn: 206569
* AArch64/ARM64: copy byval implementation from AArch64.Tim Northover2014-04-182-16/+49
| | | | | | | | It's not actually used to handle C or C++ ABI rules on ARM64, but could well be emitted by other language front-ends, so it's as well to have a sensible implementation. llvm-svn: 206568
* Patch by Ray Donnelly.Yaron Keren2014-04-181-4/+11
| | | | | | Emit WIN64 SEH registers by name instead of just number. llvm-svn: 206565
* [asan] one more workaround for PR17409: don't do BB-level coverage ↵Kostya Serebryany2014-04-181-1/+6
| | | | | | instrumentation if there are more than N (=1500) basic blocks. This makes ASanCoverage work on libjpeg_turbo/jchuff.c used by Chrome, which has 1824 BBs llvm-svn: 206564
* This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.Jiangning Liu2014-04-181-0/+1
| | | | | | | | A new test case is also added for ARM64. Patched by Z.Zheng llvm-svn: 206563
* R600: Minor cleanups.Matt Arsenault2014-04-183-69/+64
| | | | | | Fix indentation, better line wrapping, unused includes. llvm-svn: 206562
* [ExecutionEngine] Allow JIT clients to enable/disable module verification.Lang Hames2014-04-183-14/+25
| | | | | | | | | | | | Previously module verification was always enabled, with no way to turn it off. As of this commit, module verification is on by default in Debug builds, and off by default in release builds. The default behaviour can be overridden by calling setVerifyModules(bool) on the JIT instance (this works for both the old JIT, and MCJIT). <rdar://problem/16150008> llvm-svn: 206561
* This is one of the optimizations ported from ARM64 to AArch64 to address the ↵Jiangning Liu2014-04-181-0/+2
| | | | | | | | performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64. Patched by Z.Zheng llvm-svn: 206559
* R600/SI: Try to use scalar BFE.Matt Arsenault2014-04-182-5/+66
| | | | | | | | Use scalar BFE with constant shift and offset when possible. This is complicated by the fact that the scalar version packs the two operands of the vector version into one. llvm-svn: 206558
* This commit enables unaligned memory accesses of vector types on AArch64 ↵Jiangning Liu2014-04-184-0/+91
| | | | | | | | back end. This should boost vectorized code performance. Patched by Z. Zheng llvm-svn: 206557
* Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"Duncan P. N. Exon Smith2014-04-184-947/+5
| | | | | | | | | | | This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556
* blockfreq: Really fix r206548 (and r206549)Duncan P. N. Exon Smith2014-04-181-32/+0
| | | | | | Turns out this code is dead. llvm-svn: 206554
* blockfreq: Fixing MSVC after r206548?Duncan P. N. Exon Smith2014-04-181-2/+2
| | | | llvm-svn: 206549
* blockfreq: Rewrite BlockFrequencyInfoImplDuncan P. N. Exon Smith2014-04-184-5/+979
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is *not* incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548
* R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16Matt Arsenault2014-04-184-16/+63
| | | | llvm-svn: 206547
* PMBuilder: Expose an option to disable tail callsDuncan P. N. Exon Smith2014-04-181-1/+3
| | | | | | | | Adds API to allow frontends to disable tail calls in PassManagerBuilder. <rdar://problem/16050591> llvm-svn: 206542
* R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIRTom Stellard2014-04-181-1/+1
| | | | llvm-svn: 206541
* [ARM64,C++11] Range'ify another loop.Jim Grosbach2014-04-171-9/+7
| | | | llvm-svn: 206539
* Fix bug 19437 - Only add discriminators for DWARF 4 and above.Diego Novillo2014-04-173-11/+16
| | | | | | | | | | | | | | Summary: This prevents the discriminator generation pass from triggering if the DWARF version being used in the module is prior to 4. Reviewers: echristo, dblaikie CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3413 llvm-svn: 206507
* remove some dead codeNuno Lopes2014-04-176-49/+0
| | | | | | | | | | | | | | | lib/Analysis/IPA/InlineCost.cpp | 18 ------------------ lib/Analysis/RegionPass.cpp | 1 - lib/Analysis/TypeBasedAliasAnalysis.cpp | 1 - lib/Transforms/Scalar/LoopUnswitch.cpp | 21 --------------------- lib/Transforms/Utils/LCSSA.cpp | 2 -- lib/Transforms/Utils/LoopSimplify.cpp | 6 ------ utils/TableGen/AsmWriterEmitter.cpp | 13 ------------- utils/TableGen/DFAPacketizerEmitter.cpp | 7 ------- utils/TableGen/IntrinsicEmitter.cpp | 2 -- 9 files changed, 71 deletions(-) llvm-svn: 206506
* Start pushing changes for Mips Fast-IselReed Kotler2014-04-174-0/+53
| | | | llvm-svn: 206505
* R600: Add comment clariying use of sext for result of MUL_U24Tom Stellard2014-04-171-0/+2
| | | | llvm-svn: 206501
* R600/SI: Stop using i128 as the resource descriptor typeTom Stellard2014-04-176-66/+44
| | | | | | | | | Having i128 as a legal type complicates the legalization phase. v4i32 is already a legal type, so we will use that instead. This fixes several piglit tests. llvm-svn: 206500
* R600/SI: Change default register class for i32 to SReg_32Tom Stellard2014-04-171-1/+1
| | | | | | SIFixSGPRCopies is smart enough to handle this now. llvm-svn: 206499
* R600/SI: Teach SIInstrInfo::moveToVALU() how to handle PHI instructionsTom Stellard2014-04-171-3/+15
| | | | llvm-svn: 206498
* R600/SI: Legalize operands after changing dst reg in FixSGPRCopiesTom Stellard2014-04-171-2/+4
| | | | | | Otherwise we may not legalize some illegal REG_SEQUENCE instructions. llvm-svn: 206497
* Improve ARM64 vector creationLouis Gerbarg2014-04-172-2/+5
| | | | | | | | | | | This patch improves the performance of vector creation in caseiswhere where several of the lanes in the vector are a constant floating point value. It also includes new patterns to fold together some of the instructions when the value is 0.0f. Test cases included. rdar://16349427 llvm-svn: 206496
* ARM64: [su]xtw use W regs as inputs, not X regs.Jim Grosbach2014-04-171-4/+8
| | | | | | | | | Update the SXT[BHW]/UXTW instruction aliases and the shifted reg addressing mode handling. PR19455 and rdar://16650642 llvm-svn: 206495
* ManagedStatic is never built with a null constructor, remove support for it.David Blaikie2014-04-171-2/+3
| | | | llvm-svn: 206492
* ARM64: switch to IR-based atomic operations.Tim Northover2014-04-174-812/+91
| | | | | | | | Goodbye code! (Game: spot the bug fixed by the change). llvm-svn: 206490
* ARM64: add acquire/release versions of the existing atomic intrinsics.Tim Northover2014-04-174-4/+112
| | | | | | | These will be needed to support IR-level lowering of atomic operations. llvm-svn: 206489
* Reverse 206485.Gerolf Hoflehner2014-04-171-8/+2
| | | | | | | | | After some discussions the preferred semantics of the always_inline attribute is inline always when the compiler can determine that it it safe to do so. llvm-svn: 206487
* [stack protector] Make the StackProtector pass respect ssp-buffer-size.Josh Magee2014-04-171-3/+3
| | | | | | | | | | | Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size" attribute after all uses of SSPBufferSize. The effect was that the default SSPBufferSize was always used during analysis. I moved the check for the attribute before the analysis; now --param ssp-buffer-size= works correctly again. Differential Revision: http://reviews.llvm.org/D3349 llvm-svn: 206486
* Atomics: promote ARM's IR-based atomics pass to CodeGen.Tim Northover2014-04-178-120/+137
| | | | | | | | | | | | Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64. At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example). llvm-svn: 206485
OpenPOWER on IntegriCloud