path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Now that Reassociate's LinearizeExprTree can look through arbitrary expressionDuncan Sands2012-06-123-25/+247
    topologies, it is quite possible for a leaf node to have huge multiplicity, for example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x raised to a vast power (the multiplicity, or weight, of x). This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value. It turns out that the weight for a value never needs more bits to represent than the value itself, so it is enough to represent weights as APInts of the same bitwidth and do the right overflow-avoiding dance steps when computing weights. As a side-effect it reduces the number of multiplies needed in some cases of large powers. While there, in view of external uses (e.g. by the vectorizer) I made LinearizeExprTree static, pushing the rank computation out into users. This is progress towards fixing PR13021.
    llvm-svn: 158358
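    The claim that a weight never needs more bits than the value can be checked with a small model. The sketch below is not the patch's code: `reduce_weight` and `weight_after_squarings` are hypothetical helpers, and the reduction relies only on two facts about arithmetic modulo 2^B for B >= 3: an even x raised to any power of at least B is 0, and an odd x raised to the power 2^(B-2) is 1.

```python
def reduce_weight(weight: int, bits: int) -> int:
    """Return a small exponent e with x**e == x**weight (mod 2**bits) for every x.

    Hypothetical helper for illustration only; assumes bits >= 3.
    """
    period = 1 << (bits - 2)  # odd x: x**period == 1 (mod 2**bits)
    threshold = bits          # even x: x**e == 0 (mod 2**bits) once e >= bits
    if weight < threshold:
        return weight
    # Keep the exponent at or above the threshold and reduce the rest
    # modulo the period, so both the even and the odd case are preserved.
    return threshold + (weight - threshold) % period


def weight_after_squarings(n: int) -> int:
    """Weight of x after x0 = x*x, x1 = x0*x0, ...: each squaring doubles it."""
    return 1 << n


BITS = 8
huge = weight_after_squarings(40)   # x raised to the power 2**40
small = reduce_weight(huge, BITS)   # stays below bits + 2**(bits - 2)
```

    Because the reduced weight is always below B + 2^(B-2), it fits in B bits, which is consistent with storing weights in an integer of the same width as the value.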
* Reapply r158337, this time properly protect Darwin/PPC host CPU use with ↵Hal Finkel2012-06-122-135/+134
| | | | | | | | | | | | | __ppc__. Original commit message: Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158349
* Satisfy C++ aliasing rules, per suggestion by Chandler.Argyrios Kyrtzidis2012-06-122-2/+2
| | | | llvm-svn: 158346
* Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into ↵Jakob Stoklund Olesen2012-06-122-132/+133
| | | | | | | | | sys::getHostCPUName()." This commit broke most of the PowerPC unit tests when running on Intel/Apple. llvm-svn: 158345
* For llvm::sys::ThreadLocalImpl instead of malloc'ing the platform-specificArgyrios Kyrtzidis2012-06-122-13/+11
    thread local data, embed them in the class using a uint64_t and make sure we get compiler errors if there's a platform where this is not big enough. This makes ThreadLocal safer to use in conjunction with CrashRecoveryContext. Related to crash in rdar://11434201.
    llvm-svn: 158342
* misched: When querying RegisterPressureTracker, always save current and max ↵Andrew Trick2012-06-111-2/+8
| | | | | | pressure. llvm-svn: 158340
* misched: regpressure getMaxPressureDelta, revert accidental checkin.Andrew Trick2012-06-111-8/+2
| | | | llvm-svn: 158339
* Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().Hal Finkel2012-06-112-133/+132
| | | | | | | | Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158337
* Enable MFOCRF generation on the PPC A2 core.Hal Finkel2012-06-111-2/+2
| | | | llvm-svn: 158324
* Rename the PPC target feature gpul to mfocrf.Hal Finkel2012-06-115-13/+13
| | | | | | | | | | | The PPC target feature gpul (IsGigaProcessor) was only used for one thing: To enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag. No functionality change. llvm-svn: 158323
* Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection.Hal Finkel2012-06-111-0/+1
| | | | llvm-svn: 158322
* Emit the two-operand form of the PPC mfcr instruction as mfocrf.Hal Finkel2012-06-111-1/+1
| | | | | | This is necessary on Linux and supported on Darwin, see PR2604. llvm-svn: 158315
* Add local CPU detection for Linux PPC.Hal Finkel2012-06-111-1/+95
| | | | | | This functionality mirrors that available on PPC/Darwin. llvm-svn: 158314
* Add POWER6 and POWER7 CPU types to the PPC backend.Hal Finkel2012-06-113-0/+14
| | | | | | No functional change; these will be used by upcoming scheduler enhancements. llvm-svn: 158313
* Write llvm-tblgen backends as functions instead of sub-classes.Jakob Stoklund Olesen2012-06-111-1/+4
| | | | | | | | | The TableGenBackend base class doesn't do much, and will be removed completely soon. Patch by Sean Silva! llvm-svn: 158311
* Re-enable the CMN instruction.Bill Wendling2012-06-115-69/+144
| | | | | | | | | We turned off the CMN instruction because it had semantics which we weren't getting correct. If we are comparing with an immediate, then it's okay to use the CMN instruction. <rdar://problem/7569620> llvm-svn: 158302
* InstCombine: factor code better.Benjamin Kramer2012-06-111-14/+7
| | | | | | No functionality change. llvm-svn: 158301
* InstCombine: Turn (zext A) == (B & (1<<X)-1) into A == (trunc B), narrowing ↵Benjamin Kramer2012-06-101-1/+23
    the compare.

    This saves a cast, and zext is more expensive on platforms with subreg support than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750. On the synthetic benchmark from that bug, stupid_memchr and bsd_memchr now have the same performance when neither function is inlined.

    stupid_memchr: 323.0us
    bsd_memchr: 321.0us
    memchr: 479.0us

    where memchr is the llvm-gcc compiled bsd_memchr from OS X Lion's libc. When inlining is enabled, bsd_memchr still regresses to the llvm-gcc memchr time. I haven't fully understood the issue yet; something is grossly mangling the loop after inlining.

    llvm-svn: 158297
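    The equivalence behind the narrowed compare can be modelled with plain integers. This is a sketch, not InstCombine code: `mask`, `zext`, `trunc`, `wide_compare` and `narrow_compare` are hypothetical names, and it assumes A is exactly X bits wide while B is wider.

```python
def mask(bits: int) -> int:
    """All-ones value of the given width, i.e. (1 << bits) - 1."""
    return (1 << bits) - 1


def zext(value: int, from_bits: int) -> int:
    """Zero-extend: the low from_bits bits are kept, all higher bits are zero."""
    return value & mask(from_bits)


def trunc(value: int, to_bits: int) -> int:
    """Truncate: drop everything above the low to_bits bits."""
    return value & mask(to_bits)


def wide_compare(a: int, b: int, x: int) -> bool:
    """(zext A) == (B & (1<<X)-1), evaluated at B's width."""
    return zext(a, x) == (b & mask(x))


def narrow_compare(a: int, b: int, x: int) -> bool:
    """A == (trunc B), evaluated at A's width."""
    return a == trunc(b, x)
```

    Both forms compare the same X low bits of B against A, so they agree for every input; the narrow form simply avoids widening A first.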
* Enable ILP scheduling for all nodes by default on PPC.Hal Finkel2012-06-101-4/+6
    Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294).

    Largest speedups:
    SingleSource/Benchmarks/Stanford/Quicksort - 28%
    SingleSource/Benchmarks/Stanford/Towers - 24%
    SingleSource/Benchmarks/Shootout-C++/matrix - 23%
    MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
    MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
    (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change)

    Largest slowdowns:
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
    SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
    MultiSource/Applications/d/make_dparser - 16%

    llvm-svn: 158296
* Add AutoUpgrade support for the SSE4 ptest intrinsics.Nadav Rotem2012-06-101-6/+59
| | | | | | Patch by Michael Kuperstein. llvm-svn: 158295
* Use critical anti-dep. breaking on all PPC targets, but also add other ↵Hal Finkel2012-06-101-4/+11
    register classes.

    Using 'all' instead of 'critical' would be better because it would make it easier to satisfy the bundling constraints, but, as noted in the FIXME, that is currently not possible with the crs. This yields an average 1% speedup over the entire test suite (on Power 7).

    Largest speedups:
    SingleSource/Benchmarks/Shootout-C++/moments - 40%
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
    SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
    SingleSource/Benchmarks/McGill/misr - 23%
    MultiSource/Applications/JM/ldecod/ldecod - 22%

    Largest slowdowns:
    SingleSource/Benchmarks/Shootout-C++/matrix - -29%
    SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
    SingleSource/Benchmarks/Shootout-C++/ary - -17%
    MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%

    llvm-svn: 158294
* Add intrinsics for immediate form of XOP vprot instructions. Use i128mem ↵Craig Topper2012-06-101-27/+37
| | | | | | instead of f128mem for integer XOP instructions. llvm-svn: 158291
* Improve ext/trunc patterns on PPC64.Hal Finkel2012-06-091-11/+4
| | | | | | | | | | The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283
* Use XOP vpcom intrinsics in patterns instead of a target specific SDNode ↵Craig Topper2012-06-094-60/+13
| | | | | | type. Remove the custom lowering code that selected the SDNode type. llvm-svn: 158279
* Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate ↵Craig Topper2012-06-092-175/+67
| | | | | | as an argument. llvm-svn: 158278
* Disabling a spurious deprecation warning about using PathV1 from within the ↵Aaron Ballman2012-06-091-0/+10
| | | | | | PathV1 implementation file. llvm-svn: 158274
* Fixing a typo in the comments.Aaron Ballman2012-06-091-1/+1
| | | | llvm-svn: 158273
* Allocate the contents of DwarfDebug's StringMaps in a single big ↵Benjamin Kramer2012-06-092-5/+6
| | | | | | BumpPtrAllocator. llvm-svn: 158265
* Silence a gcc-4.6 warning: GCC fails to understand that secondReg and cmpOp2 areDuncan Sands2012-06-091-1/+1
| | | | | | correlated, and thinks that cmpOp2 may be used uninitialized. llvm-svn: 158263
* Enable tail merging on PPC.Hal Finkel2012-06-091-7/+1
    Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive).

    Largest test-suite speedups:
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
    SingleSource/Benchmarks/Shootout-C++/ary - 21%
    SingleSource/Benchmarks/Stanford/Queens - 17%

    Largest slowdowns:
    MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
    MultiSource/Applications/JM/ldecod/ldecod - 14%
    MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

    This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default.

    llvm-svn: 158259
* Register pressure: added getPressureAfterInstr.Andrew Trick2012-06-091-33/+80
| | | | llvm-svn: 158256
* Sketch a LiveRegMatrix analysis pass.Jakob Stoklund Olesen2012-06-093-0/+296
    The LiveRegMatrix represents the live range of assigned virtual registers in a live interval union per register unit. This is not fundamentally different from the interference tracking in RegAllocBase that both RABasic and RAGreedy use.

    The important differences are:

    - LiveRegMatrix tracks interference per register unit instead of per physical register. This makes interference checks cheaper and assignments slightly more expensive. For example, the ARM D7 register has 24 aliases, so we would check 24 physregs before assigning to one. With unit-based interference, we check 2 units before assigning to 2 units.

    - LiveRegMatrix caches regmask interference checks. That is currently duplicated functionality in RABasic and RAGreedy.

    - LiveRegMatrix is a pass which makes it possible to insert target-dependent passes between register allocation and rewriting. Such passes could tweak the register assignments with interference checking support from LiveRegMatrix.

    Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix.

    llvm-svn: 158255
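    The per-unit bookkeeping described above can be sketched as follows. This is a toy model, not the LiveRegMatrix interface: the class, its methods, and the tiny `REG_UNITS` table are hypothetical, and live ranges are simplified to single half-open intervals.

```python
# Hypothetical, heavily simplified register description: each physical
# register lists the register units it is made of (D7 = S14:S15 on ARM).
REG_UNITS = {
    "S12": ("S12",),
    "S13": ("S13",),
    "S14": ("S14",),
    "S15": ("S15",),
    "D7": ("S14", "S15"),
    "Q3": ("S12", "S13", "S14", "S15"),
}


class LiveRegMatrixModel:
    """Tracks assigned live ranges per register unit, not per physical register."""

    def __init__(self):
        self.unit_ranges = {}  # unit name -> list of (start, end, vreg)

    def interferes(self, start, end, physreg):
        """True if [start, end) overlaps a range on any unit of physreg."""
        for unit in REG_UNITS[physreg]:
            for s, e, _vreg in self.unit_ranges.get(unit, ()):
                if start < e and s < end:
                    return True
        return False

    def assign(self, vreg, start, end, physreg):
        """Record vreg's range on every unit of physreg (the costlier step)."""
        if self.interferes(start, end, physreg):
            raise ValueError(f"{vreg} interferes with an assignment on {physreg}")
        for unit in REG_UNITS[physreg]:
            self.unit_ranges.setdefault(unit, []).append((start, end, vreg))


matrix = LiveRegMatrixModel()
matrix.assign("%vreg1", 0, 10, "D7")  # occupies units S14 and S15
```

    Checking D7 here touches only its two units, and aliases such as S14 or Q3 are covered automatically because they share those units; no list of aliasing physical registers is walked.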
* Test commitJack Carter2012-06-091-0/+1
| | | | llvm-svn: 158250
* Also compute MBB live-in lists in the new rewriter pass.Jakob Stoklund Olesen2012-06-096-89/+32
| | | | | | | | | This deduplicates some code from the optimizing register allocators, and it means that it is now possible to change the register allocators' solutions simply by editing the VirtRegMap between the register allocator pass and the rewriter. llvm-svn: 158249
* Convert comments to proper Doxygen comments.Dmitri Gribenko2012-06-093-11/+11
| | | | llvm-svn: 158248
* Reintroduce VirtRegRewriter.Jakob Stoklund Olesen2012-06-087-78/+121
| | | | | | | | | | | | | | | | | | OK, not really. We don't want to reintroduce the old rewriter hacks. This patch extracts virtual register rewriting as a separate pass that runs after the register allocator. This is possible now that CodeGen/Passes.cpp can configure the full optimizing register allocator pipeline. The rewriter pass uses register assignments in VirtRegMap to rewrite virtual registers to physical registers, and it inserts kill flags based on live intervals. These finalization steps are the same for the optimizing register allocators: RABasic, RAGreedy, and PBQP. llvm-svn: 158244
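    The core of the rewriting step can be pictured as a table lookup per operand. The sketch below is only an illustration: `Instr`, `is_virtual` and `rewrite` are hypothetical, the `%vreg` naming convention is assumed, and kill-flag insertion is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instr:
    opcode: str
    operands: List[str]


def is_virtual(operand: str) -> bool:
    """Assumed naming convention: virtual registers are spelled %vregN."""
    return operand.startswith("%vreg")


def rewrite(instrs: List[Instr], virt_reg_map: Dict[str, str]) -> List[Instr]:
    """Return new instructions with every virtual register replaced by the
    physical register recorded for it; an unassigned vreg raises KeyError."""
    result = []
    for instr in instrs:
        operands = [virt_reg_map[op] if is_virtual(op) else op
                    for op in instr.operands]
        result.append(Instr(instr.opcode, operands))
    return result


program = [
    Instr("ADD", ["%vreg0", "%vreg1", "R3"]),
    Instr("MOV", ["R4", "%vreg0"]),
]
assignment = {"%vreg0": "R5", "%vreg1": "R6"}  # stands in for the VirtRegMap
rewritten = rewrite(program, assignment)
```

    Since the rewriter only reads the map, anything that edits the map before this step changes the final assignment without touching the allocator.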
* canonicalize:Nuno Lopes2012-06-081-4/+5
    -%a + 42
    into
    42 - %a

    previously we were emitting:
    -(%a + 42)

    This fixes the infinite loop in PR12338. The generated code is still not perfect, though. Will work on that next.

    llvm-svn: 158237
* Start implementing pre-ra if-converter: using speculation and selects to ↵Evan Cheng2012-06-081-6/+15
| | | | | | eliminate branches. llvm-svn: 158234
* TargetInstrInfo hooks implemented in codegen should be declared pure virtual.Andrew Trick2012-06-081-13/+13
| | | | llvm-svn: 158233
* Reapply commit 158073 with a fix (the testcase was already committed). TheDuncan Sands2012-06-081-123/+120
    problem was that by moving instructions around inside the function, the pass could accidentally move the iterator being used to advance over the function too. Fix this by only processing the instruction equal to the iterator, and leaving processing of instructions that might not be equal to the iterator to later (later = after traversing the basic block; it could also wait until after traversing the entire function, but this might make the sets quite big).

    Original commit message:
    Grab-bag of reassociate tweaks. Unify handling of dead instructions and instructions to reoptimize. Exploit this to more systematically eliminate dead instructions (this isn't very useful in practice but is convenient for analysing some testcase I am working on). No need for WeakVH any more: use an AssertingVH instead.

    llvm-svn: 158226
* Remove the TODO statement in the PPC README re: CTR loopsHal Finkel2012-06-081-1/+0
| | | | | | | As Chris points out, this can now be removed! TODO: check if the associated section on viterbi's inner loop can also be removed. llvm-svn: 158224
* Enable PPC CTR loop formation by default.Hal Finkel2012-06-082-11/+9
    Thanks to Jakob's help, this now causes no new test suite failures!

    Over the entire test suite, this gives an average 1% speedup.

    The largest speedups are:
    SingleSource/Benchmarks/Misc/pi - 108%
    SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
    MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
    SingleSource/Benchmarks/Shootout/ary3 - 32%
    SingleSource/Benchmarks/Shootout-C++/matrix - 30%

    The largest slowdowns are:
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
    MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
    MultiSource/Applications/d/make_dparser - -14%
    SingleSource/Benchmarks/Shootout-C++/ary - -13%

    In light of these slowdowns, additional profiling work is obviously needed!

    llvm-svn: 158223
* Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.Hal Finkel2012-06-081-2/+10
    Marking these classes as non-allocatable allows CTR loop generation to work correctly with the block placement passes, etc. These register classes are currently used only by some unused TCRETURN patterns. In future cleanup, these will be removed.

    Thanks again to Jakob for suggesting this fix to the CTR loop problem!

    llvm-svn: 158221
* Enable optimization for integer ABS on X86 if Subtarget has CMOV.Manman Ren2012-06-081-3/+5
| | | | llvm-svn: 158220
* Fix a crash in APInt::lshr when shiftAmt > BitWidth.Chad Rosier2012-06-081-1/+1
| | | | | | Patch by James Benton <jbenton@vmware.com>. llvm-svn: 158213
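    A logical right shift on a fixed-width integer has to treat shift amounts at or beyond the bit width specially, because in C++ shifting a built-in integer by its width or more is undefined. The sketch below models the guarded behaviour with a hypothetical `lshr` function; that an oversized shift yields zero is an assumption made for illustration, not a statement about what the patch changed.

```python
def lshr(value: int, bit_width: int, shift_amt: int) -> int:
    """Logical shift right of a bit_width-wide unsigned value.

    Assumption for this sketch: shifting by bit_width or more yields 0,
    since every bit has been shifted out.
    """
    if shift_amt >= bit_width:  # covers shift_amt > bit_width, not just ==
        return 0
    masked = value & ((1 << bit_width) - 1)  # confine the value to its width
    return masked >> shift_amt
```

    Writing the guard as `>=` rather than `==` is what keeps every out-of-range shift amount on the safe path.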
* Fix Target->Codegen dependence.Andrew Trick2012-06-082-195/+205
| | | | | | | | | | | | | | | | | Bulk move of TargetInstrInfo implementation into TargetInstrInfoImpl. This is dirty because the code isn't part of TargetInstrInfoImpl class, nor should it be, because the methods are not target hooks. However, it's the current mechanism for keeping libTarget useful outside the backend. You'll get a not-so-nice link error if you invoke a TargetInstrInfo method that depends on CodeGen. The TargetInstrInfoImpl class should probably be removed since it doesn't really solve this problem. To really fix this, we probably need separate interfaces for the CodeGen/nonCodeGen sides of TargetInstrInfo. llvm-svn: 158212
* BoundsChecking: add support for ConstantPointerNull. fixes a bunch of ↵Nuno Lopes2012-06-081-6/+7
| | | | | | instrumentation failures in loops with reallocs llvm-svn: 158210
* Disable the PPC CTR-Loops pass by default.Hal Finkel2012-06-082-4/+17
    The pass itself works well, but something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc., this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure.

    llvm-svn: 158206
* Fix a bug in the new PPC CTR-Loops pass.Hal Finkel2012-06-081-0/+1
    The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example:

    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

    llvm-svn: 158205
* Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form ↵Hal Finkel2012-06-089-18/+812
| | | | | | | | | | CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204