summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Canonicalize splats as build_vectors (PR22283)Sanjay Patel2015-02-172-28/+22
| | | | | | | | | | | | | | | | | | | | | | This is a follow-on patch to: http://reviews.llvm.org/D7093 That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors. This fixes the 2nd test case in PR22283: http://llvm.org/bugs/show_bug.cgi?id=22283 The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code... This improves an existing x86 AVX test and changes codegen in an ARM test. Differential Revision: http://reviews.llvm.org/D7389 llvm-svn: 229511
* R600/SI: Extend private extload pattern to include zext loadsTom Stellard2015-02-171-4/+6
| | | | llvm-svn: 229507
* Prefer SmallVector::append/insert over push_back loops.Benjamin Kramer2015-02-1727-159/+72
| | | | | | Same functionality, but hoists the vector growth out of the loop. llvm-svn: 229500
* Fixed a bug in store sinking.Elena Demikhovsky2015-02-171-4/+6
| | | | | | | | | | The problem was in store-sink barrier check. Store sink barrier should be checked for ModRef (read-write) mode. http://llvm.org/bugs/show_bug.cgi?id=22613 llvm-svn: 229495
* OrcJIT: Appease msc18 not to be confused on executeCompileCallback<OrcX86_64>.NAKAMURA Takumi2015-02-171-2/+3
| | | | llvm-svn: 229494
* Reformat.NAKAMURA Takumi2015-02-171-5/+3
| | | | llvm-svn: 229493
* [X86] Silence -Wsign-compare warnings.Andrea Di Biagio2015-02-171-2/+2
| | | | | | | | | GCC 4.8 reported two new warnings due to comparisons between signed and unsigned integer expressions. The new warnings were accidentally introduced by revision 229480. Added explicit casts to silence the warnings. No functional change intended. llvm-svn: 229488
* Revert "InstrProf: Add unit tests for the profile reader and writer"Justin Bogner2015-02-172-45/+14
| | | | | | | | | | This added API to the InstrProfWriter to write to a string so I could write unittests without using temp files. This doesn't really work, since the format has tighter alignment requirements than a char. This reverts r229478 and its follow-up, r229481. llvm-svn: 229483
* AVX-512: changes in intel_ocl_bi calling conventionsElena Demikhovsky2015-02-172-14/+33
| | | | | | | | - added mask types v8i1 and v16i1 to possible function parameters - enabled passing 512-bit vectors in standard CC - added a test for KNL intel_ocl_bi conventions llvm-svn: 229482
* [X86] Combine vector anyext + and into a vector zextMichael Kuperstein2015-02-171-9/+99
| | | | | | | | | | Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 llvm-svn: 229480
* Re-apply "InstrProf: Add unit tests for the profile reader and writer"Justin Bogner2015-02-172-14/+45
| | | | | | | | Add these tests again, but use va_list instead of initializer lists. This reverts r229456, reapplying r229455. llvm-svn: 229478
* Make the PowerPC AsmPrinter independent of global subtargetEric Christopher2015-02-171-15/+24
| | | | | | | | | initialization. Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use attributes on the TargetMachine or get information from all of the various subtargets. llvm-svn: 229475
* Add a FIXME to move IsLittleEndian to the target machine.Eric Christopher2015-02-171-0/+1
| | | | llvm-svn: 229472
* Move ABI handling and 64-bitness to the PowerPC target machine.Eric Christopher2015-02-175-32/+42
| | | | | | | This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471
* AsmPrinter: Use DIExpression default constructor, NFCDuncan P. N. Exon Smith2015-02-171-1/+1
| | | | llvm-svn: 229464
* [x86] Teach the unpack lowering to try wider element unpacks.Chandler Carruth2015-02-171-16/+52
| | | | | | | | | | | This allows it to match still more places where previously we would have to fall back on floating point shuffles or other more complex lowering strategies. I'm hoping to replace some of the hand-rolled unpack matching with this routine is it gets more and more clever. llvm-svn: 229463
* [BDCE] Add a bit-tracking DCE passHal Finkel2015-02-174-0/+419
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead. Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and removes actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed). The motivation for this is a case like: int __attribute__((const)) foo(int i); int bar(int x) { x |= (4 & foo(5)); x |= (8 & foo(3)); x |= (16 & foo(2)); x |= (32 & foo(1)); x |= (64 & foo(0)); x |= (128& foo(4)); return x >> 4; } As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass). I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE). I've not looked at why yet, but we eliminate instructions due to having all-dead bits in: External/SPEC/CFP2006/447.dealII/447.dealII External/SPEC/CINT2006/400.perlbench/400.perlbench External/SPEC/CINT2006/403.gcc/403.gcc MultiSource/Applications/ClamAV/clamscan MultiSource/Benchmarks/7zip/7zip-benchmark llvm-svn: 229462
* [Orc] Update the Orc indirection utils and refactor the CompileOnDemand layer.Lang Hames2015-02-173-192/+162
| | | | | | | | | | | | | | This patch replaces most of the Orc indirection utils API with a new class: JITCompileCallbackManager, which creates and manages JIT callbacks. Exposing this functionality directly allows the user to create callbacks that are associated with user supplied compilation actions. For example, you can create a callback to lazyily IR-gen something from an AST. (A kaleidoscope example demonstrating this will be committed shortly). This patch also refactors the CompileOnDemand layer to use the JITCompileCallbackManager API. llvm-svn: 229461
* AsmPrinter: Stop creating DebugLocsDuncan P. N. Exon Smith2015-02-172-13/+3
| | | | | | | | | | | | | | | | While looking at a heap profile of a clang LTO bootstrap with -g, I noticed that 2.2% of memory in an `llvm-lto` of clang is from calling `DebugLoc::get()` in `collectVariableInfo()` (accounting for ~40% of memory used for `MDLocation`s). I suspect this was introduced by r226736, whose goal was to prevent uniquing of `DebugLoc`s (goal achieved, if so). There's no reason we need a `DebugLoc` here at all -- it was just being used for (in)convenient API -- so the fix is to pass the scope and inlined-at directly to `LexicalScopes::findInlinedScope()`. llvm-svn: 229459
* [PowerPC] Support non-direct-sub/superclass VSX copiesHal Finkel2015-02-161-4/+4
| | | | | | | | | | | | | | | | Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not quite (it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. llvm-svn: 229457
* Revert "InstrProf: Add unit tests for the profile reader and writer"Justin Bogner2015-02-162-45/+14
| | | | | | | | Looks like the bots don't like my initializer lists. This reverts r229455 llvm-svn: 229456
* InstrProf: Add unit tests for the profile reader and writerJustin Bogner2015-02-162-14/+45
| | | | | | | | | | This required some minor API to be added to these types to avoid needing temp files. Also, I've used initializer lists in the tests, as MSVC 2013 claims to support them. I'll redo this without them if the bots complain. llvm-svn: 229455
* [Mips] Add .MIPS.options section descriptor kinds enumerationSimon Atanasyan2015-02-161-1/+1
| | | | | | No functional changes. llvm-svn: 229452
* [ARM] Remove unused declaration. NFC.Ahmed Bougacha2015-02-161-1/+0
| | | | | | | GlobalMerge was moved to lib/CodeGen a while ago, and is no longer called "ARMGlobalMerge". llvm-svn: 229448
* [AVX512] Make 512b vector floating point rounds legal on AVX512.Cameron McInally2015-02-161-0/+11
| | | | llvm-svn: 229445
* RegisterCoalescer: Don't rematerialize subregister definitions.Matthias Braun2015-02-161-0/+18
| | | | | | | | | | | We cannot simply rematerialize instructions which only defining a subregister, as the final value also depends on the previous instructions. This fixes test/CodeGen/R600/subreg-coalescer-bug.ll with subreg liveness enabled. llvm-svn: 229444
* RegisterCoalescer: Do not look for regclass of IMPLICIT_DEF.Matthias Braun2015-02-161-6/+7
| | | | | | | | | | | | IMPLICIT_DEF is a generic instruction and has no (fixed) output register class defined. The rematerialization code of the register coalescer should not scan the instruction description for a register class. This fixes a problem showing up in test/CodeGen/R600/subreg-coalescer-crash.ll with subregister liveness enabled. llvm-svn: 229443
* [X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domainSimon Pilgrim2015-02-161-10/+10
| | | | | | | | Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches. Differential Revision: http://reviews.llvm.org/D7600 llvm-svn: 229439
* SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here tooMehdi Amini2015-02-161-2/+46
| | | | | | | Update SPARC tests to match. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229438
* InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))Mehdi Amini2015-02-162-22/+49
| | | | | | | Fixes radar 15486701. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229437
* InstrProf: Use ErrorOr for IndexedInstrProfReader::create (NFC)Justin Bogner2015-02-162-5/+11
| | | | | | | The other InstrProfReader::create factories were updated to return ErrorOr in r221120, and it's odd for these APIs not to match. llvm-svn: 229433
* [X86] Remove the multiply by 8 that goes into the shift constant for ↵Craig Topper2015-02-163-42/+37
| | | | | | X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction. llvm-svn: 229431
* [X86] Remove x86.avx2.psll.dq.bs and x86.avx2.psrl.dq.bs intrinsics.Craig Topper2015-02-162-7/+51
| | | | llvm-svn: 229430
* ARM: Transfer kill flag when lowering VSTMQIA to VSTMDIA.Matthias Braun2015-02-161-1/+2
| | | | llvm-svn: 229425
* RegisterCoalescer: Improve previous fix for wrong def after.Matthias Braun2015-02-161-3/+2
| | | | | | | | | | | The previous fix in r225503 was needlessly complicated. The problem goes away as well if the arguments to MergeValueNumberInto are supplied in the correct order. This was previously missed because the existing code already had the wrong order but an additional later Merge was hiding the bug for the main liverange VNI. llvm-svn: 229424
* MSVC 2013 does not ICE on this code in the same fashion that MSVC 2012 did; NFC.Aaron Ballman2015-02-161-2/+1
| | | | llvm-svn: 229422
* Bitcode: Fix major regression: large files w/ debug infoDuncan P. N. Exon Smith2015-02-162-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The metadata/value split introduced a major regression reading large bitcode files that contain debug info (or other cyclic (non-self reference) metadata graphs). For the first time in a while, I dropped from libLTO.dylib down to `llvm-lto` with a non-trivial bitcode file (~350MB), and I hit this when reading the result of ld64's `-save-temps` in `llvm-lto`. Here's pseudo-code for what was going on: read-main-metadata-block: for each md: if has-fwd-ref: // Only true for cyclic graphs. any-fwd-refs <- true if any-fwd-refs: foreach md: resolve-cycles(md) // Handle cycles. foreach function: read-function-metadata-block: // Such as !alias, !loop if any-fwd-refs: foreach md: // (all metadata, not just this block) resolve-cycles(md) // A no-op, but the loop is expensive!! This commit resets the `AnyFwdRefs` flag to `false`. This on its own was enough to change my Release+Asserts `llvm-lto` time for reading this bitcode from over 20 minutes (I gave up on it) to 20 seconds. I've gone further by tracking the min/max metadata forward-references in a metadata block. This protects against a schema that has lots of functions that each reference their own metadata cycle. Unfortunately, this regression is in the 3.6 branch as well. llvm-svn: 229421
* ConstantFold: Properly fold GEP indices wider than i64David Majnemer2015-02-161-18/+31
| | | | llvm-svn: 229420
* Run LICM as part of the cleanup phase from the scalar optimizer.James Molloy2015-02-161-0/+1
| | | | | | | Things like LoopUnrolling can produce loop invariant values - make sure we pick them up. llvm-svn: 229419
* We require MSVC 1800 as our minimum, so these checks can safely go away; ↵Aaron Ballman2015-02-161-12/+7
| | | | | | NFC. (It seems this code has been copy/pasted around, unfortunately.) llvm-svn: 229417
* We require MSVC 1800 as our minimum, so these checks can safely go away; NFC.Aaron Ballman2015-02-161-12/+7
| | | | llvm-svn: 229415
* MSVC 2013 supports std::forward_as_tuple, while MSVC 2012 did not; so we can ↵Aaron Ballman2015-02-161-10/+8
| | | | | | move to using the improved API. llvm-svn: 229414
* AArch64: Safely handle the incoming sret call argument.Andrew Trick2015-02-168-29/+45
| | | | | | | | | | | | | | | | | | This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413
* [ADCE] Don't indent inside an anonymous namespaceHal Finkel2015-02-161-11/+10
| | | | | | | To be consistent with what clang-format does, don't add extra indentation inside an anonymous namespace. NFC. llvm-svn: 229412
* [LoopReroll] Relax some assumptions a little.James Molloy2015-02-161-3/+6
| | | | | | | | We won't find a root with index zero in any loop that we are able to reroll. However, we may find one in a non-rerollable loop, so bail gracefully instead of failing hard. llvm-svn: 229406
* [LoopReroll] Don't crash on dead codeJames Molloy2015-02-161-0/+2
| | | | | | | | If a PHI has no users, don't crash; bail gracefully. This shouldn't happen often, but we can make no guarantees that previous passes didn't leave dead code around. llvm-svn: 229405
* [asan] Reuse a common function.Evgeniy Stepanov2015-02-161-6/+2
| | | | | | Do not reimplement RoundUpToAlignment. llvm-svn: 229397
* [x86] Add a generic unpack-targeted lowering technique. This can be usedChandler Carruth2015-02-161-0/+54
| | | | | | | | | | | | | to generically lower blends and is particularly nice because it is available frome SSE2 onward. This removes a lot of the remaining domain crossing blends in SSE2 code. I'm hoping to replace some of the "interleaved" lowering hacks with something closer to this which should be more principled. First, this needs to learn how to detect and use other interleavings besides that of the natural type provided. That will be a follow-up patch though. llvm-svn: 229378
* Fix quoting of #pragma comment for MS compat, LLVM part.Michael Kuperstein2015-02-161-14/+3
| | | | | | | | | For #pragma comment(linker, ...) MSVC expects the comment string to be quoted, but for #pragma comment(lib, ...) the compiler itself quotes the library name. Since this distinction disappears by the time the directive reaches the backend, move quoting for the "lib" version to the frontend. Differential Revision: http://reviews.llvm.org/D7652 llvm-svn: 229375
* [x86] Add initial basic support for forming blends of v16i8 vectors.Chandler Carruth2015-02-161-10/+22
| | | | | | | | | | | | | This blend instruction is ... really lame. The register usage is insane. As a consequence this is probably only *barely* better than 2 pshufbs followed by a por, and that mostly because it only has to read from a single memory location. However, this doesn't fix as much as I kind of expected, so more to go. Pretty sure that the ordering and delegation of v16i8 is just really, really bad. llvm-svn: 229373
OpenPOWER on IntegriCloud