summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Don't use MUBUF vaddr if address may overflowMatt Arsenault2017-11-156-2/+60
| | | | | | | Effectively revert r263964. Before we would not allow this if vaddr was not known to be positive. llvm-svn: 318240
* Revert r318193 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' ↵Hans Wennborg2017-11-151-318/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | elements in integer binary ops." It crashes building sqlite; see reply on the llvm-commits thread. > [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. > > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318239
* [LoopRotate] processLoop should return true even if it just simplified the ↵Craig Topper2017-11-151-1/+1
| | | | | | | | | | | | | | loop latch without making any other changes Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened. PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we no changes other than simplifying the loop latch. Fixes PR35210. Differential Revision: https://reviews.llvm.org/D40035 llvm-svn: 318237
* [asan] Prevent rematerialization of &__asan_shadow.Evgeniy Stepanov2017-11-151-12/+30
| | | | | | | | | | | | | | | | | | | | Summary: In the mode when ASan shadow base is computed as the address of an external global (__asan_shadow, currently on android/arm32 only), regalloc prefers to rematerialize this value to save register spills. Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant pool entry). This changes adds an inline asm in the function prologue to suppress this behavior. It reduces AsanTest binary size by 7%. Reviewers: pcc, vitalybuka Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40048 llvm-svn: 318235
* AMDGPU: Handle or in multi-use shl ptr combineMatt Arsenault2017-11-141-2/+2
| | | | llvm-svn: 318223
* [EntryExitInstrumenter] Placate GCC, the semicolon is redundant. NFCI.Davide Italiano2017-11-141-3/+2
| | | | llvm-svn: 318217
* [Reassociate] use dyn_cast instead of isa+cast; NFCISanjay Patel2017-11-141-9/+9
| | | | llvm-svn: 318212
* [GISel]: Rework legalization algorithm for better elimination ofAditya Nandakumar2017-11-141-101/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | artifacts along with DCE Legalization Artifacts are all those insts that are there to make the type system happy. Currently, the target needs to say all combinations of extends and truncs are legal and there's no way of verifying that post legalization, we only have *truly* legal instructions. This patch changes roughly the legalization algorithm to process all illegal insts at one go, and then process all truncs/extends that were added to satisfy the type constraints separately trying to combine trivial cases until they converge. This has the added benefit that, the target legalizerinfo can only say which truncs and extends are okay and the artifact combiner would combine away other exts and truncs. Updated legalization algorithm to roughly the following pseudo code. WorkList Insts, Artifacts; collect_all_insts_and_artifacts(Insts, Artifacts); do { for (Inst in Insts) legalizeInstrStep(Inst, Insts, Artifacts); for (Artifact in Artifacts) tryCombineArtifact(Artifact, Insts, Artifacts); } while(!Insts.empty()); Also, wrote a simple wrapper equivalent to SetVector, except for erasing, it avoids moving all elements over by one and instead just nulls them out. llvm-svn: 318210
* Reland "[mips][mt][6/7] Add support for mftr, mttr instructions."Simon Dardis2017-11-147-1/+375
| | | | | | | | | | | | | | | | | | | | This adjusts the tests to hopfully pacify the llvm-clang-x86_64-expensive-checks-win buildbot. Unlike many other instructions, these instructions have aliases which take coprocessor registers, gpr register, accumulator (and dsp accumulator) registers, floating point registers, floating point control registers and coprocessor 2 data and control operands. For the moment, these aliases are treated as pseudo instructions which are expanded into the underlying instruction. As a result, disassembling these instructions shows the underlying instruction and not the alias. Reviewers: slthakur, atanasyan Differential Revision: https://reviews.llvm.org/D35253 llvm-svn: 318207
* Make salvageDebugInfo of casts work for dbg.declare and dbg.addrReid Kleckner2017-11-141-6/+16
| | | | | | | | | | | | | | | | | | | | | Summary: Instcombine (and probably other passes) sometimes want to change the type of an alloca. To do this, they generally create a new alloca with the desired type, create a bitcast to make the new pointer type match the old pointer type, replace all uses with the cast, and then simplify the casts. We already knew how to salvage dbg.value instructions when removing casts, but we can extend it to cover dbg.addr and dbg.declare. Fixes a debug info quality issue uncovered in Chromium in http://crbug.com/784609 Reviewers: aprantl, vsk Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40042 llvm-svn: 318203
* [CodeGen] Peel off the dominant case in switch statement in loweringRong Xu2017-11-142-2/+92
| | | | | | | | | | This patch peels off the top case in switch statement into a branch if the probability exceeds a threshold. This will help the branch prediction and avoids the extra compares when lowering into chain of branches. Differential Revision: http://reviews.llvm.org/D39262 llvm-svn: 318202
* Fix unused variable warning.Richard Smith2017-11-141-1/+0
| | | | llvm-svn: 318201
* Rename CountingFunctionInserter and use for both mcount and cygprofile ↵Hans Wennborg2017-11-1410-62/+157
| | | | | | | | | | | | | | | | | | | | | | calls, before and after inlining Clang implements the -finstrument-functions flag inherited from GCC, which inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit. This is useful for getting a trace of how the functions in a program are executed. Normally, the calls remain even if a function is inlined into another function, but it is useful to be able to turn this off for users who are interested in a lower-level trace, i.e. one that reflects what functions are called post-inlining. (We use this to generate link order files for Chromium.) LLVM already has a pass for inserting similar instrumentation calls to mcount(), which it does after inlining. This patch renames and extends that pass to handle calls both to mcount and the cygprofile functions, before and/or after inlining as controlled by function attributes. Differential Revision: https://reviews.llvm.org/D39287 llvm-svn: 318195
* [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in ↵Dinar Temirbulatov2017-11-141-140/+318
| | | | | | | | | | | | | | | | | | | | | | | | | | | integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318193
* AMDGPU: Error on stack size overflowMatt Arsenault2017-11-142-6/+12
| | | | llvm-svn: 318189
* [SystemZ] Do not crash when selecting an OR of two constantsUlrich Weigand2017-11-141-2/+4
| | | | | | | | | | | In rare cases, common code will attempt to select an OR of two constants. This confuses the logic in splitLargeImmediate, causing an internal error during isel. Fixed by simply leaving this case to common code to handle. This fixes PR34859. llvm-svn: 318187
* [AArch64] Adjust the cost model for Exynos M1 and M2Evandro Menezes2017-11-141-11/+9
| | | | | | Fix the modeling of loads and stores of registers pairs. llvm-svn: 318186
* [ARM, AArch64] Fix an assert message, Darwin isn't the only target ↵Martin Storsjo2017-11-142-2/+4
| | | | | | supporting TLS. NFC. llvm-svn: 318184
* [CodeGenPrepare] Disable div bypass when working set size is huge.Easwaran Raman2017-11-141-3/+4
| | | | | | | | | | | | | | | | | | | Summary: Bypass of slow divs based on operand values is currently disabled for -Os. Do the same when profile summary is available and the working set size of the application is huge. This is similar to how loop peeling is guarded by hasHugeWorkingSetSize. In the div bypass case, the generated extra code (and the extra branch) tendss to outweigh the benefits of the bypass. This results in noticeable performance improvement on an internal application. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39992 llvm-svn: 318179
* [SystemZ] Fix invalid codegen using RISBMux on out-of-range bitsUlrich Weigand2017-11-141-1/+9
| | | | | | | | Before using the 32-bit RISBMux set of instructions we need to verify that the input bits are actually within range of the 32-bit instruction. This fixer PR35289. llvm-svn: 318177
* Mark intrinsics operating on the whole warp as IntrInaccessibleMemOnlyArtem Belevich2017-11-142-10/+21
| | | | | | | It's needed to model the fact that they do access data from other threads in a warp and thus can't be CSE'd. llvm-svn: 318173
* CodeGen: Fix TargetLowering::LowerCallTo for sret value typeYaxun Liu2017-11-141-1/+1
| | | | | | | | | | | | | | TargetLowering::LowerCallTo assumes that sret value type corresponds to a pointer in default address space, which is incorrect, since sret value type should correspond to a pointer in alloca address space, which may not be the default address space. This causes assertion for amdgcn target in amdgiz environment. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39996 llvm-svn: 318167
* [PredicateInfo] Stable sort ValueDFS to remove non-deterministic orderingMandeep Singh Grang2017-11-141-1/+1
| | | | | | | | | | | | | | Summary: This fixes failure in Transforms/Util/PredicateInfo/testandor.ll uncovered by D39245. Reviewers: dberlin Reviewed By: dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39630 llvm-svn: 318165
* [XRay] Stable sort XRayRecord to remove non-deterministic orderingMandeep Singh Grang2017-11-141-1/+1
| | | | | | | | | | | | | | | | Summary: This fixes failure in tools/llvm-xray/X86/graph-zero-latency-calls.yaml uncovered by D39245. Reviewers: dberris Reviewed By: dberris Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39943 llvm-svn: 318163
* [X86] Fix typo in comment. NFCCraig Topper2017-11-141-2/+2
| | | | llvm-svn: 318156
* [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipeGil Rapaport2017-11-142-124/+243
| | | | | | | | | | | | | | | | | | | | This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 llvm-svn: 318149
* ARM: correctly update CFG when splitting BB to fix branch.Tim Northover2017-11-141-0/+6
| | | | | | | | | | | | Because the block-splitting code is multi-purpose, we have to meddle with the branches when using it to fixup a conditional branch destination. We got the code right, but forgot to update the CFG so the verifier complained when expensive checks were on. Probably harmless since constant-islands comes so late, but best to fix it anyway. llvm-svn: 318148
* [ARM GlobalISel] Remove C++ code for G_CONSTANTDiana Picus2017-11-141-22/+0
| | | | | | | | | | Get rid of the handwritten instruction selector code for handling G_CONSTANT. This code wasn't checking all the preconditions correctly anyway, so it's better to leave it to TableGen, which can handle at least some cases correctly (e.g. MOVi, MOVi16, folding into binary operations). Also add tests to cover those cases. llvm-svn: 318146
* [ARM] Fix incorrect conversion of a tail call to an ordinary callMomchil Velikov2017-11-142-36/+74
| | | | | | | | | | | | | | | | | | | When we emit a tail call for Armv8-M, but then discover that the caller needs to save/restore `LR`, we convert the tail call to an ordinary one, since restoring `LR` takes extra instructions, which may negate the benefits of the tail call. If the callee, however, takes stack arguments, this conversion is incorrect, since nothing has been done to pass the stack arguments. Thus the patch reverts https://reviews.llvm.org/rL294000 Also, we improve the instruction sequence for popping `LR` in the case when we couldn't immediately find a scratch low register, but we can use as a temporary one of the callee-saved low registers and restore `LR` before popping other callee-saves. Differential Revision: https://reviews.llvm.org/D39599 llvm-svn: 318143
* AMDGPU: Fix producing saveexec when the copy is spilledMatt Arsenault2017-11-141-2/+16
| | | | | | | | | If the register from the copy from exec was spilled, the copy before the spill was deleted leaving a spill of undefined register verifier error and miscompiling. Check for other use instructions of the copy register. llvm-svn: 318132
* [PM] Port BoundsChecking to the new PM.Chandler Carruth2017-11-144-35/+42
| | | | | | | | | | | Registers it and everything, updates all the references, etc. Next patch will add support to Clang's `-fexperimental-new-pass-manager` path to actually enable BoundsChecking correctly. Differential Revision: https://reviews.llvm.org/D39084 llvm-svn: 318128
* Use TempFile in llvm-ar. NFC.Rafael Espindola2017-11-141-11/+7
| | | | llvm-svn: 318127
* [PM] Refactor BoundsChecking further to prepare it to be exposed both asChandler Carruth2017-11-141-65/+65
| | | | | | | | | | | | a legacy and new PM pass. This essentially moves the class state to parameters and re-shuffles the code to make that reasonable. It also does some minor cleanups along the way and leaves some comments. Differential Revision: https://reviews.llvm.org/D39081 llvm-svn: 318124
* [WebAssembly] Explicily disable comdat support for wasm outputSam Clegg2017-11-142-12/+11
| | | | | | | | | | | | | For now at least. We clearly need some kind of comdat or linkonce_odr support for wasm but currently COMDAT is not supported. Disable COMDAT support in the same way we do the Mach-O. This also causes clang not to generated COMDATs. Differential Revision: https://reviews.llvm.org/D39873 llvm-svn: 318123
* Add a move assignment operator to TempFile. NFC.Rafael Espindola2017-11-141-1/+3
| | | | llvm-svn: 318122
* Update some code.google.com linksHans Wennborg2017-11-133-6/+6
| | | | llvm-svn: 318115
* Simplify and rename variable.Rafael Espindola2017-11-131-3/+3
| | | | | | | | | std::error_code can represent success, so we don't need a Optional<std::error_code>. Rename the variable to avoid confusion with the type Error. llvm-svn: 318111
* AMDGPU: Fix not converting d16 load/stores to offsetMatt Arsenault2017-11-131-1/+22
| | | | | | Fixes missed optimization with new MUBUF instructions. llvm-svn: 318106
* Simplify. NFC.Rafael Espindola2017-11-131-3/+2
| | | | llvm-svn: 318104
* AMDGPU: Implement computeKnownBitsForTargetNode for mbcntMatt Arsenault2017-11-132-0/+18
| | | | llvm-svn: 318100
* Fix -Werror when compiling rL318083 (ter)Serge Guelton2017-11-131-1/+2
| | | | | | Statically assert the result and remove a runtime comparison, a direct consequence of the optimization introduced in rL318083. llvm-svn: 318091
* Fix an assertion in SelectionDAG::transferDbgValues()Adrian Prantl2017-11-131-11/+16
| | | | | | | | when transferring debug info describing the lower bits of an extended SDNode. rdar://problem/35504722 llvm-svn: 318086
* [arm] Fix Unnecessary reloads from GOT.Evgeniy Stepanov2017-11-139-40/+41
| | | | | | | | | | | | Summary: This fixes PR35221. Use pseudo-instructions to let MachineCSE hoist global address computation. Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39871 llvm-svn: 318081
* Fix clang -Wsometimes-uninitialized warning in SCEV codeReid Kleckner2017-11-131-1/+1
| | | | | | | I don't believe this was a problem in practice, as it's likely that the boolean wasn't checked unless the backend condition was non-null. llvm-svn: 318073
* Create a TempFile class.Rafael Espindola2017-11-132-20/+72
| | | | | | | | | | | | | | | This just adds a TempFile class and replaces the use in FileOutputBuffer with it. The only difference for now is better error handling. Followup work includes: - Convert other user of temporary files to it. - Add support for automatically deleting on windows. - Add a createUnnamed method that returns a potentially unnamed file. It would be actually unnamed on modern linux and have a unknown name on windows. llvm-svn: 318069
* [ValueTracking] use 'auto' with 'dyn_cast'; NFCSanjay Patel2017-11-131-11/+13
| | | | llvm-svn: 318058
* [X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter addressCraig Topper2017-11-131-20/+35
| | | | | | | | | | | | If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address. This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase. We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it. Differential Revision: https://reviews.llvm.org/D39927 llvm-svn: 318057
* [ValueTracking] simplify code in CannotBeNegativeZero() with match(); NFCISanjay Patel2017-11-131-5/+3
| | | | llvm-svn: 318055
* AMDGPU: Drop duplicate setOperationActionJan Vesely2017-11-131-2/+0
| | | | | | | | These are set with other scalar int ops few lines up Differential Revision: https://reviews.llvm.org/D39928 llvm-svn: 318051
* [SCEV] Handling for ICmp occuring in the evolution chain.Jatin Bhateja2017-11-132-8/+88
| | | | | | | | | | | | | | | | | | | | Summary: If a compare instruction is same or inverse of the compare in the branch of the loop latch, then return a constant evolution node. This shall facilitate computations of loop exit counts in cases where compare appears in the evolution chain of induction variables. Will fix PR 34538 Reviewers: sanjoy, hfinkel, junryoungju Reviewed By: sanjoy, junryoungju Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38494 llvm-svn: 318050
OpenPOWER on IntegriCloud