summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [mips][ias] R_MIPS_GOT_(PAGE|OFST) do not need symbolsDaniel Sanders2016-07-191-4/+4
| | | | | | | | | | Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22458 llvm-svn: 275968
* [mips] Correct label prefixes for N32 and N64.Daniel Sanders2016-07-192-3/+13
| | | | | | | | | | | | | | | | | Summary: N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own ($). This fixes the majority of object differences between -fintegrated-as and -fno-integrated-as. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D22412 llvm-svn: 275967
* [mips] Recognise the triple used by Debian stretch for mips64el.Daniel Sanders2016-07-191-0/+2
| | | | | | | | | | | | | Summary: The triple used for this distribution is mips64el-linux-gnuabi64. Reviewers: sdardis Subscribers: sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D22406 llvm-svn: 275966
* [InstCombine] Minor cleanup of cast simplification code [NFC]Tobias Grosser2016-07-193-60/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following: - Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter. - Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there. - Simplify interface of `shouldOptimizeCast()`. - Minor code style adaptions in `shouldOptimizeCast()`. - Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant. - Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format. - Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability. - Simplify the interface of `isEliminableCastPair()`. - Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration. - Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer. - Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant. Reviewers: vtjnash, grosser Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22449 Contributed-by: Matthias Reisinger llvm-svn: 275964
* AVX-512: Fixed BT instruction selection.Elena Demikhovsky2016-07-192-25/+59
| | | | | | | | | | | The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1). I added the new sequence (truncate (a >>n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950
* [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.Craig Topper2016-07-191-6/+16
| | | | llvm-svn: 275942
* [X86] Remove superfluous parameter from a multiclass. All instantiations ↵Craig Topper2016-07-191-9/+8
| | | | | | passed the same value. llvm-svn: 275941
* [MemorySSA] Update to the new shiny walker.George Burgess IV2016-07-191-307/+830
| | | | | | | | | | | | | | | | | | | | | | | | This patch updates MemorySSA's use-optimizing walker to be more accurate and, in some cases, faster. Essentially, this changed our core walking algorithm from a cache-as-you-go DFS to an iteratively expanded DFS, with all of the caching happening at the end. Said expansion happens when we hit a Phi, P; we'll try to do the smallest amount of work possible to see if optimizing above that Phi is legal in the first place. If so, we'll expand the search to see if we can optimize to the next phi, etc. An iteratively expanded DFS lets us potentially quit earlier (because we don't assume that we can optimize above all phis) than our old walker. Additionally, because we don't cache as we go, we can now optimize above loops. As an added bonus, this patch adds a ton of verification (if EXPENSIVE_CHECKS are enabled), so finding bugs is easier. Differential Revision: https://reviews.llvm.org/D21777 llvm-svn: 275940
* [X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFCCraig Topper2016-07-192-4/+4
| | | | llvm-svn: 275939
* Retry: [llvm-profdata] Speed up merging by using a thread poolVedant Kumar2016-07-191-0/+8
| | | | | | | | | | | | | | | | | | | | | Add a "-j" option to llvm-profdata to control the number of threads used. Auto-detect NumThreads when it isn't specified, and avoid spawning threads when they wouldn't be beneficial. I tested this patch using a raw profile produced by clang (147MB). Here is the time taken to merge 4 copies together on my laptop: No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total Changes since the initial commit: - When handling odd-length inputs, call ThreadPool::wait() before merging the last profile. Should fix a race/off-by-one (see r275937). Differential Revision: https://reviews.llvm.org/D22438 llvm-svn: 275938
* Revert "[llvm-profdata] Speed up merging by using a thread pool"Vedant Kumar2016-07-191-8/+0
| | | | | | | | | | | This reverts commit r275921. It broke the ppc64be bot: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537 I'm not sure why it broke, but based on the output, it looks like an off-by-one (one profile left un-merged). llvm-svn: 275937
* Recommit the patch "Use uniforms set to populate VecValuesToIgnore".Wei Mi2016-07-191-49/+40
| | | | | | | | | | | | | | | | | | For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. llvm-svn: 275936
* AMDGPU/SI: Fix SI scheduler refcount issueMatt Arsenault2016-07-191-0/+3
| | | | | | | | | Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock. Patch by Axel Davy llvm-svn: 275935
* AMDGPU: Expand register indexing pseudos in custom inserterMatt Arsenault2016-07-198-300/+451
| | | | | | | | | | | | | | | | | | | | | | | This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
* [LoopReroll] Reroll loops with unordered atomic memory accessesSanjoy Das2016-07-191-7/+7
| | | | | | | | | | Reviewers: hfinkel, jfb, reames Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22385 llvm-svn: 275932
* Fix -Wreturn-type with gcc 4.8 and libc++Matt Arsenault2016-07-181-1/+1
| | | | llvm-svn: 275922
* [llvm-profdata] Speed up merging by using a thread poolVedant Kumar2016-07-181-0/+8
| | | | | | | | | | | | | | | | Add a "-j" option to llvm-profdata to control the number of threads used. Auto-detect NumThreads when it isn't specified, and avoid spawning threads when they wouldn't be beneficial. I tested this patch using a raw profile produced by clang (147MB). Here is the time taken to merge 4 copies together on my laptop: No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total Differential Revision: https://reviews.llvm.org/D22438 llvm-svn: 275921
* [NVPTX] Make sure we adjust alignment at all call sitesArtem Belevich2016-07-181-2/+1
| | | | | | | .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920
* [PM] Convert Loop Strength Reduce pass to new PMDehao Chen2016-07-183-24/+46
| | | | | | | | | | | | Summary: Convert Loop String Reduce pass to new PM Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22468 llvm-svn: 275919
* [PM] Port FunctionImport Pass to new PMTeresa Johnson2016-07-184-46/+57
| | | | | | | | | | | | Summary: Port FunctionImport Pass to new PM. Reviewers: mehdi_amini, davide Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D22475 llvm-svn: 275916
* Revert rL275912.Wei Mi2016-07-181-40/+49
| | | | llvm-svn: 275915
* Use uniforms set to populate VecValuesToIgnore.Wei Mi2016-07-181-49/+40
| | | | | | | | | | | | | | | | For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 llvm-svn: 275912
* refactor SimplifySelectInst; NFCISanjay Patel2016-07-181-97/+115
| | | | llvm-svn: 275911
* [NVPTX] Force minimum alignment of 4 for byval arguments of device-side ↵Artem Belevich2016-07-182-6/+23
| | | | | | | | | | | | | | | | functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893
* [LoopSimplify] Update LCSSA after separating nested loops.Michael Zolotukhin2016-07-181-12/+50
| | | | | | | | | | | | | | | | | | Summary: Usually LCSSA survives this transformation, but in some cases (see attached test) it doesn't: values from the original loop after separating might be used from the outer loop. Before the transformation it was the same loop, so LCSSA phis were not required. This fixes PR28272. Reviewers: sanjoy, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21665 llvm-svn: 275891
* Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel"Vitaly Buka2016-07-181-11/+0
| | | | | | | | Breaks asan, see https://reviews.llvm.org/D22103 This reverts commit r275776. llvm-svn: 275890
* [MC] Separate non-parsing operations from conditional chains. NFC.Nirav Dave2016-07-181-15/+20
| | | | llvm-svn: 275888
* [GVNHoist] Remove a home-grown version of replaceUsesOfWithDavid Majnemer2016-07-181-15/+2
| | | | | | | | | replaceUsesOfWith will, on average, consider fewer values when trying to do the replacement. No functional change is intended. llvm-svn: 275884
* [LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA ↵Michael Zolotukhin2016-07-181-1/+10
| | | | | | | | | | | | | | | | | | form. Summary: SSAUpdate might insert PHI-nodes inside loops, which can break LCSSA form unless we fix it up. This fixes PR28424. Reviewers: sanjoy, chandlerc, hfinkel Subscribers: uabelho, llvm-commits Differential Revision: http://reviews.llvm.org/D21997 llvm-svn: 275883
* AMDGPU: Remove pointless dyn_cast_or_nullMatt Arsenault2016-07-181-4/+3
| | | | | | This is already casted above so non-null llvm-svn: 275881
* Fix -Wmicrosoft-enum-value in GVNHoist.cppReid Kleckner2016-07-181-1/+1
| | | | llvm-svn: 275879
* AMDGPU: Fix missing switch case warningMatt Arsenault2016-07-181-0/+1
| | | | llvm-svn: 275873
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32Matt Arsenault2016-07-185-1/+8
| | | | llvm-svn: 275871
* AMDGPU/R600: Replace barrier intrinsicsMatt Arsenault2016-07-183-21/+1
| | | | llvm-svn: 275870
* AMDGPU: Remove dead check in AMDGPUPromoteAllocaMatt Arsenault2016-07-181-9/+10
| | | | | | | | | | This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869
* AMDGPU: Remove dead code and redundant checkMatt Arsenault2016-07-181-27/+1
| | | | | | | Non intrinsic calls aren't really handled, and this IntrinsicInst dyn_cast checks for the function for us. llvm-svn: 275868
* [ThinLTO] Address review comments from PGO indirect call promotion (NFC)Teresa Johnson2016-07-181-13/+17
| | | | | | Address a couple of post-commit review comments from r275707. llvm-svn: 275867
* CodeGenPrep: use correct function to determine Global's alignment.Tim Northover2016-07-181-1/+1
| | | | | | | | | Elsewhere (particularly computeKnownBits) we assume that a global will be aligned to the value returned by Value::getPointerAlignment. This is used to boost the alignment on memcpy/memset, so any target-specific request can only increase that value. llvm-svn: 275866
* [Hexagon] Handle returning small structures by valueKrzysztof Parzyszek2016-07-181-1/+7
| | | | | | | This is not compliant with the official ABI, but allows experimentation with calling conventions. llvm-svn: 275825
* [Hexagon] Revert r275822: mistake in commit messageKrzysztof Parzyszek2016-07-181-7/+1
| | | | llvm-svn: 275824
* [X86][AVX] Add target shuffle decode support for VBROADCASTSimon Pilgrim2016-07-183-0/+19
| | | | | | Currently we only decode broadcasts from a vector of the same size. llvm-svn: 275823
* [Hexagon] Handle returning small structures by valueKrzysztof Parzyszek2016-07-181-1/+7
| | | | | | | This is compliant with the official ABI, but allows experimentation with calling conventions. llvm-svn: 275822
* [X86] Accept SELECT op code for x86-64 fp128 typeChih-Hung Hsieh2016-07-181-0/+1
| | | | | | | | | | DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow SELECT op code for x86_64 fp128 type for MME targets, so SoftenFloatOperand does not abort on SELECT op code. Differential Revision: http://reviews.llvm.org/D21758 llvm-svn: 275818
* [LoopDist] Port to new PMAdam Nemet2016-07-184-34/+78
| | | | | | | | | | | | | | | | | | | | | | Summary: The direct motivation for the port is to ensure that the OptRemarkEmitter tests work with the new PM. This remains a function pass because we not only create multiple loops but could also version the original loop. In the test I need to invoke opt with -passes='require<aa>,loop-distribute'. LoopDistribute does not directly depend on AA however LAA does. LAA uses getCachedResult so I *think* we need manually pull in 'aa'. Reviewers: davidxl, silvas Subscribers: sanjoy, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22437 llvm-svn: 275811
* [OptRemarkEmitter] Port to new PMAdam Nemet2016-07-185-15/+36
| | | | | | | | | | | | | | | | | | | | Summary: The main goal is to able to start using the new OptRemarkEmitter analysis from the LoopVectorizer. Since the vectorizer was recently converted to the new PM, it makes sense to convert this analysis as well. This pass is currently tested through the LoopDistribution pass, so I am also porting LoopDistribution to get coverage for this analysis with the new PM. Reviewers: davidxl, silvas Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22436 llvm-svn: 275810
* Sort include headersAdam Nemet2016-07-181-1/+1
| | | | llvm-svn: 275809
* [Hexagon] Misc changes to HexagonMachineScheduler, NFCKrzysztof Parzyszek2016-07-181-26/+3
| | | | | | | - Remove duplicated code. - Convert loop to range-for. llvm-svn: 275806
* [Hexagon] Enable .cur formation in MISched for Hexagon V60Krzysztof Parzyszek2016-07-181-0/+8
| | | | | | | | | | | Schedule a load and its use in the same packet in MISched. Previously, isResourceAvailable was returning false for dependences in the same packet, which prevented MISched from packetizing a load and its use in the same packet for v60. Patch by Ikhlas Ajbar. llvm-svn: 275804
* Revert "r275571 [DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals"Alexander Kornienko2016-07-181-134/+50
| | | | | | Causes https://llvm.org/bugs/show_bug.cgi?id=28588 llvm-svn: 275801
* [Hexagon] Add verbose debugging mode to Hexagon MI SchedulerKrzysztof Parzyszek2016-07-182-10/+76
| | | | | | Patch by Sergei Larin. llvm-svn: 275799
OpenPOWER on IntegriCloud