path: root/llvm
Commit message | Author | Age | Files | Lines
* Propagate TBAA info in SelectionDAG::getIndexedLoad | Krzysztof Parzyszek | 2016-08-29 | 2 | -1/+39
    Patch by Pranav Bhandarkar.
    llvm-svn: 279998
* [Myriad]: add missing 'mcpu' values | Douglas Katzman | 2016-08-29 | 1 | -0/+3
    Should have been done with r276646.
    llvm-svn: 279996
* AMDGPU/SI: Implement a custom MachineSchedStrategy | Tom Stellard | 2016-08-29 | 32 | -63/+513
    Summary:
    GCNSchedStrategy reuses most of GenericScheduler; it just uses a different
    method to compute the excess and critical register pressure limits.
    It is not enabled by default; to enable it, pass -misched=gcn to llc.

    Shader DB stats: 32464 shaders in 17874 tests

    Totals:
      SGPRS:          1542846 -> 1643125 (6.50 %)
      VGPRS:          1005595 -> 904653 (-10.04 %)
      Spilled SGPRs:  29929 -> 27745 (-7.30 %)
      Spilled VGPRs:  334 -> 352 (5.39 %)
      Scratch VGPRs:  1612 -> 1624 (0.74 %) dwords per thread
      Code Size:      36688188 -> 37034900 (0.95 %) bytes
      LDS:            1913 -> 1913 (0.00 %) blocks
      Max Waves:      254101 -> 265125 (4.34 %)
      Wait states:    0 -> 0 (0.00 %)

    Totals from affected shaders:
      SGPRS:          1338220 -> 1438499 (7.49 %)
      VGPRS:          886221 -> 785279 (-11.39 %)
      Spilled SGPRs:  29869 -> 27685 (-7.31 %)
      Spilled VGPRs:  334 -> 352 (5.39 %)
      Scratch VGPRs:  1612 -> 1624 (0.74 %) dwords per thread
      Code Size:      34315716 -> 34662428 (1.01 %) bytes
      LDS:            1551 -> 1551 (0.00 %) blocks
      Max Waves:      188127 -> 199151 (5.86 %)
      Wait states:    0 -> 0 (0.00 %)

    Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D23688
    llvm-svn: 279995
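The percentages in the Shader DB stats above appear to be plain relative changes between the before and after totals. A minimal sketch of that arithmetic, assuming this is how the figures were derived (the helper name `percent_change` is hypothetical and not part of any shader-db tooling):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals taken from the commit message above.
vgprs = percent_change(1005595, 904653)
sgprs = percent_change(1542846, 1643125)
max_waves = percent_change(254101, 265125)

print(f"VGPRS: {vgprs:.2f} %")
print(f"SGPRS: {sgprs:.2f} %")
print(f"Max Waves: {max_waves:.2f} %")
```

Read this way, the change trades more SGPRs for fewer VGPRs, with a higher Max Waves total.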
* [asan] Enable new stack poisoning with store instruction by default | Vitaly Buka | 2016-08-29 | 5 | -60/+107
    Reviewers: eugenis
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D23968
    llvm-svn: 279993
* GlobalISel: switch to SmallVector for pending legalizations. | Tim Northover | 2016-08-29 | 1 | -6/+8
    std::queue was doing far too many heap allocations to be healthy.
    llvm-svn: 279992
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler | Tom Stellard | 2016-08-29 | 12 | -150/+209
    Summary:
    The SILoadStoreOptimizer can now look ahead more than one instruction when
    looking for instructions to merge, which greatly improves the number of
    loads/stores that we are able to merge.

    Moving the pass before scheduling avoids increasing register pressure after
    the scheduler, so that the scheduler's register pressure estimates will be
    more accurate. It also gives more consistent results, since it is no longer
    affected by minor scheduling changes.

    Reviewers: arsenm
    Subscribers: arsenm, kzhuravl, llvm-commits
    Differential Revision: https://reviews.llvm.org/D23814
    llvm-svn: 279991
* ASan: remove variable only used in assertions build | Tim Northover | 2016-08-29 | 1 | -2/+1
    llvm-svn: 279990
* GlobalISel: legalize frem to a libcall on AArch64. | Tim Northover | 2016-08-29 | 8 | -5/+54
    llvm-svn: 279988
* GlobalISel: rework CallLowering so that it can be used for libcalls too. | Tim Northover | 2016-08-29 | 6 | -29/+89
    There should be no functional change here, I'm just making the
    implementation of "frem" (to libcall) legalization easier for a followup.
    llvm-svn: 279987
* AMDGPU/R600: Fix fixups used for constant arrays | Matt Arsenault | 2016-08-29 | 2 | -0/+29
    Fixes bug 29289
    llvm-svn: 279986
* IfConversion: Fix branch predication bug. | Kyle Butt | 2016-08-29 | 2 | -20/+98
    This bug shows up with diamonds that share unpredicable, unanalyzable
    branches. There's an included test case from Hexagon. What was happening
    was that we were attempting to predicate the branch instruction despite the
    fact that it was checked to be the same. Now for unanalyzable branches we
    skip over the branch instructions when predicating the block.

    Differential Revision: https://reviews.llvm.org/D23939
    llvm-svn: 279985
* Use store operation to poison allocas for lifetime analysis. | Vitaly Buka | 2016-08-29 | 3 | -94/+730
    Summary:
    Calling __asan_poison_stack_memory and __asan_unpoison_stack_memory for
    small variables is too expensive.

    Code is disabled by default and can be enabled by
    -asan-experimental-poisoning.

    PR27453

    Reviewers: eugenis
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D23947
    llvm-svn: 279984
* [asan] Separate calculation of ShadowBytes from calculating ASanStackFrameLayout | Vitaly Buka | 2016-08-29 | 4 | -79/+115
    Summary: No functional changes, just refactoring to make D23947 simpler.

    Reviewers: eugenis
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D23954
    llvm-svn: 279982
* [SimplifyCFG] Hoisting invalidates metadata | David Majnemer | 2016-08-29 | 2 | -2/+39
    We forgot to remove optimization metadata when performing hoisting during
    FoldTwoEntryPHINode.

    This fixes PR29163.
    llvm-svn: 279980
* Make vec_fabs.ll pass with MSVC 2013 | Reid Kleckner | 2016-08-29 | 1 | -4/+7
    We should revert this change once we drop support for MSVC 2013.
    llvm-svn: 279979
* [gold] Fix test accidentally regressed for newer gold | Teresa Johnson | 2016-08-29 | 3 | -1/+18
    With r279911 I accidentally regressed the gold/X86/start-lib-common.ll
    test for newer golds (v1.12+) that honor --start-lib/--end-lib. Remove the
    alignment, which should not be there, to make this work with both old and
    new gold linkers.

    Additionally, now that we have a subdirectory for v1.12+ gold tests, copy
    this test there and check specifically for the v1.12+ behavior.
    llvm-svn: 279977
* [AArch64] Adjust the scheduling model for Exynos M1. | Evandro Menezes | 2016-08-29 | 1 | -4/+14
    Further refine the model for loads.
    llvm-svn: 279976
* [StatepointsForGC] Rematerialize in the presence of PHIs | Anna Thomas | 2016-08-29 | 2 | -0/+72
    Summary:
    While walking the use chain for identifying rematerializable values in
    RS4GC, add the case where the current value and base value are the same PHI
    nodes. This will aid rematerialization of geps and casts instead of
    relocating.

    Reviewers: sanjoy, reames, igor
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D23920
    llvm-svn: 279975
* [LTO] Remove extraneous output | Teresa Johnson | 2016-08-29 | 1 | -1/+0
    Remove some debugging output to stderr that snuck in with r279576.
    llvm-svn: 279974
* [Constant] remove fdiv and frem from canTrap() | Sanjay Patel | 2016-08-29 | 2 | -8/+3
    Assuming the default FP env, we should not treat fdiv and frem any
    differently in terms of trapping behavior than any other FP op. I.e., FP
    ops do not trap with the default FP env.

    This matches how we treat the fdiv/frem in IR with
    isSafeToSpeculativelyExecute() and in the backend after:
    https://reviews.llvm.org/rL279970
    llvm-svn: 279973
* [SimplifyCFG] rename test file, regenerate checks, and add test | Sanjay Patel | 2016-08-29 | 2 | -41/+70
    The fdiv test shows a problem similar to:
    https://reviews.llvm.org/rL279970
    llvm-svn: 279972
* [Coroutines] Part 9: Add cleanup subfunction. | Gor Nishanov | 2016-08-29 | 19 | -137/+328
    Summary:
    [Coroutines] Part 9: Add cleanup subfunction.

    This patch completes coroutine heap allocation elision. Now, the heap
    elision example from docs\Coroutines.rst compiles and produces the expected
    result (see test/Transform/Coroutines/ex3.ll).

    Intrinsic Changes:
    * coro.free gets a token parameter tying it to coro.id to allow reliably
      discovering all coro.frees associated with a particular coroutine.
    * coro.id gets an extra parameter that points back to a coroutine function.
      This allows checking whether a coro.id describes the enclosing function
      or belongs to a different function that was later inlined.

    CoroSplit now creates three subfunctions:
    # f$resume - resume logic
    # f$destroy - cleanup logic, followed by deallocation code
    # f$cleanup - just the cleanup code

    The CoroElide pass, during devirtualization, replaces coro.destroy with
    either f$destroy or f$cleanup, depending on whether heap elision is
    performed or not.

    Other fixes, improvements:
    * Fixed buglet in Shape::buildFrame that was not creating coro.save
      properly if the coroutine has more than one suspend point.
    * Switched to using a variable-width suspend index field (no longer limited
      to 32 bits; the index field can be as small as i1 or as large as
      i<whatever-size_t-is>).

    Reviewers: majnemer
    Subscribers: llvm-commits, mehdi_amini
    Differential Revision: https://reviews.llvm.org/D23844
    llvm-svn: 279971
* [TargetLowering] remove fdiv and frem from canOpTrap() (PR29114) | Sanjay Patel | 2016-08-29 | 3 | -15/+3
    Assuming the default FP env, we should not treat fdiv and frem any
    differently in terms of trapping behavior than any other FP op. I.e., FP
    ops do not trap with the default FP env.

    This matches how we treat these ops in IR with
    isSafeToSpeculativelyExecute(). There's a similar bug in
    Constant::canTrap().

    This bug manifests in PR29114:
    https://llvm.org/bugs/show_bug.cgi?id=29114
    ...as a sequence of scalar divisions instead of a vector division on x86
    for a <3 x float> type.

    Differential Revision: https://reviews.llvm.org/D23974
    llvm-svn: 279970
* Do not use MRI::getMaxLaneMaskForVReg as a mask covering whole register | Krzysztof Parzyszek | 2016-08-29 | 3 | -7/+53
    MRI::getMaxLaneMaskForVReg does not always cover the whole register.
    For example, on X86 the upper 16 bits of EAX cannot be accessed via
    any subregister. Consequently, there is no lane mask that only covers
    that part of EAX. The getMaxLaneMaskForVReg will return the union of
    the lane masks for all subregisters, and in case of EAX, that union
    will not cover the upper 16 bits.

    This fixes https://llvm.org/bugs/show_bug.cgi?id=29132
    llvm-svn: 279969
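The EAX example above can be illustrated with a toy model. This is only a sketch: the lane-mask bit values and the `SUBREGS` table are hypothetical, chosen for illustration, and do not reflect the masks LLVM actually generates for X86.

```python
# Hypothetical model of EAX's subregisters: name -> (lane mask, register bits covered).
# The lane-mask values are invented for this sketch.
SUBREGS = {
    "AL": (0b01, range(0, 8)),    # low byte
    "AH": (0b10, range(8, 16)),   # high byte of the low word
    "AX": (0b11, range(0, 16)),   # low word = AL + AH
}
REGISTER_WIDTH = 32  # EAX is 32 bits wide


def max_lane_mask(subregs):
    """Union of the lane masks of all subregisters."""
    mask = 0
    for lane_mask, _bits in subregs.values():
        mask |= lane_mask
    return mask


def bits_covered(subregs, mask):
    """Register bits reachable through subregisters whose lanes lie within `mask`."""
    covered = set()
    for lane_mask, bits in subregs.values():
        if lane_mask & mask == lane_mask:
            covered.update(bits)
    return covered


union = max_lane_mask(SUBREGS)
covered = bits_covered(SUBREGS, union)
uncovered = set(range(REGISTER_WIDTH)) - covered
print(f"union mask = {union:#04b}, uncovered bits = {min(uncovered)}..{max(uncovered)}")
```

In this model the union of all lanes still leaves bits 16 to 31 unaccounted for, which is why such a union cannot stand in for "the whole register".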
* AMDGPU/SI: Improve register allocation hints for sopk instructions | Tom Stellard | 2016-08-29 | 2 | -2/+3
    Summary:
    For shrinking SOPK instructions, we were creating a hint to tell the
    register allocator to use the register allocated for src0 for the dst
    operand as well. However, this sometimes does not work, depending on the
    order in which virtual registers are assigned physical registers.

    To fix this, I've added a second allocation hint which does the reverse:
    it asks that the register allocated for dst be used for src0.

    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits, kzhuravl
    Differential Revision: https://reviews.llvm.org/D23862
    llvm-svn: 279968
* Use the correct ctor/dtor section for dynamic-no-pic. | Rafael Espindola | 2016-08-29 | 2 | -1/+5
    llvm-svn: 279967
* Mark test as XFAIL instead of disabling it everywhere. | Benjamin Kramer | 2016-08-29 | 1 | -2/+2
    There is no lit feature 'X86' so this test is just disabled completely.
    Make it XFAIL until a solution is found.
    llvm-svn: 279966
* Move code only used by codegen out of MC. NFC. | Rafael Espindola | 2016-08-29 | 5 | -51/+64
    MC itself never needs to know about these sections.
    llvm-svn: 279965
* Fix -Wunused-but-set-variable warning. | Haojian Wu | 2016-08-29 | 1 | -4/+0
    Summary: A follow-up fix on r279958.

    Reviewers: bkramer
    Subscribers: cfe-commits
    Differential Revision: https://reviews.llvm.org/D23989
    llvm-svn: 279964
* AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() | Tom Stellard | 2016-08-29 | 1 | -0/+11
    Summary:
    The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
    move it before scheduling.

    Reviewers: arsenm
    Subscribers: arsenm, llvm-commits, kzhuravl
    Differential Revision: https://reviews.llvm.org/D23813
    llvm-svn: 279963
* Fixed a bug in type legalizer for masked gather. | Igor Breger | 2016-08-29 | 2 | -1/+42
    The problem occurs when the node is not updated in place and
    UpdateNodeOperation() returns a node that already exists. In this case an
    assert fails in PromoteIntegerOperand(); N has 2 results (val + chain).

    Differential Revision: http://reviews.llvm.org/D23756
    llvm-svn: 279961
* [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence. | Igor Breger | 2016-08-29 | 7 | -728/+296
    Differential Revision: http://reviews.llvm.org/D23490
    llvm-svn: 279960
* [InstructionSelect] NumBlocks isn't defined in DEBUG build. | Haojian Wu | 2016-08-29 | 1 | -1/+1
    Summary:
    A follow-up fix for
    http://llvm.org/viewvc/llvm-project?view=revision&revision=279905.

    Reviewers: bkramer
    Subscribers: cfe-commits
    Differential Revision: https://reviews.llvm.org/D23985
    llvm-svn: 279959
* [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered. | Craig Topper | 2016-08-29 | 8 | -82/+192
    This allows broadcast loads to be used when available.
    llvm-svn: 279958
* [AVX-512] Always use v8i64 when converting 512-bit FAND/FOR/FXOR/FANDN to integer operations when DQI isn't supported. | Craig Topper | 2016-08-29 | 1 | -5/+3
    This is consistent with the recent changes to promote logical operations
    to i64 vectors.
    llvm-svn: 279957
* [AVX-512] Add 512-bit fabs tests with and without AVX512DQ. | Craig Topper | 2016-08-29 | 1 | -4/+84
    llvm-svn: 279956
* [Orc] Simplify LogicalDylib and move it back inside CompileOnDemandLayer. Also switch to using one indirect stub manager per logical dylib rather than one per input module. | Lang Hames | 2016-08-29 | 6 | -324/+158
    LogicalDylib is a helper class used by the CompileOnDemandLayer to manage
    symbol resolution between modules during lazy compilation. In particular,
    it ensures that internal symbols resolve correctly even in the case where
    multiple input modules contain the same internal symbol name (which must be
    promoted to external hidden linkage so that functions in any given module
    can be split out by lazy compilation).

    LogicalDylib's resolution scheme (before this commit) required one stub
    manager per input module. This made recompilation of functions (by adding a
    module containing a new definition) difficult, as the stub manager for any
    given symbol was bound to the module that supplied the original definition.
    By using one stub manager for the whole logical dylib, symbols can be more
    easily replaced, although support for doing this is not included in this
    patch (it will be implemented in a follow-up).
    llvm-svn: 279952
* [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available. | Craig Topper | 2016-08-28 | 3 | -10/+21
    llvm-svn: 279951
* [AVX-512] Add patterns for selecting 128/256-bit EVEX VPABS instructions. | Craig Topper | 2016-08-28 | 2 | -2/+37
    llvm-svn: 279950
* [AVX-512] Add testcases showing that we don't emit 512-bit vpabsb/vpabsw. Will be fixed in a future commit. | Craig Topper | 2016-08-28 | 1 | -5/+155
    llvm-svn: 279949
* Fix some typos in the doc | Sylvestre Ledru | 2016-08-28 | 7 | -7/+7
    llvm-svn: 279943
* [x86] add tests for <3 x N> vector types (PR29114) | Sanjay Patel | 2016-08-28 | 1 | -0/+40
    llvm-svn: 279939
* [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors | Sanjay Patel | 2016-08-28 | 5 | -50/+42
    llvm-svn: 279937
* [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements | Simon Pilgrim | 2016-08-28 | 2 | -8/+20
    Over-eager combining prevents the correct folding of writemasks.

    At the moment this occurs for ALL EVEX shuffles; in the future we need to
    check that the user of the root shuffle is a VSELECT that can fold to a
    writemask.
    llvm-svn: 279934
* [PowerPC] Implement lowering for atomicrmw min/max/umin/umax | Hal Finkel | 2016-08-28 | 5 | -5/+587
    Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.
    llvm-svn: 279933
* [Loop Vectorizer] Fixed memory conflict checks. | Elena Demikhovsky | 2016-08-28 | 8 | -30/+109
    Fixed a bug in run-time checks for possible memory conflicts inside a loop.
    The bug is in the Low <-> High boundaries calculation. The High boundary
    should be calculated as "last memory access pointer + element size".

    Differential revision: https://reviews.llvm.org/D23176
    llvm-svn: 279930
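The boundary rule quoted above can be illustrated with a small numeric sketch. This is not the vectorizer's code: the helpers `access_range` and `may_conflict` are hypothetical, and the half-open range comparison is an assumption made for this example.

```python
def access_range(first_ptr, last_ptr, elem_size):
    """Half-open byte range [low, high) touched by a run of element accesses."""
    low = first_ptr
    high = last_ptr + elem_size  # include the bytes of the last element
    return low, high


def may_conflict(a, b):
    """Two half-open ranges overlap iff each one starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


# 4-byte elements. A's last access is at address 12, so it touches bytes 12..15.
# B starts at address 14, inside A's last element.
a_fixed = access_range(0, 12, 4)   # (0, 16)
a_buggy = (0, 12)                  # High taken as the last pointer, element size omitted
b = access_range(14, 26, 4)        # (14, 30)

print("with element size:   ", may_conflict(a_fixed, b))
print("without element size:", may_conflict(a_buggy, b))
```

Without the element size, the range for A ends at its last pointer and the overlap with B goes undetected, which is the kind of missed conflict the fix addresses.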
* [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL. | Craig Topper | 2016-08-28 | 8 | -56/+177
    Previously we weren't creating masked logical operations if bitcasts
    appeared between the logic operation and the select. The IR optimizers can
    move bitcasts across logic operations and create these cases. To minimize
    the number of cases we need to handle, this change promotes all logic ops
    to an i64 vector type just like when only SSE or AVX is available.

    Unfortunately, this also has the consequence of making it difficult to
    select unmasked VPANDD/VPORD/VPXORD in all the cases where they were
    previously used. This is the cause of most of the test changes. This
    shouldn't result in any functional change though.
    llvm-svn: 279929
* [AVX-512] Add tests to show that we don't select masked logic ops if there are bitcasts between the logic op and the select. | Craig Topper | 2016-08-28 | 1 | -0/+51
    This is taken from optimized IR of clang test cases for masked logic ops.
    llvm-svn: 279928
* [X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC | Craig Topper | 2016-08-28 | 2 | -40/+40
    llvm-svn: 279927
* AMDGPU/R600: Enable Load combine | Jan Vesely | 2016-08-27 | 7 | -135/+1952
    Fix and improve tests

    Differential Revision: https://reviews.llvm.org/D23899
    llvm-svn: 279925