path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el | Sagar Thakur | 2017-02-15 | 4 | -4/+184
  Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.
  Reviewed by sdardis, dberris
  Differential: D27697
  llvm-svn: 295164
* Revert r295110 and r295144. | Daniel Jasper | 2017-02-15 | 1 | -156/+98
  This fails under ASAN: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio
  llvm-svn: 295162
* [X86][AVX] Remove REX_W from AVX instructions. | Ayman Musa | 2017-02-15 | 1 | -3/+3
  REX_W has no meaning in a VEX-encoded AVX instruction.
  Differential Revision: https://reviews.llvm.org/D29894
  llvm-svn: 295157
* [X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types | Craig Topper | 2017-02-15 | 1 | -2/+18
  Summary: We don't seem to have great rules on what a valid VBROADCAST node looks like, and as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs. As you can see from the things improved here, we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.

  I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that, this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512 bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.

  This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect, but it was enough to fix a few test failures that were being caused.

  Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0.
  Reviewers: delena, RKSimon, zvi
  Reviewed By: zvi
  Subscribers: igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28747
  llvm-svn: 295155
* [AVX-512] Add PACKSS/PACKUS instructions to load folding tables. | Craig Topper | 2017-02-15 | 1 | -0/+36
  llvm-svn: 295154
* [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask | Craig Topper | 2017-02-15 | 1 | -46/+24
  Summary: The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges, determines whether they can be used in an extract, and creates a properly aligned start index for the extract. This range finding is unnecessary: we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index, we can't use an extract.
  Reviewers: zvi, RKSimon
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29926
  llvm-svn: 295152
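The single-pass idea in the summary above can be sketched as follows. This is an illustration only, not the SelectionDAGBuilder code: the name `findExtractStart`, the `NoExtract` sentinel, and the convention that negative indices mean undef are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Sentinel (hypothetical) returned when no single aligned extract covers
// the used elements, or when no element is used at all.
constexpr int NoExtract = -1;

// Given the mask indices that refer to one input vector and the width of
// the mask, compute the aligned start index of the extract in one pass.
int findExtractStart(const std::vector<int> &Indices, int MaskNumElts) {
  assert(MaskNumElts > 0 && "mask width must be positive");
  int Start = NoExtract;
  for (int Idx : Indices) {
    if (Idx < 0)
      continue; // undef lane: places no constraint on the window
    // Round the index down to a multiple of the mask width.
    int Aligned = (Idx / MaskNumElts) * MaskNumElts;
    if (Start != NoExtract && Start != Aligned)
      return NoExtract; // indices fall into different aligned windows
    Start = Aligned;
  }
  return Start;
}
```

With a mask width of 4, indices {4, 5, 6, 7} all round down to 4, so one extract starting at element 4 suffices, while {1, 5} round down to 0 and 4 and cannot share an extract.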
* SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge. | Peter Collingbourne | 2017-02-15 | 1 | -3/+10
  Differential Revision: https://reviews.llvm.org/D29976
  llvm-svn: 295145
* WholeProgramDevirt: Separate the code that applies optzns from the code that decides whether to apply them. NFCI. | Peter Collingbourne | 2017-02-15 | 1 | -48/+86
  The idea is that the apply* functions will also be called when importing devirt optimizations.
  Differential Revision: https://reviews.llvm.org/D29745
  llvm-svn: 295144
* Revert r295138: Instead of a series of string operations, use snprintf(). | Rui Ueyama | 2017-02-15 | 1 | -2/+4
  This broke buildbots.
  llvm-svn: 295142
* Instead of a series of string operations, use snprintf(). | Rui Ueyama | 2017-02-15 | 1 | -4/+2
  llvm-svn: 295138
* Return early. NFC. | Rui Ueyama | 2017-02-15 | 1 | -16/+17
  llvm-svn: 295137
* Use LLVM-style naming scheme. | Rui Ueyama | 2017-02-15 | 1 | -22/+22
  llvm-svn: 295136
* [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroups | Stanislav Mekhanoshin | 2017-02-15 | 1 | -1/+5
  This patch corrects the maximum workgroups per CU if we have big workgroups (more than 128). This calculation contributes to the occupancy calculation with respect to LDS size.
  Differential Revision: https://reviews.llvm.org/D29974
  llvm-svn: 295134
* Use LLVM-style naming scheme. | Rui Ueyama | 2017-02-15 | 1 | -18/+18
  llvm-svn: 295132
* Remove useless local variable. | Rui Ueyama | 2017-02-15 | 1 | -3/+1
  llvm-svn: 295131
* Split WinCOFFObjectWriter::defineSection. NFC. | Rui Ueyama | 2017-02-15 | 1 | -47/+38
  llvm-svn: 295128
* Simplify WinCOFFObjectWriter by removing a template member function. | Rui Ueyama | 2017-02-14 | 1 | -15/+5
  llvm-svn: 295126
* Do not lookup a DenseMap twice using the same key. | Rui Ueyama | 2017-02-14 | 1 | -7/+4
  llvm-svn: 295124
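The commit's diff is not shown here, so the following is only a sketch of the general single-lookup idiom. It uses `std::unordered_map` as a stand-in for `llvm::DenseMap`; the function `getOrAssignIndex` and its string keys are hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Returns the index already assigned to Name, or assigns the next free
// index on first sight. One insert() call both probes the table and, if
// the key is absent, adds the entry, so the key is hashed only once
// (instead of a find() followed by operator[]).
int getOrAssignIndex(std::unordered_map<std::string, int> &Indices,
                     const std::string &Name) {
  auto Result = Indices.insert({Name, static_cast<int>(Indices.size())});
  return Result.first->second; // existing value if the key was present
}
```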
* Use endian::write32le instead of endian::write. | Rui Ueyama | 2017-02-14 | 1 | -7/+3
  llvm-svn: 295120
* Use zero-initialization instead of memset. | Rui Ueyama | 2017-02-14 | 1 | -18/+5
  llvm-svn: 295119
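As a rough illustration of the technique named in the commit title above (the actual change is not shown), value-initialization can replace an explicit `memset` for plain aggregate types. The `SectionHeader` struct below is hypothetical and only loosely resembles an object-file header.

```cpp
#include <cstdint>

// Hypothetical plain-data header used only for this example.
struct SectionHeader {
  char Name[8];
  uint32_t VirtualSize;
  uint32_t Characteristics;
};

SectionHeader makeEmptyHeader() {
  // Empty braces value-initialize the aggregate: every member, including
  // each element of Name, is zeroed. No memset(&H, 0, sizeof(H)) needed.
  SectionHeader H = {};
  return H;
}
```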
* [libFuzzer] increase the size of FixedWord from 27 to 64, see PR31950 | Kostya Serebryany | 2017-02-14 | 4 | -1/+24
  llvm-svn: 295117
* Fix a bug in caller's BFI update code after inlining. | Easwaran Raman | 2017-02-14 | 1 | -3/+10
  Multiple blocks in the callee can be mapped to a single cloned block since we prune the callee as we clone it. The existing code iterates over the value map and clones the block frequency (and eventually scales the frequencies of the cloned blocks). The value map's iteration order is not deterministic, so the cloned block might get the frequency of any of the original blocks. The fix is to set the cloned block's frequency to the max of the original frequencies. The first block in the sequence must have this max frequency and, in the call context, subsequent blocks must have its frequency.
  Differential Revision: https://reviews.llvm.org/D29696
  llvm-svn: 295115
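The fix described above can be sketched with plain integers standing in for blocks and frequencies. This is a simplified model, not the inliner's code: `clonedFrequencies` and the integer block ids are assumptions for illustration, and frequency scaling is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// OrigToClone lists (original block id, cloned block id) pairs; several
// originals may map to the same clone. Taking the maximum makes the
// result independent of the order in which the pairs are visited.
std::map<int, uint64_t>
clonedFrequencies(const std::vector<std::pair<int, int>> &OrigToClone,
                  const std::map<int, uint64_t> &OrigFreq) {
  std::map<int, uint64_t> Cloned;
  for (const auto &Entry : OrigToClone) {
    uint64_t &Slot = Cloned[Entry.second]; // starts at 0 on first access
    Slot = std::max(Slot, OrigFreq.at(Entry.first));
  }
  return Cloned;
}
```

Visiting the pairs in a different order yields the same map, which is the property the nondeterministic value-map iteration lacked.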
* Use "%zd" format specifier for printing number of testcases executed. | Kostya Serebryany | 2017-02-14 | 1 | -1/+1
  Summary: This helps to avoid signed integer overflow after running a fast fuzz target for several hours, e.g.:
  <...> Done -1097903291 runs in 54001 second(s)
  Reviewers: kcc
  Reviewed By: kcc
  Differential Revision: https://reviews.llvm.org/D29941
  llvm-svn: 295112
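A small sketch of printing a large run count with a size-typed format specifier. The function `formatRuns` is hypothetical, and because its counters are unsigned `size_t` values the sketch uses `%zu`; the commit itself uses `%zd`, and the type of the real counter is not shown here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the summary line with specifiers that match size_t, so counts
// above INT_MAX are printed as-is rather than as a negative int.
std::string formatRuns(size_t Runs, size_t Seconds) {
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "Done %zu runs in %zu second(s)", Runs,
                Seconds);
  return Buf;
}
```

For scale: on typical two's-complement targets the value 3197064005, reinterpreted as a 32-bit signed integer, is -1097903291, the figure in the log line quoted above. The real count may have been larger still.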
* [LV] Rename Induction to PrimaryInduction. NFC. | Michael Kuperstein | 2017-02-14 | 1 | -12/+12
  llvm-svn: 295111
* WholeProgramDevirt: Change internal vcall data structures to match summary. | Peter Collingbourne | 2017-02-14 | 1 | -74/+94
  Group calls into constant and non-constant arguments up front, and use uint64_t instead of ConstantInt to represent constant arguments. The goal is to allow the information from the summary to fit naturally into this data structure in a future change (specifically, it will be added to CallSiteInfo).

  This has two side effects:
  - We disallow VCP for constant integer arguments of width >64 bits.
  - We remove the restriction that the bitwidth of a vcall's argument and return types must match those of the vfunc definitions.

  I don't expect either of these to matter in practice. The first case is uncommon, and the second one will lead to UB (so we can do anything we like).
  Differential Revision: https://reviews.llvm.org/D29744
  llvm-svn: 295110
* [mips] Correct mips16 return instructions definitions | Simon Dardis | 2017-02-14 | 1 | -0/+2
  Correct the definition of MIPS16 instructions that act as return instructions so that isReturn = 1 as expected.
  llvm-svn: 295109
* [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors | Taewook Oh | 2017-02-14 | 1 | -1/+1
  Summary: When setting the debugloc for instructions created in SplitBlockPredecessors, the current implementation copies the debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call to @llvm.dbg.value, the debugloc of the instruction may point to a location outside of the block itself. For example, given the code

  ```
   1 typedef struct _node_t {
   2   struct _node_t *next;
   3 } node_t;
   4
   5 extern node_t *root;
   6
   7 int foo() {
   8   node_t *node, *tmp;
   9   int ret = 0;
  10
  11   node = tmp = root->next;
  12   while (node != root) {
  13     while (node) {
  14       tmp = node;
  15       node = node->next;
  16       ret++;
  17     }
  18   }
  19
  20   return ret;
  21 }
  ```

  below is the basic block corresponding to line 12 after the Reassociate expressions pass:

  ```
  while.cond: ; preds = %while.cond2, %entry
    %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
    %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
    tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
    tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
    %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
    br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
  ```

  As you can see, the first-non-phi instruction is a call to @llvm.dbg.value, and the debugloc is

  ```
  !21 = !DILocation(line: 9, column: 7, scope: !6)
  ```

  which is the definition of the 'ret' variable and lies outside the scope of the basic block itself. However, the current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses the problem by picking up the debugloc from the first-non-phi-non-dbg instruction.
  Reviewers: dblaikie, samsonov, eugenis
  Reviewed By: eugenis
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29867
  llvm-svn: 295106
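The selection rule described in this commit can be modeled with a toy instruction list. The types `Kind` and `Inst` and the function `firstNonPhiOrDbg` are hypothetical stand-ins, not LLVM's classes, and the debug line numbers used with them are illustrative only.

```cpp
#include <vector>

// Toy instruction categories; real LLVM distinguishes these by class.
enum class Kind { Phi, DbgValue, Other };

struct Inst {
  Kind K;
  int DebugLine; // source line recorded in the instruction's debugloc
};

// Returns the first instruction that is neither a phi nor a debug
// intrinsic, or nullptr if the block contains no such instruction.
const Inst *firstNonPhiOrDbg(const std::vector<Inst> &Block) {
  for (const Inst &I : Block)
    if (I.K != Kind::Phi && I.K != Kind::DbgValue)
      return &I;
  return nullptr;
}
```

For a block shaped like `while.cond` above (two phis, two debug-value calls, then the compare), this skips the debug-value call whose location points at the variable's definition and lands on the compare instead.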
* [BranchFolding] Tail common all identical unreachable blocks | Reid Kleckner | 2017-02-14 | 1 | -0/+20
  Summary: Blocks ending in unreachable are typically cold because they end the program or throw an exception, so merging them with other identical blocks is usually profitable because it reduces the size of cold code. MachineBlockPlacement generally does not arrange to fall through to such blocks, so commoning these blocks will not introduce additional unconditional branches.
  Reviewers: hans, iteratee, haicheng
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29153
  llvm-svn: 295105
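A much-simplified sketch of commoning identical unreachable-terminated blocks. Real tail merging in BranchFolding works on machine instructions and block tails; here whole blocks are compared as lists of strings, and `Block` and `commonUnreachableBlocks` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy block: its instructions as text, plus whether it ends in unreachable.
struct Block {
  std::vector<std::string> Insts;
  bool EndsInUnreachable;
};

// For each block, returns the index of the block that branches to it
// should target instead. Identical unreachable-terminated blocks all map
// to the first occurrence; every other block maps to itself.
std::vector<size_t> commonUnreachableBlocks(const std::vector<Block> &Blocks) {
  std::map<std::vector<std::string>, size_t> Seen;
  std::vector<size_t> Target(Blocks.size());
  for (size_t I = 0; I < Blocks.size(); ++I) {
    Target[I] = I;
    if (!Blocks[I].EndsInUnreachable)
      continue; // only blocks ending in unreachable are commoned here
    auto Result = Seen.insert({Blocks[I].Insts, I});
    Target[I] = Result.first->second; // first block with these contents
  }
  return Target;
}
```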
* GlobalISel: deal with new G_PTR_MASK instruction on AArch64. | Tim Northover | 2017-02-14 | 2 | -0/+13
  It's just an AND-immediate instruction for us, surprisingly simple to select.
  llvm-svn: 295104
* GlobalISel: introduce G_PTR_MASK to simplify alloca handling. | Tim Northover | 2017-02-14 | 2 | -23/+19
  This instruction clears the low bits of a pointer without requiring (possibly dodgy if pointers aren't ints) conversions to and from an integer. Since (as far as I'm aware) all masks are statically known, the instruction takes an immediate operand rather than a register to specify the mask.
  llvm-svn: 295103
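The arithmetic behind clearing the low bits of an address can be sketched on plain integers. This only models the effect; the point of the instruction is precisely to avoid such integer conversions in the IR. The helper `clearLowBits` and its `NumBits` parameter are assumptions about how the mask might be expressed.

```cpp
#include <cassert>
#include <cstdint>

// Clears the low NumBits bits of Addr, i.e. rounds the address down to a
// multiple of 2^NumBits. On a target with AND-immediate this is a single
// AND with a statically known mask.
uint64_t clearLowBits(uint64_t Addr, unsigned NumBits) {
  assert(NumBits < 64 && "shift amount must be smaller than the bit width");
  uint64_t LowMask = (uint64_t{1} << NumBits) - 1; // NumBits ones
  return Addr & ~LowMask;
}
```

For example, clearing 4 low bits of 0x1237 gives 0x1230, a 16-byte-aligned address.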
* Re-apply "[profiling] Remove dead profile name vars after emitting name data" | Vedant Kumar | 2017-02-14 | 1 | -0/+5
  This reverts 295092 (re-applies 295084), with a fix for dangling references from the array of coverage names passed down from frontends. I missed this in my initial testing because I only checked test/Profile, and not test/CoverageMapping as well.

  Original commit message:

  The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported.

  Fix the issue by throwing away the name constants as soon as we're done with them.
  Differential Revision: https://reviews.llvm.org/D29921
  llvm-svn: 295099
* Reformat slightly. | Eric Christopher | 2017-02-14 | 1 | -4/+3
  llvm-svn: 295096
* Reapply r294532, reverted in r294787. | Wolfgang Pieb | 2017-02-14 | 1 | -9/+147
  Store instructions can have more than one memory operand as a result of optimizations that fold different stores into one. When we identify spill instructions to generate DBG_VALUE instructions to record the spilling of a variable, we disregard stores with multiple memory operands for now. We may miss some relevant spills but the handling is a bit more complex, so we'll do it in a different patch.
  This fixes PR31935.
  llvm-svn: 295093
* Revert "[profiling] Remove dead profile name vars after emitting name data" | Vedant Kumar | 2017-02-14 | 1 | -3/+0
  This reverts commit r295084. There is a test failure on: http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/
  llvm-svn: 295092
* [Support] Add StringRef::getAsDouble. | Zachary Turner | 2017-02-14 | 1 | -0/+13
  Differential Revision: https://reviews.llvm.org/D29918
  llvm-svn: 295089
* [profiling] Remove dead profile name vars after emitting name data | Vedant Kumar | 2017-02-14 | 1 | -0/+3
  The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported.

  Fix the issue by throwing away the name constants as soon as we're done with them.
  Differential Revision: https://reviews.llvm.org/D29921
  llvm-svn: 295084
* [Tablegen] Instrumenting table gen DAGGenISelDAG | Aditya Nandakumar | 2017-02-14 | 1 | -0/+9
  To help assist in debugging ISEL or to prioritize GlobalISel backend work, this patch adds two more tables to <Target>GenISelDAGISel.inc: one which contains the patterns that are used during selection, and the other containing the include source location of the patterns.
  Enabled through the CMake variable LLVM_ENABLE_DAGISEL_COV.
  llvm-svn: 295081
* [Hexagon] Remove leftover debugging code | Krzysztof Parzyszek | 2017-02-14 | 1 | -4/+0
  llvm-svn: 295078
* Do not apply redundant LastCallToStaticBonus | Taewook Oh | 2017-02-14 | 1 | -1/+1
  Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false.
  Reviewers: chandlerc, eraman
  Reviewed By: eraman
  Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D29169
  llvm-svn: 295075
* [LazyBFI] Fix typos | Adam Nemet | 2017-02-14 | 1 | -1/+1
  llvm-svn: 295073
* Add new pass LazyMachineBlockFrequencyInfo | Adam Nemet | 2017-02-14 | 4 | -8/+89
  And use it in MachineOptimizationRemarkEmitter. A test will follow on top of Justin's changes to enable MachineORE in AsmPrinter.

  The approach is similar to the IR-level pass. It's a bit simpler because BPI is immutable at the Machine level so we don't need to make that lazy.

  Because of this, a new function mapping is introduced (BPIPassTrait::getBPI). This function extracts BPI from the pass. In case of the lazy pass, this is when the calculation of the BFI occurs. For Machine-level, this is the identity function.
  Differential Revision: https://reviews.llvm.org/D29836
  llvm-svn: 295072
* fix documentation comments for Argument; NFC | Sanjay Patel | 2017-02-14 | 1 | -28/+0
  llvm-svn: 295068
* Correct a typo, s/hosting/hoisting/ | Brian Cain | 2017-02-14 | 1 | -1/+1
  llvm-svn: 295066
* Remove unused variable. | Diego Novillo | 2017-02-14 | 1 | -1/+0
  llvm-svn: 295065
* Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps" | Matthew Simpson | 2017-02-14 | 1 | -10/+47
  This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free.
  Differential Revision: https://reviews.llvm.org/D29847
  llvm-svn: 295063
* [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs | Simon Pilgrim | 2017-02-14 | 1 | -7/+21
  Add support for specifying an UNPCK input as UNDEF
  llvm-svn: 295061
* [SCEV] Cache results during GetMinTrailingZeros query | Igor Laevsky | 2017-02-14 | 1 | -8/+22
  Differential Revision: https://reviews.llvm.org/D29759
  llvm-svn: 295060
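The commit body gives no detail, so the following is only a generic memoization sketch in the spirit of the title. In SCEV the query is recursive over expressions; here the key is a plain 64-bit value, and the class `TrailingZerosCache` with its `Computations` counter is hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>

// Caches the number of trailing zero bits per value so that repeated
// queries for the same key do not redo the computation.
class TrailingZerosCache {
public:
  // Number of times a result was computed rather than served from cache.
  unsigned Computations = 0;

  unsigned get(uint64_t Value) {
    auto It = Cache.find(Value);
    if (It != Cache.end())
      return It->second; // cache hit
    ++Computations;
    unsigned Count = 0;
    if (Value == 0) {
      Count = 64; // every bit of zero counts as a trailing zero
    } else {
      for (uint64_t V = Value; (V & 1) == 0; V >>= 1)
        ++Count;
    }
    Cache.emplace(Value, Count);
    return Count;
  }

private:
  std::unordered_map<uint64_t, unsigned> Cache;
};
```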
* [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put back into a vector | Alexey Bataev | 2017-02-14 | 1 | -1/+7
  Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may also be considered dead if the users of the instructions are also going to be vectorized, like:

  ```
  %x0 = extractelement <2 x float> %x, i32 0
  %x1 = extractelement <2 x float> %x, i32 1
  %x0x0 = fmul float %x0, %x0
  %x1x1 = fmul float %x1, %x1
  %add = fadd float %x0x0, %x1x1
  ```

  This can be transformed to

  ```
  %1 = fmul <2 x float> %x, %x
  %2 = extractelement <2 x float> %1, i32 0
  %3 = extractelement <2 x float> %1, i32 1
  %add = fadd float %2, %3
  ```

  because although `%x0` and `%x1` each have 2 users, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead.
  Differential Revision: https://reviews.llvm.org/D29900
  llvm-svn: 295056
* Removing a redundant assignment | Artyom Skrobov | 2017-02-14 | 1 | -1/+0
  llvm-svn: 295055
* Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track" | Alexander Timofeev | 2017-02-14 | 2 | -3/+4
  This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.
  llvm-svn: 295054