path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Commit message (Author, Date; Files, Lines -/+)
...
* [CGP] add an IR builder to memcmp expansion class instead of recreating it; NFCI (Sanjay Patel, 2017-06-27; 1 file, -19/+6)
  This was a clean-up suggestion from: https://reviews.llvm.org/D34005
  llvm-svn: 306438
* fix trivial typos, NFC (Hiroshi Inoue, 2017-06-27; 1 file, -1/+1)
  succesor -> successor
  llvm-svn: 306393
* [CGP, memcmp] replace CreateZextOrTrunc with CreateZext because it can never trunc (Sanjay Patel, 2017-06-21; 1 file, -5/+7)
  llvm-svn: 305936
* [CGP] fix variables to be unsigned in memcmp expansion (Sanjay Patel, 2017-06-21; 1 file, -12/+14)
  llvm-svn: 305935
* [CGP, PowerPC] try to constant fold before creating loads for memcmp expansion (Sanjay Patel, 2017-06-19; 1 file, -3/+13)
  This is the last step needed to avoid regressions for x86 before we flip the switch to allow expansion of the smallest set of memcmp() calls via CGP. The DAG version checks for constant strings, so we need to do that here too.
  FWIW, the two-constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some one-constant tests because its alignment requirements are too strict (it shouldn't require alignment for a constant operand).
  Differential Revision: https://reviews.llvm.org/D34071
  llvm-svn: 305734
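For illustration, a hypothetical source-level sketch (not from the patch) of the one-constant and two-constant cases mentioned above:
```
#include <cstring>

static const char A[4] = {'a', 'b', 'c', 'd'};
static const char B[4] = {'a', 'b', 'c', 'x'};

// Both operands constant: the compare can fold to a constant with no loads.
bool two_constants() { return std::memcmp(A, B, 4) == 0; }

// One operand constant: only the non-constant pointer is actually loaded from.
bool one_constant(const char *P) { return std::memcmp(P, A, 4) == 0; }
```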
* Allow -profile-guided-section-prefix more than once (David Callahan, 2017-06-14; 1 file, -1/+1)
  Summary:
  At present, `-profile-guided-section-prefix` is a `cl::Optional` option, which means it may be passed at most once. Our build system makes it pretty tricky to guarantee this. We often accidentally pass the flag more than once (but always with the same "false" value), which results in an error, after which compilation fails:
  ```
  clang (LLVM option parsing): for the -profile-guided-section-prefix option: may only occur zero or one times!
  ```
  While we work on improving our build system, it also seems reasonable just to allow `-profile-guided-section-prefix` to be passed more than once, by switching it to `cl::ZeroOrMore`. Quoting [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | the documentation ]]:
  > The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
  > ...
  > If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.
  Reviewers: danielcdh
  Reviewed By: danielcdh
  Subscribers: twoh, david2050, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34219
  llvm-svn: 305413
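A minimal sketch of the one-modifier change this implies; the option name matches, but the other arguments of the in-tree definition are assumed:
```
#include "llvm/Support/CommandLine.h"
using namespace llvm;

static cl::opt<bool> ProfileGuidedSectionPrefix(
    "profile-guided-section-prefix", cl::Hidden, cl::init(true),
    cl::ZeroOrMore, // was cl::Optional; the flag may now repeat, last value wins
    cl::desc("Use profile info to add section prefix for hot/cold functions"));
```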
* [CGP] add a reference to DataLayout in MemCmpExpansion; NFCI (Sanjay Patel, 2017-06-09; 1 file, -20/+22)
  We're currently passing endianness around as a param (and not uniformly), so this eliminates the need for that. I'd like to add a constant-fold call too, and that requires a DL.
  llvm-svn: 305129
* fix formatting; NFC (Sanjay Patel, 2017-06-08; 1 file, -6/+6)
  llvm-svn: 305008
* [CGP] don't expand a memcmp with nobuiltin attribute (Sanjay Patel, 2017-06-08; 1 file, -6/+4)
  This matches the behavior used in the SDAG when expanding memcmp. For reference, we're intentionally treating the earlier fortified call transforms differently after:
  https://bugs.llvm.org/show_bug.cgi?id=23093
  https://reviews.llvm.org/rL233776
  One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers:
  https://reviews.llvm.org/D19781
  https://reviews.llvm.org/D19801
  Differential Revision: https://reviews.llvm.org/D34043
  llvm-svn: 305007
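A sketch of the guard this describes, with the surrounding expansion code assumed:
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical shape of the check; CallInst::isNoBuiltin() is true when the
// call site carries the nobuiltin attribute, which sanitizers rely on.
static bool tryExpandMemCmp(CallInst *CI) {
  if (CI->isNoBuiltin())
    return false; // leave the real memcmp libcall in place
  // ... proceed with expansion ...
  return true;
}
```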
* [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion (Sanjay Patel, 2017-06-08; 1 file, -21/+42)
  The test diff for PowerPC shows we can better optimize if this case is one block. For x86, there would be a substantial difference if CGP expansion was enabled, because branches are assumed cheap and SDAG can't optimize across blocks.
  Instead of this:
  _cmp_eq8:
    movq (%rdi), %rax
    cmpq (%rsi), %rax
    je LBB23_1
  ## BB#2: ## %res_block
    movl $1, %ecx
    jmp LBB23_3
  LBB23_1:
    xorl %ecx, %ecx
  LBB23_3: ## %endblock
    xorl %eax, %eax
    testl %ecx, %ecx
    sete %al
    retq
  We get this:
  cmp_eq8:
    movq (%rdi), %rcx
    xorl %eax, %eax
    cmpq (%rsi), %rcx
    sete %al
    retq
  And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall().
  If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. I.e., we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally.
  Differential Revision: https://reviews.llvm.org/D34005
  llvm-svn: 304987
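A hedged IRBuilder sketch of the single-block equality expansion shown above; the helper name and casts are assumed, not the exact patch code:
```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// memcmp(X, Y, 8) == 0 becomes two wide loads and one compare, no branches.
static Value *emitCmpEq8(IRBuilder<> &B, Value *X, Value *Y) {
  Type *I64 = B.getInt64Ty();
  Value *L = B.CreateLoad(I64, B.CreateBitCast(X, I64->getPointerTo()));
  Value *R = B.CreateLoad(I64, B.CreateBitCast(Y, I64->getPointerTo()));
  Value *NE = B.CreateICmpNE(L, R);        // i1: do the 8-byte blocks differ?
  return B.CreateZExt(NE, B.getInt32Ty()); // i32: 0 if equal, 1 if not
}
```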
* [CGP] avoid zext/trunc of a memcmp expansion compare (Sanjay Patel, 2017-06-07; 1 file, -4/+4)
  This could be viewed as another shortcoming of the DAGCombiner: when both operands of a compare are zexted from the same source type, we should be able to compare the original types.
  The effect on PowerPC perf is likely unnoticeable, but there's a visible regression for x86 if we feed the suboptimal IR for memcmp expansion to the DAG:
  _cmp_eq4_zexted_to_i64:
    movl (%rdi), %ecx
    movl (%rsi), %edx
    xorl %eax, %eax
    cmpq %rdx, %rcx
    sete %al
  _cmp_eq4_better:
    movl (%rdi), %ecx
    xorl %eax, %eax
    cmpl (%rsi), %ecx
    sete %al
  llvm-svn: 304923
* [CGP] pass size as param in MemCmpExpansion; NFCI (Sanjay Patel, 2017-06-07; 1 file, -10/+5)
  Avoid extracting the constant int twice.
  llvm-svn: 304920
* [CGP] pass size as param in MemCmpExpansion; NFCI (Sanjay Patel, 2017-06-07; 1 file, -13/+8)
  Avoid extracting the constant int twice.
  llvm-svn: 304917
* [CGP] getParent()->getParent() --> getFunction(); NFCI (Sanjay Patel, 2017-06-07; 1 file, -5/+4)
  llvm-svn: 304916
* [CGP] add helper function for generating compare of load pairs; NFCI (Sanjay Patel, 2017-06-07; 1 file, -5/+16)
  In the special (but likely common) case, we can avoid the multi-block complexity of the general algorithm, so splitting this part out on its own will make it reusable.
  llvm-svn: 304908
* [CGP] fix formatting in MemCmpExpansion; NFC (Sanjay Patel, 2017-06-07; 1 file, -8/+6)
  llvm-svn: 304903
* [CGP / PowerPC] use direct compares if there's only one load per block in memcmp() expansion (Sanjay Patel, 2017-06-07; 1 file, -11/+19)
  I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the special cases (memcmp(x,y,N) != 0 for N = 1,2,4,8,16,32 bytes) that we already handle.
  I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time limitation in the DAG. We might have to just avoid those special cases here in CGP. But regardless of that, I think this is a win for the more general cases.
  http://rise4fun.com/Alive/cbQ
  Differential Revision: https://reviews.llvm.org/D33963
  llvm-svn: 304849
* [CGP] fix formatting/typos in MemCmpExpansion; NFC (Sanjay Patel, 2017-06-06; 1 file, -36/+34)
  llvm-svn: 304830
* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06; 1 file, -2/+1)
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
  I've reverted a number of files where the results of sorting includes aren't healthy: either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately), or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
* [PPC] Inline expansion of memcmp (Zaara Syeda, 2017-05-31; 1 file, -1/+611)
  This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target-specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight-line code. The target specifies a maximum load size, and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare-to-zero equality test.
  Differential Revision: https://reviews.llvm.org/D28637
  llvm-svn: 304313
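A source-level illustration (load size and threshold assumed) of what the expansion does:
```
#include <cstring>

// With a target max load size of 8 and size 16 under the threshold, this
// becomes two pairs of 8-byte loads and compares with an early exit after the
// first differing pair, instead of a call into the C library.
bool eq16(const char *A, const char *B) {
  return std::memcmp(A, B, 16) == 0;
}
```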
* CodeGen: Rename DEBUG_TYPE to match pass names (Matthias Braun, 2017-05-25; 1 file, -2/+2)
  Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the pass name where possible.
  llvm-svn: 303921
* [IR] De-virtualize ~Value to save a vptr (Reid Kleckner, 2017-05-18; 1 file, -1/+1)
  Summary:
  Implements PR889.
  Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data:
  https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
  This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call to Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>.
  I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap.
  Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
  Reviewed By: chandlerc
  Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D31261
  llvm-svn: 303362
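A sketch of the call-site migration the summary describes:
```
#include "llvm/IR/Value.h"
using namespace llvm;

void discard(Value *V) {
  // Before this change: `delete V;` (now invalid, since ~Value is non-virtual).
  if (V && V->use_empty())
    V->deleteValue(); // dispatches on the value ID instead of a vtable
}
// Where ownership is held, llvm::unique_value can stand in for
// std::unique_ptr<Value>, carrying the deleteValue-based deleter.
```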
* [LegacyPassManager] Remove TargetMachine constructors (Francis Visoiu Mistrih, 2017-05-18; 1 file, -12/+13)
  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
  The patterns replaced here are (see the sketch after this entry):
  * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
  * Passes not handling a null TargetMachine use `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
  * MachineFunctionPasses now use MF.getTarget().
  * Remove all the TargetMachine constructors.
  * Remove INITIALIZE_TM_PASS.
  This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
  Related to PR30324.
  Differential Revision: https://reviews.llvm.org/D33222
  llvm-svn: 303360
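A sketch (pass name hypothetical) of the replacement pattern for a pass that requires a TargetMachine:
```
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;

struct MyPass : FunctionPass {
  static char ID;
  MyPass() : FunctionPass(ID) {} // no TargetMachine constructor argument
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetPassConfig>();
  }
  bool runOnFunction(Function &F) override {
    const TargetMachine &TM =
        getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
    (void)TM; // use TM as needed
    return false;
  }
};
char MyPass::ID = 0;
```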
* [X86] Relocate code of replacement of subtarget-unsupported masked memory intrinsics to run also on -O0 option (Ayman Musa, 2017-05-15; 1 file, -546/+0)
  Currently, when masked load, store, gather, or scatter intrinsics are used, we check in the CodeGenPrepare pass whether the subtarget supports these intrinsics; if not, we replace them with scalar code. This is a functional transformation, not an optimization (it is not optional).
  The CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0). A functional transformation should run at all optimization levels, so here I created a new pass which runs at all optimization levels and does no more than this transformation.
  Differential Revision: https://reviews.llvm.org/D32487
  llvm-svn: 303050
* Fix code section prefix for proper layout (Teresa Johnson, 2017-05-09; 1 file, -1/+1)
  Summary:
  r284533 added hot and cold section prefixes based on profile information, to enable grouping of hot/cold functions at link time. However, it used "cold" as the prefix for cold sections, while gold only recognizes "unlikely" (which is used by gcc for cold sections). Therefore, cold sections were not properly being grouped. Switch to using "unlikely".
  Reviewers: danielcdh, davidxl
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32983
  llvm-svn: 302502
* Rename WeakVH to WeakTrackingVH; NFC (Sanjoy Das, 2017-05-01; 1 file, -4/+5)
  This relands r301424.
  llvm-svn: 301812
* Kill off the old SimplifyInstruction API by converting remaining users. (Daniel Berlin, 2017-04-28; 1 file, -1/+1)
  llvm-svn: 301673
* Reverts commit r301424, r301425 and r301426 (Sanjoy Das, 2017-04-26; 1 file, -5/+5)
  Commits were:
  "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
  "Add a new WeakVH value handle; NFC"
  "Rename WeakVH to WeakTrackingVH; NFC"
  The changes assumed pointers are 8 byte aligned on all architectures.
  llvm-svn: 301429
* Rename WeakVH to WeakTrackingVH; NFC (Sanjoy Das, 2017-04-26; 1 file, -5/+5)
  Summary:
  I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit.
  Reviewers: dblaikie, davide
  Reviewed By: davide
  Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
  Differential Revision: https://reviews.llvm.org/D32266
  llvm-svn: 301424
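A sketch of the semantics being separated, per the summary:
```
#include "llvm/IR/ValueHandle.h"
using namespace llvm;

void demo(Value *V, Value *New) {
  WeakTrackingVH H(V);        // nulls on deletion *and* follows RAUW
  V->replaceAllUsesWith(New);
  // H now refers to New; if that value is later deleted, H becomes null.
  // The planned WeakVH will only null out on deletion, without RAUW tracking.
}
```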
* [APInt] Use lshrInPlace to replace lshr where possible (Craig Topper, 2017-04-18; 1 file, -4/+2)
  This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well.
  Differential Revision: https://reviews.llvm.org/D32155
  llvm-svn: 300566
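The before/after shape this describes, with an illustrative shift amount:
```
#include "llvm/ADT/APInt.h"
using namespace llvm;

void shiftRight(APInt &A) {
  // Old pattern: A = A.lshr(4); // materializes a temporary APInt, then assigns
  A.lshrInPlace(4);              // new: shifts in place, no temporary
}
```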
* [CodeGenPrepare] Fix crash due to an invalid CFG (Brendon Cahoon, 2017-04-17; 1 file, -2/+8)
  The splitIndirectCriticalEdges function generates an invalid CFG when the 'Target' basic block is a loop to itself. When this occurs, the code that updates the predecessor terminator needs to update the terminator in the split basic block. This occurs when there is an edge from block D back to D. Since D is split into D0 and D1, the code needs to update the terminator in D1. But D1 is not in the OtherPreds vector, so it was not getting updated.
  Differential Revision: https://reviews.llvm.org/D32126
  llvm-svn: 300480
* [IR] Redesign the case iterator in SwitchInst to actually be an iterator (Chandler Carruth, 2017-04-12; 1 file, -1/+1)
  ...and to expose a handle to represent the actual case, rather than having the iterator return a reference to itself. All of this allows the iterator to be used with common STL facilities, standard algorithms, etc.
  Doing this exposed some missing facilities in the iterator facade that I've fixed and required some work to the actual iterator to fully support the necessary API.
  Differential Revision: https://reviews.llvm.org/D31548
  llvm-svn: 300032
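A usage sketch of the resulting API (assumed from the description above):
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

void visitCases(SwitchInst *SI) {
  // cases() now yields case handles that work with STL algorithms, rather
  // than an iterator that returns a reference to itself.
  for (const auto &Case : SI->cases()) {
    ConstantInt *Val = Case.getCaseValue();     // the case constant
    BasicBlock *Succ = Case.getCaseSuccessor(); // its destination block
    (void)Val; (void)Succ;
  }
}
```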
* Turn on -addr-sink-using-gep by default. (Eli Friedman, 2017-04-06; 1 file, -1/+1)
  The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here.
  Differential Revision: https://reviews.llvm.org/D30596
  llvm-svn: 299723
* [CodeGenPrep] move aarch64-type-promotion to CGP (Jun Bum Lim, 2017-04-03; 1 file, -52/+236)
  Summary:
  Move the aarch64-type-promotion pass within the existing type promotion framework in CGP. This change also supports forking sexts when a new sext is required for promotion. Note that this change is based on D27853, and I am submitting it early to give a better idea of D27853.
  Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
  Reviewed By: qcolombet
  Subscribers: llvm-commits, aemerson, rengolin, mcrosier
  Differential Revision: https://reviews.llvm.org/D28680
  llvm-svn: 299379
* [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class; implement them without memory allocation for multiword (Craig Topper, 2017-04-03; 1 file, -1/+1)
  This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h functions for the single-word size and leading/trailing zeros/ones, or countPopulation for the multiword size. The previous implementation made multiple temporary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.
  Differential Revision: https://reviews.llvm.org/D31565
  llvm-svn: 299362
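A before/after sketch of the API move; the old free-function signature is recalled from the description, not verified:
```
#include "llvm/ADT/APInt.h"
using namespace llvm;

bool checkMask(const APInt &A) {
  // Old: bool M = APIntOps::isMask(A.getBitWidth(), A); // free function
  return A.isMask(); // new class method; no temporary allocations
}
```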
* Use isFunctionHotInCallGraph to set the function section prefix. (Dehao Chen, 2017-03-23; 1 file, -2/+2)
  Summary: The current prefix-based function layout algorithm only looks at a function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.
  Reviewers: davidxl, eraman
  Reviewed By: eraman
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D31225
  llvm-svn: 298656
* Rename AttributeSet to AttributeList (Reid Kleckner, 2017-03-21; 1 file, -5/+5)
  Summary:
  This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name.
  Rename AttributeSetImpl to AttributeListImpl to follow suit.
  It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value.
  Reviewers: sanjoy, javed.absar, chandlerc, pete
  Reviewed By: pete
  Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
  Differential Revision: https://reviews.llvm.org/D31102
  llvm-svn: 298393
* [CodeGenPrep] Restructure promoting Ext to form ExtLoad (Jun Bum Lim, 2017-03-17; 1 file, -90/+125)
  Summary:
  Instead of just looking for a load which is mergeable with Ext to form ExtLoad, try to promote Exts as long as the cost is acceptable. This change is not an NFC, as it continues promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
  This change was motivated by D26524. Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.
  Reviewers: jmolloy, qcolombet, mcrosier, javed.absar
  Reviewed By: qcolombet
  Subscribers: rengolin, llvm-commits, aemerson
  Differential Revision: https://reviews.llvm.org/D27853
  llvm-svn: 298114
* CodeGenPrepare: Sink addressing modes for atomics (Matt Arsenault, 2017-03-15; 1 file, -1/+30)
  llvm-svn: 297903
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-28; 1 file, -0/+157)
  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches.
  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants, which end up in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.
  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.
  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296416
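An illustration (GNU labels-as-values extension; hypothetical interpreter loop) of the computed-goto pattern described above:
```
// Each `goto *` below lowers to an indirectbr with one successor per label,
// with a distinct constant live into each successor, which is what makes the
// unsplit critical edges so costly.
int run(const unsigned char *Code) {
  static void *Targets[] = {&&op_inc, &&op_dec, &&op_halt};
  int Acc = 0;
  goto *Targets[*Code++];
op_inc:
  Acc += 1;
  goto *Targets[*Code++];
op_dec:
  Acc -= 1;
  goto *Targets[*Code++];
op_halt:
  return Acc;
}
```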
* Revert "[CGP] Split some critical edges coming out of indirect branches"Daniel Jasper2017-02-261-155/+0
| | | | | | | This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295
* [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. (Eli Friedman, 2017-02-24; 1 file, -4/+4)
  When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode.
  Differential Revision: https://reviews.llvm.org/D30114
  llvm-svn: 296167
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-24; 1 file, -0/+155)
  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches.
  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants, which end up in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.
  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.
  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296149
* Revert r296060 to pacify bots. (Michael Kuperstein, 2017-02-24; 1 file, -155/+0)
  llvm-svn: 296064
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-24; 1 file, -0/+155)
  Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches.
  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants, which end up in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack.
  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter.
  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296060
* [CodeGenPrepare] Sink and duplicate more 'and' instructions. (Geoff Berry, 2017-02-21; 1 file, -78/+89)
  Summary:
  Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64.
  The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact that OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set, to avoid an infinite loop of hoisting and sinking the same 'and'.
  This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382).
  Reviewers: t.p.northover, qcolombet, MatzeB
  Subscribers: aemerson, mcrosier, sebpop, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28813
  llvm-svn: 295746
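A source-level sketch (illustrative) of the pattern being sunk:
```
void handle();

// Once the 'and' sits in the same block as its compare-to-zero user, AArch64
// can fold the pair into a single tbz/tbnz test-bit branch.
void test(unsigned long X) {
  if (X & (1UL << 8)) // (icmp ne (and X, 0x100), 0)
    handle();
}
```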
* TargetLowering: Remove AddrSpace parameter from GetAddrModeArguments (Matt Arsenault, 2017-02-08; 1 file, -7/+6)
  It doesn't make any sense to pass the address space in to what is supposed to be parsing the call, and it can be inferred from the pointer output.
  llvm-svn: 294412
* [CodeGenPrepare] Hoist all getSubtargetImpl calls to the beginning of the pass (Igor Laevsky, 2017-02-07; 1 file, -22/+30)
  Differential Revision: https://reviews.llvm.org/D29456
  llvm-svn: 294301
* [CodeGenPrep] No negative cost in the ExtLd promotion (Jun Bum Lim, 2017-01-27; 1 file, -1/+4)
  Summary: This change prevents the signed cost value from becoming negative, as the value is passed as an unsigned argument.
  Reviewers: mcrosier, jmolloy, qcolombet, javed.absar
  Reviewed By: mcrosier, qcolombet
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D28871
  llvm-svn: 293307
* [CodeGenPrepare] Fix a typo in the comment. NFC. (Haicheng Wu, 2017-01-18; 1 file, -1/+1)
  encode => endcode.
  Differential Revision: https://reviews.llvm.org/D28866
  llvm-svn: 292438