bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Revert "Don't create unnecessary PHIs"	Reid Kleckner	2015-12-14	2	-200/+4
	This reverts commit r255489. It causes test failures in Chromium and does not appear to respect the AlternativeV parameter. llvm-svn: 255562
*	add fast-math-flags to 'call' instructions (PR21290)	Sanjay Patel	2015-12-14	6	-22/+22
	This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp) to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add support to clang, and extend FMF to the DAG for calls.
	Motivating example:
	  %y = fmul fast float %x, %x
	  %z = tail call float @sqrtf(float %y)
	We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide attribute for unsafe-math, but we really want to trigger on the instructions themselves:
	  %z = tail call fast float @sqrtf(float %y)
	because in an LTO build it's possible that calls with fast semantics have been inlined into a function with non-fast semantics.
	The code changes and tests are based on the recent commits that added "notail": http://reviews.llvm.org/rL252368 and added FMF to fcmp: http://reviews.llvm.org/rL241901
	Differential Revision: http://reviews.llvm.org/D14707
	llvm-svn: 255555
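	The reason sqrt(x*x) --> fabs(x) needs fast semantics can be seen numerically. The following is a minimal Python sketch, not LLVM code; `sqrt_of_square` is a hypothetical helper name. Under strict IEEE-754 evaluation the intermediate x*x can overflow or underflow, so the two expressions disagree for extreme inputs:

```python
import math

def sqrt_of_square(x: float) -> float:
    """Strict evaluation of sqrt(x*x), as done without fast-math flags."""
    return math.sqrt(x * x)

# For ordinary magnitudes both forms agree.
print(sqrt_of_square(-3.0), math.fabs(-3.0))

# x*x overflows to +inf, so the strict result is inf while fabs(x) is finite.
print(sqrt_of_square(1e200), math.fabs(1e200))

# x*x underflows to 0.0, so the strict result is 0.0 while fabs(x) is not.
print(sqrt_of_square(1e-200), math.fabs(1e-200))
```

	Because the results differ at the extremes, the rewrite is only valid where relaxed floating-point semantics have been requested, which is what a per-call flag would express.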
*	[IR] Remove terminatepad	David Majnemer	2015-12-14	5	-134/+8
	It turns out that terminatepad gives little benefit over a cleanuppad which calls the termination function. This is not sufficient to implement fully generic filters but MSVC doesn't support them which makes terminatepad a little over-designed. Depends on D15478. Differential Revision: http://reviews.llvm.org/D15479 llvm-svn: 255522
*	[InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)	Sanjay Patel	2015-12-14	1	-11/+7
	This is a fix for PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543
	The idea is to take the existing fold of:
	  bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
	( http://reviews.llvm.org/rL112232 )
	And break it into less specific transforms so we'll catch more cases such as the example in the bug report:
	  bitcast ( trunc ( lshr ( bitcast X))) --> bitcast ( extractelement (bitcast X)) --> extractelement (bitcast X)
	Enabling patches for this change:
	  http://reviews.llvm.org/rL255399 (combine bitcasts)
	  http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))
	Differential Revision: http://reviews.llvm.org/D15392
	llvm-svn: 255504
*	Don't create unnecessary PHIs	James Molloy	2015-12-14	2	-4/+200
	In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. llvm-svn: 255489
*	Revert r255460, which still causes test failures on some platforms.	Cong Hou	2015-12-13	2	-70/+1
	Further investigation on the failures is ongoing. llvm-svn: 255463
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	Cong Hou	2015-12-13	2	-1/+70
	(This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460
*	Revert r255454 as it leads to several test failures on buildbots.	Cong Hou	2015-12-13	2	-69/+1
	llvm-svn: 255456
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	Cong Hou	2015-12-13	2	-1/+69
	LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454
*	[PGO] Stop using invalid char in instr variable names.	Xinliang David Li	2015-12-12	1	-2/+2
	Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This is the second try. The fix in this patch is very localized. Only profile symbol names of profile symbols with internal linkage are fixed up while initializers of name syms are not changed. This means there is no format change nor version bump. llvm-svn: 255434
*	[InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))	Sanjay Patel	2015-12-12	1	-2/+2
	This change was discussed in D15392. It allows us to remove the fold that was added in: http://reviews.llvm.org/r255261 ...and it will allow us to generalize this fold: http://reviews.llvm.org/rL112232 while preserving the order of bitcast + extract that it produces and testing shows is better handled by the backend. Note that the existing check for "isVectorTy()" wasn't strong enough in general and specifically because: x86_mmx. It's not a vector, but it's not vectorizable either. So here we check VectorType::isValidElementType() directly before proceeding with the transform. llvm-svn: 255433
*	Move catchpad-phi-cast.ll to the X86 specific subdirectory	David Majnemer	2015-12-12	1	-0/+0
	It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426
*	[IR] Reformulate LLVM's EH funclet IR	David Majnemer	2015-12-12	11	-249/+260
	While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies:
	- catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts.
	- catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticeable effect on the quality of our code generation.
	- catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets.
	- The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront.
	Let's clean this up a bit by doing the following:
	- Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable.
	- Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH.
	- Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them.
	- Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information.
	N.B. The state numbering code for the CLR has been updated but the veracity of its output cannot be spoken for. An expert should take a look to make sure the results are reasonable.
	Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
	Differential Revision: http://reviews.llvm.org/D15139
	llvm-svn: 255422
*	[InstCombine] allow any pair of bitcasts to be combined	Sanjay Patel	2015-12-12	1	-12/+6
	This change is discussed in D15392 and should allow us to effectively revert: http://llvm.org/viewvc/llvm-project?view=revision&revision=255261 if we canonicalize bitcasts ahead of extracts. It should be safe to convert any pair of bitcasts into a single bitcast, however, it was mentioned here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html that we're not allowed to bitcast from an x86_mmx to some other types, but I'm not seeing any failures from that, and we have regression tests in CodeGen/X86 that appear to cover all of those cases. Some day we'll get to remove that MMX wart from LLVM IR completely? Differential Revision: http://reviews.llvm.org/D15468 llvm-svn: 255399
*	use FileCheck for better checking	Sanjay Patel	2015-12-12	1	-3/+22
	llvm-svn: 255394
*	Add tests for bitcast-bitcast sequences for all scalar/vector permutations	Sanjay Patel	2015-12-11	1	-0/+90
	As noted in http://reviews.llvm.org/D15392 , we should be able to improve this. llvm-svn: 255370
*	[PGO] Revert r255365: solution incomplete, not handling lambda yet	Xinliang David Li	2015-12-11	2	-3/+3
	llvm-svn: 255369
*	[PGO] Stop using invalid char in instr variable names.	Xinliang David Li	2015-12-11	2	-3/+3
	Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This patch fixed the issue. With the change, the index format version will be bumped up by 1. Backward compatibility is preserved with this change. Differential Revision: http://reviews.llvm.org/D15243 llvm-svn: 255365
*	Revert r255247, r255265, and r255286 due to serious compile-time regressions.	Chad Rosier	2015-12-11	5	-151/+0
	Revert "[DSE] Disable non-local DSE to see if the bots go green." Revert "[DeadStoreElimination] Use range-based loops. NFC." Revert "[DeadStoreElimination] Add support for non-local DSE." llvm-svn: 255354
*	[Mem2Reg] Respect optnone	James Molloy	2015-12-11	1	-0/+21
	Mem2Reg shouldn't be optimizing a function that is marked optnone. There is a test checking this that fails when mem2reg is explicitly added to the standard pass pipeline. llvm-svn: 255336
*	[InstCombine] Make MatchBSwap also match bit reversals	James Molloy	2015-12-11	1	-0/+114
	MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse. llvm-svn: 255334
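	The final step described above, classifying a bit->bit mapping as a bswap or a bitreverse, can be sketched in Python. This is an illustration only, not the InstCombine implementation: `bit_provenance` and `classify` are hypothetical names, and the mapping is recovered here by probing a function with one-hot inputs, which is only meaningful for functions built purely from ORs, ANDs with constant masks, and shifts. The real matcher walks the IR expression recursively instead.

```python
def bit_provenance(f, width):
    """Return prov where prov[dst] is the source bit that lands in result
    bit dst, or None if f does not simply permute the bits.
    Assumes f only moves bits around (ORs, mask ANDs, shifts)."""
    mask = (1 << width) - 1
    prov = [None] * width
    for src in range(width):
        out = f(1 << src) & mask
        if out == 0 or out & (out - 1):
            return None  # zero or more than one result bit set
        dst = out.bit_length() - 1
        if prov[dst] is not None:
            return None  # two source bits collide on one result bit
        prov[dst] = src
    return prov

def classify(prov):
    """Name the idiom a full bit permutation corresponds to, if any."""
    if prov is None:
        return None
    width = len(prov)
    if all(prov[i] == width - 1 - i for i in range(width)):
        return "bitreverse"
    nbytes = width // 8
    # A byte swap needs a whole, even number of bytes.
    if width % 16 == 0 and all(
        prov[i] == (nbytes - 1 - i // 8) * 8 + i % 8 for i in range(width)
    ):
        return "bswap"
    return None

swap16 = lambda v: ((v << 8) | (v >> 8)) & 0xFFFF
reverse8 = lambda v: int(format(v & 0xFF, "08b")[::-1], 2)

print(classify(bit_provenance(swap16, 16)))
print(classify(bit_provenance(reverse8, 8)))
print(classify(bit_provenance(lambda v: v, 16)))
```

	Working at single-bit granularity is what lets one mapping serve both idioms: a byte swap and a bit reversal are just two particular permutations of the same table.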
*	EarlyCSE: add tests	JF Bastien	2015-12-10	1	-10/+68
	Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15371 llvm-svn: 255295
*	[DSE] Disable non-local DSE to see if the bots go green.	Chad Rosier	2015-12-10	4	-4/+4
	I see a few bots timing out, so I'm speculatively disabling r255247. llvm-svn: 255286
*	[PGO] Use %t as the temporary profdata filename in the test cases.	Rong Xu	2015-12-10	10	-19/+19
	Using %t rather than %T/<specific_name> as the temporary profdata filename. llvm-svn: 255271
*	[InstCombine] fold bitcasts around an extractelement (3rd try)	Sanjay Patel	2015-12-10	1	-8/+34
	This is a redo of r255137 (reverted at r255227) which was a redo of r255124 (reverted at r255126) with a fixed check for a scalar source type and an added test for the failure that caused the revert. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255261
*	[DeadStoreElimination] Add support for non-local DSE.	Chad Rosier	2015-12-10	5	-0/+151
	We extend the search for redundant stores to predecessor blocks that unconditionally lead to the block BB with the current store instruction. That also includes single-block loops that unconditionally lead to BB, and if-then-else blocks where then- and else-blocks unconditionally lead to BB. http://reviews.llvm.org/D13363 Patch by Ivan Baev <ibaev@codeaurora.org>! llvm-svn: 255247
*	Revert r255137.	Akira Hatanaka	2015-12-10	1	-20/+8
	This commit broke Apple's internal bot. llvm-svn: 255227
*	[PGO] Rename the profdata filename to avoid the conflict b/w tests.	Rong Xu	2015-12-09	1	-2/+2
	Two tests, diag_mismatch.ll and diag_no_funcprofdata.ll, generate the same profdata filename, which can conflict in concurrent test runs. This patch renames them to have different names. llvm-svn: 255158
*	IR: Make ConstantDataArray::getFP actually return a ConstantDataArray	Justin Bogner	2015-12-09	1	-0/+10
	The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>) overload has had a typo in it since it was written, where it will create a Vector instead of an Array. This obviously doesn't work at all, but it turns out that until r254991 there weren't actually any callers of this overload. Fix the typo and add some test coverage. llvm-svn: 255157
*	[Float2Int] Don't operate on vector instructions	Reid Kleckner	2015-12-09	1	-0/+10
	This fixes a crash bug. It's also not clear if we'd want to do this transform for vectors. llvm-svn: 255155
*	Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo	Sanjoy Das	2015-12-09	1	-0/+31
	`CloneAndPruneIntoFromInst` can DCE instructions after cloning them into the new function, and so an AssertingVH is too strong. This change switches CloneCodeInfo to use a std::vector<WeakVH>. llvm-svn: 255148
*	[InstCombine] fold bitcasts around an extractelement (2nd try)	Sanjay Patel	2015-12-09	1	-8/+20
	This is a redo of r255124 (reverted at r255126) with an added check for a scalar destination type and an added test for the failure seen in Clang's test/CodeGen/vector.c. The extra test shows a different missing optimization. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255137
*	[PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021)	Rong Xu	2015-12-09	19	-0/+572
	This new patch fixes a few bugs that were exposed in the last submission. It also improves the test cases. --Original Commit Message-- This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees minimum number of CFG edges getting instrumented. An additional optimization is to instrument the less executed edges to further reduce the instrumentation overhead. The patch contains both the instrumentation and the use of the profile to set the branch weights. Differential Revision: http://reviews.llvm.org/D12781 llvm-svn: 255132
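	The edge-selection idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pass itself: `edges_to_instrument` is a hypothetical name, blocks are plain integers, and the edge weights are assumed to be execution-frequency estimates supplied by the caller. Edges are added to a spanning tree heaviest first (Kruskal's algorithm with union-find), so the edges left outside the tree, which are the ones that need counters, tend to be the less executed ones.

```python
def edges_to_instrument(num_blocks, edges):
    """edges: iterable of (src, dst, weight) CFG edges.
    Returns the (src, dst) edges left outside the spanning tree."""
    parent = list(range(num_blocks))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    instrumented = []
    # Heaviest edges first, so hot edges end up inside the tree.
    for src, dst, weight in sorted(edges, key=lambda e: -e[2]):
        root_src, root_dst = find(src), find(dst)
        if root_src == root_dst:
            # Adding this edge would close a cycle: it needs a counter.
            instrumented.append((src, dst))
        else:
            parent[root_src] = root_dst
    return instrumented

# Diamond CFG: block 0 branches to 1 (hot) or 2 (cold); both reach 3.
diamond = [(0, 1, 90), (0, 2, 10), (1, 3, 90), (2, 3, 10)]
print(edges_to_instrument(4, diamond))
```

	For a connected CFG with V blocks and E edges, the tree holds V - 1 edges, so only E - V + 1 counters are needed; the counts on tree edges can then be derived from flow conservation at each block.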
*	Revert "[InstCombine] fold bitcasts around an extractelement"	Mehdi Amini	2015-12-09	1	-5/+8
	This reverts commit r255124. Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255126
*	[InstCombine] fold bitcasts around an extractelement	Sanjay Patel	2015-12-09	1	-8/+5
	Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255124
*	Change hasUniqueInitializer() to call isStrongDefinitionForLinker() instead of !isWeakForLinker()	Mehdi Amini	2015-12-09	1	-0/+22
	Summary: Available_externally global variables with an initializer were considered "hasInitializer()", while obviously they can't match the description: "Whether the global variable has an initializer, and any changes made to the initializer will turn up in the final executable." since modifying the initializer of an externally available variable does not make sense. Reviewers: pcc, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15351 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255123
*	Don't drop attributes when inlining through "deopt" operand bundles	Sanjoy Das	2015-12-09	1	-0/+39
	Test case attached (test case also checks that we don't drop the calling convention, but that functionality was correct before this patch). llvm-svn: 255088
*	[OperandBundles] Have PruneEH work correctly with operand bundles.	Sanjoy Das	2015-12-08	1	-0/+26
	For an invoke with operand bundles, the [op_begin(), op_end()-3] range can contain things other than invoke arguments. This change teaches PruneEH to use arg_begin() and arg_end() explicitly. llvm-svn: 255073
*	[CGP] Reimplement r255055 a different way	Reid Kleckner	2015-12-08	1	-0/+57
	llvm-svn: 255070
*	Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around"	Reid Kleckner	2015-12-08	1	-57/+0
	This reverts commit r255055. Breakage has been reported. llvm-svn: 255063
*	[OperandBundles] Fix a transform in simplifycfg	Sanjoy Das	2015-12-08	1	-0/+13
	Reviewers: pcc, majnemer, reames Subscribers: reames, llvm-commits Differential Revision: http://reviews.llvm.org/D15345 llvm-svn: 255062
*	[CGP] Check that we have an insert point before moving llvm.dbg.value around	Reid Kleckner	2015-12-08	1	-0/+57
	llvm-svn: 255055
*	[EarlyCSE] Value forwarding for unordered atomics	Philip Reames	2015-12-08	1	-0/+127
	This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future. The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic. No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch. Differential Revision: http://reviews.llvm.org/D15337 llvm-svn: 255054
*	Revert "Add Available Externally linkage type to isWeakForLinker()"	Mehdi Amini	2015-12-08	1	-22/+0
	This reverts r255043, as post-review concerns were raised about its correctness. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255045
*	Cleanup test: remove useless alignment	Mehdi Amini	2015-12-08	1	-2/+2
	From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255044
*	Add Available Externally linkage type to isWeakForLinker()	Mehdi Amini	2015-12-08	1	-0/+22
	Per LangRef: "Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr", since linkonce_odr is in this list it makes sense to have available_externally there as well. Reviewers: rafael Differential Revision: http://reviews.llvm.org/D15323 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255043
*	[IndVars] Have getInsertPointForUses preserve LCSSA	Sanjoy Das	2015-12-08	1	-0/+45
	Summary: Also add a stricter post-condition for IndVarSimplify. Fixes PR25578. Test case by Michael Zolotukhin. Reviewers: hfinkel, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15059 llvm-svn: 254977
*	[SCEVExpander] Have hoistIVInc preserve LCSSA	Sanjoy Das	2015-12-08	1	-0/+25
	Summary: (Note: the problematic invocation of hoistIVInc that caused PR24804 came from IndVarSimplify, not from SCEVExpander itself) Fixes PR24804. Test case by David Majnemer. Reviewers: hfinkel, majnemer, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15058 llvm-svn: 254976
*	[InstCombine] Call getCmpPredicateForMinMax only with a valid SPF	Sanjoy Das	2015-12-05	1	-0/+20
	Summary: There are `SelectPatternFlavor`s that don't represent min or max idioms, and we should not be passing those to `getCmpPredicateForMinMax`. Fixes PR25745. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15249 llvm-svn: 254869
*	[SimplifyLibCalls] Optimization for pow(x, n) where n is some constant	Weiming Zhao	2015-12-04	1	-0/+120
	Summary: In order to avoid calling the pow function, we generate repeated fmuls when n is a positive or negative whole number. For each exponent we pre-compute Addition Chains in order to minimize the number of fmuls. Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
	We pre-compute addition chains for exponents up to 32 (which results in a max of 7 fmuls). For example:
	  4 = 2+2
	  5 = 2+3
	  6 = 3+3
	and so on. Hence, pow(x, 4.0) ==>
	  y = fmul x, x
	  x = fmul y, y
	  ret x
	For negative exponents, we simply compute the reciprocal of the final result. Note: This transformation is only enabled under fast-math.
	Patch by Mandeep Singh Grang <mgrang@codeaurora.org>
	Reviewers: weimingz, majnemer, escha, davide, scanon, joerg
	Subscribers: probinson, escha, llvm-commits
	Differential Revision: http://reviews.llvm.org/D13994
	llvm-svn: 254776
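	The expansion can be illustrated with a short Python sketch. It is not the SimplifyLibCalls code and does not use the patch's pre-computed table: `pow_by_chain` is a hypothetical name, and the chain is built by splitting each exponent n into floor(n/2) + ceil(n/2). That rule happens to reproduce the 4 = 2+2, 5 = 2+3, 6 = 3+3 examples above, but it is an assumption made here and is not guaranteed to yield the shortest chain for every exponent.

```python
def pow_by_chain(x, n):
    """Compute x**n for integer n using only multiplications (plus one
    reciprocal for negative n). Returns (value, multiplications used)."""
    if n == 0:
        return 1.0, 0
    powers = {1: x}  # exponent -> already computed power of x
    muls = 0

    def build(k):
        nonlocal muls
        if k not in powers:
            low = k // 2
            high = k - low
            # Reuse smaller powers: x**k = x**low * x**high.
            powers[k] = build(low) * build(high)
            muls += 1
        return powers[k]

    result = build(abs(n))
    if n < 0:
        result = 1.0 / result  # negative exponent: reciprocal of the result
    return result, muls

print(pow_by_chain(2.0, 4))   # two multiplications: x*x, then y*y
print(pow_by_chain(2.0, 5))
print(pow_by_chain(2.0, -2))
```

	Memoizing intermediate powers is what keeps the multiplication count low: each distinct exponent in the chain costs exactly one multiply, however many times it is reused.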