bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Add a loop rerolling flag to the PassManagerBuilder	Hal Finkel	2013-11-17	1	-0/+1
	This adds a boolean member variable to the PassManagerBuilder to control loop rerolling (just like we have for unrolling and the various vectorization options). This is necessary for control by the frontend. Loop rerolling remains disabled by default at all optimization levels. llvm-svn: 194966
*	DebugLoc defines LineCol as 32 bit in comment but unsigned in code.	Yaron Keren	2013-11-17	1	-1/+3
	This patch modifies LineCol to be a uint32_t. See http://llvm.org/bugs/show_bug.cgi?id=17957 llvm-svn: 194957
*	[block-freq] Add BlockFrequency::scale that returns a remainder from the ↵	Michael Gottesman	2013-11-17	1	-2/+8
	division and make the private scale in BlockFrequency more performant.
	This change is the first in a series of changes improving LLVM's Block Frequency propagation implementation to not lose probability mass in branchy code when propagating block frequency information from a basic block to its successors. This patch is a simple infrastructure improvement that does not actually modify the block frequency algorithm. The specific changes are:
	1. Changes the division algorithm used when scaling block frequencies by branch probabilities to a short division algorithm. This gives us the remainder for free and also provides a nice speed boost. When I benched the old routine and the new routine on a Sandy Bridge iMac with turbo mode disabled, performing 8192 iterations on an array of length 32768, I saw a ~600% increase in speed in mean/median performance.
	2. Exposes a scale method that returns a remainder. This is important so we can ensure that when we scale a block frequency by some branch probability BP = N/D, the remainder from the division by D can be retrieved and propagated to other children to ensure no probability mass is lost (more to come on this).
	llvm-svn: 194950
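To make the short-division idea concrete, here is a minimal sketch of scaling a 64-bit frequency by a probability N/D while keeping the remainder. It is not the code from the commit: the names `ScaleResult` and `scaleWithRemainder` are hypothetical, and the sketch assumes 32-bit N and D with N <= D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: quotient and remainder of Freq * N / D.
struct ScaleResult {
  uint64_t Quotient;
  uint32_t Remainder;
};

// Scale Freq by the probability N/D (requires D != 0 and N <= D) without
// overflowing 64 bits. The 96-bit product Freq * N is held as three 32-bit
// digits and divided by D one digit at a time (short division).
inline ScaleResult scaleWithRemainder(uint64_t Freq, uint32_t N, uint32_t D) {
  assert(D != 0 && N <= D && "expected a probability in [0, 1]");

  // Multiply each 32-bit half of Freq by N.
  uint64_t ProdLo = (Freq & 0xFFFFFFFFu) * N;
  uint64_t ProdHi = (Freq >> 32) * N;

  // Recombine into base-2^32 digits, least significant first.
  uint64_t Mid = (ProdLo >> 32) + (ProdHi & 0xFFFFFFFFu);
  uint64_t Digits[3] = {ProdLo & 0xFFFFFFFFu, Mid & 0xFFFFFFFFu,
                        (ProdHi >> 32) + (Mid >> 32)};

  // Short division from the most significant digit down. Rem < D < 2^32,
  // so (Rem << 32) | digit always fits in 64 bits.
  uint64_t Rem = 0;
  uint64_t Quot[3];
  for (int I = 2; I >= 0; --I) {
    uint64_t Cur = (Rem << 32) | Digits[I];
    Quot[I] = Cur / D;
    Rem = Cur % D;
  }

  // N <= D implies the quotient is at most Freq, so Quot[2] is zero.
  return {(Quot[1] << 32) | Quot[0], static_cast<uint32_t>(Rem)};
}
```

A caller could hand the returned remainder to a sibling successor so that the frequencies of all successors still sum to the parent's frequency.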
*	[PM] Completely remove support for explicit 'require' methods on the	Chandler Carruth	2013-11-17	1	-31/+6
	AnalysisManager. All this method did was assert something and we have a perfectly good way to trigger that assert from the query path. llvm-svn: 194947
*	Added a size field to the stack map record to handle subregister spills.	Andrew Trick	2013-11-17	2	-4/+22
	Implementing this on big-endian platforms could get strange. I added a target hook, getStackSlotRange, per Jakob's recommendation to make this as explicit as possible. llvm-svn: 194942
*	Add a loop rerolling pass	Hal Finkel	2013-11-16	4	-0/+11
	This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this:
	    for (int i = 0; i < 3200; i += 5) {
	      a[i] += alpha * b[i];
	      a[i + 1] += alpha * b[i + 1];
	      a[i + 2] += alpha * b[i + 2];
	      a[i + 3] += alpha * b[i + 3];
	      a[i + 4] += alpha * b[i + 4];
	    }
	and turn them into this:
	    for (int i = 0; i < 3200; ++i) {
	      a[i] += alpha * b[i];
	    }
	and loops like this:
	    for (int i = 0; i < 500; ++i) {
	      x[3*i] = foo(0);
	      x[3*i+1] = foo(0);
	      x[3*i+2] = foo(0);
	    }
	and turn them into this:
	    for (int i = 0; i < 1500; ++i) {
	      x[i] = foo(0);
	    }
	There are two motivations for this transformation:
	1. Code-size reduction (especially relevant, obviously, when compiling for code size).
	2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target.
	The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases.
	This pass is not currently enabled by default at any optimization level.
	llvm-svn: 194939
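The equivalence the pass relies on can be checked by hand. The sketch below writes the first loop pair from the message as two standalone functions and compares their results; it only illustrates the source-level transformation, not the pass itself, and the function names and the use of integer elements are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Manually unrolled by 5, as in the "before" loop of the commit message.
// Assumes A.size() == B.size() and that the size is a multiple of 5.
inline void axpyUnrolled(std::vector<int> &A, const std::vector<int> &B,
                         int Alpha) {
  assert(A.size() == B.size() && A.size() % 5 == 0);
  for (std::size_t I = 0; I < A.size(); I += 5) {
    A[I] += Alpha * B[I];
    A[I + 1] += Alpha * B[I + 1];
    A[I + 2] += Alpha * B[I + 2];
    A[I + 3] += Alpha * B[I + 3];
    A[I + 4] += Alpha * B[I + 4];
  }
}

// The rerolled form: one statement per iteration, unit stride.
inline void axpyRerolled(std::vector<int> &A, const std::vector<int> &B,
                         int Alpha) {
  assert(A.size() == B.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    A[I] += Alpha * B[I];
}

// Runs both forms on the same input and reports whether they agree.
inline bool rerollingPreservesResult(std::size_t Groups, int Alpha) {
  std::vector<int> B(Groups * 5);
  for (std::size_t I = 0; I < B.size(); ++I)
    B[I] = static_cast<int>(I % 7);
  std::vector<int> A1(B.size(), 1), A2(B.size(), 1);
  axpyUnrolled(A1, B, Alpha);
  axpyRerolled(A2, B, Alpha);
  return A1 == A2;
}
```

Because each element is updated by the same expression in both versions, the rerolled loop leaves later passes free to pick their own unroll or vector factor.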
*	ScalarEvolution: Warn if the result of setFlags/clearFlags is unused.	Benjamin Kramer	2013-11-16	1	-5/+6
	This was a source of bugs in the past. llvm-svn: 194929
*	Annotate APInt methods where it's not clear whether they are in place with ↵	Benjamin Kramer	2013-11-16	2	-35/+41
	warn_unused_result. Fix ScalarEvolution bugs uncovered by this. llvm-svn: 194928
*	Fix filename in header comment	Duncan P. N. Exon Smith	2013-11-16	1	-1/+1
	llvm-svn: 194924
*	X86: Encode the 'h' cpu subtype in the MachO header for x86.	Jim Grosbach	2013-11-16	1	-1/+2
	llvm-svn: 194906
*	Implemented aarch64 Neon scalar vmulx_lane intrinsics	Ana Pazos	2013-11-15	1	-2/+3
	Implemented aarch64 Neon scalar vfma_lane intrinsics
	Implemented aarch64 Neon scalar vfms_lane intrinsics
	Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions.
	Implemented legacy vfma_lane_f64, vfms_lane_f64, vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions.
	llvm-svn: 194888
*	[weak vtables] Remove a bunch of weak vtables	Juergen Ributzka	2013-11-15	12	-7/+29
	This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865
*	[AArch64] Fix the scalar NEON ACLE functions so that they return float/double	Chad Rosier	2013-11-15	1	-8/+8
	rather than the vector equivalent. llvm-svn: 194853
*	Path: Recognize COFF import library file magic.	Rui Ueyama	2013-11-15	1	-0/+1
	Summary: Make identify_magic recognize COFF import files. Reviewers: Bigcheese CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2165 llvm-svn: 194852
*	Readobj: If NumbersOfSections is 0xffff, it's a COFF import library.	Rui Ueyama	2013-11-15	1	-0/+2
	0xffff does not mean that there are 65535 sections in a COFF file but indicates that it's a COFF import library. This patch fixes a SEGV error when an import library file is passed to llvm-readobj. llvm-svn: 194844
*	Avoid illegal integer promotion in fastisel	Bob Wilson	2013-11-15	1	-0/+9
	Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when:
	    sext(a) + sext(b) != sext(a + b)
	Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280>
	Patch by Duncan Exon Smith!
	llvm-svn: 194840
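The inequality is easy to reproduce outside the compiler. The following sketch uses 8-bit operands widened to 32 bits; the function names are hypothetical, and the final conversion to `int8_t` is assumed to wrap in two's complement (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// Sign-extend each 8-bit operand to 32 bits, then add: sext(a) + sext(b).
inline int32_t sumOfSext(int8_t A, int8_t B) {
  return static_cast<int32_t>(A) + static_cast<int32_t>(B);
}

// Add in 8 bits (wrapping on overflow), then sign-extend: sext(a + b).
inline int32_t sextOfSum(int8_t A, int8_t B) {
  uint8_t Wrapped = static_cast<uint8_t>(static_cast<uint8_t>(A) +
                                         static_cast<uint8_t>(B));
  // Assumption: this conversion wraps modulo 2^8 (two's complement).
  return static_cast<int32_t>(static_cast<int8_t>(Wrapped));
}
```

With a = b = 100 the narrow add overflows, so the two orders of operation disagree; for small operands such as 1 and 2 they match, which is why the bad fold only shows up on overflow.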
*	Add AVX512 unmasked FMA intrinsics and support.	Cameron McInally	2013-11-15	1	-0/+48
	llvm-svn: 194824
*	Fix illegal DAG produced by SelectionDAG::getConstant() for v2i64 type	Daniel Sanders	2013-11-15	1	-0/+7
	Summary: When getConstant() is called for an expanded vector type, it is split into multiple scalar constants which are then combined using appropriate build_vector and bitcast operations.
	In addition to the usual big/little endian differences, the case where the element-order of the vector does not have the same endianness as the elements themselves is also accounted for. For example, for v4i32 on big-endian MIPS, the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is <0123,4567,89AB,CDEF>. Handling this case turns out to be a nop since getConstant() returns a splatted vector (so reversing the element order doesn't change the value).
	This fixes a number of cases in MIPS MSA where calling getConstant() during operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger differences between illegal and legal types such as legalizing v2i64 into v8i16.
	lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling getConstant() so this function has been updated in the same patch.
	For the sake of transparency, the steps I've taken since the review are:
	* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed that the MIPS tests were falsely passing because a polymorphic function was not actually polymorphic in the reviewed patch.
	* Fixed the tests that were now failing. This involved deleting the code to handle the MIPS MSA element-order (which was previously doing a byte-order swap instead of an element-order swap). This left isVectorEltOrderLittleEndian() unused and it was deleted.
	* Fixed build failures caused by rebasing beyond r194467-r194472. These build failures involved the bset, bneg, and bclr instructions added in these commits using lowerMSASplatImm() in a way that was no longer valid after this patch. Some of these were fixed by calling SelectionDAG::getConstant() instead, others were fixed by a new function getBuildVectorSplat() that provided the removed functionality of lowerMSASplatImm() in a more sensible way.
	Reviewers: bkramer
	Reviewed By: bkramer
	CC: llvm-commits
	Differential Revision: http://llvm-reviews.chandlerc.com/D1973
	llvm-svn: 194811
*	Add target hook to prevent folding some bitcasted loads.	Matt Arsenault	2013-11-15	1	-0/+11
	This is to avoid this transformation in some cases: fold (conv (load x)) -> (load (conv*)x). On architectures that don't natively support some vector loads efficiently, casting the load to a smaller vector of larger types and loading is more efficient. Patch by Micah Villmow. llvm-svn: 194783
*	[llvm-c] Add missing const qualifiers to LLVMCreateTargetMachine	Peter Zotov	2013-11-15	1	-3/+3
	llvm-svn: 194770
*	[llvm-c] Simplify signature of LLVMGetTargetFromName	Peter Zotov	2013-11-15	1	-1/+1
	LLVMGetTargetFromName was not yet present in an LLVM release, so this does not break compatibility. llvm-svn: 194769
*	Add addrspacecast instruction.	Matt Arsenault	2013-11-15	12	-18/+111
	Patch by Michele Scandale! llvm-svn: 194760
*	Include raw_ostream.h.	Rui Ueyama	2013-11-15	1	-2/+2
	Including only Debug.h did not cause a compilation error, but you couldn't do anything (like writing something with <<) to raw_ostreams returned by llvm::dbgs() or llvm::errs() without including raw_ostream.h. So including it from Debug.h should make sense. Differential Revision: http://llvm-reviews.chandlerc.com/D2183 llvm-svn: 194759
*	Fix the header comment of the new pass manager stuff to not claim to be	Chandler Carruth	2013-11-14	1	-1/+1
	the legacy stuff. =] llvm-svn: 194689
*	[AArch64 neon] support poly64 and relevant intrinsic functions.	Kevin Qin	2013-11-14	1	-0/+3
	llvm-svn: 194659
*	Implement aarch64 neon instruction class SIMD misc.	Kevin Qin	2013-11-14	1	-0/+33
	llvm-svn: 194656
*	Add dyn_cast<> support to YAML I/O's IO class	Nick Kledzik	2013-11-14	1	-3/+8
	llvm-svn: 194655
*	Added BlockFrequencyInfo::view for displaying the block frequency ↵	Michael Gottesman	2013-11-14	1	-0/+2
	propagation graph via graphviz. This is useful for debugging issues in the BlockFrequency implementation since one can easily visualize where probability mass and other errors occur in the propagation. llvm-svn: 194654
*	Implement AArch64 NEON instruction set AdvSIMD (table).	Jiangning Liu	2013-11-14	1	-0/+45
	llvm-svn: 194648
*	Add simple support for tags in YAML I/O	Nick Kledzik	2013-11-14	1	-2/+4
	llvm-svn: 194644
*	llvm-cov: Slightly improved error checking.	Yuchen Wu	2013-11-14	1	-2/+2
	- readInt() should check all 4 bytes can be read, not just 1.
	- In the event of false data in the gcno file, it was possible to index into a non-existent index of SmallVector, causing assertion error.
	llvm-svn: 194639
*	llvm-cov: Removed StringMap holding GCOVLines.	Yuchen Wu	2013-11-14	1	-16/+6
	According to the hazy gcov documentation, it appeared to be technically possible for lines within a block to belong to different source files. However, upon further investigation, gcov does not actually support multiple source files for a single block. This change removes a level of separation between blocks and lines by replacing the StringMap of GCOVLines with a SmallVector of ints representing line numbers. This also means that the GCOVLines class is no longer needed. This paves the way for supporting the "-a" option, which will output block information. llvm-svn: 194637
*	llvm-cov: Replaced asserts with proper error handling.	Yuchen Wu	2013-11-14	1	-14/+22
	Unified the interface for read functions. They all return a boolean indicating if the read from file succeeded. Functions that previously returned the read value now store it into a variable that is passed in by reference instead. Callers will need to check the return value to detect if an error occurred. Also added a new test which ensures that no assertions occur when the file contains invalid data. llvm-cov should return with error code 1 upon failure. llvm-svn: 194635
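The interface shape described here (a boolean return plus a by-reference output, with all four bytes checked before reading) might look roughly like the sketch below. `ByteReader` is a hypothetical stand-in and not the llvm-cov buffer class; the little-endian byte order is also an assumption of the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory buffer with a read cursor.
class ByteReader {
public:
  explicit ByteReader(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  // Reads a 32-bit little-endian value into Val and advances the cursor.
  // Returns false, leaving Val and the cursor untouched, unless all four
  // bytes are available.
  bool readInt(uint32_t &Val) {
    if (Data.size() - Cursor < 4)
      return false;
    Val = static_cast<uint32_t>(Data[Cursor]) |
          static_cast<uint32_t>(Data[Cursor + 1]) << 8 |
          static_cast<uint32_t>(Data[Cursor + 2]) << 16 |
          static_cast<uint32_t>(Data[Cursor + 3]) << 24;
    Cursor += 4;
    return true;
  }

private:
  std::vector<uint8_t> Data;
  std::size_t Cursor = 0; // Invariant: Cursor <= Data.size().
};
```

A caller checks the return value and reports an error instead of asserting, so a truncated or corrupt file yields a clean failure rather than a crash.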
*	[AArch64] Add support for legacy AArch32 NEON scalar shift by immediate	Chad Rosier	2013-11-13	1	-13/+0
	instructions. This patch does not include the shift right and accumulate instructions. A number of non-overloaded intrinsics have been removed in favor of their overloaded counterparts. llvm-svn: 194598
*	Make sure LLVMLoadLibraryPermanently gets an extern "C" symbol.	Benjamin Kramer	2013-11-13	1	-1/+0
	Otherwise it's impossible to use it. Also don't include C++ headers in a C header. llvm-svn: 194581
*	Remove AllowQuotesInName and friends from MCAsmInfo.	Rafael Espindola	2013-11-13	1	-21/+0
	Accepting quotes is a property of an assembler, not of an object file. For example, ELF can support any names for sections and symbols, but the gnu assembler only accepts quotes in some contexts and llvm-mc in a few more. LLVM should not produce different symbols based on a guess about which assembler will be reading the code it is printing. llvm-svn: 194575
*	SampleProfileLoader pass. Initial setup.	Diego Novillo	2013-11-13	2	-0/+10
	This adds a new scalar pass that reads a file with samples generated by 'perf' during runtime. The samples read from the profile are incorporated and emitted as IR metadata reflecting that profile.
	The profile file is assumed to have been generated by an external profile source. The profile information is converted into IR metadata, which is later used by the analysis routines to estimate block frequencies, edge weights and other related data.
	External profile information files have no fixed format, each profiler is free to define its own. This includes both the on-disk representation of the profile and the kind of profile information stored in the file. A common kind of profile is based on sampling (e.g., perf), which essentially counts how many times each line of the program has been executed during the run.
	The SampleProfileLoader pass is organized as a scalar transformation. On startup, it reads the file given in -sample-profile-file to determine what kind of profile it contains. This file is assumed to contain profile information for the whole application. The profile data in the file is read and incorporated into the internal state of the corresponding profiler.
	To facilitate testing, I've organized the profilers to support two file formats: text and native. The native format is whatever on-disk representation the profiler wants to support, I think this will mostly be bitcode files, but it could be anything the profiler wants to support. To do this, every profiler must implement the SampleProfile::loadNative() function.
	The text format is mostly meant for debugging. Records are separated by newlines, but each profiler is free to interpret records as it sees fit. Profilers must implement the SampleProfile::loadText() function.
	Finally, the pass will call SampleProfile::emitAnnotations() for each function in the current translation unit. This function needs to translate the loaded profile into IR metadata, which the analyzer will later be able to use.
	This patch implements the first steps towards the above design. I've implemented a sample-based flat profiler. The format of the profile is fairly simplistic. Each sampled function contains a list of relative line locations (from the start of the function) together with a count representing how many samples were collected at that line during execution. I generate this profile using perf and a separate converter tool.
	Currently, I have only implemented a text format for these profiles. I am interested in initial feedback to the whole approach before I send the other parts of the implementation for review.
	This patch implements:
	- The SampleProfileLoader pass.
	- The base ExternalProfile class with the core interface.
	- A SampleProfile sub-class using the above interface. The profiler generates branch weight metadata on every branch instruction that matches the profiles.
	- A text loader class to assist the implementation of SampleProfile::loadText().
	- Basic unit tests for the pass.
	Additionally, the patch uses profile information to compute branch weights based on instruction samples.
	This patch converts instruction samples into branch weights. It does a fairly simplistic conversion: Given a multi-way branch instruction, it calculates the weight of each branch based on the maximum sample count gathered from each target basic block.
	Note that this assignment of branch weights is somewhat lossy and can be misleading. If a basic block has more than one incoming branch, all the incoming branches will get the same weight. In reality, it may be that only one of them is the most heavily taken branch. I will adjust this assignment in subsequent patches.
	llvm-svn: 194566
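The branch-weight rule in the last paragraphs (one weight per successor, taken as the maximum sample count seen in that successor block) can be sketched in a few lines. This is an illustration of the rule as described, not the pass's code; `BlockSamples` and `branchWeights` are hypothetical names, and a block with no samples is assumed to get weight 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sample counts recorded for the lines of one successor basic block.
using BlockSamples = std::vector<uint32_t>;

// Returns one weight per branch target: the maximum sample count gathered
// in that target block (0 if the block has no samples).
inline std::vector<uint32_t>
branchWeights(const std::vector<BlockSamples> &Targets) {
  std::vector<uint32_t> Weights;
  Weights.reserve(Targets.size());
  for (const BlockSamples &Samples : Targets) {
    uint32_t Max = 0;
    for (uint32_t Count : Samples)
      Max = std::max(Max, Count);
    Weights.push_back(Max);
  }
  return Weights;
}
```

The lossiness the message mentions is visible here: the weight depends only on the target block, so every branch entering the same block receives the same value regardless of which edge was actually hot.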
*	Add another (perhaps better) video for Sean's talk. (Thanks Marshall!)	Chandler Carruth	2013-11-13	1	-0/+1
	llvm-svn: 194549
*	Fix a null pointer dereference when copying a null polymorphic pointer.	Chandler Carruth	2013-11-13	1	-1/+1
	This bug only bit the C++98 build bots because all of the actual uses really do move. ;] But not quite ready to do the whole C++11 switch yet, so clean it up. Also add a unit test that catches this immediately. llvm-svn: 194548
*	Give folks a reference to some material on the fundamental design	Chandler Carruth	2013-11-13	1	-0/+7
	pattern in use here. Addresses review feedback from Sean (thanks!) and others. llvm-svn: 194541
*	Introduce an AnalysisManager which is like a pass manager but with a lot	Chandler Carruth	2013-11-13	1	-12/+294
	more smarts in it. This is where most of the interesting logic that used to live in the implicit-scheduling-hackery of the old pass manager will live.
	Like the previous commits, note that this is a very early prototype! I expect substantial changes before this is ready to use.
	The core of the design is the following:
	- We have an AnalysisManager which can be used across a series of passes over a module.
	- The code setting up a pass pipeline registers the analyses available with the manager.
	- Individual transform passes can check that an analysis manager provides the analyses they require in order to fail-fast.
	- There is no implicit registration or scheduling.
	- Analysis passes are different from other passes: they produce an analysis result that is cached and made available via the analysis manager.
	- Cached results are invalidated automatically by the pass managers.
	- When a transform pass requests an analysis result, either the analysis is run to produce the result or a cached result is provided.
	There are a few aspects of this design that I know will change in subsequent commits:
	- Currently there is no "preservation" system, that needs to be added.
	- All of the analysis management should move up to the analysis library.
	- The analysis management needs to support at least SCC passes. Maybe loop passes. Living in the analysis library will facilitate this.
	- Need support for analyses which are both module and function passes.
	- Need support for pro-actively running module analyses to have cached results within a function pass manager.
	- Need a clear design for "immutable" passes.
	- Need support for requesting cached results when available and not re-running the pass even if that would be necessary.
	- Need more thorough testing of all of this infrastructure.
	There are other aspects that I view as open questions I'm hoping to resolve as I iterate a bit on the infrastructure, and especially as I start writing actual passes against this.
	- Should we have separate management layers for function, module, and SCC analyses? I think "yes", but I'm not yet ready to switch the code. Adding SCC support will likely resolve this definitively.
	- How should the 'require' functionality work? Should that be the only way to request results to ensure that passes always require things?
	- How should preservation work?
	- Probably some other things I'm forgetting. =]
	Look forward to more patches in shorter order now that this is in place.
	llvm-svn: 194538
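The core behaviour listed above (explicit registration, results cached per unit, invalidation, and run-or-reuse on request) can be illustrated with a toy model. This is not LLVM's AnalysisManager: `ToyAnalysisManager`, its string keys, and its `int` results are all assumptions chosen to keep the sketch self-contained.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model: analyses are named callbacks over a unit (here just a string
// standing in for a function or module) and produce an int result.
class ToyAnalysisManager {
public:
  using Analysis = std::function<int(const std::string &)>;

  // Explicit registration; nothing is registered or scheduled implicitly.
  void registerAnalysis(const std::string &Name, Analysis A) {
    Analyses[Name] = std::move(A);
  }

  // Lets a transform check up front that what it needs is available.
  bool isRegistered(const std::string &Name) const {
    return Analyses.count(Name) != 0;
  }

  // Returns the cached result if present; otherwise runs the analysis and
  // caches its result. Throws std::out_of_range for an unregistered name.
  int getResult(const std::string &Name, const std::string &Unit) {
    auto Key = std::make_pair(Name, Unit);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    int Result = Analyses.at(Name)(Unit);
    ++Runs;
    Cache.emplace(Key, Result);
    return Result;
  }

  // Drops every cached result for Unit, e.g. after a transform changed it.
  void invalidate(const std::string &Unit) {
    for (auto It = Cache.begin(); It != Cache.end();) {
      if (It->first.second == Unit)
        It = Cache.erase(It);
      else
        ++It;
    }
  }

  unsigned runs() const { return Runs; }

private:
  std::map<std::string, Analysis> Analyses;
  std::map<std::pair<std::string, std::string>, int> Cache;
  unsigned Runs = 0;
};
```

Requesting the same result twice runs the analysis once; after `invalidate` the next request recomputes it. A preservation system, which the message notes is still missing, would let a transform declare which cached results remain valid instead of dropping them all.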
*	Removing llvm::huge_vald and llvm::huge_vall because they are not currently ↵	Aaron Ballman	2013-11-13	1	-4/+0
	used, and HUGE_VALD does not appear to be supported everywhere anyways. llvm-svn: 194535
*	Replacing HUGE_VALF with llvm::huge_valf in order to work around a warning ↵	Aaron Ballman	2013-11-13	2	-3/+15
	triggered in MSVC 12. Patch reviewed by Reid Kleckner and Jim Grosbach. llvm-svn: 194533
*	Remove always true flag.	Rafael Espindola	2013-11-12	1	-7/+0
	llvm-svn: 194530
*	delinearization of arrays	Sebastian Pop	2013-11-12	4	-1/+19
	llvm-svn: 194527
*	remove virtual methods in SCEVApplyRewriter and SCEVParameterRewriter	Sebastian Pop	2013-11-12	1	-46/+94
	llvm-svn: 194526
*	Protect user-supplied runtime library functions in LTO	Justin Bogner	2013-11-12	1	-0/+3
	Add user-supplied C runtime and compiler-rt library functions to llvm.compiler.used to protect them from premature optimization by passes like -globalopt and -ipsccp. Calls to (seemingly unused) runtime library functions can be added by -instcombine and instruction lowering. Patch by Duncan Exon Smith, thanks! Fixes <rdar://problem/14740087> llvm-svn: 194514
*	Export intrinsics:__builtin_arm_{dmb,dsb} to frontend	Weiming Zhao	2013-11-12	1	-2/+2
	llvm-svn: 194505
*	GraphViz CFGPrinter: wrap long lines.	Andrew Trick	2013-11-12	1	-2/+19
	llvm-svn: 194496
*	whitespace	Andrew Trick	2013-11-12	1	-5/+5
	llvm-svn: 194495