bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[multiversion] Switch the TTI queries from TargetMachine to Subtarget	Chandler Carruth	2015-02-01	1	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we have a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738
*	[multiversion] Remove the cached TargetMachine pointer from the	Chandler Carruth	2015-02-01	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	intermediate TTI implementation template and instead query up to the derived class for both the TargetMachine and the TargetLowering. Most of the derived types had a TLI cached already and there is no need to store a less precisely typed target machine pointer. This will in turn make it much cleaner to look up the TLI via a per-function subtarget instead of the generic subtarget, and it will pave the way toward pulling the subtarget used for unroll preferences into the same form once we are always using the function to look up the correct subtarget. llvm-svn: 227737
*	[multiversion] Switch all of the targets over to use the	Chandler Carruth	2015-02-01	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TargetIRAnalysis access path directly rather than implementing getTTI. This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI. As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but it's trivial for them to add it now. llvm-svn: 227735
*	[multiversion] Remove a false freedom to leave the TargetMachine pointer	Chandler Carruth	2015-02-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	null. For some reason some of the original TTI code supported a null target machine. This seems to have been legacy, and I made matters worse when refactoring this code by spreading that pattern further through the various targets. The TargetMachine can't actually be null, and it doesn't make sense to support that use case. I've now consistently removed it and removed all of the code trying to cope with that situation. This is probably good, as several targets didn't cope with it being null despite the null default argument in their constructors. =] llvm-svn: 227734
*	[PM] Remove a bunch of stale TTI creation method declarations. I nuked	Chandler Carruth	2015-02-01	1	-4/+0
\| \| \| \| \| \| \|	their definitions, but forgot to clean up all the declarations which are in different files. llvm-svn: 227698
*	Fix typo	Matt Arsenault	2015-01-31	1	-1/+1
\| \| \| \|	llvm-svn: 227697
*	R600/SI: Only select cvt_flr/cvt_rpi with no NaNs.	Matt Arsenault	2015-01-31	1	-2/+4
\| \| \| \| \| \|	These have different behavior from cvt_i32_f32 on NaN. llvm-svn: 227693
*	[PM] Switch the TargetMachine interface from accepting a pass manager	Chandler Carruth	2015-01-31	4	-57/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI pass in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more than code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685
*	[PM] Change the core design of the TTI analysis to use a polymorphic	Chandler Carruth	2015-01-31	2	-67/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting it into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is a much more natural fit for the new pass manager. This will lay the groundwork for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. 
I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. 
I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669
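The commit above combines two C++ techniques: a value-semantic, type-erased wrapper and CRTP mix-in bases that call back into the most derived type. The sketch below illustrates that combination in miniature. It is not LLVM's TTI code: every name in it (CostInfo, CostInfoBase, DummyCostInfo, FancyTargetCostInfo, getBaseCost, getScaledCost) is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// CRTP mix-in (hypothetical): supplies defaults and calls back into Derived.
template <typename Derived>
class CostInfoBase {
public:
  // Default hook; a derived type may shadow it.
  unsigned getBaseCost() const { return 1; }

  // Delegates to the most derived getBaseCost(), not necessarily ours.
  unsigned getScaledCost(unsigned Factor) const {
    return static_cast<const Derived *>(this)->getBaseCost() * Factor;
  }
};

// "Dummy" implementation that keeps every default.
struct DummyCostInfo : CostInfoBase<DummyCostInfo> {};

// Hypothetical target implementation that shadows one hook.
struct FancyTargetCostInfo : CostInfoBase<FancyTargetCostInfo> {
  unsigned getBaseCost() const { return 4; }
};

// Type-erased wrapper: clients only ever see this class.
class CostInfo {
  // Internal polymorphic interface, hidden from clients.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getScaledCost(unsigned Factor) const = 0;
  };

  // Adapts any implementation type T to the internal interface.
  template <typename T>
  struct Model final : Concept {
    explicit Model(T Value) : Wrapped(std::move(Value)) {}
    unsigned getScaledCost(unsigned Factor) const override {
      return Wrapped.getScaledCost(Factor);
    }
    T Wrapped;
  };

  std::unique_ptr<Concept> Impl;

public:
  template <typename T>
  explicit CostInfo(T Target)
      : Impl(std::make_unique<Model<T>>(std::move(Target))) {}

  unsigned getScaledCost(unsigned Factor) const {
    return Impl->getScaledCost(Factor);
  }
};
```

Wrapping either implementation, for example `CostInfo(FancyTargetCostInfo{})`, yields the same client-facing type. The verbosity the commit message mentions is visible even here: each query is written once in the CRTP base, once in Concept, once in Model, and once in the wrapper.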
*	Reuse a bunch of cached subtargets and remove getSubtarget calls	Eric Christopher	2015-01-30	22	-167/+139
\| \| \| \| \| \|	without a Function argument. llvm-svn: 227638
*	R600/SI: Handle SI_SPILL_V96_RESTORE in SIRegisterInfo::eliminateFrameIndex()	Tom Stellard	2015-01-30	1	-0/+1
\| \| \| \| \| \|	This fixes a crash in Unigine Heaven. llvm-svn: 227618
*	R600/SI: Implement enableAggressiveFMAFusion	Matt Arsenault	2015-01-29	2	-1/+31
\| \| \| \| \| \| \| \| \|	Add tests for the various combines. This should always be at least cycle neutral on all subtargets for f64, and faster on some. For f32 we should prefer selecting v_mad_f32 over v_fma_f32. llvm-svn: 227484
*	R600/SI: Add subtarget feature for if f32 fma is fast	Matt Arsenault	2015-01-29	5	-5/+23
\| \| \| \|	llvm-svn: 227483
*	R600/SI: Fix tonga's basic scheduling model	Matt Arsenault	2015-01-29	1	-1/+1
\| \| \| \|	llvm-svn: 227482
*	Compute the ELF SectionKind from the flags.	Rafael Espindola	2015-01-29	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Any code creating an MCSectionELF knows ELF and already provides the flags. SectionKind is an abstraction used by common code that uses a plain MCSection. Use the flags to compute the SectionKind. This removes a lot of guessing and boilerplate from the MCSectionELF construction. llvm-svn: 227476
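As a rough illustration of deriving a section kind from ELF flags, the sketch below maps a section type and flag word to a coarse category. The flag and type values are the standard ELF ones, but the Kind enum, the kindFromFlags name, and the exact mapping are assumptions made for this example and are not taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Standard ELF section flag and type values (names here are local to the sketch).
constexpr std::uint64_t kShfWrite = 0x1;
constexpr std::uint64_t kShfAlloc = 0x2;
constexpr std::uint64_t kShfExecInstr = 0x4;
constexpr std::uint64_t kShfTls = 0x400;
constexpr std::uint32_t kShtProgBits = 1;
constexpr std::uint32_t kShtNoBits = 8;

// Hypothetical coarse classification, loosely modelled on a section kind.
enum class Kind { Metadata, Text, ReadOnly, Data, BSS, ThreadData, ThreadBSS };

// Assumed mapping: non-allocated sections are metadata, executable sections
// are text, and the remaining cases split on TLS, writability, and NOBITS.
Kind kindFromFlags(std::uint32_t Type, std::uint64_t Flags) {
  if (!(Flags & kShfAlloc))
    return Kind::Metadata;
  if (Flags & kShfExecInstr)
    return Kind::Text;
  const bool NoBits = (Type == kShtNoBits);
  if (Flags & kShfTls)
    return NoBits ? Kind::ThreadBSS : Kind::ThreadData;
  if (Flags & kShfWrite)
    return NoBits ? Kind::BSS : Kind::Data;
  return Kind::ReadOnly;
}
```

The point of the approach is that the caller supplies only the flags it already knows, and common code computes the abstraction instead of asking each creation site to guess it.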
*	R600/SI: Remove stray debug statements	Tom Stellard	2015-01-29	1	-5/+1
\| \| \| \|	llvm-svn: 227462
*	R600/SI: Define a schedule model and enable the generic machine scheduler	Tom Stellard	2015-01-29	4	-6/+94
\| \| \| \| \| \|	The schedule model is not complete yet, and could be improved. llvm-svn: 227461
*	R600: Move DataLayout to AMDGPUTargetMachine	Tom Stellard	2015-01-28	4	-22/+23
\| \| \| \| \| \| \| \|	This is a follow up to r227113. It is now required to use the amdgcn target for SI and newer GPUs. llvm-svn: 227316
*	R600: Use a Southern Islands GPU as the default for the amdgcn target	Tom Stellard	2015-01-28	2	-3/+7
\| \| \| \|	llvm-svn: 227314
*	R600/SI: Fix MIN3/MAX3 on VI, define MED3	Marek Olsak	2015-01-27	1	-9/+16
\| \| \| \|	llvm-svn: 227213
*	R600/SI: Don't set patterns for chip-specific instructions while having pseudos	Marek Olsak	2015-01-27	1	-50/+43
\| \| \| \| \| \| \| \| \| \| \|	Only pseudos have patterns on them. Also don't set the asm string for VINTRP_Pseudo. All pseudos should have empty asm. This matches what all other multiclasses do. llvm-svn: 227212
*	R600/SI: Add VI versions of LDS atomics	Marek Olsak	2015-01-27	3	-120/+139
\| \| \| \| \| \| \|	Each class is split into two: one adds let statements around non-pseudos, and the other one specifies the parameters. llvm-svn: 227211
*	R600/SI: Add VI versions of MUBUF atomics	Marek Olsak	2015-01-27	2	-73/+80
\| \| \| \|	llvm-svn: 227210
*	R600/SI: Add VI versions of MUBUF loads and stores	Marek Olsak	2015-01-27	3	-131/+39
\| \| \| \| \| \|	This enables a lot of existing patterns for VI. llvm-svn: 227209
*	R600/SI: Add pseudos for MUBUF loads and stores	Marek Olsak	2015-01-27	1	-103/+125
\| \| \| \| \| \| \| \| \|	This defines the SI versions only, so it shouldn't change anything. There are no changes other than using the new multiclasses, adding missing mayLoad/mayStore, and formatting fixes. llvm-svn: 227208
*	Move DataLayout back to the TargetMachine from TargetSubtargetInfo	Eric Christopher	2015-01-26	4	-18/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	derived classes. Since global data alignment, layout, and mangling are often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be laid out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets have had subtarget dependent code moved out and onto the TargetMachine. One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
*	R600/SI: Emit .hsa.version section for amdhsa OS	Tom Stellard	2015-01-23	1	-1/+13
\| \| \| \|	llvm-svn: 226970
*	R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()	Tom Stellard	2015-01-23	2	-3/+22
\| \| \| \| \| \| \| \| \| \| \|	We used to do this promotion during DAG legalization, but this caused an infinite loop in ExpandUnalignedLoad() because it assumed that i64 loads were legal if i64 was a legal type. It also seems better to report i64 loads as legal, since they actually are and we were just promoting them to simplify our tablegen files. llvm-svn: 226945
*	R600: Try to use lower types for 64bit division if possible	Jan Vesely	2015-01-22	2	-13/+39
\| \| \| \| \| \| \| \|	v2: add and enable tests for SI Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 226881
*	R600: Simplify LowerUDIVREM	Jan Vesely	2015-01-22	1	-19/+11
\| \| \| \| \| \| \| \| \| \| \|	optimizations can handle removing the Hi part operations. The generated code is identical for R600, ~10% icount reduction for SI v2: rebase Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 226879
*	R600/SI: Custom lower fround	Matt Arsenault	2015-01-21	4	-24/+117
\| \| \| \| \| \| \| \| \|	This fixes it for SI. It also removes the pattern used previously for Evergreen for f32. I'm not sure if the new R600 output is better or not, but it uses 1 fewer instructions if BFI is available. llvm-svn: 226682
*	R600/SI: Add subtarget feature to enable VGPR spilling for all shader types	Tom Stellard	2015-01-20	9	-11/+36
\| \| \| \| \| \| \|	This is disabled by default, but can be enabled with the subtarget feature: 'vgpr-spilling' llvm-svn: 226597
*	R600/SI: Fix simple-loop.ll test	Tom Stellard	2015-01-20	2	-5/+9
\| \| \| \|	llvm-svn: 226596
*	R600/SI: Remove stray debugging code from r226586	Tom Stellard	2015-01-20	1	-2/+0
\| \| \| \|	llvm-svn: 226591
*	R600/SI: Use external symbols for scratch buffer	Tom Stellard	2015-01-20	9	-82/+92
\| \| \| \| \| \| \| \|	We were passing the scratch buffer address to the shaders via user sgprs, but now we use external symbols and have the driver patch the shader using reloc information. llvm-svn: 226586
*	R600/SI: Add kill flag when copying scratch offset to a register	Tom Stellard	2015-01-20	1	-1/+1
\| \| \| \| \| \| \|	This allows us to re-use the same register for the scratch offset when accessing large private arrays. llvm-svn: 226585
*	R600/SI: Don't store scratch buffer frame index in MUBUF offset field	Tom Stellard	2015-01-20	1	-16/+0
\| \| \| \| \| \| \| \|	We don't have a good way of legalizing this if the frame index offset is more than 12 bits, which is the size of MUBUF's offset field, so now we store the frame index in the vaddr field. llvm-svn: 226584
*	R600/SI: Update SIInstrInfo::verifyInstruction() after r225662	Tom Stellard	2015-01-20	1	-6/+12
\| \| \| \| \| \| \|	Now that we have our own custom register operand types, we need to handle them in the verifier. llvm-svn: 226583
*	Add r224985 back with fixes.	Rafael Espindola	2015-01-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fixes are to note that AArch64 has additional restrictions on when local relocations can be used. In particular, ld64 requires that relocations to cstring/cfstrings use linker visible symbols. Original message: In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF is to use relocations with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 226503
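The non-zero addend rule described above can be stated as a small predicate. The sketch below is only a restatement of that rule under simplifying assumptions: the RelocTarget struct and relocationNeedsSymbol function are hypothetical names, and the AArch64 and ld64 restrictions mentioned in the message are deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of what a relocation points at.
struct RelocTarget {
  bool InMergeableSection; // section merged by content (strings, constants)
  std::int64_t Addend;     // e.g. 1 for the expression `L0 + 1`
};

// A relocation into a content-merged section with a non-zero addend must name
// the symbol: section + offset alone cannot distinguish "end of this string"
// from "start of the next one". With a zero addend the symbol can be dropped.
bool relocationNeedsSymbol(const RelocTarget &Target) {
  return Target.InMergeableSection && Target.Addend != 0;
}
```

For the `.long L0 + 1` example, a target of `{true, 1}` requires the symbol, while the common `{true, 0}` case does not, which is where the efficiency gain described in the message comes from.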
*	std::unique_ptrify the MCStreamer argument to createAsmPrinter	David Blaikie	2015-01-18	2	-6/+9
\| \| \| \|	llvm-svn: 226414
*	R600/SI: Add patterns for v_cvt_{flr\|rpi}_i32_f32	Matt Arsenault	2015-01-15	2	-2/+21
\| \| \| \|	llvm-svn: 226230
*	R600/SI: Fix trailing comma with modifiers	Matt Arsenault	2015-01-15	1	-1/+1
\| \| \| \| \| \| \|	Instructions with 1 operand can still use source modifiers, so make sure we don't print an extra comma afterwards. llvm-svn: 226226
*	R600/SI: Unify VOP2 instructions which are VOP3-only on VI	Marek Olsak	2015-01-15	3	-77/+56
\| \| \| \| \| \| \| \| \| \| \| \| \|	This removes some duplicated classes and definitions. These instructions are defined: _e32 // pseudo _e32_si _e64 // pseudo _e64_si _e64_vi llvm-svn: 226191
*	R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI	Marek Olsak	2015-01-15	2	-4/+4
\| \| \| \|	llvm-svn: 226190
*	R600/SI: Add V_READLANE_B32 and V_WRITELANE_B32 for VI	Marek Olsak	2015-01-15	2	-11/+28
\| \| \| \| \| \| \| \|	These are VOP3-only on VI. The new multiclass doesn't define VOP3 versions of VOP2 instructions. llvm-svn: 226189
*	R600/SI: Don't shrink instructions whose e32 encoding doesn't exist	Marek Olsak	2015-01-15	8	-42/+57
\| \| \| \| \| \| \| \|	v2: modify hasVALU32BitEncoding instead v3: - add pseudoToMCOpcode helper to AMDGPUInstrInfo, which is used by both hasVALU32BitEncoding and AMDGPUMCInstLower::lower - report an error if a pseudo can't be lowered llvm-svn: 226188
*	R600/SI: Add common class VOPAnyCommon	Marek Olsak	2015-01-15	1	-23/+13
\| \| \| \|	llvm-svn: 226187
*	R600/SI: Don't select SI-only VOP3 opcodes on VI	Marek Olsak	2015-01-15	1	-17/+20
\| \| \| \|	llvm-svn: 226186
*	Revert "Add r224985 back with two fixes."	Rafael Espindola	2015-01-14	1	-1/+1
\| \| \| \| \| \|	This reverts commit r225644 while I debug a regression. llvm-svn: 226022
*	R600/SI: Use IMPLICIT_DEF and KILL when failing to spill VGPRs	Tom Stellard	2015-01-14	1	-3/+2
\| \| \| \| \| \| \|	This helps us avoid 'invalid register class for operand' verifier errors. llvm-svn: 225989