Presentation: "Dramatic scalability for data intensive architectures"
Time: Thursday 14:30 - 15:30
Location: Rutherford Room
Increasingly, today's data-intensive applications have to deal with data from multiple data repositories and aggregate it in real time with streaming events (rich e-commerce portals, electronic trading, web 2.0, and grid-centric applications such as risk analytics and fraud detection). Traditional architectures that use simple clustering and databases for state management, and messaging solutions for sharing events, are being replaced with a middle-tier, memory-oriented data fabric or grid: a sophisticated caching infrastructure that provides most of the key semantics available in an ACID database, along with the ability to continuously analyze flowing events and generate derived events at a predictably high rate.
We present an architecture, along with concepts, that addresses the following two issues:
- How do you provide instantaneous access to data that is changing rapidly, large in volume, and shared by clustered applications that might be spread across a wide area network, without incurring the high latency of disk access, the high setup cost of high availability, or the data consistency issues of asynchronous publish-subscribe?
- How do you support thousands of concurrent clients that want to express complex interest in fast-moving data and be notified reliably with predictable latency?
The presentation covers topics such as data partitioning across a cluster, process-data affinity, dealing with data hotspots by repartitioning on the fly, and dealing with computational hotspots through load conditioning and shedding techniques. On the event-processing front, we talk about the new paradigm of "continuous querying" on partitioned data in memory.
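
To give a rough feel for two of these ideas, the sketch below combines hash-based data partitioning with a simple form of continuous querying in a single Python process. It is an illustration only and not the architecture presented in the talk: the class `PartitionedStore` and its methods are hypothetical names, the "cluster" is just a list of in-memory dictionaries, and repartitioning, replication, and load shedding are left out.

```python
import zlib
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[str, Any], bool]
Listener = Callable[[str, Any], None]


class PartitionedStore:
    """Hypothetical in-memory store: keys are hashed onto a fixed set of partitions."""

    def __init__(self, num_partitions: int = 4) -> None:
        # Each dict stands in for the memory of one cluster member (assumption).
        self.partitions: List[Dict[str, Any]] = [{} for _ in range(num_partitions)]
        # Registered continuous queries: (predicate, listener) pairs.
        self.queries: List[Tuple[Predicate, Listener]] = []

    def partition_for(self, key: str) -> int:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def register_query(self, predicate: Predicate, listener: Listener) -> None:
        # A continuous query stays registered and is evaluated on every later update.
        self.queries.append((predicate, listener))

    def put(self, key: str, value: Any) -> None:
        self.partitions[self.partition_for(key)][key] = value
        # Evaluate each standing query against the new value and notify on a match.
        for predicate, listener in self.queries:
            if predicate(key, value):
                listener(key, value)

    def get(self, key: str) -> Any:
        return self.partitions[self.partition_for(key)].get(key)


store = PartitionedStore(num_partitions=4)
alerts: List[Tuple[str, float]] = []

# Standing interest: notify whenever any price above 100.0 is written.
store.register_query(
    lambda key, value: value["price"] > 100.0,
    lambda key, value: alerts.append((key, value["price"])),
)

store.put("IBM", {"price": 95.0})    # no match, no notification
store.put("IBM", {"price": 101.5})   # matches, listener is called
store.put("MSFT", {"price": 30.0})   # no match
```

After these three updates, `alerts` holds the single entry `("IBM", 101.5)`. The point of the design is that the query is evaluated where and when the data changes, so clients are pushed matching events instead of polling the store.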





