-
In essence, data stores are distributed key/value stores that provide unlimited scalability for storing data that is modeled closely on objects, removing the need for ORM plumbing code. The downsides are the loss of data integrity, which has to be managed by application code, and the difficulty of performing business intelligence on the data.
-
Database normalization is a technique for designing relational database schemas that ensures that the data is optimal for ad-hoc querying and that modifications such as deletion or insertion of data do not lead to data inconsistency. Database denormalization is the process of optimizing your database for reads by creating redundant data. A consequence of denormalization is that insertions or deletions could cause data inconsistency if they are not applied uniformly to all redundant copies of the data within the database.
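To make the trade-off concrete, here is a minimal sketch in Python using plain dictionaries instead of a real database. The customer/order data and all function names are hypothetical, invented only for illustration. It shows that an update to normalized data touches one place, while a partial update to denormalized data leaves the redundant copies disagreeing.

```python
# Normalized: the customer's city is stored in exactly one place.
customers = {1: {"name": "Ada", "city": "London"}}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 1},
]

# Denormalized: the city is copied into every order so reads need no join.
orders_denorm = [
    {"order_id": 100, "customer_id": 1, "city": "London"},
    {"order_id": 101, "customer_id": 1, "city": "London"},
]


def city_for_order_normalized(order_id):
    """Look up the city through the customer record (a 'join')."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return customers[order["customer_id"]]["city"]


def update_city_normalized(customer_id, city):
    """One write updates the single authoritative copy."""
    customers[customer_id]["city"] = city


def update_city_denorm_partial(order_id, city):
    """Deliberately buggy: updates only one of the redundant copies."""
    for o in orders_denorm:
        if o["order_id"] == order_id:
            o["city"] = city


def cities_seen_denorm(customer_id):
    """Collect every city value stored for this customer's orders."""
    return {o["city"] for o in orders_denorm if o["customer_id"] == customer_id}
```

After `update_city_normalized(1, "Paris")`, every order reports the new city. After `update_city_denorm_partial(100, "Paris")`, `cities_seen_denorm(1)` contains both the old and the new value, which is the inconsistency the paragraph above warns about.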
-
The difficulty in these systems comes from the fact that large amounts of data need to be moved around every day. Thus, although hundreds of gigabytes or terabytes of data are not too difficult to handle when sitting still in a storage system, the problem becomes much, much harder when the data must be transformed to support quick lookups and moved between systems on a daily basis.
This post describes the system we built to deploy data to the live site using our key-value storage system, Project Voldemort.
-
GemStone has unveiled GemFire 6.0, which is the culmination of several years of development and the continuous solving of the hardest data management problems in the world. With this release, GemFire touts some of the latest innovative features in data management.
-
A few weeks ago, I posted about a conversation I had with Jeff Hammerbacher of Cloudera, in which he discussed a Hadoop-based effort at Facebook he previously directed. Subsequently, Ashish Thusoo and Joydeep Sarma of Facebook contacted me to expand upon and, in a couple of instances, correct what Jeff had said. They also filled me in on Hive, a data-manipulation add-on to Hadoop that they developed and subsequently open-sourced.
-
I'm getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as grid computing? And what is the best solution for our situation? So I decided to write a two-part article about distributed computing. The first part covers the distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems.
-
An interesting observation in the talk is that the more robust products are internal to large companies like Amazon and Google or are commercial. A lot of the open source products aren't yet considered ready for prime time, and Bob encourages developers to join a project and make patches rather than start yet another half-finished key-value store clone. From my monitoring of the interwebs, this does seem to be happening, and existing products are starting to mature.
-
attempted to pull together a cross-platform Nagios plugin that did its best to give me what I wanted, and what do you know, it works!
-
HBase: Bigtable-like structured storage for Hadoop HDFS
-
This article is about how to monitor Linux and Windows hosts with SNMP (version 2c) and Cacti.
-
While playing around with initrd images a few weeks back, I came across the mkinitrd “--with” option. This option allows you to add additional modules to an initrd image, which is useful when you have a new storage or Ethernet driver that isn’t supported by the base operating system.
-
If you pass a variable to your job through qsub or qrsh with the -v switch, and that variable starts with SGE_COMPLEX_, the SGE_COMPLEX_ part will be stripped off and the remainder will be treated as a resource request whose value will be placed in the job's environment.
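The renaming rule described above can be modeled in a few lines of Python. This is only a sketch of one reading of the excerpt, not Grid Engine code: the function name `split_job_variables` and the example variable names are hypothetical, and the assumption that the value lands in the environment under the stripped name is my interpretation of the sentence.

```python
PREFIX = "SGE_COMPLEX_"


def split_job_variables(variables):
    """Model the described rule for variables passed with -v.

    Returns (resource_requests, job_environment). Assumption: a variable
    whose name starts with PREFIX becomes a resource request under the
    stripped name, and its value appears in the environment under that
    same stripped name. Other variables pass through unchanged.
    """
    resource_requests = {}
    job_environment = {}
    for name, value in variables.items():
        if name.startswith(PREFIX):
            stripped = name[len(PREFIX):]
            resource_requests[stripped] = value
            job_environment[stripped] = value
        else:
            job_environment[name] = value
    return resource_requests, job_environment
```

For example, calling `split_job_variables({"SGE_COMPLEX_mem": "4G", "MODE": "fast"})` yields a request for `mem` with value `4G`, while `MODE` is passed through to the environment untouched.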