
Tag Archives: analytics

The R language is an open source project run by a very large community of statisticians who have so far created over 2,500 plug-ins for analyzing data sets, many of them tuned to a particular industry or data type. More recently, Oracle has jumped into the picture with an analytical product that connects R to its flagship 11g R2 database.

R is Ready for Big Data: Take the Open Road to Statistical Analysis

By Timothy Prickett Morgan

Statistical analysis has been around since mainframes were introduced to academia and corporations back in the 1960s.

But the great diversity of telemetry collected by systems today, the need to sift through it for insight and the growing popularity of open-source alternatives is transforming the R programming language for statistical analysis and visualisation. Its new nickname is Red Hat for stats…

http://www.theregister.co.uk/2012/06/03/big_data_r_statistical_analysis/

Oracle’s implementation of R includes some integration that lets the R console work with Hadoop distributed file systems and NoSQL databases, both of which are core components of the “Big Data” world.

NoSQL is an alternative to the standard Relational Database Management System (RDBMS). It is reported to query large data sets (also referred to as “big data”) quickly and efficiently. It is not intended as a replacement for SQL databases; rather, it offers a specialized solution for particular data store problems across an enterprise IT organization.

This article can be found online at the following location:

http://www.databasejournal.com/news/mongodb-nosql.html

MongoDB – Why Does NoSQL Matter?
November 7, 2011

In recent years, the drumbeat of vendors proclaiming the ascendancy of NoSQL has become increasingly loud. One of the NoSQL vendors that is seeing business results from its NoSQL solution is 10gen, which is the lead commercial sponsor behind the open source MongoDB NoSQL database.

“We’re seeing the NoSQL space really taking off now and it’s being used in a significant way by a lot of people, including a lot of large enterprises,” Dwight Merriman, CEO and co-founder of 10gen, told InternetNews.com. “So big Internet companies like Craigslist and Shutterfly as well as big companies like SAP, Telefonica and LexisNexis are using it.”

Competition from the traditional database vendors also helps to validate the NoSQL space. At the OpenWorld Conference last month, Oracle announced its own NoSQL solution.

From a competitive perspective, Merriman sees MongoDB as the most popular NoSQL database in use today. His metrics for popular are somewhat indirect, though, and include MongoDB job postings and Google search popularity.

In terms of where NoSQL is fitting into enterprises, Merriman isn’t necessarily seeing NoSQL as a replacement for SQL databases but rather as a new tier of database technology usage. He noted that most Fortune 500 enterprises have an Oracle type relational database in their organizations already. The relational database is used for Online Transaction Processing (OLTP). Most big enterprises also have some form of data warehouse and a business reporting and intelligence database.

“Basically every large enterprise in the world has those two buckets for sure and what we’re seeing are enterprises adding a third bucket, which is a NoSQL basis,” Merriman said. “So on a forward basis, enterprises will have three classes of databases instead of two.”

When it comes to new Internet companies, Merriman expects that those companies will choose only one of the three types of databases. The chosen database will likely be the one that maps closest to the problem the company is trying to solve.

Merriman noted that to use MongoDB, an organization must write new code.

“It doesn’t use SQL so if your old code assumes a relational data model so you have to write new code,” Merriman said.

For those that are looking to migrate to MongoDB, Merriman said enterprise developers need to re-write the code that talks to the database. He noted that there is not a lot of tooling for migration, since it’s not easy to automate.

“There are all the normal tools you’d expect in the database for import, export and monitoring of data,” Merriman said. “They’re just not specifically designed for migration.”

MongoDB 2.x

MongoDB 2.0 was recently released providing new concurrency features to the open source NoSQL database.

For the roadmap moving forward, Merriman said there is continuing work to further improve concurrency in the database. Additionally there is a new aggregation framework in the works that will make operations where users are aggregating and merging statistics easier and faster than the current model. MongoDB developers are also working on full-text search for a future release of MongoDB.

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of Internet.com, the network for technology professionals.

Something tells me we’ll be seeing more of NoSQL as large and influential organizations such as Oracle and Google continue to use it as one of three database tiers within their technical architecture. I suspect that in most situations it will not replace the existing RDBMS, but will instead be chosen as the data store for newer projects and applications.
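To make Merriman's point about rewriting code a little more concrete, here is a minimal sketch in Python. It does not use MongoDB or its drivers: the relational side runs on the standard library's sqlite3 module, and a plain dict stands in for a stored document. The table names, field names, and sample order are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Relational model: an order and its line items live in two tables
# linked by a key. (Hypothetical schema, for illustration only.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'acme');
    INSERT INTO items  VALUES (1, 'widget', 2);
    INSERT INTO items  VALUES (1, 'gadget', 5);
""")


def total_qty_sql(order_id):
    """Sum the item quantities for one order with a SQL query."""
    row = conn.execute(
        "SELECT SUM(qty) FROM items WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]


# Document model: the same order is a single nested record.
# A plain dict stands in for a stored document here.
order_doc = {
    "_id": 1,
    "customer": "acme",
    "items": [
        {"sku": "widget", "qty": 2},
        {"sku": "gadget", "qty": 5},
    ],
}


def total_qty_doc(doc):
    """Sum the item quantities by walking the nested document."""
    return sum(item["qty"] for item in doc["items"])


print(total_qty_sql(1), total_qty_doc(order_doc))
```

Both functions answer the same question, but the data access code has nothing in common: one builds a SQL statement against two tables, the other walks a nested structure. That is roughly why code written against a relational model cannot simply be pointed at a document store.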

For all you database professionals out there, here’s a great article on a real-world use case of analytic data. The situation was as follows:

 The initial report breaks down the data into per capita spending, prices paid per service, out-of-pocket spending, utilization, and the mix or intensity of services used. The report covers data from 2010 for 33 million privately insured people.

What they found was that the cost of the care provided, not utilization, was the main driver of expense in the claims data supplied to the U.S. Health Care Cost Institute (HCCI) by its participating insurers, which included Aetna, Humana and UnitedHealthcare.

Check out some of the metrics that they uncovered from analyzing the HCCI database of health care costs. Comparisons against prior years were also made.

Published on Business Finance (http://businessfinancemag.com)

Mining the Health Care Cost Database
by Joanne Sammer
Created 05/24/2012 – 17:50

A treasure trove of health care cost information is now available to researchers and others interested in understanding the trends and drivers of health care costs in the U.S. The Health Care Cost Institute (HCCI) issued its first Health Care Cost and Utilization Report based on data from three billion (yes, that’s billion with a “b”) health care claims provided by Aetna, Humana and UnitedHealthcare, three of the nation’s largest health plans, with Kaiser Permanente data to be added in the near future.

This database provides employers with important benchmarking data and opportunities to analyze trends in inpatient and outpatient care, professional services, and prescription drugs obtained by privately insured patients in employer-sponsored group health insurance plans. The initial report breaks down the data into per capita spending, prices paid per service, out-of-pocket spending, utilization, and the mix or intensity of services used. The report covers data from 2010 for 33 million privately insured people.

The key finding from this first report is the fact that it is the cost of care and not utilization that is the main driver of health costs. Although the overall use of health care services declined in 2010, overall costs did not. The spending increase in 2010 was lower at 3.3% for individuals under age 65 than it had been the prior two years, but it was still driven largely by prices for the privately insured that grew more than utilization. For example, prices increased significantly for two key elements of care – hospital care (5.1%) and ambulatory care facilities (10.1%). Meanwhile, usage dropped by more than 5% for medical inpatient admissions, emergency room visits, primary care provider office visits and radiology procedures.

Other key findings include the following:

    • Average per capita spending on health care services was $4,255 for the entire sample, while per capita spending was much higher ($8,327) for people aged 55-64 and much lower ($2,123) for those under 18.
    • Out-of-pocket per capita spending was $689 in 2010, which represents an increase of 7.1% from 2009.
    • The price for an average hospital stay increased 5.1% to $14,662 in 2010, while the cost of an average emergency room visit increased 11% to $1,327. The latter figure emphasizes the potential payoff for any employer effort to reduce unnecessary emergency room visits, particularly if patients rely on the emergency room for non-emergency situations.
    • The average cost of a prescription increased 3% from $80 to $82 in 2010, as brand name drug prices increased 13% and generic drug prices decreased by 6.3%. On average, each insured person filled more than nine prescriptions in 2010. The number of brand name prescriptions dropped by nearly 4%, while the number of generic prescriptions increased by 2.5%. Once again, this data shows the potential cost savings associated with efforts to encourage the use of available generics in lieu of brand name drugs.

Overall, total spending rose even though people used fewer services than in the previous year. It isn’t that many more people are getting older and more frail, nor is there an increasing trend in hypochondria. People are asking for no more care than before; the care provided simply costs more to deliver, for one reason or another. An interesting find for the analytic teams involved…
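The arithmetic behind that conclusion can be sketched in a few lines of Python, under the simplifying assumption that spending equals average price times utilization. The function name and the input percentages below are illustrative only; they are not figures from the HCCI report.

```python
def spending_change(price_change, utilization_change):
    """Fractional change in spending, assuming spending = price * utilization.

    Both arguments are fractional changes, e.g. 0.06 for +6%.
    """
    return (1 + price_change) * (1 + utilization_change) - 1


# Illustrative inputs only (not taken from the report):
# prices rise 6% while utilization falls 2%.
change = spending_change(0.06, -0.02)
print(f"Spending change: {change:.2%}")
```

With these made-up inputs, spending still rises by a little under 4% even though fewer services are used, which is the same pattern the report describes: a price increase can more than offset a drop in utilization.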
