MIT's Data Science Machine Is All Good Without Humans

The world is generating data at an unthinkable speed, and computer engineers face a big new challenge: analysing it all to make the data meaningful and useful. Typical big-data analysis relies on human intuition to tell the computer which 'features' of the data to analyse in order to uncover hidden patterns that can be used to make predictions. For example, if you are analysing how people click the 'like' button on Facebook over the course of a week, you have to tell the computer that the time span between two likes matters more than the total number of likes.
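As a rough illustration of the 'like' example, the sketch below computes both candidate features from a list of like timestamps: the raw count and the average gap between consecutive likes. The function name `like_features` and the sample timestamps are hypothetical; this is not code from the MIT project.

```python
from datetime import datetime
from statistics import mean


def like_features(timestamps):
    """Hypothetical feature extractor: like count vs. average gap between likes."""
    ordered = sorted(timestamps)
    # Gaps between consecutive likes, converted to hours
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
    return {
        "like_count": len(ordered),
        # With fewer than two likes there is no gap to measure
        "mean_gap_hours": mean(gaps) if gaps else None,
    }


likes = [
    datetime(2015, 10, 19, 9, 0),
    datetime(2015, 10, 19, 21, 0),
    datetime(2015, 10, 21, 9, 0),
]
print(like_features(likes))  # {'like_count': 3, 'mean_gap_hours': 24.0}
```

A human analyst would normally have to decide that `mean_gap_hours` is worth computing at all; that choice is the step the Data Science Machine aims to automate.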

A team of engineers from MIT wants machines to work completely independently of human intervention and has developed a "Data Science Machine". The machine not only finds patterns in the data but also designs the feature sets itself, without any human input.

The research team entered the Data Science Machine in three different competitions, where it competed directly against human teams. The task in each was to discover predictive patterns in unfamiliar data sets. Across the three competitions, the Data Science Machine beat 615 of the 906 participating teams.

In two of the three competitions, the machine's predictions were 94% and 95% as accurate as the winning entries from human teams. In the third competition, its entry was about 87% as accurate as the winner, but the researchers point out that the machine needed only 2 to 12 hours to produce each result, while the rival human teams worked on their algorithms for several months.

The Data Science Machine grew out of Max Kanter's master's thesis. Kanter sees the machine as a natural complement to human intelligence. He says there is so much data waiting to be analysed that is currently just sitting idle, and a solution is needed so that we can at least get started on it.

Kanter is expected to present the Data Science Machine in a paper at the IEEE International Conference next week, together with his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL).

At CSAIL, Veeramachaneni is involved with the Anyscale Learning group, which applies AI and machine-learning techniques to practical problems such as estimating how much power a wind farm can generate and predicting which students are likely to drop out of an online course.

According to Veeramachaneni, the team's experience solving a variety of data science problems for industry shows that the essential step is deciding which variables to extract from the database. For example, to identify students who are likely to drop out of an online course, the variables to extract might include how much time a student spends on the website relative to other course takers, and how long before the deadline the student starts working on the homework.
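The two dropout variables mentioned above can be sketched in a few lines of Python. The student names, minutes on site, homework start times, and deadline below are all made-up sample data, and `dropout_features` is a hypothetical helper; the sketch only shows what computing such variables might look like, not how the researchers' system does it.

```python
from datetime import datetime
from statistics import mean

# Hypothetical course data: minutes each student spent on the course site,
# and when each student first opened the current homework.
minutes_on_site = {"alice": 300, "bob": 60, "carol": 240}
homework_started = {
    "alice": datetime(2015, 10, 10, 9, 0),
    "bob": datetime(2015, 10, 14, 20, 0),
    "carol": datetime(2015, 10, 12, 12, 0),
}
deadline = datetime(2015, 10, 15, 0, 0)


def dropout_features(student, minutes_on_site, homework_started, deadline):
    """Build the two candidate variables described in the article for one student."""
    # Time on site relative to the class average (1.0 = an average student)
    class_average = mean(minutes_on_site.values())
    relative_time = minutes_on_site[student] / class_average
    # How many hours before the deadline the student started the homework
    lead_time = deadline - homework_started[student]
    return {
        "relative_time_on_site": relative_time,
        "hours_before_deadline": lead_time.total_seconds() / 3600,
    }


for name in minutes_on_site:
    print(name, dropout_features(name, minutes_on_site, homework_started, deadline))
```

The sketch only builds the inputs; whether these variables actually predict dropout is something a model trained on real data would have to establish.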


Kanter and Veeramachaneni use several tricks to generate candidate features for the analysis. Databases typically store different types of data in separate tables and use numerical identifiers to indicate the relationships between them. The Data Science Machine tracks these relationships and uses them as a guide for feature construction.

For example, suppose an e-commerce website's database has one table listing retail items and their costs, and a second table recording individual purchases. The Data Science Machine would begin by importing the costs from the first table into the second. It would then combine the costs of all items that share the same purchase number to generate candidate features such as average cost per order and total cost per order. As the numerical identifiers link more and more tables together, the machine layers operations on top of each other, so the output of one aggregation becomes the input to the next.
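The join-then-aggregate idea from the e-commerce example can be sketched with plain Python dictionaries. The table layouts, the functions `order_features` and `customer_features`, and the extra `orders` table mapping orders to customers (added only to show one aggregation stacked on another) are all hypothetical; this is an illustration of the general technique, not the researchers' implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tables linked by numerical identifiers.
items = {1: 2.50, 2: 10.00, 3: 4.00}               # item_id -> cost
purchases = [(100, 1), (100, 2),                   # (order_id, item_id)
             (101, 3), (101, 3), (101, 1),
             (102, 2)]
orders = {100: "cust_a", 101: "cust_a", 102: "cust_b"}  # order_id -> customer


def order_features(items, purchases):
    """Import item costs into purchases, then aggregate per order."""
    costs_by_order = defaultdict(list)
    for order_id, item_id in purchases:
        # Follow the shared item_id to pull the cost from the items table
        costs_by_order[order_id].append(items[item_id])
    return {
        order_id: {"total_cost": sum(costs), "avg_cost": mean(costs)}
        for order_id, costs in costs_by_order.items()
    }


def customer_features(orders, per_order):
    """Stack a second aggregation on top of the per-order features."""
    totals_by_customer = defaultdict(list)
    for order_id, feats in per_order.items():
        totals_by_customer[orders[order_id]].append(feats["total_cost"])
    # An average of per-order sums: one operation layered on another
    return {cust: {"avg_order_total": mean(t)} for cust, t in totals_by_customer.items()}


per_order = order_features(items, purchases)
print(per_order)
print(customer_features(orders, per_order))
```

Each new identifier that links two tables offers another level at which the same aggregations can be applied again, which is why the number of candidate features grows quickly.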

The machine also looks for categorical data, that is, values that appear to be restricted to a limited set, such as days of the week or brand names. It can then use those categories in its analysis, for example by computing a feature separately for each day of the week or each brand.
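A minimal sketch of the categorical step, assuming a simple heuristic that is not from the source: a column counts as categorical if all its values are strings and it has at most `max_distinct` distinct values. The helpers `looks_categorical` and `split_by_category` and the sample rows are hypothetical; the sketch then splits an existing feature (average cost) across each detected category.

```python
from collections import defaultdict
from statistics import mean


def looks_categorical(values, max_distinct=10):
    """Assumed heuristic: string values drawn from a small set of distinct values."""
    return all(isinstance(v, str) for v in values) and len(set(values)) <= max_distinct


def split_by_category(rows, category_key, value_key):
    """Compute the mean of value_key separately for each category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    return {cat: mean(vals) for cat, vals in groups.items()}


rows = [
    {"day": "Mon", "brand": "Acme", "cost": 10.0},
    {"day": "Mon", "brand": "Zenith", "cost": 30.0},
    {"day": "Sat", "brand": "Acme", "cost": 50.0},
]

for column in rows[0]:
    values = [r[column] for r in rows]
    if looks_categorical(values):
        # e.g. day {'Mon': 20.0, 'Sat': 50.0}
        print(column, split_by_category(rows, column, "cost"))
```

Here the numeric `cost` column is skipped, while `day` and `brand` each produce a new per-category feature.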

For further details on the Data Science Machine and the research behind it, check out the source link below.

Source: Automating big-data analysis | MIT News | Massachusetts Institute of Technology
