A key issue today is that data is siloed, whether it is personal data, data inside an organization, or data shared across different organizations. Data discovery and integration are difficult and present complex technical, organizational, and policy challenges. A Living Lab allows MIT to serve as a microcosm for many big data efforts, whether in government or in industry. One of our goals is to work with MIT to open up repositories of information on campus that contain the data needed to discover valuable new insights about important topics such as wellness, innovation, learning, and sustainability. MIT is well positioned to take a leadership role in demonstrating not only how organizations can leverage data in the future, but also how we collect, manage, and use personal information, from setting appropriate privacy policies to demonstrating systems that can implement those policies in practice.
The Living Lab is developing a scalable data management platform that allows us to collect and integrate multiple types of data, including personal data or “small data” (collected by smartphones, activity tracking devices, or new wearable sensors); MIT data (wifi data, campus maps, event data, etc.); and external data (social media data, transportation data, weather, city data, etc.).
Developed by the Database Group at CSAIL, led by Vartak, Madden, et al., ModelDB is an end-to-end system that tracks models as they are built, extracts and stores relevant metadata (e.g., hyperparameters, data sources) for models, and makes this data available for easy querying and visualization.
The codebase and instructions for getting set up are available for public use, and a short paper from the HILDA workshop, SIGMOD 2016, is available for reading.
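To make the idea of tracking model metadata concrete, here is a minimal sketch in Python. It is not ModelDB's API: the table schema and the function names (`open_store`, `log_run`, `best_run`) are hypothetical, chosen only to illustrate storing hyperparameters and data sources alongside a result and querying them later.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory SQLite table for model runs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        "id INTEGER PRIMARY KEY, model_type TEXT, data_source TEXT, "
        "hyperparameters TEXT, metric REAL)"
    )
    return conn


def log_run(conn, model_type, data_source, hyperparameters, metric):
    """Store one model's metadata; hyperparameters are serialized as JSON."""
    cur = conn.execute(
        "INSERT INTO runs (model_type, data_source, hyperparameters, metric) "
        "VALUES (?, ?, ?, ?)",
        (model_type, data_source, json.dumps(hyperparameters, sort_keys=True), metric),
    )
    conn.commit()
    return cur.lastrowid


def best_run(conn, data_source):
    """Return (model_type, hyperparameters, metric) with the highest metric, or None."""
    row = conn.execute(
        "SELECT model_type, hyperparameters, metric FROM runs "
        "WHERE data_source = ? ORDER BY metric DESC LIMIT 1",
        (data_source,),
    ).fetchone()
    if row is None:
        return None
    return row[0], json.loads(row[1]), row[2]
```

Logging two runs against the same data source and calling `best_run` returns the run with the higher metric, which is the kind of query a model-tracking store is meant to make easy.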
Developed by the Database Group at CSAIL, led by Castro-Fernandez and Madden et al., Aurum is a system for tackling data discovery problems at scale. It introduces a new discovery algebra, R2QL, that lets users declare their intuition of what is relevant through a set of data primitives that expose the relations in the underlying data. The algebra relies on a metaschema graph to answer queries in human-scale latencies. Furthermore, Aurum is scalable: it builds the metaschema graph in linear time, despite the difficulty of extracting complex relationships among thousands of data sources.
Aurum’s codebase is available for public use. A position paper is available from the ACM. You can test drive Aurum on the State of Massachusetts open data.
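As a rough illustration of a graph that relates columns across data sources, the Python sketch below links two columns when their values overlap. It is a toy under stated assumptions, not Aurum's implementation or R2QL: the function names and the Jaccard threshold are hypothetical, and the pairwise comparison here is quadratic, unlike the linear-time construction described above.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(columns, threshold=0.5):
    """Link columns whose values overlap enough.

    columns maps (table, column) -> list of values. The result maps each
    column to the set of columns it is linked to.
    """
    graph = {name: set() for name in columns}
    for left, right in combinations(columns, 2):
        if jaccard(columns[left], columns[right]) >= threshold:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def similar_columns(graph, column):
    """A simple discovery primitive: columns linked to the given one, sorted."""
    return sorted(graph.get(column, ()))
```

Given a column of building identifiers in one table and a mostly overlapping column in another, `similar_columns` surfaces the second as a candidate for joining, which is the sort of relationship a discovery query would navigate.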
Developed by Lincoln Labs, led by Vijay Gadepally et al., the BigDAWG polystore is a federated database system for multiple, disparate data models. It supports location transparency and semantic completeness through islands of information, each of which supports a data model, a query language, and a candidate set of database engines. A prototype of the BigDAWG system has shown great promise when applied to diverse medical data.
Developed by the Database Group at CSAIL, led by Madden et al., DataHub is a scalable, hosted platform for organizing, managing, sharing, collaborating on, and making sense of data. Think of it as a mashup of GitHub and PostgreSQL, accessible through your web browser. It provides an efficient platform and easy-to-use tools and interfaces for:
DataHub’s documentation, codebase, and API are available for public use. The platform allows testing of new frameworks and applications for collecting and managing personal data, including the Open Personal Data Store (OpenPDS) architecture. Publications are viewable on the DataHub site.
Developed by the Human Dynamics Group at the MIT Media Lab, led by Pentland et al., OpenPDS gives users control over how applications use their data:
Our goal in building these platforms is to enable researchers and students to dream up and run new data-driven applications and projects at MIT. Some examples are below:
Aggregating diverse data allows us to combine disparate data types and derive patterns from them. Even analyzing aggregate, anonymized data can reveal valuable new insights about trends and patterns within our campus community.
Earlier in April, Vijay Gadepally, Jeremy Kepner, and the team at MIT Lincoln Labs were kind enough to lend us their three-dimensional model of MIT’s campus. It took some skews, stretches, rotations, and elbow grease, but we were ultimately able to pair the model with a dataset of MIT’s wifi activity. From there, we were able to visualize […]
The MIT big data Living Lab team is partnering with the MIT Medical getfit@MIT fitness challenge team to develop a mobile activity logger. Every spring semester, MIT Medical runs a getfit@mit challenge to encourage fitness at MIT. For the challenge, participants join teams and log their activities to the getfit@mit website. The challenge had been […]
Last week, Kelly, Guy, and I attended HackMIT, one of the most prestigious hackathons in the world, as sponsor judges for the Big Data Initiative at MIT CSAIL. There, we listened to speeches, mentored students, and (importantly) encouraged students to hack on our newly released wifi data. Oh. What’s that? You didn’t […]