Computer science Ph.D. candidate, professor, and alumnus work together to change the world of big data modeling

You pick up your laptop, go to Netflix and search for a movie. You grab your smartphone, open Pandora and browse through millions of songs. You visit the website of your favorite store to shop for a new jacket. All of these simple everyday activities, multiplied by millions of consumers, generate what is known as “big data” — massive amounts of valuable digital information.

“To keep applications running smoothly, thousands of organizations across all industries manage their large datasets using one of the leading open-source database systems — Apache Cassandra — which can run across tens of thousands of machines and deliver high read-and-write performance,” says Andrey Kashlev, Ph.D. candidate in the Department of Computer Science.

Databases like Cassandra store their information in tables, each designed to support the queries behind the actions an application performs. “In Netflix, for example, one query may list all films in a particular genre,” says Kashlev. The set of tables supporting an application's queries constitutes its data model. Thus far, data models have been created manually, and that presents a major challenge.

“Thousands of data architects, the experts behind applications like Netflix, must create tables by applying multiple design rules by hand. Big data modeling is still a tedious and error-prone process that requires specialized training and experience. Data models can take an expert anywhere from hours to weeks to create, and a single error at any stage of the design could result in the entire application not running,” says Kashlev. “For companies like Netflix, application downtime means unhappy consumers and, often, loss of revenue.”

Kashlev set out to address this problem and, as a result, may have solved one of the biggest challenges in big data management.

Working with his advisor, Shiyong Lu, associate professor of computer science, and alumnus Artem Chebotko, PhDCS'08 MACS'05, a solution architect and data modeling expert at big data company DataStax, Kashlev invented a method to automate big data modeling.

“Using our knowledge of Cassandra and expertise in data modeling, we developed an innovative software tool, the Kashlev Data Modeler (KDM), that automates the most complex and time-consuming data modeling tasks and ensures error-free Cassandra database design,” says Lu. Instead of requiring data architects to build query after query by hand, KDM builds intelligent big data models in seconds.

“Our tool automatically produces data models that efficiently sort and partition large sets of data and ultimately help drive efficiency and accuracy,” says Kashlev.

The team developed an intuitive, user-friendly graphical interface for KDM and released it publicly at no cost. The software was presented at the IEEE BigData Congress 2015 and publicized through the highly visible Planet Cassandra blog and social media. Just six months after its initial release, KDM had attracted more than 600 registered users from 15 universities and more than 200 companies in more than 65 countries.

“Our users include professors, researchers, students, entrepreneurs and developers, who have successfully used KDM to generate hundreds of big data models in a variety of fields, including health care, education, Internet of Things, investment markets, transportation, retail, security and many more,” says Lu. KDM is also being used as an educational tool to teach NoSQL and Cassandra.

“The creation of KDM is an exciting milestone in big data research,” says Lu. “A breakthrough with such immediate impact in industry is quite significant for a Ph.D. student. I am very proud of Andrey.”

Kashlev says the best validation of KDM is the number of users and the scope of its application. “Practitioners, expert data architects from major organizations, are not only using the tool but also reaching out to us to request collaboration,” he says.

The team, which is working under Lu's Big Data Research Laboratory, looks forward to continuing the development and deployment of KDM, specifically in areas most related to quality of life, such as health care and safety. “We want to use KDM and big data to make things smarter and better for society,” says Kashlev.

Lu and Kashlev welcome feedback and collaboration. They can be contacted at Shiyong@Wayne.edu and andrey.kashlev@wayne.edu. To learn more about KDM or register to use it, visit kdm.dataview.org.