
Applying data science to doc migrations brought sense to the City of LA

When the city of LA was burdened with antiquated systems of sharing information, data science was applied and doc migrations performed.

As the second-largest city in the country, covering almost five hundred square miles, Los Angeles had been managing its massive portfolio of city-owned real estate with remarkably antiquated tools. Initiatives involving industry, affordable housing, homelessness, and disaster preparedness all require a sound grasp of the local geography, yet with the relevant information buried in Excel spreadsheets and Microsoft Word documents, there was no easy way to review, query, or analyze it, and the facts decision-makers needed were never immediately at hand. It took a major overhaul and a highly motivated data scientist to change the way Los Angeles handles data for the better.

In his role as Data and Engagement Strategist on the Mayor's Operations Innovation team, Juan Vasquez was shocked to discover how difficult it was to access even the most basic information in a timely manner. "If a council member needed a list of all the properties in their district, it took seven months to get a PDF of a CSV file with a handful of records. There were at least fifteen ad hoc systems or processes in place, ranging from Excel files to Google Docs. That really surprised me because my friends and I use Google Docs to plan road trips. Five people planning a trip to the mountains should not be the same way that the second-largest city in the country manages its real estate!" His team was tasked with developing a better data science system, one that created efficiency and allowed the city to be strategic and forward-thinking in its management of an extensive real estate portfolio.

Moving away from spreadsheet-style data

The most obvious issue with using simplistic data in the form of rows, columns, and tables was that it had no relationship to the real world. "Real estate is geography, and that should be on maps, not just listed as an address." In fact, these real estate parcels did not even have unique identifiers to ensure accuracy or standardized hierarchies to organize and structure the data. Vasquez and his team were determined to find a way to centralize the data; they simply had to find the right foundation. "The most comprehensive listing we found to use as our baseline was the LA county assessor's office. They had most of the assets documented, even though the data was largely outdated."

With the available time and resources, it made more sense to apply data science to the existing information and improve its accuracy than to start from scratch. "People knew that was where the most information was, but it hadn't been updated recently, so they thought it was bad. But we decided to begin there and work on updating it and layering more information on it."

Turning bad information into good took a lot of work. The team ended up using fifty-five different data sets to build a more complete and current database of LA's real estate holdings. They started with the parcel numbers, treating them as unique identifiers, then began mapping with geocoding. Merging duplicate records was a big part of the process: from an initial 27,000 records, they whittled the list down to about 10,000 parcels. With the help of existing real estate management solutions and API expertise from local experts, they began making real progress.
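The merge step described above can be sketched in a few lines. This is a hypothetical illustration, not the city's actual pipeline: the field names, sample parcel numbers, and the "newest value wins" rule are all assumptions made for the example. The core idea is the one the team used, though: treat the assessor parcel number as the unique key and collapse duplicate rows onto it.

```python
# Hypothetical sketch of parcel-record deduplication: rows from several
# source lists are merged on their assessor parcel number (APN), which
# acts as the unique identifier. Data and field names are invented.

def merge_parcel_records(records):
    """Collapse duplicate rows onto one record per APN, letting the
    most recently updated row win for each field."""
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        # Later (newer) rows overwrite earlier values field by field.
        merged.setdefault(rec["apn"], {}).update(rec)
    return list(merged.values())

raw = [
    {"apn": "5161-005-900", "address": "200 N Spring St", "updated": 2015},
    {"apn": "5161-005-900", "use": "civic center",        "updated": 2019},
    {"apn": "4218-019-903", "address": "1 World Way",     "updated": 2018},
]

clean = merge_parcel_records(raw)
print(len(raw), "->", len(clean))  # 3 -> 2
```

Scaled up across fifty-five data sets, this kind of key-based merge is what turns 27,000 overlapping rows into roughly 10,000 distinct parcels.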

The data begins to take shape and make connections

Even more important, the team finally had the city's assets mapped out as polygons rather than simply lists of addresses. From a practical perspective, real estate is not just about location. "When people are planning a real estate project, they need to know the shape of a parcel. For example, a long strip wouldn't be right for a housing unit. Through polygons we get a shape of the actual real estate asset. It also allows for better decision making."
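The payoff of storing parcels as polygons can be shown with a small geometry check. This is a simplified sketch under invented coordinates and an arbitrary threshold, not anything from the city's system: a bounding-box aspect ratio is a cheap proxy for the "long strip wouldn't be right for housing" judgment Vasquez describes.

```python
# Illustrative only: once a parcel is a polygon rather than an address
# string, simple geometry can flag shapes unsuited to a project type.
# Coordinates (in meters) and the 5:1 threshold are assumptions.

def is_long_strip(polygon, ratio_threshold=5.0):
    """Return True if the parcel's bounding box is much longer than
    it is wide -- a rough proxy for a narrow strip of land."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    long_side, short_side = max(width, height), min(width, height)
    return short_side == 0 or long_side / short_side > ratio_threshold

square_lot = [(0, 0), (50, 0), (50, 50), (0, 50)]    # 50m x 50m
strip_lot = [(0, 0), (300, 0), (300, 20), (0, 20)]   # 300m x 20m

print(is_long_strip(square_lot), is_long_strip(strip_lot))  # False True
```

A real GIS pipeline would use true polygon metrics rather than bounding boxes, but the principle is the same: shape, not just location, drives the decision.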

In addition, being able to understand what other entities are nearby will prove enormously helpful. "Real estate exists in the context of a community. You need to be aware of what is in the surrounding area." Juan pointed out that building a homeless shelter near a mental health facility makes much more sense than putting one several miles away. With the ability to overlay a wide variety of data points on a current map of the city's real estate assets, planning can become much more efficient.
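The overlay idea, checking what sits near a candidate parcel, reduces to a distance query against other data layers. The sketch below uses the standard haversine great-circle formula with made-up coordinates and a made-up 3 km "nearby" radius; the facility names are purely illustrative.

```python
# Hedged sketch of a proximity overlay: given a candidate site and a
# layer of facilities, find everything within a chosen radius.
# All coordinates and the 3 km threshold are invented for illustration.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

candidate_site = (34.05, -118.25)  # hypothetical downtown parcel
facilities = {
    "mental health clinic": (34.06, -118.24),
    "transit hub":          (34.04, -118.26),
    "clinic across town":   (34.20, -118.60),
}

nearby = {name: round(haversine_km(candidate_site, loc), 1)
          for name, loc in facilities.items()
          if haversine_km(candidate_site, loc) < 3.0}
print(nearby)  # the cross-town clinic falls outside the radius
```

Layer in enough such queries (services, transit, seismic data) and the siting question Vasquez raises, where to put a shelter so it sits near the support it depends on, becomes something planners can answer from a map instead of a seven-month records request.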

Bringing users on board with the new data solution

For Vasquez, educating politicians and other stakeholders on how to use this new system is just as important as building it. "The technology, the best practices, the standards, these are all important. But when you talk about large bureaucracies, how humans interact with our insights and systems becomes extremely important for them to be sustainable and long-lasting. It's critical to have buy-in from the elected officials. So, I'm sitting down with them doing active training on how to use the system. We are also building a video training library."

A public portal is in the works that will give visitors search functionality to identify properties. Residents and tourists may eventually be able to use this tool to find everything from parks and libraries to information about seismic activity in various areas. With input from the public, the team will be able to clean up the data even more. Hopefully, this positive feedback loop will help keep the data real for the City of LA's real estate.
