Keep It Smart – Basics #8: What is Big Data?
23 August, 2019
We’re in the middle of a data revolution. And it’s shaping how all of us share knowledge, guide business decisions, and influence governance.
At mySmart, we’re all about data – understanding data flow is the key to creating intelligent environments. But as the data revolution took hold, we had to get on board with a new raft of terms. If you’re struggling too, here’s our guide to the most popular labels relating to data.
Big data means a data set so large and complex that traditional software and databases cannot process it. The speed of delivery and size of big data require a processing capacity far greater than an everyday desktop computer or laptop. Big data, which can include both structured and unstructured data (see below for an explanation of these terms), will likely come from a variety of sources. Take the internet, for example, where the range, speed and size of data flows seem infinite.
As rates of data delivery increase, processors and storage need to keep pace. Equally, big data must be broken down and analysed in bite-size chunks before we can understand and use it as manageable small data.
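To make the idea of bite-size chunks concrete, here is a minimal Python sketch. It assumes a hypothetical stream of (sensor, temperature) readings from a building; the sensor names, values and chunk size are made up for illustration. Only one small chunk is held in memory at a time, and the result is a summary small enough to read at a glance.

```python
from collections import defaultdict
from itertools import islice


def read_in_chunks(readings, chunk_size):
    """Yield successive lists of at most chunk_size readings."""
    iterator = iter(readings)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def average_by_sensor(readings, chunk_size=1000):
    """Reduce a long stream of (sensor_id, value) pairs to one average per sensor."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for chunk in read_in_chunks(readings, chunk_size):
        # Only this chunk is in memory; running totals carry over between chunks.
        for sensor_id, value in chunk:
            totals[sensor_id] += value
            counts[sensor_id] += 1
    return {sensor_id: totals[sensor_id] / counts[sensor_id] for sensor_id in totals}


# Hypothetical readings; a real feed could run to millions of rows.
sample = [("lobby", 21.0), ("lobby", 23.0), ("plant_room", 30.0)]
summary = average_by_sensor(sample, chunk_size=2)
```

The `summary` dictionary holds one average per sensor, which is the kind of small data a person can take in without further tooling.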
Small data means a data set that the human mind can understand without the use of complex software. That doesn’t rule out simple tools: times tables only go so far, so we’ll often need the aid of a machine or software application. Whether we use a calculator, a spreadsheet, or our brains alone, small data is information we can integrate and manipulate to suit our purposes. For example, a facility manager might collate maintenance data into Microsoft Excel for analysis. They may use Excel to create a complex report for occupants or owners, but the report will be understood without the need for more sophisticated software.
Structured data is the type of textual or numerical information we’re used to organising. For mySmart, examples would be spreadsheets of customer details, stocktake numbers or even an office supplies shopping list. Structured data is also known as quantitative data or relational data.
Unstructured data cannot be processed and analysed using conventional tools and methods. Also known as qualitative data, it can include information in textual or audio format, or video or imagery data. What separates it from structured data is that it won’t fit neatly into a spreadsheet or table. You may also hear the term ‘semi-structured data’ which means that only part of the data can be tabulated.
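As a rough illustration of semi-structured data, the Python sketch below takes a hypothetical maintenance ticket in JSON form. The field names and the ticket itself are assumptions made up for this example. Some fields slot straight into a table row, while the free-text notes do not.

```python
import json

# Hypothetical maintenance ticket: part tabular, part free text.
raw = (
    '{"ticket_id": 101, "floor": 3, "status": "open", '
    '"notes": "Tenant reports a rattling noise near the lift."}'
)

# Fields we assume fit neatly into spreadsheet columns.
TABULAR_FIELDS = ("ticket_id", "floor", "status")


def split_record(raw_json):
    """Separate the part of a record that can be tabulated from the part that cannot."""
    record = json.loads(raw_json)
    row = {field: record.get(field) for field in TABULAR_FIELDS}
    free_text = {key: value for key, value in record.items() if key not in TABULAR_FIELDS}
    return row, free_text


row, free_text = split_record(raw)
```

Here `row` could be dropped into a spreadsheet as it stands, whereas `free_text` would need other techniques, such as text analysis, before it could be compared across tickets.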
Open data is information that is publicly available (or what people believe should be publicly available!). It traditionally meant government data sets and statistics, but corporations are increasingly making certain data sets available for analysis, or to seek improvements in machine learning from open source communities. The term ‘open data’ is often used synonymously with ‘public data’.
In Australia, the government’s Public Data Policy Statement sets out its position on the use and availability of public data.
More on Big Data
We’ve covered the main terms but there’s far more to understand around big data and its associates, as you’d expect. Here’s our quick overview of the wider perspective.
Data is only useful if it can be stored affordably. Since the advent of open source frameworks and the Cloud, the price of data storage has fallen. This means that massive volumes of data can be stored affordably for an indefinite period. In a smart building, for example, this means that historical data is always accessible and can be used for predictive analysis.
As conventional hardware and software can’t handle the volume, big data requires an overarching software framework that can handle distributed mass storage and offer enormous processing power. Apache Hadoop is a popular option, being flexible and open source. Many companies, e.g. Cloudera-Hortonworks, Microsoft and Qubole, have adapted it to offer their own customised versions.
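One processing model that Hadoop is known for is MapReduce: split the data into partitions, work on each partition separately, then merge the partial results. The sketch below imitates that pattern in plain Python on a single machine. It does not use Hadoop or any of its APIs, and the maintenance log entries are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count the words within one partition of the data."""
    return Counter(word.lower() for line in lines for word in line.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Hypothetical log entries, split as if they were stored on two machines.
partitions = [
    ["pump fault", "filter replaced"],
    ["pump serviced", "pump fault"],
]

partial_counts = [map_partition(partition) for partition in partitions]
total = reduce(reduce_counts, partial_counts, Counter())
```

In a real cluster each partition would be processed on a different machine, which is where the extra processing power comes from; here the steps simply run one after another to show the shape of the idea.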
Perhaps the greatest challenge facing the acceleration and capture of big data, or any data, is security. This doesn’t only cover cybercrime but also protection from physical and environmental threats. The Cloud may seem untouchable at times, but your data resides in a tangible storage facility somewhere on Planet Earth. That facility is vulnerable to a range of scenarios: fire, flooding, overheating, vandalism and plain old human error.
Whatever label you apply, data is only useful when it’s put to work. Partner with a company that understands how data connects people, spaces and devices, and you can reimagine the built world around you.
So if you’re ready to connect your building to the intelligent data revolution, contact mySmart.