On data explosion and latent data

What should companies do with all that data?

The Internet of Things, which employs large quantities of sensors, brings with it a data explosion: the amount of data created is growing exponentially over time. One interesting issue is whether all the raw data needs to be “persistent,” that is, kept on digital media for a long period of time.

Let’s take some sensor data examples:

  • A camera takes pictures of all traffic approaching a junction, and then an image processing application counts the number of vehicles.
  • A fitness tracker records every step that a person takes.
  • A sensor connected to a thermostat takes the temperature every minute.

In all of these cases, the useful output is an aggregation of data, not any specific piece of data. Traffic light policies may be tuned by the quantity of vehicles and by the trend, for example when the quantity of vehicles in one direction is increasing while the quantity in the other direction is decreasing. The person (or a caretaker) gets alerts based on the accumulated number of steps in a day. The temperature in a specific minute is not that important; what matters is the average, maximum, and minimum over a day, and possibly the trend during the day.

If the data is used only for aggregation, is not kept, and becomes unavailable after it has been processed, it is called “latent data.”
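To make the idea concrete, here is a minimal sketch in Python of the thermostat case, assuming the only outputs needed are the count, average, minimum, and maximum for a day. The class name `DailySummary` and the sample readings are hypothetical and chosen only for illustration. Each raw reading updates the running summary and is then discarded, so the raw data stays latent.

```python
import math
from dataclasses import dataclass


@dataclass
class DailySummary:
    """Running summary of readings; the raw values are never stored."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, reading: float) -> None:
        # Fold the reading into the aggregates; the reading itself is not kept.
        self.count += 1
        self.total += reading
        self.minimum = min(self.minimum, reading)
        self.maximum = max(self.maximum, reading)

    @property
    def average(self) -> float:
        if self.count == 0:
            raise ValueError("no readings yet")
        return self.total / self.count


# Illustrative readings (assumed values, not real sensor data).
summary = DailySummary()
for reading in (20.5, 21.0, 19.5):
    summary.add(reading)
```

The summary occupies constant space no matter how many readings arrive. The trade-off is the one the article describes: any question that was not anticipated, such as the temperature at a specific minute, can no longer be answered.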

Given the data explosion, an important issue is how to classify data: should it be persisted, or should it become latent? And if the data is persisted, how long should it be kept? The answer to that last question is called a “retention policy.”

One school of thought advocates keeping all data, even if it looks redundant today, since a future analysis may require it. One of the early projects that advocated keeping all climate data for future research use was Sequoia 2000.

Sometimes there are legal obligations to keep raw data for a period of time – for example, the data from traffic light cameras may be used as evidence in court in the future.

Practical considerations may favor removing raw data immediately, or after a certain period of time. Retention policies are now one of the major design considerations of massive data accumulation systems such as IoT-based systems.
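A retention policy of the kind discussed above can be sketched as a small lookup table plus an expiry check. Everything in this Python sketch is an assumption made for illustration: the record kinds, the 90-day period for camera images, and the function names `is_expired` and `purge` do not come from any real system or regulation.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical policy table: how long each kind of raw data is kept.
# None means "keep indefinitely"; timedelta(0) means the data is latent
# and may be dropped as soon as it has been aggregated.
RETENTION = {
    "traffic_camera_image": timedelta(days=90),  # assumed legal hold period
    "thermostat_reading": timedelta(0),          # latent data
    "climate_measurement": None,                 # keep everything
}

Record = Tuple[str, datetime]  # (kind, creation time)


def is_expired(kind: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a record of this kind has outlived its retention period."""
    period: Optional[timedelta] = RETENTION[kind]
    if period is None:
        return False
    return now - created_at >= period


def purge(records: List[Record], now: datetime) -> List[Record]:
    """Return only the records that the policy says should still be kept."""
    return [r for r in records if not is_expired(r[0], r[1], now)]
```

For example, calling `purge` on a list that mixes the three kinds keeps recent camera images and all climate measurements, and drops every thermostat reading. Keeping the policy in one table makes it easy to review and to change when a legal requirement changes.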



Dr. Opher Etzion

About Dr. Opher Etzion

Dr. Opher Etzion is a professor of information systems and head of the Technological Empowerment Institute at Yezreel Valley College in Israel. He is also a former chief scientist of event processing at the IBM Haifa Research Lab. Follow him on Twitter @opheretzion.
