What is a Data Object?
Data Objects represent pieces of data that are grouped together to form a single structure or entity. Two very good examples of Data Objects are ORM entity objects and Data Transfer Objects (DTOs).
Attributes and Behavior
Objects are made up of attributes and behavior, but Data Objects by definition represent only data and hence can have only attributes. Books, movies, files, and even I/O streams do not have behavior. A book has a title but it does not know how to read. A movie has actors but it does not know how to play. A file has content but it does not know how to delete. A stream has content but it does not know how to open, close, or stop. These are all examples of Data Objects that have attributes but do not have behavior. As such, they should be treated as dumb data objects, and we as software engineers should not force behavior upon them.
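To make the idea concrete, here is a minimal sketch of a Book as a dumb data object, written in Java. The field names and the choice to include value equality are assumptions made for illustration; the point is only that the class exposes attributes and nothing that acts on the outside world.

```java
import java.util.Objects;

// A data object: attributes only, plus accessors and value equality.
// Field names are illustrative assumptions, not taken from a real codebase.
final class Book {
    private final String title;
    private final String author;
    private final int pages;

    Book(String title, String author, int pages) {
        this.title = title;
        this.author = author;
        this.pages = pages;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    int getPages() { return pages; }

    // Two books with the same attribute values are considered equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Book)) return false;
        Book that = (Book) other;
        return pages == that.pages
                && title.equals(that.title)
                && author.equals(that.author);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author, pages);
    }
}
```

Notice that there is no read(), print(), or save() here. Anything that does something with a Book would live in another class.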
Passing Around Data Instead of Behavior
Data Objects are moved around through different execution environments, but behavior should be encapsulated and is usually pertinent to only one environment. In any application, data is passed around, parsed, manipulated, persisted, retrieved, serialized, deserialized, and so on. An entity, for example, usually passes from the Hibernate layer to the service layer to the frontend layer, and back again. In a distributed system it might pass through several pipes, queues, and caches and end up in a new execution context. Attributes can apply to all three layers, but particular behavior such as save, parse, or serialize only makes sense in an individual layer. Therefore, adding behavior to data objects violates encapsulation, modularity, and even security principles.
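The layering argument above could be sketched roughly as follows. All names here (BookEntity, BookRepository, BookJsonSerializer) are hypothetical, the repository is an in-memory stand-in rather than real Hibernate code, and the JSON formatting is deliberately naive (no escaping). The sketch only shows the entity staying passive while each layer owns its own behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Data object: travels through every layer unchanged.
final class BookEntity {
    final long id;
    final String title;

    BookEntity(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Persistence-layer behavior (in-memory stand-in for a real DAO).
final class BookRepository {
    private final Map<Long, BookEntity> rows = new HashMap<>();

    void save(BookEntity book) {
        rows.put(book.id, book);
    }

    Optional<BookEntity> findById(long id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Transport-layer behavior: naive formatting without escaping, illustration only.
final class BookJsonSerializer {
    String serialize(BookEntity book) {
        return "{\"id\":" + book.id + ",\"title\":\"" + book.title + "\"}";
    }
}
```

The frontend never sees save(), and the persistence layer never sees serialize(); both see the same BookEntity.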
What about File.Delete(), Stream.Close(), ...
Most developers would agree that entity objects should not have behavior, but on the other hand it would be very odd to have, for instance, a File object without the expected close() and delete() methods, or a Stream without open() and close() methods. After all, they are so convenient and widely used that we expect them to be part of the interface.
I would argue that any methods on File and Stream violate the natural aspect of object-oriented design, because both represent passive data yet at the same time provide behavior. Why unnatural, you ask? Well, can a file really delete itself? No, that behavior belongs to the FileSystemDao/Manager/Service. Imagine a Stream of water/data. Can the stream open and close itself naturally? No, it is the action of a third party that causes streams to open and close. StreamReader.read() is perfectly natural and analogous to a person taking some water out of a real stream, but Stream.read() is not natural because there is no analogous action on a natural water stream (more on this in another post).
A lot of the issues described above can be clarified by paying a bit more attention to the class names. If you are creating a FileSystemService, don't call it a File. If you find yourself adding behavior to a Stream object, take a step back and think about giving the class a more active name, such as StreamReader or StreamWriter.
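As a rough illustration of that naming advice, the sketch below splits the passive data (FileData) from the active party (FileSystemService). Both class names and all method names are assumptions made for this example. The service delegates to the standard java.nio.file.Files helpers and wraps checked exceptions for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Passive data: where the file lives and what it contains. No behavior.
final class FileData {
    final Path path;
    final String content;

    FileData(Path path, String content) {
        this.path = path;
        this.content = content;
    }
}

// Active service (hypothetical name): the third party that acts on files.
final class FileSystemService {
    // Writes the content to the path, creating or overwriting the file.
    void write(FileData file) {
        try {
            Files.write(file.path, file.content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean exists(FileData file) {
        return Files.exists(file.path);
    }

    // Returns true if a file was actually removed, false if none existed.
    boolean delete(FileData file) {
        try {
            return Files.deleteIfExists(file.path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The call site then reads fileSystemService.delete(file) rather than file.delete(), which matches the "a file cannot delete itself" intuition.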
Two Classes in One
A developer creates a Book class and adds data fields for Title, Author, and Pages. So far, so good. He then adds the methods Open(), Read(), Write(), Highlight(), Bookmark() and so on. Before you know it, the Book has methods to GetRelatedBooks(), Print(), Publish(), Buy(), and the list goes on and on without limit. The class is now a couple of thousand lines long, tied to every other class, and, most importantly, no one wants to work on it. This is a maintenance nightmare and we have all run into it. A class like this is begging to be refactored. It's OK for the Book class to have a Title, Author, and Pages; they are all natural properties of a book. However, Open(), Read(), Highlight(), and Bookmark() should be moved into a separate class, one called BookReader or Kindle. The method Write() should belong to the Author class. The methods Print() and Publish() should be moved to Printer and Publisher classes. The method Buy() should be part of the Customer class. The method GetRelatedBooks() should be a method of the Librarian or RecommendationService.
Code written like this:
    book.Write();
    book.Print();
    book.Publish();
    book.Buy();
    book.Open();
    book.Read();
    book.Highlight();
    book.Bookmark();
    book.GetRelatedBooks();
can be refactored like so:
    Book book = author.WriteBook();
    printer.Print(book);
    publisher.Publish(book);
    customer.Buy(book);

    BookReader reader = new BookReader();
    reader.Open(book);
    reader.Read();
    reader.Highlight();
    reader.Bookmark();

    librarian.GetRelatedBooks(book);
What a difference natural object-oriented modeling can make! We went from a single monstrous Book class to a lean Book data class plus six separate classes (Author, Printer, Publisher, Customer, BookReader, and Librarian), each responsible for its own behavior.
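For readers who want to see what a few of those classes might look like, here is a small sketch covering only Book, Author, and BookReader. It uses Java naming (camelCase methods) rather than the PascalCase used in the snippets above, and the method signatures, the page-as-string representation, and the bookmarkedPage() accessor are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Data only: the book knows its attributes and nothing else.
final class Book {
    final String title;
    final String author;
    final List<String> pages;

    Book(String title, String author, List<String> pages) {
        this.title = title;
        this.author = author;
        this.pages = new ArrayList<>(pages); // defensive copy
    }
}

// Writing is the author's behavior, not the book's.
final class Author {
    private final String name;

    Author(String name) {
        this.name = name;
    }

    Book writeBook(String title, List<String> pages) {
        return new Book(title, name, pages);
    }
}

// Reading state (current page, bookmark) lives in the reader, not the book.
final class BookReader {
    private Book book;
    private int current = 0;
    private int bookmark = -1;

    void open(Book book) {
        this.book = book;
        this.current = 0;
    }

    // Returns the next page's text, or null when the book is finished.
    String read() {
        if (book == null) throw new IllegalStateException("no book is open");
        if (current >= book.pages.size()) return null;
        return book.pages.get(current++);
    }

    // Marks the page most recently read (zero-based); -1 means none.
    void bookmark() {
        bookmark = current - 1;
    }

    int bookmarkedPage() {
        return bookmark;
    }
}
```

One side effect of this split is that two readers can hold different positions and bookmarks in the same Book instance, since none of that state is stored on the book itself.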
This makes the code:
- easier to read and understand because it is more natural
- easier to update because the functionality is contained in smaller encapsulated classes
- more flexible because we can easily substitute one or more of the six individual classes with overridden versions
- easier to test because the functionality is separated and each class is easier to mock
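The last two points can be illustrated with a small sketch. It assumes, purely for the sake of example, that Printer is an interface and that Publisher receives one through its constructor; neither detail is prescribed by the refactoring above. With that shape, a hand-written test double can stand in for a real printer.

```java
import java.util.ArrayList;
import java.util.List;

final class Book {
    final String title;

    Book(String title) {
        this.title = title;
    }
}

// Behavior sits behind an interface so implementations can be swapped.
interface Printer {
    void print(Book book);
}

// Hypothetical collaborator that depends on a Printer.
final class Publisher {
    private final Printer printer;

    Publisher(Printer printer) {
        this.printer = printer;
    }

    // Prints the requested number of copies through the injected printer.
    void publish(Book book, int copies) {
        for (int i = 0; i < copies; i++) {
            printer.print(book);
        }
    }
}

// Hand-written test double: records calls instead of printing.
final class RecordingPrinter implements Printer {
    final List<String> printedTitles = new ArrayList<>();

    @Override
    public void print(Book book) {
        printedTitles.add(book.title);
    }
}
```

A test can construct new Publisher(new RecordingPrinter()), call publish(), and inspect printedTitles, all without touching the Book class or any real printing code.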