Outdoors, Green Living, Homesteading, Sustainable living, Green Building

Introduction to Data Structures and Java Collections

We all know programs have data, in fact in OOP an object is Data and Related program logic packaged into this thing called an Object. In Java, we have primitives byte, short, int, long, float, double, boolean, char. A String is an object made of the primitive char. But in general, all objects can be broken down into these primitives. And of course, all primitives are made of bytes. So you can say the most basic data type is the byte. And the most basic data structure is a series of bytes. Or an array of bytes. And the most basic collection is the Array.

  • The bit (on or off, 0 or 1, true or false)
    • boolean actually its stored as around 1 byte in the JVM, but its JVM dependent.
    • BitSet is a class that stores and handles bits in bit size memory allocations.
  • The Byte (8 bits)
    • byte  1 byte signed whole -128 to 127  byte times 0xff will convert it to 0 to 255
    • short 2 bytes signed whole -32,768 to 32,767
    • int     4 bytes signed whole -2,147,483,648 to 2,147,483
    • long  8 bytes signed whole -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
    • float  4 bytes signed real 6-7 significant digits  ±3.40282347E+38F
    • double 8 bytes signed real 15 significant digits  ±1.79769313486231570E+308
  • Character
    • char 2 bytes  whole unsigned 0 to 65535  (for unicode characters) ASCII still fits in first 256 or 0 to 255 space.
    • String  a most common object that holds any number of char’s.
  • Reference
    • reference 4 bytes on 32 bit JVM, 8 bytes on 64 bit JVM so 8 bytes for most of us.

One might say a Java Reference value is a primitive whos value is a memory location. That would, however, be a pointer that languages like C use. But a reference is special in that you can’t see that value or use it. You can only swap values that you never see or use. References can have a value of null for example or be assigned a value with the new operator or assigned a value = to another reference’s value. Also in Java as of lambda’s you can pass a function by value, as well as an object. Object::methodName is a reference to an function.

If you want to see the size of something in Java, just use the Runtime object methods to determine free ram, make a bunch of that one thing or object, then subtract. Divide by the number of things or objects and it will give a rough estimate of its size. Java lang probably didn’t provide a sizeof operator because it wouldn’t be pretty. If you want to check the execution time of something you may do a similar feat. Just get the System.currentTimeMillis(); At the start of a method call, then again at the end of it. Subtract. You might even run it a hundred times then divide by 100 to get an average execution time.

  • make it work
  • make it right
  • make it fast
  • make it cheap

There are several methods used to determine the efficiency of algorithms. Big-O notation being one. I can not explain these because I have never really understood them. But I will simply tell you about them and you can research it for yourself. There are the Best case, Average case, and Worse case analysis. Also, IDE’s come with profiler tools. These tools can give you information on execution times for various parts of your programs. This can help to reveal “Bottlenecks” in performance. As a newbie, you may not need to be too worried about such issues. Java language Collections API will already be optimized for most of your needs. Just remember the following above list. Performance is not usually the first consideration when developing apps.

  • Classes
  • Objects
  • Arrays

So an object is made up of a list or collection of these values be it primitive or reference. This makes an object something like a data record. A data record is a list of name:value pairs in a given order. The data in an object is called fields and since methods are now referenced methods as well. The constructors are “constructor methods”. So yes I suppose they are references too. I’m not quite sure how inner classes are handled. But they are not part of the data of the class. They are a class (object) with their own data. I sometimes say Class and sometimes say Object.

A class is a blueprint for the objects. Any static fields and methods or other members belong to the class. Constructors cannot be static and are implicitly final. So the class has its set of name:value pairs also. But the distinction is that the class members are not copied. You can, however, make as many objects as you want from the class definition. This is what makes an object like a record in a database table. Make sense?

So an Array is a linear contiguous list of whatever data type you make an array of.  Each item in this list is an element and has a position number. So an Array[100] has elements numbered 0 to 99. Computers count starting at 0 if you recall. An array of a primitive is simply a list of numbers. An Array of an object of some kind is like a database table. Each object is then the record in the table. But the point here is that the two most basic data structures are Arrays and Objects.

  • Files
    • local storage
    • remote storage
    • archive storage
  • Streams
    • from input or to output
    • from memory to memory
    • storage to memory or memory to storage

A file on disk might also be considered to be a data structure. It is in its most basic form a series of bytes. As the bytes are read or written from memory, they are converted to and from primitives or objects. A line of text from a file is the object String. A file can be read as a series of bytes, or characters. And with utility classes other primitives and objects as well. IO Streams can be read and written as bytes, primitives, objects as well. But a stream is not a data structure. A stream is just a flow of data. You might say that data structures are stored in memory, passed through streams and stored in files on storage devices. The streams connect files with memory or computers with computers. So computer to computer is like a memory to memory stream. Streams may also come from input devices or go to output devices etc.

  • storing and retrieving
    • basic structures
      • Arrays
      • List
      • Hash Tables
      • Maps
      • Sets
      • Heaps
      • Dictionaries
      • Trees
      • Graphs
    • basic methods
      • random access
      • sequential
  • searching
  • sorting
  • traversing
  • recursion
  • randomization

Data structures become quite a bit more complex than what we have talked about thus far. A typical semester college course will cover them. And if you look you can find used textbooks everywhere about data structures. Look for Java Collections and Data Structures books. So what do these books cover?  Basically, storage and retrieval, sorting and searching.  As it turns out there are simple and quick to implement methods and more complex methods and difficult to implement methods. These methods have been pretty well standardized and are called algorithms. The name of the algorithm is sometimes named after the guy or gal who invented it. Or in the case of Bubble Sort, named based on how it works. All of these algorithms are used in varied situations throughout the computer world and even as a common computer user you see them in practical use every day.

Much can be said about Data Structures and Collections. I am going to be as brief as possible in this article, mostly listing and defining things for you. You can almost think of data structures as patterns for the way data has been kept for particular uses. These patterns sometimes seem like things we see in nature, a stream (List), branches (Tree), a web (Graph? Should have been Network) for example. So structures can have similar sounding names. Sometimes developers seem to misname things. Once the name is stuck, its there for good as if written in stone. For example, the Graph structure, in my opinion, would have been better called a Webb or a Net. Heck, a 2-dimensional array is more like a graph.  I hereby and from now on declare 2D arrays to be called Graphs and 3D arrays Cubes!

A 4D array a MultiVerse! A 5D Array is a Twilight Zone!  Naw… Let’s not overdo it.

One thing I want to note is that Java had collection classes originally that were synchronized. This means thread safe. There was a certain inefficiency in having any class to be synchronized so they decided to make new classes that were not synchronized. This is why we have Vector class vs ArrayList class. Vector is synchronized.

  • Arrays
    • Multi-Dimensional Arrays  (In Java these are Arrays of Arrays)
      • One Dimensional
      • Two Dimensional
      • Three Dimensional
    • Arrays of primitives.
    • Arrays of Objects.
    • The Arrays Class
    • The ArrayList Class
    • Treating a Random Access File like a storage-based array.
    • Stacks and Queues using Arrays

Using Arrays and Array-based objects is where you will start in programming data structures generally. One thing you will note is that you can use Arrays for stacks and queues, or linked list for stacks and queues. This is data structures within data structures more or less. You can also form trees within arrays. For example, you can do quicksort using trees in arrays or a dynamic tree structure. Same goes for other searching and sorting techniques. The Arrays class has some nice methods for search and sorting arrays. The ArrayList class is a wrapper for an array.  The ArrayList class has some handy methods for things you might normally have to code yourself, such as insert and delete. It will also give you a ListIterator for processing the array items sequentially.

  • Iterators
  • Comparable interface
  • compareTo() method

Iterator is something you will need to look into, its used much with List. It allows you to step through a list with hasMore() and next() methods. Otherwise, you would use a for loop to do the same. Also for sorting you will need to look into Comparable Interface and compareTo() method. Objects you want to sort will need to implement this interface.

  • Lists
    • ArrayList
      • ArrayList class
    • Vectors
      • Vector class
    • LinkedList
      • LinkedList class
      • Double Linked List
      • Circularly Linked List
      • Sorted Linked List
    • Stacks
      • Stack class
    • Queues
      • Sequences
    • Double Ended Queues
      • Dequeues
    • Priority Queues
      • Heaps

Lists are a series of some kind of data. The ArrayList and Vector above wrap an Array ( Static Data Structure) Whereas, Linked data structures are dynamic meaning they grow and shrink as needed. Sequences are ordered queues. A queue is a first in first out list (FIFO). Examples of a queue are keyboard buffer queue. A production queue worklist in a video game.   A priority queue is like multiple queues where each priority level has its own queue, higher priority queues are worked first. A stack is a last in first out list (LIFO). Imagine a stack of papers on desk where you you routinely add new things to the top of it. But you only work whats on top first. So you may work the stack down to half it size, or one day get busy and work it down to the bottom. I used a stack for a type of random surface generator a few years back on the Java Games and Graphics project, actually I did this many years ago with Pascal too. It generates a 3D surface that looks like a volcanic island.  Another example of a stack is the call trace stack that Java generates when exceptions are thrown. Heaps are randomly stored data of random sizes. Common examples are memory heaps. I also think of heaps as piles, mounds. Storage media manages data as heaps more or less. With heaps size and free space has to be managed properly or things run out of control. This is why we defrag occasionally. Memory can be defragged as well, I had a tool for this once but forgot what it was called.

  • Hash Tables
    • HashMap class
    • HashSet class
    • Hashtable class
    • LinkedHashMap class
    • LinkedHashSet class
    • WeakHashMap class

Has tables are lookup tables that store using key:value pairs. The way they work makes them very fast and efficient at storing and looking something up. You will see more data structures on data structures in the above.  Maps, List, Sets, Linked etc.

  • Maps
    • EnumMap
    • HashMap
    • Hashtable,
    • LinkedHashMap
    • TreeMap
    • WeakHashMap

Map, Dictionary, Hashtable all almost the same thing. Store using key:value pairs. Keys generally have to be unique.

  • Sets
    • TreeSet class
    • HashSet class
    • LinkedHashSet class
    • EnumSet class

Sets contain no duplicate elements. No objects who are equal when using the equals() method and only one null value at most. A set is really not as much of a data structure as a data requirement or restriction.

  • Trees
    • General Trees
    • Binary Trees
    • AA-Trees
    • Threaded Trees
    • Balanced and AVL Trees
    • NonBinary Trees
    • B-Trees
    • Binary Search Trees
    • Multi-Way Search Trees
    • (2,4) Trees
    • Red-Black Trees
    • Splay Trees
      • Bottom-Up
      • Top-Down
    • TreeMap Class
    • TreeSet Class

Trees are a common structure in nature. It’s talking about a branching structure. Other similar structures might be a river, stream system for water flow. A ridge structure branches a lot as well. The lungs have a branching structure whereas the circulatory system is more of a graph structure. Graphs or Networks can be broken down into a tree structure. Of course, a file system is a well-known tree structure that everyone uses and knows. Document object models are tree structures, such as XML or HTML DOM. This is a structure that developers should learn and know well.

  • Dictionaries
    • Log files
    • Hash Tables
    • Ordered Dictionaries
    • Lookup Tables
    • Skip List

Dictionaries are more or less like Hash tables or Maps

  • Graphs
    •  Networks
    •  Paths
      • Shortest Paths
        • negative  weighted path
        • positive weighted path
        • acyclic path
      • critical path
      • backtracking
    • Weighted Graphs
    • Directed Graphs
    • Undirected Graphs
    • Spanning Trees

As I said before I think Graphs data structure was misnamed. Net, Network, Mesh, or Web might have been better. Common graph structures in nature are spider webs, highway systems, electrical grids, and the internet itself. A cave system might be mapped out like a graph if it is a complex one. A graph is basically like a tree except that branches can connect to other branches. So in traversing a graph structure, you could end up going in circles. With graph structures, you get into Paths and pathfinding, for example finding shortest paths. I can see great uses for this in AI and game coding.

  • searching
    • linear searching
    • sequential search
    • binary searching
    • interpolation search
    • tree search

Java has some built-in implementations for searching, Look at the Arrays class. Its easy enough to code a linear search with for loop or an iterator.

  • sorting
    • Bubble Sort
    • Selection Sort
    • Insertion Sort
    • Shell Sort
    • Merge Sort
    • Tree Selection Sort (Tournament Sort)
    • Heap Sort
    • Quick Sort
    • NonRecursive QuickSort
    • Bucket Sort
    • Radix Sort

Why so many sort algorithms? Some are more efficient than others in given situations. Each one has a different best case, worst case and average case execution time. Java has some built-in implementations of the quicksort. Look at the Arrays class. Bubble sort is an easy one to code and is quick if the list is already in a mostly sorted state. You must iterate the list as many times as there are elements and simply swap higher with lower ones as you go. Lower values rise and high values sink. Or vice versa depending on how you set up the comparison. A very efficient sorting algorithm would analyze the data somehow and then pick the best one of the above to use for the given situation.

  • recursion
    • indirect recursion

Recursion is where a function calls itself. You would think this would end up in an infinite loop, but if set up properly it will not. Recursion is used heavily in sorting and traversing and working with trees. Though you don’t have to use recursion, you can code just about anything with or without recursion using loops and conditions instead. Indirect recursion is where function a calls function b and function b calls function a.

So this is an outline and guide for you to use in your computer science studies. It was taken from 4 textbooks about data structures. You should also search for things on Wikipedia such as “sorting algorithms”. Wikipedia list one I have not listed here Comb sort. Sometimes an algorithm has two names. I think this article could be made better, so consider this version 0.1 apha. As time goes by I will amend it. This is going to be the first in a series of articles I will do over the next one or two years where I make up my own examples of how to use everything talked about in the article using Java and hopefully eventually Javascript as well.




Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.