What is the network history database?

- The network history database is a database that records all of the inbound TCP connections that come into the Data Archive

  - It is located under %PISERVER%\dat\NetworkHistoryDatabase.db

- It is created and updated by pinetmgr

- There are 4 types of data stored

  - Active connections (connections that are currently open)

  - Expired connections (connections that were open, but are now closed)

  - License data associated with each connections

    - Specifically the name of associated with each application license that was used in the connection to the server

  - Users that have connected over each connection

    - This information is only available in V2 and beyond of the database (PI Server 2017 R2)

    - It stores the username, type, and whether or not the connection was successful

- We store the data in raw pitimestamp format (the time stamp looks like a very large integer)

  - In V1 we got some feedback that our choice of ISO8601 strings was not userfriendly, so we changed to the raw view with the intent to provide tools to parse everything out in a user friendly way



How is it stored?

- We store the data in an open source database called "SQLite"

- There are a few scenarios that trigger data being added to the database

  - New inbound TCP connection

  - TCP connection being closed

  - New user coming in over a connection

  - New license coming in over a connection (generally corresponds to a product using a connection, like ProcessBook)

- To prevent frequent disk writes, we actually batch the writes

  - Just in case the system isn't being heavily used, we flush all the data to the database every 10 seconds

  - For example, if 10 connections are established in a ten second window, we don't automatically update the database when we get each connection. We wait for ten seconds to elapse, then flush

    - This is because writing to the database is expensive and we don't want to write too often for small amounts of data

    - If 1000 connections are established in a time frame smaller than 10 seconds (lets say 5), we will then flush the database early

  - Due this behavior, a new connection might not appear in the database until 10 seconds has elapsed


What is the schema?

- There are a total of 5 tables in the database

- To link all the data between the tables, each table has a column for a guid (globally unique identifier)

  - It is a 64 bit integer

  - To combine data from either the activeConnections or connectionHistory to the credential or license table, a user would need to do a SQL JOIN query and combine tables based on this value (i.e. where activeConnections.guid = licenseGuids.connection_guid)

    - Some useful queries with examples will be provided later

  - activeConnections table

    - Stores the connections that are currently open

    - It stores the connection id (a value to represent a connection, its a value that rolls over so isn't that valuable)

    - The source ip address

    - The time that the connection was started

    - The name of the executable that has connected

    - A guid, which is essentially a globally unique identifier to uniquely identify each connections (the connection ID does not do this)

  - connectionHistory table

    - It has all of the same information as the activeConnections table, however it has 2 additional fields

    - The end time of the connection is stored

    - unexpected_termination indicates whether the connection expired due to a connection disconnecting (Like ProcessBook shutting down) or pinetmgr crashing. If pinetmgr crashes, the value is set to 1 (true), if not its set to 0 (false)

      - On startup of pinetmgr, if there are entries in the activeConnections table, they are migrated to the connectionHistory table. Those connections are marked as unexpected_termination. Also the timestamps for end_time are the time at which pinetmgr restarted, since we don't exactly know when it crashed

  - licenseGuids table

    - connection_guid, this value links entries to the rows in the other tables

      - This value does not uniquely identify a row

    - license_guid is not a number in this case, its actually a string that represents a license guid. This is different from the executable name in the activeConnections and connectionHistory table

  - credentials table

    - connection_id is the same as a corresponding guid value in either the activeConnections or connectionHistory table

      - This value can be repeated in the table

    - user_credentials is the user name essentially

    - auth_type represents the type of security that the user came in on

      - It can be explicit login, trust, SSPI (windows authentication), or TLS/HA for collective scenarios

    - auth_time is when the user came in

    - successful_login is whether or not the login was successful (0 is false 1 is true)

  - metadata table

    - This table is a bit strange, there can only be one row in it

    - The only value useful column in the table is the "guid" column

      - It represents the next guid to be used when assigning a guid to a connection



What is the difference between the versions?

- In V1, only the activeConnections, connectionHistory, licenseGuids, and metadata tables existed

- In V2, we introduced the credentials table

- In V3, we introduced performance improvements



What tools can be used to read the database?

- The developers of SQLite provide their own SQLite command line interface tool (https://www.sqlite.org/download.html specfically the sqlite-tools-win32-x86-3260000.zip file the version might change, but the general syntax of the file will stay the same).

- SQL Server can potentially load this database

- There are open source GUI tools that can read the database such as this

  - https://sqlitebrowser.org/



What are some useful queries?

You can do most useful queries with a join from one of the connection tables to either the license or user tables. However, you might want to constrain the query to a certain time range. To use the time syntax in sqlite, look at this reference.


There are examples of how to use it below. Unfortunately, when we store the data in the database, we store the time in UTC and store the seconds and fractional seconds. The upper 32 bits are the seconds and the lower 32 bits are the fractional seconds. When using strftime to go from a time string to seconds, be sure to left shift the value over 32 bits, that way the comparison will compare the the times correctly.



This is an example of a query to get all the expired connections and the licenses that came over those connections

`SELECT * FROM connectionHistory LEFT OUTER JOIN licenseGuids on connectionHistory.guid = licenseGuids.connection_guid;`



This is a similar query, however we are now constraining the time, where we want all the connections with end_time less than 5 months from this moment

`SELECT * FROM connectionHistory LEFT OUTER JOIN licenseGuids on connectionHistory.guid = licenseGuids.connection_guid WHERE ((SELECT strftime('%s', 'now', '-5 months') << 32) < end_time);`



Here is another query to see all the connections that a specific user has made

`SELECT * FROM connectionHistory LEFT OUTER JOIN credentials on connectionHistory.guid = credentials.connection_id WHERE (user_credentials="MY_USER");`


In the example above, MY_USER would be a username. If you are on active directory, it would look like this




What are some things to not do?

- Don't write to the database, adding a row to the activeConnections or connectionHistory database will cause everything to stop working if a new guid is added

  - This is because there is a unique SQL constraint on that column in the connectionHistory and activeConnections tables, if a guid is inserted that wasn't already there, eventually the database will try to add a row with that guid and cause a unique constraint violation and fail to add data.

- Try not to read a lot of data very often

  - Netmgr is trying to write to the database every 10 seconds (at least) and reading from sqlite will prevent writes (and vice versa), while this shouldn't affect the performance of netmgr, it will create unnecessary contention and cause temporary memory usage increases in netmgr, since it needs to buffer the data before writing to disk. If something is blocking the write, it needs to buffer more until it gets the ability to write.