One area for optimization is the "For every TAG" part. The PI SDK has no bulk (multi-tag) RecordedValues call, so you have to make one roundtrip per tag. However, the PI SDK does let you make these calls asynchronously, so the roundtrips do not have to be serialized.
Here is the method signature taken from the PISDK.chm guide.
object.RecordedValues StartTime, EndTime, BoundaryType, [FilterExp], [ShowFiltered], [AsyncStatus]
For AsyncStatus, you can pass in a PIAsynchStatus object, which acts as a handle to the asynchronous call. Declare and initialize it with "Dim asynch As New PIAsynchStatus", then pass asynch into the PIPoint.Data.RecordedValues call. The PISDK.chm guide has an example of using this object under "PIAsynchStatus Example".
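To illustrate why overlapping the per-tag calls helps, here is a minimal, language-neutral sketch in Python. It is not PI SDK code: fetch_recorded_values is a hypothetical stand-in for one RecordedValues roundtrip, the 50 ms latency is a made-up figure, and a thread pool is used here only to mimic "start all the calls, then collect the results". For the real calls, follow the "PIAsynchStatus Example" in PISDK.chm.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROUNDTRIP_SECONDS = 0.05  # assumed latency of one server call (made-up figure)


def fetch_recorded_values(tag):
    """Hypothetical stand-in for one per-tag RecordedValues roundtrip."""
    time.sleep(ROUNDTRIP_SECONDS)  # simulate waiting on the network
    return tag, [f"{tag}-value"]


def fetch_serial(tags):
    # One roundtrip at a time: total wait grows with the number of tags.
    return dict(fetch_recorded_values(tag) for tag in tags)


def fetch_overlapped(tags):
    # Start every request first, then collect the results as they complete.
    with ThreadPoolExecutor(max_workers=max(1, len(tags))) as pool:
        return dict(pool.map(fetch_recorded_values, tags))


def timed(func, tags):
    # Run func(tags) and return its result plus the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(tags)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    tags = [f"TAG{i}" for i in range(10)]
    _, serial_time = timed(fetch_serial, tags)
    _, overlapped_time = timed(fetch_overlapped, tags)
    print(f"serial: {serial_time:.2f}s, overlapped: {overlapped_time:.2f}s")
```

Under these assumptions, the serial version waits roughly one roundtrip per tag, while the overlapped version waits roughly one roundtrip in total. The gain only materializes if the time is really spent waiting on the server.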
All of the above assumes that network roundtrips are the bottleneck. Can you provide more details on where you think the code is slow? If client-side processing is the slow part, then other areas will need to be looked at.