Getting Started
Table of contents
In order to be able to use ECPTextStream.cls
module within your project, please review the installation instructions by clicking the button below.
VBA has several native functions and statements to enable I/O operations over text files. In adition, the language can use other API’s like the FileSystemObject
(FSO) provided by the Microsoft Scripting Runtime library. In this Stack Overflow post, andrewsi concludes that read the whole file content using VBA’s native Binary
access is faster than read the file line by line with the Input Line
statement and also faster than read using the FSO’s API.
After run some tests, comes to reality a “weakness” related to the memory overload and the VBA’s data communication with external APIs. Excel application crashed when trying to load the content from some files and, dealing with the same files, VBA throw an exception. ECPTextStream is intended to overcome this difficulties and enhance the overall performance of the I/O operations over text files.
Philosophy
The VBA ECPTextStream class module is designed for turn the VBA’s embedded file statements into a powerful, extensible and easy to use set of code, that let the user controls how a file is sliced and streamed.
Usage
Stream a CSV file
Sub StreamCSV()
Dim Path As String
Dim TextStream As ECPTextStream
Path = "C:\16000000.quoted.csv"
Set TextStream = New ECPTextStream
TextStream.OpenStream Path
TextStream.EndStreamOnLineBreak = True
Do While Not TextStream.AtEndOfStream
TextStream.ReadText
Loop
TextStream.CloseStream
Set TextStream = Nothing
End Sub
Stream file and save its content to an array
Sub StreamAndSave()
Dim tmpArray() As String
Dim Path As String
Dim i As Single
Dim TextStream As ECPTextStream
Path = "C:\6400000.quoted.csv"
Set TextStream = New ECPTextStream
TextStream.OpenStream Path
TextStream.EndStreamOnLineBreak = True
ReDim tmpArray(0 To Fix(TextStream.StreamLength / TextStream.BufferLen) + 1)
Do While Not TextStream.AtEndOfStream
TextStream.ReadText
tmpArray(i) = TextStream.BufferString
i = i + 1
Loop
TextStream.CloseStream
End Sub
⚠️Caution
Isn’t recommended to store the streamed data, especially if the opened file has more than 1 GB of size.
Benchmark
The ECPTextStream class was tested with files from 191 MB to 1.87 GB of size, you can download the test files from here. Below is showed the procedures for stablish the benchmark for the read operations. Some procedures are available on internet.
Sub BufferedRead(FilePath As String)
Dim TextStream As ECPTextStream
Set TextStream = New ECPTextStream
TextStream.OpenStream FilePath
Do While Not TextStream.AtEndOfStream
TextStream.ReadText
Loop
TextStream.CloseStream
Set TextStream = Nothing
End Sub
Sub BinayWholeFileRead(FilePath As String, ByRef OutputStr As String)
Dim EOFile As Long
Dim FileHandled As Integer
FileHandled = FreeFile
Open FilePath For Binary As #FileHandled
EOFile = LOF(FileHandled)
OutputStr = Space$(EOFile)
Get #FileHandled, , OutputStr
Close #FileHandled
End Sub
Sub InputAccessWholeFileRead(FilePath As String, ByRef OutputStr As String)
Dim EOFile As Long
Dim FileHandled As Integer
FileHandled = FreeFile
Open FilePath For Input As #FileHandled
EOFile = LOF(FileHandled)
OutputStr = Input(EOFile, FileHandled)
Close #FileHandled
End Sub
Sub ScriptingWholeFileRead(FilePath As String, ByRef OutputStr As String)
'@--------------------------------------------------------------------------
' VB & VBA in a Nutshell: The language, eMaster Edition, O'Reilly, 2000, Paul Lomax
Dim EOFile As Long
Dim ofsFileSys As New Scripting.FileSystemObject
Dim ofsTextStream As TextStream
Set ofsTextStream = ofsFileSys.OpenTextFile(FilePath)
OutputStr = ofsTextStream.ReadAll
Set ofsTextStream = Nothing
End Sub
Sub ScriptingBufferedRead(FilePath As String, ByRef OutputStr As String)
Dim ofsFileSys As New Scripting.FileSystemObject
Dim ofsTextStream As TextStream
Dim BufferLen As Long
Dim EOFile As Boolean
BufferLen = CLng(0.5 * 1048576 / 2)
Set ofsTextStream = ofsFileSys.OpenTextFile(FilePath)
With ofsTextStream
Do Until EOFile
OutputStr = ofsTextStream.Read(BufferLen)
EOFile = .AtEndOfStream
Loop
End With
Set ofsTextStream = Nothing
End Sub
The image below shows the overall performance for the read operations.
📝Note
The procedures for read the whole file content at once, fails dealing with the 1.49 and 1.87 GB files. The
ScriptingWholeFileRead
procedure crash Excel and cause its restart, the rest of this kind of procedures cause a VBA exception when working over the cited files.
Conclusions
ECPTextStream
buffered read is 6x times faster than theFileSystemObject
(FSO) counterpart, with both working from VBA.- Open text file for Binary access is faster than other methods.
- The VBA performance is, apparently, linked to memory load. This can explain the performance drop of procedures for read the whole text file’s content at once.
- Read file using buffer is faster than read the whole file.
Licence
Copyright (C) 2021-2023 W. García.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.html.