Directory & file search

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
VBA’s not really my thing, so here’s hoping…</SPAN>

What we have:
Fairly simple file structure (2 / 3 levels, with one file per day at the root of each tree)
Files are either xls or txt
Files contain daily banking records, including a unique ID
Location of unique ID is fixed (column D or whatever)
Max records / file / day is about 20,000. Usually only files from last 12 months are relevant

Issue:
Team needs to be able to search for a unique ID (usually the most recent), identify the file it's in, go to file and get some other info from the corresponding row. At the moment, they are searching each file manually until they get a hit.

What I thought would work:

Run some code (daily) to list out something of the form:

unique ID | File reference | row reference

...in a worksheet somewhere so they could search a single xl file for the matches, then maybe link of some sort to open the file they want. Probably on big initial run to populate enough history, then code would need to be updatable such that it could be run each night and have the results appended to the rest of the list / not need to interrogate the entire file lists each time.

</SPAN>Obviously, any better ideas most welcome.

Details:
</SPAN>
For xls files:</SPAN>
Directory structure is …\YYYY\MM MMM YY\DDMMYYYYCL.xls</SPAN>
e.g. …\2014\05 MAY 2014\01052014CL.xls</SPAN>
Unique ID is in col N
</SPAN>
For txt files:</SPAN>
Directory structure is …\YYYY\MM MMM\V8 Cards_NZAP_YYYYMMDD.txt</SPAN>
e.g. …\2014\05 May\ V8 Cards_NZAP_20140501.txt</SPAN>
Unique ID is not in a consistent character position, but is always a number string located between the first instance of BD, DC, or AP and the next space.
</SPAN>
As ever, all help gratefully received :)</SPAN></SPAN></SPAN>
 

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
This code will create links to all files it finds in all folders below where it is saved, and put them in column A of the active sheet. You could modify this to do what you need.

Code:
Private Const vbDot = 46
         
Private Const INVALID_HANDLE_VALUE As Long = -1
Private Const MAX_PATH As Long = 260
Private Const FILE_ATTRIBUTE_READONLY As Long = &H1
Private Const FILE_ATTRIBUTE_HIDDEN As Long = &H2
Private Const FILE_ATTRIBUTE_SYSTEM As Long = &H4
Private Const FILE_ATTRIBUTE_DIRECTORY As Long = &H10
Private Const FILE_ATTRIBUTE_ARCHIVE As Long = &H20
Private Const FILE_ATTRIBUTE_NORMAL As Long = &H80
Private Const FILE_ATTRIBUTE_TEMPORARY As Long = &H100
Private Const FILE_ATTRIBUTE_COMPRESSED As Long = &H800
Private Const FILE_ATTRIBUTE_ALL As Long = FILE_ATTRIBUTE_READONLY Or _
                                           FILE_ATTRIBUTE_HIDDEN Or _
                                           FILE_ATTRIBUTE_SYSTEM Or _
                                           FILE_ATTRIBUTE_ARCHIVE Or _
                                           FILE_ATTRIBUTE_NORMAL Or _
                                           FILE_ATTRIBUTE_COMPRESSED
Private Type FILETIME
   dwLowDateTime As Long
   dwHighDateTime As Long
End Type

Public Type SYSTEMTIME
    wYear As Integer
    wMonth As Integer
    wDayOfWeek As Integer
    wDay As Integer
    wHour As Integer
    wMinute As Integer
    wSecond As Integer
    wMilliseconds As Long
End Type

Private Type WIN32_FIND_DATA
   dwFileAttributes As Long
   ftCreationTime As FILETIME
   ftLastAccessTime As FILETIME
   ftLastWriteTime As FILETIME
   nFileSizeHigh As Long
   nFileSizeLow As Long
   dwReserved0 As Long
   dwReserved1 As Long
   cFileName As String * MAX_PATH
   cAlternate As String * 14
End Type

Private Declare Function FindClose Lib "kernel32" _
  (ByVal hFindFile As Long) As Long
   
Private Declare Function FindFirstFile Lib "kernel32" _
   Alias "FindFirstFileA" _
  (ByVal lpFileName As String, _
   lpFindFileData As WIN32_FIND_DATA) As Long
   
Private Declare Function FindNextFile Lib "kernel32" _
   Alias "FindNextFileA" _
  (ByVal hFindFile As Long, _
   lpFindFileData As WIN32_FIND_DATA) As Long
   
Private Declare Function GetFileAttributes Lib "kernel32" _
   Alias "GetFileAttributesA" _
  (ByVal lpFileName As String) As Long

Private Declare Function lstrlenW Lib "kernel32" _
  (ByVal lpString As Long) As Long
  
Private Declare Function FileTimeToSystemTime Lib "kernel32" _
    (lpFileTime As FILETIME, lpSystemTime As SYSTEMTIME) As Long

Private Declare Function FileTimeToLocalFileTime Lib "kernel32" _
    (lpFileTime As FILETIME, lpLocalFileTime As FILETIME) As Long

Global Next_Available_Row As Integer

Public Sub Start()

    ssource = ThisWorkbook.Path & "\"
    
    Next_Available_Row = 0
    Do
        Next_Available_Row = Next_Available_Row + 1
        DoEvents
    Loop Until Cells(Next_Available_Row, 1) = ""
    
    GetDirectoryContents ssource
    
End Sub

Private Sub GetDirectoryContents(ByVal ssource As String)
   
    Dim wfd As WIN32_FIND_DATA
    Dim hFile As Long
    Dim fCount As Long
    Dim t_attrib As Long
    Dim tstatus As Long
    Dim tsource As String
    Dim Available_Row As Integer
    
    hFile = FindFirstFile(ssource & "*.*", wfd)
    
    If hFile <> INVALID_HANDLE_VALUE Then
    
        Do
       
            t_attrib = GetFileAttributes(ssource & wfd.cFileName)
           
            'Is it a directory?
            If (FILE_ATTRIBUTE_DIRECTORY And t_attrib) And _
                (Asc(wfd.cFileName) <> vbDot) Then
                
                  tsource = Left(wfd.cFileName, InStr(wfd.cFileName, Chr(0)) - 1) & "\"
                  
                  On Error Resume Next
                  Application.StatusBar = "Scanning " & ssource & tsource
                  On Error GoTo 0
                  
                  GetDirectoryContents ssource & tsource
            
            ElseIf (FILE_ATTRIBUTE_ALL And t_attrib) And _
                (Asc(wfd.cFileName) <> vbDot) Then
                
                  temp = ""
                  t = wfd.cFileName
                  i = 1
                  Do
                    If Asc(Mid(t, i, 1)) <> 0 Then temp = temp & Mid(t, i, 1)
                    i = i + 1
                  Loop Until Asc(Mid(t, i, 1)) = 0
                  
                  
                  URL = ssource & temp
            
                  i = 2
                  foundit = False
                  Do
                      If InStr(UCase(URL), UCase(Cells(i, 1))) > 0 And Cells(i, 1) <> "" Then
                          foundit = True
                          Exit Do
                      End If
                      i = i + 1
                  Loop Until Cells(i, 1) = ""
                  
                  If foundit = False Then
                      i = Next_Available_Row
                      Next_Available_Row = Next_Available_Row + 1
                  End If
                 
                  Cells(i, 1).Select
                  ActiveSheet.Hyperlinks.Add Anchor:=Selection, Address:=URL, TextToDisplay:=URL
                    
            End If
                              
            tstatus = FindNextFile(hFile, wfd)
            
            DoEvents
             
        Loop Until tstatus = 0
       
    End If
    
    'Close the search handle
    Call FindClose(hFile)

End Sub
 
Upvote 0
hi mjbeam.

Thanks for the reply. I'm not near excel at the moment, but I've had a quick look at the code. If I'm right, what you've posted will produce a file listing of everying below a starting folder. Although a good start, it doesn't get to the key problem of interrogating and retrieving the required file contents and only updating for changes.


Notwithstanding your optimism re "You could modify this to do what you need", I'm no good with vba :)

Thanks for the start, though, and further suggestions welcome (unless I've misread the code).
 
Upvote 0
Some more info would help, Paddy

Such as

  • the Excel version the code will run from



  • confirmation that the data files from Excel are .xls version (so not .xlsx or whatever)



  • name of the worksheet in the Excel file



  • sample of a worksheet. Dummy data for just a few rows. To see things like if there are headers & what the header is for unique ID, if there are blank rows, are the unique IDs text or numeric



  • are there multiple records for a unique ID within one data file. So might one day's file have (up to) 20,000 records for 20,000 different IDs, or, 20,000 records for 3,000 different IDs



  • as not just the latest data may be required for an ID, is the latest two enough. Or three. Or the last day or last x days or maybe y months. Or what? If relevant, any info on a date stamp field in the data (to distinguish different records on a single day)



  • the quantum of records you're after in the list. How many records? I'm seeing one year of records, say 250 business days x up to 20,000 records a day. So some millions. But maybe only a much smaller number need be kept in the list. I'm unclear of the requirement. If there are lots, Access might be better. Or if that isn't an option, a solution using Excel and maybe storing the data in an mdb file (which can be done without requiring users to have licences for Access



  • maybe some sample from a txt file too?

regards
 
Upvote 0
hi mjbeam.

Thanks for the reply. I'm not near excel at the moment, but I've had a quick look at the code. If I'm right, what you've posted will produce a file listing of everying below a starting folder. Although a good start, it doesn't get to the key problem of interrogating and retrieving the required file contents and only updating for changes.


Notwithstanding your optimism re "You could modify this to do what you need", I'm no good with vba :)

Thanks for the start, though, and further suggestions welcome (unless I've misread the code).


At this line in the code,

Code:
ActiveSheet.Hyperlinks.Add Anchor:=Selection, Address:=URL, TextToDisplay:=URL

...the variable URL contains the name and path of a newly discovered file. Code could be added here to open the file and extract the data you need and put it in your workbook, next to the hyperlink that was just added in column A. Since I don't know the format of your files or where the pertinent data is in your files, it is not possible for me to provide you with any code to extract this information.

As PaddyD mentioned above, these is a lot of information we would need to be able to provide any additional help.
 
Upvote 0
Fazza, mjbeam,

I've been away, hence the delay - thanks for the interest :)

Fazza, I have added further info re your questions within the quoted text below.

Thanks again :)

Some more info would help, Paddy

Such as

  • the Excel version the code will run from

2010


  • confirmation that the data files from Excel are .xls version (so not .xlsx or whatever)

excel files confirmed as .xls
others are .txt


  • name of the worksheet in the Excel file

sheetname is 'Other Party Details Report'

  • sample of a worksheet. Dummy data for just a few rows. To see things like if there are headers & what the header is for unique ID, if there are blank rows, are the unique IDs text or numeric

Copied below for the xls. some details:
- value I'm after is in col N ('Other Party Account')
- Data start is row 16. However, in principle I don't care if code worries about header rows etc, as I'm imagining team will just be doing a number string search, so extra stuff won't get in the way.


Other Party Name</SPAN>
MTS</SPAN>Particulars</SPAN>Analysis Code</SPAN>Reference</SPAN>TC</SPAN>Debit</SPAN>Credit</SPAN>Date</SPAN>Other Party Account</SPAN>
Name</SPAN>Txt</SPAN>503999</SPAN>503946</SPAN>64</SPAN>50.00 </SPAN>23/05/14</SPAN>03-1709-0999999-010</SPAN>
Name</SPAN>Txt</SPAN>503999</SPAN>503946</SPAN>64</SPAN>50.00 </SPAN>23/05/14</SPAN>03-1709-0999999-011</SPAN>
Name</SPAN>Txt</SPAN>503999</SPAN>9004714607</SPAN>64</SPAN>100.00 </SPAN>23/05/14</SPAN>03-1709-0999999-012</SPAN>
Name</SPAN>Txt</SPAN>503999</SPAN>9019810101</SPAN>64</SPAN>80.00 </SPAN>23/05/14</SPAN>03-1709-0999999-013</SPAN>
Name</SPAN>Txt</SPAN>503999</SPAN>9022688700</SPAN>64</SPAN>40.00 </SPAN>23/05/14</SPAN>03-1709-0999999-014</SPAN>
Name</SPAN> Txt</SPAN>503999</SPAN>9028014109</SPAN>64</SPAN> 50.00 </SPAN>23/05/14</SPAN>03-1709-0999999-015</SPAN>

<TBODY>
</TBODY><COLGROUP><COL><COL><COL><COL><COL><COL><COL><COL><COL><COL></COLGROUP>




  • are there multiple records for a unique ID within one data file. So might one day's file have (up to) 20,000 records for 20,000 different IDs, or, 20,000 records for 3,000 different IDs


There can be more than one record per day. (Data is about payments, so multiple payments across multiple days = multiple records in multiple files).

  • as not just the latest data may be required for an ID, is the latest two enough. Or three. Or the last day or last x days or maybe y months. Or what? If relevant, any info on a date stamp field in the data (to distinguish different records on a single day)

In general, more recent values are more relevant, but difficult to pin down precise range. Records are searched when customer says things like 'Where's my payment? I sent you a cheque for about $500 a few weeks ago'. They ususally notice before too long but not always. In reality, most issues are likely to be captured within most recent 3 months. The date stamp is derivable from the file name / file structure.



  • the quantum of records you're after in the list. How many records? I'm seeing one year of records, say 250 business days x up to 20,000 records a day. So some millions. But maybe only a much smaller number need be kept in the list. I'm unclear of the requirement. If there are lots, Access might be better. Or if that isn't an option, a solution using Excel and maybe storing the data in an mdb file (which can be done without requiring users to have licences for Access


Right ball park re record set, although 20,000 is max of daily range.



  • maybe some sample from a txt file too?


example below. Data starts from row 2.

Unique ID is not in a consistent character position, but is always a number string located between the first instance of BD, DC, or AP and the next space. (e.g. 0302060029999999 for the first record).

0105000000000400 2021187999 601073230514230514230514030047BP0302060029999999 Eilyn Balmes Balmes E B </SPAN>
0105000000000500 2020768999 601073230514230514230514123007AP1230570059999999 R Booth R BOOTH </SPAN>
0105000000000500 20214640999 601073230514230514230514031725AP0317250049999999 Braun S Braun S R </SPAN>
0105000000000500 2021144999 601073230514230514230514010886BP0108860019999999 Calleja, J CALLEJA JL & </SPAN>
0105000000000500 20218711999 601073230514230514230514030047BP0307130049999999 Roberts J Roberts J </SPAN>
0105000000000100 8411230514230514230514031751DC031757000889999 S KAMETA KAMETA S O</SPAN>
0105000000000500 0020103999 436479230514230514230514153948BP1539480029999999 ODONNELL K KISSICK MR C M </SPAN>

<TBODY>
</TBODY><COLGROUP><COL></COLGROUP>


  • Once a data file exists, can it change

Nope.



regards
 
Upvote 0
hello, Paddy. I'll do some work on this as I can. Hope to post something this week. regards, Fazza
 
Upvote 0
please, Paddy, some further info

The xls files. Data starts on row 16. I don't know if that includes/excludes headers. To be clear,
- which row has headers?
- which is the first data row under the headers? (Just 'cause the posted image, as I see it, suggests a blank row between headers & data. And I need to know exactly how it is.)

The txt files. The first instance of "BD, DC or AP". The sample data has BP, DC and AP. And the first row (you mention) has BP. It looks like a typo has mixed up BD & BP : which means I don't know which is correct. Please advise.

cheers
 
Upvote 0
Hi fazza,

Thanks for the perserving with me on this :)


xls - headers in row 15, data from row 16
txt - my bad, sample is correct - typo on my part - value we're looking for is BP not BD
 
Upvote 0

Forum statistics

Threads
1,214,956
Messages
6,122,465
Members
449,085
Latest member
ExcelError

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top