MATIO Performance compared to MATLAB libs #65

Open
JonatanTingstrom opened this issue Jun 16, 2017 · 9 comments

Comments

@JonatanTingstrom

Hi, I am wondering how the performance of MATIO compares to the standard MATLAB libraries. I need to read a lot of data from multiple .mat files and also write a lot of data, so it is important that both can be done quickly. I therefore compared MATIO with the MATLAB R2015b libraries. Unfortunately, MATIO was much slower: reading a set of files took about 60 s with the MATLAB libs and almost 400 s with MATIO.

But I don't know whether I compiled MATIO with settings that made it much slower, or whether my benchmarking software has bugs in it. Is there really a big performance difference, or have I just done something wrong on my end?

@tbeu
Owner

tbeu commented Jun 16, 2017

I am very interested in such a performance benchmark. Do you think you can share your code and the MAT file you were using such that I can try to reproduce?

@JonatanTingstrom
Author

JonatanTingstrom commented Jun 17, 2017

Sure thing. No work has been put into making the code clean, however; it was just meant to be a quick comparison between the two libraries. The code was written to work only in a Windows environment with Visual Studio. Also good to know: I had the option "Character set" set to "Use Multi-Byte Character Set".

I built two separate .exe files in Visual Studio, one that includes the Matio libs and headers and one that includes the MATLAB libs and headers.

The libs added to the project settings were libmx.lib and libmat.lib for the MATLAB project and libmatio.lib for the Matio project.

Include files used in both projects:

#include <iostream>
#include <string>
#include <windows.h>
#include <ctime>

using namespace std; // the snippets below use string, cout and endl unqualified

In addition, #include "mat.h" for the MATLAB exe and #include "matio.h" for the Matio exe.

Then I had the following code in the main function, which chooses the folder, starts the timer and calls the ReadMatFile function. The main function was identical for the two executables.

int main()
{
	string pathToFiles = "c:\\rerun\\TwoLogs\\";
	string fileExtension = "*.mat";
	WIN32_FIND_DATA search_data;

	memset(&search_data, 0, sizeof(WIN32_FIND_DATA));

	HANDLE handle = FindFirstFile((pathToFiles+fileExtension).c_str(), &search_data);
	cout << "Tick!" << endl;
	
	clock_t startTime = clock();
	while (handle != INVALID_HANDLE_VALUE)
	{
		ReadMatFile((pathToFiles+search_data.cFileName).c_str());
		if (FindNextFile(handle, &search_data) == FALSE)
			break;
	}
	clock_t endTime = clock();
	cout << "Tock!" << endl;

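	// Only close the search handle if FindFirstFile actually succeeded.
	if (handle != INVALID_HANDLE_VALUE)
		FindClose(handle);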
	cout << "Done in: " << ((float)(endTime - startTime) / CLOCKS_PER_SEC) << endl;
	system("pause");
	return 0;
}
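
For readers who want to repeat the measurement outside Visual Studio, the loop above can be sketched portably. This is a minimal sketch, assuming C++17; `TimeCrawl` and `CrawlResult` are hypothetical names that do not appear in the original benchmark, and `reader` stands in for either ReadMatFile variant from this thread. It uses std::filesystem instead of FindFirstFile/FindNextFile and the monotonic std::chrono::steady_clock instead of clock().

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Result of one benchmark pass over a directory (hypothetical helper type).
struct CrawlResult {
    int filesVisited = 0;  // number of files handed to the reader
    double seconds = 0.0;  // wall-clock time spent inside the loop
};

// Calls `reader` for every regular file in `dir` whose extension equals
// `ext` (for example ".mat") and measures the elapsed wall-clock time.
CrawlResult TimeCrawl(const fs::path& dir, const std::string& ext,
                      const std::function<void(const std::string&)>& reader)
{
    assert(fs::is_directory(dir));  // the sketch expects an existing folder

    CrawlResult result;
    const auto start = std::chrono::steady_clock::now();
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() &&
            entry.path().extension().string() == ext) {
            reader(entry.path().string());
            ++result.filesVisited;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

A call such as `TimeCrawl("c:\\rerun\\TwoLogs", ".mat", [](const std::string& f) { ReadMatFile(f.c_str()); })` would then play the role of the main function above. Note that the extension comparison is case-sensitive, unlike the "*.mat" wildcard on Windows.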

Then I had two different versions of ReadMatFile, one that uses the MATLAB API and one that uses the Matio API.

MATLAB:

void ReadMatFile(const char* file)
{
	MATFile *pmat;
	mxArray *pa;
	const char *name;
	int varCnt = 0;

	
	cout << "Try to read all variables in: " << file << endl;

	pmat = matOpen(file, "r");
	if (pmat == NULL) 
	{
		cout << "Failed to open!" << endl;
		return;
	}

	while ((pa = matGetNextVariable(pmat, &name)) != NULL) 
	{
		varCnt++;
		mxDestroyArray(pa);
	}

	if (matClose(pmat) != 0) 
	{
		cout << "Failed to close! " << endl;
		return;
	}

	cout << varCnt << " variables found and read in file..." << endl;
	return;
}

And Matio:

void ReadMatFile(const char* file)
{
	mat_t *pmat;
	matvar_t *pa;
	int varCnt = 0;
	
	cout << "Try to read all variables in: " << file << endl;

	pmat = Mat_Open(file, MAT_ACC_RDONLY);
	if (pmat == NULL) 
	{
		cout << "Failed to open!" << endl;
		return;
	}

	while ((pa = Mat_VarReadNext(pmat)) != NULL)
	{
		varCnt++;
		Mat_VarFree(pa);
	}
	
	if (Mat_Close(pmat) != 0) 
	{
		cout << "Failed to close! " << endl;
		return;
	}
	cout << varCnt << " variables found and read in file..." << endl;
	return;
}

@tbeu
Owner

tbeu commented Jun 17, 2017

Thanks for the code snippets. Based on them I've created https://github.com/tbeu/matioPerformance, which compiles with VS 2012, the same VS version that the MATLAB R2015b libraries were built with. libmatio.dll was built from current master and tweaked to link with hdf5.lib v1.8.12 (the version required by MATLAB R2015b) and zlib1.lib v1.2.11.

I observe that matPerf.exe crawls the MAT-files in the data folder in about 1.2 seconds whereas matioPerf.exe needs about 8.4 seconds.

@emmenlau

emmenlau commented Jun 19, 2017

Very interesting! Please let me know if I can do something to help. We have files with relatively complex structures inside; is that relevant for the performance difference?

@tbeu
Owner

tbeu commented Jun 19, 2017

I expect three performance bottlenecks:

  • handling of structs and cells
  • decompression/inflate handling (e.g. there is no inflateCopy called by the MATLAB mx API)
  • file I/O where the expected data type does not match the data type stored in the file (each value is then read sequentially and cast afterwards)
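
The third point can be illustrated with a small in-memory sketch. It is not Matio's actual code: both function names are hypothetical, the byte buffer `src` stands in for the file, and int16 to double is just one example of a type mismatch. The first function fetches and casts one value at a time, which in a real reader would mean one small read per value; the second does a single bulk read and converts afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Per-element path: fetch one stored int16 at a time and cast it to double.
// In a file-based reader every iteration would be a separate small read.
std::vector<double> ReadInt16AsDoublePerElement(const unsigned char* src,
                                                std::size_t count)
{
    assert(src != nullptr || count == 0);
    std::vector<double> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::int16_t value;
        std::memcpy(&value, src + i * sizeof value, sizeof value);
        out[i] = static_cast<double>(value);
    }
    return out;
}

// Bulk path: one large read into a typed temporary buffer, then convert
// all values in a tight loop.
std::vector<double> ReadInt16AsDoubleBulk(const unsigned char* src,
                                          std::size_t count)
{
    assert(src != nullptr || count == 0);
    std::vector<std::int16_t> tmp(count);
    if (count > 0)
        std::memcpy(tmp.data(), src, count * sizeof(std::int16_t));
    return std::vector<double>(tmp.begin(), tmp.end());
}
```

Both functions return the same values; the difference is only in how many read operations the source sees, which is where a per-value code path would be expected to lose time on large variables.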

@tbeu
Owner

tbeu commented Jun 19, 2017

@emmenlau Well, you could run matPerf/matioPerf on your files and try to narrow it down to a single struct.

On the other hand, I am not sure if matGetNextVariable and Mat_VarReadNext are really comparable. We could also try matGetNextVariableInfo and Mat_VarReadNextInfo to ignore the data I/O.

@tbeu
Owner

tbeu commented Jun 19, 2017

https://github.com/tbeu/matioPerformance was updated:

  • the -i flag switches to reading the variable info only (instead of reading the full data)
  • the -m flag switches the MAT-file API: if present, the MATLAB API is used, otherwise the Matio API
  • matioPerf.exe was removed
  • a high-resolution performance counter is used for timing
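
Taken together with the four functions named earlier in this thread, the two flags select one of four read modes. The following is a minimal sketch of that mapping, not the actual matPerf source: `PerfOptions`, `ParseFlags` and `ReaderFor` are hypothetical names, and the assumption that each flag combination ends up in exactly one of these API calls is an inference from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Options as described in the list above (hypothetical struct).
struct PerfOptions {
    bool infoOnly = false;      // -i: read variable info only, skip the data
    bool useMatlabApi = false;  // -m: use the MATLAB API instead of Matio
};

// Scans the command-line arguments for -i and -m; anything else is ignored.
PerfOptions ParseFlags(const std::vector<std::string>& args)
{
    PerfOptions opts;
    for (const std::string& arg : args) {
        if (arg == "-i")
            opts.infoOnly = true;
        else if (arg == "-m")
            opts.useMatlabApi = true;
    }
    return opts;
}

// Returns the name of the API function a run with these options would
// be expected to exercise (assumed mapping, see the note above).
const char* ReaderFor(const PerfOptions& opts)
{
    if (opts.useMatlabApi)
        return opts.infoOnly ? "matGetNextVariableInfo" : "matGetNextVariable";
    return opts.infoOnly ? "Mat_VarReadNextInfo" : "Mat_VarReadNext";
}
```

Comparing the two info-only modes against the two full-read modes separates the cost of walking the file structure from the cost of the data I/O itself.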

tbeu added a commit that referenced this issue Aug 21, 2017
@tbeu
Owner

tbeu commented Oct 20, 2017

@emmenlau Is there anything you figured out?

papadop pushed a commit to papadop/matio that referenced this issue Nov 29, 2017
tbeu added a commit that referenced this issue Jul 25, 2019
tbeu added a commit that referenced this issue Jul 26, 2019
tbeu added a commit that referenced this issue Jul 26, 2019
@tbeu
Owner

tbeu commented Oct 21, 2023

@emmenlau FYI I updated https://github.com/tbeu/matioPerformance to the upcoming libmatio v1.5.24.

test_suites.zip from #157 (comment) is still a performance bottleneck.

tbeu added a commit that referenced this issue Oct 23, 2023
* The performance gain is obtained by removing the slow HDF5 API function H5Iget_name being the main bottleneck. Handles of HDF5 groups or datasets are now kept open for the lifetime of the matvar_t instance.
* As a side-effect, the hdf5_name could be removed from matvar_t.internal, too.
* Fix reference counting in Mat_VarDuplicate
* As reported by #65 and #198
seanm pushed a commit to seanm/matio that referenced this issue Jan 31, 2024