We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multi-core and massively multithreaded computers. As computers with larger number of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms based on searching for augmenting paths for this problem have shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm? We answer this question in the affirmative. We report efficient multithreaded implementations of two key algorithms (Hopcroft- Karp based on breadth-first-search, and Pothen-Fan based on depth-first-search) and their variants, combined with the Karp- Sipser initialization algorithm. We report extensive results and insights using three shared-memory platforms (a 48-core AMD Opteron, a 32-core Intel Nehalem, and a 128-processor Cray XMT) on a representative set of real-world and synthetic graphs. To the best of our knowledge, this is the first extensive study of augmentation-based parallel algorithms for bipartite cardinality matching.